Episode: 773
Title: HPR0773: Gabriel Weinberg of DuckDuckGo
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0773/hpr0773.mp3
Transcribed: 2025-10-08 02:11:47
---
Hello ladies and gentlemen my name is Ken Fallon and today I'm very pleased to
bring you an interview with the founder of DuckDuckGo Mr. Gabriel Weinberg
hi Gabriel did I pronounce that correctly Gabriel but thank you for having me
hi so you're the founder of DuckDuckGo just in case any of our listeners
don't know can you tell us a little bit about the site sure DuckDuckGo is a
general purpose search engine so it's designed to replace your generally
Google use it does a number of things sort of differently better but you know
generally it works the same as a regular search engine we've tried to focus on
a lot more instant answers above the results and we also focus on getting
rid of a lot of spam and irrelevant results and then we the third thing we
focused on is privacy and we try to say that you know we have real privacy
which means that we really don't know who you are when you search and there's
you know no way for us to track you so aside from IP addresses I guess and
cookies we don't store IP addresses actually and we don't set cookies by default
we also don't store user agents which have been known to track you or be
able to track your sessions even if you don't store those other things and
yeah and they're doing stuff now with fonts on your PC to be able to identify
your browser yeah the EFF put out a tool that you're referring to is you can do
like a unique fingerprinting of your browser which is pretty interesting but
yeah we obviously don't store any of that stuff either okay just to give the
listeners a little bit of a background so tell me how you ended up did you just
wake up one day and decide okay I want to go against up against the you know
most powerful company in the world and do a better search engine or how did
this come about it came about more of a sort of dissatisfaction with Google and
then a recognition that I was messing around with Wikipedia and Delicious and
finding often better links external links in those sources and so I was
actually going there to search for things instead of Google and you know
the thought occurred to me okay well what if you take this to the logical
conclusion Google's getting more and more spam in it and there's more and more
of these external sources crowdsource APIs that have good results on so what if
you sort of mashed those all together would it create a decent search engine and
so I built sort of a prototype on a weekend and I messed around with it and
liked it and sort of just grew organically from there okay fantastic it does
I must say I've been using it for a while now and it gives different results to
Google I guess I guess people are now trained that the Google results are the
correct search results can you tell me why that would be you know why I might
want to why am I getting different results in your search engine as opposed to
if I type in hacker public radio for instance and into Google why do I get
two different results right so the short answer is is that you know obviously
Google and DuckDuckGo are somewhat black boxes and we don't know each other's
algorithms so they're naturally going to lead to somewhat different results but
a sort of a more there's more of a philosophical answer I guess which is that you
know each search engine sort of tries to concentrate on different things and
so what we've tried to concentrate on more is this notion of zero click info this
more conceptual results and we also try to do things like this is where we're
different from Google more where we'll really respect your query more and not
try to change it around and give you results for things that are slightly
different than what you searched for and we're way more aggressive at removing
the you know SEO type of sites so you'll see a lot less of those but
ultimately it depends on a given query you know what's going on and so these are
just somewhat generalizations but one thing I'd say is that like yeah we
definitely give different results than Google so there there is definitely if
you're having trouble with doing deep searches there's really a reason to go
to other search engines and use multiple ones because you will get different
results I just saw on your webpage that's this whole concept of a bubble I wonder
could you explain that to the listeners yeah so there's this concept that was
best or more recently nicely enumerated in a book called The Filter Bubble and the
concept goes like this when you search at Google now and most other search engines
and even other sites use other sites like Facebook they're using what you've
previously done on those sites i.e. Google what you've clicked on and what
you search for to tailor your results on different searches so when you search
for you know something like climate change it may be impacted by what you
previously searched for and where you live and things like that even though you
may not think those things are related and so what that means is because
you often search and click for things that you like you end up seeing more and
more of things that Google thinks you like and that may leave out some opposing
viewpoints and you know things that you're less likely to click on but
otherwise contain information that's valid so what they call this is a bubble
because you're sort of living in a bubble that Google is presenting to you and
you're you're missing things that are outside of that bubble which generally
over time may contain opposing information to your core beliefs okay I actually
did the example on your website and although I don't search that much in Google I
did end up getting very different results than what was on the what was on the
example page for other people so they're obviously taking Google Reader into
account as well so it's kind of scary stuff I wonder could you just tell us a
little bit about your own background and how you you know what your educational
background is how you kind of got into doing these servers and putting
this stuff together what it runs on that sort of thing sure so I'll start with
background and you can hit me up for the server questions yeah so I I grew up
in Atlanta I spent a couple years in Philippines before that and then I went to
college in Boston Massachusetts at MIT I got a degree in physics and then a
graduate degree in technology and policy and then basically right out of
school I started doing startup stuff and I started an educational software
company that was about increasing parental involvement in schools and that
ultimately didn't really go very far and then I started another company that
was about finding old friends and classmates pre-facebook yeah that did
that was one that did well and I sold that in 2006 which enabled me to sort of
take on this bigger problem DuckDuckGo you know so then about a year after that
I started you know messing around with DuckDuckGo and I've been doing it for about three
years now I'm sure you have a massive team of people working behind the scenes
there so can you tell me how many people are employed full-time on DuckDuckGo
you would be surprised maybe you already know but it's just me full-time still at
the moment although there are I don't want to short sell other people's
contributions because there are many people who have contributed you know
significantly but I'm still the only full-time person okay and how how in the
name of all can you attempt to to best the likes and resources of Google or
Yahoo or Bing or anybody else for that matter this is the beauty of the
external API age so I did initially start out doing my own crawling and
everything and building everything from the ground up but quickly realized of
course that you know you need I mean Microsoft and Google basically spend
hundreds of millions of dollars a year on that component alone so obviously I'm not
doing that right and so what happened was is Yahoo BOSS came out which was
them exposing their search feed and I decided I could use that and concentrate
on value adds I identified which are you know adding sort of better results on
top of theirs and getting rid of spam and then over time it also turned into
changing their results a lot which is why you see our results are pretty
different than Bing and Yahoo even though we use their feeds and so now it's
it's a and then we also started using a bunch of external other external APIs
like Wolfram Alpha and there's about you know 40 sources and so what you get
is an amalgamation of our code plus everyone else's code which is sort of
what I call hybrid search engine and what that really enables you to do is if
I'm using say like a I want to get good music results say right then I'm
going to use a music API from a company that just concentrates on music it's as
if we had employees that were doing that but we can use it just based on
calling their API so if you think about it we have tons and tons of people
working for us but they're not working directly for DuckDuckGo you don't have
to pay exactly I mean it's similar to the argument of open using open
source right yeah exactly what do you just when I typed in Hacker Public
Radio there for instance I get some pages obviously that are from Wikipedia
because it says Wikipedia and I see the Facebook link there but results brought
you by Bing built on Yahoo you know what does this mean is is also there yes does
that refer to a single search or does it refer to all of them so when you use
the Bing or Yahoo APIs they require attribution so it is that's how I've
decided to be good to them and display it which I think is fair yeah but it's
what it's not doing is using their feeds exactly which is why it doesn't
look the same so really all that means is that on some calls we use their
services for some things and but it's giving them proper attribution yeah
on your right and proper I guess I must say once I guess once people start
using DuckDuckGo and okay I'll ask the question why DuckDuckGo.com I wish I
had a decent answer for you it popped into my head one day and I really liked it
and my wife liked it so I went with it I'm generally super bad at names and it
seemed like a good name so I just went with it there is a you're not from the
U.S. so there there is a childhood game in the U.S. called Duck Duck Goose yeah
which is probably where it was derived from in my head but it was they're gonna
be coming after you for copyright infringement right it's nothing related to that
absolutely but you probably have a different name for the game or something
it's like where children chase around each other in a circle after they tap
each other on the head and we call it Tigga thank you what do you call it Tig
Tigga yeah or tag but I must say coming from Ireland you should never let the truth stand
in the way of a good story so you can always I'll have the listeners here send in
better better stories where the name came from now I'm completely lost
sorry no that's fine what I wanted to say was once you get over the it's not Google
thing I was very comfortable putting it as my home page especially because when you go into search
preferences you can specify that it's HTTPS that your local country is the Netherlands that your
language is going to be this that you want the links on the side you do or you don't want the ads
all that sort of stuff comes up and you can have it as a URL parameter on the bottom of your page
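
As a rough illustration of carrying preferences in the URL instead of in a cookie, the sketch below builds a search URL whose settings travel as query parameters. The base URL, the setting names (`region`, `ads`) and the function `build_search_url` are hypothetical; they are not DuckDuckGo's actual parameter names.

```python
from urllib.parse import urlencode


def build_search_url(base, query, settings):
    """Return a search URL with the query and the user's settings in the query string.

    `settings` is a plain dict of hypothetical preference names to values.
    Nothing is stored server-side or in a cookie: the URL itself holds the state.
    """
    params = {"q": query}
    params.update(settings)  # settings follow the query, in insertion order
    return base + "?" + urlencode(params)


# Hypothetical host and setting names, for illustration only.
url = build_search_url(
    "https://search.example/",
    "hacker public radio",
    {"region": "nl", "ads": "off"},
)
print(url)
```

Bookmarking such a URL, or setting it as the home page, keeps the preferences without the site having to remember anything about the visitor.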
so you seem to be really serious about this whole no tracking privacy thing why do you feel so
strongly about that I you know got into it not really thinking about that stuff at all and it
wasn't like a core thing when I started but then what really triggered it was some well two
things really one there were some comments on Reddit that were like why are you storing this
stuff and I had never really thought about it you know it's really the default when you turn on
your web server that it prints out IP addresses and then I Google came out with a report that was
saying how many requests they have gotten from governments across the world and law enforcement
and that they have to deal with these in court and whatnot and you know I looked at it and I
thought a it's actually pretty creepy that I would know what people are searching for you know
yeah and b I don't want to deal with any law enforcement requests whatsoever so that the first
one is more of a privacy user protection creepiness thing the second is more of a personal preference
I just don't want to deal with it you know yeah and so that's sort of where it came from and then
once it became that then I then you have the the mode of thinking about it and once I started
thinking about it over time incrementally some from user feedback some from my own I realized
there's a whole bunch of other things you can do and other leaks that were going on that we could
you know close and so I've done that over time yeah it is I must say it's what's what I like about
it is it looks like Google you know 10 years ago when I started off yeah it was just you type in
your question you get the results that's it thank you very much ma'am and just to let the listeners know
if you know if you're there and you've run a search on DuckDuckGo and it doesn't work out you can just
go back up to the page put an exclamation mark g and a space and then it'll put the whole
that whole string and send it over to Google for you that's a stroke of genius can you tell us more
about that sort of that functionality yeah it's called I had called it bang syntax off of the
Unix bang and what it does is it'll send it you know there's different commands for different
search engines but the basic idea is it'll send your query anywhere you want to so you can even do that
on the home page you can bypass us altogether and send it right to Google or Amazon or you know
wherever you want to go there's a there's a thousand of the commands at this point so a lot of
sites are covered slash w is wikipedia for for so the genesis was really I actually just built
this feature for myself and didn't really have any intention of exposing it because originally I
just wanted I constantly I'm searching CPAN because it's written in mainly in Perl and CPAN is
where all the Perl modules are stored yeah and going to search.cpan.org is a pain and
then you've got to search and CPAN search their site is already so slow yes that skipping the first
page just saved me a lot of time and so I just built this in and then over time I realized
every time I showed it to someone they're like what are you doing and then they thought I was
pretty interesting so the next I exposed it and then over time people who you know understand
the syntax seem to really like it yeah I think it's uh I think it's fantastic I now have uh
DuckDuckGo on all my own machines as the home page and it's right there it's up in two seconds there's
no JavaScript going on in the background there's no notifications uh really since Google Plus has
come on it turns the Google home page into an application and you know I just can't be dealing
with that first thing in the morning I just said no to go back to the question about the servers and
stuff can you give us a background of your massive data center that's out there in uh Philly
so I started um using a ISP local here and running our own servers because that's what I had been
used to um when that sort of somewhat recently but really sort of reached capacity I've switched
to Amazon EC2 um so that's really the front where all the front-end stuff is right now um
and unfortunately I had to move in doing that from FreeBSD to Ubuntu um or to Linux and uh
but it's worked out well so far I mean there's there's things I like and don't like about EC2 but it
generally is a great alternative okay uh it's feeling they play more stuff that we all know and love
why uh why is it a problem going from BSD to Linux um there's no real problem with it it's just that
I had been doing uh you've been using BSD for a long time and was less familiar with Linux you know
aside I had all these scripts and things that were pretty BSD specific that I had to like you know
port over and that was just sort of a one-time pain yeah I got you um so it runs um you mentioned
Perl it runs on Ubuntu BSD um so what else is what else is there we're using nginx mainly
for the web server um we use a bunch of different data stores uh Solr Postgres uh flat files
this CDB which is a weird read-only database format um we use memcached for caching
then there's a bunch of other side components that some other people have written in like
there's some Python stuff and um there's uh so we have a Jabber um client that answers things over
IM and that's written in Node um okay but yeah mainly the bulk of it is in Perl and JavaScript
so there's there's there's you know some front-end stuff that helps do all these external
API calls um most of that's in JavaScript looking on your Wikipedia page which is cool to have a
wikipedia page um it says that you started off a size with the option uh a community size with
with the view to open sourcing is is that something that's going to happen or is a practical or
so we have yeah so i am very much focused on on doing more of that we have a GitHub account
with a bunch of repositories now and um i'm trying to people write all the time asking to help out
if they can help out and i'm trying to make it so anyway any time someone helps that'll be open
source um or if they want and so more and more of it's coming open source the actual core
some of the core stuff i don't i'm not prepared yet to open source for various business reasons
but i definitely would like to open source as much as possible okay speaking of business uh where
you're gonna make your money from on this so as you noted before there are a few ads on it but um
you can turn them off um that it turns out to be enough um for the moment to break even and
hopefully over time it'll you know become more profitable yeah um surprisingly
unsurprisingly i had the i was very impressed that the option was there to turn it off and uh
did for a second to see if yes in fact it turns them off and then i turned them back on because uh
well hey why not give you uh give you a few shekels if um it works you also do affiliate programs
via Amazon and and the like yep that's right the um we're somewhat limited in that approach because
well we're still exploring but basically i'm wary of going through third parties which can do some
tracking you know yeah um although the the good thing about Amazon and eBay is that they they run
their own affiliate programs so you don't have to do that okay very good um the one thing that would
prevent me from moving from google of course would be the the changing logo from time to time
yeah until you get that fixed i'm afraid i'm gonna have to stay with google
okay um we're actually working on setting a custom logo um but i actually see here i'm a bit
bit facetious here yeah i know i know we see that you do have uh you're being facetious but
there you'd be surprised that i get actually requests usually i'd say from the UK more than not
that it's the duck is too unprofessional i can't use your search engine it's not professional enough
i am serious this is this comes up this comes up a lot actually and it seems a bit crazy
to me but um because we have all these sort of cool logos that people made i wanted to make the
option in any case for you to be able to set um one of these alternative logos but at that point
you might as well just let people set their own logo yeah well just to let people know that if you
do log in you have the logo the duck logo does change from time to time for different celebrations
it is kind of cool um you know you should name the other one uh duck duck pro
we better go register that before before this airs um was there anything else i'm just looking
down through the uh list of questions that i had was there anything else that i missed or
that you'd like to bring up um no but if you'd like uh i don't know how technical your
audience is but i'm happy to answer any other weird off-technical questions you have
our audience varies from the novice right up to the the uh
geekiest of the geeks so um if there's anything you want to tell us uh feel free to do so
you can um sorry come no i don't have anything i would just say um after listening to this whole
podcast you should give it a try yes you you should definitely do that the way to do that is to
set it as your default for a week because you sort of have to give it a little bit of time yeah
and there is that phase where it isn't uh uh it isn't google and once you get over that
it's it actually turns out to be quite nice especially the uh the red box gives you a lot of
interesting information and because you can see the sources that it comes from Wikipedia or
it comes from archive.org then you can uh i find myself trusting it more you know
or the red the stuff on the red box i know is going to be an actual result um okay just one
last thing here i see on your website that you give 10% of your income to free and open source
projects yes what prompted you to do that well i had been sort of like we talked about before
I rely on these external APIs and usually those are businesses and in some sense that's a win win
or we even will pay but in the case of open source i mean DuckDuckGo is essentially built on
open source software and we're not you know paying for that obviously um and so i wanted to
encapsulate that by giving a giving something back i mean i'd used it also at my previous company and
we didn't really do that um but i thought this would be a good way and honestly i hope to uh
inspire other people to do the same that hasn't happened too much yet but um either way
i enjoy doing it okay so how does Yahoo make money on your search results if if they
are put into your page and they have no tracking or any other information
they charge us per call uh so by searching on DuckDuckGo the money is going to eventually go into
Microsoft to Yahoo to Microsoft yes at least you know where it's going okay how much is a
request or can i not ask that question oh it's uh you know i actually it varies by
type of call and i don't have it off top of my head but if you just search Yahoo BOSS yeah
their pricing is you know public and everything okay um the just to be clear right if you want to
mess around with the Bing API there is a free Bing API but you you're limited if you um you can't
commercialize it yeah i understand no i i'm must say i'm very very happy with the with the results
that i'm getting back i like the um exclamation mark or the bang command it's pretty cool
and uh yeah look forward to uh the surge of interest coming from the HPR community as uh once this
show is aired well thank you very much okay um shall we call this uh show call it a day
there sounds good okay uh just like to thank you very very much for coming on and uh
recording this it's i'll um i can't tell you when it's going to be up it'll probably be next week
but i'll send you a link to it when uh it gets posted okay thank you again for having me no problem bye
thank you for listening to Hacker Public Radio for more information on the show
and how to contribute your own shows visit Hacker Public Radio dot org
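
The bang syntax described in the interview (an exclamation mark, a short command, a space, then the query) can be sketched as a small dispatcher. This is a minimal illustration and not DuckDuckGo's implementation: the `BANGS` table, the `resolve` function and the URL templates are assumptions chosen for the example, and the real service covers far more commands.

```python
from urllib.parse import quote_plus

# Hypothetical table of bang commands. The URL templates are illustrative.
BANGS = {
    "g": "https://www.google.com/search?q={}",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
}

# Where a query goes when it carries no recognised bang command.
DEFAULT = "https://duckduckgo.com/?q={}"


def resolve(query):
    """Return the URL that a query should be forwarded to."""
    text = query.strip()
    parts = text.split(None, 1)  # split off the first word only
    if len(parts) == 2 and parts[0].startswith("!"):
        template = BANGS.get(parts[0][1:].lower())
        if template is not None:
            # Known bang: forward only the rest of the query to that site.
            return template.format(quote_plus(parts[1]))
    # No bang, or an unknown one: search the whole string as typed.
    return DEFAULT.format(quote_plus(text))


print(resolve("!g hacker public radio"))
print(resolve("hacker public radio"))
```

Unknown commands fall through to an ordinary search in this sketch, so a typo in the command does not lose the query.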