Episode: 773
Title: HPR0773: Gabriel Weinberg of DuckDuckGo
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0773/hpr0773.mp3
Transcribed: 2025-10-08 02:11:47

---

Hello, ladies and gentlemen. My name is Ken Fallon, and today I'm very pleased to
bring you an interview with the founder of DuckDuckGo, Mr. Gabriel Weinberg.
Hi Gabriel, did I pronounce that correctly?

Gabriel, but thank you for having me.

Hi. So you're the founder of DuckDuckGo. Just in case any of our listeners
don't know, can you tell us a little bit about the site?

Sure. DuckDuckGo is a general-purpose search engine, so it's designed to replace
your general Google use. It does a number of things sort of differently, better,
but generally it works the same as a regular search engine. We've tried to focus
on a lot more instant answers above the results, and we also focus on getting rid
of a lot of spam and irrelevant results. The third thing we've focused on is
privacy. We try to say that we have real privacy, which means that we really
don't know who you are when you search, and there's no way for us to track you.

So aside from IP addresses, I guess, and cookies?

We don't store IP addresses, actually, and we don't set cookies by default.

We also don't store user agents, which have been known to be able to track your
sessions even if you don't store those other things.

Yeah, and they're doing stuff now with fonts on your PC to be able to identify
your browser.

Yeah, the EFF put out a tool, which is what you're referring to, where you can do
a unique fingerprinting of your browser, which is pretty interesting. But yeah,
we obviously don't store any of that stuff either.

Okay. Just to give the listeners a little bit of background: did you just wake up
one day and decide, okay, I want to go up against the most powerful company in
the world and do a better search engine? Or how did this come about?

It came about more from a dissatisfaction with Google, and then a recognition,
when I was messing around with Wikipedia and Delicious, that I was often finding
better external links in those sources. So I was actually going there to search
for things instead of Google. And the thought occurred to me: okay, what if you
take this to the logical conclusion? Google's getting more and more spam in it,
and there are more and more of these external sources, crowdsourced APIs, that
have good results. So what if you mashed those all together? Would it create a
decent search engine? So I built a sort of prototype on a weekend, and I messed
around with it and liked it, and it just grew organically from there.

Okay, fantastic. I must say I've been using it for a while now, and it gives
different results to Google. I guess people are now trained that the Google
results are the correct search results. Can you tell me why that would be? Why am
I getting different results in your search engine as opposed to Google, if I type
in Hacker Public Radio for instance? Why do I get two different results?

Right. So the short answer is that, obviously, Google and DuckDuckGo are somewhat
black boxes and we don't know each other's algorithms, so they're naturally going
to lead to somewhat different results. But there's more of a philosophical
answer, I guess, which is that each search engine tries to concentrate on
different things. What we've tried to concentrate on more is this notion of
zero-click info, these more conceptual results. And this is where we're different
from Google: we'll really respect your query more and not try to change it around
and give you results for things that are slightly different than what you
searched for. And we're way more aggressive at removing the SEO type of sites, so
you'll see a lot less of those. But ultimately it depends on a given query what's
going on, so these are just generalizations. One thing I'd say is that, yeah, we
definitely give different results than Google. So if you're having trouble with
doing deep searches, there's really a reason to go to other search engines and
use multiple ones, because you will get different results.

I just saw on your web page this whole concept of a bubble. I wonder, could you
explain that to the listeners?

Yeah. So there's this concept that was more recently nicely laid out in a book
called The Filter Bubble, and the concept goes like this. When you search at
Google now, and most other search engines, and even other sites like Facebook,
they're using what you've previously done on those sites, i.e. on Google, what
you've clicked on and what you've searched for, to tailor your results on
different searches. So when you search for something like climate change, it may
be impacted by what you previously searched for and where you live and things
like that, even though you may not think those things are related. What that
means is, because you often search for and click on things that you like, you end
up seeing more and more of the things that Google thinks you like, and that may
leave out some opposing viewpoints and things that you're less likely to click on
but that otherwise contain information that's valid. So what they call this is a
bubble, because you're living in a bubble that Google is presenting to you, and
you're missing things that are outside of that bubble, which over time may
contain information opposing your core beliefs.

Okay. I actually did the example on your website, and although I don't search
that much in Google, I did end up getting very different results from what was on
the example page for other people. So they're obviously taking Google Reader into
account as well. So it's kind of scary stuff. I wonder, could you just tell us a
little bit about your own background: what your educational background is, how
you got into doing these servers and putting this stuff together, what it runs
on, that sort of thing?

Sure. I'll start with background and you can hit me up for the server questions.
I grew up in Atlanta. I spent a couple of years in the Philippines before that.
Then I went to college in Boston, Massachusetts, at MIT. I got a degree in
physics and then a graduate degree in technology and policy. Then, basically
right out of school, I started doing startup stuff. I started an educational
software company that was about increasing parental involvement in schools, and
that ultimately didn't really go very far. Then I started another company that
was about finding old friends and classmates, pre-Facebook. That was one that did
well, and I sold that in 2006, which enabled me to take on this bigger problem,
DuckDuckGo. About a year after that I started messing around with DuckDuckGo, and
I've been doing it for about three years now.

I'm sure you have a massive team of people working behind the scenes there. So
can you tell me how many people are employed full-time on DuckDuckGo?

You would be surprised, maybe you already know, but it's just me full-time still
at the moment. I don't want to sell short other people's contributions, because
there are many people who have contributed significantly, but I'm still the only
full-time person.

Okay. And how in the name of all can you attempt to best the likes and resources
of Google or Yahoo or Bing or anybody else for that matter?

This is the beauty of the external API age. I did initially start out doing my
own crawling and building everything from the ground up, but quickly realized, of
course, that Microsoft and Google basically spend hundreds of millions of dollars
a year on that component alone, so obviously I'm not doing that. What happened
was that Yahoo BOSS came out, which was them exposing their search feed, and I
decided I could use that and concentrate on the value-adds I'd identified, which
are adding better results on top of theirs and getting rid of spam. Then over
time it also turned into changing their results a lot, which is why you see our
results are pretty different from Bing and Yahoo even though we use their feeds.
And then we also started using a bunch of other external APIs, like Wolfram
Alpha, and there are about 40 sources. So what you get is an amalgamation of our
code plus everyone else's code, which is what I call a hybrid search engine. What
that really enables you to do is, if I want to get good music results, say, then
I'm going to use a music API from a company that just concentrates on music. It's
as if we had employees that were doing that, but we can use it just by calling
their API. So if you think about it, we have tons and tons of people working for
us, but they're not working directly for DuckDuckGo.

You don't have to pay.

Exactly. I mean, it's similar to the argument for using open source, right?

Yeah, exactly. When I typed in Hacker Public Radio there, for instance, I get
some pages that are obviously from Wikipedia, because it says Wikipedia, and I
see the Facebook link there. But "results brought to you by Bing, built on Yahoo"
is also there. What does this mean? Does that refer to a single search or does it
refer to all of them?

When you use the Bing or Yahoo APIs, they require attribution, so that's how I've
decided to be good to them and display it, which I think is fair. But what it's
not doing is using their feeds exactly, which is why it doesn't look the same. So
really all that means is that on some calls we use their services for some
things, and it's giving them proper attribution.

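The "hybrid search engine" idea described above (several external feeds merged,
spam removed, sources credited) can be illustrated with a small sketch. This is
not DuckDuckGo's code: the function name merge_results, the feed names, the
blocklist, and the example.* URLs are all assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical spam list; a real engine would maintain something far larger.
BLOCKED_DOMAINS = {"spam.example"}


def merge_results(feeds, blocked=BLOCKED_DOMAINS):
    """Merge per-source result lists into one list, in feed order.

    feeds maps a source name to a list of (title, url) tuples.
    Returns (results, attributions): results is a list of dicts and
    attributions is the sorted list of sources that contributed a result.
    """
    seen = set()
    used = set()
    results = []
    for source, items in feeds.items():
        for title, url in items:
            host = urlparse(url).hostname or ""
            if host in blocked or url in seen:
                continue  # drop spam domains and duplicate URLs
            seen.add(url)
            used.add(source)
            results.append({"title": title, "url": url, "source": source})
    return results, sorted(used)


# Made-up feeds standing in for a web search feed and a reference source.
feeds = {
    "web_feed": [
        ("Hacker Public Radio", "https://hackerpublicradio.org/"),
        ("Cheap pills", "https://spam.example/pills"),
    ],
    "encyclopedia": [
        ("Hacker Public Radio", "https://hackerpublicradio.org/"),
        ("HPR article", "https://encyclopedia.example/wiki/HPR"),
    ],
}
results, attributions = merge_results(feeds)
for r in results:
    print(r["source"], r["url"])
print("Results from:", ", ".join(attributions))
```

Only sources that actually contributed a surviving result are listed, which
mirrors the point that attribution is shown for the feeds used on a given call.
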
And you're right, and proper, I guess. I must say, once people start using
DuckDuckGo... and okay, I'll ask the question: why DuckDuckGo.com?

I wish I had a decent answer for you. It popped into my head one day and I really
liked it, and my wife liked it, so I went with it. I'm generally super bad at
names, and it seemed like a good name, so I just went with it. You're not from
the U.S., so: there is a childhood game in the U.S. called Duck Duck Goose, which
is probably where it was derived from in my head.

They're going to be coming after you for copyright infringement.

Right, it's nothing related to that, absolutely. But you probably have a
different name for the game or something. It's where children chase each other
around in a circle after they tap each other on the head.

We call it Tig.

What do you call it?

Tig, yeah, or tag. But I must say, coming from Ireland, you should never let the
truth stand in the way of a good story, so I'll have the listeners here send in
better stories of where the name came from. Now I'm completely lost.

Sorry.

No, that's fine.

What I wanted to say was, once you get over the "it's not Google" thing, I was
very comfortable putting it as my home page, especially because when you go into
the search preferences you can specify that it's HTTPS, that your local country
is the Netherlands, that your language is going to be this, that you want the
links on the side, that you do or don't want the ads. All that sort of stuff
comes up, and you can have it as a URL parameter at the bottom of your page.

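Carrying preferences in the URL rather than in a cookie, as described above, can
be sketched as follows. The host search.example and the parameter names https,
region, and ads are placeholders invented for this example; they are not the
service's real parameter names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://search.example/"  # placeholder host, not a real endpoint


def settings_url(query, settings):
    """Build a search URL that carries the user's settings as parameters."""
    params = {"q": query}
    params.update(settings)
    return BASE_URL + "?" + urlencode(params)


def read_settings(url):
    """Recover the settings from such a URL (everything except the query)."""
    pairs = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in pairs.items() if key != "q"}


# Hypothetical preference names: HTTPS on, region Netherlands, ads off.
url = settings_url("hacker public radio", {"https": "1", "region": "nl", "ads": "0"})
print(url)
print(read_settings(url))
```

Because the settings travel with the bookmarked URL, nothing has to be stored on
the server or in the browser to remember them.
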
So you seem to be really serious about this whole no-tracking, privacy thing. Why
do you feel so strongly about that?

I got into it not really thinking about that stuff at all, and it wasn't a core
thing when I started. What really triggered it was two things. One, there were
some comments on Reddit that were like, why are you storing this stuff? And I had
never really thought about it. It's really the default when you turn on your web
server that it prints out IP addresses. And then Google came out with a report
that was saying how many requests they had gotten from governments across the
world and law enforcement, and that they have to deal with these in court and
whatnot. I looked at it and I thought, (a) it's actually pretty creepy that I
would know what people are searching for, and (b) I don't want to deal with any
law enforcement requests whatsoever. So the first one is more of a privacy,
user-protection, creepiness thing; the second is more of a personal preference. I
just don't want to deal with it. So that's where it came from. Then once it
became that, you have the mode of thinking about it, and once I started thinking
about it, over time, incrementally, some from user feedback, some on my own, I
realized there's a whole bunch of other things you can do and other leaks that
were going on that we could close. So I've done that over time.

Yeah. I must say, what I like about it is it looks like Google ten years ago,
when I started off. You type in your question, you get the results, that's it,
thank you very much, ma'am. And just to let the listeners know: if you've run a
search on DuckDuckGo and it doesn't work out, you can just go back up to the
page, put in an exclamation mark, g, and a space, and then it'll take that whole
string and send it over to Google for you. That's a stroke of genius. Can you
tell us more about that functionality?

Yeah. I had called it bang syntax, off of the Unix bang. There are different
commands for different search engines, but the basic idea is it'll send your
query anywhere you want it to. You can even do that on the home page; you can
bypass us altogether and send it right to Google or Amazon or wherever you want
to go. There are a thousand of the commands at this point, so a lot of sites are
covered. Slash w is Wikipedia, for... So the genesis was really that I actually
just built this feature for myself and didn't really have any intention of
exposing it. I'm constantly searching CPAN, because it's written mainly in Perl,
and CPAN is where all the Perl modules are stored. Going to search.cpan.org is a
pain, and then you've got to search, and CPAN search, their site, is already so
slow that skipping the first page just saved me a lot of time. So I just built
this in, and then over time I realized every time I showed it to someone they
were like, what are you doing? And then they thought it was pretty interesting.
So then I exposed it, and over time people who understand the syntax seem to
really like it.

Yeah, I think it's fantastic. I now have DuckDuckGo on all my own machines as the
home page, and it's right there, it's up in two seconds. There's no JavaScript
going on in the background, there are no notifications. Really, since Google Plus
has come on, it turns the Google home page into an application, and I just can't
be dealing with that first thing in the morning. I just said no. To go back to
the question about the servers and stuff, can you give us a background of your
massive data center that's out there in Philly?

So I started using an ISP local here and running our own servers, because that's
what I had been used to. When that somewhat recently reached capacity, I switched
to Amazon EC2, so that's really where all the front-end stuff is right now.
Unfortunately, in doing that I had to move from FreeBSD to Ubuntu, or to Linux.
But it's worked out well so far. There are things I like and don't like about
EC2, but it generally is a great alternative.

Okay. It's feeling they play more stuff that we all know and love. Why is it a
problem going from BSD to Linux?

There's no real problem with it. It's just that I had been using BSD for a long
time and was less familiar with Linux. Besides, I had all these scripts and
things that were pretty BSD-specific that I had to port over, and that was just a
one-time pain.

Yeah, I've got you. So you mentioned Perl, and it runs on Ubuntu, BSD. What else
is there?

We're using nginx mainly for the web server. We use a bunch of different data
stores: Solr, Postgres, flat files, this CDB, which is a weird read-only database
format. We use memcached for caching.

Then there's a bunch of other side components that some other people have
written. There's some Python stuff, and we have a Jabber client that answers
things over IM, and that's written in Node. But yeah, mainly the bulk of it is in
Perl and JavaScript. There's some front-end stuff that helps do all these
external API calls; most of that's in JavaScript.

Looking at your Wikipedia page, which is cool, to have a Wikipedia page, it says
that you started off a community site with a view to open sourcing. Is that
something that's going to happen, or is it practical, or...?

Yeah, so I am very much focused on doing more of that. We have a GitHub account
with a bunch of repositories now. People write all the time asking to help out,
if they can help out, and I'm trying to make it so any time someone helps,
that'll be open source, if they want. So more and more of it is becoming open
source. Some of the core stuff I'm not prepared yet to open source, for various
business reasons, but I definitely would like to open source as much as possible.

Okay. Speaking of business, where are you going to make your money from on this?

As you noted before, there are a few ads on it, but you can turn them off. That
turns out to be enough for the moment to break even, and hopefully over time
it'll become more profitable.

Surprisingly, or unsurprisingly, I was very impressed that the option was there
to turn it off, and I did for a second, to see if in fact it turns them off, and
then I turned them back on because, well, hey, why not give you a few shekels if
it works. You also do affiliate programs via Amazon and the like?

Yep, that's right. We're somewhat limited in that approach because... well, we're
still exploring, but basically I'm wary of going through third parties, which can
do some tracking. The good thing about Amazon and eBay is that they run their own
affiliate programs, so you don't have to do that.

Okay, very good. The one thing that would prevent me from moving from Google, of
course, would be the changing logo from time to time. Until you get that fixed,
I'm afraid I'm going to have to stay with Google.

Okay. We're actually working on setting a custom logo, but...

I'm being a bit facetious here.

Yeah, I know, I know you're being facetious, but you'd be surprised that I
actually get requests, usually from the UK more often than not, saying the duck
is too unprofessional: "I can't use your search engine, it's not professional
enough." I am serious. This comes up a lot, actually, and it seems a bit crazy to
me. But because we have all these cool logos that people made, I wanted to make
the option in any case for you to be able to set one of these alternative logos.
But at that point you might as well just let people set their own logo.

Yeah. Well, just to let people know, if you do log in, the duck logo does change
from time to time for different celebrations. It is kind of cool. You know, you
should name the other one Duck Duck Pro.

We'd better go register that before this airs.

Was there anything else? I'm just looking down through the list of questions that
I had. Was there anything else that I missed or that you'd like to bring up?

No, but if you'd like... I don't know how technical your audience is, but I'm
happy to answer any other weird or technical questions you have.

Our audience varies from the novice right up to the geekiest of the geeks, so if
there's anything you want to tell us, feel free to do so.

You can... sorry, come again?

No, I don't have anything. I would just say, after listening to this whole
podcast, you should give it a try.

Yes, you should definitely do that.

The way to do that is to set it as your default for a week, because you sort of
have to give it a little bit of time.

Yeah, and there is that phase where it isn't Google, and once you get over that
it actually turns out to be quite nice. Especially the red box gives you a lot of
interesting information, and because you can see the sources, that it comes from
Wikipedia or it comes from archive.org, I find myself trusting it more. The stuff
in the red box I know is going to be an actual result. Okay, just one last thing
here: I see on your website that you give 10% of your income to free and open
source projects.

Yes.

What prompted you to do that?

Well, like we talked about before, I rely on these external APIs, and usually
those are businesses, and in some sense that's a win-win, or we even will pay.
But in the case of open source, DuckDuckGo is essentially built on open source
software, and we're not paying for that, obviously. So I wanted to encapsulate
that by giving something back. I'd used it also at my previous company and we
didn't really do that, but I thought this would be a good way. And honestly, I
hope to inspire other people to do the same. That hasn't happened too much yet,
but either way I enjoy doing it.

Okay. So how does Yahoo make money on your search results if they are put into
your page and they have no tracking or any other information?

They charge us per call.

So by searching on DuckDuckGo, the money is eventually going to go to Microsoft?

To Yahoo, to Microsoft, yes.

At least you know where it's going. Okay, how much is a request, or can I not ask
that question?

Oh, it varies by type of call, and I don't have it off the top of my head, but if
you just search for Yahoo BOSS, their pricing is public and everything. And just
to be clear, if you want to mess around with the Bing API, there is a free Bing
API, but you're limited in that you can't commercialize it.

Yeah, I understand. I must say I'm very, very happy with the results that I'm
getting back. I like the exclamation mark, or the bang command. It's pretty cool.

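The bang command praised here, and described earlier in the interview as sending
"your query anywhere you want", can be sketched as a lookup table plus a
redirect. This is only an illustration under assumptions: the BANGS table, the
DEFAULT placeholder URL, and the function name resolve are invented for the
example and are not the service's actual command list or code.

```python
from urllib.parse import quote_plus

# Illustrative table only; these templates are assumptions for the sketch.
BANGS = {
    "g": "https://www.google.com/search?q={}",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
}
# Placeholder for the engine's own results page.
DEFAULT = "https://search.example/?q={}"


def resolve(query):
    """Return the URL a query should be sent to.

    If any word is a known bang such as "!g", strip it and send the
    remaining words to that site; otherwise use the default engine.
    """
    words = query.split()
    for i, word in enumerate(words):
        if word.startswith("!") and word[1:] in BANGS:
            rest = " ".join(words[:i] + words[i + 1:])
            return BANGS[word[1:]].format(quote_plus(rest))
    return DEFAULT.format(quote_plus(query))


print(resolve("!g hacker public radio"))
print(resolve("perl modules"))
```

Adding another site is just one more entry in the table, which is consistent
with the remark that the list grew to about a thousand commands.
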
And, yeah, I look forward to the surge of interest coming from the HPR community
once this show is aired.

Well, thank you very much.

Okay, shall we call it a day there?

Sounds good.

Okay. I'd just like to thank you very, very much for coming on and recording
this. I can't tell you when it's going to be up, it'll probably be next week, but
I'll send you a link when it gets posted.

Okay. Thank you again for having me.

No problem. Bye.

Thank you for listening to Hacker Public Radio. For more information on the show
and how to contribute your own shows, visit hackerpublicradio.org.