Episode: 3648
Title: HPR3648: A response to tomorrows show
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3648/hpr3648.mp3
Transcribed: 2025-10-25 02:48:23

---

This is Hacker Public Radio Episode 3648 for Wednesday the 27th of July 2022. Today's show is entitled A Response to Tomorrow's Show. It is hosted by Ken Fallon and is about 28 minutes long. It carries an explicit flag. The summary is: Ken brings the DeLorean up to 88 to address Monochromec's comment on stats.

Back to the future.

Hi everybody, my name is Ken Fallon and you're listening to another episode of Hacker Public Radio.

In tomorrow's show, the Linux Inlaws travel back in time to bring us reports from the future. Unfortunately, they took a left turn down the wrong leg of the trousers of time and ended up making some wrong assumptions about how popular they are. Although the entire show is a spoof based around their meteoric rise to fame and fortune, it's the segment from 6 minutes 31 seconds to 10 minutes 56 seconds that I want to discuss. It more or less comes down to the following quote:

"But if I take a look at the Archive, or if we take a look at the Archive for the last one year, or almost one year and a half, we clock in on average between 1,500 and 2,500 listeners. Given the fact that we launched this podcast short of two and a half years ago, that's quite amazing."

And then later: "On average, we are listened to by anything between 5,000 and 10,000 listeners per episode, given the fact that, as I said, quite a few people syndicate us. We're just pointing to the right place."

"So you guys, are you sure you got the decimal point in the right place?"

"Maybe I'm off by a magnitude, so maybe it's just 50,000 to 100,000 people."

So the logic employed here is: take the downloads from one site, multiply that number by the number of syndicated sites, and that will give you the total downloads. Now, I think we can do a lot better than a mere 100,000.

So first thing, let's look in the Hacker Public Radio logs. For example, episode HPR3609 is their latest one: Linux Inlaws S01E57, operating system level virtualization and martens fit. A simple grep -i 3609 *.log, piped to wc -l for a count, gives us a total of 8,421. Then we do a quick Google search for Linux Inlaws, and that returns 564,000 results in 0.54 seconds, no less.

So if you multiply the number of hits on HPR by the number of results you get in Google, you arrive at an estimated listenership of 4,749,444,000. Let's round that up to around 5 billion, shall we, and say that's pretty impressive. But we can go further, because that's just from one show. We've already released 57 episodes, so the listenership has to be 57 times greater. So, obligatory drumroll, and you can logically say that the Linux Inlaws show has a total of 270 billion, 718 million, 308 thousand subscribers. Let's round that up to an even 271 billion, shall we. Now that is impressive, given the fact that the total number of people who have ever lived on planet Earth is 108 billion.

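For the record, the tongue-in-cheek arithmetic above can be reproduced in a few lines of Python. This only checks the multiplication, using the figures quoted in the episode; it says nothing about real listenership.

```python
# Check of the spoof "listenership" arithmetic; all figures are those quoted in the episode.
hpr_log_hits = 8_421             # raw `grep -i 3609 *.log | wc -l` result
google_results = 564_000         # results for an unquoted Google search
episodes = 57                    # Linux Inlaws episodes released so far
people_ever_lived = 108_000_000_000

per_episode = hpr_log_hits * google_results   # "listeners" for one show
all_episodes = per_episode * episodes          # scaled up to every show

print(f"{per_episode:,}")                                  # 4,749,444,000
print(f"{all_episodes:,}")                                 # 270,718,308,000
print(f"{all_episodes / people_ever_lived:.1f}x everyone who ever lived")  # 2.5x ...
```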
So, all messing aside, there's definitely something wrong with my logic, but the question is: is there something wrong with their logic?

Separating the wheat from the chaff.

When I searched through the Hacker Public Radio logs earlier, it returned 8,421 hits, but the Internet Archive only shows 1,493 downloads. So what's going on? Well, you guessed it: our logs contain a lot more than just download records. We need to limit ourselves to counting the media for a start, and that removes 3,713 log lines.

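A rough sketch of why the raw grep count is so inflated: grep matches the digits 3609 anywhere on a line, whereas counting media means looking only at the request path. The log lines below are made up for illustration, and both the combined log format and the list of media extensions are assumptions, not a description of the actual HPR setup.

```python
import re

# Hypothetical log lines in Apache/nginx "combined" format (invented for illustration).
LOG_LINES = [
    '203.0.113.5 - - [27/Jul/2022:10:00:01 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 48211 "-" "Mozilla/5.0"',
    '203.0.113.6 - - [27/Jul/2022:10:00:02 +0000] "GET /eps/hpr3609/index.html HTTP/1.1" 200 9120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [27/Jul/2022:10:00:03 +0000] "GET /eps/hpr3500/hpr3500.ogg HTTP/1.1" 200 5636096 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [27/Jul/2022:10:00:04 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 Chrome/99.0.3609.0"',
]

REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/[\d.]+"')               # method and path
MEDIA_RE = re.compile(r"/hpr3609\.(?:mp3|ogg|spx|flac|wav|opus)$")     # assumed media types


def naive_count(lines, needle="3609"):
    """Like `grep 3609 *.log | wc -l`: any line containing the digits counts."""
    return sum(1 for line in lines if needle in line)


def media_count(lines):
    """Count only lines whose request path is an hpr3609 media file."""
    count = 0
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and MEDIA_RE.search(m.group(2)):
            count += 1
    return count


print(naive_count(LOG_LINES))  # 4: page hit, byte size and Chrome version all match
print(media_count(LOG_LINES))  # 1
```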
For those interested, gone are 2,169 references where the number 3609 appeared elsewhere in a log line, for example in a size of 5636096 bytes. There were 1,107 hits to the episode page itself. There were 111 hits to an unrelated page on the mailing list. 42 hits were version numbers from Safari and 154 were version numbers from Chrome, taken from the user agent string. 22 hits were from web crawlers, bots, etc. And of course, 108 hits were attacks, and that's fairly typical.

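As a quick cross-check, the non-media categories listed above add up exactly to the 3,713 lines removed, which leaves 4,708 media lines out of the original 8,421.

```python
# Non-media log lines removed from the raw count, as listed in the episode.
removed = {
    "3609 elsewhere in the line (e.g. byte sizes)": 2_169,
    "episode page": 1_107,
    "unrelated mailing-list page": 111,
    "Safari version numbers": 42,
    "Chrome version numbers": 154,
    "crawlers and bots": 22,
    "attacks": 108,
}
raw_hits = 8_421

non_media = sum(removed.values())
media_lines = raw_hits - non_media
print(non_media, media_lines)  # 3713 4708
```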
So now, looking at just the 4,708 media file lines, 21 of those were bots that can be eliminated, and 544 were HEAD requests, not GET requests. The HEAD method is identical to the GET method, except that the server must not return a message body in the response. The reason people would do this is to check whether the file has changed or not.

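To see the HEAD behaviour without touching a real server, here is a minimal sketch using only Python's standard library: a throwaway local server (the file name and body are hypothetical) answers the same request once with GET and once with HEAD. Both responses carry the same status and Content-Length, but only the GET returns any bytes, which is why HEAD lines should not be counted as downloads.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"pretend this is an MP3 file"  # stand-in for real audio


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)   # GET: headers plus the file

    def do_HEAD(self):
        self._send_headers()     # HEAD: same headers, no body

    def log_message(self, *args):
        pass                     # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {}
for method in ("GET", "HEAD"):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/hpr3609.mp3")
    resp = conn.getresponse()
    body = resp.read()           # http.client returns b"" for HEAD responses
    results[method] = (resp.status, resp.getheader("Content-Length"), len(body))
    conn.close()

server.shutdown()
server.server_close()
print(results)
```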
And then the rest were duplicate IP addresses. So that leaves a total of 1,079. Hold on, that's 414 fewer than what we were saying was actually downloaded. It turns out that people download the same episode several times on different days, so when you put those back, you get 1,493.

Now that raises the question: should you count unique hits per day, or just unique hits in general? On the other hand, a single IP address might be hiding multiple downloads, for example in a university or a company, or something like that, behind a firewall. So I hope you can see this isn't an exact science. And even so, at the end of the day, the simple fact is that just because somebody downloads a show does not mean that they actually listen to it.

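The filtering steps described in this section can be sketched roughly as follows. This assumes Apache/nginx combined log format, a crude user-agent test for bots, and made-up sample lines; it is not the script used for the episode. It reports both unique IP addresses and unique IP-per-day pairs, the two counting choices discussed above.

```python
import re
from datetime import datetime

# Assumed combined log format: ip ident user [time] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("bot", "crawler", "spider")  # assumption: crude user-agent test


def parse(line):
    m = LINE_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["day"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z").date()
    return rec


def count_downloads(lines, media_suffix="/hpr3609.mp3"):
    """Return (unique IPs, unique IP-per-day pairs) for GETs of one media file."""
    unique_ips, unique_ip_days = set(), set()
    for line in lines:
        rec = parse(line)
        if rec is None or not rec["path"].endswith(media_suffix):
            continue  # not a media request for this episode
        if any(hint in rec["agent"].lower() for hint in BOT_HINTS):
            continue  # drop bots and crawlers
        if rec["method"] != "GET":
            continue  # drop HEAD checks, which transfer no audio
        unique_ips.add(rec["ip"])
        unique_ip_days.add((rec["ip"], rec["day"]))
    return len(unique_ips), len(unique_ip_days)


# Hypothetical sample lines.
SAMPLE = [
    '198.51.100.1 - - [20/Jul/2022:08:00:00 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 100 "-" "AntennaPod/2.6"',
    '198.51.100.1 - - [20/Jul/2022:08:05:00 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 206 100 "-" "AntennaPod/2.6"',
    '198.51.100.1 - - [21/Jul/2022:09:00:00 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 100 "-" "AntennaPod/2.6"',
    '198.51.100.2 - - [20/Jul/2022:10:00:00 +0000] "HEAD /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 0 "-" "gPodder/3.10"',
    '198.51.100.3 - - [20/Jul/2022:11:00:00 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 100 "-" "Googlebot/2.1"',
    '198.51.100.4 - - [20/Jul/2022:12:00:00 +0000] "GET /eps/hpr3609/hpr3609.mp3 HTTP/1.1" 200 100 "-" "VLC/3.0"',
]
print(count_downloads(SAMPLE))  # (2, 3)
```

Neither number can tell apart several listeners sharing one address behind a firewall, which is exactly the ambiguity described above.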
Stats, I hate them.

We love them. They also say that Hacker Public Radio doesn't like stats. Well, that's where they're wrong. It's just me that doesn't like them, and that's because generating them is a waste of time. There is no true figure that you're ever going to arrive at. Producing the figures for this show has taken two weeks of my free time, but at least we get a show out of it, so I'm happy about that. In the process, I picked up two really cool tips for LibreOffice Calc, which I'm going to share with you.

However, every time this is discussed on the mailing list, people really love statistics and want Hacker Public Radio to have them. I ended up putting it off for so long that the problem fixed itself. Now that we're hosting the main feed on the Internet Archive, we get statistics for free.

But can you trust those figures on the Internet Archive? Yes. By and large, yes. And we can confirm this because we can compare their figures with what we get from the Hacker Public Radio web logs.

I put a link in the show notes to how the Internet Archive works. Each item has a view counter. By item, they mean the show with all its media files: all the media types, WAV, FLAC, MP3, etc., are under one item. So each item has a counter, and the view counter is increased each time a user engages with a media item. A user cannot increase the view count of a particular item more than once per day. So if I went over and listened to the MP3 and then the OGG, that's still only one interaction. If a user downloads or views multiple files in the same item on the same day, that's only counted as one.

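A toy model of the counting rule as described: one view per user, per item, per day, with all the formats of an episode belonging to the same item. How the Internet Archive actually identifies a user is not covered here, so the user names below are placeholders and this is not their implementation.

```python
from datetime import date


def ia_style_views(events):
    """Count one view per (user, item, day), whichever file in the item was fetched."""
    return len({(user, item, day) for user, item, _file, day in events})


# Hypothetical events: (user, item, file, day)
EVENTS = [
    ("alice", "hpr3609", "hpr3609.mp3", date(2022, 7, 20)),
    ("alice", "hpr3609", "hpr3609.ogg", date(2022, 7, 20)),   # same item, same day: no new view
    ("alice", "hpr3609", "hpr3609.mp3", date(2022, 7, 21)),   # a new day counts again
    ("bob", "hpr3609", "hpr3609.flac", date(2022, 7, 20)),
]
print(len(EVENTS), ia_style_views(EVENTS))  # 4 3
```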
Now, we're doing more or less the same thing on HPR. The only difference is that we count GETs instead of interactions, and we eliminate bots and crawlers. Presumably, the Internet Archive does something similar. So in the example episode, Linux Inlaws season one, episode 57, the Internet Archive reported 1,269 downloads while Hacker Public Radio reported 1,493. That's a difference of 224, but that's okay. Sometimes it's more, sometimes it's less.

To explain the difference, let's look at what is actually happening here. When an episode is published on Hacker Public Radio, it's added to the future RSS feed, and that feed only ever points to media hosted locally on the HPR server. There are about 50 subscribers to that, give or take. On the other hand, the main feed, now at least, comes exclusively from the Internet Archive. Additionally, you're going to get discrepancies because episodes played on the Internet Archive website are only going to be counted over there, and episodes played on the HPR website are only going to be counted on the HPR website. So there will always be differences in the download stats on both sides, but they're close enough for jazz.

So the statement that "we clock in on average between 1,500 and 2,500 listeners" is a smidgen of an exaggeration. The correct figures are 1,269 as the lowest and 2,240 as the highest.

Syndication.

Where they go astray is when they use that number and then guesstimate the listenership to be between 5,000 and 10,000 listeners per episode. And they feel justified in using that number because, and I quote, "given the fact that quite a few people syndicate us."

Some people might not know what the term syndication means. Wikipedia says: "Web syndication is a form of syndication in which content is made available from one website to other websites." That's not very helpful. So think of it as an old-school content delivery network, or content caching. This is how it would work in theory. When the first client makes a request, the media would be retrieved from Hacker Public Radio. But instead of just passing the media through to their client, they would also save a copy locally on their servers. That way, all subsequent requests for that file would be served from the syndicated website itself. So anyone playing the second or later copies of that media would not be registered in our logs, because we wouldn't see it. Therefore, anything played there would not be counted in our internet stats.

Actually, for syndicated websites, there is a way, using HTTP, to report back to the source that the content has been played. So even with syndication, you can register with the source website that it has been played or downloaded.

Anyway, I was immediately suspicious when I heard this, not just because of the legal issues with hosting random media, but because of the bandwidth costs involved. So, fun fact: no, no, that is not what's happening.

And you can prove this, because fortunately most popular web browsers have developer tools that let you confirm exactly what's happening on the network. You can do this right now. Go to, for example, the Hacker Public Radio website itself. Actually, that would be pointless. But go to Apple Podcasts, which is the example we have in the show notes. Press and hold Control and Shift, then press I, and go to the Network tab. Then press play on any episode. What you're going to see is that it does something on its own site, and then you're going to see a call going out to Hacker Public Radio. We do a redirect to archive.org, and then archive.org redirects it to one of their mirror sites. So basically, that's what's happening.

So I checked. I narrowed down the search by putting quotes around "Linux Inlaws podcast", and that returned a more manageable 1,810 results. I limited myself to looking at all the pages given by Google, and only at the ones that had a play button. Here they are, in the order they were returned.

The first three are not using the Hacker Public Radio feed, but the XML feed from the Linux Inlaws themselves. They are Podchaser, Player FM, and YouTube. All the rest are using the Hacker Public Radio feed: Apple Podcasts, Podcast Addict, GetPodcast, archive.org, Listen Notes, Spotify, gPodder, Digital Podcast, podcast.de, hobby public radio, potency, and Podtail. I also checked two other websites: Google Podcasts, which uses the Linux Inlaws feed as well, and iHeartRadio.

When you open them up, you can see that all of them except two are fetching the media directly from the Internet Archive. So these sites are not syndicating the content at all. They're just syndicating the RSS feed. If you press play on one of those sites, it will register as an item hit on the Internet Archive.

One site that isn't doing that is Spotify, not because they're hosting the media, but because they're obfuscating it. We were able to confirm this by looking at the HPR logs: you see the Spotify client user agent requesting the same show from different IP addresses. If it were being cached, we would only see one IP address coming in, and we wouldn't see anything else for that media.

The only case where syndication is actually happening is YouTube, and the reason for that is that they need to convert the media from audio into a video format. That channel is the unofficial Linux Inlaws channel, which is actually cool. I left them a note to ask how they're doing that, because it would be really cool if we got to officially do this for all the HPR shows. They also need to highlight that it's Creative Commons content, but that's by the by.

They have 10 subscribers and a total of 606 views. Given the release of 57 episodes, 10 views per episode seems correct, but don't forget that we need to subtract one from the Hacker Public Radio site or the Internet Archive, because each episode was downloaded from there in the first place in order to be converted to video.

So therefore, the claim of between 5,000 and 10,000 listeners per episode is not correct, simply because there's no syndication going on to speak of.

Elephant in the room.

And now we need to address the elephant in the room:

"But if I take a look at the Archive, or if we take a look at the Archive for the last one year, or almost one year and a half, we clock in on average between 1,500 and 2,500 listeners. Given the fact that we launched this podcast short of two and a half years ago, that's quite amazing."

A figure of 1,278 total downloads for the latest show is an amazing achievement. Seriously, any podcast in the Linux space would be proud to have that. What's even more amazing, though, is that they managed to garner 2,190 downloads for the very first show, because it's very difficult for new shows to get noticed. It takes time to build your audience, and that can be seen with the Grumpy Old Coders, for example. They did an interview in HPR2388, which is Linux Inlaws season 1, episode 28, the Grumpy Old Coders. And they reported their download figures as, and I quote, "about 200 listeners across all episodes", which they seemed to agree was about right for podcasts of their type. And I know regular podcasters would kind of agree with that. That seems to be the norm.

Now, having listened back to that entire episode again, it was clear that the guests from the Grumpy Old Coders believed that Hacker Public Radio is a podcast hosting platform, one that operates like Spotify, Apple Podcasts, or Google Podcasts, where each show has to build its own audience. At this point, neither Chris nor Martin explained that Hacker Public Radio is not a podcast hosting platform, but a podcast in and of itself, one where the fixed RSS feed is used by a rotating team of volunteer hosts. Now, the Linux Inlaws may well believe that Hacker Public Radio is a podcast hosting platform, and that all the traffic is driven by their Linux Inlaws RSS feed coming from their own website.

Show me the stats.

So are the listeners to the Linux Inlaws podcast just Linux Inlaws listeners, or are they actually Hacker Public Radio listeners? Let's compare the download numbers for the Linux Inlaws episodes to the download numbers of the Hacker Public Radio episodes released in the week before.

We're going to look at their first episode, which picked up 2,190 downloads in total since its release. But on the first day of release, it was downloaded 998 times. So that's not bad. And if we look at the 10 shows before that, they were downloaded 910, 940, 947, 968, 971, you get the idea. Their latest show had a first-day figure of 753, and the shows released before that had 726, 722, 732, 774. So the first-day number for their first show was about 56 more than the average HPR episode released around the same time, and additional downloads are common enough when a new host joins. The first-day number for their latest show is five downloads above average for the other Hacker Public Radio shows released the same week.

In the graph, I plotted all the Hacker Public Radio downloads that I know of, and I highlighted the Linux Inlaws ones. One thing you should be aware of is that every single dot there is not without cost. There is a charge for storing it, and there's a charge for transferring it. And it's provided to us entirely free by our hosting provider, AnHonestHost.com, and the volunteer project, the Internet Archive, both of which have donated terabytes of storage and data transfer to us for free. Links to how you can support both of those organizations are in the show notes.

So looking at the graph, you can say that their shows are popular, but you can't say that they're any more popular than any other shows around the same time.

And what else can we derive from the chart? Well, you can derive that if you want to plot a count of something against a date in LibreOffice, you need to make sure that the dates are recognized as dates and not text, and then you need to plot using an XY (scatter) chart.

And thanks to AW35AWaf5A, link in the show notes, you can group data by year and month. So if you've got a whole load of days and you want to consolidate them down by year and month, you can create a pivot table using Data > Pivot Table > Insert or Edit, with the days column in the Row Fields. So you drag the day column into the Row Fields and the sum into the Data Fields. Then close that, click any cell in the pivot table (a date, usually in the first column), and go to Data > Group and Outline > Group. Then, in the Group By section, select Intervals and tick both Months and Years. And then you can plot those as summaries. It's really, really quite nice.

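Outside a spreadsheet, the same year-and-month grouping can be sketched in plain Python. The daily figures are hypothetical; note that the date strings are parsed into real dates first, the equivalent of making sure Calc treats them as dates and not text.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum daily counts per (year, month), like a pivot table grouped by Years and Months."""
    totals = defaultdict(int)
    for day, downloads in daily:
        totals[(day.year, day.month)] += downloads
    return dict(sorted(totals.items()))


# Hypothetical data as it might come out of a CSV: the dates start life as text.
RAW = [("2022-06-29", 120), ("2022-06-30", 80), ("2022-07-01", 200), ("2022-07-02", 50)]
DAILY = [(date.fromisoformat(d), n) for d, n in RAW]   # text -> real dates

print(monthly_totals(DAILY))  # {(2022, 6): 200, (2022, 7): 250}
```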
Accurate download numbers.

We can actually determine which downloads derive from the Linux Inlaws brand and which from the HPR community. This is due to the fact that the Linux Inlaws RSS feed includes shows soon after they're published to the Internet Archive, while at Hacker Public Radio we release shows on a pre-scheduled basis, and they only get released on their release date. So, as explained earlier, the main HPR RSS feed will never release shows that are scheduled for a future release, while the Hacker Public Radio future feed only ever serves shows from the Hacker Public Radio website itself.

And you can check this yourself. If you look at the Linux Inlaws feed from their own web page, you'll see that they use hackerpublicradio.org/eps/ and then the file that they want, whereas the Hacker Public Radio future feed uses hackerpublicradio.org/local/.

So therefore, if you go to the download statistics for Linux Inlaws shows on the Internet Archive (link in the show notes), downloads that are listed before the Hacker Public Radio day of release can only have come from the Linux Inlaws feed. The link and screenshots are in the show notes.

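The attribution rule above boils down to splitting an episode's daily download counts at the HPR release date. A minimal sketch, with hypothetical dates and counts rather than the real figures from the screenshots:

```python
from datetime import date


def split_at_release(daily, hpr_release):
    """Return (before, on_or_after) download totals around the HPR release date."""
    before = sum(n for day, n in daily if day < hpr_release)
    after = sum(n for day, n in daily if day >= hpr_release)
    return before, after


# Hypothetical per-day Internet Archive counts for one episode.
DAILY = [
    (date(2022, 7, 10), 40),   # before release: only reachable via the show's own feed
    (date(2022, 7, 14), 30),
    (date(2022, 7, 20), 700),  # HPR release day: HPR subscribers join in
    (date(2022, 7, 21), 150),
]
print(split_at_release(DAILY, hpr_release=date(2022, 7, 20)))  # (70, 850)
```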
So for the four future shows, they have 18, 112, 115, and 112 downloads respectively. So on average, that puts them at around 98 downloads per show. So we can say that, together with their YouTube subscribers, their show has 107 downloads before Hacker Public Radio subscribers join the party.

Now, we said the Grumpy Old Coders said that they had about 200 listeners, but they had a caveat that that was spread across all the episodes. So it's not a per-episode count. So they seem to match.

So is that the final answer? Fun fact: no. Because they do get between 1,269 and 2,240 listeners per show, so many Hacker Public Radio subscribers listen to their episodes. A number of those subscribers would also subscribe to the Linux Inlaws, but don't, because they're already getting the shows via the Hacker Public Radio podcast. On the other hand, we don't see these shows consistently getting 107 more downloads than other shows. So you could argue that some Hacker Public Radio subscribers don't listen to them, and so would not subscribe.

Pick a number, any number, between 18 and 271 billion.

I still maintain that processing log files, filtering them, and figuring out what's happening is a complete waste of time. You never get a clear answer, and the answers can be manipulated to get whatever results you want. And we don't have advertisers. We don't need to produce numbers to make advertisers feel better that they're hitting their target download figures. In theory, hosts may find it valuable to see which shows are most popular and focus on those, but in practice, there's so much variability that nothing can be derived from the figures.

All the information that I want to know can't be plotted. How many people actually listened to the show, and how many people were helped by it? That stuff you can't get from statistics. The only way you're going to get that is if people leave feedback. And when they do, they turn from being listeners into community members.

Summary.

All right, closing off. I want to explain that the purpose of this show was not to criticize the Linux Inlaws, far from it. It was intended to correct information provided by them. They bring a wide and varied selection of content to Hacker Public Radio, it's very welcome, and it's indeed very popular. Their numbers, and indeed your numbers if you become an HPR host, are very impressive in their own right. Each day, your show will be heard by as many people as can squeeze into the Janson room at FOSDEM. For those who haven't been to FOSDEM, that's about two of those big double-decker Airbuses. And every month, we have around 33 and a half thousand downloads. Again, to put that into perspective, that's about 40 of those huge airplanes.

But remember, the key takeaway from this show is: who should get credit for hosting our shows? Syndicated websites are essentially monetizing HPR content. They're not mirroring the media, and we're absolutely fine with that. That's because our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. There is absolutely no requirement or obligation to share the spoils with us.

Another key takeaway is that our hosting is entirely free of charge to us. So while the podcast hosting platforms actually host a whopping 605 kilobytes, our hosting provider AnHonestHost.com and the volunteer project the Internet Archive donate terabytes of storage for us to use for free. Not just that, but they also shoulder the huge cost of transferring data through expensive carrier backbone infrastructure. The people to thank are our own Josh Knapp from AnHonestHost.com, who provides the Hacker Public Radio website, and the Internet Archive, a digital library whose stated mission is universal access to all knowledge, and who provide hosting for the media. Links to how you can support those very worthwhile projects are in the show notes.

That's it. Tune in again tomorrow for another exciting episode of Hacker Public Radio.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.