Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
472
hpr_transcripts/hpr2211.txt
Normal file
@@ -0,0 +1,472 @@
|
||||
Episode: 2211
|
||||
Title: HPR2211: My podcast workflow
|
||||
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2211/hpr2211.mp3
|
||||
Transcribed: 2025-10-18 15:47:38
|
||||
|
||||
---
|
||||
|
||||
This is HPR episode 2211 entitled My Podcast Workflow.
|
||||
It is hosted by Dave Morris and is about 26 minutes long and carries an explicit flag.
|
||||
The summary is how I download, manage, listen to and delete podcasts.
|
||||
This episode of HPR is brought to you by AnHonestHost.com.
|
||||
Get 15% discount on all shared hosting with the offer code HPR15.
|
||||
That's HPR15.
|
||||
Better web hosting that's honest and fair at AnHonestHost.com.
|
||||
Hello everybody. Welcome to Hacker Public Radio. My name is Dave Morris.
|
||||
Today I've got an episode which I've entitled My Podcast Workflow.
|
||||
Probably like most people who are listening to this,
|
||||
I've been listening to podcasts for quite some time.
|
||||
In my case I started in 2005 and that was when I bought my first MP3 player.
|
||||
And over that time I've used various podcast downloaders, or podcatchers as people call them.
|
||||
And a lot of them existed and I've tried several of them.
|
||||
But now I use a script based on Bashpodder which I've rewritten and built up to meet my needs.
|
||||
I also use a database to hold details of the feeds that I subscribe to.
|
||||
And it also holds what episodes have been downloaded and what's on a player to be listened to and what can be deleted and all of that sort of thing.
|
||||
I've written scripts in Bash, Perl and Python to manage all of this.
|
||||
So I'm going to be describing some details of the workflow.
|
||||
But I'm not going to go into specific details about scripts and details of methods and so forth.
|
||||
I was prompted actually to put this show together in 2016.
|
||||
And I'd heard a show produced by folky, show number 1992.
|
||||
How I'm handling my podcast subscriptions and listening.
|
||||
And this was April of 2016.
|
||||
And I thought it was a really interesting episode and I thought I must try and write something along similar lines to that.
|
||||
I think it's always interesting to hear how other people do this sort of thing.
|
||||
So I thought a contribution might be a good thing.
|
||||
But I'm embarrassed to say that I started this in April 2016.
|
||||
And somehow it's been lurking in the background ever since.
|
||||
And this is January 2017 that I'm recording this.
|
||||
So it's been waiting a while.
|
||||
I thought it would be interesting to describe what a podcast feed actually is.
|
||||
I'm sure most people have used them.
|
||||
That's how you're listening to this.
|
||||
Most likely, anyway.
|
||||
It's defined by an XML file and there are two main formats which are called RSS and Atom.
|
||||
And I've linked to details about these.
|
||||
I won't go into details myself as to what they mean and where they come from.
|
||||
If you're interested you can find out lots of information.
|
||||
Basically they both consist of a list of structured items where an item is a distinct thing.
|
||||
And each item can contain a link to a multimedia file or so-called enclosure.
|
||||
And it's the enclosure that makes it a podcast.
|
||||
There are other sorts of RSS and Atom feeds which are not podcast feeds.
|
||||
And it's the enclosure that makes it one.
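To make the item-and-enclosure structure concrete, here is a minimal Python sketch that is not from the show notes. It assumes an RSS 2.0 feed; the sample feed, its URLs and the function name `enclosure_urls` are made up for illustration. It lists the enclosure URL of every item that has one and skips items without an enclosure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the URLs are placeholders, not real episodes.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode with audio</title>
      <guid>example-0001</guid>
      <enclosure url="http://example.org/ep1.mp3" length="1234" type="audio/mpeg"/>
    </item>
    <item>
      <title>Plain news item</title>
      <guid>example-0002</guid>
    </item>
  </channel>
</rss>"""


def enclosure_urls(rss_text):
    """Return the enclosure URL of every RSS item that has one."""
    root = ET.fromstring(rss_text)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # An item without an enclosure is an ordinary feed entry, not an episode.
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


print(enclosure_urls(SAMPLE_RSS))
```

Atom feeds express the same idea differently (a link element with rel="enclosure"), so a real tool would need a second code path for them.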
|
||||
There's a Wikipedia article I've linked to talking about podcasts which you might find interesting if you want to dig deeper.
|
||||
So the way in which a feed's intended to be used is that when something new has been released,
|
||||
a new podcast has been released, a new episode in the case of HPR,
|
||||
the feed is updated to reflect the change.
|
||||
And then podcatchers are monitoring.
|
||||
So they probably, you might be running something that looks every hour or once a day
|
||||
or something like that.
|
||||
And it will go and look at a given feed to see if there's anything new.
|
||||
And it usually does that by scanning through all of the enclosures,
|
||||
all of the items in the list and checking to see whether it's already downloaded things.
|
||||
If it finds something new then it will download it and there are all sorts of complications
|
||||
as to how many podcasts it'll download at a time and so on and so forth.
|
||||
But the point of it is that the podcatcher keeps a record of what it's already downloaded.
|
||||
Now in the early days of podcasts and podcatchers,
|
||||
just saving the URL of the enclosure was enough because that was pretty much a unique item
|
||||
and unique thing that identified that podcast.
|
||||
Now it's not so much the case.
|
||||
But I think it was always designed that there was a unique identifier associated with each enclosure
|
||||
and RSS and Atom certainly contain them.
|
||||
So this acts as a label which can be stored to say I've seen this one already
|
||||
and thereby avoid duplicate downloads.
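The "I've seen this one already" check can be sketched in a few lines of Python. This is an illustration only: the item dictionaries, the `id` key and the function name `new_items` are assumptions, and a real podcatcher would keep the set of seen IDs in a file or database rather than in memory.

```python
def new_items(items, seen_ids):
    """Yield items whose ID has not been seen, recording each new ID as we go."""
    for item in items:
        if item["id"] in seen_ids:
            continue  # already downloaded on an earlier run
        seen_ids.add(item["id"])
        yield item


# Hypothetical state: one episode was fetched on a previous run.
seen = {"example-0001"}
feed_items = [
    {"id": "example-0001", "url": "http://example.org/ep1.mp3"},
    {"id": "example-0002", "url": "http://example.org/ep2.mp3"},
]

fresh = list(new_items(feed_items, seen))
print([item["id"] for item in fresh])
```

Running the same check a second time against the same feed yields nothing, because both IDs are now recorded.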
|
||||
So looking at my workflow, I'm using a rewritten version of the venerable Bashpodder
|
||||
which was a Bash script written by Linc Fessenden of The Linux Link Tech Show.
|
||||
People say he's the Linc of the Link in the title; I don't think so.
|
||||
Anyway, he wrote this rather elegant piece of Bash to do this job.
|
||||
But it has its limitations.
|
||||
I rewrote this.
|
||||
He based his around using the XSLT capabilities.
|
||||
There's a parser which you can use, whose name I've suddenly forgotten.
|
||||
XSLT pars.
|
||||
I can't remember its name but I'm sure you'll find it.
|
||||
I'll make sure I put it in the notes.
|
||||
But what this does, it's a method of parsing an XML file.
|
||||
And he had written a thing called parse_enclosure.xsl
|
||||
which is used to parse the enclosures out of an RSS feed.
|
||||
But since he did that, a lot of other types of feeds have popped up
|
||||
which include Atom, and he hadn't catered for that.
|
||||
So I modified it to include Atom.
|
||||
I also added another one which I call parse_id.xsl
|
||||
which is capable of parsing out the ID strings from a feed.
|
||||
And that's the thing I just mentioned about a unique tag per episode.
|
||||
I've included both of these along with my notes.
|
||||
And I should say, I always forget to say this.
|
||||
I'm sure you've worked out yourself that there are long notes
|
||||
that I'm currently reading effectively.
|
||||
But they're there for you to refer to if you find it interesting.
|
||||
One of the drawbacks of Bashpodder, and also of my version of it,
|
||||
is that it can't deal with feeds where the enclosure URL
|
||||
doesn't show the actual download.
|
||||
So in the early days then the enclosure simply consisted of a URL
|
||||
pointing to the audio file itself.
|
||||
So an MP3 or an Ogg or whatever it was.
|
||||
In latter times where there are lots of intermediaries
|
||||
that serve up the audio for podcasts.
|
||||
In many cases the URL that's in the enclosure doesn't actually point to the,
|
||||
point directly to the audio.
|
||||
So if you download it with something like wget or curl,
|
||||
then you get the end result.
|
||||
But if you're trying to work out things like where the file is,
|
||||
what file is going to be generated as a consequence,
|
||||
it's very hard to do.
|
||||
I haven't quite got a solution for this.
|
||||
These things are popping up more and more.
|
||||
I don't have a complete solution to this yet.
|
||||
Charles in NJ did a show, 1935, called Quick Bashpodder Fix
|
||||
where he talks about something which is similar,
|
||||
possibly the same as this problem.
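The difficulty with indirect enclosure URLs can be illustrated in Python. This is a sketch under assumptions, not the method used in the scripts described here: `filename_from_url` and `resolve_final_url` are hypothetical helper names and the URLs are placeholders. The first function shows why the enclosure URL alone can give a useless file name; the second asks the server where the URL finally leads, which needs network access and will not work with every host.

```python
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


def resolve_final_url(url, timeout=30):
    """Open the URL, letting urllib follow redirects, and return the final URL.

    The response body is not read here. Needs network access, so it is
    defined but not called in this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.url


# A made-up redirecting enclosure URL gives no hint of the audio file name,
print(filename_from_url("http://example.org/redirect.php?id=42"))
# whereas the URL it finally resolves to usually does.
print(filename_from_url("http://cdn.example.org/audio/ep42.mp3?x=1"))
```

A podcatcher could call something like `resolve_final_url` before downloading and derive the local file name from the result.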
|
||||
Anyway, back to what I do: I run this modified Bashpodder
|
||||
on one of my Raspberry Pis once a day and it runs during the night.
|
||||
I originally did this because I had a slow ADSL connection.
|
||||
It's got faster now and I also had a download limit.
|
||||
And what I found was that if I ran the downloads during the day,
|
||||
it collided with what my kids were doing when they were doing stuff.
|
||||
But that's really not relevant anymore,
|
||||
because both my kids have gone away to uni or whatever.
|
||||
But I still do the same thing,
|
||||
downloads at two in the morning, UK time.
|
||||
It doesn't really matter.
|
||||
It downloads to a directory on the Pi.
|
||||
I've got a disk attached to it.
|
||||
And I export that directory with NFS,
|
||||
so I can see it from other systems in the house.
|
||||
So let's talk about the database briefly.
|
||||
I use a database to hold the feed details
|
||||
and also details of what I've downloaded.
|
||||
And the reason I did this originally was
|
||||
I'm interested in databases and want to learn how to use them.
|
||||
I chose Postgres, PostgreSQL.
|
||||
It's the way it's written.
|
||||
Because it's very feature rich and powerful.
|
||||
And at the time I first started using it, it was vastly more powerful than MySQL.
|
||||
Still is quite a lot more so, but MySQL has caught up a bit.
|
||||
And I was using Postgres at work around the time
|
||||
I started doing this in 2005 or so.
|
||||
So it was useful to have a home project as well.
|
||||
I want to be able to generate all sorts of reports
|
||||
from the database and to perform actions based on its content.
|
||||
So the way I've set it up is that the database runs on my workstation,
|
||||
which is a thing I turn off at night,
|
||||
rather than running on the server.
|
||||
That's maybe a decision I want to review in due course.
|
||||
The design, as I've said in my notes, is sort of bolted on.
|
||||
It's a bolted on database.
|
||||
You know, it's not integrated properly.
|
||||
The Bashpodder clone downloads podcasts every day
|
||||
and stores them in a directory.
|
||||
It does it based on the date.
|
||||
So every day you get a new directory containing today's downloads.
|
||||
Then the original model was that a playlist would be generated for each day.
|
||||
I don't do that anymore.
|
||||
So what I do is I use a thing that scans what's been downloaded
|
||||
and it puts data into the database.
|
||||
I've said that really if you were going to do something like this,
|
||||
it would be better to have the database and podcatcher fully integrated.
|
||||
I didn't do this because I started off with the original Bashpodder
|
||||
and added the database on as an add-on, as a bolt-on.
|
||||
But it would be wiser to do it that way.
|
||||
So I have a thing that runs every morning that looks at the night's downloads,
|
||||
as I said, and it updates the database.
|
||||
And I want to eventually integrate the two.
|
||||
In the database, I have a bunch of tables.
|
||||
There's more than I've listed here and I'm not going into detail.
|
||||
There's a feed table that contains all the feeds, like the title of the feed and its URL.
|
||||
I also added a classification element to it.
|
||||
So I like to group my feeds into the classes like science or documentary.
|
||||
So I can work with them separately.
|
||||
There's a table of episodes which contains the information about each episode
|
||||
that it's got from the feed.
|
||||
It contains the title of the episode, the URL of the media.
|
||||
It points to where the downloaded episode is on disk.
|
||||
And it links to the feed, obviously.
|
||||
There's a group table which contains a definition of all the groups
|
||||
that I mentioned, like comedy, music or whatever.
|
||||
These are just things I've classified.
|
||||
There's a table of players.
|
||||
And I've got a fair number of them, and I even did a show about this in 2014.
|
||||
I bought one or two more since then, actually.
|
||||
So I index all my players in the database.
|
||||
I keep playlists in the database.
|
||||
And these are also stored on the players.
|
||||
But I'll get onto that in a minute.
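The tables just described could be laid out roughly as follows. This is an illustrative sketch only: the real database is PostgreSQL and has more tables than are listed here, whereas this uses Python's built-in sqlite3 so that it runs without a server, and every table and column name is an assumption.

```python
import sqlite3

# Hypothetical schema; names are invented for this sketch.
SCHEMA = """
CREATE TABLE feed_groups (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- e.g. science, documentary, comedy
);
CREATE TABLE feeds (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    url      TEXT NOT NULL UNIQUE,
    group_id INTEGER REFERENCES feed_groups(id)
);
CREATE TABLE players (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE episodes (
    id        INTEGER PRIMARY KEY,
    feed_id   INTEGER NOT NULL REFERENCES feeds(id),
    title     TEXT,
    media_url TEXT NOT NULL,
    path      TEXT,                    -- where the download lives on disk
    player_id INTEGER REFERENCES players(id)  -- NULL until copied to a player
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO feed_groups (name) VALUES ('science')")
conn.execute(
    "INSERT INTO feeds (title, url, group_id) VALUES (?, ?, ?)",
    ("Example feed", "http://example.org/feed.xml", 1),
)
conn.execute(
    "INSERT INTO episodes (feed_id, title, media_url, path) VALUES (?, ?, ?, ?)",
    (1, "Episode 1", "http://example.org/ep1.mp3", "/podcasts/2017-01-20/ep1.mp3"),
)

# Report every episode together with its feed and group.
rows = conn.execute(
    """
    SELECT g.name, f.title, e.title
    FROM episodes e
    JOIN feeds f ON f.id = e.feed_id
    JOIN feed_groups g ON g.id = f.group_id
    """
).fetchall()
print(rows)
```

Keeping the group on the feed rather than on each episode means an episode inherits its classification through the join.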
|
||||
So I wanted to speak briefly about audio tags.
|
||||
Many podcasters, people generating the audio,
|
||||
they do a great job of adding metadata for their episodes.
|
||||
It's really important to do that.
|
||||
HPR goes to a lot of trouble to make sure it's got good metadata.
|
||||
And it was one of the criteria in the podcast awards
|
||||
that we were nominated for last year.
|
||||
If you don't have metadata in your episodes,
|
||||
then you tend to be downgraded as a consequence.
|
||||
Anyway, all of the players I use use Rockbox.
|
||||
And they can display metadata tags as I deem appropriate.
|
||||
So it's good to be able to see what's playing now
|
||||
and what's coming up next, which you can configure.
|
||||
And I also like to check out tags when I'm managing my episodes.
|
||||
So I can display more information on my workstation,
|
||||
for example.
|
||||
The episode I'm currently listening to has got quite long notes associated
|
||||
with it. I can display them because they're in the tags.
|
||||
However, a lot of podcast episodes these days have quite poor,
|
||||
or even nonexistent tags.
|
||||
There are quite a few feeds that I've subscribed to recently
|
||||
which don't have tags at all, which I find very, very strange.
|
||||
So when I saw this, and saw tags I was not happy with,
|
||||
I wanted to write something which would improve them.
|
||||
I know there are plenty of tools out there to do that,
|
||||
but I felt I wanted something that I could build into scripts.
|
||||
So it needed to have a command line interface
|
||||
and most of these things tend to be GUI-based.
|
||||
I wrote something called fix_tags,
|
||||
which has actually been used to manage tags on HPR episodes for quite some time.
|
||||
It runs on the HPR server.
|
||||
It's available on GitHub, I put a link to it.
|
||||
It's written in Perl, and it has some issues,
|
||||
because the modules it uses are sort of obsolescent,
|
||||
which is quite surprising, but Perl is gradually falling into a state of disrepair
|
||||
due to waning interest, unfortunately.
|
||||
I also wrote another tool based around the concept in fix_tags,
|
||||
which I could run daily to manage tags.
|
||||
This thing is called Tag Manager,
|
||||
and it works on the principle of scanning through all of the podcast episodes
|
||||
that are on disk, and it applies tag rules to them.
|
||||
So there are rules like, if there is no title tag,
|
||||
add one from the title field of the item in the feed.
|
||||
So the idea here was that some people don't bother to put a title in their metadata,
|
||||
and that bothers me a lot.
|
||||
I don't want to be seeing blank audio files popping up on my player.
|
||||
So because the feed itself needs to contain a title per enclosure,
|
||||
or at least per item, I store that away in the database
|
||||
and I store away a few other fields as well.
|
||||
And I can write rules that say, like I just said,
|
||||
if there's no title tag,
|
||||
go and look in the database where there will be a title field from the feed
|
||||
and put that in instead.
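The "no title tag, so take the title from the feed" rule can be sketched in Python as below. This does not reproduce the rules format or the Perl tool described here; the function name `apply_title_rule` and the dictionary keys are assumptions, with plain dictionaries standing in for the audio file's tags and for the feed details held in the database.

```python
def apply_title_rule(tags, feed_item):
    """Return a copy of tags with a missing or blank title filled from the feed."""
    fixed = dict(tags)
    # Treat an absent, None or whitespace-only title as "no title tag".
    if not (fixed.get("title") or "").strip():
        fixed["title"] = feed_item["title"]
    return fixed


# Hypothetical data: tags read from an audio file, and the stored feed item.
untagged = {"artist": "Example Broadcaster", "title": ""}
stored_item = {"title": "Episode 42: Example", "description": "Notes from the feed"}

print(apply_title_rule(untagged, stored_item))
print(apply_title_rule({"title": "Already set"}, stored_item))
```

Returning a copy rather than changing the tags in place makes it easy to compare before and after, and to write the file only when something actually changed.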
|
||||
So I came up with a rules format to do this,
|
||||
based around a well-known format of configuration file.
|
||||
There's a Perl module, which is called Config::General, which I'm using,
|
||||
which uses a format similar to what you find in an Apache configuration file.
|
||||
It's fine, but it's got quite a lot of limitations.
|
||||
So the rules I came up with tend to be rather ugly,
|
||||
because I'm trying to build a lot more into it than the format really caters for.
|
||||
I put an example of how I deal with a particular feed,
|
||||
the BBC Elements podcast, which is very good.
|
||||
It's finished now, but I think you can still download the episodes.
|
||||
It talks about all the elements in the periodic table,
|
||||
which is the sort of stuff I love.
|
||||
Anyway, I put the rules in there and I just do things like,
|
||||
insert a title, if there isn't one;
|
||||
if there's no comment, use the description out of the feed.
|
||||
And I also fiddle with the title to add the name "Elements" to the front of it.
|
||||
So it's quite complex, and I won't go into details of it.
|
||||
It uses Perl regular expressions to do its stuff,
|
||||
and it works fine.
|
||||
But it's ugly.
|
||||
And I'd like to rewrite it in due course.
|
||||
I'd like to come up with my own language, rules language,
|
||||
config file format, whatever you like to call it.
|
||||
But that's a project for later.
|
||||
Anyway, I write episodes to players, surprise, surprise.
|
||||
Now, I'm old school, right?
|
||||
I don't listen to very many episodes on my smartphone.
|
||||
I do have a smartphone.
|
||||
I've currently got a OnePlus One, which to me is huge.
|
||||
And I don't really want to be lugging that around to listen to stuff.
|
||||
I do occasionally do that.
|
||||
I certainly listen to podcasts in my car
|
||||
by connecting the Bluetooth to my car stereo.
|
||||
So, yeah.
|
||||
So that's independent of this,
|
||||
and I use AntennaPod on my phone to download stuff.
|
||||
However, mostly I'm listening to stuff on MP3 players.
|
||||
And I've written stuff to copy episodes to a given player.
|
||||
So, the way I work is I load up a player,
|
||||
listen to everything on it,
|
||||
and then refill it when it's finished.
|
||||
I usually write the podcast episodes in groups,
|
||||
so I might load a particular player with groups
|
||||
like business and comedy and documentary, etc, etc.
|
||||
and then listen to them in sequence.
|
||||
As episodes are written to a player,
|
||||
their status is updated in the database,
|
||||
so it's marked that they're on the given player.
|
||||
And there's a playlist which is also written to the player.
|
||||
Rockbox can work from a predefined playlist,
|
||||
so you can upload a playlist to it,
|
||||
which is in M3U format.
|
||||
You just need to be careful about the paths
|
||||
to the individual files.
|
||||
And so I upload that,
|
||||
and then I just tell Rockbox
|
||||
to use that playlist and off it goes.
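The point about paths in the M3U file can be illustrated with a short Python sketch. It is not the script used here: `write_m3u` is a hypothetical name, the directory layout is invented, and writing entries relative to the playlist file is only one way of keeping the paths valid once the files are on the player instead of the workstation.

```python
import os
import tempfile


def write_m3u(playlist_path, episode_paths):
    """Write an M3U playlist whose entries are relative to the playlist file."""
    base = os.path.dirname(os.path.abspath(playlist_path))
    with open(playlist_path, "w", encoding="utf-8") as handle:
        handle.write("#EXTM3U\n")
        for path in episode_paths:
            relative = os.path.relpath(os.path.abspath(path), base)
            # Use forward slashes so the playlist reads the same on any host OS.
            handle.write(relative.replace(os.sep, "/") + "\n")


# Hypothetical layout standing in for a mounted player.
player_root = tempfile.mkdtemp()
playlist = os.path.join(player_root, "Playlists", "today.m3u")
os.makedirs(os.path.dirname(playlist))
write_m3u(playlist, [os.path.join(player_root, "podcasts", "ep1.mp3")])

with open(playlist, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

Because the entries do not mention the workstation's mount point, the same playlist file still resolves when the player reads it from its own storage.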
|
||||
The way I delete stuff is that I run a script on my workstation
|
||||
whenever a new episode comes up in the list.
|
||||
I mark that episode as being listened to
|
||||
through a script that marks it in the database.
|
||||
And then when I've finished it,
|
||||
I simply run a script again to go through the list of episodes
|
||||
which are being listened to,
|
||||
because I might have several players on the go at once.
|
||||
I'm not listening to all simultaneously,
|
||||
but when one needs charging,
|
||||
then I'd switch over to another one.
|
||||
And the deletion script looks for things in the being listened to state
|
||||
and says which of these can I delete?
|
||||
So I just say,
|
||||
I listen to that, delete it, delete it, and so on.
|
||||
So I make sure that they're actually deleted from my disk,
|
||||
a disk on the Raspberry Pi, actually,
|
||||
as soon as they have been listened to.
|
||||
I never bother to delete them off the player.
|
||||
I simply overwrite them when I next load the player up.
|
||||
So there's a bunch of other tools that I've developed
|
||||
for generating reports and so on and dealing with issues.
|
||||
And as I've sort of mentioned,
|
||||
there's a feed viewer,
|
||||
with which I can check details of a feed,
|
||||
or of a group of feeds, or of a list of feeds, or whatever.
|
||||
And it can also summarize all of the downloaded episodes
|
||||
belonging to a feed,
|
||||
and it generates reports in a variety of formats.
|
||||
I used it to generate the notes for two HPR shows,
|
||||
which I referred to here,
|
||||
shows I did on the podcast feeds I'm listening to.
|
||||
And I was able to generate HTML at the back end of this thing.
|
||||
So as always, I tend to over-engineer it.
|
||||
But that's what amuses me.
|
||||
I've got a tool for subscribing to a new feed,
|
||||
not too surprisingly,
|
||||
and that's the point at which I assign it to a group,
|
||||
and then I decide,
|
||||
because the feed that I'm newly subscribing to
|
||||
will already have a whole bunch of episodes in it.
|
||||
I can at that point say,
|
||||
I want to get the latest five or ten,
|
||||
or something, or none at all.
|
||||
Just wait for new ones to come out.
|
||||
I can do that at the point at which I subscribe.
|
||||
And obviously I've got the reverse tool,
|
||||
which allows me to cancel the subscription.
|
||||
I store the feed details in an archive,
|
||||
and I add notes to that,
|
||||
as I'm deleting them to say,
|
||||
why I deleted it: this podcast is boring,
|
||||
or whatever it is.
|
||||
And that way I can always look back and say,
|
||||
oh, I did actually listen to that,
|
||||
and I hated it.
|
||||
This is why.
|
||||
I've also been known to re-subscribe to a feed
|
||||
that I've forgotten,
|
||||
and I've listened to it.
|
||||
And during the subscription,
|
||||
I get a prompt that says,
|
||||
hey, you've listened to this before
|
||||
and you didn't like it because of this.
|
||||
So it acts as the memory I don't always have.
|
||||
So let's get to the conclusions then.
|
||||
I have been doing this for quite a long time.
|
||||
I seem to have actually started building this stuff in 2011,
|
||||
although I started listening in 2005.
|
||||
And I kept a journal of what I was doing,
|
||||
which I tend to do with projects.
|
||||
That's a file of the formatted text.
|
||||
And I noticed, as I was preparing this,
|
||||
it's got more than 8,000 lines of notes in there
|
||||
about what I've been doing.
|
||||
So it goes to prove
|
||||
I've been doing quite a lot of work on this over the years.
|
||||
So what's good about it?
|
||||
Well, it's mine.
|
||||
It's originally inspired by Bashpodder,
|
||||
but the current script is a complete rewrite.
|
||||
It works.
|
||||
It does all I want it to do.
|
||||
Now it doesn't need much effort to run and maintain.
|
||||
Along the way I've learned tons of stuff,
|
||||
I understand XML and XSLT better.
|
||||
I understand RSS and Atom feeds better.
|
||||
I know a lot more about Bash scripting, still learning.
|
||||
And by the way, quite a lot of that I learned
|
||||
in trying to hack all this stuff together,
|
||||
like using Bash to interface to a database, for example,
|
||||
which is a loony thing to do.
|
||||
Anyway, all of that I've used to make shows.
|
||||
That's it.
|
||||
Things I've discovered,
|
||||
weird things to do with Bash.
|
||||
that I've done HPR shows about.
|
||||
I've learned quite a lot more about Postgres
|
||||
and databases in general.
|
||||
And I understand quite a lot more about audio tags
|
||||
and the TagLib library that I use to work on tags in Perl.
|
||||
And a little bit in Python.
|
||||
So it does, the scheme I'm using does have quite a lot of good ideas
|
||||
about how to deal with podcasts.
|
||||
I think they're good ideas anyway.
|
||||
But podcast feeds and episodes,
|
||||
though in many cases they're not very well implemented.
|
||||
So what's bad then?
|
||||
It's clunky, badly designed.
|
||||
It's the result of hacks on top of hacks.
|
||||
It's really an alpha version of what it should be,
|
||||
what I wanted it to be.
|
||||
And it's one of those cases where you think,
|
||||
well okay, I've learned some stuff.
|
||||
And now I'm going to throw it away and start again.
|
||||
I'm just reluctant to do that or have been.
|
||||
It's not sufficiently resilient to issues with feeds
|
||||
and the bad practices you find in feeds.
|
||||
For example, BBC have this weird habit of releasing an episode.
|
||||
I think they automate it actually.
|
||||
Then they re-release it a few days later.
|
||||
And it's often I think because somebody has checked
|
||||
what was generated by the automation,
|
||||
and found that it's truncated a bit
|
||||
or it's added, it's left some junk at the start or something.
|
||||
And they edit it and then they re-release it.
|
||||
But what they seem to do is they seem to release it
|
||||
as if it's a brand new episode.
|
||||
So they don't keep the same ID, which is what you should do.
|
||||
They re-release it with a different URL,
|
||||
which is fair enough, but in such a way that you get a duplicate.
|
||||
Now other podcatchers deal with this better than mine does.
|
||||
I think because they use additional information
|
||||
like hashtags that they have generated themselves,
|
||||
not hashtags, hashes, MD5 hashes or whatever, that give a better way of identifying episodes.
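One way a podcatcher might generate such a hash itself is sketched below in Python. What other podcatchers actually hash is not stated here, so this is purely an assumption: the function name `item_fingerprint` is hypothetical, and it hashes the feed URL together with a normalised episode title, so that a re-release with a new ID and a new media URL but the same title still matches.

```python
import hashlib


def item_fingerprint(feed_url, title):
    """Return an MD5 hex digest identifying an episode by feed and title."""
    # Lower-case and collapse whitespace so trivial title edits still match.
    normalised = " ".join(title.lower().split())
    return hashlib.md5(f"{feed_url}\n{normalised}".encode("utf-8")).hexdigest()


# Hypothetical feed URL and titles.
FEED = "http://example.org/feed.xml"
first_release = item_fingerprint(FEED, "The Elements: Gold")
re_release = item_fingerprint(FEED, "the  elements:  gold ")
other_episode = item_fingerprint(FEED, "The Elements: Silver")

print(first_release == re_release)
print(first_release == other_episode)
```

The trade-off is that two genuinely different episodes with the same title in the same feed would be treated as duplicates, so a real tool might add the publication date or duration to the hashed text.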
|
||||
Another thing that's bad is it's not easy to extend.
|
||||
The current business of obscuring podcasts
|
||||
behind strange URLs that you then have to dig down through
|
||||
to find the actual name, that has thrown everything for a loop.
|
||||
Whereas a lot of other podcatchers have dealt with this
|
||||
through better design, I think.
|
||||
And the last point is it's completely incapable of being shared.
|
||||
I'd have liked to have offered this to the world at large,
|
||||
but in its current incarnation it's absolutely not something
|
||||
anybody else would want.
|
||||
It's very much an alpha thing and it's hugely hacky.
|
||||
And, you know, idiosyncratic and strange.
|
||||
So nobody else would want it as it stands at the moment.
|
||||
So it's a mixed thing.
|
||||
Anyway, I thought I'd share some of the details of it
|
||||
If you want to know more then ask me, but I won't be...
|
||||
I don't plan to do any more about this, because as I say,
|
||||
it's too weird and idiosyncratic.
|
||||
Okay, that's it then. Bye now.
|
||||
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
|
||||
We are a community podcast network that releases shows
|
||||
every weekday Monday through Friday.
|
||||
Today's show, like all our shows, was contributed by an HPR listener
|
||||
like yourself.
|
||||
If you ever thought of recording a podcast, then click on our
|
||||
contribute link to find out how easy it really is.
|
||||
Hacker Public Radio was founded by the Digital Dog Pound
|
||||
and the Infonomicon Computer Club,
|
||||
and is part of the Binary Revolution at binrev.com.
|
||||
If you have comments on today's show,
|
||||
please email the host directly, leave a comment on the website
|
||||
or record a follow-up episode yourself.
|
||||
Unless otherwise stated, today's show is released
|
||||
under Creative Commons Attribution,
|
||||
ShareAlike 3.0 license.
|
||||