Episode: 2211
|
|
Title: HPR2211: My podcast workflow
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2211/hpr2211.mp3
|
|
Transcribed: 2025-10-18 15:47:38
|
|
|
|
---
|
|
|
|
This is HPR episode 2211 entitled My Podcast Workflow.
|
|
It is hosted by Dave Morris and is about 26 minutes long and carries an explicit flag.
|
|
The summary is how I download, manage, listen to and delete podcasts.
|
|
This episode of HPR is brought to you by AnHonestHost.com.
|
|
Get 15% discount on all shared hosting with the offer code HPR15.
|
|
That's HPR15.
|
|
Better web hosting that's honest and fair at AnHonestHost.com.
|
|
Hello everybody. Welcome to Hacker Public Radio. My name is Dave Morris.
|
|
Today I've got an episode which I've entitled My Podcast Workflow.
|
|
Probably like most people who are listening to this.
|
|
I've been listening to podcasts for quite some time.
|
|
In my case I started in 2005 and that was when I bought my first MP3 player.
|
|
And over that time I've used various podcast downloaders, or podcatchers as people call them.
|
|
And a lot of them existed and I've tried several of them.
|
|
But now I use a script based on Bashpodder which I've rewritten and built up to meet my needs.
|
|
I also use a database to hold details of the feeds that I subscribe to.
|
|
And it also holds what episodes have been downloaded and what's on a player to be listened to and what can be deleted and all of that sort of thing.
|
|
I've written scripts in Bash, Perl and Python to manage all of this.
|
|
So I'm going to be describing some details of the workflow.
|
|
But I'm not going to go into specific details about scripts and details of methods and so forth.
|
|
I was prompted actually to put this show together in 2016.
|
|
And I'd heard a show produced by folky, show number 1992,
|
|
How I'm handling my podcast subscriptions and listening.
|
|
And this was April of 2016.
|
|
And I thought it was a really interesting episode and I thought I must try and write something along similar lines.
|
|
I think it's always interesting to hear how other people do this sort of thing.
|
|
So I thought a contribution might be a good thing.
|
|
But I'm embarrassed to say that I started this in April 2016.
|
|
And somehow it's been lurking in the background ever since.
|
|
And this is January 2017 that I'm recording this.
|
|
So it's been waiting a while.
|
|
I thought it would be interesting to describe what a podcast feed actually is.
|
|
I'm sure most people have used them.
|
|
That's how you're listening to this.
|
|
But most likely anyway.
|
|
It's defined by an XML file and there are two main formats which are called RSS and Atom.
|
|
And I've linked to details about these.
|
|
I won't go into details myself as to what they mean and where they come from.
|
|
If you're interested you can find out lots of information.
|
|
Basically they both consist of a list of structured items where an item is a distinct thing.
|
|
And each item can contain a link to a multimedia file or so-called enclosure.
|
|
And it's the enclosure that makes it a podcast.
|
|
There are other sorts of RSS and Atom feeds which are not podcast feeds.
|
|
And it's the enclosure that makes it one.
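[Editor's illustration, not from the episode.] The item-and-enclosure structure described above can be sketched in a few lines of Python. The sample feed, its URLs and the function name `list_enclosures` are invented for the example; only the RSS element names (`item`, `title`, `enclosure`) are standard.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS document: one item has an enclosure, one does not.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Episode with audio</title>
      <guid>ep-001</guid>
      <enclosure url="http://example.org/ep001.mp3" length="1234" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
      <guid>post-002</guid>
    </item>
  </channel>
</rss>"""


def list_enclosures(rss_text):
    """Return (item title, enclosure URL) pairs for items that carry an enclosure."""
    root = ET.fromstring(rss_text)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:  # items without an enclosure are not podcast episodes
            found.append((item.findtext("title"), enclosure.get("url")))
    return found


print(list_enclosures(RSS_SAMPLE))
```

Only the first item is reported, because the second has nothing to download. Atom feeds express the same idea differently and are not handled by this sketch.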
|
|
There's a Wikipedia article I've linked to talking about podcasts which you might find interesting if you want to dig deeper.
|
|
So the way in which a feed is intended to be used is that when something new has been released,
|
|
a new podcast has been released, a new episode in the case of HPR,
|
|
the feed is updated to reflect the change.
|
|
And then podcatchers are monitoring it.
|
|
So they probably, you might be running something that looks every hour or once a day
|
|
or something like that.
|
|
And it will go and look at a given feed to see if there's anything new.
|
|
And it usually does that by scanning through all of the enclosures,
|
|
all of the items in the list and checking to see whether it's already downloaded things.
|
|
If it finds something new then it will download it and there are all sorts of complications
|
|
as to how many podcasts it'll download at a time and so on and so forth.
|
|
But the point of it is that the podcatcher keeps a record of what it's already downloaded.
|
|
Now in the early days of podcasts and podcatchers,
|
|
just saving the URL of the enclosure was enough because that was pretty much a unique item
|
|
and unique thing that identified that podcast.
|
|
Now it's not so much the case.
|
|
But I think it was always designed that there was a unique identifier associated with each enclosure
|
|
and RSS and Atom certainly contain them.
|
|
So this acts as a label which can be stored to say I've seen this one already
|
|
and thereby avoid duplicate downloads.
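[Editor's illustration, not from the episode.] A minimal Python sketch of the bookkeeping described above, assuming each feed item has already been reduced to a dictionary with an `id` (the feed's unique identifier, possibly missing) and a `url`. The dictionary layout and the name `select_new_items` are hypothetical.

```python
def select_new_items(feed_items, seen_ids):
    """Return the items not seen before and record their identifiers in seen_ids.

    The feed's own identifier is preferred; the enclosure URL is only a
    fallback for feeds that do not supply one.
    """
    new_items = []
    for item in feed_items:
        item_id = item.get("id") or item["url"]
        if item_id in seen_ids:
            continue  # already downloaded on an earlier run
        seen_ids.add(item_id)
        new_items.append(item)
    return new_items


seen = set()
items = [
    {"id": "ep-1", "url": "http://example.org/a.mp3"},
    {"id": None, "url": "http://example.org/b.mp3"},
]
print(len(select_new_items(items, seen)))  # first run: both items are new
print(len(select_new_items(items, seen)))  # second run: nothing new
```

Because the identifier is checked before the URL, an episode whose file moves to a different URL but keeps its identifier is not downloaded a second time.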
|
|
So looking at my workflow, I'm using a rewritten version of the venerable Bashpodder,
|
|
which was a Bash script written by Linc Fessenden of The Linux Link Tech Show.
|
|
People say he's the Linc of the Link in the title; I don't think so.
|
|
Anyway, he wrote this rather elegant piece of bash to do this job.
|
|
But it has its limitations.
|
|
I rewrote this.
|
|
He based his around using the XSLT capabilities.
|
|
There's a parser which you can use whose name I've suddenly forgotten.
|
|
An XSLT parser.
|
|
I can't remember its name but I'm sure you'll find it.
|
|
I'll make sure I put it in the notes.
|
|
But what this does, it's a method of parsing an XML file.
|
|
And he had written a thing called parse_enclosure.xsl
|
|
which is used to parse the enclosures out of an RSS feed.
|
|
But since he did that, a lot of other types of feeds have popped up
|
|
which include Atom, and he hadn't catered for that.
|
|
So I modified it to include Atom.
|
|
I also added another one which I call parse_id.xsl
|
|
which is quite capable of parsing out the ID strings from a feed.
|
|
And that's the thing I just mentioned about a unique tag per episode.
|
|
I've included both of these along with my notes.
|
|
And I should say, I always forget to say this.
|
|
I'm sure you've worked out yourself that there are long notes
|
|
that I'm currently reading effectively.
|
|
But they're there for you to refer to if you find it interesting.
|
|
One of the drawbacks of Bashpodder, and also of my version of it,
|
|
is that it can't deal with feeds where the enclosure URL
|
|
doesn't show the actual download.
|
|
So in the early days then the enclosure simply consisted of a URL
|
|
pointing to the audio file itself.
|
|
So an MP3 or an Ogg or whatever it was.
|
|
In latter times there are lots of intermediaries
|
|
that serve up the audio for podcasts.
|
|
In many cases the URL that's in the enclosure doesn't
|
|
point directly to the audio.
|
|
So if you download it with something like wget or curl,
|
|
then you get the end result.
|
|
But if you're trying to work out things like where the file is,
|
|
what file is going to be generated as a consequence,
|
|
it's very hard to do.
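[Editor's illustration, not from the episode.] The speaker says he has no complete solution to this, and the following is not one. It is only a Python sketch of the easy half of the problem: deriving a local file name once the final URL is known. Finding that final URL means following the redirects, which is the hard part and is not shown. The function name, the default name and the example URLs are all hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="episode.bin"):
    """Guess a local file name from the path part of a URL.

    The query string is ignored and percent-escapes are decoded. If the
    path has no last component, the default name is returned instead.
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or default


print(filename_from_url("http://cdn.example.org/audio/show%20001.mp3?token=abc"))
print(filename_from_url("http://redirect.example.org/"))
```

The second call shows why the enclosure URL alone is often not enough: a redirecting URL may carry no usable file name at all.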
|
|
I haven't quite got a solution for this.
|
|
These things are popping up more and more.
|
|
I don't have a complete solution to this yet.
|
|
Charles in NJ did show 1935, called Quick Bashpodder Fix,
|
|
where he talks about something which is similar,
|
|
possibly the same as this problem.
|
|
Anyway, back to what I do: I run this modified Bashpodder
|
|
on one of my Raspberry Pis once a day and it runs during the night.
|
|
I originally did this because I had a slow ADSL connection.
|
|
It's got faster now and I also had a download limit.
|
|
And what I found was that if I ran the downloads during the day,
|
|
it collided with what my kids were doing when they were doing stuff.
|
|
But that's really not relevant anymore,
|
|
because both my kids have gone away to uni or whatever.
|
|
But I still do the same thing,
|
|
downloads at two in the morning, UK time.
|
|
It doesn't really matter.
|
|
It downloads to a directory on the Pi.
|
|
I've got a disk attached to it.
|
|
And I export that directory with NFS,
|
|
so I can see it from other systems in the house.
|
|
So let's talk about the database briefly.
|
|
I use a database to hold the feed details
|
|
and also details of what I've downloaded.
|
|
And the reason I did this originally was
|
|
I'm interested in databases and wanted to learn how to use them.
|
|
I chose Postgres, PostgreSQL,
|
|
which is the way it's written,
|
|
Because it's very feature rich and powerful.
|
|
And at the time I first started using it, it was vastly more powerful than MySQL.
|
|
It still is quite a lot more so, but MySQL has caught up a bit.
|
|
And I was using Postgres at work around the time
|
|
I started doing this in 2005 or so.
|
|
So it was useful to have a home project as well.
|
|
I want to be able to generate all sorts of reports
|
|
from the database and to perform actions based on its content.
|
|
So the way I've set it up is that the database runs on my workstation,
|
|
which is a thing I turn off at night,
|
|
rather than running on the server.
|
|
That's maybe a decision I want to review in due course.
|
|
The design, as I've said in my notes, is sort of bolted on.
|
|
It's a bolted on database.
|
|
You know, it's not integrated properly.
|
|
The Bashpodder clone downloads podcasts every day
|
|
and stores them in a directory.
|
|
It does it based on the date.
|
|
So every day you get a new directory containing today's downloads.
|
|
Then the original model was that a playlist would be generated for each day.
|
|
I don't do that anymore.
|
|
So what I do is I use a thing that scans what's been downloaded
|
|
and it puts data into the database.
|
|
I've said that really if you were going to do something like this,
|
|
it would be better to have the database and the podcatcher fully integrated.
|
|
I didn't do this because I started off with the original Bashpodder
|
|
and added the database on as an add-on, as a bolt-on.
|
|
But it would be wiser to do it that way.
|
|
So I have a thing that runs every morning that looks at the night's downloads,
|
|
as I said, and it updates the database.
|
|
And I want to eventually integrate the two.
|
|
In the database, I have a bunch of tables.
|
|
There's more than I've listed here and I'm not going into detail.
|
|
There's a feed table that contains all the feeds, with things like the title of the feed and its URL.
|
|
I also added a classification element to it.
|
|
So I like to group my feeds into the classes like science or documentary.
|
|
So I can work with them separately.
|
|
There's a table of episodes which contains the information about each episode
|
|
that it's got from the feed.
|
|
It contains the title of the episode, the URL of the media.
|
|
It points to where the downloaded episode is on disk.
|
|
And it links to the feed, obviously.
|
|
There's a group table which contains a definition of all the groups
|
|
that I mentioned, like comedy, music or whatever.
|
|
These are just things I've classified.
|
|
There's a table of players.
|
|
And I've got a fair number of them, and I even did a show about this in 2014.
|
|
I bought one or two more since then, actually.
|
|
So I index all my players out of the database.
|
|
I keep playlists in the database.
|
|
And these are also stored on the players.
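[Editor's illustration, not from the episode.] The tables just listed might be laid out roughly as follows. This is a guess at a minimal shape, not the speaker's schema: he uses PostgreSQL, whereas the sketch uses Python's built-in `sqlite3` module so that it runs anywhere. Every table and column name is hypothetical, and the playlist table is left out for brevity.

```python
import sqlite3

# Hypothetical schema: groups classify feeds, feeds own episodes,
# and an episode may currently be loaded on one player.
SCHEMA = """
CREATE TABLE feed_groups (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feeds (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    url TEXT UNIQUE NOT NULL,
    group_id INTEGER REFERENCES feed_groups(id)
);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE episodes (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL REFERENCES feeds(id),
    title TEXT,
    media_url TEXT,
    local_path TEXT,
    player_id INTEGER REFERENCES players(id)
);
"""


def open_database():
    """Create an in-memory database containing the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def episodes_in_group(conn, group_name):
    """List episode titles whose feed belongs to the named group."""
    rows = conn.execute(
        """SELECT e.title FROM episodes e
           JOIN feeds f ON f.id = e.feed_id
           JOIN feed_groups g ON g.id = f.group_id
           WHERE g.name = ? ORDER BY e.id""",
        (group_name,),
    )
    return [row[0] for row in rows]
```

With a layout like this, working with one class of feeds at a time, as described above, becomes a single join from episodes through feeds to groups.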
|
|
But I'll get onto that in a minute.
|
|
So I wanted to speak briefly about audio tags.
|
|
Many podcasters, people generating the audio,
|
|
they do a great job of adding metadata for their episodes.
|
|
It's really important to do that.
|
|
HPR goes to a lot of trouble to make sure it's got good metadata.
|
|
And it was one of the criteria in the podcast awards
|
|
that we were nominated for last year.
|
|
If you don't have metadata in your episodes,
|
|
then you tend to be downgraded as a consequence.
|
|
Anyway, all of the players I use run Rockbox.
|
|
And they can display metadata tags as I deem appropriate.
|
|
So it's good to be able to see what's playing now
|
|
and what's coming up next, which you can configure.
|
|
And I also like to check out tags when I'm managing my episodes.
|
|
So I can display more information on my workstation,
|
|
for example.
|
|
The episode I'm currently listening to has got quite long notes associated
|
|
with it. I can display them because they're in the tags.
|
|
However, a lot of podcast episodes these days have quite poor,
|
|
or even nonexistent tags.
|
|
There are quite a few feeds that I've subscribed to recently
|
|
which don't have tags at all, which I find very, very strange.
|
|
So when I saw this, and saw tags that I was not happy with,
|
|
I wanted to write something which would improve them.
|
|
I know there are plenty of tools out there to do that,
|
|
but I felt I wanted something that I could build into scripts.
|
|
So it needed to have a command line interface
|
|
and most of these things tend to be GUI-based.
|
|
I wrote something called fix_tags,
|
|
which has actually been used to manage tags on HPR episodes for quite some time.
|
|
It runs on the HPR server.
|
|
It's available on GitHub, I put a link to it.
|
|
It's written in Perl, and it has some issues,
|
|
because the modules it uses are sort of obsolescent,
|
|
which is quite surprising, but Perl is gradually falling into a state of disrepair
|
|
due to waning interest, unfortunately.
|
|
I also wrote another tool based around the concept in fix_tags,
|
|
which I could run daily to manage tags.
|
|
This thing is called Tag Manager,
|
|
and it works on the principle of scanning through all of the podcast episodes
|
|
that are on disk, and it applies tag rules to them.
|
|
So there are rules like, if there is no title tag,
|
|
add one from the title field of the item in the feed.
|
|
So the idea here was that some people don't bother to put a title in their metadata,
|
|
and that bothers me a lot.
|
|
I don't want to be seeing blank audio files popping up on my player.
|
|
So because the feed itself needs to contain a title per enclosure,
|
|
or at least per item, I store that away in the database
|
|
and I store away a few other fields as well.
|
|
And I can write rules that say, like I just said,
|
|
if there's no title tag,
|
|
go and look in the database where there will be a title field from the feed
|
|
and put that in instead.
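[Editor's illustration, not from the episode.] The missing-title rule can be sketched in Python as below. The real tool is described as rule-driven and written in Perl; this sketch hard-codes a single rule and works on plain dictionaries rather than real audio files. The function name and the dictionary keys are hypothetical.

```python
def apply_title_rule(tags, feed_fields):
    """Fill in a missing or blank title tag from the title stored from the feed.

    tags        -- tags read from the audio file, as a dict
    feed_fields -- fields saved in the database from the feed item, as a dict
    A copy is returned; existing non-blank titles are left untouched.
    """
    fixed = dict(tags)
    current = (fixed.get("title") or "").strip()
    feed_title = (feed_fields.get("title") or "").strip()
    if not current and feed_title:
        fixed["title"] = feed_title
    return fixed


print(apply_title_rule({"title": ""}, {"title": "Hydrogen"}))
print(apply_title_rule({"title": "Already set"}, {"title": "Hydrogen"}))
```

Writing the result back to the file would need a tag library, which is outside the scope of the sketch.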
|
|
So I came up with a rules format to do this,
|
|
based around a well-known format of configuration file.
|
|
There's a Perl module, which is called Config::General, which I'm using,
|
|
which uses a format similar to what you find in an Apache configuration file.
|
|
It's fine, but it's got quite a lot of limitations.
|
|
So the rules I came up with tend to be rather ugly,
|
|
because I'm trying to build a lot more into it than the format really caters for.
|
|
I put an example of how I deal with a particular feed,
|
|
the BBC Elements podcast, which is very good.
|
|
It's finished now, but I think you can still download the episodes.
|
|
It talks about all the elements in the periodic table,
|
|
which is the sort of stuff I love.
|
|
Anyway, I put the rules in there and I just do things like,
|
|
insert a title, if there isn't one,
|
|
if there's no comment, use the description out of the feed.
|
|
And I also fiddle with the title to add the name Elements to the front of it.
|
|
So it's quite complex, and I won't go into details of it.
|
|
It uses Perl regular expressions to do its stuff,
|
|
and it works fine.
|
|
But it's ugly.
|
|
And I'd like to rewrite it in due course.
|
|
I'd like to come up with my own language, rules language,
|
|
config file format, whatever you like to call it.
|
|
But that's a project for later.
|
|
Anyway, I write episodes to players, surprise, surprise.
|
|
Now, I'm old school, right?
|
|
I don't listen to very many episodes on my smartphone.
|
|
I do have a smartphone.
|
|
I've currently got a OnePlus One, which to me is huge.
|
|
And I don't really want to be lugging that around to listen to stuff.
|
|
I do occasionally do that.
|
|
I certainly listen to podcasts in my car
|
|
by connecting the Bluetooth to my car stereo.
|
|
So, yeah.
|
|
So that's independent of this,
|
|
and I use AntennaPod on my phone to download stuff.
|
|
However, mostly I'm listening to stuff on MP3 players.
|
|
And I've written stuff to copy episodes to a given player.
|
|
So, the way I work is I load up a player,
|
|
listen to everything on it,
|
|
and then refill it when it's finished.
|
|
I usually write the podcast episodes in groups,
|
|
so I might load a particular player with groups
|
|
like business and comedy and documentary, etc, etc.
|
|
and then listen to them in sequence.
|
|
As episodes are written to a player,
|
|
their status is updated in the database,
|
|
so it's marked that they're on the given player.
|
|
And there's a playlist which is also written to the player.
|
|
Rockbox can work from a predefined playlist,
|
|
so you can upload a playlist to it,
|
|
which is in M3U format.
|
|
You just need to be careful about the paths
|
|
to the individual files.
|
|
And so I upload that,
|
|
and then I just tell Rockbox
|
|
to use that playlist and off it goes.
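[Editor's illustration, not from the episode.] The caution about paths can be made concrete with a short Python sketch. It assumes the player is mounted at some directory on the workstation and that the playlist should name each file relative to the player's own root, with a leading slash. The mount point `/media/player`, the file names and the function name are hypothetical, and the assumption about how the player resolves paths should be checked against the Rockbox documentation.

```python
import posixpath


def build_m3u(episode_paths, player_root):
    """Build a plain M3U playlist, one path per line.

    Each workstation path is rewritten relative to the player's mount
    point, because the workstation path means nothing to the player.
    """
    lines = []
    for path in episode_paths:
        lines.append("/" + posixpath.relpath(path, player_root))
    return "\n".join(lines) + "\n"


playlist = build_m3u(
    ["/media/player/podcasts/science/ep1.mp3", "/media/player/podcasts/comedy/ep2.ogg"],
    "/media/player",
)
print(playlist, end="")
```

The resulting text would be saved as a `.m3u` file on the player alongside the episodes it names.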
|
|
The way I delete stuff is that I run a script on my workstation
|
|
whenever a new episode comes up in the list.
|
|
I mark that episode as being listened to
|
|
through a script that marks it in the database.
|
|
And then when I've finished it,
|
|
I simply run a script again to go through the list of episodes
|
|
which are being listened to,
|
|
because I might have several players on the go at once.
|
|
I'm not listening to them all simultaneously,
|
|
but when one needs charging,
|
|
then I'd switch over to another one.
|
|
And the deletion script looks for things in the being listened to state
|
|
and says which of these can I delete?
|
|
So I just say,
|
|
I listen to that, delete it, delete it, and so on.
|
|
So I make sure that they're actually deleted from my disk,
|
|
a disk on the Raspberry Pi, actually,
|
|
as soon as they have been listened to.
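[Editor's illustration, not from the episode.] The deletion pass described above can be sketched as follows in Python. The state names, the dictionary layout and the `confirm` callback, which stands in for the interactive "can I delete this?" prompt, are all hypothetical; the real scripts keep this state in the database.

```python
from pathlib import Path


def delete_listened(episodes, confirm):
    """Offer every episode in the 'listening' state for deletion.

    episodes -- list of dicts with 'title', 'path' and 'state' keys
    confirm  -- callable returning True when the episode may be deleted
    Returns the titles that were deleted; other episodes are untouched.
    """
    deleted = []
    for episode in episodes:
        if episode["state"] != "listening":
            continue  # not started yet, or already dealt with
        if confirm(episode):
            Path(episode["path"]).unlink(missing_ok=True)  # remove the file from disk
            episode["state"] = "deleted"
            deleted.append(episode["title"])
    return deleted
```

In use, `confirm` would ask a question at the terminal. Episodes still waiting on a player are skipped, which is what allows several players to be in use at once.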
|
|
I never bother to delete them off the player.
|
|
I simply overwrite them when I next load the player up.
|
|
So there's a bunch of other tools that I've developed
|
|
for generating reports and so on and dealing with issues.
|
|
And as I've sort of mentioned,
|
|
there's a feed viewer,
|
|
with which I can check details of a feed,
|
|
or of a group of feeds, or of a list of feeds, or whatever.
|
|
And it can also summarize all of the downloaded episodes
|
|
belonging to a feed,
|
|
and it generates reports in a variety of formats.
|
|
I used it to generate the notes for two HPR shows,
|
|
which I referred to here,
|
|
shows I did on the podcast feeds I'm listening to.
|
|
And I was able to generate HTML at the back end of this thing.
|
|
So as always, I tend to over-engineer it.
|
|
But that's what amuses me.
|
|
I've got a tool for subscribing to a new feed,
|
|
not too surprisingly,
|
|
and that's the point at which I assign it to a group,
|
|
and then I decide,
|
|
because the feed that I'm newly subscribing to
|
|
will already have a whole bunch of episodes in it.
|
|
I can at that point say,
|
|
I want to get the latest five or ten,
|
|
or something, or none at all.
|
|
Just wait for new ones to come out.
|
|
I can do that at the point at which I subscribe.
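[Editor's illustration, not from the episode.] The choice made at subscription time amounts to splitting the existing backlog in two. A minimal Python sketch, assuming the feed lists its items newest first (common, but not guaranteed) and using a hypothetical function name:

```python
def split_backlog(items, keep_latest):
    """Split a newly subscribed feed's items into (to_download, to_skip).

    items       -- existing feed items, assumed to be ordered newest first
    keep_latest -- how many of the newest items to fetch; 0 means none
    Skipped items would be recorded as already seen, so that only
    episodes released after subscribing are downloaded later.
    """
    return items[:keep_latest], items[keep_latest:]


backlog = ["ep5", "ep4", "ep3", "ep2", "ep1"]
print(split_backlog(backlog, 2))
print(split_backlog(backlog, 0))
```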
|
|
And obviously I've got the reverse tool,
|
|
which allows me to cancel the subscription.
|
|
I store the feed details in an archive,
|
|
and I add notes to that,
|
|
as I'm deleting them to say,
|
|
why I deleted it: this podcast is boring,
|
|
or whatever it is.
|
|
And that way I can always look back and say,
|
|
oh, I did actually listen to that,
|
|
and I hated it.
|
|
This is why.
|
|
I've also been known to re-subscribe to a feed
|
|
that I've forgotten,
|
|
and I've listened to it.
|
|
And it's during the subscription,
|
|
I get a prompt that says,
|
|
hey, you've listened to this before
|
|
and you didn't like it because of this.
|
|
So it acts as the memory I don't always have.
|
|
So let's get to the conclusions then.
|
|
I have been doing this for quite a long time.
|
|
I seem to have actually started building this stuff in 2011,
|
|
although I started listening in 2005.
|
|
And I kept a journal of what I was doing,
|
|
which I tend to do with projects.
|
|
That's a file of formatted text.
|
|
And I noticed, as I was preparing this,
|
|
it's got more than 8,000 lines of notes in there
|
|
about what I've been doing.
|
|
So it goes to prove
|
|
I've been doing quite a lot of work on this over the years.
|
|
So what's good about it?
|
|
Well, it's mine.
|
|
It's originally inspired by Bashpodder,
|
|
but the current script is a complete rewrite.
|
|
It works.
|
|
It does all I want it to do.
|
|
Now it doesn't need much effort to run and maintain.
|
|
Along the way I've learned tons of stuff,
|
|
I understand XML and XSLT better.
|
|
I understand RSS and Atom feeds better.
|
|
I know a lot more about Bash scripting, still learning.
|
|
And by the way, quite a lot of that I learned
|
|
in trying to hack all this stuff together,
|
|
like using Bash to interface to a database, for example,
|
|
which is a loony thing to do.
|
|
Anyway, all of that I've used to make shows.
|
|
That's it.
|
|
Things I've discovered,
|
|
weird things to do with Bash.
|
|
And I've done HPR shows about.
|
|
I've learned quite a lot more about Postgres
|
|
and databases in general.
|
|
And I understand quite a lot more about audio tags
|
|
and the TagLib library that I use to work on tags in Perl.
|
|
And a little bit in Python.
|
|
So it does, the scheme I'm using does have quite a lot of good ideas
|
|
about how to deal with podcasts.
|
|
I think they're good ideas anyway.
|
|
But podcast feeds and episodes,
|
|
though in many cases they're not very well implemented.
|
|
So what's bad then?
|
|
It's clunky, badly designed.
|
|
It's the result of hacks on top of hacks.
|
|
It's really an alpha version of what it should be,
|
|
what I wanted it to be.
|
|
And it's one of those cases where you think,
|
|
well okay, I've learned some stuff.
|
|
And now I'm going to throw it away and start again.
|
|
I'm just reluctant to do that or have been.
|
|
It's not sufficiently resilient to issues with feeds
|
|
and the bad practices you find in feeds.
|
|
For example, BBC have this weird habit of releasing an episode.
|
|
I think they automate it actually.
|
|
Then they re-release it, re-release it a few days later.
|
|
And it's often I think because somebody has checked
|
|
what was generated by the automation,
|
|
and found that it's truncated a bit
|
|
or it's added, it's left some junk at the start or something.
|
|
And they edit it and then they re-release it.
|
|
But what they seem to do is they seem to release it
|
|
with as if it's a brand new episode.
|
|
So they don't keep the same ID, which is what you should do.
|
|
They re-release it with a different URL,
|
|
which is fair enough, but in such a way that you get a duplicate.
|
|
Now other podcatchers deal with this better than mine does,
|
|
I think because they use additional information
|
|
like hashtags that they have generated themselves,
|
|
not hashtags, hashes, MD5 hashes or the like, that give a better way of identifying episodes.
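[Editor's illustration, not from the episode.] The speaker does not say what other podcatchers feed into their hashes, so the following Python sketch rests on an assumption: that a re-released episode keeps the same title within the same feed. It hashes the feed URL together with a normalised title. The function name and the example feed URL are hypothetical.

```python
import hashlib


def episode_fingerprint(feed_url, title):
    """Return an MD5 hex digest identifying an episode by feed and title.

    The title is lower-cased and its whitespace collapsed, so trivial
    differences between a release and its re-release do not matter.
    Neither the enclosure URL nor the feed's item ID is used.
    """
    normalised = " ".join(title.lower().split())
    return hashlib.md5(f"{feed_url}\n{normalised}".encode("utf-8")).hexdigest()


first = episode_fingerprint("http://example.org/elements/feed", "Hydrogen")
again = episode_fingerprint("http://example.org/elements/feed", "  hydrogen ")
print(first == again)
```

A fingerprint like this would be stored alongside the feed's own ID. An episode that reappears with a new ID and URL but a matching fingerprint could then be flagged as a probable duplicate rather than downloaded again.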
|
|
Another thing that's bad is that it's not easy to extend.
|
|
The current business of obscuring podcasts
|
|
behind strange URLs that you then have to dig down through
|
|
to find the actual name has thrown everything for a loop,
|
|
Whereas a lot of other podcatchers have dealt with this
|
|
through better design, I think.
|
|
And the last point is it's completely incapable of being shared.
|
|
I'd have liked to have offered this to the world at large,
|
|
but in its current incarnation it's absolutely not something
|
|
anybody else would want.
|
|
It's very much an alpha thing and it's hugely hacky.
|
|
And, you know, idiosyncratic and strange.
|
|
So nobody else would want it as it stands at the moment.
|
|
So it's a mixed thing.
|
|
Anyway, I thought I'd share some of the details of it
|
|
if you want to know more then ask me, but I won't be...
|
|
I don't plan to do any more about this, because as I say,
|
|
it's too weird and idiosyncratic.
|
|
Okay, that's it then. Bye now.
|
|
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
|
|
We are a community podcast network that releases shows
|
|
every weekday Monday through Friday.
|
|
Today's show, like all our shows, was contributed by an HPR listener
|
|
like yourself.
|
|
If you ever thought of recording a podcast, then click on our
|
|
contribute link to find out how easy it really is.
|
|
Hacker Public Radio was founded by the Digital Dog Pound
|
|
and the Infonomicon Computer Club,
|
|
and is part of the binary revolution at binrev.com.
|
|
If you have comments on today's show,
|
|
please email the host directly, leave a comment on the website
|
|
or record a follow-up episode yourself.
|
|
Unless otherwise stated, today's show is released
|
|
under a Creative Commons Attribution
|
|
ShareAlike 3.0 license.
|