Episode: 2720
Title: HPR2720: Download youtube channels using the rss feeds
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2720/hpr2720.mp3
Transcribed: 2025-10-19 15:32:28
---
This is HPR episode 2720 entitled "Download youtube channels using the rss feeds". It is hosted by Ken Fallon, is about 24 minutes long, and carries an explicit flag. The summary is: Ken shares a script that will allow you to quickly keep up to date on your YouTube subscriptions.
This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.
Hi everybody, my name is Ken Fallon and you're listening to another episode of Hacker Public Radio. Today is going to be the second in a mini-series that I'm doing on YouTube subscriptions. The last episode I recorded was in response to Ahuka's episode, where he wanted to watch YouTube channels in reverse order, watching the oldest first and following the video producer as they progressed through their voyage of discovery.
That's quite useful, because quite a lot of stuff is episodic. For some of the channels that I watch I also wanted to do that, because tutorials build on tutorials build on tutorials, so you end up watching a video and it says "well, if you haven't seen my video on this then you should go back and watch that", and that has a reference to older stuff, so it's always easier, if you like the channel, to go back, get all the videos, download them and play them. I download them in order to save on bandwidth costs, but also so that I have them safely offline and I can watch them on the train, and also because sometimes the videos disappear for one reason or another, and that way I have them locally. I usually do delete them after I watch them; other stuff I save. But okay, anyway, I digress.
The disadvantage of that is that, for people who produce a lot of content and have been doing it for quite a long time, the number of videos can be in the hundreds if not thousands. The EEVblog took maybe an hour or two to parse using my previous script. That's fine the first time you want to download a channel, because it's going to take hours and hours to go through each of the videos and download them anyway, but if you just want to check the last downloaded time, that means you need to do a query on the video itself, download all the information about it, look at the last download time and then check to see whether it is older than the one that I want to download now (yes it is, okay, I'll skip to the next one), and that takes time.
Now I was struggling with this and, by happy coincidence, Dave Morris mentioned that YouTube, if you subscribe to channels, has an RSS feed per channel, and that you could export all your channels into an OPML file. Well, this is absolutely ideal, because OPML files are basically a playlist of RSS feeds, and you should be familiar with RSS feeds, because you probably wouldn't be listening to HPR if it wasn't for RSS feeds. They just allow your podcatcher to go and get the media; they provide an XML method to go get media. So there are a few things in this chain. First of all you need to be subscribed to the channels, which means people know what your subscriptions are. Of course you can create a pseudo-anonymous account if you want, but I'm actually thinking that this is kind of the currency of YouTube, the number of subscribers that you have, so I'm quite happy to subscribe to the people that I'm subscribed to.
Now a lot of YouTube content providers will complain about things like that: the bell doesn't work, you need to get an email when the video comes in, and people miss videos because they didn't click the bell and it only recommends videos for you. This bypasses all of that. You will always get all the videos that somebody uploaded, or at least all the current ones in the feed. So if somebody has uploaded fewer than 10, there will be fewer than 10 in this feed; if they've uploaded more than 10, it will only be the 10 newest ones that are in this feed. But if you check once or twice a week then, assuming somebody posts one video a day, you still have 10 days to get all the video downloads, and because it is 10 multiplied by the number of feeds in your subscriptions, that's the maximum number that you're going to be checking. I'm not sure if 10 is the number, but let's say it is. Excuse me one moment.
So, you need to be subscribed to people on YouTube, and you can go to the subscription manager. This is actually the only part that you need to be logged in for, because the RSS feed itself doesn't actually require you to be logged in or authenticated to Google. So if you discover a new channel, you log in, you subscribe to that channel, you export your OPML file, and then you can log out of Google, and of YouTube, again, and you're good to go. The secret URL for the subscription manager is youtube.com/subscription_manager, and when you type that in you'll get to a secret page, and it pretty much is a secret page. It looks very much like the subscriptions page, but it's different, because right down at the end (you'll see I have 69 subscriptions at the minute), right down at the end, is "Export to RSS reader",
and there's a button, "Export subscriptions". When I do that it asks me to save the file, and I always paste in the file name where I'm going to save it, as subscription_manager.opml. Excuse me. I also run it through xmllint --format to basically make it human readable, and I pipe it through the sponge command back into the same file name again. sponge is an excellent command, which I think I was actually introduced to here on HPR; it allows you to write to the same file without having to write a temp file and overwrite it. A very, very cool command.
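That tidy-up step might look something like the following sketch (the file name is just an example; sponge comes from the moreutils package):

    xmllint --format subscription_manager.opml | sponge subscription_manager.opml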
Anyway, I have a bit of a script that I put together, and it's broken down into some predictable sections. The first part is some variables, which determine what's going to happen, some of the settings that I can change.
The second part is some sanity checks, to make sure that I have a subscription file, to set up some log files if they're not there, and to maintain a copy of all the files that have been downloaded. Then I parse through the OPML file to get a list of all the RSS feeds that are listed in the OPML file, and inside that loop I go to each and every one of those subscriptions, so all 70 of them, and I extract all the video URLs within each of those feeds, so 70 multiplied by 10 is 700 URLs at the end. Maybe there's more, maybe there's less, I'm not sure, it could be 20, but that basically gives me a list of the URLs that I need to get.
Then I'm going to go through that whole list, and I'm going to check and see if I have already downloaded that URL. If I have, then I'm not going to do anything. If I haven't, then I'm going to basically do some checks, and then, after all that's done, I do some tidying. The loop produces a cut-down list of all the possible files that I have (some of them I don't want, some of them I've already downloaded), and that gives me a list of the new files that I want to download. Then it uses youtube-dl again, but this time just downloading a list of those files: here you are, youtube-dl, you don't need to figure anything out, these are the ones that I want, and here's where I want you to save them. So let's step through this video file, or rather this bash file. It will be in the show notes, as well as a copy of my OPML file as it currently stands today, and some other links and stuff, so if going through a bash file doesn't float your boat, I would suggest you basically go to tomorrow's episode right now, but I'll go through this as it is.
The first section is the save path, and that's where I want to save my files. My subscriptions variable is the URL to, or the location of, where my OPML file is. What an OPML file looks like, actually, is an XML file: you've got an opening XML declaration, then the opml element, which opens with version 1.1, and inside of that is a body, and inside of that is an outline, and inside of that is another outline. That final list of outlines has essentially one line per channel that you subscribe to, and that channel that you subscribe to has an RSS feed. Within that element itself, just the element tag itself, outline, it has one, two, three, four different attributes. The text and the title seem to be the same; they just seem to be the name. The type is "rss", so that's always there. And the magic bit that we're looking for is the xmlUrl, so that is the URL to this particular channel's RSS feed, and the first part is always the same, youtube.com/feeds/videos.xml?channel_id=, and then each of the channels has a unique identifier.
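As a rough illustration (the channel id and names here are placeholders), one of those inner outline elements looks something like this:

    <outline text="Some Channel" title="Some Channel" type="rss"
        xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCxxxxxxxxxxxxxxxxxxxxxx"/>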
So what we're going to be doing in our script is taking that file and, using xmlstarlet, we're going to be pulling out the xmlUrl and the title. We're just taking the title so that we can print something nice to the screen, so that we know what's going on.
If I take the first one of those and I actually copy and paste it into a browser, I get basically an RSS feed, which is an Atom feed to be honest, so it's not, as it says, an RSS feed; it gives me the Atom feed of this particular channel. In that feed there's an entry for each and every one of the videos, with the published date, the upload date, lots of cool information about each video, and about this channel, for this video content producer. At this point the only thing we're interested in is the URL, which is stored in the feed element, in the entry element, in the media group element, in the media content element, where it has an attribute called url.
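Stripped right down (real feeds carry a lot more, and the id here is a placeholder), the shape of that Atom feed is roughly:

    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:media="http://search.yahoo.com/mrss/">
      <entry>
        <media:group>
          <media:content url="https://www.youtube.com/v/VIDEO_ID?version=3"/>
        </media:group>
      </entry>
    </feed>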
So in order to get that, we first need to parse the OPML file, so let's start doing that. xmlstarlet: sel for select mode, -T (capital T) for "do this as text", -t to say "here comes the template", -m to say match opml, forward slash body, forward slash outline, forward slash outline, which we described before. Okay, so now we're on the outline elements themselves, and we're going to produce a list of space-delimited things. We're going to produce two of them using the concat function (the concat function of XSLT, actually, that is provided by xmlstarlet), and we're looking for the attribute that's at that location (outline, outline, body, blah blah), the xmlUrl, and then we're going to put a space, and then we're going to put the title of the channel. We're doing -n, so there's going to be a new line between each, which is great, because we're piping it into a loop, and we're using our subscriptions OPML file as the input.
So that's fantastic. That gives us a list: in the case of my subscription_manager file, it would produce https://www.youtube.com/feeds/videos.xml?channel_id= blah blah blah, space, Wintergatan, and then the next one is blah blah blah, space, Primitive Technology.
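Pieced together, that OPML-parsing command might look roughly like this sketch (the file name is just an example):

    xmlstarlet sel -T -t -m "/opml/body/outline/outline" \
        -v "concat(@xmlUrl,' ',@title)" -n subscription_manager.opml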
The cool thing here is that I'm piping it into "while read; do", and instead of the normal "while read i", you can actually put two variables. The first one is subscription and the second one is title, so the first word will always be the URL, and everything else gets dumped into title, and that allows me to echo out "Getting" and the title, so I know I'm going to be getting "Getting Wintergatan", "Getting Primitive Technology", "Getting John Ward", etc., etc., etc. Then I just do a wget, because now I'm dealing with an XML file, so all I need to do is wget that RSS feed. I use wget -q and subscription, which is the variable name for the URL, and -O (capital O), which specifies the output location, with a dash, which means send it to standard out; that's fairly common in a lot of Unix tools.
Then I pipe that into, you guessed it, xmlstarlet again, and I use the select command: -T (capital T, because, yes, you've been paying attention) for text, lowercase -t for template, -m for match, and then underscore colon, which is a really cool thing that xmlstarlet does, because XML itself has this horrible thing with namespaces. Namespaces are, say there are two Daves on the channel, one was called Dave, the other was called Dave Morris; the namespace distinguishes one from the other. What the underscore colon does is say "use the default namespace, don't bug me about that", and then I can specify the other one. So it's underscore colon feed, forward slash underscore colon entry, forward slash media colon group, because that's a separate media namespace, and then under that element, media colon content. Don't worry too much about it, just think of it like Unix directory paths; that's what XPath kind of does. And in there it says you're going to find an attribute called url, and that's the one I want you to print out, and print it out you will, because I've told you to do it, using the -v command for value, and then the -n for a new line, and then a dash to say the input is coming from standard input.
Then we send it to awk, and we use -F (capital F) with awk to specify a delimiter outside of awk, so going into awk we will already have specified what the delimiter is, and it's a question mark. The reason for that is that the media content URLs give you versions, version=3, version=1, and if somebody's re-uploaded it, personally I don't care, I will always get the latest one, so that's what I want. So I just print back $1, and that gives me a clean URL.
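Stitching those pieces together, the feed-fetching loop looks roughly like this sketch (fed by the xmlstarlet command shown earlier; the variable names are illustrative):

    while read -r subscription title; do
        echo "Getting ${title}"
        wget -q "${subscription}" -O - | \
            xmlstarlet sel -T -t \
                -m "_:feed/_:entry/media:group/media:content" \
                -v "@url" -n - | \
            awk -F '?' '{print $1}'
    done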
So basically what we've done is: we've gone to each channel in the OPML file, we've opened up the RSS feed, and we've stripped out all the YouTube URLs that are now current, so all the possible ones that we can get. We now have them in a list, the "get list" log file, and what we're going to do then is loop through that list and do some cool things. First of all we're going to have a count: we sort the list, get the unique entries, and do a word count of it, so we have a count and a total, and we keep track of the count and say "downloading count of total", so that we can update a progress line at the bottom of the screen showing how many we've processed as we go, which is kind of cool.
Then the first thing we do is check to see if this video is stored in our log file. Similar to the way Bashpodder does it, any time it downloads a media file it keeps track of that in the log file, so we check the log file: have you already downloaded this? If it's in there, it skips it. What you'll find is that youtube-dl also maintains a list of what it's downloaded and won't download it again; you can override that as well if you wish, but I want to do it belt-and-braces style. So, in order to make my life easier and nicer, there are a few things that I want to be able to do. For a start, I sometimes get links to live events or long events that have 24 or 48 hours on them, and I want to be able to limit the maximum length of the video that I download, so I have a variable for the max length. Then I have another thing called, excuse my French, "skip crap", which is a string that contains an egrep, or grep, regular expression, where I put in a list of stuff that I don't want to download, for example "fail of the week", "kids react to" live stuff, "best pets", "bloopers", or "kids try", all that stuff that is, you know, junk. Anything that's junk you put in there. I don't use it in this script, I use it in another one, which we can talk about later.
So, several things that I'm able to get. One thing that youtube-dl does is it allows you to give it a URL and, using the --dump-json option, you will get a complete JSON file of all the metadata associated with that video file: all the formats that are available, the upload time, more information than you can shake a stick at. An absolutely excellent tool. Then I use the JSON equivalent of xmlstarlet, which is jq, and that allows me to strip out, for example, .uploader will give you the uploader, .title will give you the title, .upload_date will give you the upload date, .id will give you the id, .duration will give you the duration. So this makes it really, really easy to work with these video files; you have all the metadata you need to produce nice clean messages.
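For example, grabbing a few of those fields might look roughly like this sketch (the variable names are illustrative):

    metadata="$(youtube-dl --dump-json "${url}")"
    uploader="$(echo "${metadata}" | jq -r '.uploader')"
    title="$(echo "${metadata}" | jq -r '.title')"
    duration="$(echo "${metadata}" | jq '.duration')"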
So the first thing I do is check to see if the duration is sane, because sometimes it's zero, and if the duration is strange I skip over that one. Otherwise I look to see if the duration is greater than the max length that I set before, and if it is, it skips that one and says "you told me not to download this one" (sometimes I'll keep an eye on those and see "oh, but that one I actually do want"). The next one is running the grep filter, to skip over any of the stuff that I don't want to download. Then, finally, it prints off the video, telling you who uploaded it, what the title is and what the URL is, and then pipes that into a to-do text file.
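Inside the per-video loop, those checks could be sketched roughly like this (the variable names and skip pattern are only examples):

    if [ -z "${duration}" ] || [ "${duration}" = "null" ] || [ "${duration}" -eq 0 ]; then
        continue                                    # no sane duration, skip it
    elif [ "${duration}" -gt "${maxlength}" ]; then
        echo "Skipping ${title}: longer than ${maxlength} seconds"
    elif echo "${title}" | grep -qiE "${skip_crap}"; then
        echo "Skipping ${title}: matches the skip list"
    else
        echo "${uploader}: ${title} (${url})"
        echo "${url}" >> todo.txt
    fi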
Another thing that I do then is save the metadata description from the JSON, which is what you see on the YouTube channel when you go to it, click "more info", and see all the information. Quite a lot of the videos that I download have links to the components that they use in them, or how-tos, or links to the GitHub repositories or whatever, and all of that is available right there. The only thing is it has escape codes like \n and \t for newlines and tabs, so what I do is I run that in a dollar-bracket thing (Dave, sorry, can't remember what that's called; command substitution, is it?): echo -e, then dollar, open bracket, echo the metadata into jq with .description, and then I put that into a video id dot txt file. So if ever I need to go back and have a look at what links somebody had in the video, I can get it right there. That's really, really useful, and that I keep even if I delete the video itself.
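That description-saving step might be sketched like this (the id-based file name is just one way of doing it):

    # Ken describes expanding the \n and \t escapes with echo -e and command
    # substitution; jq -r '.description' would also un-escape them directly.
    id="$(echo "${metadata}" | jq -r '.id')"
    echo -e "$(echo "${metadata}" | jq '.description')" > "${id}.txt"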
So that is the loop. If I've found a video it gets added to the to-do list, and if I haven't found a video it says "processing count of total". It uses echo -n -e, so it doesn't print a new line and it does use escape codes, and one cool one there is \r, which means when you print it off it goes back to the start of the line, so it looks like only the number (processing 1, 2, 3, 4, 5, 6, 7 of total) is being updated, which is kind of cool. Then, at the end of the loop, I increment the count using count equals dollar, open bracket, open bracket, count plus 1, close bracket, close bracket. Thank you again, Dave.
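That progress line is the classic carriage-return trick; roughly:

    echo -n -e "Processing ${count} of ${total}\r"
    count=$(( count + 1 ))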
So now I have a list, and I check to see if there is anything in it, because sometimes there isn't: if I run it twice, one after the other, there have been no updated files. But if there is, then I cat the to-do file into youtube-dl, and this time I'm using --batch-file with a dash, saying that it's going to be a batch file and it's coming from standard input, and then the old classics: ignore errors, no mtime, restrict file names, format mp4, and -o for the save path, then a subdirectory for the uploader, and inside that subdirectory the upload date, a dash, then the title, then my delimiter, then the id, and then the extension. Once that's downloaded, I cat that entire to-do file onto the log file, so that when I run this again all the videos that I have downloaded are already in the log file, and then there's just a little bit of cleanup.
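The final batch download might be sketched like this (the output template is an illustration of the layout described, not necessarily the exact one in the script):

    cat todo.txt | youtube-dl --batch-file - \
        --ignore-errors --no-mtime --restrict-filenames --format mp4 \
        -o "${savepath}/%(uploader)s/%(upload_date)s-%(title)s-%(id)s.%(ext)s"
    cat todo.txt >> "${logfile}"        # remember what has already been fetched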
I hope you didn't find that too boring, because it's actually quite a nice script, this one. Sometimes you have scripts that just don't go anywhere, and we'll be talking about that in the next one, and you have scripts that improve over time, and this one has improved over time, I think. So I welcome your feedback, constructive of course, and as always tune in tomorrow for another exciting episode of Hacker Public Radio!
You've been listening to Hacker Public Radio at Hacker Public Radio dot org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on the website,
or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.