Episode: 2544
Title: HPR2544: How I prepared episode 2493: YouTube Subscriptions - update
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2544/hpr2544.mp3
Transcribed: 2025-10-19 05:13:39

---

This is HPR episode 2,544 entitled "How I prepared episode 2493: YouTube Subscriptions - update".
It is hosted by Dave Morris, is about 33 minutes long, and carries an explicit flag.
The summary is: In show 2,493 I listed some of my YouTube subscriptions. Here's how.
This episode of HPR is brought to you by AnHonestHost.com.
Get a 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello everybody, it's Dave Morris. Got a weird show for you this time.
I hesitated over whether I should do this, to be honest, but I thought it might be of interest to somebody.
So what it is is that in show 2,493 I listed some of the new YouTube channels I've added to my subscription list, and because I'm very, very lazy and I'm always looking for shortcuts, I used programming techniques, or data manipulation techniques, to prepare the notes.
So I cut and pasted stuff from the YouTube pages for some of the text.
I couldn't find a simple way to automate that, but the basic list of YouTube channels was generated programmatically.
So I thought it was worth making a show about how I did this.
So I hope that's enough information for you.
So if this is going to be something incredibly boring to you, you can switch off now.
But I hope you might find it quite interesting, because there's a bunch of different techniques that are being used here to achieve what I wanted to achieve.
And it's part of the process of data manipulation, which is what I've done for pretty much all of my working life. Somebody gives you data in a weird, weird format and you have to turn it into something more usable. And you've heard lots of other people talk about this. Josh from AnHonestHost was talking about the whole business of being lazy and coming up with scripted ways of achieving things of this nature on the New Year's Eve show.
So in order to do what I wanted to do, I needed the YouTube subscription list, my YouTube subscription list.
And I'll explain how I got that in a moment. I needed the xmlstarlet tool. This is something that Ken has mentioned on many occasions. He uses it regularly, I think. I'd not really got very deeply into it until I did this project. It's not very deep yet, but I've certainly learnt some stuff about it. Then the third component was a package called the Template Toolkit, which I'll enlarge upon in a moment. And I used that to generate the Markdown that I use for my show notes. And then I used Pandoc to convert the Markdown into HTML.
I won't talk about Pandoc in this episode, but I'll talk about the three other steps.
So first off, if you are a YouTube user and you want to get your subscription list, then there's one technique (maybe there are other techniques; you could scrape the page and stuff): I discovered that there's a thing called the Subscription Manager, which should be available to you as a YouTube user. And I've given the link to it and so forth in the notes. You select the Manage Subscriptions tab, and at the bottom of the page is an export option which, when you click it, generates OPML. This is by default written to a file called subscription_manager on whatever you're using at that time. So what's OPML?
I've certainly mentioned it before, but I've never gone into much detail, and I don't plan to go into a lot of detail now. It's an XML data format and it's designed to be used by some sort of application, like a podcatcher or something that uses RSS feeds. You can also use it if you're dealing with videos, if you're using some sort of offline video viewer or something.
I thought it would be a convenient format to parse in order to get the basic channel information that I wanted, so the list of channels and stuff. And as I say in the notes, it's possible to do this by scraping the YouTube website, but you'd need to write something very sophisticated, in my terms sophisticated anyway. If you have done this type of thing and you know of a better way to achieve this, then let us know. Send in a comment or do a show about it, perhaps.
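To make the format concrete, here is a minimal Python sketch that walks a small OPML document and lists its element paths and attributes. The sample document is hand-written for illustration and the channel names and IDs are invented; only the nesting (opml, body, outline, outline) and the attribute names (version, text, title, type, xmlUrl) follow what is described in this episode. The function name `element_paths` is likewise just a label chosen here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; channel names and IDs are invented for illustration.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Zed Channel" title="Zed Channel" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCzzz"/>
      <outline text="Alpha Channel" title="Alpha Channel" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCaaa"/>
    </outline>
  </body>
</opml>
"""


def element_paths(xml_text, with_attributes=False):
    """Return the unique element paths in document order.

    With with_attributes=True, attribute paths (path/@name) are listed too.
    """
    root = ET.fromstring(xml_text)
    seen = []

    def add(path):
        if path not in seen:
            seen.append(path)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        add(path)
        if with_attributes:
            for name in node.attrib:
                add(f"{path}/@{name}")
        for child in node:
            walk(child, path)

    walk(root, "")
    return seen


if __name__ == "__main__":
    for path in element_paths(SAMPLE_OPML, with_attributes=True):
        print(path)
```

Run without attributes, this reports four paths ending in opml/body/outline/outline, which is the node that carries the per-channel data.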
Given that I've got the subscription_manager file, I used the xmlstarlet tool to parse it.
It's a command line tool. I run Debian testing and I was able to install it from the repository with a simple apt-get. There are other tools that can be used to do this, but xmlstarlet is a very powerful and quick Swiss Army knife type of tool for doing analysis and parsing of XML. Ken has mentioned that at one time he was going to do a show about this, or even more than one show, because it's quite complex. So I hope you'll do that at some point.
It certainly deserves some description on HPR, I would have thought. It's even worth a short series. I'm just going to mention how I used it to generate a simple comma-separated variable file from the OPML. The first thing I did was rename this file called subscription_manager to the name yt_subs.opml (yt underscore subs dot opml), just so I knew what on earth it was in the future.
Then I discovered how to use xmlstarlet to do an analysis of a bit of XML. XML is a sort of hierarchical tree structure of what I guess you could call objects, entities or something of that sort, and the command I used was xmlstarlet, followed by a sort of sub-command, el (letters E and L in lowercase), then space, hyphen u, and then the name of the file, yt_subs.opml. What that does is simply show you that within the tree structure of the XML there's a top-level opml (it's a bit like a directory structure), then slash body, then slash outline, then slash outline again. It's a fairly simple structure.
You can work out the structure of XML by using various tools which will print it out in a well-formatted way. One of them is called xmllint, which is part of the libxml2-utils package, on Debian anyway, which also requires libxml2, if you're interested in that. I do actually use xmllint from time to time. I should probably use xmlstarlet because I think it can do a similar job, but I've been using xmllint for many years. The problem is that the layout of XML is not usually designed for human readability, so often it's squashed together all in one line or on many, many long lines. xmllint can reformat it, and I demonstrated briefly how you could do this, just showing the first seven lines of what was in my file, but I'm not going to talk about that any more. Now within
the XML. XML consists of objects, if you like, or tags if you prefer, because it's akin to HTML, and the tags are enclosed in less-than and greater-than signs. You'll see in the xmllint output that there's an instance where it just contains body, the word body in those symbols, but there are other cases where you might want to modify the particular object that you're defining, and you can put further sequences of name, equals and then a quoted string, and that type of thing, within it. There are many of these instances in the OPML format, so you can ask xmlstarlet to report back the structure including these things, which are called attributes. You'll see all of them if you use xmlstarlet to do this, so I've just run it with a head command on the end to show the first 11 lines. I chose 11 because, after trial and error, it showed a single sample of what's in there. So the command would be xmlstarlet, space, el as the subcommand, space, hyphen a, space, and then the name of the file, yt_subs.opml, piped into head minus 11.
What that shows is that the opml tag can contain the attribute version. It shows it as opml slash at version. The version is an attribute, and it's used in this particular file just in the first line of the OPML definition, which says that it is a version 1.1 OPML file. You don't really need to know more than that, but there are other things. The deepest branch of the tree, or the furthest branch of the tree, contains a tag which contains the attributes text, title, type and xmlUrl. So with that in mind, it tells you what type of layout the XML contains, and you can then write a much more complex xmlstarlet command
which will pull all of the relevant information out. So I've demonstrated this with an xmlstarlet command which took a little bit of trial and error to work out, reading of the documentation, etc. It's just one long line; it's a pipeline with a bunch of commands in it, and in order to show it in these notes I've split it up into separate lines where each one ends with a backslash. So this would be the actual contents of a file, or indeed you could type this in on the command line; it shows it being typed on the command line. You put a backslash on the end of the line, which means that the command's not finished and it's to continue. It's not the only way to do it, but I thought this would be a way of showing what was going on.
So it gets quite complicated in terms of what I'm doing here, but let's see if I can break it down into some reasonable pieces that are understandable. What we have here is a pipeline, and the first element of the pipeline is a bracketed list of commands, so it's an open parenthesis and then some stuff and then a close parenthesis, and everything that comes out of that parenthesised list is piped into, in this particular case, head hyphen 5. That's just to demonstrate it, and it shows the first five lines that are output by this pipeline.
So going into the parentheses, the first command we see in there is an echo, and what it echoes is simply a string which consists of the words title, comma, feed, comma, seen, comma, skip, all in single quotes, and then a semicolon. What that does is cause that particular string, title, feed, seen, skip, to be output by the pipeline. What we've effectively got here is a bunch of commands. The parentheses here are a Bashism, also available in other shells, which causes all of the commands within the parentheses to be executed and the output to be written as a stream from them all. So this is just the first line that's to be written out. We're making a comma-separated variable file, and the requirement is that the first line be the titles of the columns within the file. So you could use this in a spreadsheet, for example, where you use these as titles in your spreadsheet.
So after the semicolon we then go into xmlstarlet itself, and there's a subcommand that starts this off, which is sel, S E L. That means to select data or to query an XML document; that's what it says in the manual page. So we're asking xmlstarlet to do some specific query of the contents. Then the next thing we see is hyphen t, which defines that there's to be a template used. Hyphen m defines a thing called an XPath expression, which is part of the template. Now an XPath expression is conceptually similar to a path within the file system, so the XPath is actually slash opml slash body slash outline slash outline. We already saw that when analyzing the contents of this file. It's just saying that the deepest node within this tree structure is the thing I just said, so we actually want to pull data out of there. We don't care about intermediate data; we just want this specific path, as if we were looking in a file system path to find specific files at that level. In this case we're going to be getting attributes.
That's then followed by a hyphen s. The hyphen s option is a sort specification. I won't go into details as to what this means, but just briefly, the capital A, colon, capital T, colon, hyphen is the type of sort to do, and the thing to sort by, I suppose you'd say, is the title attribute, so at title is in there. That's how we're going to sort the output.
Then the next thing is the specification of what is to be reported. That is a sub-expression which begins with the option hyphen v, and there's a string containing an expression which will pull particular pieces of data out of the XML. What it says is concat, so it wants to concatenate a bunch of things together, and in parentheses it then says at title, so we want to know the title; that's the title of each channel in the YouTube output. Then comma, then a string containing a comma, a comma, then at xmlUrl, comma, then a string containing comma 0 comma 0, close string, close parentheses, and close the whole enclosing double quotes. What that's saying is just pull the title out and the xmlUrl field out, and then put them together, comma-separated, with a couple of zeros on the end. Then we have hyphen n, which ends each output record with a newline, followed simply by the name of the file that's to be processed.
So that's everything within the parentheses, and what's happening there is xmlstarlet is being told how to go and process this file and what to output, and it's just going to output these two fields with a couple of extra zeros on the end. The output of this parenthesised list, this pipeline I guess, is to be written somewhere. In reality I wrote it to a file called yt_data.csv, and I used a greater-than sign to redirect it, but in this particular case I'm just showing you what it looks like by demonstrating the first five lines that come out of it. So this is a fairly advanced Bashism, which I think I will get into at some stage in my Bash series, but this is a case of actually using it to do some data manipulation. So we're going to have a four-column comma-separated variable file, and it's got to be remembered that the last two columns are all going to be zeros in the file that's generated.
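The pipeline described above can be approximated in Python. This is a sketch of the same steps (write the header line, select the opml/body/outline/outline nodes, sort them by title, and emit title, xmlUrl, 0, 0), not the xmlstarlet command used in the episode. The sample data and the function name `opml_to_csv` are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample data; only the structure follows the episode's description.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Zed Channel" title="Zed Channel" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCzzz"/>
      <outline text="Alpha Channel" title="Alpha Channel" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCaaa"/>
    </outline>
  </body>
</opml>
"""


def opml_to_csv(opml_text):
    """Build the four-column CSV: title, feed, seen, skip."""
    root = ET.fromstring(opml_text)
    # Equivalent of the XPath /opml/body/outline/outline, relative to <opml>.
    channels = root.findall("./body/outline/outline")
    # Ascending sort on the title attribute.
    channels.sort(key=lambda node: node.get("title", ""))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title", "feed", "seen", "skip"])  # header line
    for node in channels:
        # seen and skip start at 0 and are edited by hand afterwards.
        writer.writerow([node.get("title"), node.get("xmlUrl"), 0, 0])
    return out.getvalue()


if __name__ == "__main__":
    print(opml_to_csv(SAMPLE_OPML), end="")
```

One difference from plain string concatenation: the csv module quotes any title that itself contains a comma, so such a title would not spill into an extra column.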
But that's to allow me to fiddle around with it and change these values to control things to do with the file. The column marked seen, that's the third column, is for marking the channels which I have already talked about in an earlier episode about YouTube subscriptions, which was 2,202. I didn't want to talk about them again, so I wanted to mark them as ignore, effectively. The skip column is for channels that I just didn't want to include because I didn't think they were relevant to that particular thing. I've got a lot more channels than I've talked about so far.
That was a very long-winded way of explaining a thing that pulls data out of this OPML file. So the next thing I wanted to do was to generate HTML for the HPR show notes. To do that I used this tool called Template Toolkit. It's a templating system, not too surprisingly, and there are many templating systems for different programming languages and applications. This particular one I've been using for over 15 years, I think, and I used it a lot when I was working. I really find it very usable, and it has tons of features. I actually use it on a regular basis when generating show notes for HPR shows that I do, and I also use it in some of the admin scripts that I've written to do work for HPR.
Template Toolkit is a Perl application, so you need to have Perl installed on your machine, but just about everything does these days, including Raspberry Pis, so it's pretty much a matter of course that you get it. You need to have a version of Perl later than 5.6.0, and my Debian testing box has 5.26.1, so 5.6.0 is pretty old. The toolkit can be installed in the normal Perl way using the Comprehensive Perl Archive Network, CPAN, but you do need to do some preliminary work to set that up. So if you don't want to do that, then there's a method of installing it which is defined on the Template Toolkit site, and I've copied the instructions into the notes. Basically you need to grab a tar file and untar it, you cd into it, you use Perl to run the first stage, then use the make command to build it, and you can use sudo to install it across your system. Template Toolkit is currently at version 2.26, but if you look at the main Template Toolkit site, whatever the version currently happens to be, the instructions will reflect that.
So Template Toolkit is a big subject and I'm not going to go into detail here. I have pencilled in the possibility of doing an episode or two on it in the future, and if it sounds interesting, do let me know if you want me to do it. The principle is that you prepare a template, and in the template are directives which conform to a syntax specific to Template Toolkit; TT is how it's usually referred to. The template is usually called out of a script written in Perl, or indeed Python (there's a Python version of Template Toolkit), and then the template is given data from the script, or it can obtain data itself, and we're going to use that in this particular process. Then it does things to the data and formats it. Template Toolkit directives are enclosed in square bracket percent sequences, so open square bracket percent, then a directive, then percent close square bracket, which separates it from the data. So you'd put that in to represent a piece of data that was to be inserted, or to provide directives such as loops and variables and control statements and so on and so forth. So it's a sort of mini-language
all of its own. Now Template Toolkit can access CSV data. It has a plugin system, so you can enhance the basic toolkit, and there's one called Template::Plugin::Datafile (Template colon colon Plugin colon colon Datafile). It just comes as standard with Template Toolkit, and it allows you to open an arbitrary data file. By default I think the data is expected to be in fields separated by colons, but you can also tell it to separate by commas, and that's what I did here. I could have written the thing with colons rather than commas, but I've told it in my particular case to use commas throughout, just because I felt like it, I guess.
So there's an example of how you would, in your template, define the connection to your data. It consists of, in these square bracket percent sequences, the word USE in uppercase, then some name, equals, then datafile, which is a function, and the first argument to it is the path to the file, which I've just written as file path here. Then if you want to change the delimiter to something else, you put delim, equals, and then a string containing the single character delimiter. So I've defined in general terms the thing here which points at a file with the fields separated by commas. The thing referred to as name in this example is actually a data structure which is collected by Template Toolkit and made available within the template. It's actually a list of hashes. A hash is an associative array, and a list is a non-associative array, so it's an array of arrays, if you like. But you probably don't need to know that in a huge amount of detail, because I'll hopefully be explaining it to you in a moment, in the example of how I've used it.
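For anyone more familiar with Python than Perl, a "list of hashes" has the same shape as a list of dictionaries. The sketch below uses Python's `csv.DictReader` as a stand-in for the Datafile plugin (it is not the plugin itself), reading made-up rows in the title, feed, seen, skip layout described earlier. The function name `load_rows` is chosen here purely for illustration.

```python
import csv
import io

# Made-up rows in the title,feed,seen,skip layout described earlier.
SAMPLE_CSV = """title,feed,seen,skip
Alpha Channel,https://www.youtube.com/feeds/videos.xml?channel_id=UCaaa,0,0
Zed Channel,https://www.youtube.com/feeds/videos.xml?channel_id=UCzzz,1,0
"""


def load_rows(csv_text, delimiter=","):
    """Return the rows as a list of dicts keyed by the header line."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [dict(row) for row in reader]


if __name__ == "__main__":
    for chan in load_rows(SAMPLE_CSV):
        print(chan["title"], chan["seen"], chan["skip"])
```

Each row becomes one dictionary whose keys come from the header line, so a field is looked up by name rather than by column position. Note that in this sketch every value, including the seen and skip flags, is read as a string.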
So I've got the actual template that I used to do this sort of stuff, and it's got a USE directive in it, where I created a name, ytlist (YouTube list), and then set that to the output from the datafile function, where I pointed it at a file called yt_data.csv, the one I mentioned earlier that was created by xmlstarlet. The delimiter is comma. Then in my template the next line just consists of hyphen, space, YouTube channels, colon. That's a piece of text that's to be output by the template, and it's a piece of Markdown syntax; it's the way you specify a list element. The next directive is a FOREACH. It's a loop, and it's FOREACH, then a variable name, IN, and then some data structure. So I've got FOREACH chan IN ytlist. ytlist is a list of this data structure I mentioned, so it's a list of channels basically, and each channel contains bits of data about the channel. So I'm setting a variable chan to point at each one in turn.
Then the next statement is a NEXT statement. N E X T is the verb in the command language which means skip to the next iteration of the loop, and it's to skip if the seen element of the chan variable or the skip element of the chan variable are set to true, that is, the value one. So in other words, if I have set either of these fields to one, then it's not going to be included in the output.
The next line is a piece of text, effectively, with embedded bits of Template Toolkit stuff. It begins with an indentation. The indentation is important because it's needed by Markdown. The indentation is followed by a hyphen and a space, then an open square bracket, then an asterisk, and after the asterisk is an open square bracket percent, then chan dot title, percent close square bracket. So that's a substitution of the value of the title of the particular channel, with an asterisk either side of it, and there's a close square bracket, so there are square brackets around it. There's an example a bit lower down in the notes. Then we do something very similar, enclosing in parentheses another Template Toolkit expression in square bracket percent, and in this case it's chan dot feed. Feed is the URL of the feed. In the OPML the URLs are RSS URLs, not the channel (I am confusing channel and feed). It's not the channel that you'd load into the search bar in your browser; it's a feed for giving to an RSS feed reader. But the difference between the two is tiny, so the expression chan dot feed dot replace causes a substitution to be done on that string, and the original one is changed to a new one which references the channel, so you get out a channel pointer. I think you'll probably see that later on without me trying to explain it.
Then the last piece is an END statement for Template Toolkit, again enclosed in open square bracket percent and then percent close square bracket. So that's the end of the loop, and that's it. There are six lines here, and that's all you need in the template.
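The logic of the six-line template walked through above can be mirrored in Python. This is a sketch of the steps only (heading line, loop, skip rows flagged seen or skip, emit an indented Markdown link with the title in italics); it is not Template Toolkit. The exact replacement is not spelled out in the audio, so the feed-to-channel rewrite below is an assumption based on the usual shape of YouTube feed and channel URLs. The four-space indent, the function names and the rows are likewise invented for illustration.

```python
# Invented rows, shaped like the list of hashes read from yt_data.csv.
ROWS = [
    {"title": "Alpha Channel",
     "feed": "https://www.youtube.com/feeds/videos.xml?channel_id=UCaaa",
     "seen": "0", "skip": "0"},
    {"title": "Old Channel",
     "feed": "https://www.youtube.com/feeds/videos.xml?channel_id=UCooo",
     "seen": "1", "skip": "0"},
    {"title": "Zed Channel",
     "feed": "https://www.youtube.com/feeds/videos.xml?channel_id=UCzzz",
     "seen": "0", "skip": "0"},
]


def feed_to_channel(feed_url):
    """Assumed rewrite: turn the RSS feed URL into the channel page URL."""
    return feed_url.replace("feeds/videos.xml?channel_id=", "channel/")


def render(rows):
    """Return Markdown: a top-level item with one indented link per channel."""
    lines = ["- YouTube channels:"]
    for chan in rows:
        # Like NEXT IF: skip rows flagged as seen or skip.
        if chan["seen"] == "1" or chan["skip"] == "1":
            continue
        url = feed_to_channel(chan["feed"])
        # Assumed four-space indent so Markdown treats this as a sub-list item.
        lines.append(f"    - [*{chan['title']}*]({url})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(ROWS), end="")
```

The row flagged as seen is dropped, which is the point of keeping the two hand-edited columns in the CSV rather than deleting rows from it.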
So to run it, you don't need to have a program at all; you can use a command that comes with Template Toolkit, which is tpage, T P A G E. What that does is simply run a template. You give it as an argument the name of a template and it will run Template Toolkit on it. Because the template says what file it's to process, that's all you need. In this particular example I am piping the output into the head command, where I'm using dash 5 to get the first five lines. So you'll see that what you get is, in column one, a hyphen, then space, YouTube channels; that was the bit of text that the template outputs. Then the loop starts, and it starts to print out indented hyphen things which are actually Markdown links. A Markdown link consists of a bit of text in square brackets followed immediately by a URL in round brackets, in parentheses. I've used asterisks inside the square brackets because that produces an italicized string. So this is Markdown magic, which is not really very magic, but there you go. So you give that to Pandoc, and the next example shows the tpage output being piped directly into Pandoc, and then I show the first five lines of that. You see it's HTML, where it's setting up a list and then a sub-list within it, which is triggered by the indented list specifications. So that was what was used in show 2493, and there's a link in these notes that takes you to the place where it's actually used. So as I got to this point in
writing these notes, I was thinking, wow, I've probably lost 90% of the audience here, and anybody who's left is probably saying, why on earth did you do this? This is entirely overkill. I'm sure Ken is. But it's just the way my mind works. It's that thing that Josh was saying: you tend to come up with programming solutions to avoid the boredom of actually cutting and pasting a whole bunch of things out of a web page or something of that sort. It made a tedious process a little bit more interesting. I know Josh mentioned this, but it's also something that I've heard said in the community of programming and people managing computers and that type of thing for many, many years: that there's a tendency to come up with solutions so that you don't have to do boring things, and if you do have to do boring things, you only do them once, and thereafter you've built something to short-circuit it. It's just a piece of psychology, I guess, that goes along with the territory.
What I have here then is a bit of scripting which I can use again if I ever want to do another episode on YouTube subscriptions. I probably won't, but if I ever did want to say, oh, I found this cool one, and this one you might like, and stuff, then I can easily go through the same process again and generate such a list and talk about it a lot more straightforwardly than doing it the long, hard way. So what I've done is not necessarily wasted effort, and along the way I learned how the hell you get stuff out of YouTube, which just seems to be very reluctant to release information about what it is that you're subscribed to. I also learned how to use xmlstarlet. I hope I might have passed on a bit of interest and a recipe for doing strange things with xmlstarlet. And I also learned some new things about Template Toolkit. Even though I use it quite a lot already, I found out things I didn't know at all; I'd never used it to process a CSV file. And of course there's a Hacker Public Radio episode at the end of it. You might not agree, but I think this is a cool process. So if you made it through to the end of this, congratulations, and thank you for listening. OK, bye now.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.