Episode: 2544
Title: HPR2544: How I prepared episode 2493: YouTube Subscriptions - update
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2544/hpr2544.mp3
Transcribed: 2025-10-19 05:13:39

---

This is HPR episode 2,544 entitled How I Prepared Episode 2,493: YouTube Subscriptions - Update. It is hosted by Dave Morris, is about 33 minutes long, and carries an explicit flag. The summary is: In show 2,493 I listed some of my YouTube subscriptions. Here's how.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15, that's HPR15. Better web hosting that's honest and fair, at AnHonestHost.com.

Hello everybody, it's Dave Morris. Got a weird show for you this time. I hesitated over whether I should do this, to be honest, but I thought it might be of interest to somebody. What it is, is that in show 2,493 I listed some of the new YouTube channels I've added to my subscription list, and because I'm very, very lazy and always looking for shortcuts, I used programming techniques, or data manipulation techniques, to prepare the notes. I cut and pasted stuff from the YouTube pages for some of the text. I couldn't find a simple way to automate that, but the basic list of YouTube channels was generated programmatically. So I thought it was worth making a show about how I did this.

I hope that's enough information for you. If this is going to be incredibly boring to you, you can switch off now. But I hope you might find it quite interesting, because there's a bunch of different techniques being used here to achieve what I wanted to achieve. It's part of the process of data manipulation, which is what I've done for pretty much all of my working life. Somebody gives you data in a weird format and you have to turn it into something more usable. And you've heard lots of other people talk about this.
Josh from AnHonestHost was talking about the whole business of being lazy, and coming up with scripted ways of achieving things of this nature, on the New Year's Eve show.

In order to do what I wanted to do, I needed my YouTube subscription list, and I'll explain how I got that in a moment. I needed the xmlstarlet tool. This is something that Ken has mentioned on many occasions. He uses it regularly, I think. I'd not really got very deeply into it until I did this project. It's not very deep yet, but I've certainly learnt some stuff about it. The third component was a package called the Template Toolkit, which I'll enlarge upon in a moment, and I used that to generate the Markdown that I use for my show notes. And then I used Pandoc to convert the Markdown into HTML. I won't talk about Pandoc in this episode, but I'll talk about the three other steps.

So first off, if you are a YouTube user and you want to get your subscription list, here is one technique. Maybe there are other techniques, in that you could scrape the page and so on, but I discovered that there's a thing called the subscription manager, which should be available to you as a YouTube user. I've given the link to it in the notes. You select the Manage Subscriptions tab, and at the bottom of the page is an export option which, when you click it, generates OPML. By default this is written to a file called subscription_manager on whatever machine you're using at that time.

So what's OPML? I've certainly mentioned it before, but I've never gone into much detail, and I don't plan to go into a lot of detail now. It's an XML data format and it's designed to be used by some sort of application, like a podcatcher or something else that uses RSS feeds. You can also use it if you're dealing with videos, for example with some sort of offline video viewer.
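To make the OPML idea a little more concrete, here is a minimal Python sketch, using only the standard library, that reads a tiny subscription list and pulls out each channel's title and feed URL. The sample document is hypothetical: the channel names and IDs are invented, and the nesting of outline elements and the attribute names follow the structure described later in the episode. It is an illustration of the format, not the method used for the show.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: channel names and IDs are invented for illustration.
SAMPLE_OPML = """\
<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Example Channel B" title="Example Channel B" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexampleB"/>
      <outline text="Example Channel A" title="Example Channel A" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexampleA"/>
    </outline>
  </body>
</opml>
"""


def list_feeds(opml_text):
    """Return (title, feed URL) pairs for each subscribed channel."""
    root = ET.fromstring(opml_text)  # root is the <opml> element
    return [(o.get("title"), o.get("xmlUrl"))
            for o in root.findall("./body/outline/outline")]


for title, url in list_feeds(SAMPLE_OPML):
    print(title, url)
```

Each inner outline element is one subscription, and everything of interest is carried in its attributes rather than in nested text.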
I thought it would be a convenient format to parse in order to get the basic channel information that I wanted, so the list of channels and so on. As I say in the notes, it's possible to do this by scraping the YouTube website, but you'd need to write something very sophisticated, sophisticated in my terms anyway. If you have done this type of thing and you know of a better way to achieve this, then let us know. Send in a comment or do a show about it, perhaps.

Given that I'd got the subscription_manager file, I used the xmlstarlet tool to parse it. It's a command-line tool. I run Debian testing and I was able to install it from the repository with a simple apt-get. There are other tools that can be used to do this, but xmlstarlet is a very powerful and quick Swiss Army knife type of tool for doing analysis and parsing of XML. Ken has mentioned that at one time he was going to do a show about this, or even more than one show, because it's quite complex. So I hope he'll do that at some point. It certainly deserves some description on HPR, I would have thought. It's even worth a short series. I'm just going to mention how I used it to generate a simple comma-separated value (CSV) file from the OPML.

The first thing I did was rename this file called subscription_manager to yt_subs.opml, just so I knew what on earth it was in the future. Then I discovered how to use xmlstarlet to do an analysis of a bit of XML. XML is a sort of hierarchical tree structure of what I guess you could call objects, or entities, or something of that sort. The command I used was xmlstarlet, followed by a sort of sub-command, el (the letters E and L in lowercase), then a space, hyphen u, and then the name of the file, yt_subs.opml. That is: xmlstarlet el -u yt_subs.opml
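The kind of structural summary that command is described as producing, one line per distinct element path, can be sketched in Python. This is a standard-library analogue written for illustration, not xmlstarlet itself; the sample OPML and the `element_paths` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the channel name and ID are invented.
SAMPLE_OPML = """\
<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Example Channel" title="Example Channel" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexample"/>
    </outline>
  </body>
</opml>
"""


def element_paths(xml_text):
    """Return the sorted, de-duplicated element paths found in the document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, parent_path):
        # Build a directory-like path for this element, then visit its children.
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return sorted(paths)


print("\n".join(element_paths(SAMPLE_OPML)))
```

However many channels the file holds, the set collapses them into the same four paths, which is what makes this view useful for getting your bearings in an unfamiliar XML file.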
What that does is simply show you that within the tree structure of the XML there's a top-level opml (it's a bit like a directory structure), then /body, then /outline, then /outline again. It's a fairly simple structure.

You can work out the structure of XML by using various tools which will print it out in a well-formatted way. One of them is called xmllint, which is part of the libxml2-utils package, on Debian anyway, and which also requires libxml2. I do actually use xmllint from time to time. I should probably use xmlstarlet, because I think it can do a similar job, but I've been using xmllint for many years. The problem is that the layout of XML is not usually designed for human readability, so often it's squashed together all on one line, or on many very long lines. xmllint can reformat it, and in the notes I demonstrated briefly how you could do this, just showing the first seven lines of what was in my file, but I'm not going to talk about that any more.
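As an illustration of what reformatting squashed XML means in practice, here is a small Python sketch that pretty-prints a one-line document with the standard library's `xml.dom.minidom`. It is a stand-in for the idea, not a reproduction of xmllint's behaviour or exact output, and the sample data is invented.

```python
from xml.dom import minidom

# Hypothetical one-line OPML, the way machine-generated XML often arrives.
SQUASHED = (
    '<opml version="1.1"><body>'
    '<outline text="YouTube Subscriptions" title="YouTube Subscriptions">'
    '<outline text="Example Channel" title="Example Channel" type="rss" '
    'xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexample"/>'
    '</outline></body></opml>'
)


def pretty(xml_text):
    """Return the document re-indented, one element per line."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


# Show only the first few lines, much as head would.
for line in pretty(SQUASHED).splitlines()[:7]:
    print(line)
```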
XML consists of objects, if you like, or tags if you prefer, because it's akin to HTML, and the tags are enclosed in less-than and greater-than signs. You'll see in the xmllint output that there's an instance where a tag just contains the word body between the symbols, but there are other cases where you might want to modify the particular object that you're defining, and you can put further sequences of a name, an equals sign and then a quoted string within it. There are many of these instances in the OPML format. These things are called attributes, and you can ask xmlstarlet to report back the structure including them. You'll see all of them if you use xmlstarlet to do this, so I've just run it with a head command on the end to show the first 11 lines. I chose 11 because, after trial and error, it showed a single sample of what's in there. So the command would be xmlstarlet, space, el as the sub-command, space, hyphen a, space, and then the name of the file, yt_subs.opml, piped into head -11. That is: xmlstarlet el -a yt_subs.opml | head -11

What that shows is that the opml tag can contain the attribute version. It shows it as opml/@version. The version is an attribute, and in this particular file it's used just in the first line of the OPML definition, which says that it is a version 1.1 OPML file. You don't really need to know more than that, but there are other things. The deepest branch of the tree, or the furthest branch of the tree, contains a tag which has the attributes text, title, type and xmlUrl. So that tells you what type of layout the XML contains, and you can then write a much more complex xmlstarlet command which will pull all of the relevant information out.

I've demonstrated this with an xmlstarlet command which took a little bit of trial and error to work out, reading of the documentation and so on. It's just one long line. It's a pipeline with a bunch of commands in it, and in order to show it in the notes I've split it up into separate lines where each one ends with a backslash. This could be the actual contents of a file, or indeed you could type it in on the command line, and the notes show it being typed on the command line. You put a backslash on the end of the line, and that means that the command's not finished and is to be continued. It's not the only way to do it, but I thought this would be a way of showing what was going on.

It gets quite complicated in terms of what I'm doing here, but let's see if I can break it down into some reasonable pieces that are understandable. What we have here is a pipeline, and the first element of the pipeline is a bracketed list of commands. So it's an open parenthesis, then some stuff, then a close parenthesis, and everything that comes out of that parenthesised list is piped into, in this particular case, head -5. That's just to demonstrate it, and it shows the first five lines that are output by this pipeline.

Going into the parentheses, the first command we see in there is an echo, and what is echoed is simply a string which consists of the words title, feed, seen and skip separated by commas, that is 'title,feed,seen,skip', all in single quotes, and then a semicolon. What that does is cause that particular string to be output by the pipeline. What we've effectively got here is a bunch of commands. The parentheses are a Bashism, also available in other shells, which causes all of the commands within them to be executed and the output from them all to be written as a single stream. So this is just the first line that's to be written out. We're making a CSV file, and the requirement is that the first line be the titles of the columns within the file. You could use this in a spreadsheet, for example, where you use these as titles in your spreadsheet.

After the semicolon we then go into xmlstarlet itself, and there's a sub-command that starts this off, which is sel, S E L. That means to select data or to query an XML document. That's what it says in the manual page. So we're asking xmlstarlet to do some specific query of the contents. The next thing we see is hyphen t, which defines that there's to be a template used. Hyphen m defines a thing called an XPath expression, which is part of the template. An XPath expression is conceptually similar to a path within the file system, and the XPath here is /opml/body/outline/outline. We already saw that when analysing the contents of this file. It's just saying that the deepest node within this tree structure is the thing I just said, so we want to pull data out of there. We don't care about intermediate data, we just want this specific path, as if we were looking in a file system path to find specific files at that level. In this case we're going to be getting attributes.

That's then followed by a hyphen s option, which is a sort specification. I won't go into details as to what this means, but briefly, A:T:- is the type of sort to do, and the thing to sort by, I suppose you'd say, is the title attribute. So @title is in there, and that's how we're going to sort the output.

The next thing is the specification of what is to be reported. That is a sub-expression which begins with the option hyphen v, and there's a string containing an expression which will pull particular pieces of data out of the XML. What it says is concat, so it wants to concatenate a bunch of things together, and in parentheses it then says @title, so we want the title of each channel in the YouTube output, then a comma, then a string containing a comma, then a comma, then @xmlUrl, then a comma, then a string containing ,0,0, then close the string, close the parentheses and close the whole enclosing double quotes. That is: "concat(@title,',',@xmlUrl,',0,0')". What that's saying is just pull the title out and the xmlUrl field out, and put them together as comma-separated values with a couple of zeros on the end. Then we have hyphen n, which makes xmlstarlet write a newline after each record, and it's followed by the name of the file that's to be processed. That's everything within the parentheses, and what's happening there is that xmlstarlet is being told how to go and process this file and what to output, and it's just going to output these two fields with a couple of extra zeros on the end.

The output of this parenthesised list, or pipeline I guess, is to be written somewhere. In reality I wrote it to a file called yt_data.csv, and I used a greater-than sign to redirect it there, but in this particular case I'm just showing you what it looks like by demonstrating the first five lines that come out of it. This is a fairly advanced Bashism, which I think I will get into at some stage in my Bash series, but this is a case of actually using it to do some data manipulation.

So we're going to have a four-column CSV file, and the last two columns are all going to be zeros in the file that's generated, but that's to allow me to fiddle around with it and change these values to control what happens later. The column marked seen, that's the third column, is for marking the channels which I have already talked about in an earlier episode about YouTube subscriptions. That was 2,202. I didn't want to talk about them again, so I wanted to mark them as ignore, effectively. The skip column is for channels that I just didn't want to include because I didn't think they were relevant to that particular show. I've got a lot more channels than I've talked about so far. That was a very long-winded way of explaining a thing that pulls data out of this OPML file.

The next thing I wanted to do was to generate HTML for the HPR show notes. To do that I used this tool called Template Toolkit. It's a templating system, not too surprisingly, and there are many templating systems for different programming languages and applications. This particular one I've been using for over 15 years, I think. I used it a lot when I was working, and I really find it very usable, and it has tons of features. I actually use it on a regular basis when generating show notes for the HPR shows that I do, and I also use it in some of the admin scripts that I've written to do work for HPR.

Template Toolkit is a Perl application, so you need to have Perl installed on your machine, but just about everything does these days, including Raspberry Pis, so it's pretty much a matter of course that you've got it. You need to have a version of Perl later than 5.6.0, and my Debian testing box has 5.26.1, so 5.6.0 is pretty old. The toolkit can be installed in the normal Perl way using the Comprehensive Perl Archive Network, CPAN, but you do need to do some preliminary work to set that up. If you don't want to do that, then there's a method of installing it which is defined on the Template Toolkit site, and I've copied the instructions into the notes. Basically you need to grab a tar file and untar it, you cd into it, then you use Perl to run the first stage and use the make command to build it, and you can use sudo to install it across your system. Template Toolkit is currently at version 2.26, but if you look at the main Template Toolkit site, whatever the current version happens to be, the instructions will reflect that.

Template Toolkit is a big subject and I'm not going to go into detail here. I have pencilled in the possibility of doing an episode or two on it in the future, and if it sounds interesting, do let me know if you want me to do it. The principle is that you prepare a template, and in the template are directives which conform to a syntax specific to Template Toolkit, or TT as it's usually referred to. The template is usually called out of a script written in Perl, or indeed Python, since there's a Python version of Template Toolkit. The template is then given data from the script, or it can obtain data itself, and we're going to use that in this particular process. Then it does things to the data and formats it.

Template Toolkit directives are enclosed in square bracket percent sequences. So an open square bracket and a percent, then a directive, then a percent and a close square bracket, separate it from the data. You'd put that in to represent a piece of data that was to be inserted, or to provide directives such as loops and variables and control statements and so on and so forth. So it's a sort of mini language all of its own.
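To give a feel for how tag-delimited directives sit inside ordinary text, here is a deliberately tiny Python sketch that substitutes values into `[% name %]` placeholders. It is a toy written for illustration only: it is not Template Toolkit, it handles nothing beyond plain variable substitution, and the `render` function and sample values are hypothetical.

```python
import re

# Matches a tag such as [% title %] and captures the variable name.
TAG = re.compile(r"\[%\s*(\w+)\s*%\]")


def render(template, data):
    """Replace each [% name %] tag with the matching value from data."""
    return TAG.sub(lambda m: str(data[m.group(1)]), template)


print(render("Channel: [% title %] ([% feed %])",
             {"title": "Example Channel", "feed": "https://example.com/feed"}))
```

Everything outside the tags passes through untouched, which is the property that lets a template double as a Markdown document with gaps in it.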
Now, Template Toolkit can access CSV data. It has a plugin system, so you can enhance the basic toolkit, and there's a plugin called Template::Plugin::Datafile. It comes as standard with Template Toolkit and it allows you to open an arbitrary data file. By default, I think, the data is expected to be in fields separated by colons, but you can also tell it to separate by commas, and that's what I did here. I could have written the file with colons rather than commas, but in my particular case I've told it to use commas throughout, just because I felt like it, I guess.

There's an example in the notes of how you would define the connection to your data in your template. It consists of, in these square bracket percent sequences, the word USE in uppercase, then some name, then equals, then datafile, which is a function. The first argument to it is the path to the file, which I've just written as filepath here. Then, if you want to change the delimiter to something else, you put delim equals and then a string containing the single-character delimiter. So I've defined the thing here in general terms, pointing at a file with the fields separated by commas. The thing referred to as name in this example is actually a data structure which is collected by Template Toolkit and made available within the template. It's actually a list of hashes. A hash is an associative array and a list is a non-associative array, so it's an array of arrays if you like, but you probably don't need to know that in a huge amount of detail, because I'll hopefully be explaining in a moment, in the example, how I've used it.

So I've got the actual template that I used to do this, and it's got a USE directive in it where I created a name, ytlist, YouTube list, and set that to the output from the datafile function, which I pointed at a file called yt_data.csv, the one I mentioned earlier that was created by xmlstarlet. The delimiter is a comma.

Then in my template the next line just consists of a hyphen, a space, and then YouTube channels and a colon. That's a piece of text that's to be output by the template as it stands, and it's a piece of Markdown syntax. It's the way you specify a list element.

The next directive is a FOREACH. It's a loop, and it's FOREACH, then a variable name, then IN, and then some data structure. So I've got FOREACH chan IN ytlist. ytlist is a list of this data structure I mentioned, so it's a list of channels basically, and each channel contains bits of data about the channel. So I'm setting a variable chan to point to each one in turn.

The next statement is a NEXT statement. NEXT is the verb in the command language which means skip to the next iteration of the loop, and it's to skip if the seen element of the chan variable or the skip element of the chan variable is set to true, that is, the value one. In other words, if I have set either of these fields to one, then the channel is not going to be included in the output.

The next line is a piece of text, effectively, with embedded bits of Template Toolkit in it. It begins with an indentation, and the indentation is important because it's needed by Markdown. The indentation is followed by a hyphen and a space, then an open square bracket, then an asterisk, and after the asterisk is an open square bracket percent, then chan.title, then percent close square bracket. So that's a substitution of the value of the title of the particular channel, with an asterisk either side of it, and then there's a close square bracket, so there are square brackets around it. There's an example a bit lower down in the notes. Then we do something very similar, enclosing in parentheses another Template Toolkit expression in square bracket percent sequences, and in this case it's chan.feed. feed is the URL of the feed, but in the OPML the URLs are not the channel, they are RSS URLs. I am confusing channel and feed. It's not the channel that you'd load into the address bar in your browser, it's a feed for giving to an RSS reader, but the difference between the two is tiny. So the expression chan.feed.replace causes a substitution to be done on that string, and the original one is changed to a new one which references the channel, so you get out a channel pointer. I think you'll probably see that from later on without me trying to explain it.

Then the last piece is an END statement for Template Toolkit, again enclosed in these open square bracket percent and percent close square bracket sequences, and that's the end of the loop. And that's it. There are six lines here and that's all you need in the template.

To run it you don't need to have a program at all. You can use a command that comes with Template Toolkit, which is tpage, T P A G E, and what that does is simply run a template. You give it as an argument the name of a template and it will run Template Toolkit on it. Because the template says what file it's to process, that's all you need. In this particular example I am piping the output into the head command, where I'm using -5 to get the first five lines. What you get is, in column one, a hyphen, then a space, then YouTube channels. That was the bit of text that the template outputs. Then the loop starts, and it begins to print out indented hyphen items which are actually Markdown links. A Markdown link consists of a bit of text in square brackets followed immediately by a URL in round brackets, in parentheses. I've used asterisks inside these square brackets because that produces an italicised string. So this is Markdown magic, which is not really very magic, but there you go.

You then give that to Pandoc, and the next example shows the tpage output being piped directly into pandoc, and I show the first five lines of that. You see it's HTML where it's setting up a list and then a sub-list within it, which is triggered by the indented list specifications. So that was what was used in show 2493, and there's a link in the notes that takes you to the place where it's actually used.

As I got to this point in writing these notes I was thinking, wow, I've probably lost 90% of the audience here, and anybody who's left is probably saying, why on earth did you do this? This is entirely overkill. I'm sure Ken is. But it's just the way my mind works, that's it. It's that thing that Josh was saying: you tend to come up with programming solutions to avoid the boredom of actually cutting and pasting a whole bunch of things out of a web page or something of that sort. It made a tedious process a little bit more interesting. I know Josh mentioned this, but it's also something that I've heard said in the community of programmers and people managing computers for many, many years: there's a tendency to come up with solutions so that you don't have to do boring things, and if you do have to do boring things, you only do them once, and thereafter you've built something to short-circuit it. It's just a piece of psychology, I guess, that goes along with the territory.

What I have here, then, is a bit of scripting which I can use again if I ever want to do another episode on YouTube subscriptions. I probably won't, but if I ever did want to say, oh, I found this cool one, and this one you might like, and so on, then I can easily go through the same process again, generate such a list, and talk about it a lot more straightforwardly than by doing it the long, hard way. So what I've done is not necessarily wasted effort. Along the way I learned how the hell you get stuff out of YouTube, which just seems to be very reluctant to release information about what it is that you're subscribed to, and I also learned how to use xmlstarlet. I hope I might have passed on a bit of interest, and a recipe for doing strange things with xmlstarlet. I also learned some new things about Template Toolkit. Even though I use it quite a lot already, I found out things I didn't know at all. I'd never used it to process a CSV file. And of course there's a Hacker Public Radio episode at the end of it. You might not agree, but I think this is a cool process. So if you made it through to the end of this, congratulations, and thank you for listening. Okay, bye now.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.
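As a recap of the two data-manipulation steps walked through in the episode (OPML to a four-column CSV, then CSV to an indented Markdown list), here is a minimal Python sketch using only the standard library. It is an analogue for illustration, not the xmlstarlet and Template Toolkit commands used for the show. The sample channels are invented, the four-space indentation is an assumption, and so is the feed-to-channel rewrite (swapping `feeds/videos.xml?channel_id=` for `channel/`), since the episode does not spell out the exact replacement.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: channel names and IDs are invented.
SAMPLE_OPML = """\
<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Example Channel B" title="Example Channel B" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexampleB"/>
      <outline text="Example Channel A" title="Example Channel A" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexampleA"/>
    </outline>
  </body>
</opml>
"""


def opml_to_csv(opml_text):
    """Write a header row, then one title,feed,0,0 row per channel, sorted by title."""
    root = ET.fromstring(opml_text)
    channels = sorted(root.findall("./body/outline/outline"),
                      key=lambda o: o.get("title", ""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title", "feed", "seen", "skip"])
    for o in channels:
        # seen and skip start at 0 and are meant to be edited by hand later.
        writer.writerow([o.get("title"), o.get("xmlUrl"), 0, 0])
    return out.getvalue()


def csv_to_markdown(csv_text):
    """Build the Markdown list, leaving out rows marked seen or skip."""
    lines = ["- YouTube channels:"]
    for chan in csv.DictReader(io.StringIO(csv_text)):
        if chan["seen"] == "1" or chan["skip"] == "1":
            continue
        # Assumed rewrite from the RSS feed URL to the channel page URL.
        url = chan["feed"].replace("feeds/videos.xml?channel_id=", "channel/")
        lines.append(f"    - [*{chan['title']}*]({url})")
    return "\n".join(lines)


print(csv_to_markdown(opml_to_csv(SAMPLE_OPML)))
```

One difference worth noting: the `csv` module quotes a title that itself contains a comma, whereas plain string concatenation of the fields would not, so such a title would need care in a concatenation-based approach.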