<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>How I prepared episode 2493: YouTube Subscriptions - update (HPR Show 2544)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">How I prepared episode 2493: YouTube Subscriptions - update (HPR Show 2544)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#components">Components</a><ul>
<li><a href="#youtube-subscription-list">YouTube subscription list</a></li>
<li><a href="#using-xmlstarlet">Using <code>xmlstarlet</code></a><ul>
<li><a href="#finding-the-structure-of-the-xml">Finding the structure of the XML</a></li>
<li><a href="#extracting-data-from-the-xml">Extracting data from the XML</a></li>
</ul></li>
<li><a href="#generating-html-with-template-toolkit">Generating HTML with Template Toolkit</a><ul>
<li><a href="#installing-template-toolkit">Installing Template Toolkit</a></li>
<li><a href="#making-a-template">Making a template</a></li>
<li><a href="#running-the-template">Running the template</a></li>
</ul></li>
</ul></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="introduction">Introduction</h2>
<p>In show <a href="http://hackerpublicradio.org/eps/hpr2493" title="YouTube Subscriptions - update">2493</a> I listed a number of the YouTube channels I watch. Some of what I did to prepare the notes was to cut and paste information from YouTube pages, but the basic list itself was generated programmatically. I thought the process I used might be of interest to somebody so I am describing it here.</p>
<h2 id="components">Components</h2>
<p>I needed four components to achieve what I wanted:</p>
<ul>
<li><a href="https://www.youtube.com/subscription_manager" title="YouTube Subscription Manager page">YouTube subscription list</a> (only available in OPML format as far as I know)</li>
<li>The <a href="http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html" title="xmlstarlet documentation"><code>xmlstarlet</code></a> tool to parse the OPML</li>
<li><a href="http://www.template-toolkit.org/" title="Template Toolkit">Template Toolkit</a> which I used to generate Markdown</li>
<li>The <code>pandoc</code> document converter tool to generate HTML</li>
</ul>
<p>I will talk a little about the first three components in this episode in order to provide an overview.</p>
<h3 id="youtube-subscription-list">YouTube subscription list</h3>
<p>To find this go to the <a href="https://www.youtube.com/subscription_manager" title="YouTube Subscription Manager page">Subscription Manager</a> page of YouTube (<code>https://www.youtube.com/subscription_manager</code>) and select the Manage Subscriptions tab. At the bottom of the page is an Export option which generates OPML. By default this is written to a file called <code>subscription_manager</code>.</p>
<p>An OPML file is in XML format and is designed to be used by an application that processes RSS feeds, such as a podcatcher or a video manager. For me it is a convenient format to parse in order to extract the basic channel information. I could not find any other way of doing this apart from scraping the YouTube website. If you know better please let me know in a comment or by submitting a show of your own.</p>
<h3 id="using-xmlstarlet">Using <code>xmlstarlet</code></h3>
<p>This is a tool designed to parse XML files from the command line. I run Debian Testing and was able to install it from the repository.</p>
<p>There are other tools that could be used for parsing but <code>xmlstarlet</code> is the <em>Swiss Army knife</em> of such tools for analysing and parsing such data. The tool deserves a show to itself, or even a short series. I know that Ken Fallon (who uses it a lot) has expressed a desire to go into detail about it at some point.</p>
<p>I am just going to describe how I decided to generate a simple CSV file from the OPML and found out how to do so with <code>xmlstarlet</code>.</p>
<h4 id="finding-the-structure-of-the-xml">Finding the structure of the XML</h4>
<p>I copied the <code>subscription_manager</code> file to <code>yt_subs.opml</code> as a more meaningful name.</p>
<p>I ran the following command against this file to find out its structure:</p>
<pre><code>$ xmlstarlet el -u yt_subs.opml
opml
opml/body
opml/body/outline
opml/body/outline/outline</code></pre>
<p>It is possible to work this out by looking at the XML, but it's all squashed together and difficult to read. It can be reformatted as follows:</p>
<pre><code>$ xmllint --format yt_subs.opml | head -7
&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;opml version=&quot;1.1&quot;&gt;
&lt;body&gt;
&lt;outline text=&quot;YouTube Subscriptions&quot; title=&quot;YouTube Subscriptions&quot;&gt;
&lt;outline text=&quot;John Heisz - I Build It&quot; title=&quot;John Heisz - I Build It&quot; type=&quot;rss&quot; xmlUrl=&quot;https://www.youtube.com/feeds/videos.xml?channel_id=UCjA8vRlL1c7BDixQRJ39-LQ&quot;/&gt;
&lt;outline text=&quot;MatterHackers&quot; title=&quot;MatterHackers&quot; type=&quot;rss&quot; xmlUrl=&quot;https://www.youtube.com/feeds/videos.xml?channel_id=UCDk3ScYL7OaeGbOPdDIqIlQ&quot;/&gt;
&lt;outline text=&quot;Alec Steele&quot; title=&quot;Alec Steele&quot; type=&quot;rss&quot; xmlUrl=&quot;https://www.youtube.com/feeds/videos.xml?channel_id=UCWizIdwZdmr43zfxlCktmNw&quot;/&gt;</code></pre>
<p>The program <code>xmllint</code> is part of the <code>libxml2-utils</code> package on Debian, which also requires <code>libxml2</code>.</p>
<p>I think the <code>xmlstarlet</code> output is easier to read and understand.</p>
<p>The XML contains attributes (such as the <code>title</code>) which you can ask <code>xmlstarlet</code> to report on:</p>
<pre><code>$ xmlstarlet el -a yt_subs.opml | head -11
opml
opml/@version
opml/body
opml/body/outline
opml/body/outline/@text
opml/body/outline/@title
opml/body/outline/outline
opml/body/outline/outline/@text
opml/body/outline/outline/@title
opml/body/outline/outline/@type
opml/body/outline/outline/@xmlUrl</code></pre>
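<p>Before reaching for XPath it can be handy to sanity-check the file with ordinary tools. The sketch below uses a hypothetical two-entry OPML fragment in the same shape as <code>yt_subs.opml</code>; because the export puts each subscription on its own line with an <code>xmlUrl</code> attribute, <code>grep -c</code> gives a rough count of the channels:</p>

```shell
# Hypothetical two-entry fragment with the same shape as yt_subs.opml
cat > /tmp/yt_mini.opml <<'EOF'
<?xml version="1.0"?>
<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Alec Steele" title="Alec Steele" type="rss" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCWizIdwZdmr43zfxlCktmNw"/>
      <outline text="MatterHackers" title="MatterHackers" type="rss" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCDk3ScYL7OaeGbOPdDIqIlQ"/>
    </outline>
  </body>
</opml>
EOF

# Each inner <outline> node carries an xmlUrl attribute and sits on one
# line, so a rough channel count needs nothing more than grep
grep -c 'xmlUrl=' /tmp/yt_mini.opml
```

<p>This prints <code>2</code>. It is only a rough check, of course; the real parsing belongs to <code>xmlstarlet</code>.</p>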
<h4 id="extracting-data-from-the-xml">Extracting data from the XML</h4>
<p>So, the <code>xmlstarlet</code> command I came up with (after some trial and error) was as follows. I have broken the long pipeline into lines by adding backslashes and newlines so it's slightly more readable, and in this example I have just shown the first 5 lines it generated. In actuality I wrote the output to a file called <code>yt_data.csv</code>:</p>
<pre><code>$ (echo &#39;title,feed,seen,skip&#39;; \
&gt; xmlstarlet sel -t -m &quot;/opml/body/outline/outline&quot; \
&gt; -s A:T:- @title \
&gt; -v &quot;concat(@title,&#39;,&#39;,@xmlUrl,&#39;,0,0&#39;)&quot; \
&gt; -n yt_subs.opml) | head -5
title,feed,seen,skip
akiyuky,https://www.youtube.com/feeds/videos.xml?channel_id=UCCJJNQIhS15ypcHqDfEPNXg,0,0
Alain Vaillancourt,https://www.youtube.com/feeds/videos.xml?channel_id=UCCsdIja21VT7AKkbVI5y8bQ,0,0
Alec Steele,https://www.youtube.com/feeds/videos.xml?channel_id=UCWizIdwZdmr43zfxlCktmNw,0,0
Alex Eames,https://www.youtube.com/feeds/videos.xml?channel_id=UCEXoiRx_rwsMfcD0KjfiMHA,0,0</code></pre>
<p>Here is a breakdown of what is being done here:</p>
<ol type="1">
<li><p>There is an <code>echo</code> command and the <code>xmlstarlet</code> command enclosed in parentheses. This causes Bash to run them both in a subshell. Within it the <code>echo</code> command generates the column titles for the CSV, as we'll see later. The output of the entire subshell is written as a single stream of lines, so the header and data all go to the same place.</p></li>
<li>The <code>xmlstarlet</code> command takes a sub-command, in this case <code>sel</code>, which causes it to “Select data or query XML document(s)” (quoted from the manual page)
<ul>
<li><code>-t</code> defines a template</li>
<li><code>-m</code> precedes the XPATH expression to match (as part of the template). The XPATH expression here is <code>/opml/body/outline/outline</code> which targets each XML node which contains the attributes we want.</li>
<li><code>-s A:T:- @title</code> defines sorting, where <code>A:T:-</code> requests an ascending (<code>A</code>), textual (<code>T</code>) sort and <code>@title</code> is the XPATH expression to sort by</li>
<li><code>-v expression</code> defines what is to be reported; in this case it's the <code>@title</code> and <code>@xmlUrl</code> attributes, then two zeroes, all separated by commas, thereby making a line of CSV data</li>
<li><code>-n</code> outputs a newline after each match; the final argument, <code>yt_subs.opml</code>, names the XML file to be read</li>
</ul></li>
<li><p>The entire subshell is piped into <code>head -5</code> which returns the first 5 lines. In actual use the output was redirected to a file with <code>&gt; yt_data.csv</code></p></li>
<li><p>The reason for making four columns will become clear later, but in summary it's so that I can mark lines in particular ways. The <code>seen</code> column is for marking the channels I spoke about in an earlier episode (<a href="http://hackerpublicradio.org/eps/hpr2202" title="Makers on YouTube">2202</a>) so I didn't include them again in this one, and the <code>skip</code> column is for channels I didn't want to include for various reasons.</p></li>
</ol>
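<p>The effect of the <code>seen</code> and <code>skip</code> columns can be previewed from the shell before the template stage. The following sketch uses hypothetical rows in the same four-column layout as <code>yt_data.csv</code> and keeps only rows where both flags are 0, which is the same filtering the template performs:</p>

```shell
# Hypothetical sample rows in the same four-column layout as yt_data.csv
cat > /tmp/yt_sample.csv <<'EOF'
title,feed,seen,skip
Alec Steele,https://www.youtube.com/feeds/videos.xml?channel_id=UCWizIdwZdmr43zfxlCktmNw,1,0
MatterHackers,https://www.youtube.com/feeds/videos.xml?channel_id=UCDk3ScYL7OaeGbOPdDIqIlQ,0,0
akiyuky,https://www.youtube.com/feeds/videos.xml?channel_id=UCCJJNQIhS15ypcHqDfEPNXg,0,1
EOF

# Skip the header (NR > 1) and print the title ($1) only when the seen ($3)
# and skip ($4) columns are both 0
awk -F, 'NR > 1 && $3 == 0 && $4 == 0 { print $1 }' /tmp/yt_sample.csv
```

<p>Of the three sample rows only <em>MatterHackers</em> is printed; the other two are excluded by their <code>seen</code> and <code>skip</code> flags.</p>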
<h3 id="generating-html-with-template-toolkit">Generating HTML with Template Toolkit</h3>
<p><em>Template Toolkit</em> is a template system. There are many of these for different programming languages and applications. I have been using this one for over 15 years and am very happy with its features and capabilities.</p>
<p>I currently use it when generating show notes for my HPR contributions, and it's used in many of the scripts I use to perform tasks as an HPR Admin.</p>
<h4 id="installing-template-toolkit">Installing Template Toolkit</h4>
<p>The Template Toolkit (TT) is written in Perl so it's necessary to have Perl installed on the machine it's to be run on. This happens as a matter of course on most Linux and Unix-like operating systems. It is necessary to have a version of Perl later than 5.6.0 (I have 5.26.1 on Debian Testing).</p>
<p>The Toolkit can be installed from the CPAN (Comprehensive Perl Archive Network), but if you do not have your system configured to do this the alternative is shown below (method copied from the <a href="http://www.template-toolkit.org/" title="Template Toolkit">Template Toolkit site</a>):</p>
<pre><code>$ wget http://cpan.org/modules/by-module/Template/Template-Toolkit-2.26.tar.gz
$ tar zxf Template-Toolkit-2.26.tar.gz
$ cd Template-Toolkit-2.26
$ perl Makefile.PL
$ make
$ make test
$ sudo make install</code></pre>
<p>These instructions refer to the version of Template Toolkit current at the time of writing, version 2.26. The site mentioned above will always refer to the latest version.</p>
<h4 id="making-a-template">Making a template</h4>
<p>Using the Template Toolkit is a big subject, and I will not go into great detail here. If there is any interest I will do an episode on it in the future.</p>
<p>The principle is that TT reads a template file containing directives in the TT syntax. Usually TT is called from a script written in Perl (or Python; a Python port of the Toolkit has been released relatively recently). The template can be passed data from the script, but it can also obtain data itself. I used this latter ability to process the CSV file.</p>
<p>TT directives are enclosed in <code>[%</code> and <code>%]</code> sequences. They provide features such as loops, variables, control statements and so forth.</p>
<p>To make TT access the CSV data file I used a plugin that comes with the Template Toolkit package. This plugin is called <a href="http://www.template-toolkit.org/docs/manual/Plugins.html#section_Datafile" title="Template::Plugin::Datafile"><code>Template::Plugin::Datafile</code></a>. It is linked to the required data file with the following directive:</p>
<pre>
[&#37; USE name = datafile('file_path', delim = ',') %]
</pre>
<p>The plugin reads files with fields delimited by colons by default, but in this instance we redefine this to be a comma. The <code>name</code> variable is actually a list of hashes which gives access to the lines of the data.</p>
<p>The following example template shows TT being connected to the file we created earlier, with a loop which iterates through the list of hashes, generating output data.</p>
<pre><code>[% USE ytlist = datafile(&#39;yt_data.csv&#39;, delim = &#39;,&#39;) -%]
- YouTube channels:
[% FOREACH chan IN ytlist -%]
[% NEXT IF chan.seen || chan.skip -%]
- [*[% chan.title %]*]([% chan.feed.replace(&#39;feeds/videos\.xml.channel_id=&#39;, &#39;channel/&#39;) %])
[% END -%]</code></pre>
<p>Note that the TT directives are interleaved with the information we want to write. The line <code>- YouTube channels:</code> is an example of a Markdown list element.</p>
<p>This is followed by a <code>FOREACH</code> loop which iterates through the <code>ytlist</code> list, placing the current line in the hash variable <code>chan</code>. The loop is terminated with an <code>END</code> directive.</p>
<p>The <code>NEXT</code> directive causes the loop to skip a line of data if either the <code>seen</code> or <code>skip</code> column holds the value <em>true</em> (1). These fields are referenced as <code>chan.seen</code> and <code>chan.skip</code> meaning the elements of the hash <code>chan</code>. Before running this template I edited the list and set these values to control what was reported.</p>
<p>The line after <code>NEXT</code> is simply outputting the contents of the hash. It is turning the data into a Markdown sub-list. Because the URL in the OPML file contained the address of a feed, whereas we need a channel address, the <a href="http://www.template-toolkit.org/docs/manual/VMethods.html#method_replace" title="Template Toolkit &#39;replace&#39; virtual method"><code>replace</code></a> function (actually a <em>virtual method</em>) performs the necessary editing.</p>
<p>The expression <code>chan.feed.replace()</code> shows the <code>replace</code> virtual method being applied to the field <code>feed</code> of the <code>chan</code> hash.</p>
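<p>The substitution itself is easy to check outside the template. Here is a shell equivalent of that <code>replace</code> call, using <code>sed</code> on one of the feed URLs from the OPML. (Note that the template's pattern uses <code>.</code> to match the literal <code>?</code> in the URL, since <code>?</code> is a regular-expression metacharacter there; in <code>sed</code>'s basic regular expressions it can be written literally.)</p>

```shell
feed='https://www.youtube.com/feeds/videos.xml?channel_id=UCWizIdwZdmr43zfxlCktmNw'

# Rewrite the feed URL as a channel URL, as the template's replace() does;
# '#' is used as the sed delimiter because the pattern contains slashes
printf '%s\n' "$feed" | sed 's#feeds/videos\.xml?channel_id=#channel/#'
```

<p>This prints <code>https://www.youtube.com/channel/UCWizIdwZdmr43zfxlCktmNw</code>, the form the Markdown links need.</p>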
<h4 id="running-the-template">Running the template</h4>
<p>Running the template is simply a matter of calling the <code>tpage</code> command (part of the Template Toolkit package) on it:</p>
<pre><code>$ tpage yt_template.tpl | head -5
- YouTube channels:
- [*Anne of All Trades*](https://www.youtube.com/channel/UCCkFJmUgzrZdkeHl_qPItsA)
- [*bigclivedotcom*](https://www.youtube.com/channel/UCtM5z2gkrGRuWd0JQMx76qA)
- [*Computerphile*](https://www.youtube.com/channel/UC9-y-6csu5WGm29I7JiwpnA)
- [*David Waelder*](https://www.youtube.com/channel/UCcapFP3gxL1aJiC8RdwxqRA)</code></pre>
<p>The output is Markdown and these lines are links. I only showed the first 5 lines generated. It is actually possible to pipe the output of <code>tpage</code> directly into <code>pandoc</code> to generate HTML as follows:</p>
<pre><code>$ tpage hpr____/yt_template.tpl | pandoc -f markdown -t html5 | head -5
&lt;ul&gt;
&lt;li&gt;YouTube channels:
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/channel/UCCkFJmUgzrZdkeHl_qPItsA&quot;&gt;&lt;em&gt;Anne of All Trades&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/channel/UCtM5z2gkrGRuWd0JQMx76qA&quot;&gt;&lt;em&gt;bigclivedotcom&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;</code></pre>
<p>You can see the result of running this to generate the notes for show 2493 by looking at the <a href="http://hackerpublicradio.org/eps/hpr2493/full_shownotes.html#links" title="Links section of HPR2493 long notes"><em>Links</em> section of the long notes</a> on that show.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I guess I could be accused of overkill here. When creating the notes for show 2493 I actually did more than what I have described here because it made the slightly tedious process of building a list a bit more interesting than it would have been otherwise.</p>
<p>Also, should I ever wish to record another show updating my YouTube subscriptions I can do something similar to what I have done here, so it is not necessarily wasted effort.</p>
<p>Along the way I learnt about getting data out of YouTube and I learnt more about using <code>xmlstarlet</code>. I also learnt some new things about Template Toolkit.</p>
<p>Of course, I also contributed another episode to Hacker Public Radio!</p>
<p>You may not agree, but I think this whole process is cool (even though it might be described as <em>over-engineered</em>).</p>
<h2 id="links">Links</h2>
<ul>
<li>YouTube <a href="https://www.youtube.com/subscription_manager"><em>Subscription Manager</em></a> page</li>
<li>The <a href="http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html"><code>xmlstarlet</code></a> manual (HTML, single page)</li>
<li><a href="http://www.template-toolkit.org/"><em>Template Toolkit</em></a>
<ul>
<li><a href="http://www.template-toolkit.org/docs/manual/Plugins.html#section_Datafile"><code>Template::Plugin::Datafile</code></a></li>
<li><a href="http://www.template-toolkit.org/docs/manual/VMethods.html#method_replace">Template Toolkit <code>replace</code> virtual method</a></li>
</ul></li>
<li>The <a href="https://pandoc.org/"><code>pandoc</code></a> document converter</li>
<li>Previous HPR shows referred to:
<ul>
<li><a href="http://hackerpublicradio.org/eps/hpr2202"><em>hpr2202 :: Makers on YouTube</em></a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2493"><em>hpr2493 :: YouTube Subscriptions - update</em></a></li>
</ul></li>
<li>Resources:
<ul>
<li>Example template file: <a href="hpr2544_yt_template.tpl">yt_template.tpl</a></li>
</ul></li>
</ul>
</article>
</main>
</div>
</body>
</html>
