Episode: 3637
Title: HPR3637: HPR feed to Sqlite
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3637/hpr3637.mp3
Transcribed: 2025-10-25 02:35:21
---
This is Hacker Public Radio Episode 3,637 for Tuesday the 12th of July 2022.
Today's show is entitled, HPR feed to Sqlite.
It is the 10th show of Norrist, and is about 8 minutes long.
It carries a clean flag.
The summary is.
First step in creating a static copy of H.P.R.
So there was recently a mailing list discussion about
someone requesting the source code for the H.P.R. site.
And that discussion sort of turned into how can we, the community,
the H.P.R. community recreate the H.P.R. site?
I thought it would be a good idea to publish the database in its current form,
maybe with a MySQL dump or whatever, but I understand
why that may not be possible.
But I think something like that would be a good
first step in getting all the data out of H.P.R.
so that it can be used to generate a static copy of the H.P.R. site.
One interesting thing that Ken said in one of the mailing lists posts was that he thought
that everything that you needed to recreate the H.P.R. site was already in the H.P.R. feed.
So I thought it would be a good challenge.
You know, while we wait on further discussion on what to do
about how to get the H.P.R. data in a way that can be safely made public.
So we can create sites. I thought it would be a good project,
something fun to work on to take the data that's in the RSS feed and put it in a database.
So I started thinking about what data is in the feed.
And then thinking through the process, if I had the data that's in the feed,
what sort of things could I do with it and could I actually recreate the H.P.R. site,
or at least something that functioned the same as the H.P.R. site?
So I started the project, stuck it up on GitLab.
I'll walk a little bit through about what the project is and how it works.
And I'll have a link to the GitLab page and some instructions about how to use it and how
to run it in the show notes. All that will be in there.
So the data that's in the feed that I'm pulling out and putting into a database
is the explicit tag, the title, the author name, the author email, the link,
which is actually a link to the H.P.R. page about the episode.
The description and the summary, as best I can tell, are the same field.
I'm pulling them both, but I think they're the same thing.
And that field contains what ends up in show notes, those are in the description fields.
I'll also pull the publication date and the enclosures, pull those directly from
the RSS feed. The enclosures are where the link to the media download is, whether it's the
ogg or the mp3 file, that's in the enclosure tag. And then the other thing I do, it's not
explicitly in the feed, but it's useful to have a title or an episode ID. So for example,
H.P.R., episode number, whatever. So H.P.R., 2341 or H.P.R., whatever.
I extract that from the title and then insert that into the database as the episode ID.
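As a rough sketch of that extraction step (not the code from the GitLab project): assuming titles look like "HPR3637: HPR feed to Sqlite", a regular expression can pull the ID off the front. The function name `episode_id` is hypothetical.

```python
import re


def episode_id(title):
    """Return the leading episode ID (e.g. 'HPR3637') from a feed title.

    Assumes the title starts with 'HPR' followed by digits; returns None
    when the title does not match that pattern.
    """
    match = re.match(r"\s*(HPR\d+)", title, re.IGNORECASE)
    return match.group(1).upper() if match else None


print(episode_id("HPR3637: HPR feed to Sqlite"))
```

Returning None for titles that do not match lets the caller decide whether to skip the entry or store it without an ID.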
So sort of using the power of pre-existing Python libraries. I didn't have to write a whole
lot of code to extract the data and put it into a database. I used notably two Python libraries,
the first one is called feedparser. And I've used that a bunch before; it's a real
easy way to take an RSS feed and treat it kind of like a database. And then for the database,
I used a Python library called peewee. It's one I've used before and it's one I'm familiar with.
This project is simple enough, you probably could have just done it with kind of raw SQL commands.
But just because I know how to use it, I used the peewee ORM Python library.
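For comparison, here is a minimal sketch of the "raw SQL" route mentioned above, using only the Python standard library (`xml.etree.ElementTree` and `sqlite3`) rather than the two libraries used in the project. The sample feed, the table name `episode`, and the column names are all assumptions for illustration; the real HPR feed carries more fields than this.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical one-item feed; the real HPR feed layout may differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>HPR3637: HPR feed to Sqlite</title>
      <link>https://example.org/eps/hpr3637</link>
      <description>Show notes go here.</description>
      <pubDate>Tue, 12 Jul 2022 00:00:00 +0000</pubDate>
      <enclosure url="https://example.org/hpr3637.mp3"
                 type="audio/mpeg" length="0"/>
    </item>
  </channel>
</rss>"""


def load_feed(xml_text, conn):
    """Insert one row per <item> in the feed into an 'episode' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episode ("
        "title TEXT, link TEXT, description TEXT, "
        "published TEXT, enclosure TEXT)"
    )
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        conn.execute(
            "INSERT INTO episode VALUES (?, ?, ?, ?, ?)",
            (
                item.findtext("title"),
                item.findtext("link"),
                item.findtext("description"),
                item.findtext("pubDate"),
                # The media URL lives in the enclosure's url attribute.
                enclosure.get("url") if enclosure is not None else None,
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to keep the result
load_feed(SAMPLE_FEED, conn)
rows = conn.execute("SELECT title, enclosure FROM episode").fetchall()
print(rows)
```

An ORM saves writing the CREATE and INSERT statements by hand, which is the convenience being described; the raw version just shows how little is needed for a feed this simple.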
The process for turning the full feed into a SQLite database,
on my machine, took about 40 seconds and it generated a 20 meg SQLite file.
So there are a couple of items that are not in the feeds that you would need to recreate the
HPR site as it exists now. Specifically for each episode, if the episode has episode tags or if it's
part of a series, that information is not in the RSS feed or it's not there that I could find.
If it is there and I'm not finding it, please let me know. I'll add it.
So next steps, things that I want to do next, or that a community member could do:
what another community member could possibly do is take the information from the feed either directly
or use something like the project I'm talking about today and take it out of the database and
use that to create a markdown and then feed that markdown into a static site generator.
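A minimal sketch of that Markdown step, assuming each episode has already been read into a dictionary with `title`, `published`, `enclosure`, and `description` keys. Those key names and the page layout are hypothetical; a real static site generator may also expect front matter in its own format.

```python
def episode_to_markdown(episode):
    """Render one episode (a dict of feed fields) as a Markdown page."""
    return "\n".join([
        f"# {episode['title']}",
        "",
        f"Published: {episode['published']}",
        "",
        f"[Download audio]({episode['enclosure']})",
        "",
        episode["description"],
        "",
    ])


# Hypothetical episode record, standing in for a row from the database.
example = {
    "title": "HPR3637: HPR feed to Sqlite",
    "published": "Tue, 12 Jul 2022 00:00:00 +0000",
    "enclosure": "https://example.org/hpr3637.mp3",
    "description": "Show notes go here.",
}
page = episode_to_markdown(example)
print(page)
```

Writing one such page per episode into a directory gives the static site generator something to build from.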
So from the data that we're pulling from RSS feed, you could recreate the main HPR page,
the correspondent page, pages for every episode and I'm not doing it yet, but ultimately you
could get the comments for every episode from the comments feed and then you could manually build
markdown for the other static pages on the site like the about page or the contributing page.
Feed all that into a static site generator and then you've got your own personal copy of HPR.
So like I said earlier, I'll have the link to the code that I wrote.
I'll have it in the show notes. I'll have instructions about how to generate the SQLite
database using the code. I think that's it. Hopefully the discussion that was on the mailing
list will go further. I'm really excited and interested in the idea of making the HPR site
better and more reliable and customizable if that's what you want to do.
So that's it for me today. I'll see you guys.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by a HPR listener like yourself. If you ever thought of recording
a podcast, then click on our contribute link to find out how easy it really is.
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and
rsync.net. Unless otherwise stated, today's show is released under a Creative Commons
Attribution 4.0 International License.