Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr3654.txt (new file, 149 lines)

@@ -0,0 +1,149 @@
Episode: 3654
Title: HPR3654: Use the data in the Ogg feed to create a website.
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3654/hpr3654.mp3
Transcribed: 2025-10-25 02:53:56

---
This is Hacker Public Radio Episode 3654 for Thursday, the 4th of August 2022.
Today's show is entitled "Use the data in the Ogg feed to create a website."
It is hosted by Norrist, and is about 13 minutes long.
It carries a clean flag.
The summary is: "How much of a site can I make using only the data from the feed?"
Welcome back. This is part two of my experiment to see how much I can get done with the data that's just in the RSS feed for Hacker Public Radio, or the Ogg feed specifically. If you want to check back, part one was HPR episode 3637. That episode talked about how I extracted what I felt were the most interesting bits from the RSS feed and inserted them into a SQLite database. So today I'll discuss how I took the data that I had stuck in that SQLite database and created a static website.
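As a rough illustration of that part-one step (this is not the actual project code; the feed URL and the column choices are assumptions), the feedparser library plus the standard-library sqlite3 module are enough to cache the feed in a database:

import sqlite3

import feedparser

# Assumed feed URL; the real project's URL may differ.
FEED_URL = "https://hackerpublicradio.org/hpr_ogg_rss.php"

feed = feedparser.parse(FEED_URL)

conn = sqlite3.connect("hpr.db")
conn.execute("""CREATE TABLE IF NOT EXISTS episodes
                (title TEXT, link TEXT, published TEXT, notes TEXT)""")
for entry in feed.entries:
    # Each feedparser entry exposes the item's title, link, date, and summary.
    conn.execute("INSERT INTO episodes VALUES (?, ?, ?, ?)",
                 (entry.title, entry.link, entry.published, entry.summary))
conn.commit()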
So a couple of quick things before we jump into the details. I think I probably could have skipped the database step, where I take the data from the feed, put it in a database, and then take the data back out of the database and use it to build the site. I probably could have gone straight from the feed to the process I was using to build the static site. It was extra code and extra time, but that's how projects go sometimes: the first time you do something, you think you know how you want to do it, and then you realize there was some extra code or some extra steps that you didn't necessarily need. One advantage of putting it in a SQLite database first, though, was that it acted sort of like a cache, so that every time you built the site, you wouldn't have to pull the feed in.
And then the other thing I wanted to say real quick was that I was really struck by the total number of episodes out there for Hacker Public Radio; it's a lot of work that's been put into building 3,000-plus episodes. Just a quick thanks to everyone who's ever created an episode; I really appreciate it.

So my original intent when I started the project was that I would use Markdown to build the site. A lot of static site generators, like Hugo or Jekyll, sort of work with Markdown files: you build a bunch of Markdown, you throw it at the static site generator, and it just builds a nice-looking site.
I started down that path, but one thing about Markdown is that you can add inline HTML if you need to. I started with just Markdown, and I couldn't get it to look the way I wanted, so I started adding HTML elements, and by the time I got the site to look like a website, there was more HTML than Markdown, so I just kind of scrapped the Markdown base. Now I can hear all of you Markdown detractors out there saying, "Of course Markdown is terrible, why would you ever try to build a website with it?" And I'm a big fan of Markdown; I use it all the time. If you're going to do something like taking notes or writing documentation, I think Markdown is a great tool, but it didn't fit this particular use case. So, and I'll talk in a second about the templating that I did, instead of taking data out of the database and templating it to Markdown, I just templated it directly into HTML and skipped the step of converting Markdown to HTML.
A couple of the libraries that I used to do the work: one is Peewee, a Python library that's used to translate database calls into something a little more Pythonic; I talked about that in the last episode. And then to do the templating I used Jinja. It's a pretty easy template language, and it was something I was already familiar with, so it seemed like a good fit.
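A minimal sketch of how those two pieces fit together, assuming field names that match the table in the earlier sketch (the real project's model will differ):

from jinja2 import Template
from peewee import CharField, Model, SqliteDatabase, TextField

db = SqliteDatabase("hpr.db")

class Episode(Model):
    # Assumed fields; the actual schema comes from the part-one import.
    title = CharField()
    link = CharField()
    published = CharField()
    notes = TextField()

    class Meta:
        database = db
        table_name = "episodes"

# Template each row straight into HTML, with no Markdown step in between.
page = Template("<h1>{{ ep.title }}</h1>\n<div>{{ ep.notes }}</div>")
for ep in Episode.select():
    html = page.render(ep=ep)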
So strictly speaking, I wanted to use only the data from the feed to create, or recreate, the website, and that proved to be hard. Not impossible, you could certainly do it, but if you want to introduce things like logos or headers and footers and a little bit of styling, you have to pull in some extra content. So aside from the data that I got from the RSS feed, I wrote an HTML header and footer for every page. In the header I'm pulling in the Bootstrap CSS, so I can use Bootstrap to do some of the layout using the Bootstrap columns, and I've also got the HPR logo in there. And then in the footer, I basically copied the footer from the HPR site, so it's got links to related projects, and it's got the copyright information in the HTML.
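That shared header and footer naturally become a Jinja base template that every page extends. Something along these lines, where the Bootstrap CDN link, the logo path, and the block name are all assumptions:

from jinja2 import DictLoader, Environment

templates = {
    # Shared chrome: Bootstrap CSS and the HPR logo up top, copied footer below.
    "base.html": """<!doctype html>
<html><head>
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
</head><body>
  <img src="/images/hpr_logo.png" alt="HPR logo">
  <div class="container">{% block content %}{% endblock %}</div>
  <footer><!-- related-project links and copyright go here --></footer>
</body></html>""",
    # A page template only fills in its own content block.
    "episode.html": """{% extends "base.html" %}
{% block content %}<h1>{{ ep.title }}</h1>{{ ep.notes }}{% endblock %}""",
}

env = Environment(loader=DictLoader(templates))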
So I was able to build four different pages, or four different types of pages, using the data from the feed. I built sort of a replica of the main page for HPR, where it lists the most recent episodes. I also built a page per episode, and then I also built one page that lists every episode. So for the episode-specific stuff, there's the main page that shows the recent ones, there's an all-episodes page that lists every episode, and then there's one page per episode, and that's where you can kind of drill down into the episode and read the show notes.
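Continuing the sketch above (reusing the hypothetical Episode model and Jinja environment, and assuming index.html and all_shows.html templates exist alongside episode.html), the episode-oriented pages reduce to one query and three kinds of render calls; the output file names are made up here:

episodes = list(Episode.select().order_by(Episode.published.desc()))

# Main page: only the most recent episodes.
with open("site/index.html", "w") as f:
    f.write(env.get_template("index.html").render(episodes=episodes[:10]))

# All-episodes page: the full list.
with open("site/all_shows.html", "w") as f:
    f.write(env.get_template("all_shows.html").render(episodes=episodes))

# One page per episode, for drilling down into the show notes.
for ep in episodes:
    with open(f"site/eps/{ep.id}.html", "w") as f:
        f.write(env.get_template("episode.html").render(ep=ep))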
And then for the hosts, I did something similar, where I built one page that lists every host in a table. It's got their host name, I also calculated how many shows every host has produced and put that in there, and I put the date of their last show in there too; all of that's in a table. And then each host has their own individual page, which lists all of their shows. And there should be links; I tried to create links where it makes sense. So if you're on one of the episode pages, the host name, the name of the contributor or the host, should be a link to that individual host's page.
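The per-host roll-up is a plain aggregate query. A sketch with Peewee, assuming the Episode model above also carries a host field:

from peewee import fn

# Show count and most recent show date for every host, grouped in one query.
rows = (Episode
        .select(Episode.host,
                fn.COUNT(Episode.id).alias("shows"),
                fn.MAX(Episode.published).alias("last_show"))
        .group_by(Episode.host))

for row in rows:
    print(row.host, row.shows, row.last_show)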
So there's a lot of data on the HPR website that is not in the feed, and that shouldn't be surprising. The feed isn't meant to be a website, or to replace the website; it's meant to give you information about individual shows. So I couldn't exactly recreate the HPR website using just the data in the feed, because there's some stuff that's just not there. So for example, on the host pages, each individual host has a profile that will list maybe a web page or an avatar or something like that; that's not in the feed. For individual shows, there's things like the tag information and the series, if the show is part of a series; neither of those are in the feed. There's also the show summary: whenever you submit a show, you have to give it a short summary of 100 characters or less, and that's not in the feed that I can find. Maybe it's there, but I can't find it. And then finally, missing was the license; I couldn't find that in the feed information either.
And then of course, there's some web pages on the HPR site that I wasn't able to replicate because they don't have anything to do with individual shows, so they're obviously not going to be in the feed: pages like "what you need to know" or "how to help out" or the requested topics. There was really no way to recreate those from just the RSS feed.
So, just a quick overview of how the project works; I'll have a link later to the GitLab page so you can see for yourself. Like I said earlier, I used Peewee to read from the SQLite file. I've got a Python script that pulls the data out of the SQLite file, aggregates it, kind of packages it up a little bit, and then uses Jinja templates to build the pages. There's a template for the index page, or the main page. There's a template for the all-shows page, where I list all the shows. Each individual contributor has their own page, and that's got a separate template. And then the correspondents page, where every host is on one page, has got a separate template.
So, some things I'd like to do next with the project. One, I'd like to try and incorporate the comments; there is an RSS feed for comments. I haven't looked at it yet, but I think it would be possible to take the RSS feed for comments and match them with the RSS feed for the individual shows, and then be able to display the comments on the page, the per-show comments.
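One way that matching could work, assuming each comment item identifies its episode number somewhere in its title (the comments feed's real layout isn't examined here, and the URL is a guess):

import re
from collections import defaultdict

import feedparser

# Assumed URL for the comments feed.
comments = feedparser.parse("https://hackerpublicradio.org/comments_rss.php")

by_episode = defaultdict(list)
for entry in comments.entries:
    # Look for an episode number like "hpr3637" in the comment's title.
    match = re.search(r"hpr(\d+)", entry.title, re.IGNORECASE)
    if match:
        by_episode[int(match.group(1))].append(entry.summary)

# by_episode[3637] would now hold the comment bodies for episode 3637,
# ready to pass into that episode's page template.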
Two, I think I can recreate the RSS feed from the data I collected in SQLite. I know that seems a little funny, kind of saying that out loud: taking an RSS feed, sticking it in a database, and then recreating a separate RSS feed. But just for the sake of trying to build the most complete site possible, I think that's something I'm going to look at, seeing if I can rebuild the RSS feed.
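Since the site is already templated, the rebuilt feed can just be one more Jinja template. A bare-bones sketch reusing the hypothetical Episode model, with only a few RSS elements and an assumed channel title:

from jinja2 import Template

# autoescape keeps titles containing characters like & from breaking the XML.
rss_tpl = Template("""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
  <title>Hacker Public Radio (rebuilt)</title>
  {% for ep in episodes %}
  <item>
    <title>{{ ep.title }}</title>
    <link>{{ ep.link }}</link>
    <pubDate>{{ ep.published }}</pubDate>
  </item>
  {% endfor %}
</channel></rss>""", autoescape=True)

with open("site/feed.xml", "w") as f:
    f.write(rss_tpl.render(episodes=Episode.select()))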
Three, I'm not sure how, but I think I would like to try and figure out how to get the pages that aren't in the feed into the static site; to try to recreate the pages that I mentioned earlier, like the "what you need to know" page and pages like that. How do we create those?
Next, I mentioned that one of the things that's missing from the RSS feed is tags. I really think it might be possible to use some natural language processing, or some keyword extraction or something like that, and see if I can generate some tags, or keywords, for the shows.
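Even without real NLP, a naive word-frequency pass over the show notes gives a starting point. A sketch using only the standard library, with a deliberately tiny stop-word list standing in for proper keyword extraction:

import re
from collections import Counter

# A real stop-word list would be much longer; this is just for illustration.
STOP_WORDS = {"the", "and", "that", "this", "with", "for", "you", "was", "have"}

def suggest_tags(notes, n=5):
    # Lowercase words of three or more letters, minus the stop words,
    # ranked by how often they appear in the show notes.
    words = re.findall(r"[a-z]{3,}", notes.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]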
And then the final thing on my to-do list is to modify how I grab the data from the feed and insert it into the database. There are two feeds for HPR: I think most people use the latest feed, which has got ten episodes, and there's also a full feed, which has got every episode in it. So what I would like to be able to do is, the first time you run the Python script to build the database, it uses the full feed, and subsequent times it uses the most recent feed. I haven't quite figured out how to do that yet.
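One simple way to get that behavior: treat an empty database as the first run. A sketch reusing the hypothetical Episode model, with made-up feed URLs:

# Made-up URLs standing in for HPR's full and latest feeds.
FULL_FEED = "https://hackerpublicradio.org/full_rss.php"
LATEST_FEED = "https://hackerpublicradio.org/latest_rss.php"

def pick_feed():
    # An empty episodes table means this is the first run, so take the
    # full feed; afterwards the ten-episode feed is enough to stay current.
    if Episode.select().count() == 0:
        return FULL_FEED
    return LATEST_FEED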
So I'll have a link to the GitLab page in the show notes. I'll welcome pull requests, or comments on the episode, or angry emails, or however you want to do it; if you feel like you have an improvement or any suggestions, those are obviously welcome. And then I'll also link to the static site. I built the site and copied it up to a web host; I'll read it out real quick, it's hpr.norrist.xyz, if you just want to look and see how the site turned out. I put a copy out on the internet, and I think I've got it set up to do a daily update, but we'll see how that goes. And that's it, thanks for listening, and I'll see you guys next time.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.