Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions
--- a/hpr_transcripts/hpr0401.txt
+++ b/hpr_transcripts/hpr0401.txt
@@ -0,0 +1,140 @@
+Episode: 401
+Title: HPR0401: web2speech
+Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0401/hpr0401.mp3
+Transcribed: 2025-10-07 19:50:31
+
+---
+
+Whee!
+Hello everybody, my name is Ken Fallon and today's episode is going to be on converting
+text into speech, particularly URLs.
+The reason I wanted to do this was there was a lot of Wikipedia articles that I wanted
+to look up and I was going to be outside, so I thought well, I can put them on my MP3
+player and listen to them outside.
+But since I've had this idea, I've used it to read long man pages, read articles that
+people have posted on their blogs and just generally do some background reading on
+the CIA World Factbook and that sort of thing.
+Some websites work better than others, if there's going to be a lot of graphics obviously
+and a lot of tables, it's not going to be a lot of use to you.
+However, if it's simply commentary and text, it'll come out quite nice.
+One good tip would be to go to the website using a fairly basic program like ELinks or
+Dillo, which is a more graphical viewer, and you'll see what the page is going to look
+like.
+If it's a WordPress blog, for instance, all the menus will be put down at the bottom and
+yeah, it looks quite nice, which means that all the menus and stuff when they are spoken
+back to you will be at the end of the episode, so you can already stop the audio file and you can
+flick to the next one.
+Okay, let's get down to it.
+The reason I want to talk to you about this is, first of all, it's about the philosophy,
+the unix philosophy of having small things that do a particular task and chaining them
+together.
+That's the first thing.
+And secondly, it's to explain the practical uses of standard input, standard output, redirecting
+and that sort of stuff.
+Now I could have used for Wikipedia a program, a website called Pediaphon, which will actually
+do this, but since I've actually started this Wikipedia text-to-speech blog, which I
+did a few weeks ago, I've modified the script heavily so that I can use it for any web
+page and convert it into different formats, put in command line switches, you know, I can
+specify the file type and that sort of thing, file name and whether I want to override
+the file name and that sort of stuff.
+Everything starts off quite simple and then you can expand it out.
+Anyway, the code, as it stands now, will also be in the show notes for this episode
+or at least a link to it.
+Let's begin.
+First of all, I want to talk about standard input, standard output and standard error.
+Standard input is typically your keyboard and your mouse.
+Standard output tends to be your screen or perhaps a printer and standard error usually
+tends to be your screen if there's an error message.
+So this is kind of normal Unix stuff.
+What is kind of cool is that you can take the output of one and redirect it into another
+and you can take the error, or you could also take the output and pipe it into another.
+There's a slight difference here.
+Redirecting is the greater than sign.
+So if you do an ls, which is a directory listing, and you use the greater than sign and
+then you specify a file name,
+so ls, greater than, ls.text, for instance, instead of sending the output to your screen, it's going to send
+it to the text file called ls.text. Fair enough.
+Now, say you have another directory and you want to do ls
+on that directory as well. You can change to that directory, and you want to append
+it to the file.
+You simply type ls, greater than, greater than, ls.text and it will be appended to the end
+of the file.
+So that's redirecting.
+What we are going to do here is we're going to be piping the output of standard output
+into standard input.
+Most programs, like ls and grep or whatever, use standard input and standard output as
+their defaults: they take the file in from standard input and send the output to standard
+output.
+However, some of the programs that we're going to be using, like wget or whatever, don't
+do this and you need to specify it.
+So if you open up the man pages for these programs and search for stdout or standard out,
+you'll see the switches if they're necessary.
+The Unix philosophy is that you have a lot of small programs that you can put together
+to make a more complex one.
+Your first task when you have an idea, there's an itch that you want to scratch, is how you're
+going to break that big task up into smaller little sub-tasks that you can then work on
+and find a tool that will help you accomplish those.
+In our task here, we want to go to a web page and we want to convert that into an Ogg file,
+for instance, that we can play on my portable media player.
+So task number one is we want to get the web page.
+When we download the web page, that's going to be in HTML format, so we need to convert
+the HTML into standard text.
+Then we convert that standard text through a speech synthesizer, and usually a speech
+synthesizer will only give you the option to output to a WAV file.
+So then we want something else that will convert the WAV file into an Ogg file.
+Now under Linux there are many programs, and for any particular task you're probably
+going to have a choice of two or three different programs that you can use.
+For example, for getting the website, I can use wget or I can use curl or I could even use
+telnet with some expect commands, but I'm going to use wget because that's the one
+I'm most familiar with.
+Then I'm selecting html2text as my command to convert the HTML into text. For speech, I'm going
+to use eSpeak instead of Festival because I found eSpeak to be easier to install, and it works
+better with standard input and standard output, I found.
+For the conversion from WAV to Ogg, I could use SoX, but I'm actually going to use
+FFmpeg because SoX has a known bug where it doesn't support MP3 files out of the box for
+legal reasons.
+So I've got my commands: wget, html2text, eSpeak and FFmpeg.
+And what I'm going to do is I'm going to have these programs pipe from one command into
+the next.
+So I'm going to need to look into the man pages for all of these commands and make sure
+that they all redirect the standard output and that they can accept input from standard
+input.
+By default, wget will save a downloaded file instead of displaying it on standard output.
+So we need to use the dash O option, that's a capital O, space, dash, to tell it to send
+the page to standard output as opposed to just saving it in a file.
+The same with html2text: we need to specify dash lowercase o, space, minus sign.
+With eSpeak, the format is dash dash stdout, and with FFmpeg, we actually want to save
+it to a file, but with FFmpeg, we need to tell it to listen to standard input and that
+is dash i, space and the minus sign.
+All the other programs by default will listen on standard input, so we're good to go there.
+With the commands chained together, the only other special thing that I needed
+to do was in the html2text command. By default, when you download a page, if
+there's bold or italics, it will add some special encoding characters that are understood
+by pager programs like less and more, and they sound very choppy when you play it through
+eSpeak.
+So some of the other options I added were dash nobs, space, dash ascii, to strip out
+those special characters and to convert everything into ASCII code.
+So you will have wget, the URL, dash capital O, space, minus, space, the pipe sign, space,
+html2text, space, dash nobs, space, dash ascii, space, dash lowercase o, space, dash, pipe
+that into espeak, space, dash dash stdout, pipe that into ffmpeg, space, dash i, space,
+dash, and then output file dot ogg, and when you do that, it will go get the web page,
+convert it to text for you, and output it to an Ogg file.
+Now you can do that on the command line, which is what I did for quite a while, but you
+can also make a script around that using a, you know, "what do you want to look up on Wikipedia"
+prompt, and you can read in line, and then you can put in a wget, http colon, forward slash, forward slash,
+en.wikipedia.org, forward slash, wiki, forward slash, double quotes, dollar, open curly bracket,
+line, close curly bracket, double quotes again, and that will go off to Wikipedia and it
+will send whatever you typed in as the answer to that. It will send it off to Wikipedia and
+Wikipedia will return with the correct URL. That I found was very good for what I needed to do with
+the Wikipedia text: a whole list of abbreviations and terms that I wanted to look up. So I was
+able to pipe all those into a script and it would go into a text file, pipe the text file into
+this script which I put into a loop and then I was able to convert all these things into Ogg
+and put them on my portable media player. However since then I found that going to Wikipedia
+I'm more or less looking up a URL the whole time so it might be somebody's blog, it might be a URL
+to a man page, it might be a how-to document, it might be the CIA World Factbook on some country,
+and so I've instead expanded it out so that it's now called web2speech and you can
+specify the format, options and URL. So by default web2speech and the URL will just convert it into
+Ogg or whatever is defined as the default format for your player and you'll find a link to that
+program in the show notes for this episode and I'd appreciate your feedback and comments.
+Okay, I hope you find this useful. If not, tune in tomorrow and expect to hear another
+exciting episode of Hacker Public Radio. Thank you very much and goodbye.
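
Editorial sketch (not part of the committed transcript): the redirection, pipe, and loop steps described in the episode could look roughly like the following. This is a minimal sketch under stated assumptions, not the author's web2speech script, which is only linked from the show notes. The function names `redirect_demo`, `web2ogg`, and `wiki2ogg`, the output file naming, and the use of the classic `html2text` 1.3-style options (`-nobs`, `-ascii`, `-o -`) are assumptions; the flags themselves are the ones read out in the episode.

```shell
#!/bin/sh
# Sketch only: function names and output file names are hypothetical.

# Redirection: '>' creates or overwrites a file, '>>' appends to it.
redirect_demo() {
    out=$1
    printf 'first listing\n' > "$out"     # '>' overwrites (or creates) the file
    printf 'second listing\n' >> "$out"   # '>>' appends to the end of the file
}

# Pipeline from the episode: fetch a page, strip the HTML, speak it, encode it.
# Each '|' connects the standard output of one command to the standard input
# of the next.
web2ogg() {
    url=$1
    outfile=$2
    # wget -O -      : write the downloaded page to standard output
    # html2text -o - : write the plain text to standard output
    #   -nobs -ascii : no backspace overstriking, ASCII-only output
    # espeak --stdout: write WAV audio to standard output
    # ffmpeg -i -    : read the input from standard input
    wget -O - "$url" |
        html2text -nobs -ascii -o - |
        espeak --stdout |
        ffmpeg -i - "$outfile"
}

# Loop variant: read one Wikipedia term per line from standard input.
# Naming the output "<term>.ogg" is an assumption made for this sketch.
wiki2ogg() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue        # skip blank lines
        web2ogg "http://en.wikipedia.org/wiki/${line}" "${line}.ogg"
    done
}
```

With wget, html2text, espeak, and ffmpeg installed, a single page could be converted with something like `web2ogg "http://en.wikipedia.org/wiki/Unix" unix.ogg`, and a file of terms with `wiki2ogg < terms.txt`, where `terms.txt` is a hypothetical file holding one term per line.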