Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr4341.txt (new file, 151 lines)

@@ -0,0 +1,151 @@
Episode: 4341
Title: HPR4341: Transferring Large Data Sets
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4341/hpr4341.mp3
Transcribed: 2025-10-25 23:20:05

---
This is Hacker Public Radio Episode 4341 for Monday, 24 March 2025. Today's show is entitled Transferring Large Data Sets. It is part of the series Programming 101. It is hosted by Hairy Larry and is about 11 minutes long. It carries a clean flag. The summary is: how to transfer large data sets using tar and Blu-ray discs while preserving metadata.

Transferring Large Data Sets
Very large data sets present their own problems. Not everyone has directories with hundreds of gigabytes of project files, but I do, and I assume I'm not the only one. For instance, I have a directory with over 700 radio shows. Many of these directories also have a podcast, and they also have pictures and text files. Doing a properties check on the directory, I see 450 gigabytes of data.
When I started envisioning Libre Indie Archive, I wanted to move directories into archival storage using optical drives. My first attempt at this didn't work, because I lost metadata when I wrote the optical discs, since optical discs are read-only. After further work and study, I learned that tar files can preserve metadata if they are created and extracted as root. In fact, if you're running tar as root, preserving file ownership and permissions is the default. So this means that optical discs are an option if you write tar archives onto them.
I have better success rates with 25 gigabyte Blu-ray discs than with the 50 gigabyte discs. So if your directory breaks up into projects that fit on 25 gigabyte discs, that's great. My data did not do this easily, but tar does have an option to write a dataset to multiple tar files, each with a maximum size, labeling them -0, -1, etc. When using this multi-volume feature, you cannot use compression, so you will get .tar files, not .tar.gz files.
It's better to break the file sets up into more reasonable sizes, so I decided to divide the shows up alphabetically by title, so all the shows starting with the letter A would be one dataset, and then down the alphabet one letter at a time. Most of the letters would result in a single tar file, labeled -0, that would fit on the 25 gigabyte disc. Many letters, however, took two or even three tar files that would have to be written on different discs and then concatenated on the primary system before they are extracted to the correct location in primary files. There is a companion program to tar called tarcat that I used to combine two or three tar files split by length into a single tar file that could be extracted. I ran Engrampa as root to extract the files.
So I used a tar command on the working system where my Something Blue radio shows are stored. Then I used K3b to burn these files onto a 25 gigabyte Blu-ray disc, carefully labeling the disc and writing a text file that I used to keep up with which files I had already copied to disc. Then, on the Libre Indie Archive primary system, I copied the file or files for that dataset from the Blu-ray to the boot drive. Then I would use tarcat to combine the files if there was more than one file for that dataset. And finally, I would extract the files to primary files by running Engrampa as root.
Now I'll go into detail on each of these steps.
First, make sure that the Libre Indie Archive program prep.sh is in your home directory on your workstation. Then, from the data directory to be archived, in my case the something_blue directory, run prep.sh like this:

  ~/prep.sh

This will create a file named ia_origin.txt that lists the date, the computer and directory being archived, and the users and user IDs on that system. All very helpful information to have if, sometime in the future, you need to do a restore.
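prep.sh ships with Libre Indie Archive, and the real script may differ, but a minimal sketch that records the same information described above (date, computer, directory, and the users with their user IDs, written to ia_origin.txt) could look like this:

  #!/bin/sh
  # Sketch of a prep script; the real prep.sh ships with Libre Indie Archive.
  {
    echo "Date: $(date)"          # when the dataset was prepared
    echo "Host: $(hostname)"      # the computer being archived from
    echo "Directory: $(pwd)"      # the directory being archived
    echo "Users and user IDs:"
    cut -d: -f1,3 /etc/passwd     # user names and numeric IDs
  } > ia_origin.txt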
Next, create a tar dataset for each letter of the alphabet. (You may want to divide your dataset in a different way.) Open a terminal in the same directory as the data directory, my something_blue directory, so that ls displays something_blue, your data directory. I keep the Something Blue shows and podcasts in subdirectories in the something_blue directory. Here's the tar command:

  sudo tar -cv --tape-length=20000000 --file=something_blue-a-{0..50}.tar /home/hairylarry/delta/something_blue/a*

This is for the letter a, so the --file parameter includes the letter a. The numbers 0..50 in the curly brackets are the sequence numbers for the files; the shell's brace expansion turns that one option into a list of numbered file names, and tar fills each volume in turn as the size limit is reached. I only had one file for the letter a, something_blue-a-0.tar. The last parameter is the source for the tar files. In this case, /home/hairylarry/delta/something_blue/a*, that is, all of the files and directories in the something_blue directory that start with the letter a.

You may want to change the --tape-length parameter. As listed, it stores up to 19.1 gigabytes, since tar counts --tape-length in units of 1,024 bytes: 20,000,000 x 1,024 bytes is about 20.5 billion bytes, or 19.1 GiB. The maximum capacity of a 25 gigabyte Blu-ray is 23.3 gigabytes for data storage.
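Rather than typing that command 26 times, the per-letter datasets can be scripted. This loop is a sketch, not from the episode, and it assumes the same source path as the example above:

  #!/bin/bash
  # Create one multi-volume tar dataset per leading letter (sketch).
  SRC=/home/hairylarry/delta/something_blue   # assumed source directory
  for letter in {a..z}; do
    # Skip letters that have no matching files or directories.
    compgen -G "$SRC/$letter*" > /dev/null || continue
    sudo tar -cv --tape-length=20000000 \
      --file="something_blue-$letter-"{0..50}.tar \
      "$SRC/$letter"*
  done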
Example B. For the letter b, I ended up with three tar files: something_blue-b-0.tar, something_blue-b-1.tar, and something_blue-b-2.tar. I will use these files in the example below, using tarcat to combine the files.
I use K3b to burn Blu-ray data discs. Besides installing K3b, you have to install some other programs, and there is a particular setup that needs to be done, including selecting cdrecord and no multisession. There's an excellent article, "How to Burn Blu-ray Discs on Ubuntu and Derivatives Using K3b", that goes step by step through the installation and setup; the link is in the show notes. I also always check Verify data, and I use the Linux/Unix file system, not Windows, which will rename your files if the file names are too long.
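For reference, a data disc like this can also be burned from the command line with growisofs, from the dvd+rw-tools package. This is a sketch, not the episode's K3b workflow, and it assumes the burner is /dev/sr0:

  # Burn one tar volume to a blank BD-R as a data disc.
  # -R enables Rock Ridge, which keeps long Unix file names intact.
  growisofs -Z /dev/sr0 -R -V SB_B_0 something_blue-b-0.tar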
I installed a Blu-ray reader in the primary system, and I used Thunar to copy the files from the Blu-ray disc to the boot drive. In the primary files directory, I made a subdirectory, something_blue, to hold the archived shows.
If there is only one file, as in example A above, you can skip the concatenation step. If there is more than one file, as in example B above, you use tarcat to concatenate these files into one tar file. You have to do this: if you try to extract from just one of the numbered files when there is more than one, you will get an error. So if I try to extract from something_blue-b-0.tar and I get an error, it doesn't mean that there's anything wrong with that file. It just has to be concatenated with the other b files before it can be extracted.
There is a companion program to tar, called tarcat, that should be used to concatenate the tar files. Here's the command I used for example B above:

  tarcat something_blue-b-0.tar something_blue-b-1.tar something_blue-b-2.tar > sb-b.tar

This will concatenate the three smaller tar files into one bigger tar file named sb-b.tar.
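Before burning or extracting, a quick read-through of the combined archive confirms the volumes were concatenated in the right order; this check is an addition, not from the episode:

  # List every member; a clean pass with no errors means the
  # combined archive is readable end to end.
  tar -tvf sb-b.tar > /dev/null && echo "sb-b.tar reads OK"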
In order to preserve the metadata, you have to extract the files as root. To make it easier to select the files to be extracted, and where to store them, I used the GUI archive manager Engrampa. To run Engrampa as root, open a terminal with Ctrl+Alt+T and use this command:

  sudo -H engrampa

Click Open and select the tar file to extract. Then follow the path until you are in the something_blue directory and you are seeing the folders and files you want to extract. (Instead of the something_blue directory, you will go to your own data directory.) Type Ctrl+A to select them all. Then click Extract at the top of the window. Open the directory where you want the files to go, in my case primary_files/something_blue, and then click Extract again in the lower right. After the files are extracted, go to your data directory in primary files and check that the directories and files are where you expect them to be. You can also open a terminal in that directory and type ls -l to review the metadata.
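The same extraction can be done without the GUI. A sketch, with the destination path as a placeholder (note that archive members keep the paths they were created with, so list the archive first with tar -tf to see where they will land):

  # Run as root so ownership and permissions are restored
  # (--same-owner is the default for the superuser).
  # -C sets the directory to extract into (placeholder path).
  sudo tar -xvf sb-b.tar -C /path/to/primary_files/something_blue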
When dealing with data in chunks of 20 gigabytes or more, each of these steps takes time. The reason I like using an optical backup to transfer the files from the working system to the Libre Indie Archive is that it gives me an easy-to-store backup that is not on a spinning drive and that cannot be overwritten. Still, optical storage is not perfect either. It's just another belt to go with your suspenders.
Another way to transfer directories into the primary files directory is with SSH over the network. This is not as safe as using optical discs, and it also does not provide the extra snapshot backup. It also takes a long time, but it is not as labor intensive. After I spend some more time thinking about this and testing, I will do a podcast about transferring large data sets with SSH.
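As a preview of what that might look like, rsync over SSH can also preserve metadata when the receiving end runs as root. This sketch is an addition, not from the episode, and the host name and paths are placeholders:

  # -a preserves permissions, times, and symlinks; ownership is
  # restored because the remote side runs as root.
  # -H preserves hard links; --numeric-ids keeps UIDs/GIDs as numbers.
  sudo rsync -aH --numeric-ids -e ssh \
    /home/hairylarry/delta/something_blue/ \
    root@primary-system:/path/to/primary_files/something_blue/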
Although I am transferring large data sets to move them into archival storage using Libre Indie Archive, there are many other situations where you might want to move a large data set while preserving the metadata. So what I have written about tar files, optical discs, and running Thunar and Engrampa as root is generally applicable.
As always, comments are appreciated. You can comment on Hacker Public Radio or on Mastodon. Visit my blog at home.gamerplus.org, where I will post the show notes and embed the Mastodon thread for comments about this podcast. Thanks.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.