Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr4341.txt (new file, 151 lines)

@@ -0,0 +1,151 @@
Episode: 4341
Title: HPR4341: Transferring Large Data Sets
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4341/hpr4341.mp3
Transcribed: 2025-10-25 23:20:05

---
This is Hacker Public Radio Episode 4341 for Monday, 24 March 2025. Today's show is entitled Transferring Large Data Sets. It is part of the series Programming 101. It is hosted by Hairy Larry and is about 11 minutes long. It carries a clean flag. The summary is: how to transfer large data sets using tar and Blu-ray discs while preserving metadata.

Transferring Large Data Sets
Very large data sets present their own problems. Not everyone has directories with hundreds of gigabytes of project files, but I do, and I assume I'm not the only one. For instance, I have a directory with over 700 radio shows. Many of these directories also have a podcast, and they also have pictures and text files. Doing a properties check on the directory, I see 450 gigabytes of data.
When I started envisioning Libre Indie Archive, I wanted to move directories into archival storage using optical drives. My first attempt at this didn't work, because I lost metadata when I wrote the optical discs, since optical discs are read-only. After further work and study, I learned that tar files can preserve metadata if they are created and extracted as root. In fact, if you're running tar as root, preserving file ownership and permissions is the default. So this means that optical discs are an option if you write tar archives onto them.
I have better success rates with 25 gigabyte Blu-ray discs than with the 50 gigabyte discs. So if your directory breaks up into projects that fit on 25 gigabyte discs, that's great. My data did not do this easily, but tar does have an option to write a dataset to multiple tar files, each with a maximum size, labeling them -0, -1, etc. When using this multi-volume feature, you cannot use compression, so you will get .tar files, not .tar.gz files.
It's better to break the file sets up into more reasonable sizes, so I decided to divide the shows up alphabetically by title, so all the shows starting with the letter A would be one dataset, and then down the alphabet one letter at a time. Most of the letters would result in a single tar file, labeled -0, that would fit on the 25 gigabyte disc. Many letters, however, took two or even three tar files that would have to be written on different discs and then concatenated on the primary system before they are extracted to the correct location in primary files. There is a companion program to tar called tarcat that I used to combine two or three tar files split by length into a single tar file that could be extracted. I ran Engrampa as root to extract the files.
So I used a tar command on the working system where my Something Blue radio shows are stored. Then I used K3b to burn these files onto a 25 gigabyte Blu-ray disc, carefully labeling the disc and writing a text file that I used to keep up with which files I had already copied to disc. Then, on the Libre Indie Archive primary system, I copied the file or files for that dataset from the Blu-ray to the boot drive. Then I would use tarcat to combine the files if there was more than one file for that dataset. And finally, I would extract the files to primary files by running Engrampa as root.
Now I'll go into detail on each of these steps.
First, make sure that the Libre Indie Archive program prep.sh is in your home directory on your workstation. Then, from the data directory to be archived, in my case the something_blue directory, run prep.sh like this:

  ~/prep.sh

This will create a file named ia_origin.txt that lists the date, the computer and directory being archived, and the users and user IDs on that system. All very helpful information to have if, sometime in the future, you need to do a restore.
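prep.sh ships with Libre Indie Archive, and the real script may differ, but a minimal sketch that records the same information described above (date, computer, directory, and the users with their user IDs, written to ia_origin.txt) could look like this:

  #!/bin/sh
  # Sketch of a prep script; the real prep.sh ships with Libre Indie Archive.
  {
    echo "Date: $(date)"          # when the dataset was prepared
    echo "Host: $(hostname)"      # the computer being archived from
    echo "Directory: $(pwd)"      # the directory being archived
    echo "Users and user IDs:"
    cut -d: -f1,3 /etc/passwd     # user names and numeric IDs
  } > ia_origin.txt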
Next, create a tar dataset for each letter of the alphabet. (You may want to divide your dataset in a different way.) Open a terminal in the same directory as the data directory, my something_blue directory, so that ls displays something_blue, your data directory. I keep the Something Blue shows and podcasts in subdirectories in the something_blue directory. Here's the tar command:

  sudo tar -cv --tape-length=20000000 --file=something_blue-a-{0..50}.tar /home/hairylarry/delta/something_blue/a*

This is for the letter a, so the --file parameter includes the letter a. The numbers 0..50 in the curly brackets are the sequence numbers for the files; the shell's brace expansion turns that one option into a list of numbered file names, and tar fills each volume in turn as the size limit is reached. I only had one file for the letter a, something_blue-a-0.tar. The last parameter is the source for the tar files. In this case, /home/hairylarry/delta/something_blue/a*, that is, all of the files and directories in the something_blue directory that start with the letter a.

You may want to change the --tape-length parameter. As listed, it stores up to 19.1 gigabytes, since tar counts --tape-length in units of 1,024 bytes: 20,000,000 x 1,024 bytes is about 20.5 billion bytes, or 19.1 GiB. The maximum capacity of a 25 gigabyte Blu-ray is 23.3 gigabytes for data storage.
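Rather than typing that command 26 times, the per-letter datasets can be scripted. This loop is a sketch, not from the episode, and it assumes the same source path as the example above:

  #!/bin/bash
  # Create one multi-volume tar dataset per leading letter (sketch).
  SRC=/home/hairylarry/delta/something_blue   # assumed source directory
  for letter in {a..z}; do
    # Skip letters that have no matching files or directories.
    compgen -G "$SRC/$letter*" > /dev/null || continue
    sudo tar -cv --tape-length=20000000 \
      --file="something_blue-$letter-"{0..50}.tar \
      "$SRC/$letter"*
  done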
Example B. For the letter b, I ended up with three tar files: something_blue-b-0.tar, something_blue-b-1.tar, and something_blue-b-2.tar. I will use these files in the example below, using tarcat to combine the files.
I use K3b to burn Blu-ray data discs. Besides installing K3b, you have to install some other programs, and there is a particular setup that needs to be done, including selecting cdrecord and no multisession. There's an excellent article, "How to Burn Blu-ray Discs on Ubuntu and Derivatives Using K3b", that goes step by step through the installation and setup; the link is in the show notes. I also always check Verify data, and I use the Linux/Unix file system, not Windows, which will rename your files if the file names are too long.
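For reference, a data disc like this can also be burned from the command line with growisofs, from the dvd+rw-tools package. This is a sketch, not the episode's K3b workflow, and it assumes the burner is /dev/sr0:

  # Burn one tar volume to a blank BD-R as a data disc.
  # -R enables Rock Ridge, which keeps long Unix file names intact.
  growisofs -Z /dev/sr0 -R -V SB_B_0 something_blue-b-0.tar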
I installed a Blu-ray reader in the primary system, and I used Thunar to copy the files from the Blu-ray disc to the boot drive. In the primary files directory, I made a subdirectory, something_blue, to hold the archived shows.
If there is only one file, as in example A above, you can skip the concatenation step. If there is more than one file, as in example B above, you use tarcat to concatenate these files into one tar file. You have to do this: if you try to extract from just one of the numbered files when there is more than one, you will get an error. So if I try to extract from something_blue-b-0.tar and I get an error, it doesn't mean that there's anything wrong with that file. It just has to be concatenated with the other b files before it can be extracted.
There is a companion program to tar, called tarcat, that should be used to concatenate the tar files. Here's the command I used for example B above:

  tarcat something_blue-b-0.tar something_blue-b-1.tar something_blue-b-2.tar > sb-b.tar

This will concatenate the three smaller tar files into one bigger tar file named sb-b.tar.
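Before burning or extracting, a quick read-through of the combined archive confirms the volumes were concatenated in the right order; this check is an addition, not from the episode:

  # List every member; a clean pass with no errors means the
  # combined archive is readable end to end.
  tar -tvf sb-b.tar > /dev/null && echo "sb-b.tar reads OK"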
In order to preserve the metadata, you have to extract the files as root. To make it easier to select the files to be extracted, and where to store them, I used the GUI archive manager Engrampa. To run Engrampa as root, open a terminal with Ctrl+Alt+T and use this command:

  sudo -H engrampa

Click Open and select the tar file to extract. Then follow the path until you are in the something_blue directory and you are seeing the folders and files you want to extract. (Instead of the something_blue directory, you will go to your own data directory.) Type Ctrl+A to select them all. Then click Extract at the top of the window. Open the directory where you want the files to go, in my case primary_files/something_blue, and then click Extract again in the lower right. After the files are extracted, go to your data directory in primary files and check that the directories and files are where you expect them to be. You can also open a terminal in that directory and type ls -l to review the metadata.
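The same extraction can be done without the GUI. A sketch, with the destination path as a placeholder (note that archive members keep the paths they were created with, so list the archive first with tar -tf to see where they will land):

  # Run as root so ownership and permissions are restored
  # (--same-owner is the default for the superuser).
  # -C sets the directory to extract into (placeholder path).
  sudo tar -xvf sb-b.tar -C /path/to/primary_files/something_blue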
When dealing with data in chunks of 20 gigabytes or more, each of these steps takes time. The reason I like using an optical backup to transfer the files from the working system to the Libre Indie Archive is that it gives me an easy-to-store backup that is not on a spinning drive and that cannot be overwritten. Still, optical storage is not perfect either. It's just another belt to go with your suspenders.
Another way to transfer directories into the primary files directory is with SSH over the network. This is not as safe as using optical discs, and it also does not provide the extra snapshot backup. It also takes a long time, but it is not as labor intensive. After I spend some more time thinking about this and testing, I will do a podcast about transferring large data sets with SSH.
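As a preview of what that might look like, rsync over SSH can also preserve metadata when the receiving end runs as root. This sketch is an addition, not from the episode, and the host name and paths are placeholders:

  # -a preserves permissions, times, and symlinks; ownership is
  # restored because the remote side runs as root.
  # -H preserves hard links; --numeric-ids keeps UIDs/GIDs as numbers.
  sudo rsync -aH --numeric-ids -e ssh \
    /home/hairylarry/delta/something_blue/ \
    root@primary-system:/path/to/primary_files/something_blue/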
Although I am transferring large data sets to move them into archival storage using Libre Indie Archive, there are many other situations where you might want to move a large data set while preserving the metadata. So what I have written about tar files, optical discs, and running Thunar and Engrampa as root is generally applicable.
As always, comments are appreciated. You can comment on Hacker Public Radio or on Mastodon. Visit my blog at home.gamerplus.org, where I will post the show notes and embed the Mastodon thread for comments about this podcast. Thanks.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our Contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.