Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr4312.txt
Episode: 4312
Title: HPR4312: What Is The Indie Archive?
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4312/hpr4312.mp3
Transcribed: 2025-10-25 22:49:51

---
This is Hacker Public Radio Episode 4,312 for Tuesday, 11 February 2025. Today's show is entitled "What Is The Indie Archive?" It is part of the series Programming 101. It is hosted by Harry Larry, and is about 16 minutes long. It carries a clean flag. The summary is: the Indie Archive is an archival solution for indie producers.
What is the Indie Archive?

I'm Harry Larry, and you're listening to the Plain Text Programs podcast. The Indie Archive is an archival solution for indie producers. Since most indie producers run on a shoestring budget, it's important that the Indie Archive is inexpensive to install and run. It's especially important that monthly expenses are minimal, because an expense that is reasonable most months will sometimes be more than an indie producer can afford during some months. The first major constraint is cost. So I'll be talking about prices a lot in this podcast, and get more technical in future podcasts about the Indie Archive.
Indie Archive is an archival system, which is different from a backup system. If you don't have a backup system, do that first. My backup system uses the same tools as Indie Archive, rsync and rsnapshot. My brother uses the online backup service Carbonite. There are many other options. A good backup system runs automatically to back up everything frequently, and preserves version history. It's also good to have backups off-site.
An archival system, like Indie Archive, keeps multiple redundant copies across several hard drives on several systems in multiple locations. An archival system also checks file integrity as protection against file corruption or user error. When you have a project you really never want to lose, like a finished novel, a music album, a video, or any other major effort that involves significant work, that's when you need an archival system. So the Indie Archive does not automatically back up your projects every day. That's what your backup system should do.
The Indie Archive is an archival system where the producer of the content decides what needs to be archived and when it needs to be archived, and then manually moves the directory containing the files onto the Indie Archive, carefully preserving the files' metadata during the transfer. Then these files are propagated over at least seven hard drives on four different systems in three locations. File integrity checks are run daily, comparing the files and reporting discrepancies.
Two of the systems are kept in the studio where the content is produced. I call them the primary and secondary systems. They have a boot drive and two data drives each. One of the systems is kept off-site at a nearby location. I call it the remote system. It also has a boot drive and two data drives. If you have a more distant location where you can put a second remote system, you can have remote near and remote far systems. Otherwise, the final system is somewhere in the cloud, provided by a professional data storage provider. It has a single copy of the data and usually some additional data retention. The provider makes the backups of this data. This is the part that might involve a monthly bill. So depending on the size of your file set, it could be free or it could cost so much a month. There are a lot of options for cloud storage providers. But first, I'm going to discuss the three systems, primary, secondary, and remote, and how they function.
As far as the hardware goes, the systems are the same. Now, I'm a Linux guy and I do all my production work on Linux, so I'm using Linux. I want to test the system on several versions of Linux and with BSD. I'm not a Mac guy or a Windows guy, so I won't be going there. The software is open source and the required programs run on all three platforms, so I'll let a Mac or Windows programmer test the Indie Archive for their systems. My guess is that the Mac fork will be easier than the Windows fork because of the file metadata. It might even be possible to add Mac folders to the Indie Archive running Linux, but I'll let someone who actually has a Mac figure that out. I don't think the same is true for Windows. Windows file metadata is different, so if you want to preserve the metadata, you will probably have to install Indie Archive on Windows systems. So I'm developing and deploying on Linux, and I will also test on BSD. So far I have tested Debian, Ubuntu, FreeBSD, MidnightBSD, and Xubuntu, and the Indie Archive works fine on all of these operating systems.
So back to the hardware: pretty much any older system that will support at least three SATA drives will work. I'm using older business desktops, Dell and HP. I pulled mine out of storage, but they are very inexpensive to buy if you're not like me with a shed full of old computer stuff. I just bought a small form factor HP desktop on eBay for $30, including tax and shipping. To clarify, it's best if the primary system supports four SATA drives. The secondary and remote systems do not need an optical drive, so they should support three SATA drives, but they can be run on two SATA drives if you boot from the files drive. I'm currently testing a remote system with two SATA drives running MidnightBSD. The Dell desktops make a big deal about being green. I am open to suggestions on what would be the best energy-efficient systems for the Indie Archive, because of both the cost of electricity and the impact on the environment.
There are three drives on each system: a boot drive and two data drives. The boot drives can be SSDs or spinning hard drives and need to be big enough to hold the OS comfortably. The data drives need to be large enough to hold the files you want to archive, and they should be high-quality spinning drives. I use the multi-terabyte HGST drives, and I am also looking at some Dell drives made by HGST. There will be a data drive and a snapshot drive on each system. If they are not the same size, the snapshot drives should be larger. I am testing with three-terabyte data drives and four-terabyte snapshot drives. Besides the main data set that is being archived, the snapshot drives also hold the version history of the files that have been deleted or changed, so that's why they should be the larger drive.
So my primary system has a primary files directory with a three-terabyte drive mounted to it, and a primary snapshots directory with a four-terabyte drive mounted to it. Same for the secondary and remote systems.
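As an illustration only, a minimal sketch of how the two data drives might be mounted on the primary system; the directory names and UUID placeholders below are assumptions, not the author's actual layout:

    # Create the mount points (names are hypothetical)
    sudo mkdir -p /indiearchive/primaryfiles /indiearchive/primarysnapshots

    # Find the drive UUIDs, then add entries like these to /etc/fstab:
    sudo blkid
    #   UUID=<data-drive-uuid>      /indiearchive/primaryfiles      ext4  defaults  0 2
    #   UUID=<snapshot-drive-uuid>  /indiearchive/primarysnapshots  ext4  defaults  0 2

    # Mount everything listed in fstab and confirm
    sudo mount -a
    df -h /indiearchive/primaryfiles /indiearchive/primarysnapshots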
Now, so far I only had to buy one drive, but generally speaking, the six data drives will be the major expense in assembling the systems. So a good bargain on six four-terabyte drives could be $120 used or $270 new, and this is the most expensive part. I install used HGST drives all the time and rarely have problems with them. I have worked for clients who won't buy used, only new. Since the file integrity checks should give early warning of drive failure, and since there is a seven-drive redundancy on the data files, if I were buying drives for the Indie Archive, I'd go with six used four-terabyte HGST drives for $120. There is no reason not to use drives all the same size, as long as the snapshot drives are large enough.
The size of data drives you need depends on the size of your projects and the time it takes to do a project. Look at the hard drives on your working systems. Think about what directories you would like to see in archival storage. What is the total size of these directories? Check how many gigabytes these projects have consumed in the last year. Think forward a few years. Assume you will use more disk space in the future than you are using now. Do some quick arithmetic and make a decision.
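For the quick arithmetic, something like du on the working system gives the totals; the project paths here are made-up examples:

    # Per-directory sizes for the projects you would want archived
    du -sh ~/projects/novel ~/projects/album ~/videos/documentary

    # Same list with a grand total on the last line
    du -shc ~/projects/novel ~/projects/album ~/videos/documentary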
Like I said, I only had to buy one drive so far, because I'm weird and I had a bunch of three-terabyte drives available. If I had to buy drives, I probably would have tried to start larger. I am sure that at some point in the not-too-distant future, when I am running the Indie Archive and not developing it, I will have to upgrade my drives.
The primary system is the console for the Indie Archive. When you copy a project onto the Indie Archive, the directory goes into the primary files directory. From there, it is propagated out to the primary snapshots directory, the secondary system, the cloud storage if you are using it, and eventually to the remote system.
All of the data propagation is done with rsync, using the archive setting that is designed to preserve file metadata like owner, permissions, and date last modified. So I have been using rsync with the archive setting to move the files from the work system to a USB drive, and from the USB drive to the primary files folder.
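A minimal sketch of that two-hop copy, assuming hypothetical paths (a project called novel and a USB drive mounted at /media/usb):

    # Work system -> USB drive, with -a (archive) preserving permissions and timestamps
    rsync -a --progress ~/projects/novel/ /media/usb/novel/

    # Primary system: USB drive -> primary files directory, run as root so ownership is kept
    sudo rsync -a --progress /media/usb/novel/ /indiearchive/primaryfiles/novel/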
At first I thought I would use an optical disc to move the files, but optical discs do not preserve file metadata. Also, I had some weird results with the USB flash drive because it was formatted FAT32. FAT32 does not support Linux metadata, so if you are going to move projects over on a flash drive or a USB external drive, be sure to format it to ext4.
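Formatting a transfer drive to ext4 might look like this; /dev/sdX1 is a placeholder, so check the device name carefully before running it:

    # Identify the USB drive first -- formatting erases everything on it
    lsblk
    sudo mkfs.ext4 -L indiexfer /dev/sdX1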
Another way to move projects over to the primary files directory is with tar compression. This preserves metadata when the files are extracted, so this might be easier, and it works with optical drives. If your directory will fit on an optical disc, this also gives you another backup on another medium. If you have any suggestions on how to transfer projects while preserving the file metadata, let me know.
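A sketch of the tar route, again with made-up paths; extracting as root restores the stored ownership as well as permissions and timestamps:

    # On the work system: pack the project, keeping paths relative to ~/projects
    tar -czf novel.tar.gz -C ~/projects novel

    # On the primary system: extract with -p to preserve permissions
    sudo tar -xpzf novel.tar.gz -C /indiearchive/primaryfiles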
I know that there are network options available, but I am hesitant to recommend them, because if I can transfer files from a system to the primary system over the LAN, then anyone can do the same. Or delete files, or accidentally delete directories. I kind of want to keep tight control over access to the primary system. It kind of ruins the archival quality of the Indie Archive if anyone on the LAN can accidentally mess with it. So I am open to dialogue on these issues. I am kind of worried; I want it to be easy to add projects to the Indie Archive, but not too easy, if you know what I mean. I feel like having to sit down at the primary system and enter a password should be the minimum amount of security required to access the primary system.
The primary system also runs file integrity checks daily from a cron job. All of the propagation and file integrity scripts have to be run as root to preserve the metadata, since only root can write a file that it doesn't own.
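The cron side might look something like this in root's crontab; the script names, paths, and times are placeholders, not the actual Indie Archive scripts:

    # Edit with: sudo crontab -e
    # Propagate at 02:30, run the integrity check at 04:30, appending output to a log
    30 2 * * *  /usr/local/bin/indie-propagate.sh >> /var/log/indiearchive.log 2>&1
    30 4 * * *  /usr/local/bin/indie-check.sh     >> /var/log/indiearchive.log 2>&1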
The secondary system is the SSH server for the Indie Archive. The primary system logs into the secondary system as root using SSH. Security is managed with public and private keys, so entering a password is not required. After the keys are set up for both the primary and remote systems, password authentication is disabled for the SSH server, so only those two systems can SSH into the secondary system.
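A minimal sketch of that key setup, run as whichever user makes the connection (root in this design); the hostname secondary.local is a placeholder:

    # On the primary system (and again on the remote system): create a key pair
    ssh-keygen -t ed25519 -f ~/.ssh/id_indiearchive

    # Install the public key for root on the secondary system
    ssh-copy-id -i ~/.ssh/id_indiearchive.pub root@secondary.local

    # Then, in /etc/ssh/sshd_config on the secondary system:
    #   PasswordAuthentication no
    #   PermitRootLogin prohibit-password
    # and reload or restart the SSH service afterwards.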
When the propagation script is run on the primary system, rsnapshot is used to create a current version of the primary files directory in the primary snapshots directory. Then the primary system uses rsync over SSH to make a copy of the primary files directory to the secondary files directory. Then the primary system logs onto the secondary system as root, and rsnapshot is used to create a current version of the secondary files directory in the secondary snapshots directory.
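Put together, the propagation step might look roughly like this sketch; the config file names, directory paths, and hostname are assumptions, not the published scripts:

    #!/bin/sh
    # 1. Snapshot primary files into the primary snapshots directory
    rsnapshot -c /etc/rsnapshot-primary.conf alpha

    # 2. Mirror primary files to the secondary system, preserving metadata
    rsync -a --delete /indiearchive/primaryfiles/ \
        root@secondary.local:/indiearchive/secondaryfiles/

    # 3. Snapshot the secondary files directory on the secondary system
    ssh root@secondary.local rsnapshot -c /etc/rsnapshot-secondary.conf alpha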
Finally, if cloud storage is being used, the primary system uses gcloud rsync to make a copy of the primary files directory to a Google Cloud Storage archive bucket. I have this bucket set to 90-day soft delete. If you are using another type of cloud storage on Google, AWS, Mega, or other storage providers, this command will have to be adjusted. The reason I chose the gcloud archive bucket is because of the storage cost per gigabyte. They have the cheapest cost per gigabyte that I found. This will keep the monthly bill low.
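The cloud hop might be a single command along these lines; the bucket name is a placeholder, and older gcloud installs would use gsutil rsync -r instead:

    # Mirror primary files into an Archive-class Cloud Storage bucket
    gcloud storage rsync --recursive /indiearchive/primaryfiles gs://example-indie-archive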
Once a day the primary system runs the file integrity check from a cron job, using rsync to compare the primary files directory to the current version, alpha.0, in the primary snapshots directory, logging any discrepancies. It then does the same, comparing primary files to secondary files and to the current version in the secondary snapshots directory, logging discrepancies and notifying the maintainer of any discrepancies. Notification is done by email using curl and an SMTP provider.
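A stripped-down sketch of one such check; the paths, the snapshot layout, and the mail addresses are all placeholders:

    #!/bin/sh
    LOG=/var/log/indiearchive-check.log

    # Dry-run rsync with checksums: itemizes differences without changing anything
    rsync -rcn --delete --itemize-changes \
        /indiearchive/primaryfiles/ \
        /indiearchive/primarysnapshots/alpha.0/primaryfiles/ > "$LOG"

    # If the log is non-empty, mail it to the maintainer via an SMTP provider
    if [ -s "$LOG" ]; then
        curl --url smtps://smtp.example.com:465 \
             --mail-from archive@example.com --mail-rcpt maintainer@example.com \
             --user 'archive@example.com:app-password' --upload-file "$LOG"
    fi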
The remote system runs on its own schedule, logging into the secondary system daily to copy data from secondary files to remote files, and then using rsnapshot to make a copy of remote files in the remote snapshots directory. Since it's run on a daily schedule, it uses rsnapshot with the standard daily, weekly, monthly, and yearly backups. The remote system also runs a daily file integrity check, comparing remote files to the current version in remote snapshots and comparing remote files to both data directories on the secondary system, again logging the results and notifying the maintainer of any discrepancies.
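The remote schedule could be expressed with rsnapshot's standard intervals, roughly like this hypothetical excerpt (rsnapshot requires tabs between fields in its config file):

    # /etc/rsnapshot-remote.conf (excerpt)
    retain	daily	7
    retain	weekly	4
    retain	monthly	12
    retain	yearly	5

    # root's crontab on the remote system
    0 3 * * *	rsnapshot -c /etc/rsnapshot-remote.conf daily
    0 5 * * 1	rsnapshot -c /etc/rsnapshot-remote.conf weekly
    0 7 1 * *	rsnapshot -c /etc/rsnapshot-remote.conf monthly
    0 9 1 1 *	rsnapshot -c /etc/rsnapshot-remote.conf yearly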
If there is an outward-facing static IP at the location with the primary and secondary systems, then the remote system can use that static IP to SSH into the secondary system. If there is not a static IP, then the remote system uses a Duck DNS subdomain to log onto the secondary system. Any system using the same router as the secondary system can run a cron job to update Duck DNS with the current IP address. Since a static IP is a monthly expense, it's important that there's an alternative that does not require paying another bill.
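The Duck DNS update is just a small cron job hitting their documented update URL; the subdomain and token here are placeholders:

    # On any machine behind the same router as the secondary system, in a crontab:
    */5 * * * *  curl -s "https://www.duckdns.org/update?domains=myarchive&token=YOUR-TOKEN&ip=" >/dev/null 2>&1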
So the secondary system has the SSH server, but it doesn't really do much. Both of the other systems connect to it and use it as a junction for data propagation and file integrity checks.
So as you can tell, there's a lot going on to make the Indie Archive work. Future podcasts will get down into the details and discuss some of the choices I had to make and why I made them. The funny thing about this project is that the actual code was the least amount of work. Figuring out exactly how rsync and rsnapshot work together was quite a bit of work. The configuration for both rsnapshot and SSH took a bit of head scratching. Then there were a few user ID tricks I had to work through to make the Indie Archive usable. But by far, the most work was writing the Indie Archive installation document, detailing each step of installing the software on three systems.

It's been fun so far. If you have input, I always appreciate the help. I get quite a bit of help on Mastodon. If you go to home.gamerplus.org, you'll find the script for this podcast with the Mastodon comment thread embedded in the post. This podcast is being read from a document that is a work in progress. Current versions of the What Is The Indie Archive document will be posted at Codeberg when I'm ready to upload the project. Thanks for listening.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International license.