Episode: 4312
|
|
Title: HPR4312: What Is The Indie Archive?
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4312/hpr4312.mp3
|
|
Transcribed: 2025-10-25 22:49:51
|
|
|
|
---
|
|
|
|
This is Hacker Public Radio Episode 4312 for Tuesday 11 February 2025.
|
|
Today's show is entitled, What is the Indie Archive?
|
|
It is part of the series Programming 101.
|
|
It is hosted by Hairy Larry, and is about 16 minutes long.
|
|
It carries a clean flag.
|
|
The summary is: The Indie Archive is an archival solution for indie producers.
|
|
What is the Indie Archive?
|
|
I'm Hairy Larry, and you're listening to the Plain Text Programs podcast.
|
|
The Indie Archive is an archival solution for Indie producers.
|
|
Since most indie producers run on a shoestring budget, it's important that the Indie Archive
|
|
is inexpensive to install and run.
|
|
It's especially important that monthly expenses are minimal, because a reasonable expense
|
|
most months will sometimes be more than an Indie producer can afford during some months.
|
|
The first major constraint is cost.
|
|
So I'll be talking about prices a lot in this podcast, and get more technical in future
|
|
podcasts about the Indie Archive.
|
|
Indie Archive is an archival system which is different than a backup system.
|
|
If you don't have a backup system, do that first.
|
|
My backup system uses the same tools as Indie Archive, rsync and rsnapshot.
|
|
My brother uses the online backup service Carbonite.
|
|
There are many other options.
|
|
A good backup system runs automatically to back up everything frequently and preserve
|
|
version history.
|
|
It's also good to have backups off-site.
|
|
An archival system, like Indie Archive, keeps multiple redundant copies across several
|
|
hard drives on several systems in multiple locations.
|
|
An archival system also checks file integrity as protection against file corruption or
|
|
user error.
|
|
When you have a project, you really never want to lose, like a finished novel, a music
|
|
album, a video, or any other major effort that involves significant work.
|
|
That's when you need an archival system.
|
|
So the Indie Archive does not automatically back up your projects every day.
|
|
That's what your backup system should do.
|
|
The Indie Archive is an archival system where the producer of the content decides what
|
|
needs to be archived and when it needs to be archived, and then manually moves the directory
|
|
containing the files onto the Indie Archive, carefully preserving the files' metadata during
|
|
the transfer.
|
|
Then these files are propagated over at least seven hard drives on four different systems
|
|
in three locations.
|
|
File integrity checks are run daily, comparing the files and reporting discrepancies.
|
|
Two of the systems are kept in the studio where the content is produced.
|
|
I call them the primary and secondary systems.
|
|
They have a boot drive and two data drives each.
|
|
One of the systems is kept off-site at a nearby location.
|
|
I call it the remote system.
|
|
It also has a boot drive and two data drives.
|
|
If you have a more distant location where you can put a second remote system, you can
|
|
have remote near and remote far systems.
|
|
Otherwise, the final system is somewhere in the cloud, provided by a professional data
|
|
storage provider.
|
|
It has a single copy of the data and usually some additional data retention.
|
|
The provider makes the backups of this data.
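The layout described above can be checked with a little arithmetic. This is only a sketch: the system names come from the episode, but the dictionary structure and the location labels are assumptions made for illustration. It counts the copies of the data, the systems, and the locations.

```python
# Illustrative layout only: the system names follow the episode, but this
# dictionary structure and the location labels are assumptions.
SYSTEMS = {
    "primary":   {"location": "studio", "data_copies": 2},           # files drive + snapshot drive
    "secondary": {"location": "studio", "data_copies": 2},
    "remote":    {"location": "nearby off-site", "data_copies": 2},
    "cloud":     {"location": "cloud provider", "data_copies": 1},   # single copy
}


def summarize(systems):
    """Return (data copies, system count, location count) for a layout."""
    copies = sum(s["data_copies"] for s in systems.values())
    locations = {s["location"] for s in systems.values()}
    return copies, len(systems), len(locations)


copies, system_count, location_count = summarize(SYSTEMS)
print(f"{copies} copies on {system_count} systems in {location_count} locations")
# prints "7 copies on 4 systems in 3 locations"
```

If you swap the cloud provider for a second remote system with two data drives, the same function reports eight copies instead of seven.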
|
|
This is the part that might involve a monthly bill.
|
|
So depending on the size of your file set, it could be free or it could cost so much
|
|
a month.
|
|
There are a lot of options for cloud storage providers.
|
|
But first, I'm going to discuss the three systems, primary, secondary, and remote, and
|
|
how they function.
|
|
As far as the hardware goes, the systems are the same.
|
|
Now I'm a Linux guy and I do all my production work on Linux, so I'm using Linux.
|
|
I want to test the system on several versions of Linux and with BSD.
|
|
I'm not a Mac guy or a Windows guy, so I won't be going there.
|
|
The software is open source and the required programs run on all three platforms, so I'll
|
|
let a Mac or Windows programmer test the Indie Archive for their systems.
|
|
My guess is that the Mac fork will be easier than the Windows fork because of the file metadata.
|
|
It might even be possible to add Mac folders to the Indie Archive running Linux, but
|
|
I'll let someone who actually has a Mac figure that out.
|
|
I don't think the same is true for Windows.
|
|
Windows file metadata is different, so if you want to preserve the metadata, you will
|
|
probably have to install Indie Archive on Windows systems.
|
|
So I'm developing and deploying on Linux and I will also test on BSD.
|
|
So far I have tested Debian, Ubuntu, FreeBSD, MidnightBSD, and Xubuntu, and the Indie
|
|
Archive works fine on all of these operating systems.
|
|
So back to the hardware, pretty much any older system that will support at least three
|
|
SATA drives will work.
|
|
I'm using older business desktops, Dell and HP.
|
|
I pulled mine out of storage, but they are very inexpensive to buy if you're not like
|
|
me with a shed full of old computer stuff.
|
|
I just bought a small form factor HP desktop on eBay for $30, including tax and shipping.
|
|
To clarify, it's best if the primary system supports four SATA drives.
|
|
The secondary and remote systems do not need an optical drive, so they should support
|
|
three SATA drives, but they can be run on two SATA drives if you boot from the primary
|
|
file drive.
|
|
I'm currently testing a remote system with two SATA drives running MidnightBSD.
|
|
The Dell desktops make a big deal about being green.
|
|
I am open to suggestions on what would be the best energy efficient systems for the Indie
|
|
Archive because of both the cost of electricity and the impact on the environment.
|
|
There are three drives on each system, a boot drive and two data drives.
|
|
The boot drives can be SSD or spinning hard drives and need to be big enough to hold
|
|
the OS comfortably.
|
|
The data drives need to be large enough to hold the files you want to archive and they
|
|
should be high quality spinning drives.
|
|
I use the multi terabyte HGST drives and I am also looking at some Dell drives made
|
|
by HGST.
|
|
There will be a data drive and a snapshot drive on each system.
|
|
If they are not the same size, the snapshot drives should be larger.
|
|
I am testing with three terabyte data drives and four terabyte snapshot drives.
|
|
Besides the main data set that is being archived, the snapshot drives also hold the version
|
|
history of the files that have been deleted or changed, so that's why they should be
|
|
the larger drive.
|
|
So my primary system has a primary files directory with a three terabyte drive
|
|
mounted to it and a primary snapshot directory with a four terabyte drive mounted to it.
|
|
Same for the secondary and remote systems.
|
|
Now, so far I only had to buy one drive, but generally speaking, the six data drives
|
|
will be the major expense in assembling the systems.
|
|
So a good bargain on six four terabyte drives could be $120 used or $270 new and this
|
|
is the most expensive part.
|
|
I install used HGST drives all the time and rarely have problems with them.
|
|
I have worked for clients who won't buy used, only new.
|
|
Since the file integrity checks should give early warning of drive failure and since there
|
|
is a seven-drive redundancy on the data files, if I were buying drives for the Indie Archive,
|
|
I'd go with six used four terabyte HGST drives for $120.
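To put those prices side by side, here is the arithmetic as a short sketch. The dollar figures and drive sizes are the ones quoted in the episode; the function names are made up for illustration.

```python
DRIVE_COUNT = 6      # two data drives on each of three systems
TB_PER_DRIVE = 4
USED_TOTAL = 120     # dollars for six used 4 TB drives (figure from the episode)
NEW_TOTAL = 270      # dollars for six new 4 TB drives (figure from the episode)


def per_drive(total, count=DRIVE_COUNT):
    """Cost of a single drive in dollars."""
    return total / count


def per_terabyte(total, count=DRIVE_COUNT, tb=TB_PER_DRIVE):
    """Cost per terabyte of raw drive capacity in dollars."""
    return total / (count * tb)


print(per_drive(USED_TOTAL), per_drive(NEW_TOTAL))        # 20.0 45.0
print(per_terabyte(USED_TOTAL), per_terabyte(NEW_TOTAL))  # 5.0 11.25
```

So the used drives come to about $20 each and the new ones to about $45 each, a difference of $150 across the whole set.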
|
|
There is no reason not to use drives all the same size as long as the snapshot drives are
|
|
large enough.
|
|
The size of data drives you need depends on the size of your projects and the time it takes
|
|
to do a project.
|
|
Look at your hard drives on your working systems.
|
|
Think about what directories you would like to see in archival storage.
|
|
What is the total size of these directories?
|
|
Check how many gigabytes these projects have consumed in the last year.
|
|
Think forward a few years.
|
|
Assume you will use more disk space in the future than you are now.
|
|
Do some quick arithmetic and make a decision.
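One way to do that quick arithmetic is sketched below. The growth model is an assumption, not something from the episode: it takes what you archive today, what you added last year, and assumes each following year adds more than the year before by a fixed factor.

```python
def estimate_archive_gb(current_gb, added_last_year_gb, years, yearly_increase=1.25):
    """Project the archive size in GB after a number of years.

    yearly_increase is a hypothetical factor: 1.25 means each year adds
    25 percent more data than the year before.
    """
    total = current_gb
    yearly = added_last_year_gb
    for _ in range(years):
        yearly *= yearly_increase   # assume usage grows year over year
        total += yearly
    return total


# Example numbers only: 1000 GB today, 200 GB added last year, 3 years out.
print(estimate_archive_gb(1000, 200, 3, yearly_increase=1.5))  # 2425.0
```

With those example numbers a three terabyte data drive would still fit, but without much headroom, so it would be worth starting larger.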
|
|
Like I said, I only had to buy one drive so far because I'm weird and I had a bunch of
|
|
three terabyte drives available.
|
|
If I had to buy drives, I probably would have tried to start larger.
|
|
I am sure that at some point in the not too distant future when I am running the Indie
|
|
Archive and not developing it, I will have to upgrade my drives.
|
|
The primary system is the console for the Indie Archive.
|
|
When you copy a project onto the Indie Archive, the directory goes into the primary files
|
|
directory.
|
|
From there, it is propagated out to the primary snapshot directory, the secondary system,
|
|
the cloud storage if you are using it and eventually to the remote system.
|
|
All of the data propagation is done with rsync using the archive setting that is designed
|
|
to preserve file metadata like owner, permissions, and date last modified.
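As a rough sketch of what the archive setting looks like in practice, the snippet below builds an rsync command with the -a (archive) option. The paths are hypothetical, and the helper functions are made up for this example; by default the command is only printed, not executed.

```python
import subprocess


def rsync_archive_command(source_dir, dest_dir):
    """Build an rsync command that copies source_dir's contents into dest_dir.

    -a is rsync's archive mode, which preserves permissions, modification
    times, and (when run as root) owner and group. The trailing slash on
    the source means "the contents of this directory".
    """
    return ["rsync", "-a", source_dir.rstrip("/") + "/", dest_dir]


def run(command, dry_run=True):
    """Print the command, and only execute it when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


# Hypothetical paths, for illustration only.
run(rsync_archive_command("/mnt/usb/my-album", "/primary-files/my-album"))
```

Keeping the command as a list of arguments, rather than one shell string, avoids quoting problems when a project directory has spaces in its name.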
|
|
So I have been using rsync with the archive setting to move the files from the work system
|
|
to a USB drive and from the USB drive to the primary files folder.
|
|
At first I thought I would use an optical disk to move the files, but optical disks do
|
|
not preserve file metadata.
|
|
Also I had some weird results with the USB flash drive because it was formatted FAT32.
|
|
FAT32 does not support Linux metadata, so if you are going to move projects over on a
|
|
flash drive or a USB external drive, be sure to format to ext4.
|
|
Another way to move projects over to the primary files directory is with tar compression.
|
|
This preserves metadata when the files are extracted, so this might be easier and it works
|
|
with optical drives.
|
|
If your directory will fit on an optical drive, this also gives you another backup on another
|
|
media.
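To see that a tar archive really does carry the metadata along, here is a small self-contained sketch using Python's tarfile module rather than the tar command itself. The file name, mode, and timestamp are arbitrary test values. It archives a file, extracts it somewhere else, and reports the permissions and modification time of the restored copy.

```python
import os
import stat
import tarfile
import tempfile


def tar_round_trip(mode=0o640, mtime=1_700_000_000):
    """Archive one file with tar, extract it elsewhere, and return its metadata."""
    with tempfile.TemporaryDirectory() as work:
        project = os.path.join(work, "project")
        os.mkdir(project)
        original = os.path.join(project, "notes.txt")
        with open(original, "w") as handle:
            handle.write("final draft\n")
        os.chmod(original, mode)
        os.utime(original, (mtime, mtime))  # set access and modification times

        archive = os.path.join(work, "project.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(project, arcname="project")

        restored_root = os.path.join(work, "restored")
        os.mkdir(restored_root)
        # Newer Python versions want an explicit extraction filter.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(restored_root, **kwargs)

        info = os.stat(os.path.join(restored_root, "project", "notes.txt"))
        return stat.S_IMODE(info.st_mode), int(info.st_mtime)


restored_mode, restored_mtime = tar_round_trip()
print(oct(restored_mode), restored_mtime)
```

This sketch only checks permissions and modification time. Restoring the original owner also requires extracting as root, and the "data" filter used here deliberately ignores ownership.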
|
|
If you have any suggestions on how to transfer projects while preserving the file metadata,
|
|
let me know.
|
|
I know that there are network options available, but I am hesitant to recommend them because
|
|
if I can transfer files from a system to the primary system over the LAN, then anyone
|
|
can do the same.
|
|
Or delete files, or accidentally delete directories.
|
|
I kind of want to keep tight control over access to the primary system.
|
|
It kind of ruins the archival quality of the Indie Archive if anyone on the LAN can
|
|
accidentally mess with it.
|
|
So I am open to dialogue on these issues.
|
|
I am kind of worried. I want it to be easy to add projects to the Indie Archive, but not
|
|
too easy if you know what I mean.
|
|
I feel like having to sit down at the primary system and enter a password should be the minimum
|
|
amount of security required to access the primary system.
|
|
The primary system also runs file integrity checks daily from a cron job.
|
|
All of the propagation and file integrity scripts have to be run as root to preserve the
|
|
metadata since only root can write a file that it doesn't own.
|
|
The secondary system is the SSH server for the Indie Archive.
|
|
The primary system logs onto the secondary system as root using SSH.
|
|
Security is managed with public and private keys, so entering a password is not required.
|
|
After the keys are set up for both the primary and remote systems, password authentication
|
|
is disabled for the SSH server, so only those two systems can SSH into the secondary system.
|
|
When the propagation script is run on the primary system, rsnapshot is used to create
|
|
a current version of the primary files directory in the primary snapshots directory.
|
|
Then the primary system uses rsync over SSH to make a copy of the primary files directory
|
|
to the secondary files directory.
|
|
Then the primary system logs onto the secondary system as root, and rsnapshot is used
|
|
to create a current version of the secondary files directory on the secondary snapshots directory.
|
|
Finally, if cloud storage is being used, the primary system uses gcloud rsync to make
|
|
a copy of the primary files directory to a Google Cloud Storage bucket archive.
|
|
I have this bucket set to 90 days soft delete.
|
|
If you are using another type of cloud storage on Google, AWS, Mega, or other storage providers,
|
|
this command will have to be adjusted.
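The propagation steps just described can be laid out in order as a sketch. Everything specific here is an assumption: the host name, the paths, the bucket name, and the exact flags are placeholders, and the rsnapshot level name "alpha" is inferred from the alpha.0 snapshot directory mentioned in this episode. It also assumes rsnapshot is already configured on each system, and the commands should be checked against each tool's own documentation before use. The actual Indie Archive script may well differ.

```python
# Hypothetical host name, paths, and bucket; substitute your own.
SECONDARY = "root@secondary"
PRIMARY_FILES = "/primary-files/"
SECONDARY_FILES = "/secondary-files/"
BUCKET = "gs://example-indie-archive"


def propagation_commands(use_cloud=True):
    """Return the propagation steps, in order, as argument lists."""
    steps = [
        # 1. Snapshot primary files into the primary snapshots directory.
        ["rsnapshot", "alpha"],
        # 2. Copy primary files to the secondary system over SSH.
        ["rsync", "-a", "-e", "ssh", PRIMARY_FILES, f"{SECONDARY}:{SECONDARY_FILES}"],
        # 3. Snapshot secondary files on the secondary system.
        ["ssh", SECONDARY, "rsnapshot", "alpha"],
    ]
    if use_cloud:
        # 4. Copy primary files to the cloud storage bucket.
        steps.append(["gcloud", "storage", "rsync", "--recursive", PRIMARY_FILES, BUCKET])
    return steps


for step in propagation_commands():
    print(" ".join(step))
```

Only the last step changes if you use a different storage provider, which is why it is kept separate behind the use_cloud switch.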
|
|
The reason I chose the gcloud archive bucket is because of the storage cost per gigabyte.
|
|
They have the cheapest cost per gigabyte that I found.
|
|
This will keep the monthly bill low.
|
|
Once a day the primary system runs the file integrity check from a cron job using rsync
|
|
to compare the primary files directory to the current version alpha.0 in the primary snapshots directory
|
|
logging any discrepancies.
|
|
It then does the same comparing primary files to secondary files and to the current version
|
|
in the secondary snapshots directory,
|
|
logging discrepancies and notifying the maintainer of any discrepancies.
|
|
Notification is done by email using curl and an SMTP provider.
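The idea behind the integrity check can be illustrated without rsync. The sketch below is a swapped-in technique, not the Indie Archive's own method: it hashes every file in two directory trees with SHA-256 and lists anything that is missing, extra, or different. The function names are made up for this example.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            sha = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large media files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(path, root)] = sha.hexdigest()
    return digests


def find_discrepancies(reference_root, copy_root):
    """Return one message per problem file, ordered by relative path."""
    reference, copy = hash_tree(reference_root), hash_tree(copy_root)
    problems = []
    for rel in sorted(reference.keys() | copy.keys()):
        if rel not in copy:
            problems.append(f"missing in copy: {rel}")
        elif rel not in reference:
            problems.append(f"extra in copy: {rel}")
        elif reference[rel] != copy[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

Calling find_discrepancies with the files directory and the current snapshot directory returns an empty list when the two match. Any messages it does return are what would go into the log and the notification email. Note that this compares file contents only, not the metadata.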
|
|
The remote system runs on its own schedule,
|
|
logging into the secondary system daily to copy data from secondary files to remote files
|
|
and then using rsnapshot to make a copy of remote files to the remote snapshots directory.
|
|
Since it's run on a daily schedule, it uses rsnapshot with the standard daily, weekly, monthly, and yearly backups.
|
|
The remote system also runs a daily file integrity check comparing remote files to the current version
|
|
on remote snapshots and comparing remote files to both data directories on the secondary system
|
|
again logging the results and notifying the maintainer of any discrepancies.
|
|
If there is an outward facing static IP at the location with the primary and secondary
|
|
systems, then the remote system can use that static IP to SSH into the secondary system.
|
|
If there is not a static IP, then the remote system uses a Duck DNS subdomain to log
|
|
onto the secondary system.
|
|
Any system using the same router as the secondary system can run a cron job to update Duck DNS
|
|
with the current IP address.
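As a sketch of what such a cron job could run, the snippet below builds the Duck DNS update URL and, if asked, calls it. The subdomain and token are placeholders, and the function names are made up for this example. Leaving the ip parameter empty asks Duck DNS to use the address the request came from.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def duckdns_update_url(subdomain, token):
    """Build the Duck DNS update URL; an empty ip lets Duck DNS detect it."""
    query = urlencode({"domains": subdomain, "token": token, "ip": ""})
    return f"https://www.duckdns.org/update?{query}"


def update_duckdns(subdomain, token):
    """Call the update URL and return the response body ("OK" on success)."""
    with urlopen(duckdns_update_url(subdomain, token), timeout=30) as response:
        return response.read().decode().strip()


# Placeholder values; use your own subdomain and the token from your account.
print(duckdns_update_url("example-archive", "your-token-here"))
# prints "https://www.duckdns.org/update?domains=example-archive&token=your-token-here&ip="
```

The token works like a password for the subdomain, so the script that holds it should be readable only by the user that runs the cron job.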
|
|
Since a static IP is a monthly expense, it's important that there's an alternative that
|
|
does not require paying another bill.
|
|
So the secondary system has the SSH server, but it doesn't really do much.
|
|
Both of the other systems connect to it and use it as a junction for data propagation
|
|
and file integrity checks.
|
|
So as you can tell, there's a lot going on to make the Indie Archive work.
|
|
Future podcasts will get down into the details and discuss some of the choices I had to
|
|
make and why I made them.
|
|
The funny thing about this project is that the actual code was the least amount of work.
|
|
Figuring out exactly how rsync and rsnapshot work together was quite a bit of work.
|
|
The configuration for both rsnapshot and SSH took a bit of head scratching.
|
|
Then there were a few user ID tricks I had to work through to make the Indie Archive usable.
|
|
But by far, the most work was writing the Indie Archive installation document detailing
|
|
each step of installing the software on three systems.
|
|
It's been fun so far.
|
|
If you have input, I always appreciate the help.
|
|
I get quite a bit of help on Mastodon.
|
|
If you go to home.gamerplus.org, you'll find the script for this podcast with the Mastodon
|
|
comment thread embedded in the post.
|
|
This podcast is being read from a document that is a work in progress.
|
|
Current versions of the What Is The Indie Archive document will be posted at Codeberg when
|
|
I'm ready to upload the project.
|
|
Thanks for listening.
|
|
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
|
|
Today's show was contributed by an HPR listener like yourself.
|
|
If you ever thought of recording a podcast, then click on our contribute link to find out
|
|
how easy it really is.
|
|
Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive and
|
|
rsync.net.
|
|
Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International
|
|
License.
|