Episode: 4341
Title: HPR4341: Transferring Large Data Sets
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4341/hpr4341.mp3
Transcribed: 2025-10-25 23:20:05

---

This is Hacker Public Radio Episode 4341 for Monday 24 March 2025. Today's show is entitled Transferring Large Data Sets. It is part of the series Programming 101. It is hosted by Harry Larry and is about 11 minutes long. It carries a clean flag. The summary is: how to transfer large data sets using tar and Blu-ray discs while preserving metadata.

Transferring Large Data Sets

Very large data sets present their own problems. Not everyone has directories with hundreds of gigabytes of project files, but I do, and I assume I'm not the only one. For instance, I have a directory with over 700 radio shows. Many of these directories also have a podcast, and they also have pictures and text files. Doing a properties check on the directory, I see 450 gigabytes of data.

When I started envisioning Libre Indie Archive, I wanted to move directories into archival storage using optical drives. My first attempt at this didn't work because I lost metadata when I wrote the optical discs, since optical discs are read-only. After further work and study, I learned that tar files can preserve metadata if they are created and extracted as root. In fact, if you're running tar as root, preserving file ownership and permissions is the default.
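
As a small illustration of the point about metadata, the sketch below round-trips one file through a tar archive and compares the permission string before and after. The scratch directory and file names are made up for the example. It only checks permissions, and it passes -p so the mode survives even without root; restoring ownership as well would need root, as described above.

```shell
#!/bin/sh
# Round-trip a file through tar and compare its permission string.
# All names here are hypothetical scratch names, not part of the original workflow.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "show notes" > "$work/src/notes.txt"
chmod 640 "$work/src/notes.txt"

# Create the archive from inside src so the member name is just notes.txt.
tar -cf "$work/demo.tar" -C "$work/src" notes.txt

# -p keeps the stored permissions on extraction (root gets this by default).
tar -xpf "$work/demo.tar" -C "$work/dest"

# The first ten characters of ls -l are the file type and permission bits.
before=$(ls -l "$work/src/notes.txt" | cut -c1-10)
after=$(ls -l "$work/dest/notes.txt" | cut -c1-10)
echo "before: $before"
echo "after:  $after"
```

If the two strings differ, the archive was probably extracted without -p by a non-root user, in which case the umask is applied to the extracted files.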

So this means that optical drives are an option if you write tar archives onto the optical discs. I have better success rates with 25 gigabyte Blu-ray discs than with the 50 gigabyte discs. So if your directory breaks up into projects that fit on 25 gigabyte discs, that's great.

My data did not do this easily, but tar does have an option to write a dataset to multiple tar files, each with a maximum size, labeling them -0, -1, etc. When using this multi-volume feature, you cannot use compression, so you will get .tar files, not .tar.gz files.

It's better to break the file sets up into more reasonable sizes, so I decided to divide the shows up alphabetically by title, so all the shows starting with the letter A would be one dataset, and then down the alphabet one letter at a time. Most of the letters would result in a single tar file, labeled -0, that would fit on a 25 gigabyte disc. Many letters, however, took two or even three tar files that would have to be written on different discs and then concatenated on the primary system before they are extracted to the correct location in primary files.
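
To get a rough idea in advance of how many tar files a letter will need, the arithmetic can be sketched as below. The function name and the sample sizes are hypothetical. The 20000000 figure is the tape length used later in this episode, counted in units of 1024 bytes. The result is only an estimate, because tar adds some header overhead of its own.

```shell
#!/bin/sh
# volumes_needed SIZE_KIB TAPE_LENGTH_KIB
# Prints how many volumes a dataset of SIZE_KIB would need, rounded up.
volumes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical sizes in KiB, for example taken from `du -sk` on one letter's shows.
volumes_needed 15000000 20000000   # fits in one volume
volumes_needed 45000000 20000000   # needs three volumes
```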

There is a companion program to tar called tarcat that I used to combine two or three tar files split by length into a single tar file that could be extracted. I ran Engrampa as root to extract the files.

So I used a tar command on the working system where my Something Blue radio shows are stored. Then I used K3b to burn these files onto a 25 gigabyte Blu-ray disc, carefully labeling the disc and writing a text file that I used to keep track of which files I had already copied to disc. Then, on the Libre Indie Archive primary system, I copied the file or files for that dataset from the Blu-ray to the boot drive. Then I would use tarcat to combine the files if there was more than one file for that dataset. And finally, I would extract the files to primary files by running Engrampa as root.

Now I'm going into details on each of these steps.

First, make sure that the Libre Indie Archive program prep.sh is in your home directory on your workstation. Then, from the data directory to be archived, in my case the something_blue directory, run prep.sh like this: ~/prep.sh. This will create a file named IA_Origin.txt that lists the date, the computer and directory being archived, and the users and user IDs on that system. This is all very helpful information to have if, sometime in the future, you need to do a restore.
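
The real prep.sh is not shown in this episode, so the following is only a hypothetical stand-in that records the same kinds of information: the date, the computer, the directory, and the users with their user IDs. The output location and the layout are invented for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for the kind of record described above.
# This is NOT the real prep.sh; the output file and its layout are invented.
out=$(mktemp)
{
  echo "date: $(date)"
  echo "computer: $(uname -n)"
  echo "directory: $(pwd)"
  echo "users and user IDs:"
  # Fields 1 and 3 of /etc/passwd are the user name and the numeric user ID.
  awk -F: '{ print "  " $1, $3 }' /etc/passwd
} > "$out"
cat "$out"
```

The user IDs are worth keeping because file ownership in a tar archive refers to the users of the system that created it, and a restore may happen on a machine where those users are numbered differently.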

Next, create a tar dataset for each letter of the alphabet. You may want to divide your dataset in a different way. Open a terminal in the directory that contains the data directory, in my case the something_blue directory, so that ls displays something_blue, your data directory. I keep the Something Blue shows and podcasts in subdirectories in the something_blue directory.

Here's the tar command:

sudo tar -cv --tape-length=20000000 --file=somethingblue-a-{0..50}.tar /home/larry/delta/something_blue/a*

This is for the letter a, so the file parameter includes the letter a. The numbers {0..50} in the curly brackets are the sequence numbers for the files. I only had one file for the letter a, somethingblue-a-0.tar. The last parameter is the source for the tar files, in this case /home/larry/delta/something_blue/a*: all of the files and directories in the something_blue directory that start with the letter a.
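
Since the same command is repeated for every letter, a dry run that only prints the commands can help catch typos before anything is written. This is a sketch: the function name is hypothetical, and the path and file names are taken from the spoken command, so treat their exact spelling as an assumption.

```shell
#!/bin/sh
# Print (do not run) the tar command for a given letter.
# The braces stay literal inside the quotes; bash expands {0..50} into
# 51 separate --file options when the printed command is actually run.
tar_command_for_letter() {
  echo "sudo tar -cv --tape-length=20000000 --file=somethingblue-$1-{0..50}.tar /home/larry/delta/something_blue/$1*"
}

for letter in a b c; do
  tar_command_for_letter "$letter"
done
```

Printing rather than executing keeps sudo out of the loop, so each command can be reviewed and pasted one at a time.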

You may want to change the --tape-length parameter. As listed, 20,000,000 units of 1,024 bytes, it stores up to 19.1 gigabytes in each tar file. The maximum capacity of a 25 gigabyte Blu-ray is 23.3 gigabytes for data storage.
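
The 19.1 figure can be checked with a little arithmetic, sketched below. GNU tar counts --tape-length in units of 1024 bytes when no suffix is given, and the gigabytes quoted here are binary gigabytes (GiB). The second calculation is an assumption-laden upper bound based on the 23.3 figure above; in practice it is worth leaving some room for the disc's file system.

```shell
#!/bin/sh
# --tape-length counts units of 1024 bytes when no suffix is given.
tape_length=20000000
bytes=$((tape_length * 1024))

# Convert to binary gigabytes (GiB), which matches the 19.1 figure.
gib=$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }')
echo "$tape_length units = $bytes bytes = $gib GiB"

# Largest whole tape length that stays under the 23.3 GiB quoted for a disc.
max_units=$(awk 'BEGIN { printf "%d", 23.3 * 1024 * 1024 }')
echo "upper bound for a 23.3 GiB disc: $max_units"
```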

Example B. For the letter b, I ended up with three tar files: somethingblue-b-0.tar, somethingblue-b-1.tar, and somethingblue-b-2.tar. I will use these files in the example below, using tarcat to combine the files.

I use K3b to burn Blu-ray data discs. Besides installing K3b, you have to install some other programs. Then there is a particular setup that needs to be done, including selecting cdrecord and no multisession. Here's an excellent article that will go step by step through the installation and setup: "How to burn Blu-ray discs on Ubuntu and derivatives using K3b", and the link is with the show notes. I also always check Verify data, and I use the Linux/Unix file system, not Windows, which will rename your files if the file names are too long.

I installed a Blu-ray reader into the primary system, and I used Thunar to copy the files from the Blu-ray disc to the boot drive.
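
An optional extra check, which is not part of the workflow described in this episode, is to compare each copied file against the one on the disc. The sketch below uses the POSIX cksum tool; the function name and the scratch files are hypothetical stand-ins for a tar file on the disc and its copy on the boot drive.

```shell
#!/bin/sh
# Compare a copied file with its source using POSIX cksum.
# Prints "match" or "MISMATCH"; both paths are supplied by the caller.
compare_copy() {
  src_sum=$(cksum < "$1")
  dst_sum=$(cksum < "$2")
  if [ "$src_sum" = "$dst_sum" ]; then
    echo "match"
  else
    echo "MISMATCH"
  fi
}

# Demonstration with scratch files standing in for a disc and the boot drive.
work=$(mktemp -d)
echo "pretend tar volume" > "$work/on_disc.tar"
cp "$work/on_disc.tar" "$work/on_boot_drive.tar"
compare_copy "$work/on_disc.tar" "$work/on_boot_drive.tar"
```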

In the primary files directory, I made a subdirectory, something_blue, to hold the archived shows.

If there is only one file, as in example A above, you can skip the concatenation step. If there is more than one file, as in example B above, you use tarcat to concatenate these files into one tar file. You have to do this. If you try to extract from just one of the numbered files when there is more than one, you will get an error. So if I try to extract from somethingblue-b-0.tar and I get an error, it doesn't mean that there's anything wrong with that file. It just has to be concatenated with the other b files before it can be extracted.

There is a companion program to tar, called tarcat, that should be used to concatenate the tar files. Here's the command I used for example B above:

tarcat somethingblue-b-0.tar somethingblue-b-1.tar somethingblue-b-2.tar > sb-b.tar

This will concatenate the three smaller tar files into one bigger tar file named sb-b.tar.
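
The order of the files matters, and a shell wildcard would sort a -10 file ahead of -2 if a dataset ever grew that large. The sketch below builds the tarcat command by counting up from 0 until a file is missing, and only prints it. The function name is hypothetical, and it assumes file names without spaces.

```shell
#!/bin/sh
# build_tarcat_command PREFIX OUTPUT
# Collects PREFIX-0.tar, PREFIX-1.tar, ... in numeric order until one is
# missing, then prints the tarcat command without running it.
build_tarcat_command() {
  n=0
  volumes=""
  while [ -f "$1-$n.tar" ]; do
    volumes="$volumes $1-$n.tar"
    n=$((n + 1))
  done
  echo "tarcat$volumes > $2"
}

# Demonstration with empty placeholder files in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
touch somethingblue-b-0.tar somethingblue-b-1.tar somethingblue-b-2.tar
build_tarcat_command somethingblue-b sb-b.tar
```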

In order to preserve the metadata, you have to extract the files as root. To make it easier to select the files to be extracted and where to store them, I used the GUI archive manager Engrampa. To run Engrampa as root, open a terminal with Ctrl+Alt+T and use this command:

sudo -H engrampa

Click Open and select the tar file to extract. Then follow the path until you are in the something_blue directory and you are seeing the folders and files you want to extract. Type Ctrl+A to select them all. Instead of the something_blue directory, you will go to your own data directory. Then click Extract at the top of the window. Open the directory where you want the files to go, in my case primary files/something_blue. Then click Extract again in the lower right.

After the files are extracted, go to your data directory in primary files and check that the directories and files are where you expect them to be. You can also open a terminal in that directory and type ls -l to review the metadata.
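
For something to compare the ls -l output against, tar can list the metadata stored inside an archive without extracting it. The sketch below shows this on a small scratch archive; the file names are made up for the example.

```shell
#!/bin/sh
# List an archive's stored metadata with tar -tvf, for comparison with ls -l.
work=$(mktemp -d)
echo "show notes" > "$work/notes.txt"
chmod 640 "$work/notes.txt"
tar -cf "$work/demo.tar" -C "$work" notes.txt

# Each line starts with the permission string, followed by the owner and group.
listing=$(tar -tvf "$work/demo.tar")
echo "$listing"
```

On a real dataset, the permissions and owners shown by tar -tvf for the combined tar file should agree with what ls -l shows in the extracted directory.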

When dealing with data at chunk sizes of 20 gigabytes or more, each of these steps takes time. The reason I like using an optical backup to transfer the files from the working system to the Libre Indie Archive is that it gives me an easy-to-store backup that is not on a spinning drive and that cannot be overwritten. Still, optical storage is not perfect either. It's just another belt to go with your suspenders.

Another way to transfer directories into the primary files directory is with SSH over the network. This is not as safe as using optical discs, and it also does not provide the extra snapshot backup. It also takes a long time, but it is not as labor intensive. After I spend some more time thinking about this and testing, I will do a podcast about transferring large data sets with SSH.

Although I am transferring large data sets to move them into archival storage using Libre Indie Archive, there are many other situations where you might want to move a large data set while preserving the metadata. So what I have written about tar files, optical discs, and running Thunar and Engrampa as root is generally applicable.

As always, comments are appreciated. You can comment on Hacker Public Radio or on Mastodon. Visit my blog at home.gamerplus.org, where I will post the show notes and embed the Mastodon thread for comments about this podcast. Thanks.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, you can click on our contribute link to find out how easy it really is. Hosting for HPR has been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International License.