future_upload

Description

This is a Bash script which uploads all shows that have not yet been uploaded to the Internet Archive. It is not possible to skip any shows which are in the pending state, but it is possible to limit the number of shows uploaded in a run; see below.

The script can be found on borg at ~perloid/InternetArchive. It examines the directory /data/IA/uploads, scanning all the files it finds there which conform to the (POSIX extended) regular expression 'hpr[0-9]{4}.*'. It uses these to recognise shows: every time the file name changes from hpraaaa.* to hprbbbb.* it performs checks on show hpraaaa.
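
The fragment below is a minimal sketch (not the real script) of this scanning loop, assuming the directory layout described above; the variable names are illustrative.

# Minimal sketch: walk the upload directory, match the expected file
# names, and act once per show when the hprNNNN prefix changes.
uploaddir=/data/IA/uploads
lastshow=

for path in "$uploaddir"/*; do
    file=${path##*/}
    # Only consider files matching the POSIX ERE 'hpr[0-9]{4}.*'
    [[ $file =~ ^hpr[0-9]{4} ]] || continue
    show=${BASH_REMATCH[0]}        # e.g. 'hpr1234'
    if [[ $show != "$lastshow" ]]; then
        [[ -n $lastshow ]] && echo "Checking show $lastshow"
        lastshow=$show
    fi
done
[[ -n $lastshow ]] && echo "Checking show $lastshow"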

The script determines whether the show is already on the IA. Shows on the IA have names (identifiers in IA terms) which conform to the pattern hpr<number>. Because these searches of the IA are expensive, only newly discovered shows are checked in this way. If a show is already on the IA the identifier is stored in a cache file called .future_upload.dat.
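
Below is a minimal sketch, assuming the 'ia' command-line tool is used to query the archive, of how a show might be checked and cached; the show identifier and the details of the 'ia' invocation are illustrative.

# Minimal sketch: check whether a show already exists on the IA and
# cache its identifier in the file used by the script.
cache=.future_upload.dat
show=hpr1234                       # hypothetical show identifier

if grep -q -x "$show" "$cache" 2>/dev/null; then
    echo "$show already known to be on the IA"
elif [[ $(ia metadata "$show") != '{}' ]]; then
    # 'ia metadata' is assumed to return an empty JSON object for
    # identifiers that do not exist on the archive
    echo "$show" >> "$cache"
    echo "$show found on the IA; identifier cached"
else
    echo "$show not on the IA; eligible for upload"
fi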

The assumption is made that any show not already on the IA is eligible for upload. With the advent of show state information available through a CMS query, it is now possible to ignore shows which do not have the status MEDIA_TRANSCODED, but this addition has not yet been made (as of 2022-05-11).

The script collects a list of shows ready for upload, up to a limit of 20 per run. The IA servers can become saturated by requests over a certain size, so limiting the number of shows per run helps to avoid this. There is currently no way to change this upper limit without editing the script, but a lower limit can be requested with the -l option, as sketched below.
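
A minimal sketch of how such a limit might be parsed and enforced with getopts follows; the variable names and messages are illustrative, not taken from the script.

# Minimal sketch: a hard-coded maximum of 20, with '-l' allowing a
# lower value to be requested.
DEFLIMIT=20
limit=$DEFLIMIT

while getopts ':l:' opt; do
    case "$opt" in
        l) limit=$OPTARG ;;
        *) echo "Usage: $0 [-l N]" >&2; exit 1 ;;
    esac
done

# Reject anything outside 1..20; the maximum itself can only be
# changed by editing the script.
if [[ ! $limit =~ ^[0-9]+$ ]] || (( limit < 1 || limit > DEFLIMIT )); then
    echo "The -l value must be between 1 and $DEFLIMIT" >&2
    exit 1
fi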

A check is made on each show eligible for uploading to ensure that all of the expected files are available. All of the transcoded audio formats are looked for, and if any are missing the script aborts.
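
The following is a minimal sketch of such a check, assuming one file per audio format named <show>.<format>; the list of formats shown here is illustrative.

# Minimal sketch: make sure every expected file for a show is present
# before it is queued for upload.
uploaddir=/data/IA/uploads
show=hpr1234                       # hypothetical show identifier
formats=(flac mp3 ogg opus spx wav)

for fmt in "${formats[@]}"; do
    if [[ ! -e "$uploaddir/$show.$fmt" ]]; then
        echo "Missing $show.$fmt; aborting" >&2
        exit 1
    fi
done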

Next the script runs make_metadata, if it is in live mode; in dry-run mode it simply reports what would have happened. The script determines the names of make_metadata's output files itself, using the same algorithm as make_metadata to ensure it refers to the correct names.
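
A minimal sketch of a dry-run guard of this kind is shown below; the DRYRUN variable and its values are illustrative, not taken from the script.

# Minimal sketch: only run make_metadata when dry-run mode is off.
DRYRUN=1                           # dry-run is the default; '-d 0' turns it off

if (( DRYRUN == 0 )); then
    ./make_metadata                # arguments omitted in this sketch
else
    echo "Dry run: make_metadata would have been run here"
fi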

Note: In a future release it may be desirable to add a means whereby make_metadata could return the names of the files it generates.

Calling make_metadata generates a CSV file and a Bash script. If the run is successful the CSV "spreadsheet" is passed to the command ia upload --spreadsheet=<name>, and if this succeeds the Bash script (if any) is run.

Any errors will result in the upload process being aborted.

If the uploads are successful the IA identifiers (the shows) are written to the cache file.
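
The overall upload stage might look roughly like the sketch below; the CSV and auxiliary script names are hypothetical (the real names are computed as described above), and only the ia upload --spreadsheet call is taken from the text.

# Minimal sketch of the upload stage, after make_metadata has produced
# "$csv" and, possibly, "$aux".
csv=metadata_1234-1238.csv         # hypothetical file names
aux=script_1234-1238.sh
cache=.future_upload.dat

# Upload everything described by the CSV "spreadsheet"
if ia upload --spreadsheet="$csv"; then
    # Run the auxiliary Bash script, if one was generated
    [[ -e $aux ]] && bash "$aux"
    # Record the uploaded show identifiers in the cache
    for n in {1234..1238}; do
        echo "hpr$n" >> "$cache"
    done
else
    echo "Upload failed; aborting" >&2
    exit 1
fi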

Help output

This is what is output when the command ./future_upload -h is run.

future_upload - version: 0.0.5

Usage: ./future_upload [-h] [-v] [-D] [-d {0|1}] [-r] [-l cp]

Uploads HPR shows to the Internet Archive that haven't yet been uploaded. This
is as an alternative to uploading the next 5 shows each week for the coming
week.

Options:
  -h                    Print this help
  -v                    Run in verbose mode where more information is reported
  -D                    Run in debug mode where a lot more information is
                        reported
  -d 0|1                Dry run: -d 1 (the default) runs the script in dry-run
                        mode where nothing is uploaded but the actions that
                        will be taken are reported; -d 0 turns off dry-run
                        mode and the actions will be carried out.
  -r                    Run in 'remote' mode, using the live database over an
                        (already established) SSH tunnel. Default is to run
                        against the local database.
  -l N                  Control the number of shows that can be uploaded at
                        once. The range is 1 to 20.

Notes:

1. When running on 'borg' the method used is to run in faux 'local' mode.
   This means we have an open tunnel to the HPR server (mostly left open) and
   the default file .hpr_db.cfg points to the live database via this tunnel.
   So we do not use the -r option here. This is a bit of a hack! Sorry!