hpr-tools/InternetArchive/snapshot_metadata

205 lines
5.7 KiB
Plaintext
Raw Normal View History

#!/bin/bash -
#===============================================================================
#
# FILE: snapshot_metadata
#
# USAGE: ./snapshot_metadata episode_number
#
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
# DESCRIPTION: Collects JSON metadata from the IA for a given show and stores
# it in the cache. Runs 'view_derivatives' on the JSON to
# display the derivatives if any, and to save their names if
# found, for deletion.
# Deletion is performed thus (external to this script):
#
# cat assets/hpr$(./next_repair)/derived.lis | xargs ia delete hpr$(./next_repair) --no-backup
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Dave Morriss (djm), Dave.Morriss@gmail.com
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
# VERSION: 0.0.3
# CREATED: 2024-08-16 20:36:51
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
# REVISION: 2024-10-02 17:40:13
#
#===============================================================================
set -o nounset # Treat unset variables as an error
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
VERSION="0.0.3"
SCRIPT=${0##*/}
# DIR=${0%/*}
STDOUT="/dev/fd/2"
#
# Select the appropriate working directory for the host
#
case $(hostname) in
i7-desktop)
BASEDIR="$HOME/HPR/InternetArchive"
;;
borg)
BASEDIR="$HOME/IA"
;;
*)
echo "Wrong host!"
exit 1
;;
esac
cd "$BASEDIR" || { echo "Failed to cd to $BASEDIR"; exit 1; }
#
# Load library functions
#
LIB="$HOME/HPR/function_lib.sh"
[ -e "$LIB" ] || { echo "Unable to source functions"; exit; }
# shellcheck disable=SC1090
source "$LIB"
#
# Enable coloured messages
#
define_colours
#
# Sanity checks
#
IA=$(command -v ia)
[ -n "$IA" ] || { echo "Program 'ia' was not found"; exit 1; }
VIEWD="$BASEDIR/view_derivatives"
[ -e "$VIEWD" ] || { echo "Program '$VIEWD' was not found"; exit 1; }
# {{{ -- Functions -- _usage
#=== FUNCTION ================================================================
# NAME: make_dir
# DESCRIPTION: Make a directory if it doesn't exist, failing gracefully on
# errors.
# PARAMETERS: $1 directory path
# RETURNS: True if success, otherwise exits the caller script
#===============================================================================
make_dir () {
local dir="${1}"
if [[ ! -d $dir ]]; then
mkdir -p "$dir" || {
coloured 'red' "Failed to create $dir"
exit 1
}
fi
}
#=== FUNCTION ================================================================
# NAME: _usage
# DESCRIPTION: Reports usage; always exits the script after doing so
# PARAMETERS: 1 - the integer to pass to the 'exit' command
# RETURNS: Nothing
#===============================================================================
_usage () {
local -i result=${1:-0}
cat >$STDOUT <<-endusage
${SCRIPT} - version: ${VERSION}
Usage: ./${SCRIPT} showid
Collects notes for a show and adds them to the cache directory
Arguments:
showid The show id in the form 'hpr1234'
endusage
exit "$result"
}
# }}}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#-------------------------------------------------------------------------------
# Argument check
#-------------------------------------------------------------------------------
# Should have one argument
#
if [[ $# != 1 ]]; then
coloured 'red' "Missing argument"
_usage 1
fi
show="${1,,}"
#
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
# Ensure show id is correctly formatted. We want it to be 'hpr1234' but we
# allow the 'hpr' bit to be omitted, as well as any leading zeroes. We need to
# handle the weirdness of "leading zero means octal" though, but we always
# store it as 'hpr1234' once processed.
#
if [[ $show =~ (hpr)?([0-9]+) ]]; then
Updates for missing asset "repair" InternetArchive/recover_transcripts: Bash script to be run on 'borg' which collects files missing on the IA ready for upload as part of the missing asset repair process. InternetArchive/repair_assets: Bash script to take assets from the IA (after they had been repaired on 'borg') and copy them to the HPR server for the notes to access. The local machine, where this was run, was used to store files being uploaded. The planned script to modify the notes to reflect the new file locations was never finished. Notes were edited with Vim using a few macros. InternetArchive/repair_item: Bash script which is best run on 'borg', which repairs an IA item by comparing the files on the IA with the files on 'borg' (or a local machine). These files are either in '/data/IA/uploads/' or in the temporary file hierarchy used by 'recover_transcripts' (which calls it). Used after a normal IA upload to check for and make good any missed file uploads (due to timeouts, etc). Also used during asset repairs, but that project is now finished. InternetArchive/snapshot_metadata: Bash script which collects detailed metadata from the IA in JSON format and saves it locally (run on a local PC). Older shows on the IA often contained derivative files which were identified by the script 'view_derivatives'. These files were never needed, they were IA artefacts, so can be deleted (see the script header for how). InternetArchive/view_derivatives: Perl script to interpret a file of JSON metadata from the IA for an HPR show in order to determine the parent-child hierarchy of files where there may be derivatives. We don't want IA-generated derivatives, but this process was hard to turn off in earlier times. Generates a hierarchical report and a list of unwanted derivatives (see 'snapshot_metadata' for more details of how this was used).
2024-11-23 22:28:52 +00:00
printf -v show 'hpr%04d' "$((10#${BASH_REMATCH[2]}))"
else
coloured 'red' "Incorrect show specification: $show"
coloured 'yellow' "Use 'hpr9999' or '9999' format"
exit 1
fi
#-------------------------------------------------------------------------------
# Setting up paths
#-------------------------------------------------------------------------------
#
# CACHEDIR is where we store asset details and files
#
CACHEDIR="$BASEDIR/assets"
[ ! -d "$CACHEDIR" ] && {
coloured 'red' "Creating cache directory"
make_dir "$CACHEDIR"
}
#
# Pointers into the cache:
# LOCAL_ASSETDIR - where the cache for this show lives
#
LOCAL_ASSETDIR="$CACHEDIR/${show}"
[ ! -d "$LOCAL_ASSETDIR" ] && {
coloured 'green' "Creating cache directory for $show"
make_dir "$LOCAL_ASSETDIR"
}
METADATA="$CACHEDIR/$show/metadata.json"
DERIVED="$CACHEDIR/$show/derived.lis"
#-------------------------------------------------------------------------------
# Save the IA metadata unless we already have the file
#-------------------------------------------------------------------------------
if [[ ! -e $METADATA ]]; then
if ia metadata "$show" > "$METADATA"; then
coloured 'green' "Created metadata file"
if [[ ! -s $METADATA ]]; then
coloured 'red' "Metadata file is empty"
fi
else
coloured 'red' "Creation of metadata file failed"
exit 1
fi
else
coloured 'yellow' "Metadata already exists, not replacing it"
fi
#-------------------------------------------------------------------------------
# Use the collected metadata to view the state of the IA, and collect the derived file names
#-------------------------------------------------------------------------------
coloured 'blue' "Viewing IA files"
"$VIEWD" -verb "$METADATA"
if "$VIEWD" -list "$METADATA" > "$DERIVED"; then
nfiles="$(wc -l < "$DERIVED")"
coloured 'green' "Saved 'derived' files for show $show ($nfiles)"
else
coloured 'red' "Creation of $DERIVED file failed"
fi
exit
# vim: syntax=sh:ts=8:sw=4:ai:et:tw=78:fo=tcrqn21:fdm=marker