Commit Graph

26 Commits

Author SHA1 Message Date
Dave Morriss
37567bfd16 New 'extract_images' script
Show_Submission/extract_images: new script to read an HTML file looking
    for 'data' scheme URIs (embedded images), extract them and modify
    the HTML to reflect the new source of the image. At present it
    writes a generated file name with a sequence number in it, but the
    appropriate suffix/extension for the image type. This is an alpha
    version which needs further work.
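
A minimal sketch of the idea described above, written in Python for illustration only and not taken from 'extract_images'. It assumes base64-encoded images in double-quoted 'src' attributes. The function name, the regular expression, the 'image' prefix and the file naming scheme are all hypothetical.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<subtype>;base64,<payload>" (assumed attribute layout).
DATA_URI = re.compile(
    r'src="data:image/(?P<type>[a-zA-Z0-9.+-]+);base64,(?P<data>[A-Za-z0-9+/=\s]+)"'
)

# Map MIME subtypes to file extensions; anything else falls back to the subtype.
EXTENSIONS = {"jpeg": "jpg", "svg+xml": "svg"}


def extract_images(html: str, out_dir: Path, prefix: str = "image") -> str:
    """Write each embedded image to out_dir and return the rewritten HTML."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0

    def replace(match: re.Match) -> str:
        nonlocal count
        count += 1
        subtype = match.group("type").lower()
        ext = EXTENSIONS.get(subtype, subtype)
        # Generated name: prefix, sequence number, extension for the image type.
        name = f"{prefix}_{count:03d}.{ext}"
        (out_dir / name).write_bytes(base64.b64decode(match.group("data")))
        return f'src="{name}"'

    return DATA_URI.sub(replace, html)
```

The rewritten HTML refers to the extracted files by relative name, so it is expected to sit alongside them in 'out_dir'.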

Show_Submission/parse_JSON: attempting to debug a JSON parsing failure.
2024-12-29 16:33:52 +00:00
bf0a1f056d Distribution of the supporting files in notes 2024-12-27 16:01:54 +01:00
e3e458b2d2 Submit processed files back to upload directory 2024-12-27 14:01:20 +01:00
aae14715f5 Added CC-0 to header 2024-12-27 10:41:13 +01:00
e4aab4d7c2 Converted to using json 2024-12-27 08:53:48 +01:00
4a73931e0d Moved from tools/workflow 2024-12-24 17:57:14 +01:00
924e2fb0eb Moving processing tools from hpr-hub repo to workflow 2024-12-24 13:45:42 +01:00
db9b491324 Cleanup of the show processing workflow 2024-12-24 13:41:40 +01:00
Dave Morriss
4408c255d5 Added two new scripts
Show_Submission/author_title.pl: Added the subtitle field taken from the
    JSON into the YAML

Show_Submission/do_pandoc_assets: New script to process Markdown assets
    files. Not in use.
2024-12-01 21:05:50 +00:00
Dave Morriss
b7cae1cb90 Various updates
Show_Submission/copy_shownotes: Changed the location of the function library

Show_Submission/do_brave: Updates to the way local stand-alone HTML is generated for
    review purposes.

Show_Submission/do_index: Changed the location of the function library

Show_Submission/do_pandoc: Changed the location of the function library; now uses
    'author_title.pl' to generate YAML for Pandoc

Show_Submission/do_parse: Trivial change

Show_Submission/do_pictures: Changed the location of the function library; better
    handling of the show specification

Show_Submission/do_report: Changed the location of the function library

Show_Submission/do_update_reservations: Changed the location of the function library

Show_Submission/fix_relative_links: Added features 'say' and 'state'

Show_Submission/parse_JSON: New checks: notes too short, trailing spaces on title,
    summary and tags (needing JSON changes). Check for Markdown in the
    assets (see 'do_pandoc_assets'). New 'trim' function.
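
The checks listed above could be sketched roughly as follows. This is a Python illustration, not the code in 'parse_JSON'. The field names are taken from the message above, but the record layout, the 'check_show' name and the length threshold are assumptions.

```python
MIN_NOTES_LENGTH = 100  # hypothetical threshold, not taken from 'parse_JSON'


def trim(text: str) -> str:
    """Remove leading and trailing whitespace."""
    return text.strip()


def check_show(show: dict) -> list[str]:
    """Return a list of human-readable problems found in a show record."""
    problems = []
    # Trailing spaces on these fields mean the JSON itself needs changing.
    for field in ("title", "summary", "tags"):
        value = show.get(field, "")
        if value != value.rstrip():
            problems.append(f"{field}: trailing spaces")
    if len(trim(show.get("notes", ""))) < MIN_NOTES_LENGTH:
        problems.append("notes: too short")
    return problems
```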
2024-12-01 20:45:20 +00:00
Dave Morriss
7e925621f4 Merge branch 'main' of repo.anhonesthost.net:HPR/hpr-tools 2024-11-23 22:39:30 +00:00
Dave Morriss
31eb5d200f Updates for missing asset "repair"
InternetArchive/recover_transcripts: Bash script to be run on 'borg'
    which collects files missing on the IA ready for upload as part of
    the missing asset repair process.

InternetArchive/repair_assets: Bash script to take assets from the IA
    (after they had been repaired on 'borg') and copy them to the HPR
    server for the notes to access. The local machine, where this was
    run, was used to store files being uploaded. The planned script to
    modify the notes to reflect the new file locations was never
    finished. Notes were edited with Vim using a few macros.

InternetArchive/repair_item: Bash script which is best run on 'borg',
    which repairs an IA item by comparing the files on the IA with the
    files on 'borg' (or a local machine). These files are either in
    '/data/IA/uploads/' or in the temporary file hierarchy used by
    'recover_transcripts' (which calls it). Used after a normal IA
    upload to check for and make good any missed file uploads (due to
    timeouts, etc). Also used during asset repairs, but that project is
    now finished.
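
The comparison at the heart of this can be sketched as a set difference followed by a few upload attempts per missing file. This is an illustrative Python sketch, not the Bash in 'repair_item'. The 'upload' callable and the retry count are hypothetical stand-ins for whatever the script actually uses.

```python
from typing import Callable, Iterable


def missing_files(local: Iterable[str], remote: Iterable[str]) -> list[str]:
    """Return names present locally but absent from the IA item, sorted."""
    return sorted(set(local) - set(remote))


def repair(
    local: Iterable[str],
    remote: Iterable[str],
    upload: Callable[[str], bool],  # hypothetical: returns True on success
    attempts: int = 3,
) -> list[str]:
    """Try to upload each missing file; return the names that still failed."""
    failed = []
    for name in missing_files(local, remote):
        # any() stops at the first successful attempt.
        if not any(upload(name) for _ in range(attempts)):
            failed.append(name)
    return failed
```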

InternetArchive/snapshot_metadata: Bash script which collects detailed
    metadata from the IA in JSON format and saves it locally (run on
    a local PC). Older shows on the IA often contained derivative files
    which were identified by the script 'view_derivatives'. These files
    were never needed; they were IA artefacts, so can be deleted (see
    the script header for how).

InternetArchive/view_derivatives: Perl script to interpret a file of
    JSON metadata from the IA for an HPR show in order to determine the
    parent-child hierarchy of files where there may be derivatives. We
    don't want IA-generated derivatives, but this process was hard to
    turn off in earlier times. Generates a hierarchical report and
    a list of unwanted derivatives (see 'snapshot_metadata' for more
    details of how this was used).
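
A rough sketch of how such a hierarchy might be derived, in Python rather than Perl. It assumes each entry in the metadata's 'files' list has a 'name', a 'source' of 'original' or 'derivative', and, for derivatives, an 'original' key naming the parent. That layout is an assumption here, and the function names are hypothetical.

```python
from collections import defaultdict


def build_tree(metadata: dict) -> dict:
    """Map each parent file name to the sorted names derived from it."""
    children = defaultdict(list)
    for entry in metadata.get("files", []):
        if entry.get("source") == "derivative" and "original" in entry:
            children[entry["original"]].append(entry["name"])
    return {parent: sorted(names) for parent, names in children.items()}


def report(tree: dict, roots: list, indent: int = 0) -> list:
    """Return indented report lines, one per file, children below parents."""
    lines = []
    for name in roots:
        lines.append(" " * indent + name)
        lines.extend(report(tree, tree.get(name, []), indent + 2))
    return lines
```

Every name that appears as a value in the tree is a derivative, so a list of deletion candidates could be collected from the same structure.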
2024-11-23 22:28:52 +00:00
a4c24296ef Moved in workflow 2024-11-03 17:07:14 +01:00
22fae69de5 2024-10-29_14-25-10_CET 2024-10-29 14:25:10 +01:00
Dave Morriss
6aff45a10b Minor updates
InternetArchive/repair_assets: Accidentally reversed the "sanity check"
    logic, so put it back the right way!

InternetArchive/view_derivatives: Started on the POD documentation but
    didn't get very far.
2024-08-23 21:17:21 +01:00
Dave Morriss
dd3bf0c981 Minor updates to repair_assets 2024-08-22 21:51:44 +01:00
Dave Morriss
d3c4f3907f Removed obsolete scripts and the SQLite database 2024-08-22 19:39:46 +01:00
Dave Morriss
0ccbb6419a Adding new 'view_derivatives'
InternetArchive/view_derivatives: New Perl script which reads JSON
    metadata from the IA and builds tree-like structures linking
    original and derived files on the IA. It reports these trees and
    saves a subset of derived files in an output file to be used for
    deletion. In general we do not want derivatives; we make them
    ourselves. Older software had no reliable way to prevent them.
2024-08-22 13:25:22 +01:00
Dave Morriss
19030fee71 Updates for show "repair" processing
InternetArchive/future_upload: Added logging and debugging

InternetArchive/ia_db.sql: Added new tables

InternetArchive/recover_transcripts: New script to run on 'borg' and
    copy missing files from the backup disk to the IA

InternetArchive/repair_assets: More comments, including one about a bug in the design.

InternetArchive/repair_item: Fix relating to octal numbers (if there are
    leading zeroes in a number). '_DEBUG' is now in the function
    library. Added comments to explain obscure stuff.

InternetArchive/snapshot_metadata: New Bash script (to run on my
    desktop) which collects metadata for a show and stores it in the
    '~/HPR/IA/assets' directory. Runs 'view_derivatives' on it to find
    derivative files for deletion.

InternetArchive/tidy_uploaded: Moves files and directories containing
    uploaded files into a holding area for later backup. Added
    debugging, logging and a 'force' mode.

InternetArchive/upload_manager: Manages 'ia.db' (on my workstation).
    Needs many updates which have just started to be added.

InternetArchive/weekly_upload: Old script, now obsolete.
2024-08-22 13:13:38 +01:00
Dave Morriss
dc0f29e957 Updates since 2024-06-15
Database/query2tt2: comment and documentation updates; use of Perl's
    try/catch.

InternetArchive/.make_metadata.cfg: added comments for readability

InternetArchive/make_metadata: bug fix needed now that all shows on the HPR server have
    a directory with assets under it.

InternetArchive/repair_assets: new Bash script in development. Collects
    assets from the IA and uploads them to a new directory on the HPR
    server. Will run 'fix_asset_links' (to repair asset links for their
    new directories) once it is ready.

InternetArchive/repair_item: Bash script which was originally written to
    run on 'borg' and upload files to a new IA item when the uploads
    timed out. Now enhanced to upload missing files recovered from the
    HPR backup disk, such as transcripts.
2024-07-16 21:39:28 +01:00
Dave Morriss
9203dc26e0 Added InternetArchive/function_lib.sh
InternetArchive/function_lib.sh: new file; subset of
    '~/bin/function_lib.sh' which is referred to in a number of scripts.
    It contains relevant functions such as 'yes_no' and 'define_colours'.
2024-06-15 18:47:24 +01:00
Dave Morriss
6805cd662b Updated 'repair_item'
InternetArchive/repair_item: originally planned in 2020 as a Bash script
    to find missing files in shows and then add them, it was not turned
    into the current form until May 2024. Now, with the heavy loading of
    the IA servers, normal uploads are timing out and being aborted.
    This script is more "determined" to upload files and usually
    successfully "repairs" shows that need it.
2024-06-15 17:14:22 +01:00
Dave Morriss
7a4290fcdd Attempting to generate fix_tags.bin 2024-06-14 23:13:45 +01:00
Dave Morriss
50edeccc88 Updates from previous repo
FAQ/FAQ.mkd, FAQ/Makefile: this version of the FAQ is now out of date
    and probably should be deleted.

InternetArchive/repair_item: script to upload missing shows after timeout
    errors during the normal upload; still under development.

InternetArchive/update_state: script to update show state in the
    'reservations' table in the database. Uses the CMS interface.

Link_Checker/scan_links: under development. Not currently usable.

Miscellaneous/fix_tags: audio metadata manipulation script. Recently
    added to this repo for convenience. Updates for 'experimental::try',
    the official Perl try/catch.

PostgreSQL_Database/add_hosts_to_show, PostgreSQL_Database/hpr_schema_2.pgsql,
    PostgreSQL_Database/nuke_n_pave.sh: an old experimental Pg database
    to take over from the previous MySQL version (from before 2023).
    Kept for reference; never implemented.
2024-06-14 16:00:04 +01:00
Dave Morriss
38abbcdd39 Moved project directories and files to an empty local repo 2024-06-04 16:35:44 +01:00
2d2b937a9b Initial commit 2024-06-04 15:06:29 +00:00