forked from HPR/hpr-tools

Updates since 2024-06-15

Database/query2tt2: comment and documentation updates; use of Perl's
    try/catch.

InternetArchive/.make_metadata.cfg: added comments for readability

InternetArchive/make_metadata: bug fix needed now that every show on the
    HPR server has its own directory, with any assets under it.

InternetArchive/repair_assets: new Bash script in development. Collects
    assets from the IA and uploads them to a new directory on the HPR
    server. Will run 'fix_asset_links' (to repair asset links for their
    new directories) once it is ready.

InternetArchive/repair_item: Bash script which was originally written to
    run on 'borg' and upload files to a new IA item when the uploads
    timed out. Now enhanced to upload missing files recovered from the
    HPR backup disk, such as transcripts.
Dave Morriss
2024-07-16 21:39:28 +01:00
parent 9203dc26e0
commit dc0f29e957
6 changed files with 763 additions and 36 deletions


@@ -19,21 +19,24 @@
# and this version (0.4.12) made into the main line version
# because 4.14 was developing in a direction that doesn't fit
# with the changes made to the HPR system in June/July 2023.
# Will now move forward with version numbers.
# Will now move forward with version numbers (and will get
# a duplicate).
# 2024-01-23: Added the 'open' pragma for UTF-8
# 2024-07-08: Fixed a bug where the top-level directory was
# being added to assets paths. See the definition of $linkre for
# more details.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# AUTHOR: Dave Morriss (djm), Dave.Morriss@gmail.com
# VERSION: 0.4.14
# CREATED: 2014-06-13 12:51:04
# REVISION: 2024-01-23 16:28:59
# REVISION: 2024-07-08 15:21:02
#
#===============================================================================
use 5.010;
use strict;
use warnings;
use open ':encoding(UTF-8)';
#use utf8;
use open ':std', ':encoding(UTF-8)';
use Carp;
use Getopt::Long;
@@ -1527,8 +1530,21 @@ sub find_links_in_notes {
# http://www.hackerpublicradio.org/eps/hpr1303/Music_Notes.html
# Also things like this (**Why Ken?**)
# ../eps/hpr2945/IMG_20191018_122746Z.jpg
# Don't match things like when *not* processing 1986:
# Don't match things like this when *not* processing 1986:
# http://hackerpublicradio.org/eps/hpr1986/full_shownotes.html#example-2
# ----------------------------------------------------------------------
# NOTE: 2024-07-08
#
# It used to be that we added a top-level hprXXXX directory to URLs
# because there wasn't one on the HPR server. This was because the
# majority of shows without assets had no files; the notes were taken from
# the database and displayed dynamically.
#
# Now all HPR shows have a top-level directory for holding the index.html
# with the pre-created notes page. So we DO NOT want to create that
# top-level part. The RE below matches but doesn't store it or we'd get
# one too many directory levels.
# ----------------------------------------------------------------------
#
$epstr = sprintf( "hpr%04d", $episode );
# my $re
@@ -1537,6 +1553,7 @@ sub find_links_in_notes {
^https?://
(?:www.)?
(?:hacker|hobby)publicradio.org/eps/
$epstr/
(.+)$
}x;
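
Review note: the 2024-07-08 comment says the expression should match the episode directory without storing it, so that only the path below hprXXXX/ is kept. The following is a minimal standalone sketch of that behaviour, not the code in make_metadata. The helper name asset_path is hypothetical, and the dots in the host name are escaped here as an assumption about intent (the hunk shows them unescaped).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of make_metadata: return the asset path
# relative to the show's own directory, or undef when the URL is not an
# asset link for the given episode.
sub asset_path {
    my ( $episode, $url ) = @_;

    my $epstr = sprintf( "hpr%04d", $episode );

    # The episode directory is matched but not captured, so the single
    # capture group holds only what follows hprXXXX/.
    # Assumption: dots are escaped so they match literal dots only.
    my $re = qr{
        ^https?://
        (?:www\.)?
        (?:hacker|hobby)publicradio\.org/eps/
        $epstr/
        (.+)$
    }x;

    return $url =~ $re ? $1 : undef;
}

# Example URLs taken from the comments in the diff
for my $url (
    'http://www.hackerpublicradio.org/eps/hpr1303/Music_Notes.html',
    'http://hackerpublicradio.org/eps/hpr1986/full_shownotes.html#example-2',
    )
{
    my $path = asset_path( 1303, $url );
    printf "%s => %s\n", $url, defined $path ? $path : '(no match)';
}
```

When processing episode 1303, the first URL yields just the file name and the second (a link into hpr1986) is not matched, which is the behaviour the comments describe.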
@@ -1558,7 +1575,7 @@ sub find_links_in_notes {
_debug( $DEBUG >= 3, "\$uri = $uri\n" );
_debug( $DEBUG >= 3, "\$uri->fragment = " . $uri->fragment )
if $uri->fragment;
_debug( $DEBUG >= 3, "\$slink = $slink, \n" );
_debug( $DEBUG >= 3, "\$slink = $slink\n" );
#
# Is it an HPR link?
@@ -1760,7 +1777,7 @@ sub find_links_in_file {
# http://www.hackerpublicradio.org/eps/hpr1303/Music_Notes.html
# Also things like this (**Why Ken?**)
# ../eps/hpr2945/IMG_20191018_122746Z.jpg
# Don't match things like when *not* processing 1986:
# Don't match things like this when *not* processing 1986:
# http://hackerpublicradio.org/eps/hpr1986/full_shownotes.html#example-2
#
$epstr = sprintf( "hpr%04d", $episode );