Addition of more design ideas to Managing_Assets.md
parent 8d8d4062d6
commit db898575db
@@ -1,11 +1,11 @@

# Static Site - asset management
### Dave Morriss
-### Last update: 2023-08-31 11:49:34
+### Last update: 2023-09-01 11:55:51

* * *

-# Overview
+## Overview

This document describes a script which is being planned to perform actions on
the HPR database. The main actions are:
@@ -50,6 +50,98 @@ the HPR database. The main actions are:
necessary to gather this information as new shows are added and existing
shows post-processed.

## Script design

### General

- The script will use a configuration file for database credentials and other
  settings.

- It will use options to specify which episode or episodes are to be
  processed, and it can be run in a `dry-run` mode to report what it would
  do without doing it.

- The script has the name `manage_assets` at the moment.

- It will log the actions it takes.

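The points above could be sketched as follows. This is only an illustration in Python: the document does not say which language `manage_assets` will be written in, and the option names (`--config`, `--episode`, `--dry-run`), the INI configuration format and the helper `plan_actions` are all assumptions made for the example.

```python
import argparse
import configparser


def build_parser():
    """Command-line options; every flag name here is hypothetical."""
    parser = argparse.ArgumentParser(prog="manage_assets")
    parser.add_argument("--config", default="manage_assets.cfg",
                        help="configuration file with credentials and settings")
    parser.add_argument("--episode", type=int, action="append",
                        help="show number to process (may be repeated)")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would be done without doing it")
    return parser


def load_config(text):
    """Parse INI-style configuration text (the format is an assumption)."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


def plan_actions(episodes, dry_run):
    """Return the messages the script would log for each requested show."""
    verb = "would process" if dry_run else "processing"
    return [f"{verb} show {number}" for number in episodes]
```

In dry-run mode the same messages are produced, but worded as intentions, so the log shows what a real run would do.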
### Information sources

- One of the main sources of show creator asset information is currently the
  notes. The script `make_metadata`, which prepares shows for upload to the
  Internet Archive, has in the past scanned the notes for references to files
  on the HPR server, and has used this information to download these files
  in order to upload them.
    - The approach taken by `make_metadata` has been to scan the notes for
      files, as mentioned, and if any of these have themselves been HTML, to
      scan these too for file references.
    - The goal of scanning for files was first to ensure that they were
      uploaded to the Internet Archive, but secondly to rewrite their URLs
      such that the shows were self-contained on the Internet Archive.

- Another source of asset information, covering both the assets produced by
  the show creator and the audio and transcripts, is the Internet Archive
  itself. It is possible to collect metadata from there (in JSON format)
  which lists all the files originally uploaded.

- There may have been a few files which were not uploaded to the Internet
  Archive because they were not referenced in the notes. If this is the case,
  only a scan of the backups of the files stored on the old HPR server can
  identify them, and hopefully allow them to be added to the Internet Archive
  and referenced by the episode.

- In the past, the information gathered about assets was not stored in the
  database. It is important that this deficiency be rectified by the
  `manage_assets` script so that it will not be necessary to hunt through
  notes and subsidiary HTML files to find their names in the future.
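The note-scanning approach described for `make_metadata` might look roughly like this. It is a sketch and not the actual `make_metadata` code: the names `LinkCollector` and `find_assets`, and the `fetch` callable used to retrieve a subsidiary HTML file, are all hypothetical. It collects `href` and `src` values, keeps those on the given host, and recurses into any that are themselves HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_assets(html, page_url, fetch, host, seen=None):
    """Return the set of absolute URLs on `host` referenced from `html`.

    `fetch` is a caller-supplied function returning the HTML for a URL;
    it is only called for references that end in .html or .htm.
    """
    seen = set() if seen is None else seen
    collector = LinkCollector()
    collector.feed(html)
    for link in collector.links:
        url = urljoin(page_url, link)  # resolves relative references
        if urlparse(url).hostname != host or url in seen:
            continue
        seen.add(url)
        if url.lower().endswith((".html", ".htm")):
            find_assets(fetch(url), url, fetch, host, seen)
    return seen
```

A real scan would need a stricter filter than "same host", since ordinary links to other pages on the server would otherwise be treated as assets.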
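For the Internet Archive as a source, the JSON metadata for an item includes a `files` list. The sketch below assumes that each entry carries `name`, `source`, `format` and `size` fields, that originally uploaded files are marked `"source": "original"`, and that the archive's own bookkeeping files have the format `Metadata`; these details should be checked against the live metadata before being relied on. The function name and the sample data are invented for illustration.

```python
import json


def original_files(metadata):
    """List the originally uploaded files from Internet Archive item metadata."""
    files = []
    for entry in metadata.get("files", []):
        if entry.get("source") != "original":
            continue  # skip files the archive derived itself
        if entry.get("format") == "Metadata":
            continue  # skip the archive's own bookkeeping files
        files.append({"name": entry["name"], "size": int(entry.get("size", 0))})
    return files


# Illustrative data only; not taken from a real item.
SAMPLE = json.loads("""
{
  "files": [
    {"name": "hpr1234.flac", "source": "original", "format": "Flac", "size": "1000"},
    {"name": "hpr1234.ogg", "source": "derivative", "format": "Ogg Vorbis", "size": "400"},
    {"name": "hpr1234_meta.xml", "source": "original", "format": "Metadata", "size": "50"},
    {"name": "hpr1234/photo.jpg", "source": "original", "format": "JPEG", "size": "200"}
  ]
}
""")
```

Working from the parsed JSON rather than fetching it inside the function keeps the filtering logic easy to test against saved metadata.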
### Algorithms

[This is a first draft, and is likely to be incomplete]

- Given a show number, the script will search for it in the database.
    - If not found, then that show will be skipped.
    - If found, then the entries in the `assets` table will be collected as
      well as the `eps.valid` setting.
    - If no assets are found and `eps.valid = 0` then this is a new show, the
      asset details of which are to be loaded into the `assets` table.
        - The Internet Archive upload might be ongoing, which can be
          determined by querying the `IA` API for pending tasks. If all tasks
          have run, the metadata can be collected and used to fill in asset
          details. Rather than waiting for tasks to complete it will probably
          be easier to skip this show and process it later.
        - Currently, some audio file details are obtained from the files
          themselves. Quite how to do this needs discussion - unless the
          `manage_assets` script is being run on the system that holds the
          files it might be problematic (though an SSH connection could be
          used to do this remotely).
    - If assets are found but `eps.valid = 0` this is an anomaly.
    - If assets are found and `eps.valid = 1` this is a show that has
      previously been uploaded.
        - The assets can be collected from the Internet Archive metadata and
          compared with what is stored. Any that are missing can be added, and
          any that differ can be updated. Possibly, any asset records in the
          database but not on the Internet Archive can be deleted.
        - If it is necessary to obtain details of assets that are not stored
          in the Internet Archive then it might be necessary to download the
          files and store them in a cache for examination - after which they
          will be deleted.

- **NOTE** This section is in need of further thought! \
  The notes and asset files will need to be scanned to determine if the
  URLs need to be changed.
    - Ideally, asset URLs should be absolute. If so, it is simple to
      determine if a change is needed.
    - We have no means of marking a show in the database as having been
      processed by the `manage_assets` script.
    - Storing the absolute asset URL in the `assets` table will help to
      simplify processing. If the URL in the table is the same as that in
      the notes, then no change is needed. If not, then presumably an
      update *is* needed.
    - This is complicated by the presence of relative URLs.
    - The change required in a given asset URL can be determined by a
      *base URL* in the configuration file.
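The case analysis on the `assets` table and `eps.valid` can be written down as a small decision function. This is a sketch: the function name and the returned labels are invented here. It also shows that the draft does not yet say what should happen when a show has `eps.valid = 1` but no asset records.

```python
def classify_show(found, asset_count, valid):
    """Map the database state of a show to the action the draft describes."""
    if not found:
        return "skip"          # show number not in the database
    if asset_count == 0 and valid == 0:
        return "new"           # load asset details into the assets table
    if asset_count > 0 and valid == 0:
        return "anomaly"       # assets exist for a show not marked valid
    if asset_count > 0 and valid == 1:
        return "uploaded"      # compare stored assets with the archive
    return "unspecified"       # no assets but valid = 1: not covered by the draft
```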
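For a previously uploaded show, comparing the Internet Archive metadata with the stored records amounts to a three-way split. The sketch below assumes both sides have been reduced to dictionaries keyed by file name; the function name `reconcile` and that representation are assumptions, not part of the design.

```python
def reconcile(stored, archived):
    """Compare stored asset records with those listed by the Internet Archive.

    Both arguments map a file name to a dictionary of details.
    Returns (to_add, to_update, to_delete).
    """
    to_add = {name: details for name, details in archived.items()
              if name not in stored}
    to_update = {name: details for name, details in archived.items()
                 if name in stored and stored[name] != details}
    # Deletion is only a possibility in the draft, so treat these as candidates.
    to_delete = sorted(set(stored) - set(archived))
    return to_add, to_update, to_delete
```

Returning the deletion candidates separately leaves the decision about whether to delete to the caller, which suits the tentative wording of the draft.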
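One possible reading of the URL comparison in the **NOTE** above is sketched here. It assumes that a relative URL in the notes is first made absolute against the URL of the page containing it, and that a change of location is expressed by swapping an old base URL for a new one taken from the configuration file. The function names and both base URL parameters are hypothetical; the document only says that a *base URL* will be configured.

```python
from urllib.parse import urljoin


def absolute_url(note_url, page_url):
    """Resolve a URL found in the notes against the page that contains it."""
    return urljoin(page_url, note_url)


def needs_change(note_url, page_url, stored_url):
    """True if the URL in the notes differs from the one in the assets table."""
    return absolute_url(note_url, page_url) != stored_url


def rebase(url, old_base, new_base):
    """Replace the old base URL prefix with the configured new one."""
    if url.startswith(old_base):
        return new_base + url[len(old_base):]
    return url  # not under the old base, so leave it alone
```

Resolving relative URLs before comparing means the `assets` table only ever needs to hold absolute URLs.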
<!--
- vim: syntax=markdown:ts=8:sw=4:ai:et:tw=78:fo=tcqn:fdm=marker:com+=fb\:-
-->