Introduction
At present (2023-06-22), show notes are processed externally. The task is performed by Dave Morriss on his PC workstation.
The show processing stages are described in detail here because the way many steps are done at present, compared with the way they will be done, needs consideration (and possibly debate) before anything is changed.
Overview
The overall process consists of these steps:
- A new show is detected (or multiple shows if appropriate)
- The status of the upload is determined; the next stage is performed only when the upload is complete
- A subset of the files in the directory on the HPR server where incoming shows are dropped is synchronised using `rsync`
- The files collected are written to a local directory for processing
- Processing consists of:
  - Examination of the file `shownotes.json`
  - Parsing of this file
  - Extraction of elements of the file, particularly the show notes, title and summary
  - If there are assets (pictures, scripts, etc.) they may need work
  - The notes are edited, and may need conversion, reformatting, etc.
  - If not HTML already, the notes are converted to HTML
  - A local stand-alone copy of the notes is generated and can be viewed in a browser
  - Further work may be needed to refine the notes
  - Any assets are sent to the HPR server
  - The HTML is sent back to the upload directory
  - The status of the show is set to `METADATA_PROCESSED` in the `reservations` table in the HPR database
  - A message is sent to the HPR Janitor's Closet room on Matrix
- The local directory for the show is retained in case further work is required, and deleted after a time by a script
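The final step above (retaining the local directory and deleting it later) could be sketched roughly as follows. This is only an illustration: the function name `purge_old_show_dirs`, the use of directory modification time as the age test, and the retention period are all assumptions, since the text says only that directories are deleted "after a time by a script".

```python
import shutil
import time
from pathlib import Path


def purge_old_show_dirs(work_dir, max_age_days, now=None):
    """Delete hprNNNN directories under work_dir older than max_age_days.

    Age is judged by the directory's modification time (an assumption).
    Returns the names of the directories that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for entry in sorted(Path(work_dir).glob("hpr*")):
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

A call such as `purge_old_show_dirs("shownotes", max_age_days=30)` would then remove only the working directories that have not been modified for 30 days; the value 30 is a placeholder, not the period actually used.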
Details
Show detection
- At present, new shows are detected by scraping the page at https://hub.hackerpublicradio.org/calendar.php
- The scraper is a Perl script called `scrape_HPR`, and it is run every 30 minutes during the day (UK time)
- The scraper detects all of the shows in the main table and categorises them by matching with regular expressions. It is able to spot empty slots, reserved slots, slots that have been requested by clicking on their links and are uploading, uploaded shows awaiting processing, and slots which contain already processed shows.
- Uploading and uploaded shows in need of processing are reported by various methods: pop-up alerts, sounds, and the triggering of IoT devices (LED lights)
- The scraper also detects what it sees as anomalies:
  - Shows that appear "fully formed" without apparently going through the upload process
  - Shows that disappear where they had once existed in the fully uploaded state
- Since the presence of the local show directory affects other parts of the workflow, some extra actions are taken when anomalies are detected:
  - New unexpected shows cause "dummy" directories to be created
  - Show disappearances cause existing directories to be moved to a holding area (in case they are wanted in the future)
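The anomaly checks described above amount to comparing one scrape with the previous one. The following is a minimal sketch of that comparison, not the logic of `scrape_HPR` itself (which is written in Perl). The `SlotState` names, the function `find_anomalies` and the dictionary representation of a scrape are all assumptions made for illustration.

```python
from enum import Enum


class SlotState(Enum):
    """Slot categories named in the text; the identifiers are illustrative."""
    EMPTY = "empty"
    RESERVED = "reserved"
    UPLOADING = "uploading"
    UPLOADED = "uploaded"
    PROCESSED = "processed"


def find_anomalies(previous, current):
    """Compare two scrapes, each a dict mapping show number -> SlotState.

    Returns (appeared, disappeared):
      appeared    - shows now uploaded or processed that were not seen
                    uploading (or later) in the previous scrape
      disappeared - shows that were uploaded or processed before but
                    are not in any upload state now
    """
    complete = {SlotState.UPLOADED, SlotState.PROCESSED}
    in_progress = {SlotState.UPLOADING} | complete

    appeared = sorted(
        show for show, state in current.items()
        if state in complete and previous.get(show) not in in_progress
    )
    disappeared = sorted(
        show for show, state in previous.items()
        if state in complete and current.get(show) not in in_progress
    )
    return appeared, disappeared
```

With this simplification, a show that is requested and fully uploaded between two scrapes would also be reported as having appeared "fully formed", which fits the description of these as things the scraper "sees as" anomalies. The caller would then create a dummy directory for each entry in `appeared` and move the directory of each entry in `disappeared` to the holding area.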
Redesign
- The use of a scraper as described here might not be optimal since it is very dependent on the format of the `calendar.php` page
- There exists a database table called `reservations` which holds status information about shows being received and processed
- An interface to this table exists which can be accessed through `curl`. This interface is used in other scripts within the show processing workflow.
- Discussion of the use of this interface in preference to the current web scraping interface is ongoing and will be documented here.
- The `reservations` table is populated as a new show is being set up by the host selecting a slot on the `calendar.php` page
- The status values are:
Name | Short description | Comments |
---|---|---|
REQUEST_UNVERIFIED | unverified | shouldn't be returned |
REQUEST_EMAIL_SENT | email sent | host sent the email with a link |
EMAIL_LINK_CLICKED | pending | filling in the form/sending the show |
SHOW_SUBMITTED | uploaded | upload complete |
METADATA_PROCESSED | metadata processed | notes processed, etc |
SHOW_POSTED | in the database | awaiting audio transcoding |
MEDIA_TRANSCODED | transcoded | audio transcoded |
UPLOADED_TO_IA | uploaded to IA | uploaded to IA |
UPLOADED_TO_RSYNC_NET | archived | archived on rsync.net [¹] |
[¹] free allocation exhausted?
- What cannot be detected from the above list?
  - If a show existed at some point but has been deleted, there's no way of telling. The present system, because it keeps a record of processed shows, can spot a free slot where there was a show before.
  - Slots which have been reserved (such as for Community News shows) cannot be detected since they are not in the `reservations` table.
  - If a slot becomes filled in a "non-standard" way, bypassing the normal route of appearing in the `reservations` table and progressing through the above states, this cannot be detected. In the past this has sometimes happened as reserve shows have been added to this system, for example.
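The status values listed in the table can be modelled as an ordered enumeration, which makes it easy to ask whether a show has reached a given stage. This is a sketch only: the numeric values simply follow the order of the table and are an assumption, as the text does not say how the `reservations` table stores them, and the helper names are invented for illustration.

```python
from enum import IntEnum


class ReservationStatus(IntEnum):
    """Status names from the table, numbered in the order listed.

    The numbers are an assumption used only to express ordering.
    """
    REQUEST_UNVERIFIED = 1
    REQUEST_EMAIL_SENT = 2
    EMAIL_LINK_CLICKED = 3
    SHOW_SUBMITTED = 4
    METADATA_PROCESSED = 5
    SHOW_POSTED = 6
    MEDIA_TRANSCODED = 7
    UPLOADED_TO_IA = 8
    UPLOADED_TO_RSYNC_NET = 9


def needs_notes_processing(status_name):
    """True when the upload is complete but the notes are not yet processed."""
    return ReservationStatus[status_name] == ReservationStatus.SHOW_SUBMITTED


def has_reached(status_name, milestone):
    """True when the named status is at or beyond the given milestone."""
    return ReservationStatus[status_name] >= milestone
```

A workflow based on the `reservations` table rather than the scraper would pick up shows for which `needs_notes_processing` is true. As noted above, it would still not see deleted shows, reserved slots, or slots filled outside the normal route.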
Copying files from the server
- This is achieved with `rsync` over an ssh connection, which is run from a script called `sync_hpr`
- The `rsync` command uses a filter which limits what is copied: `--filter=". .rsync_hpr_upload"`, where the filter in the file ignores all (likely) media types, files likely to have been written to the directories during processing, and various others.
- The `rsync` command also deletes any local directories which have been deleted on the server
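To make the shape of the command concrete, here is a hedged sketch that assembles an `rsync` argument list with the properties described above. It is not the content of `sync_hpr`. Only the `--filter` value comes from the text; the use of `--archive`, `--delete` and `--rsh=ssh`, the function name, and the remote and local paths in the usage note are assumptions.

```python
def build_rsync_command(remote, local_dir, filter_file=".rsync_hpr_upload"):
    """Return an rsync argument list for pulling show files from the server.

    remote      - source in rsync's user@host:path form (placeholder value)
    local_dir   - local destination directory
    filter_file - merge file holding the exclusion rules
    """
    return [
        "rsync",
        "--archive",                  # recurse and preserve file attributes
        "--delete",                   # drop local copies deleted on the server
        "--rsh=ssh",                  # transport over an ssh connection
        f"--filter=. {filter_file}",  # '.' rule: merge filter rules from the file
        remote,
        local_dir,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`, for example with `build_rsync_command("user@example.org:upload/", "upload/")`, where both paths are placeholders. Because the arguments are passed as a list rather than through a shell, the filter value needs no extra quoting.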
Copying show files to a working area
- Downloaded files are stored in a local directory named after the show, such as `shownotes/hpr1234`.
- The script `copy_shownotes` selects new shows from the `upload/` directory and copies them to a working directory as described. It does this using `find` with a regular expression matching the directory structure.
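The selection and copying step can be illustrated in Python, although `copy_shownotes` itself uses `find`. This sketch rests on two assumptions that are not taken from the text: that the show number appears in the upload directory name as four digits between underscores (the pattern `SHOW_DIR_RE` is hypothetical), and that a show counts as new when no working directory exists for it yet.

```python
import re
import shutil
from pathlib import Path

# Hypothetical naming scheme: "<prefix>_<4-digit show number>_<suffix>"
SHOW_DIR_RE = re.compile(r"_(\d{4})_")


def copy_new_shows(upload_dir, work_dir, pattern=SHOW_DIR_RE):
    """Copy each matching upload directory to work_dir/hprNNNN.

    Directories that already have a working copy are skipped.
    Returns the names of the working directories that were created.
    """
    upload_dir, work_dir = Path(upload_dir), Path(work_dir)
    copied = []
    for entry in sorted(upload_dir.iterdir()):
        if not entry.is_dir():
            continue
        match = pattern.search(entry.name)
        if match is None:
            continue
        target = work_dir / f"hpr{match.group(1)}"
        if target.exists():
            continue  # already copied, so not a new show
        shutil.copytree(entry, target)
        copied.append(target.name)
    return copied
```

Calling `copy_new_shows("upload", "shownotes")` twice in a row would copy the new shows on the first call and return an empty list on the second.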
Processing shows
- A `pdmenu` menu is used to manage show processing. A menu is created dynamically for each show which is ready for processing, and shows are handled in numerical order. A script called `makemenu` is used to generate each menu.
- Once a show has been found that is eligible for processing, statistics about it are collected by parsing the `shownotes.json` file (using `jq`) and the menu is tailored for the type of action which may be required. The choices include whether to pre-process images and whether to upload assets to the server.
TBA