Show Processing
We receive random files from random people on the Internet, so they are treated with extreme care.
The HPR Database is used to track the status of each show as it is being processed, using a normally hidden table, `reservations`. As a security measure, the table records the IP address of the host uploading the show, so it is not publicly available. Once the show is processed the IP address is removed from the table.
Information about the show process is available to the Janitors via stats.php.
A cron job, `*/15 * * * * /home/hpr/bin/update-stats.bash > /dev/null 2>&1`, runs every 15 minutes and saves the file to https://hub.hackerpublicradio.org/stats.json for anyone to use.
Input Directory Structure
The files are never processed by the front end HPR servers, and so all processing is done offline.
The files are downloaded by trusted Janitors who have the ability to `scp`/`rsync` the files from the HPR server to a local machine for processing.
The directory structure is based on a combination of fields separated by the underscore (`_`) delimiter.
- Upload date and time, `UTC_TIMESTAMP()` at reservation time.
- The requested episode number, or `9999` if the reserve queue is to be used.
- The requested episode date, or `1970-01-01` if the reserve queue is to be used.
- The random unique key for this request.
2339594445_9278_2044-02-24_aeb0579fcac318005d7550a60fd60403676c24d94148b
2339680845_9999_1970-01-01_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624
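The naming scheme above can be unpacked with a few lines of Python. This is only a sketch and not part of the HPR tooling: the names `UploadDir` and `parse_upload_dir` are made up for illustration, and it assumes the first field is a Unix epoch in seconds, which the example values suggest but the text does not state.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class UploadDir(NamedTuple):
    """Hypothetical container for the four fields of an upload directory name."""
    timestamp: datetime  # upload time (assumed to be Unix epoch seconds, UTC)
    ep_num: int          # requested episode number, 9999 for the reserve queue
    ep_date: str         # requested episode date, 1970-01-01 for the reserve queue
    key: str             # random unique key for this request
    is_reserve: bool     # True when both reserve placeholders are present


def parse_upload_dir(name: str) -> UploadDir:
    """Split an upload directory name on the underscore delimiter."""
    # The date uses hyphens, so a plain split yields exactly four fields.
    epoch, ep_num, ep_date, key = name.split("_")
    return UploadDir(
        timestamp=EPOCH + timedelta(seconds=int(epoch)),
        ep_num=int(ep_num),
        ep_date=ep_date,
        key=key,
        is_reserve=(ep_num == "9999" and ep_date == "1970-01-01"),
    )
```

Calling `parse_upload_dir` on the second example name would flag it as a reserve queue upload, while the first keeps its requested episode number and date.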
The upload will produce at a minimum a `shownotes.json` file. It may also include a host photo, and usually a media file for the episode. If the media file is not provided, then the Janitors will attempt to download it from the provided `url`. If that is not possible, then the host will be contacted to provide the show, and if they do not, or if it would cause a delay, the show slot is freed up for another contributor.
Additional files and images may be provided by the host, e.g. images, scripts, PDF documents etc.
Shows destined for the reserve queue are moved from the upload directory and placed in the reserve directory using the script rename-reserve.bash.
This is run manually by the Janitors as it checks to see if a URL to the show was provided, and attempts to download the linked file. When new hosts submit a show directly to the reserve queue, the Janitors will resubmit it to the first available slot in the normal queue. This is because new hosts need to have an entry created in the `hosts` table, but also because it gives the community an opportunity to welcome the new host.
It renames the directory using a combination of fields separated by the underscore (`_`) delimiter.
- Upload date and time, `UTC_TIMESTAMP()` at reservation time.
- The `hosts.hostid` of the host.
- The random unique key for this request.
- The host name, `hosts.host`, of the host.
- The title of the episode, with spaces replaced by underscores.
2339680845_987_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624_Emperor_Ming_Top_tips_for_time_travel
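As an illustration of the reserve naming scheme, the sketch below joins the five fields and splits them apart again. It is not the logic of rename-reserve.bash, which is not shown here: the function names are hypothetical, and replacing spaces in the host name as well as the title is an assumption based on the example above.

```python
from typing import Tuple


def reserve_dir_name(epoch: int, hostid: int, key: str, host: str, title: str) -> str:
    """Join the five fields with underscores, replacing spaces in host and title."""
    fields = [
        str(epoch),
        str(hostid),
        key,
        host.replace(" ", "_"),   # assumption: host name spaces become underscores
        title.replace(" ", "_"),  # stated in the text: title spaces become underscores
    ]
    return "_".join(fields)


def split_reserve_dir(name: str) -> Tuple[str, str, str, str]:
    """Recover epoch, hostid and key; host and title stay joined in the remainder."""
    # Host and title both contain underscores, so only the first
    # three fields can be separated unambiguously.
    epoch, hostid, key, host_and_title = name.split("_", 3)
    return epoch, hostid, key, host_and_title
```

One consequence of this layout is that the boundary between host name and title cannot be recovered from the directory name alone; the `hosts.hostid` field is what identifies the host reliably.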
Reserve shows are downloaded and submitted by the Janitors on behalf of the host. This follows the normal posting process where the host and Janitors are cc'd on the notification emails. The only difference is that the audio is edited to include a notification that the show is from the reserve queue, and that it is the Janitors that upload the show via the supplied link.
Adding the host to the `hosts` table
The host is currently added manually by the Janitors, as the text-to-speech tools often require the name to be adjusted by hand before it sounds correct.
--
-- Table structure for table `hosts`
--
DROP TABLE IF EXISTS `hosts`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `hosts` (
`hostid` int(10) NOT NULL AUTO_INCREMENT,
`host` text NOT NULL,
`email` text NOT NULL,
`profile` text NOT NULL,
`license` varchar(11) NOT NULL DEFAULT 'CC-BY-SA',
`local_image` int(2) NOT NULL DEFAULT 0,
`gpg` text NOT NULL,
`valid` int(1) NOT NULL DEFAULT 1,
`espeak_name` text DEFAULT NULL COMMENT 'Version of the host name for use with espeak',
PRIMARY KEY (`hostid`)
) ENGINE=MyISAM AUTO_INCREMENT=437 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
Field | Description | Example |
---|---|---|
hostid | Automatically generated incrementing number. | 987 |
host | The name or handle of the host. | Emperor Ming |
email | The host's email address with the `@` replaced with `.nospam@nospam.` as an antispam measure. | Emperor.Ming.nospam@nospam.example.com |
profile | An HTML host profile. | <p>Follow me Mastodon: <a rel="me" href="https://mastodon.example.org/@Emperor.Ming">@Emperor.Ming@mastodon.example.org</a></p> |
license | One of the allowed licenses. | CC-BY-SA |
local_image | Whether an avatar is available directly from the host. | 1 |
gpg | GPG key ID used to verify the host's emails; there is [thought to automatically verify](HPR/hpr-tools#4) them. | 1C7398B00F0239E8 |
valid | Allows temporary de-listing of the host. | 0 |
espeak_name | The text-to-speech version of the name or handle. | Fifty One Fifty |
Host images/avatars can be one of:
- Local Images, uploaded when the host submits or edits their profile.
- Gravatar Images, downloaded periodically by the Janitors.
- Default Images, added by hpr_generator when the host has no other image.
All the images are currently stored in the hpr_generator repository, and get transferred to the server using hpr-publish.bash.
Adding the episode to the `eps` table
The script postshow.bash is run locally on the Janitor's computer. It calls the HPR CMS script status.php to return a tab-separated list of the shows in the queue.
timestamp_epoc ep_num ep_date key status email
2339594445 9278 2044-02-24 aeb0579fcac318005d7550a60fd60403676c24d94148b SHOW_SUBMITTED joe.blogg.nospam@nospam.example.com
2339680845 9999 1970-01-01 4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624 RESERVE_SHOW_SUBMITTED Emperor.Ming.nospam@nospam.example.com
It selects the first show with a status of `SHOW_SUBMITTED` and uses `rsync` to clone the directory locally. It then parses the `shownotes.json` file and extracts the shownotes object to a new file, `shownotes.html`.
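The selection step can be sketched in Python as follows. This is an illustration rather than the code in postshow.bash: the function names are hypothetical, the column names are taken from the sample listing above, and whether status.php really emits that header line is not stated, so the sketch simply tolerates it.

```python
import csv
import io
from typing import Dict, Optional

# Column order taken from the sample status.php listing.
FIELDS = ["timestamp_epoc", "ep_num", "ep_date", "key", "status", "email"]


def first_submitted(status_text: str) -> Optional[Dict[str, str]]:
    """Return the first row whose status is exactly SHOW_SUBMITTED, if any."""
    reader = csv.DictReader(io.StringIO(status_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        # A header line, if present, has the literal status "status" and is skipped,
        # as are reserve shows (RESERVE_SHOW_SUBMITTED).
        if row["status"] == "SHOW_SUBMITTED":
            return row
    return None


def upload_dir_for(row: Dict[str, str]) -> str:
    """Rebuild the upload directory name from the first four columns."""
    return "_".join(row[name] for name in FIELDS[:4])
```

The directory name returned by `upload_dir_for` is what would then be passed to `rsync`; the remote path it is appended to is not documented here.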
All embedded images are saved as local image files, with the format `hpr${ep_num}_${image_count}`: the episode number, an underscore as delimiter, and then the sequential number of the image in the notes, e.g. `hpr9876_1.jpg`.
Where there are images wider than 400 pixels, a thumbnail is created with the same image name but suffixed with `_tn`, e.g. `hpr9876_1_tn.jpg`.
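The naming rule for images and thumbnails can be written as a small helper. This is a sketch only: `image_names` is a hypothetical function, the image width is assumed to be known already, and no resizing is performed.

```python
from typing import List

THUMBNAIL_WIDTH_LIMIT = 400  # pixels, as stated in the text


def image_names(ep_num: int, image_count: int, ext: str, width: int) -> List[str]:
    """Return the file name for an image, plus a thumbnail name if it is too wide."""
    base = f"hpr{ep_num}_{image_count}"
    names = [f"{base}.{ext}"]
    if width > THUMBNAIL_WIDTH_LIMIT:
        # Same name, suffixed with _tn before the extension.
        names.append(f"{base}_tn.{ext}")
    return names
```

For example, `image_names(9876, 1, "jpg", 800)` gives both `hpr9876_1.jpg` and `hpr9876_1_tn.jpg`, while an image exactly 400 pixels wide gets no thumbnail.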
The Janitors then review the shownotes for issues.
The duration will be extracted from the media, and the other metadata from the `shownotes.json` file. After some checks it will be URL encoded.
The script allows overwriting of any value in the json file from the command line, and also prevents posting from a new host that has not yet been assigned an entry in the `hosts` table.
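A minimal sketch of the override and URL encoding steps, using only the Python standard library. The function name `build_post_body` is hypothetical, and the field names in the usage note are borrowed from the `eps` table; the names that add_show_json.php actually expects are not documented here.

```python
from typing import Dict
from urllib.parse import urlencode


def build_post_body(metadata: Dict[str, object], overrides: Dict[str, object]) -> str:
    """Apply command-line style overrides, then URL encode the result."""
    # Values in overrides win over values read from the json file.
    merged = {**metadata, **overrides}
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return urlencode(merged)
```

Calling `build_post_body({"title": "Top tips for time travel", "duration": 2850}, {"title": "Top tips & tricks"})` keeps the duration from the file and replaces the title, with the ampersand encoded so it cannot be mistaken for a field separator.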
Once all the checks are done the script will `rsync` the following files back to the upload directory.
- `shownotes_origional.json`, the original json file for reference.
- `shownotes.json`, the human readable formatted json file.
- `shownotes.html`, the extracted and edited show notes.
- `post_show.json`, the json file used to create the `eps` table entry.
- Any additional images in the format `hpr${ep_num}_${image_count}.${ext}` and, if wider than 400 pixels, the thumbnail in the format `hpr${ep_num}_${image_count}_tn.${ext}`.
- Any supporting files.
Then it will use the `curl` command to POST the show to add_show_json.php.
The HPR CMS script add_show_json.php will validate the input, and add an entry into the `eps` table of the database.
--
-- Table structure for table `eps`
--
DROP TABLE IF EXISTS `eps`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `eps` (
`id` int(5) NOT NULL,
`date` date NOT NULL,
`title` varchar(100) NOT NULL,
`duration` int(5) NOT NULL,
`summary` varchar(100) NOT NULL,
`notes` text NOT NULL,
`hostid` int(10) NOT NULL,
`series` int(10) NOT NULL,
`explicit` tinyint(1) NOT NULL DEFAULT 1,
`license` varchar(11) NOT NULL DEFAULT 'CC-BY-SA',
`tags` varchar(200) NOT NULL,
`version` int(5) NOT NULL DEFAULT 0,
`downloads` int(11) NOT NULL,
`valid` int(1) NOT NULL DEFAULT 1,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1275 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
Field | Description | Example |
---|---|---|
id | This is the Episode Number that will uniquely identify the show. | 9278 |
date | The date the show airs, namely when it gets put in the main feed. | 2044-02-24 |
title | This will be a short descriptive title and will be used to describe the show. | |
duration | The length of the submitted show without branding in seconds. | 2850 |
summary | This is a short 100 character summary of what the show is about. Used on the main page, on the mobile site, on printed brochures, on text to speech announcements, Mastodon etc. | |
notes | Additional descriptions and images in html format. | |
hostid | The hosts.hostid of the host. | 789 |
series | The miniseries.id that the show belongs to. | |
explicit | Flags the show as [explicit](https://web.archive.org/web/20150326185817/http://www.apple.com/uk/itunes/podcasts/specs.html). | Clean or Yes |
license | Which licenses.short_name the show is released under. | CC-BY-SA |
tags | Add a list of comma separated tags that represent the essence of the show. | |
version | Deprecated. | |
downloads | Deprecated. | |
valid | Allows temporary de-listing of the show. | 0 |
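The column sizes in the `eps` table definition suggest a simple length check that could be run before posting. The sketch below is not the validation performed by add_show_json.php, which is not shown here; `check_lengths` is a hypothetical helper, and the limits are copied from the `varchar` sizes in the `CREATE TABLE` statement.

```python
from typing import Dict, List

# varchar sizes from the eps table definition.
LIMITS = {"title": 100, "summary": 100, "license": 11, "tags": 200}


def check_lengths(episode: Dict[str, str]) -> List[str]:
    """Return a message for every text field that exceeds its column size."""
    problems = []
    for field, limit in LIMITS.items():
        value = episode.get(field, "")
        if len(value) > limit:
            problems.append(f"{field} is {len(value)} characters, limit is {limit}")
    return problems
```

An empty list means every checked field fits its column; anything else could be shown to the Janitor before the show is posted.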