# File Structure

We receive random files from random people on the Internet, so they are treated with extreme care.

The HPR Database is used to track the `status` of each show as it is being processed, using a normally hidden table `reservations`. The table is not publicly available because, as a security measure, it contains the IP address of the host uploading the show. Once the show is processed, the IP address is removed from the table.

Information about the show process is available to the Janitors via [stats.php](https://repo.anhonesthost.net/HPR/hpr_hub/src/branch/main/cms/stats.php).

A cron job `*/15 * * * * /home/hpr/bin/update-stats.bash > /dev/null 2>&1` runs every 15 minutes and saves the result to [https://hub.hackerpublicradio.org/stats.json](https://hub.hackerpublicradio.org/stats.json) for anyone to use.

## Input Directory Structure

The files are never processed by the front end HPR servers; all processing is done offline.

The files are downloaded by trusted Janitors who have the ability to `scp`/`rsync` the files from the HPR server to a local machine for processing.

The name of each upload directory is built from the following fields, separated by the underscore (`_`) delimiter:

- Upload date and time `UTC_TIMESTAMP()` at reservation time.
- The requested `episode number` or `9999` if the reserve queue is to be used.
- The requested `episode date` or `1970-01-01` if the reserve queue is to be used.
- The random unique key for this request.

```
2339594445_9278_2044-02-24_aeb0579fcac318005d7550a60fd60403676c24d94148b
2339680845_9999_1970-01-01_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624
```
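
The upload directory names above can be split back into their fields. The following is a minimal sketch and not part of the HPR tooling: the names `UploadDir` and `parse_upload_dir` are hypothetical, and it assumes the first field is a Unix epoch value in seconds, which is what the examples suggest.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class UploadDir:
    """Fields of an upload directory name (hypothetical helper type)."""
    uploaded: datetime  # reservation time; ASSUMED to be Unix epoch seconds, UTC
    ep_num: int         # requested episode number, or 9999 for the reserve queue
    ep_date: date       # requested episode date, or 1970-01-01 for the reserve queue
    key: str            # random unique key for this request

    @property
    def is_reserve(self) -> bool:
        # 9999 is the placeholder episode number used for the reserve queue.
        return self.ep_num == 9999


def parse_upload_dir(name: str) -> UploadDir:
    """Split an upload directory name into its four underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {name!r}")
    timestamp, ep_num, ep_date, key = parts
    return UploadDir(
        uploaded=datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        ep_num=int(ep_num),
        ep_date=date.fromisoformat(ep_date),
        key=key,
    )
```

Rejecting names that do not have exactly four fields keeps unexpected directories from being processed silently.
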

Shows destined for the reserve queue are moved from the upload directory to the reserve directory using the script [rename-reserve.bash](https://repo.anhonesthost.net/HPR/hpr-tools/src/branch/main/workflow/rename-reserve.bash).

The script is run manually by the Janitors, as it checks whether a URL to the show was provided and attempts to download the linked file. When a new host submits a show directly to the reserve queue, the Janitors will resubmit it to the first available slot in the normal queue. This is because new hosts need to have an entry created in the `hosts` table, and also because it gives the community an opportunity to welcome the new host.

The script renames the directory using the following fields, separated by the underscore (`_`) delimiter:

- Upload date and time `UTC_TIMESTAMP()` at reservation time.
- The `hosts.hostid` of the host.
- The random unique key for this request.
- The host name `hosts.host` of the host.
- The title of the episode, with spaces replaced by underscores.

```
2339680845_987_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624_Emperor_Ming_Top_tips_for_time_travel
```
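
Because both the host name and the title may contain underscores, the tail of a reserve directory name cannot be split reliably without knowing the host name. The sketch below illustrates this; `ReserveDir` and `parse_reserve_dir` are hypothetical names, and it assumes, as in the example above, that the first three fields never contain underscores.

```python
from typing import NamedTuple, Optional


class ReserveDir(NamedTuple):
    """Fields of a reserve directory name (hypothetical helper type)."""
    timestamp: int
    hostid: int
    key: str
    host: Optional[str]  # None when the host name was not supplied
    title: str           # still has underscores in place of spaces


def parse_reserve_dir(name: str, host: Optional[str] = None) -> ReserveDir:
    """Split a reserve directory name; pass `host` to separate host from title."""
    # ASSUMPTION: timestamp, hostid and key contain no underscores,
    # so only the first three delimiters are treated as field separators.
    timestamp, hostid, key, rest = name.split("_", 3)
    if host is None:
        # Without the host name, host and title cannot be told apart.
        return ReserveDir(int(timestamp), int(hostid), key, None, rest)
    prefix = host.replace(" ", "_") + "_"
    if not rest.startswith(prefix):
        raise ValueError(f"{name!r} does not contain host {host!r}")
    return ReserveDir(int(timestamp), int(hostid), key, host, rest[len(prefix):])
```

In practice the host name could be looked up from `hosts.host` using the parsed `hostid`.
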

Reserve shows are downloaded and submitted by the Janitors on behalf of the host. This follows the normal posting process, where the host and the Janitors are cc'd on the notification emails. The only differences are that the audio is edited to include a notification that the show is from the reserve queue, and that it is the Janitors who upload the show via the supplied link.

## Show Processing

### Adding the host to the `hosts` table

The host is currently added manually by the Janitors, as the text to speech tools often require manipulation to get the name sounding correct.

<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr><th>hostid</th><td>Automatically generated incrementing number.</td><td>987</td></tr>
<tr><th>host</th><td>The name or handle of the host.</td><td>Emperor Ming</td></tr>
<tr><th>email</th><td>The host's email address with the <code>@</code> replaced with <code>.nospam@nospam.</code> as an antispam measure.</td><td>Emperor.Ming.nospam@nospam.example.com</td></tr>
<tr><th>profile</th><td>An HTML host profile.</td><td><code>&lt;p&gt;Follow me on Mastodon: &lt;a rel="me" href="https://mastodon.example.org/@Emperor.Ming"&gt;@Emperor.Ming@mastodon.example.org&lt;/a&gt;&lt;/p&gt;</code></td></tr>
<tr><th>license</th><td>One of the allowed licenses.</td><td><code>CC-BY-SA</code></td></tr>
<tr><th>local_image</th><td>Whether an avatar is available directly from the host.</td><td><code>1</code></td></tr>
<tr><th>gpg</th><td>Allows us to verify the host's emails, with <a href="https://repo.anhonesthost.net/HPR/hpr-tools/issues/4">thought given to verifying automatically</a>.</td><td><code>1C7398B00F0239E8</code></td></tr>
<tr><th>valid</th><td>Allows temporary de-listing of the host.</td><td><code>0</code></td></tr>
<tr><th>espeak_name</th><td>The text to speech version of the name or handle.</td><td>Fifty One Fifty</td></tr>
</tbody>
</table>
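
The antispam transformation described for the `email` field is a simple string substitution. A minimal sketch follows; the function names are hypothetical, and the reverse helper is an assumption added for illustration rather than something the HPR tools are known to provide.

```python
NOSPAM_MARKER = ".nospam@nospam."  # replacement for "@" described in the table


def obfuscate_email(address: str) -> str:
    """Replace the single "@" in an address with the antispam marker."""
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain or "@" in domain:
        raise ValueError(f"not a simple email address: {address!r}")
    return f"{local}{NOSPAM_MARKER}{domain}"


def restore_email(stored: str) -> str:
    """Reverse obfuscate_email (hypothetical helper, not from the HPR tools)."""
    if stored.count(NOSPAM_MARKER) != 1:
        raise ValueError(f"marker not found exactly once: {stored!r}")
    return stored.replace(NOSPAM_MARKER, "@")
```

For example, `obfuscate_email("Emperor.Ming@example.com")` gives the value shown in the table.
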

Host images (avatars) can be one of the following:

- <strong>Local Images</strong> uploaded when the host submits <s>or <a href="https://repo.anhonesthost.net/HPR/hpr_hub/issues/66">edits</a></s> their profile.
- <strong>Gravatar</strong> Images downloaded periodically by the Janitors and copied to the file store on the [origin server](https://en.wikipedia.org/wiki/Upstream_server).
- <strong>Default Images</strong> added by [hpr_generator](https://repo.anhonesthost.net/HPR/hpr_generator/src/branch/main/public_html/images/hosts) when the host has no other image.

All the images are [currently](https://repo.anhonesthost.net/HPR/hpr_generator/issues/234) stored in the [hpr_generator](https://repo.anhonesthost.net/HPR/hpr_generator/src/branch/main/public_html/images/hosts) repository, and are transferred to the server using [hpr-publish.bash](https://repo.anhonesthost.net/HPR/hpr-tools/src/branch/main/workflow/hpr-publish.bash).

### Adding the episode to the `eps` table

The script [postshow.bash](https://repo.anhonesthost.net/HPR/hpr-tools/src/branch/main/workflow/postshow.bash) is run locally on the Janitor's computer.
It calls the HPR CMS script [status.php](https://repo.anhonesthost.net/HPR/hpr_hub/src/branch/main/cms/status.php) to return a tab separated list of the shows in the queue.

```
timestamp_epoc ep_num ep_date key status email
2339594445 9278 2044-02-24 aeb0579fcac318005d7550a60fd60403676c24d94148b SHOW_SUBMITTED joe.blogg.nospam@nospam.example.com
2339680845 9999 1970-01-01 4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624 RESERVE_SHOW_SUBMITTED Emperor.Ming.nospam@nospam.example.com
```

It selects the first show with a status of `SHOW_SUBMITTED` and uses `rsync` to clone the directory locally. It then parses the `shownotes.json` file and extracts the shownotes object to a new file `shownotes.html`.
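
The selection step can be sketched in a few lines. This is an illustration only and not the logic of `postshow.bash` itself: the function name `first_submitted` is hypothetical, and it assumes the listing is tab separated and starts with the header line shown above.

```python
import csv
import io
from typing import Optional


def first_submitted(listing: str, wanted: str = "SHOW_SUBMITTED") -> Optional[dict]:
    """Return the first row whose status equals `wanted`, or None if there is none."""
    # ASSUMPTION: the first line of the listing is the header row,
    # so DictReader can use it for the column names.
    reader = csv.DictReader(io.StringIO(listing), delimiter="\t")
    for row in reader:
        if row["status"] == wanted:
            return row
    return None
```

The returned row gives the four values needed to rebuild the upload directory name (`timestamp_epoc`, `ep_num`, `ep_date` and `key`) before the directory is copied locally.
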

All embedded images are saved as local image files named `hpr${ep_num}_${image_count}`: the episode number, an underscore as delimiter, and then the sequential number of the image in the notes, e.g. `hpr9876_1.jpg`.

Where an image is more than 400 pixels wide, a thumbnail is created with the same name suffixed with `_tn`, e.g. `hpr9876_1_tn.jpg`.
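
The naming convention can be expressed as a pair of small helpers. This is a sketch with hypothetical function names; it does not pad the episode number, because the format above is given only as `hpr${ep_num}`.

```python
THUMBNAIL_MIN_WIDTH = 400  # images wider than this (in pixels) get a thumbnail


def image_name(ep_num: int, image_count: int, ext: str = "jpg") -> str:
    """Name of the Nth embedded image for an episode, e.g. hpr9876_1.jpg."""
    return f"hpr{ep_num}_{image_count}.{ext}"


def thumbnail_name(ep_num: int, image_count: int, ext: str = "jpg") -> str:
    """Name of the matching thumbnail, e.g. hpr9876_1_tn.jpg."""
    return f"hpr{ep_num}_{image_count}_tn.{ext}"


def needs_thumbnail(width_px: int) -> bool:
    """True when the image is wider than the 400 pixel limit."""
    return width_px > THUMBNAIL_MIN_WIDTH
```
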

The Janitors then review the shownotes for issues.

The duration is extracted from the media, and the other metadata from the `shownotes.json` file. After some checks the metadata is [URL encoded](https://en.wikipedia.org/wiki/Percent-encoding).
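
As an illustration of the encoding step, the sketch below percent-encodes a dictionary of metadata with the Python standard library. The function name and the field names in the usage example are hypothetical; the actual fields sent by `postshow.bash` are not listed here.

```python
from urllib.parse import quote, urlencode


def encode_metadata(fields: dict) -> str:
    """Percent-encode key/value pairs into a query string."""
    # quote_via=quote encodes spaces as %20 rather than "+",
    # which matches plain percent-encoding.
    return urlencode(fields, quote_via=quote)
```

For example, `encode_metadata({"title": "Top tips & tricks", "duration": 1234})` produces `title=Top%20tips%20%26%20tricks&duration=1234`.
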

The script allows any value in the JSON file to be overridden from the command line, and also prevents posting from a new host who has not yet been assigned an entry in the `hosts` table.

Once all the checks are done the

## Output Directory Structure

It should be possible to ship the entire backlog on physical media to someone, and have them plug it in and play it with any media player. Each episode has its own "album", which corresponds to a directory. The directory structure is kept as flat as possible, with everything related to a show, e.g. 9876, in a single directory `hpr9876`. This is the least common denominator, and in no way precludes web services or other applications.

We do however need to support other functionality, so the _Episodes_ are kept inside the `eps/` directory, the _Hosts_ are in `hosts/`, and the _Series_ are in `series/`.
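
A minimal sketch of how an episode directory path could be built under this layout. The function name and the root path in the usage example are hypothetical, and it assumes each `hpr9876` style directory sits directly inside `eps/`.

```python
from pathlib import PurePosixPath


def episode_dir(root: str, ep_num: int) -> PurePosixPath:
    """Directory that holds everything related to one episode."""
    # ASSUMPTION: episode directories are direct children of eps/.
    return PurePosixPath(root) / "eps" / f"hpr{ep_num}"
```

For example, `episode_dir("/path/to/disk/hpr", 9876)` gives `/path/to/disk/hpr/eps/hpr9876`.
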
# Layers

We get files from different locations. The source files are delivered by the hosts, some are generated by processing, and others are added by the Janitors who clean up the show notes.

All these are combined and end up as a complete entity on one of the HPR [Origin servers](https://en.wikipedia.org/wiki/Upstream_server).

From there it is made available via RSS, etc.

<!--
TODO
## Upload

In our worked example a host uploads a show recorded in an audio file in [flac](https://en.wikipedia.org/wiki/FLAC) format.
The show is about a bash script which they also attach.
They describe the show in show notes, and include an image of the output.

files will be distributed using the C

The show processing supports the building of this structure,

This is how the

The `hpr_generator` places episodes are in `eps/`,

Everything related to a given show should be in the

We need to base our requirements on our own requirements and not those imposed by the IA.

It should be possible for someone to `rsync` the entire site and store it locally for use with a file manager/or media player.

To make file management clear all files must begin with the episode number `hpr9876`.

Supplemental files should be p

If there is a possibility of a clash then we need to ensure that we manage that by avoiding upload names.

should layer ontop of that so `/path/to/disk/hpr/` is the root and then `eps

If any of the sites (The IA) require special treatment, then that's fine but it's a deviation from our structure.

https://archive.org/details/hpr4230 →

The directory structure imposed by IA is less than ideal when it comes to our requirements.

### Cron jobs

```
$ crontab -l
SHELL="/bin/bash"
18 0 * * * /home/hpr/bin/hpr_db_backup.bash > /dev/null 2>&1

######20 * * * * php -f /home/hpr/bin/host_image.php > /dev/null 2>&1 # Temporary disabled
1 1 * * * /home/hpr/bin/hpr_ccdn_stats.bash > /dev/null 2>&1
```

-->