Updated documentation to describe the Input Directory Structure and queue naming

Ken Fallon 2024-12-26 21:48:41 +01:00
parent 67687a6205
commit 52b19ae96e
2 changed files with 130 additions and 73 deletions

@ -1,73 +0,0 @@
# Goal
HPR is dedicated to sharing knowledge, and as such it should be possible for someone to have the files locally and play them on an MP3 player.
It should be possible to post the entire backlog to someone and have them plug it in and play it on any media player. Each episode has its own "album", which corresponds to a directory. The directory structure is kept as flat as possible, with everything related to show 9876 in a single directory, `hpr9876`. This is the lowest common denominator, and in no way precludes web services or other applications.
We do, however, need to support other functionality, so the _Episodes_ are kept in `eps/`, the _Hosts_ in `hosts/`, and the _Series_ in `series/`.
# Layers
We get files from different locations. The source files are delivered by the hosts, some are generated by processing, and others are added by the Janitors who clean up the show notes.
All of these are combined and end up as a complete entity on one of the HPR [Origin servers](https://en.wikipedia.org/wiki/Upstream_server).
From there it is made available via RSS, etc.
<!--
TODO
## Upload
In our worked example a host uploads a show recorded in an audio file in [flac](https://en.wikipedia.org/wiki/FLAC) format.
The show is about a bash script which they also attach.
They describe the show in show notes, and include an image of the output.
files will be distributed using the C
The show processing supports the building of this structure,
This is how the
The `hpr_generator` places episodes are in `eps/`,
Everything related to a given show should be in the
We need to base our requirements on our own requirements and not those imposed by the IA.
It should be possible for someone to `rsync` the entire site and store it locally for use with a file manager/or media player.
To make file management clear all files must begin with the episode number `hpr9876`.
Supplemental files should be p
If there is a possibility of a clash then we need to ensure that we manage that by avoiding upload names.
should layer ontop of that so `/path/to/disk/hpr/` is the root and then `eps
If any of the sites (The IA) require special treatment, then that's fine but it's a deviation from our structure.
https://archive.org/details/hpr4230 →
The directory structure imposed by IA is less than ideal when it comes to our requirements.
-->

@ -0,0 +1,130 @@
# File Structure
We receive random files from random people on the Internet, so they are treated with extreme care.
The HPR Database is used to track the `status` of each show as it is being processed, using the normally hidden `reservations` table. The table is not publicly available because, as a security measure, it contains the IP address of the host uploading the show. Once the show is processed, the IP address is removed from the table.
Information about show processing is available to the Janitors via [stats.php](https://repo.anhonesthost.net/HPR/hpr_hub/src/branch/main/cms/stats.php).
A cron job, `*/15 * * * * /home/hpr/bin/update-stats.bash > /dev/null 2>&1`, runs every 15 minutes and saves the output to [https://hub.hackerpublicradio.org/stats.json](https://hub.hackerpublicradio.org/stats.json) for anyone to use.
## Input Directory Structure
The files are never processed by the front end HPR servers, and so all processing is done offline.
The files are downloaded by trusted Janitors who have the ability to `scp`/`rsync` the files from the HPR server to a local machine for processing.
Each upload directory name is built from the following fields, separated by the underscore (`_`) delimiter.
- Upload date and time `UTC_TIMESTAMP()` at reservation time.
- The requested episode number or `9999` if the reserve queue is to be used.
- The requested episode date or `1970-01-01` if the reserve queue is to be used.
- The random unique key for this request.
```
2339594445_9278_2044-02-24_aeb0579fcac318005d7550a60fd60403676c24d94148b
2339680845_9999_1970-01-01_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624
```
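The naming scheme above can be split back into its fields. The following is a minimal sketch, not the code used by the HPR tools: the names `UploadDir` and `parse_upload_dir` are hypothetical, and the timestamp is kept as the plain integer shown in the examples without further interpretation.

```python
from dataclasses import dataclass
from datetime import date

# Placeholder values used when a show is destined for the reserve queue.
RESERVE_EPISODE = 9999
RESERVE_DATE = date(1970, 1, 1)


@dataclass(frozen=True)
class UploadDir:
    """Fields of an upload directory name (hypothetical helper type)."""
    timestamp: int       # upload date and time at reservation, as in the examples
    episode: int         # requested episode number, or 9999
    episode_date: date   # requested episode date, or 1970-01-01
    key: str             # random unique key for this request

    @property
    def is_reserve(self) -> bool:
        # Both placeholders are set when the reserve queue is to be used.
        return (self.episode == RESERVE_EPISODE
                and self.episode_date == RESERVE_DATE)


def parse_upload_dir(name: str) -> UploadDir:
    """Split an upload directory name on the underscore delimiter."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {name!r}")
    timestamp, episode, episode_date, key = parts
    return UploadDir(int(timestamp), int(episode),
                     date.fromisoformat(episode_date), key)


normal = parse_upload_dir(
    "2339594445_9278_2044-02-24_aeb0579fcac318005d7550a60fd60403676c24d94148b")
reserve = parse_upload_dir(
    "2339680845_9999_1970-01-01_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624")
```

None of the four fields contains an underscore, so a plain split is enough here.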
Shows destined for the reserve queue are moved from the upload directory to the reserve directory using the script [rename-reserve.bash](https://repo.anhonesthost.net/HPR/hpr-tools/src/branch/main/workflow/rename-reserve.bash).
It checks whether a URL to the show was provided and, if so, attempts to download the file.
It then renames the directory using the following fields, separated by the underscore (`_`) delimiter.
- Upload date and time `UTC_TIMESTAMP()` at reservation time.
- The `hosts.hostid` of the host.
- The random unique key for this request.
- The host name `hosts.host` of the host.
- The title of the episode, with spaces replaced by underscores.
<strong>When new hosts submit a show directly to the reserve queue, the Janitors will repost it to the first available slot in the normal queue.</strong>
```
2339680845_987_4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624_Emperor_Ming_Top_tips_for_time_travel
```
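As an illustration of how such a reserve directory name is assembled, here is a minimal sketch. It is not taken from `rename-reserve.bash`; the function name `reserve_dir_name` is hypothetical, and replacing spaces in the host name as well as in the title is an assumption based on the example above.

```python
def reserve_dir_name(timestamp: int, host_id: int, key: str,
                     host: str, title: str) -> str:
    """Join the reserve queue fields with the underscore delimiter."""

    def underscored(text: str) -> str:
        # Replace runs of whitespace with a single underscore.
        return "_".join(text.split())

    return "_".join([
        str(timestamp),      # upload date and time at reservation
        str(host_id),        # hosts.hostid
        key,                 # random unique key for this request
        underscored(host),   # hosts.host (assumed to be underscored too)
        underscored(title),  # episode title with spaces replaced
    ])


name = reserve_dir_name(
    2339680845, 987,
    "4bd713699e5bc0978d5fef85a60f09bc7f70ef3488624",
    "Emperor Ming", "Top tips for time travel")
```

Host names and titles may themselves contain underscores after this substitution, so only the first three fields can be recovered reliably by splitting the name again.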
## Output Directory Structure
It should be possible to post the entire backlog to someone and have them plug it in and play it on any media player. Each episode has its own "album", which corresponds to a directory. The directory structure is kept as flat as possible, with everything related to show 9876 in a single directory, `hpr9876`. This is the lowest common denominator, and in no way precludes web services or other applications.
We do, however, need to support other functionality, so the _Episodes_ are kept in `eps/`, the _Hosts_ in `hosts/`, and the _Series_ in `series/`.
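The layout described here can be sketched as a small path helper. This is only an illustration: the function name `episode_dir` and the example root are hypothetical, and zero-padding the episode number to four digits is an assumption that is not stated in this document.

```python
from pathlib import PurePosixPath


def episode_dir(root: str, episode: int) -> PurePosixPath:
    """Return the single directory holding everything for one show."""
    # Assumption: episode numbers are zero-padded to four digits.
    return PurePosixPath(root) / "eps" / f"hpr{episode:04d}"


# Example root chosen purely for illustration.
path = episode_dir("/path/to/disk/hpr", 9876)
```

Keeping one directory per show means a copy of the tree can be browsed with a plain file manager or media player.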
# Layers
We get files from different locations. The source files are delivered by the hosts, some are generated by processing, and others are added by the Janitors who clean up the show notes.
All of these are combined and end up as a complete entity on one of the HPR [Origin servers](https://en.wikipedia.org/wiki/Upstream_server).
From there it is made available via RSS, etc.
<!--
TODO
## Upload
In our worked example a host uploads a show recorded in an audio file in [flac](https://en.wikipedia.org/wiki/FLAC) format.
The show is about a bash script which they also attach.
They describe the show in show notes, and include an image of the output.
files will be distributed using the C
The show processing supports the building of this structure,
This is how the
The `hpr_generator` places episodes are in `eps/`,
Everything related to a given show should be in the
We need to base our requirements on our own requirements and not those imposed by the IA.
It should be possible for someone to `rsync` the entire site and store it locally for use with a file manager/or media player.
To make file management clear all files must begin with the episode number `hpr9876`.
Supplemental files should be p
If there is a possibility of a clash then we need to ensure that we manage that by avoiding upload names.
should layer ontop of that so `/path/to/disk/hpr/` is the root and then `eps
If any of the sites (The IA) require special treatment, then that's fine but it's a deviation from our structure.
https://archive.org/details/hpr4230 →
The directory structure imposed by IA is less than ideal when it comes to our requirements.
### Cron jobs
```
$ crontab -l
SHELL="/bin/bash"
18 0 * * * /home/hpr/bin/hpr_db_backup.bash > /dev/null 2>&1
######20 * * * * php -f /home/hpr/bin/host_image.php > /dev/null 2>&1 # Temporary disabled
1 1 * * * /home/hpr/bin/hpr_ccdn_stats.bash > /dev/null 2>&1
```
-->