hpr-tools/Processing-Show-Notes.md

  • A cron job (cronjob_scrape) runs periodically to flag whether there are any notes requiring work. To determine this, a Perl script (scrape_HPR) scrapes the HPR website, looking for entries in the 'processing' state on the calendar page.
  • If there is work to do, the local partial copy of the upload directory on the HPR server is synchronised with rsync over SSH (sync_hpr).
  • Files for new shows are saved locally in a directory called 'hprXXXX', based on the show number. The upload/ sub-directory the show came from is recorded in the file .origin so that the end result can be uploaded back to it.
  • For each new show the following steps are carried out:
    • The raw shownotes.txt file is viewed so it can be checked for errors (do_show). This is the point at which errors like misspellings in the title, summary or tags can be spotted. These are corrected by editing the raw file and re-uploading it, using the script do_repair.
    • The shownotes.txt file is parsed. The declared note format is saved for future reference in a file called .format, and the notes themselves are stored in a file called 'hprXXXX.out'. Parsing is controlled by do_parse which calls a Perl script parse_shownotes. Any obvious anomalies such as missing media, summary or tags are flagged at this stage by the Perl script. This script also tries to determine whether the declared format (e.g. HTML5) matches the actual note content, and flags any apparent errors.
    • The declared format can be changed at this stage if it seems appropriate (e.g. the notes are not HTML, just plain text) using the script do_change_format.
    • The show notes can then be edited. This is done with do_vim. This script passes the declared file format to Vim in order to enable the relevant syntax. If the format is 'HTML5' a validator is run on the notes (script validate_html). If there are errors these are passed to Vim so that the problems can be found and corrected. For other formats an external script (make_markdown) can be used to convert selected parts or the whole file to Markdown when desired.
    • Once the notes are in suitable Markdown (or another format), they can be converted to HTML5 using pandoc. This stage is skipped if the notes are already HTML5. The script used is called do_pandoc. It generates two types of file: the first is the HTML fragment destined for the HPR database, called 'hprXXXX.html'; the second is a full standalone file for local consumption which uses the HPR CSS, called 'hprXXXX_full.html'. The do_pandoc script passes pandoc various settings according to the format of the input file and the desired output file.
    • The HTML is viewed with the script do_browser (which is actually a soft link to another script tailored for the preferred browser; currently the browser is Brave and the script is do_brave).
    • There is usually an iteration between editing, running pandoc and viewing in the browser before the notes are accepted.
    • The final stage is to run do_upload which copies the HTML fragment to the HPR site under the appropriate upload/ sub-directory.
  • The saved show files are deleted by a cron job according to their age. Currently they are stored for 6 months.
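
To make the pandoc stage more concrete, here is a minimal sketch in Python of how the two output files might be requested. It is not the do_pandoc script itself: the function name pandoc_commands, the format labels in PANDOC_READERS, and the stylesheet name hpr.css are all assumptions made for illustration. Only the file names 'hprXXXX.out', 'hprXXXX.html' and 'hprXXXX_full.html' and the rule that HTML5 notes skip this stage come from the description above.

```python
from pathlib import Path

# Hypothetical mapping from a declared note format (as saved in .format)
# to a pandoc reader name. The labels actually used by hpr-tools may differ.
PANDOC_READERS = {
    "Markdown": "markdown",
    "plain text": "markdown",  # assumption: plain text is treated as Markdown
}


def pandoc_commands(show_dir, show_id, declared_format, css="hpr.css"):
    """Build the pandoc command lines for one show.

    Returns a list of two argument lists: one producing the HTML fragment
    and one producing the standalone page. Returns an empty list when the
    notes are already HTML5, since that format skips the pandoc stage.
    """
    if declared_format == "HTML5":
        return []

    reader = PANDOC_READERS[declared_format]
    show_dir = Path(show_dir)
    source = str(show_dir / f"{show_id}.out")
    fragment = str(show_dir / f"{show_id}.html")
    full = str(show_dir / f"{show_id}_full.html")

    # Options shared by both runs: input reader and HTML5 output.
    base = ["pandoc", "-f", reader, "-t", "html5"]
    return [
        # Fragment only (no <html>/<head> wrapper), destined for the database.
        base + ["-o", fragment, source],
        # Full standalone page with a stylesheet link, for local viewing.
        base + ["--standalone", "--css", css, "-o", full, source],
    ]
```

Each returned list could be passed to subprocess.run(cmd, check=True) on a machine where pandoc is installed. Building the commands separately from running them keeps the format-dependent choices easy to inspect before anything is written to disk.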