Converted to using json

Ken Fallon 2024-12-27 08:53:48 +01:00
parent 4a73931e0d
commit e4aab4d7c2
2 changed files with 113 additions and 56 deletions

13
Processing-Show-Notes.md Normal file

@ -0,0 +1,13 @@
- A `cron` job runs periodically to flag whether there are any notes requiring work (`cronjob_scrape`). The HPR website is scraped with a Perl script to determine this (`scrape_HPR`) by looking for entries in the 'processing' state on the calendar page.
- If there is work to do, the partial copy of the `upload` directory on the HPR server that is kept locally is synchronised with `rsync` over SSH (`sync_hpr`).
- Files for new shows are saved locally in a directory called `'hprXXXX'`, based on the show number. A record is kept of the `upload/` sub-directory where the show came from so that the end result can be uploaded to it (in the file `.origin`).
- For each new show the following steps are carried out:
  - The raw `shownotes.txt` file is viewed so it can be checked for errors (`do_show`). This is the point at which errors like misspellings in the title, summary or tags can be determined. These are corrected by editing the raw file and re-uploading it. The script used to perform such edits is `do_repair`.
  - The `shownotes.txt` file is parsed. The declared note format is saved for future reference in a file called `.format`, and the notes themselves are stored in a file called `'hprXXXX.out'`. Parsing is controlled by `do_parse` which calls a Perl script `parse_shownotes`. Any obvious anomalies such as missing media, summary or tags are flagged at this stage by the Perl script. This script also tries to determine whether the declared format (e.g. HTML5) matches the actual note content, and flags any apparent errors.
  - It is possible to change the declared format at this stage if it seems appropriate (e.g. it's **not** HTML, just plain text) using script `do_change_format`.
  - The show notes can then be edited. This is done with `do_vim`. This script passes the declared file format to Vim in order to enable the relevant syntax. If the format is 'HTML5' a validator is run on the notes (script `validate_html`). If there are errors these are passed to Vim so that the problems can be found and corrected. For other formats an external script (`make_markdown`) can be used to convert selected parts or the whole file to Markdown when desired.
  - Having produced suitable Markdown (as appropriate) or other format, the notes can be converted to HTML5 using `pandoc`. This stage is skipped if the notes are already HTML5. The script used is called `do_pandoc`. Two types of files are generated by this script: the first is the HTML fragment destined for the HPR database, called `'hprXXXX.html'`; the second is for local consumption and is a full standalone file which uses the HPR CSS, called `'hprXXXX_full.html'`. The `do_pandoc` script passes `pandoc` various settings according to the format of the input file and the desired output file.
  - The HTML is viewed with the script `do_browser` (which is actually a soft link to another script tailored for the particular preferred browser; currently the browser is `brave` and the script is `do_brave`).
  - There is usually an iteration between editing, running `pandoc` and viewing in the browser before the notes are accepted.
  - The final stage is to run `do_upload` which copies the HTML fragment to the HPR site under the appropriate `upload/` sub-directory.
- The saved show files are deleted by a cron job according to their age. Currently they are stored for 6 months.
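
The age-based clean-up in the last step could be sketched roughly as follows. This is an illustration only, not the actual cron job: the function name `purge_old_shows`, the 183-day approximation of six months, and the assumption that the `hprXXXX` directories sit directly under one parent directory are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: remove locally saved show directories older than a cutoff.
# purge_old_shows and its arguments are illustrative names, not part of the HPR scripts.
purge_old_shows() {
    local base_dir="$1"              # parent of the hprXXXX directories (assumed layout)
    local max_age_days="${2:-183}"   # roughly six months (assumption)

    # -mtime +N matches entries whose age, counted in whole 24-hour periods,
    # is greater than N. Only directories named hpr* directly under base_dir
    # are considered; anything else is left alone.
    find "${base_dir}" -mindepth 1 -maxdepth 1 -type d -name 'hpr*' \
        -mtime +"${max_age_days}" -exec rm -rf -- {} +
}
```

A cron entry could then run a small wrapper that calls `purge_old_shows` with the local shows directory; the exact path and schedule are not given in the notes above.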


@ -1,25 +1,45 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# Copyright Ken Fallon - Released into the public domain. http://creativecommons.org/publicdomain/ # Copyright Ken Fallon - Released into the public domain. http://creativecommons.org/publicdomain/
#============================================================ #============================================================
source /home/ken/tmp/pip3.9/bin/activate
PATH=$PATH:/home/ken/sourcecode/hpr/hpr_hub/bin/
processing_dir="/home/ken/tmp/hpr/processing" processing_dir="$HOME/tmp/hpr/processing" # The directory where the files will be copied to for processing
git_image_dir="/home/ken/sourcecode/hpr/HPR_Public_Code/www/images/hosts"
if [ ! -d "${processing_dir}" ]
then
echo "ERROR: The processing directory \"${processing_dir}\" does not exist."
exit 1
fi
###################
# Check that all the programs are installed
function is_installed () {
for this_program in "$@"
do
if ! command -v "${this_program}" >/dev/null 2>&1
then
echo "ERROR: The application \"${this_program}\" is required but is not installed."
exit 2
fi
done
}
is_installed awk base64 cat curl date detox eval ffprobe file find grep head jq kate magick mediainfo mv rsync seamonkey sed sort sponge ssh touch wget
echo "Processing the next HPR Show in the queue" echo "Processing the next HPR Show in the queue"
################### ###################
# Get the show # Get the show
# #
# # Replaced METADATA_PROCESSED with SHOW_SUBMITTED
response=$( curl --silent --netrc-file ${HOME}/.netrc "https://hub.hackerpublicradio.org/cms/status.php" | \ response=$( curl --silent --netrc-file ${HOME}/.netrc "https://hub.hackerpublicradio.org/cms/status.php" | \
grep 'METADATA_PROCESSED' | \ grep 'SHOW_SUBMITTED' | \
head -1 | \ head -1 | \
sed 's/,/ /g' ) sed 's/,/ /g' )
if [ -z "${response}" ] if [ -z "${response}" ]
then then
echo "INFO: There appear to be no more shows with the status \"METADATA_PROCESSED\"." echo "INFO: There appear to be no more shows with the status \"SHOW_SUBMITTED\"."
echo "Getting a list of all the reservations." echo "Getting a list of all the reservations."
curl --silent --netrc-file ${HOME}/.netrc "https://hub.hackerpublicradio.org/cms/status.php" | sort -n curl --silent --netrc-file ${HOME}/.netrc "https://hub.hackerpublicradio.org/cms/status.php" | sort -n
exit 3 exit 3
@ -52,13 +72,13 @@ shownotes_json="${processing_dir}/${dest_dir}/shownotes.json"
if [ ! -s "${shownotes_json}" ] if [ ! -s "${shownotes_json}" ]
then then
echo "ERROR: \"${shownotes_json}\" is missing" echo "ERROR: \"${shownotes_json}\" is missing"
exit 2 exit 4
fi fi
if [ "$( file "${shownotes_json}" | grep -ic "text" )" -eq 0 ] if [ "$( file "${shownotes_json}" | grep -ic "text" )" -eq 0 ]
then then
echo "ERROR: \"${shownotes_json}\" is not a text file" echo "ERROR: \"${shownotes_json}\" is not a text file"
exit 3 exit 5
fi fi
@ -68,7 +88,7 @@ jq '.' "${shownotes_json}" | sponge "${shownotes_json}"
# Get the media # Get the media
# #
remote_media="$( jq --raw-output '.metadata.POST.url' "${processing_dir}/${dest_dir}/shownotes.json" )" remote_media="$( jq --raw-output '.metadata.url' "${shownotes_json}" )"
if [ -n "${remote_media}" ] if [ -n "${remote_media}" ]
then then
@ -77,33 +97,56 @@ then
if [ $? -ne 0 ] if [ $? -ne 0 ]
then then
echo "ERROR: Could not get the remote media" echo "ERROR: Could not get the remote media"
exit 5 exit 6
fi fi
fi fi
shownotes_html="${processing_dir}/${dest_dir}/shownotes.html" shownotes_html="${processing_dir}/${dest_dir}/shownotes.html"
jq --raw-output '.episode.Show_Notes' "${shownotes_json}" > "${shownotes_html}"
if [ ! -s "${shownotes_html}" ] if [ ! -s "${shownotes_html}" ]
then then
echo "ERROR: \"${shownotes_html}\" is missing" echo "ERROR: \"${shownotes_html}\" is missing"
exit 4
fi
shownotes_txt="${processing_dir}/${dest_dir}/shownotes.txt"
if [ ! -s "${shownotes_txt}" ]
then
echo "ERROR: \"${shownotes_txt}\" is missing"
exit 7 exit 7
fi fi
xdg-open "${shownotes_txt}" >/dev/null 2>&1 &
xdg-open "${shownotes_json}" >/dev/null 2>&1 & # Process Shownotes
xdg-open "${shownotes_html}" >/dev/null 2>&1 & sed "s#>#>\n#g" "${shownotes_html}" | sponge "${shownotes_html}"
# Extract Images
image_count="1"
touch ${shownotes_html%.*}_combined.html
for image in $( grep --color=never -Po 'data:image/[^;]*;base64,\K[a-zA-Z0-9+/=]*' "${shownotes_html}" )
do
this_image="${processing_dir}/${dest_dir}/hpr${ep_num}_${image_count}"
echo -n "$image" | base64 -di > ${this_image}
this_ext="$( file --mime-type ${this_image} | awk -F '/' '{print $NF}' )"
mv -v "${this_image}" "${this_image}.${this_ext}"
this_width="$( mediainfo "${this_image}.${this_ext}" | grep Width | awk -F ': | pixels' '{print $2}' | sed 's/ //g' )"
if [ "${this_width}" -gt "400" ]
then
magick "${this_image}.${this_ext}" -resize 400x "${this_image}_tn.${this_ext}"
fi
((image_count=image_count+1))
done
## Manually edit the shownotes to fix issues
kate "${shownotes_json}" >/dev/null 2>&1 &
# librewolf "${shownotes_html}" >/dev/null 2>&1 &
seamonkey "${shownotes_html}" >/dev/null 2>&1 &
# bluefish "${shownotes_html}" >/dev/null 2>&1 &
read -p "Does the metadata look ok ? (N|y) ? " -n 1 -r read -p "Does the metadata look ok ? (N|y) ? " -n 1 -r
echo # (optional) move to a new line echo # (optional) move to a new line
if [[ ! $REPLY =~ ^[yY]$ ]] if [[ ! $REPLY =~ ^[yY]$ ]]
then then
echo "skipping...." echo "skipping...."
exit 22 exit 8
fi fi
media=$( find "${processing_dir}/${dest_dir}/" -type f -exec file {} \; | grep -Ei 'audio|mpeg|video|MP4' | awk -F ': ' '{print $1}' ) media=$( find "${processing_dir}/${dest_dir}/" -type f -exec file {} \; | grep -Ei 'audio|mpeg|video|MP4' | awk -F ': ' '{print $1}' )
@ -111,7 +154,7 @@ if [ -z "${media}" ]
then then
echo "ERROR: Can't find any media in \"${processing_dir}/${dest_dir}/\"" echo "ERROR: Can't find any media in \"${processing_dir}/${dest_dir}/\""
find "${processing_dir}/${dest_dir}/" -type f find "${processing_dir}/${dest_dir}/" -type f
exit 1 exit 9
fi fi
the_show_media="" the_show_media=""
@ -132,6 +175,11 @@ else
fi fi
duration=$( \date -ud "1970-01-01 $( ffprobe -i "${the_show_media}" 2>&1| awk -F ': |, ' '/Duration:/ { print $2 }' )" +%s ) duration=$( \date -ud "1970-01-01 $( ffprobe -i "${the_show_media}" 2>&1| awk -F ': |, ' '/Duration:/ { print $2 }' )" +%s )
if [ $? -ne 0 ]
then
echo "ERROR: Invalid duration found in \"${media}\"" >&2
exit 10
fi
################### ###################
# Gather episode information # Gather episode information
@ -140,25 +188,25 @@ duration=$( \date -ud "1970-01-01 $( ffprobe -i "${the_show_media}" 2>&1| awk -F
if [ "$( curl --silent --write-out '%{http_code}' http://hackerpublicradio.org/say.php?id=${ep_num} --output /dev/null )" == 200 ] if [ "$( curl --silent --write-out '%{http_code}' http://hackerpublicradio.org/say.php?id=${ep_num} --output /dev/null )" == 200 ]
then then
echo "ERROR: The Episode hpr${ep_num} has already been posted" echo "ERROR: The Episode hpr${ep_num} has already been posted"
exit 6 exit 11
fi fi
if [ "$( jq --raw-output '.metadata.Episode_Number' ${shownotes_json} )" != "${ep_num}" ] if [ "$( jq --raw-output '.metadata.Episode_Number' ${shownotes_json} )" != "${ep_num}" ]
then then
echo "ERROR: The Episode_Number: \"${ep_num}\" was not found in \"${shownotes_json}\"" echo "ERROR: The Episode_Number: \"${ep_num}\" was not found in \"${shownotes_json}\""
exit 10 exit 12
fi fi
if [ "$( jq --raw-output '.metadata.Episode_Date' ${shownotes_json} )" != "${ep_date}" ] if [ "$( jq --raw-output '.metadata.Episode_Date' ${shownotes_json} )" != "${ep_date}" ]
then then
echo "ERROR: The Episode_Date: \"${ep_date}\" was not found in \"${shownotes_json}\"" echo "ERROR: The Episode_Date: \"${ep_date}\" was not found in \"${shownotes_json}\""
exit 8 exit 13
fi fi
if [ "$( jq --raw-output '.host.Host_Email' ${shownotes_json} )" != "${email_unpadded}" ] if [ "$( jq --raw-output '.host.Host_Email' ${shownotes_json} )" != "${email_unpadded}" ]
then then
echo "ERROR: The Host_Email: \"${email_unpadded}\" was not found in \"${shownotes_json}\"" echo "ERROR: The Host_Email: \"${email_unpadded}\" was not found in \"${shownotes_json}\""
exit 9 exit 14
fi fi
################### ###################
@ -169,7 +217,7 @@ hostid="$( jq --raw-output '.host.Host_ID' ${shownotes_json} | jq --slurp --raw-
host_name="$( jq --raw-output '.host.Host_Name' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" host_name="$( jq --raw-output '.host.Host_Name' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
title=$( jq --raw-output '.episode.Title' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' ) title=$( jq --raw-output '.episode.Title' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )
summary="$( jq --raw-output '.episode.Summary' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" summary="$( jq --raw-output '.episode.Summary' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
series="$( jq --raw-output '.episode.Series' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" series_id="$( jq --raw-output '.episode.Series' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
series_name="$( jq --raw-output '.episode.Series_Name' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" series_name="$( jq --raw-output '.episode.Series_Name' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
explicit="$( jq --raw-output '.episode.Explicit' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" explicit="$( jq --raw-output '.episode.Explicit' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
episode_license="$( jq --raw-output '.episode.Show_License' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )" episode_license="$( jq --raw-output '.episode.Show_License' ${shownotes_json} | jq --slurp --raw-input @uri | sed -e 's/%0A"$//g' -e 's/^"//g' )"
@ -199,39 +247,19 @@ fi
if [ "${hostid}" == '0' ] if [ "${hostid}" == '0' ]
then then
echo "ERROR: The hostid is 0. Create the host and use \"hostid=???\" to override" echo "ERROR: The hostid is 0. Create the host and use \"hostid=???\" to override"
exit 11 exit 15
fi fi
# # # # has to be done here as we need to know the hostid
# # # host_photo="$( jq --raw-output '.metadata.FILES.host_photo.name' ${shownotes_json} )"
# # # if [ -n "${host_photo}" ]
# # # then
# # # host_photo="${processing_dir}/${dest_dir}/photo"
# # # host_avatar="${git_image_dir}/${hostid}.png"
# # # echo "INFO: Found host photo \"${host_photo}\""
# # # gm convert "${host_photo}" -resize x80 "${host_avatar}"
# # # xdg-open "${host_avatar}"
# # # read -p "Does the avatar 'look ok ? (N|y) ? " -n 1 -r
# # # echo # (optional) move to a new line
# # # if [[ ! $REPLY =~ ^[yY]$ ]]
# # # then
# # # echo "ERROR: Problem with avatar...."
# # # exit 12
# # # else
# # # echo "INFO: Copying avatar to the server."
# # # scp "${host_avatar}" hpr:www/images/hosts/
# # # fi
# # # fi
################### ###################
# Post show to HPR # Post show to HPR
# #
post_show="${processing_dir}/${dest_dir}/post_show.txt" post_show="${processing_dir}/${dest_dir}/post_show.txt"
post_show_json="${processing_dir}/${dest_dir}/post_show.json"
post_show_response="${processing_dir}/${dest_dir}/post_show_response.txt" post_show_response="${processing_dir}/${dest_dir}/post_show_response.txt"
echo "key=${key}&ep_num=${ep_num}&ep_date=${ep_date}&email=${email}&title=${title}&duration=${duration}&summary=${summary}&series=${series}&series_name=${series_name}&explicit=${explicit}&episode_license=${episode_license}&tags=${tags}&hostid=${hostid}&host_name=${host_name}&host_license=${host_license}&host_profile=${host_profile}&notes=${notes}" > ${post_show} echo "key=${key}&ep_num=${ep_num}&ep_date=${ep_date}&email=${email}&title=${title}&duration=${duration}&summary=${summary}&series_id=${series_id}&series_name=${series_name}&explicit=${explicit}&episode_license=${episode_license}&tags=${tags}&hostid=${hostid}&host_name=${host_name}&host_license=${host_license}&host_profile=${host_profile}&notes=${notes}" > ${post_show}
echo "Sending:" echo "Sending:"
cat "${post_show}" cat "${post_show}"
@ -242,7 +270,7 @@ email=${email}
title=${title} title=${title}
duration=${duration} duration=${duration}
summary=${summary} summary=${summary}
series=${series} series_id=${series_id}
series_name=${series_name} series_name=${series_name}
explicit=${explicit} explicit=${explicit}
episode_license=${episode_license} episode_license=${episode_license}
@ -253,8 +281,24 @@ host_license=${host_license}
host_profile=${host_profile} host_profile=${host_profile}
notes=${notes}" notes=${notes}"
wget --post-file="${post_show}" "https://hub.hackerpublicradio.org/cms/add_show.php" -O - #"${post_show_response}" echo "{
\"key\": \"${key}\",
\"ep_num\": \"${ep_num}\",
\"ep_date\": \"${ep_date}\",
\"email\": \"${email}\",
\"title\": \"${title}\",
\"duration\": \"${duration}\",
\"summary\": \"${summary}\",
\"series_id\": \"${series_id}\",
\"series_name\": \"${series_name}\",
\"explicit\": \"${explicit}\",
\"episode_license\": \"${episode_license}\",
\"tags\": \"${tags}\",
\"hostid\": \"${hostid}\",
\"host_name\": \"${host_name}\",
\"host_license\": \"${host_license}\",
\"host_profile\": \"${host_profile}\",
\"notes\": \"${notes}\"
}" | tee "${post_show_json}"
# /home/ken/sourcecode/personal/bin/hpr-publish.bash curl --netrc --include --request POST "https://hub.hackerpublicradio.org/cms/add_show_json.php" --header "Content-Type: application/json" --data-binary "@${post_show_json}"
#
# xdg-open "https://hackerpublicradio.org/eps/hpr${ep_num}/index.html" >/dev/null 2>&1 &
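
The dependency check near the top of the script depends on the order of the redirections passed to `command -v`. As a minimal standalone sketch (not the script's exact code; the `missing` flag and the deferred exit are illustrative additions), the check can silence both output streams by redirecting stdout first and then pointing stderr at it:

```bash
#!/usr/bin/env bash
# Sketch of a dependency check that reports every missing program before exiting.
is_installed() {
    local this_program missing=0
    for this_program in "$@"
    do
        # ">/dev/null 2>&1" sends stdout to /dev/null, then stderr to the same place.
        # Written the other way round ("2>&1 >/dev/null"), stderr would still be
        # attached to the terminal and only stdout would be discarded.
        if ! command -v "${this_program}" >/dev/null 2>&1
        then
            echo "ERROR: The application \"${this_program}\" is required but is not installed." >&2
            missing=1
        fi
    done
    if [ "${missing}" -ne 0 ]
    then
        exit 2
    fi
}
```

Collecting all failures before exiting is a design choice made for this sketch: it lists every missing tool in one run instead of stopping at the first one.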