Create an hpr_website repository #265

Open
opened 2025-06-19 01:39:30 +00:00 by rho_n · 8 comments
Owner

One of the original goals of having a static-page-based HPR website was to have it stored in a Git repository for easy distribution and redundancy. Unfortunately, when I was developing the generator, I took a shortcut and stored static resources such as the HPR logo and the host avatars in the same directory (public_html) as used for the output of the generated HTML files. This presents challenges in maintaining a separate repository for just the static site.

Possible solutions:

  • Maintain one repository, storing the generated files in the repository. This might be confusing for those who just want to download just the static website, but good documentation would limit that, and it is the easiest option to implement.
  • Maintain two repositories, keep the static resources in the HPR generator in the current structure. When running the hpr_generator, the output directory can be changed to point to the hpr_website repository. Keeping the generator code as is would mean manually copying the static resources to the hpr_website repository. They don't change often (mostly adding new host avatars), so making these updates manually wouldn't be a big burden.
  • Maintain two repositories, keep the static resources in the HPR generator in the current structure. When running the hpr_generator, the output directory can be changed to point to the hpr_website repository. Update the generator code to copy the static resources as part of the output when generating files.

If the choice is to maintain two repositories, we can start with option 2 above while option 3 is being developed.
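
A minimal sketch of the copy step that option 3 describes, written in Python purely for illustration. The function name `copy_static_resources` and the directory names in the usage note are hypothetical; nothing here is taken from the actual hpr_generator code.

```python
import shutil
from pathlib import Path


def copy_static_resources(static_dir: Path, output_dir: Path) -> list[Path]:
    """Copy every file under static_dir into output_dir, keeping the layout.

    Generated files already in output_dir are left alone unless a static
    file has the same relative path, in which case the static file wins.
    """
    copied = []
    for src in sorted(static_dir.rglob("*")):
        if src.is_file():
            dest = output_dir / src.relative_to(static_dir)
            # Create intermediate directories such as images/ on demand.
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Assuming the static resources were moved to a directory called `static` and the output directory pointed at a checkout of hpr_website, the generator would call `copy_static_resources(Path("static"), Path("../hpr_website"))` after writing the generated pages.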

rho_n added the Feature Request label 2025-06-19 01:39:30 +00:00
Member

I agree this can be improved!

May I add another possible solution, which is basically a mix of your options 1 and 3:

  • Change the structure so the static files are stored in a separate directory in the repository, then copied into the generated site folder during the build phase. Benefits:
    • One repository, and the ability to clean the build directory by removing it.
    • No need to ensure two repositories are in sync
    • Pulling the repository in order to be able to build the site is less complicated than pulling two repositories (although this could be mitigated by placing the static repository in a submodule). I think this is what you mean with option 3.

In addition (with any of the options):

  • Update the generator code to retrieve the host avatars from the database when building the site.

Thinking about this a bit more, and reflecting on your comments about users downloading the site: it would be really good to understand whether anyone uses the repository to (just) download the site. To reduce the steps required to build the site, we could add a Makefile, so a simple `make site` will retrieve the database and build the site in the public_html directory. `make clean` prunes this directory. `git status` shows no changes in either case, because the `.gitignore` excludes the public_html tree.
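
To make the two proposed targets concrete, here is a rough Python equivalent of what `make site` and `make clean` would do. This is a sketch rather than the Makefile itself: the function names are hypothetical, and the database download and the generator run are passed in as callables because their real commands are not specified in this thread.

```python
import shutil
from pathlib import Path
from typing import Callable


def clean(output_dir: Path) -> None:
    """Rough equivalent of `make clean`: remove the generated tree if present."""
    shutil.rmtree(output_dir, ignore_errors=True)


def site(
    output_dir: Path,
    fetch_database: Callable[[], Path],
    generate: Callable[[Path, Path], None],
) -> None:
    """Rough equivalent of `make site`: fetch the database, then build.

    fetch_database returns the path of the retrieved database file;
    generate writes the site for that database into output_dir.
    """
    database = fetch_database()
    output_dir.mkdir(parents=True, exist_ok=True)
    generate(database, output_dir)
```

Because everything is written below the output directory, excluding that directory in `.gitignore` keeps the working tree clean after either step.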

Owner

Good that we are looking at this.

Use Cases:

  1. There are those who wish to build the entire site, so that they end up with the entire static website.
  2. There are, as @rho_n puts it, "those who just want to download just the static website". This is the main use case.

So you can't have one without the other.

For clarity I'll use the term "complete site" instead of "static site".
I agree with @paulj that the job of merging is part of the build phase, as this will give us clear separation of concerns.

The end product of the build is the "complete site", which is available from an origin server.
This currently feeds the ccdn origin server mirror network using rsync.
I'm still working on de-duplicating the media, and organizing a better home for the origin server that will host the "complete site".

The process of building the "complete site" results from running several tools, each responsible for its own "source of truth".
Currently it's a cron job. It's likely that these will be triggered separately, e.g. by a change in day, a new comment approved, a new show posted, etc.

The function of the hpr_generator is to take the hpr.sql, and produce the static files from it.
Currently the build process tries to git pull and, if that fails, it deletes the entire repository and git clones a new version.
The contents of public_html are synced with the ccdn origin server.
These "from the database" files should be the only files it produces.
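
The pull-or-reclone behaviour described above could be sketched as follows. This is an illustration under assumptions, not the actual cron job: the function names are hypothetical, and the git runner is injectable only so the logic can be exercised without network access.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def _run_git(args: Sequence[str]) -> bool:
    """Run a git command and report whether it exited successfully."""
    return subprocess.run(["git", *args]).returncode == 0


def refresh_checkout(
    repo_url: str,
    checkout: Path,
    run_git: Callable[[Sequence[str]], bool] = _run_git,
) -> str:
    """Try `git pull`; if that fails, delete the checkout and clone again.

    Returns "pulled" or "cloned" so the caller can log what happened.
    """
    if checkout.is_dir() and run_git(["-C", str(checkout), "pull"]):
        return "pulled"
    # The pull failed or there is no checkout yet: start from scratch.
    shutil.rmtree(checkout, ignore_errors=True)
    if not run_git(["clone", repo_url, str(checkout)]):
        raise RuntimeError(f"could not clone {repo_url}")
    return "cloned"
```

Deleting and recloning is a blunt fallback, but it is only safe as long as nothing hand-maintained lives inside the checkout, which is one more reason to keep the static resources out of the generator's output directory.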

The function of process_episode is to generate the episode media (flac wav mp3 ogg opus spx srt txt, and any other files submitted).
These are sent to the ccdn origin server, as well.
It should only deal with the episode files.

That leaves other files, eg: site icon, branding, host avatars, extra documents etc.
So I agree with @rho_n that these additional files should not be in hpr_generator.
Calling it hpr_website seems fine to me.
It will need to deal with everything else for now, though if it makes more sense later we can also split it out.
An avatar updater would also be a good addition.

Then we should also only produce and manage files that we consider to be part of the HPR Podcast.

Things like howtos and tutorials would be better managed on Gitea in hpr_documentation and just linked to from the HPR website.
Perhaps a configuration URL would be a good thing in case the repos move. GIT_REPO_BASE='https://repo.anhonesthost.net/HPR/' as an example.
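
As a sketch of how such a setting might be consumed, the snippet below builds repository links from GIT_REPO_BASE, falling back to the example value above. Reading it from an environment variable, and the helper name `repo_url`, are assumptions made for illustration only.

```python
import os
from typing import Mapping

# Fallback taken from the example value suggested above.
DEFAULT_GIT_REPO_BASE = "https://repo.anhonesthost.net/HPR/"


def repo_url(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the URL of a repository under the configured base."""
    base = env.get("GIT_REPO_BASE", DEFAULT_GIT_REPO_BASE)
    # Tolerate a base configured with or without a trailing slash.
    return base.rstrip("/") + "/" + name
```

With this in place, a link to the documentation would be produced by `repo_url("hpr_documentation")`, and moving the repos would only require changing the one setting.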

I have no idea if anyone can pull the website down, but it's important that they can.
We do not need to focus any energy on that as our goal here is to make it easier to redeploy the HPR site in event of disaster.

We do need to automate the various parts of this; however, that is difficult until this work is complete, the ccdn is built out, and the site is migrated to the new hosting service.

Member

I like the 3rd option - keeping the current structure and copying the generator output to a 2nd repo.

The 2nd repo should have something in the readme specifying that the website repo is strictly a downstream build of the hpr_generator repo and that any issues/pull requests should be made in the upstream generator repo.

I don't think having some assets in multiple repos is an issue as long as it is well understood that the "source of truth" is the hpr_generator.

Owner

A job will always need to merge the output of the derived media as those files are simply too large/too expensive to track in git.

So I do not see what value this would provide. Can you elaborate, please?

Member

I always forget about the media. Feel free to ignore my previous comment.

Member

Thanks @ken_fallon for the elaboration of the whole process - it does seem that having two repositories is the best option.

To clarify my point about "whether users ... just download the site": I agree that users should be able to. I meant that it doesn't need to be a static version of the site they download, but that they should be able to build the site using the repository(ies).

@ken_fallon is clearly busy with the ccdn activities at the moment - does this mean we go with @rho_n's option 2 short term, then develop option 3 once the ccdn is built out?

Owner

@rho_n I created hpr_website. Can you check that you are owner, etc.?

Then we can move files out of hpr_generator and into it.

Author
Owner

> @rho_n I created hpr_website can you check that you are owner etc.

It looks like the owner is set to the Owners group, of which I am a part. I have cloned it but haven't tried to push anything up to it yet.

rho_n changed title from [RFC] Create an hpr_website repository to Create an hpr_website repository 2025-06-22 14:14:52 +00:00
4 Participants
Reference: HPR/hpr_generator#265