2024-12-25 14:11:18 +00:00
# Community Content Delivery Network (CCDN)
A location to track the deployment of the HPR Community Content Delivery Network (CCDN), which provides a mirror network for our content.
# Availability of HPR Content
The HPR site has traditionally been run on a single instance, which makes the project vulnerable.
Several times we have suffered from system outages, denial of service attacks, forced decommissioning, or increased costs.
There is a clear need to host the content in multiple geographically distributed networks to increase reliability and redundancy.
Applying a [Content Delivery Network](https://en.wikipedia.org/wiki/Content_delivery_network) in front of the provider addresses some but not all of these issues.
These large vendors offer free tiers, but their long-term business models suggest that the free tiers are not sustainable.
Additionally, the algorithms they use would flag behavior considered normal for HPR contributors as suspicious, and would deny them access.
# Looking to the past
At the dawn of the Internet, it was common for websites and services like DNS to be [mirrored](https://en.wikipedia.org/wiki/Mirror_site) by friends.
For a long time this was not a viable option for HPR, as the quantity of audio content was expensive to host and transfer, and was therefore beyond what a home user could reliably serve.
Over time, members of our community in some locations have gained access to facilities that a few years ago would have been reserved for Internet Service Providers.
If you are interested in helping to host the HPR site and media, then please get in touch with _admin @ hackerpublicradio.org_
## Requirements for Hosting
- 24/7 home service
- Fixed IP address
- Unlimited bandwidth
- Fast upload (> 500 Mb/s)
- Large storage (> 1 TB)
- Permission from your ISP to run a web server
- Contact information known to the Janitors
- Optional: [UPS](https://en.wikipedia.org/wiki/Uninterruptible_power_supply)
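
To give a feel for the upload and storage figures in the list above, here is a rough sketch that estimates how long it would take a mirror to send out one full copy of the content. It assumes the upload figure means 500 megabits per second and that the storage figure means 10^12 bytes; the function name `transfer_hours` is purely illustrative.

```python
def transfer_hours(size_bytes: int, rate_mbit_per_s: float) -> float:
    """Hours needed to send size_bytes at rate_mbit_per_s (megabits per second)."""
    if rate_mbit_per_s <= 0:
        raise ValueError("rate must be positive")
    bits = size_bytes * 8
    seconds = bits / (rate_mbit_per_s * 1_000_000)
    return seconds / 3600


# Assumed figures: 1 TB (10**12 bytes) of content over a 500 Mbit/s uplink.
print(f"{transfer_hours(10**12, 500):.1f} hours")
```

Under those assumptions a single full copy takes a little under four and a half hours of saturated upload, which is why a slower home connection would struggle to seed other mirrors.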
<!--
## Mitigation for the Internet Archive Outage
Links to media files will be updated to point to a redirect service running on the HPR Hub,
e.g. `https://hub.hackerpublicradio.org/redirect.php?id=9999`
This will maintain a list of HPR mirrors and, for now, do a simple random redirect:
- https://hpr.nyc3.cdn.digitaloceanspaces.com/ → alpha.nj.us.na.mirror.hackerpublicradio.org
- https://188.212.114.84/HPR/ → alpha.
- Internet Archive - DOWN
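
The simple random redirect described above could be sketched roughly as follows. This is not the actual `redirect.php`; it is a minimal Python illustration. The two mirror base URLs are the ones listed above, while the `hpr<id>.mp3` file name layout, the `media_path` helper, and the `pick_redirect` function are assumptions made only for this example.

```python
import random

# Mirror base URLs taken from the list above.
MIRRORS = [
    "https://hpr.nyc3.cdn.digitaloceanspaces.com/",
    "https://188.212.114.84/HPR/",
]


def media_path(episode_id):
    """Hypothetical file layout: episode 9999 becomes hpr9999.mp3."""
    return f"hpr{int(episode_id):04d}.mp3"


def pick_redirect(episode_id, mirrors=MIRRORS, rng=random):
    """Return an HTTP status and Location for one randomly chosen mirror."""
    if not mirrors:
        raise ValueError("no mirrors available")
    base = rng.choice(mirrors)
    return 302, base + media_path(episode_id)


status, location = pick_redirect(9999)
print(status, location)
```

A temporary redirect (302) is used here on the assumption that the chosen mirror should be free to change between requests; a mirror that is down would simply be removed from the list.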
The MaxMind GeoIP free edition has geolocation codes:
> Our data includes codes that can be used to identify the continent, country, subdivision, and postal or metro code area of the geolocation of the IP address. The codes follow these conventions:
- continent: a two-character continent code, as follows:
  - AF - Africa
  - AN - Antarctica
  - AS - Asia
  - EU - Europe
  - NA - North America
  - OC - Oceania
  - SA - South America
- country: the two-character ISO 3166-1 country code
- subdivision: the region-portion of the ISO 3166-2 code for the region
So I will use that.
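
One mirror name in these notes, `alpha.nj.us.na.mirror.hackerpublicradio.org`, appears to follow the pattern name, subdivision, country, continent. Assuming that is the intended scheme, a mirror hostname could be built from the geolocation codes roughly like this. The function `mirror_hostname` and the validation it does are illustrative, not an agreed convention.

```python
# Two-character continent codes from the GeoIP conventions quoted above.
CONTINENT_CODES = {"AF", "AN", "AS", "EU", "NA", "OC", "SA"}
MIRROR_DOMAIN = "mirror.hackerpublicradio.org"


def mirror_hostname(name, continent, country, subdivision):
    """Build name.subdivision.country.continent under the mirror domain."""
    if continent.upper() not in CONTINENT_CODES:
        raise ValueError(f"unknown continent code: {continent!r}")
    labels = [name, subdivision, country, continent]
    return ".".join(label.lower() for label in labels) + "." + MIRROR_DOMAIN


print(mirror_hostname("alpha", "NA", "US", "NJ"))
```

With the codes for New Jersey, United States, North America this reproduces the `alpha` mirror name used earlier in these notes.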
Ken Fallon (PA7KEN, G5KEN)
Although parsing is better with https://github.com/maxmind/mmdbinspect/
I created a new documentation repo but am keeping the old one around for now as a work in progress
To that end I'm removing port 80 from `borg` and 443 from another server to point to the new server ``
I plan to update the feeds, and the site to point to
That will redirect to one of the mirrors, currently only `vger.mirror.hackerpublicradio.org` but then the IA once it's back, and also
For any of that I need media files - so this is fix now, check later.
We have an account on `rsync.net` which I think we should use privately to push to from the `hpr_generator/static` and the `static/static`
# Origin Server
Where is the source of truth? That is, where will the mirrors rsync the files from?
This will need to be read-write (RW) for the processes and tools that generate the files, but read-only (RO) for the volunteer admins of the CCDN mirror network.
Using `rsync.net` is not ideal as we only have one account with RW access.
> What is currently stored on `rsync.net`? Is it just the media files or the html, extended show notes, and related show images, or both?
For now it makes sense to have this on `borg`.
There are many disadvantages to this: a single point of contact, the same backup disk as the rsync source, and if I get DDoSed we're down. However, for now that seems to be the way to go.
This also means a change to how we send out files. The end point is no longer the IA but the CCDN.
I want to be able to add the encoded media and the transcripts to the assets table as part of their generation.
# Are files available
eps valid=1
show needs to be in eps table
# Hosting a complete copy
Where does a complete copy of the website that is easy to download to another computer live?
> What does hosting a complete copy of the website mean? Is this the static site (html, css, images [host and episode], media [ogg, mp3, spx, vtt, srt, txt]) and dynamic hub stuff?
## Internet Archive (IA)
What files associated with an episode are allowed to be stored on IA?
* full show notes?
* Other associated example files?
* images
* audio
What are the standard/best practices for organizing files on IA?
If we can store all show related files on IA, is that what we want to do?
Should IA be the main storage of a show's assets?
## git repository
As Ken has mentioned before, a git repository could be used to provide an easy way to download a complete copy of the website and keep it updated (perhaps without the audio files). This could be achieved relatively simply.
We could also take advantage of GitLab, GitHub or other git hosting providers as mirrors.
## Docker image
The image would not come with the static html, but would be set up to run the site-generator and associated update scripts on a regular basis.
Another way would be to have the image automatically rsync the website to initialize and then update on a regular basis.
-->