Convert the RSS feeds from php to the static site #140

Open
opened 2023-07-02 14:39:16 +00:00 by ken_fallon · 14 comments
Owner
No description provided.
Owner

Tested hpr_mp3.rss and it looks OK except that the URL in the <link> field was wrong. Fixed locally but not yet uploaded. The other feeds are to be tested next.

Owner

There were problems with the feeds, as reported by Ken.

The most serious issue was that the <enclosure length="..."> value was wrong due to a misunderstanding.

The code that collects and inserts this value has been modified, and test RSS feeds created. After validation these changes can be added to the repository, assuming there are no other problems.

Owner

Further issues relating to the generation of the <itunes:summary> XML required enhancements to the site-generator. Now testing is complete.

Author
Owner

Check list

  • Verify new feeds match old
  • Set up redirects.
  • Verify that new updates are not triggered when switching feeds.
  • Verify new updates are triggered when a weekday is switched.
Author
Owner

@davmo to maintain compatibility we need to make some changes to the static site.

The links in the static feeds use https but should use http.
<googleplay:explicit>no</googleplay:explicit> should be No or Yes, with an upper-case first letter.
Titles don't seem to be escaped: <title>HPR4119: Cov's Jams 003</title> versus <title>HPR4119: Cov&#039;s Jams 003</title>

Owner

<googleplay:explicit>no</googleplay:explicit> should be No or Yes - upper first letter

A change to the macro display_explicit_feed in shared-utils.tpl.html should do this.

This change has already been made in the test version of the site and seems to work fine.

Owner

> @davmo to maintain compatibility we need to make some changes to the static site.
>
> The links in the static feeds use https but should use http.
> <googleplay:explicit>no</googleplay:explicit> should be No or Yes, with an upper-case first letter.
> Titles don't seem to be escaped: <title>HPR4119: Cov's Jams 003</title> versus <title>HPR4119: Cov&#039;s Jams 003</title>

I need explanations of:

  • The links in the static feeds use https but should use http
  • Titles don't seem to be escaped: <title>HPR4119: Cov's Jams 003</title> versus <title>HPR4119: Cov&#039;s Jams 003</title>

Explanations:

  • When the comment feed broke recently, I believe it was because show 4119 had a title with a Unicode quote in it, which was converted to an HTML entity. The error indicated that this entity was not compatible with XML, which predefines only the following entities: &amp;, &lt;, &gt;, &apos;, and &quot;. We stopped using the entity converter in the template to work around this. There was some debate about whether the <title> field should contain a <![CDATA[]]> section, but no conclusion was reached. Please define what is required here.
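
To illustrate the entity point above, here is a minimal Python sketch (not the site-generator's Perl code). It assumes the converter emitted a named HTML entity such as `&rsquo;`; the thread does not say which entity was actually produced, so that input is hypothetical. It shows that an XML parser rejects a named HTML entity, while a numeric character reference and the predefined `&apos;` are accepted.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_title(fragment):
    """Return the element text, or None if the fragment is not well-formed XML."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


# Assumed inputs: the thread does not say which entity the converter emitted.
named_html = "<title>HPR4119: Cov&rsquo;s Jams 003</title>"
numeric_ref = "<title>HPR4119: Cov&#8217;s Jams 003</title>"
predefined = "<title>HPR4119: Cov&apos;s Jams 003</title>"

results = {
    "named_html": parse_title(named_html),    # undefined entity in XML
    "numeric_ref": parse_title(numeric_ref),  # numeric references are allowed
    "predefined": parse_title(predefined),    # one of the five predefined entities
}

# escape() only touches &, < and > by default, so it never emits HTML-only entities.
escaped = escape("Scripts & Payloads <beta>")
```

If this reading is right, either numeric references or plain UTF-8 characters in the `<title>` would keep the feed well-formed without needing a CDATA section.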
Author
Owner

The title currently is not escaped, so that should be the case for the static feeds. We need the feeds to be identical for the transition, especially the enclosure and guid fields, as changes in either could be interpreted by podcatchers as a trigger to download all the episodes again.

Ideally the feeds should be identical except for the generator.
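
One way to check the "action" fields before switching is sketched below in Python. This is only an illustration, not part of the site-generator: the function names and the `example.org` URLs are made up, and the sample feeds are reduced to the two fields of interest.

```python
import xml.etree.ElementTree as ET


def action_fields(rss_text):
    """Map each item's guid to its enclosure attributes (url, length, type)."""
    root = ET.fromstring(rss_text)
    fields = {}
    for item in root.iter("item"):
        guid = item.findtext("guid")
        enc = item.find("enclosure")
        attrs = None
        if enc is not None:
            attrs = (enc.get("url"), enc.get("length"), enc.get("type"))
        fields[guid] = attrs
    return fields


def diff_action_fields(old_rss, new_rss):
    """Return the guids whose enclosure differs or that exist in only one feed."""
    old, new = action_fields(old_rss), action_fields(new_rss)
    return sorted((g for g in old.keys() | new.keys() if old.get(g) != new.get(g)), key=str)


# Hypothetical sample data standing in for the PHP feed and the static feed.
OLD = (
    '<rss version="2.0"><channel><item>'
    "<guid>http://example.org/eps/0001</guid>"
    '<enclosure url="http://example.org/eps/0001.mp3" length="1234" type="audio/mpeg"/>'
    "</item></channel></rss>"
)
NEW_HTTPS = OLD.replace("http://", "https://")
NEW_LENGTH = OLD.replace('length="1234"', 'length="9999"')

unchanged = diff_action_fields(OLD, OLD)
scheme_changed = diff_action_fields(OLD, NEW_HTTPS)
length_changed = diff_action_fields(OLD, NEW_LENGTH)
```

An empty result means the guid and enclosure values match; any guid listed is one that a podcatcher might treat as a new episode.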

Owner

According to https://support.google.com/podcast-publishers/answer/9889544?hl=en valid values for <itunes:explicit> are case-sensitive 'yes' and 'no' (where absence implies 'no'). I will undo the change I made which generates values of Yes and No.

I couldn't find anything definitive. Downloading the DTD for this extension results in JavaScript probing my machine for iTunes and refusing to do anything else because I don't have it. I never will have iTunes of course. This strategy seems to go against the design of the XML DTD as I understand it.

Owner

According to https://www.google.com/schemas/play-podcasts/1.0/play-podcasts.xsd valid values for <googleplay:explicit> seem to be case-sensitive 'yes' and 'no' (where absence implies 'no'), though it's not explicit as far as I know (only lower case forms are shown in the schema). I will undo the change I made which generates values of Yes and No.

Author
Owner

The dynamic/php feeds we are replacing are deployed to thousands of people with possibly hundreds of podcatchers that we know are able to parse them without issue.

The purpose of this exercise is to switch to the static feeds with the least risk possible. That would ideally mean replacing each feed with a binary compatible file. Each change we make increases the risk. So please ignore improvements and corrections for now and implement what we know is working.

Owner

> The dynamic/php feeds we are replacing are deployed to thousands of people with possibly hundreds of podcatchers that we know are able to parse them without issue.
>
> The purpose of this exercise is to switch to the static feeds with the least risk possible. That would ideally mean replacing each feed with a binary compatible file. Each change we make increases the risk. So please ignore improvements and corrections for now and implement what we know is working.

Yes, of course. However, the templates that produce the static versions already contain changes which make the RSS differ from the PHP versions, so it's necessary to undo things or adjust them to make them identical. Also, there was a point last year where the embedded texts were changed to fix grammatical errors, etc. These would need to be undone to achieve binary compatibility, and so on.

Author
Owner

As identical as possible. Each day we make changes to "display" fields, so fixing grammar in our description has a very low risk.

On the other hand, anything that requires an "action" has a high risk. The highest risk is a change in the enclosure and guid tags. However, the explicit flags of Google and iTunes are also action flags, so they should remain as they currently are.

Once we have the new feeds working, we can slowly start adding back the "action" corrections, such as changing the Google explicit value to lower case.

High-risk changes, such as switching to https in the enclosure and guid tags, would require a completely different approach.

Owner

I have made these changes:

  • Added another URL to the configuration file (http_basename) with http instead of https. The templates can call absolute_url(http_baseurl) and be compatible with the PHP.
  • Changed the function xml_entity in the site-generator to call encode_entities which doesn't generate hex entities. This brings the static RSS in step with the PHP version except the PHP generates &#039; but the Perl module generates &#39;.
  • Since all templates that generate "explicit" values use macro display_explicit_feed to generate 'yes/no' values I created display_explicit_feed_2 to return 'Yes/No' in the few cases that are needed to match the PHP feeds.
  • Added calls to filters html-strip and xml_entity to the template generating the <title> tag to make the static feeds look like the PHP ones.
  • Discovered that HTML::Strip by default adds spaces when replacing tags in some cases. Turned this off.

These changes have brought the two types of feeds closer together, though they are not identical.
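
On the `&#039;` versus `&#39;` difference mentioned in the list above: the two references denote the same character, so an XML parser sees identical titles even though the files differ byte for byte. The Python sketch below illustrates this. The `php_style_apostrophes` helper is hypothetical and only shows one possible post-processing step if byte-level identity were wanted; it is not something the site-generator does.

```python
import xml.etree.ElementTree as ET

php_style = "<title>HPR4119: Cov&#039;s Jams 003</title>"
perl_style = "<title>HPR4119: Cov&#39;s Jams 003</title>"

# Both numeric references decode to the same apostrophe once parsed.
php_text = ET.fromstring(php_style).text
perl_text = ET.fromstring(perl_style).text
same_after_parsing = php_text == perl_text

# The raw strings still differ, which matters for a byte-for-byte comparison.
same_bytes = php_style == perl_style


def php_style_apostrophes(xml_text):
    """Hypothetical post-processing step: rewrite &#39; in the PHP style &#039;."""
    return xml_text.replace("&#39;", "&#039;")


normalised = php_style_apostrophes(perl_style)
```

Whether the remaining byte difference matters depends on whether podcatchers compare raw text or parsed values, which the thread does not settle.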

Discovered that the PHP code is doing this in certain cases:

USB Rubber Ducky Scripts &amp;amp; Payloads Python 3 Arduino

The original text contained an ampersand that was replaced by an entity &amp; but this has been double encoded.
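
The double encoding described above can be reproduced with a short Python sketch. It assumes the original text was "USB Rubber Ducky Scripts & Payloads" and that the PHP side escapes a string that has already been escaped; neither is confirmed in this thread. The `escape_once` helper is illustrative only.

```python
from xml.sax.saxutils import escape, unescape

original = "USB Rubber Ducky Scripts & Payloads"  # assumed source text

once = escape(original)  # the ampersand becomes &amp;
twice = escape(once)     # escaping again turns &amp; into &amp;amp;


def escape_once(text):
    """Escape text that may or may not already be escaped, without doubling up."""
    return escape(unescape(text))


from_raw = escape_once(original)
from_escaped = escape_once(once)
```

Note that `escape_once` would alter text that legitimately contains a literal `&amp;`, so matching the PHP output exactly would instead mean reproducing the double encoding for the affected fields.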

Note: Since Gitea doesn't seem to allow RSS files to be uploaded here, I will post examples to Matrix.

Reference: HPR/hpr_generator#140