rss feeds contain invalid xml #301
✗ The feed "https://hackerpublicradio.org/hpr_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_spx.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_spx.rss" is not valid xml.
XML
XML specifies five predefined entities needed to support every printable ASCII character: &amp;, &lt;, &gt;, &apos;, and &quot;. The trailing semicolon is mandatory in XML (and XHTML) for these five entities (even if HTML or SGML allows omitting it for some of them, according to their DTD).
OK, we will need to create a filter that searches for any named character entities other than the five above and converts them to their numeric equivalents. See this Wikipedia article: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Here is a list of named character entities recognized by HTML5: https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
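A minimal sketch of such a filter, assuming Python and its standard `html.entities.html5` table as the source of HTML5 names. The function name `named_to_numeric` is hypothetical and this is not the code used by the HPR generator; it only illustrates the idea of leaving the five XML entities alone and rewriting everything else as numeric references.

```python
import re
from html.entities import html5  # maps names like "eacute;" to their characters

# The five entities predefined by XML; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Matches a semicolon-terminated named reference such as &eacute;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML-only named entities with numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        value = html5.get(name + ";")
        if value is None:
            # Unknown name: leave it as-is so a human can inspect it.
            return match.group(0)
        # Some HTML5 names expand to two code points, so emit one reference each.
        return "".join(f"&#{ord(ch)};" for ch in value)

    return NAMED_ENTITY.sub(replace, text)


print(named_to_numeric("Caf&eacute; &amp; bar&nbsp;x"))
# prints: Caf&#233; &amp; bar&#160;x
```

Names that are not in the HTML5 table are deliberately left unchanged, so the feed would still fail validation for those and the problem stays visible.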
Where do we want to resolve this issue?
A classic case of "there is a problem" but not providing any useful information. Thank you for your patience in this matter ;-)
I added the tool I use to check this, hpr-check-feeds.
It checks the dynamic RSS feeds (currently in use) against the hpr_generator ones. As an example, the feeds below should be functionally identical.
✗ The feed "https://hackerpublicradio.org/hpr_total_mp3.rss" is not valid xml.
🗸 The feed "https://hackerpublicradio.org/hpr_total_rss.php" is valid xml.
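For readers who want to reproduce this kind of check, here is a rough sketch of a well-formedness test in Python using the standard library parser. It is not the hpr-check-feeds tool itself, and the function name `is_well_formed` is hypothetical; fetching the feed over HTTP is left out so the example stays self-contained.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for undefined entities such as &eacute; as well as
        # for mismatched tags and other syntax errors.
        return False
    return True


broken = "<rss><title>Caf&eacute;</title></rss>"
fixed = "<rss><title>Café &amp; co</title></rss>"

for label, feed in (("broken", broken), ("fixed", fixed)):
    mark = "🗸" if is_well_formed(feed) else "✗"
    print(f"{mark} The {label} sample is {'valid' if mark == '🗸' else 'not valid'} xml.")
```

The first sample fails because `&eacute;` is an HTML name that XML does not define; the second passes because it uses the literal character and only the predefined `&amp;`.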
So the solution is this: we support UTF-8 everywhere, so we should not escape these characters at all.
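One way this could look, sketched in Python with the standard library only: decode any HTML entities to their literal UTF-8 characters, then escape just the characters XML itself treats as markup. The helper name `to_xml_text` is hypothetical, and note that `html.unescape` follows HTML5 rules, so it may also decode some legacy references that lack a trailing semicolon.

```python
import html
from xml.sax.saxutils import escape


def to_xml_text(value):
    """Turn HTML-escaped text into XML-safe text with literal UTF-8 characters."""
    # Step 1: resolve named and numeric references to real characters.
    decoded = html.unescape(value)
    # Step 2: escape only &, < and >, which XML cannot carry literally in text.
    return escape(decoded)


print(to_xml_text("Caf&eacute; &amp; <b>"))
# prints: Café &amp; &lt;b&gt;
```

With this approach the feed only ever contains entities from the predefined XML set, provided the file is actually served as UTF-8.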
I also noticed that <link>eps/hpr4499/index.html</link> is missing the FQDN part, and the enclosure is