RSS feeds contain invalid XML #301

Open
opened 2025-11-11 15:16:45 +00:00 by ken_fallon · 2 comments
Owner

✗ The feed "https://hackerpublicradio.org/hpr_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_spx.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_spx.rss" is not valid xml.

```
$ wget --no-verbose https://hackerpublicradio.org/hpr_mp3.rss --output-document=- | xmllint --format -
-:1814: parser error : Entity 'Atilde' not defined
    <title>HPR4499: Greg Farough and Zo&Atilde;&laquo; Kooyman of the FSF interv
                                               ^
-:1814: parser error : Entity 'laquo' not defined
    <title>HPR4499: Greg Farough and Zo&Atilde;&laquo; Kooyman of the FSF interv
```
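The same failure can be reproduced offline with Python's standard-library XML parser. This is a minimal sketch. `SNIPPET` is a hypothetical, trimmed-down item carrying the same two entities as line 1814, and `is_well_formed` is an illustrative helper, not part of hpr_generator. Without a DTD, an XML parser only knows the five predefined entities, so it rejects `&Atilde;`. The equivalent numeric references parse cleanly.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed-down RSS item using the same HTML named entities as the feed.
SNIPPET = "<rss><channel><item><title>Zo&Atilde;&laquo; Kooyman</title></item></channel></rss>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML; print the parser error otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        # No DTD declares &Atilde;, so the parser reports it as undefined.
        print(f"parser error: {err}")
        return False


print(is_well_formed(SNIPPET))
# Same text with numeric character references instead of named ones.
print(is_well_formed(SNIPPET.replace("&Atilde;&laquo;", "&#195;&#171;")))
```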

Owner

XML
XML specifies five predefined entities needed to support every printable ASCII character: `&amp;`, `&lt;`, `&gt;`, `&apos;`, and `&quot;`. The trailing semicolon is mandatory in XML (and XHTML) for these five entities (even if HTML or SGML allows omitting it for some of them, according to their DTD).

OK, we will need to create a filter that finds any named character entities other than these five and converts them to their numeric equivalents. See this Wikipedia article: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
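Here is a minimal sketch of such a filter in Python. It assumes the feed text is available as a string before it is written out. `xmlify_entities` is a hypothetical name, and this version swaps in the HTML 4 entity table from Python's `html.entities.name2codepoint` rather than anything hpr_generator currently uses. The five XML entities are kept as-is. Names the table does not know are left untouched so the validator still flags them.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# A named reference: '&', a name, and the mandatory trailing ';'.
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def xmlify_entities(text: str) -> str:
    """Replace HTML named entities (other than XML's five) with numeric references."""

    def to_numeric(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave it so the XML validator still reports it.
            return match.group(0)
        return f"&#{codepoint};"

    return NAMED_ENTITY.sub(to_numeric, text)


print(xmlify_entities("Zo&Atilde;&laquo; Kooyman"))  # prints: Zo&#195;&#171; Kooyman
```

The output is well-formed XML but renders as `ZoÃ«`. Those two characters are the UTF-8 bytes of `ë` (0xC3 0xAB) read as Latin-1. So the title also appears to have been double-encoded before the entities were generated, and converting entities alone would not restore `Zoë`.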

Owner

Here is a list of named character entities recognized by HTML5: https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
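The HTML5 list is larger than the HTML 4 set. It also includes a few names that expand to two code points; for example, `&fjlig;` is `fj`. Python exposes that table as `html.entities.html5`, keyed by name including the trailing semicolon. Here is a hedged variant of the same hypothetical filter built on it. It emits one numeric reference per code point.

```python
import re
from html.entities import html5

# XML's five predefined entities stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def xmlify_html5_entities(text: str) -> str:
    """Replace HTML5 named references (except XML's five) with numeric ones."""

    def to_numeric(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        # html5 keys carry the trailing ';', e.g. "Atilde;".
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: leave it for the validator
        # Some names map to two code points, so emit one reference per character.
        return "".join(f"&#{ord(ch)};" for ch in chars)

    return NAMED_ENTITY.sub(to_numeric, text)


print(xmlify_html5_entities("&fjlig;ord &hellip; &amp;"))  # prints: &#102;&#106;ord &#8230; &amp;
```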

Reference: HPR/hpr_generator#301