rss feeds contain invalid xml #301
✗ The feed "https://hackerpublicradio.org/hpr_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_spx.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_mp3.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_ogg.rss" is not valid xml.
✗ The feed "https://hackerpublicradio.org/hpr_total_spx.rss" is not valid xml.
XML
XML specifies five predefined entities needed to support every printable ASCII character: `&amp;`, `&lt;`, `&gt;`, `&apos;`, and `&quot;`. The trailing semicolon is mandatory in XML (and XHTML) for these five entities (even if HTML or SGML allows omitting it for some of them, according to their DTD).
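To illustrate the point, here is a minimal sketch using Python's standard-library parser. The helper name `is_well_formed` and the sample strings are made up for illustration and are not taken from the HPR feed generator. It shows that a predefined entity parses cleanly, while an HTML-only entity such as `&nbsp;`, or a predefined entity missing its semicolon, makes the document ill-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A predefined XML entity is accepted.
print(is_well_formed("<title>Tom &amp; Jerry</title>"))

# An HTML-only named entity is undefined in plain XML.
print(is_well_formed("<title>Tom&nbsp;Jerry</title>"))

# The trailing semicolon is mandatory.
print(is_well_formed("<title>Tom &amp Jerry</title>"))
```

The first call should report a well-formed document and the other two should not, which matches the kind of failure the feed check reports.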
OK, we will need to create a filter that searches for any named character entities that are not among the five above and converts them to their numeric equivalents. See this Wikipedia article: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
Here is a list of named character entities recognized by HTML5: https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
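A rough sketch of what such a filter could look like, written in Python as an assumption (the thread does not say what language the feed generator uses). The names `named_to_numeric`, `XML_PREDEFINED`, and `ENTITY_RE` are hypothetical. It relies on the standard library's `html.entities.html5` table, which mirrors the HTML5 named character references, and it does not try to skip CDATA sections.

```python
import re
from html.entities import html5  # maps names such as "nbsp;" to their characters

# The five entities XML predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Matches a semicolon-terminated named entity such as &nbsp; or &eacute;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text: str) -> str:
    """Replace HTML-only named entities with decimal numeric references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it alone so the problem stays visible.
            return match.group(0)
        # Some HTML5 entities expand to more than one code point.
        return "".join(f"&#{ord(ch)};" for ch in chars)

    return ENTITY_RE.sub(replace, text)


print(named_to_numeric("Caf&eacute;&nbsp;talk &amp; more"))
# Caf&#233;&#160;talk &amp; more
```

Unknown names are deliberately passed through unchanged, so a later validation step would still flag them instead of silently dropping text.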