Problems with Unicode in the site-generator #158
Reference: rho_n/hpr_generator#158
The current version of the site-generator (2023-08-27) produces badly encoded Unicode in shows which have such characters. There are currently 603 such shows with Unicode in titles, summaries, tags and notes.

The problem might have been introduced in the last update, and seems to be related to the addition of tag parsing code.
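The issue does not say exactly how the characters were damaged. One common cause of "badly encoded" output is double encoding, where text that is already UTF-8 gets treated as Latin-1 and encoded a second time. The sketch below is a Python illustration of that assumed failure mode, not code from the site-generator; the function names and the sample title are hypothetical.

```python
# Illustrative only: assumes the damage is classic double encoding.

def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def looks_double_encoded(text: str) -> bool:
    """Heuristic check: does the text turn into different, valid UTF-8
    when its characters are reinterpreted as Latin-1 bytes?"""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as Latin-1, or not valid UTF-8 afterwards:
        # the text is not double encoded in this particular way.
        return False
    return repaired != text


title = "Café Zürich"          # hypothetical show title
broken = double_encode(title)  # what a reader would see on the page
print(broken)
print(looks_double_encoded(broken), looks_double_encoded(title))
```

A heuristic like this can flag false positives on unusual text, so it is better suited to shortlisting pages for a manual look than to automatic repair.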
Created a pull request which undoes the Unicode (utf8) adjustments from the last update and defaults to all utf8 IO. These updates had been added to try and stop a problem with tags containing utf8 characters, but did not work.

The latest change detects utf8 characters in tags and ensures they are properly encoded. This happens in subroutine parse_csv which provides the vmethod csv_parse.

Using a clone of the static site, 60 shows were chosen at random from the 603 shows containing utf8, and these were checked. Only one showed encoding problems and that seems to be because of invalid characters in the database.

Pull request #159 merged.
Closing this issue.
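As a rough illustration of the two steps described in this issue, the Python sketch below splits a comma-separated tag string that contains non-ASCII characters and draws a random sample of shows to check by hand. It is not the project's parse_csv subroutine; the function names, the sample tag string, and the assumption that show IDs run from 1 to 603 are all hypothetical.

```python
import csv
import io
import random


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string, honouring quoted tags.

    The input is already a decoded string, so non-ASCII characters pass
    through the CSV parser unchanged.
    """
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    return [tag.strip() for row in reader for tag in row if tag.strip()]


def sample_shows(show_ids, k, seed=None):
    """Pick k distinct show IDs at random for a manual encoding check."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(show_ids), k))


print(parse_tags('Zürich, "café, latte", naïve'))

# Hypothetical IDs standing in for the 603 affected shows.
affected = range(1, 604)
to_check = sample_shows(affected, 60, seed=158)
print(len(to_check))
```

Fixing the seed makes the sample reproducible, which helps if the same 60 shows need to be rechecked after a later change.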