Problems with Unicode in the site-generator #158

Closed
opened 2023-08-27 10:28:58 +00:00 by davmo · 2 comments
Collaborator

The current version of the site-generator (2023-08-27) produces badly encoded output for shows that contain Unicode characters. There are currently 603 such shows, with Unicode in titles, summaries, tags and notes.

The problem might have been introduced in the last update, and seems to be related to the addition of tag parsing code.

Author
Collaborator

Created a pull request which undoes the Unicode (utf8) adjustments from the last update and defaults to utf8 for all IO. Those adjustments had been added to try to stop a problem with tags containing utf8 characters, but they did not work.

The latest change detects utf8 characters in tags and ensures they are properly encoded. This happens in the subroutine `parse_csv`, which provides the vmethod `csv_parse`.
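
To illustrate the idea of decoding tag data exactly once before splitting it, here is a minimal sketch in Python. It is not the generator's own code: the function name `parse_csv_tags` is hypothetical, and it assumes tags arrive as a single comma-separated field that may be either raw UTF-8 bytes or an already decoded string.

```python
import csv
import io


def parse_csv_tags(raw):
    """Split a comma-separated tag field into a list of Unicode tags.

    Hypothetical helper for illustration; assumes UTF-8 input.
    """
    # If the value arrives as raw bytes, decode it exactly once as UTF-8.
    # Decoding twice (or not at all) is a common source of garbled output.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # csv.reader copes with quoted tags that themselves contain commas.
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    tags = []
    for row in reader:
        # Drop surrounding whitespace and skip empty entries.
        tags.extend(tag.strip() for tag in row if tag.strip())
    return tags


tags = parse_csv_tags('café, naïve, "Linux, GNU"'.encode("utf-8"))
```

The point of the sketch is that the bytes-to-text conversion happens in one place, so every tag leaves the parser as properly decoded text regardless of how the field was read.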

Using a clone of the static site, 60 shows were chosen at random from the 603 shows containing utf8, and these were checked. Only one showed encoding problems and that seems to be because of invalid characters in the database.
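
The comment does not say how the sampled shows were checked. As one possible way to automate such a spot check, the following Python sketch draws a random sample and flags text that looks double-encoded (UTF-8 bytes misread as Latin-1). The function names and the `shows` mapping of show id to text are assumptions made for this illustration, and the heuristic would not catch every kind of encoding problem.

```python
import random


def looks_double_encoded(text):
    """Heuristic: True if text seems to be UTF-8 that was misread as Latin-1."""
    try:
        # Reverse the suspected mistake: back to bytes, then decode as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text has characters outside Latin-1, or is not valid UTF-8 when
        # reinterpreted, so it is not this kind of mojibake.
        return False
    # Pure ASCII round-trips unchanged; only a changed result is suspicious.
    return repaired != text


def sample_and_check(shows, sample_size=60, seed=None):
    """Pick a random sample of show ids and return those that look garbled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(shows), min(sample_size, len(shows)))
    return [show_id for show_id in chosen if looks_double_encoded(shows[show_id])]
```

Passing a fixed `seed` makes the sample repeatable, which helps when re-checking the same shows after a fix.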

Author
Collaborator

Pull request #159 merged.

Closing this issue.

davmo closed this issue 2023-08-27 13:56:37 +00:00
Reference: rho_n/hpr_generator#158