Commit Graph

6 Commits

Author SHA1 Message Date
Dave Morriss
db39655199 Additions to the database and feedWatcher
feedWatcher: added parsing of HTML feeds to get the title tag from the
    <head> area; new database fields recording the copyright check that
    was done and, where a feed was allowed in manually, the reason;
    added dry-run mode; changed -load and -delete so that each can be
    given URLs on the command line; -load and -delete are not allowed
    together; started reporting settings at start time (needs work);
    more logging; added a _debug function; enhanced reportFeed to show
    one feed and a summary of relevant details (more useful than
    dumping the entire database); added getHTMLTitle to parse out the
    HTML title; enhanced checkCopyright to get a reason when a feed is
    allowed in under manual mode; needs a lot of clean-up!

feedWatcher.{html,json,mkd,opml,pdf}: various reports.

feedWatcher_3.tpl: for making Markdown which is turned into PDF.
    'Licence' becomes 'Copyright'

feedWatcher_5.tpl: for dumping all the URLs in the database
    & regenerating everything

feedWatcher_schema.sql: new fields added
2023-01-14 23:13:49 +00:00
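
The commit above mentions a getHTMLTitle routine that pulls the title out of the <head> area of an HTML page. The sketch below illustrates that idea only. It is written in Python with the standard-library html.parser, not taken from the feedWatcher script, and the names TitleParser and get_html_title are hypothetical. It assumes the page has an explicit <head> element.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False   # currently inside <head> ... </head>
        self.in_title = False  # currently inside the <title> being captured
        self.done = False      # a title has already been captured
        self.parts = []        # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate them.
        if self.in_title:
            self.parts.append(data)


def get_html_title(html_text):
    """Return the whitespace-normalised <head> title, or None if absent."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

A caller would pass in the fetched page text and fall back to some other label when the function returns None, for example when a page has no title or omits the <head> tag.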
Dave Morriss
01ec2cf92f Feed updates 2023-01-11 09:45:38 +00:00
Dave Morriss
d549c7bed0 Hacked around a bug in XML::RSS 2023-01-10 20:22:47 +00:00
Dave Morriss
4f744f37c4 Updates for FOSDEM 2023
Changes to the main 'feedWatcher' script: new -check=mode and
    -rejects=file options to automate copyright checks and save rejected
    URLs. Made subroutines parseFeed and execSQL more resilient.
    Experimented with XML::FeedPP but haven't adopted it yet.
    Enhanced checkCopyright to do auto, manual and no checking. Some POD
    additions.

The database is currently being sent to the repo, but this may be unwise.

The script 'make_reports' is for making the various reports uploaded
    here: HTML, JSON, OPML, Markdown and PDF. The PDF is built from the
    Markdown with Pandoc. The HTML is generated from the template
    'feedWatcher.tpl', which is the default.

The TT² template 'feedWatcher_5.tpl' is for dumping the URLs from the
    database into a file so that they can be reloaded. Daily dumps of
    the database are made on my workstation, and kept for 6 months.
2023-01-09 18:20:17 +00:00
Dave Morriss
f9cff60021 Regenerated PDF 2022-11-20 22:49:57 +00:00
Dave Morriss
3c4d96db1b first commit 2022-11-19 21:27:51 +00:00