feedWatcher: added parsing of HTML feeds to get the title tag from the
<head> section; new database fields recording which copyright check
was done and, if a feed was allowed in manually, the reason why; added
dry-run mode; changed -load and -delete so that each can be given URLs
on the command line; started reporting settings at start time (needs
work); -load and -delete are not allowed together; more logging; added
a _debug function; enhanced reportFeed to show one feed and a summary
of its relevant details (more useful than dumping the entire database
this way); added getHTMLTitle for parsing out the HTML title; enhanced
checkCopyright to get a reason when a feed is allowed in while in
manual mode; needs a lot of clean-up!
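
As an illustration of the title extraction described above, here is a minimal sketch in Python. It is not the script's own code (the script mentions POD and XML::FeedPP, so it is presumably Perl); the names `get_html_title` and `_TitleParser` are hypothetical and only mirror the idea behind getHTMLTitle: take the first <title> found inside <head> and ignore any later ones.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.done = False      # set once the first title has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def get_html_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

A caller would pass the fetched page body as a string and fall back to some other label (the URL, say) when the result is None.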
feedWatcher.{html,json,mkd,opml,pdf}: various reports.
feedWatcher_3.tpl: for making Markdown which is turned into PDF;
  'Licence' becomes 'Copyright'
feedWatcher_5.tpl: for dumping all the URLs in the database and
  regenerating everything
feedWatcher_schema.sql: new fields added
Changes to the main 'feedWatcher' script: new -check=mode and
-rejects=file options to automate copyright checks and save rejected
URLs. Made subroutines parseFeed and execSQL more resilient.
Experimented with XML::FeedPP but haven't adopted it yet.
Enhanced checkCopyright to do auto, manual or no checking. Some POD
additions.
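
The option handling mentioned in these notes can be sketched as follows. This is a Python illustration under stated assumptions, not the script's actual option parsing: the mode names "auto", "manual" and "none", the default of "none", and the program name `feedWatcher_sketch` are all guesses made for the example. It shows -check=mode and -rejects=file, with -load and -delete each taking URLs and being mutually exclusive.

```python
import argparse


def build_parser():
    """Build a parser that mimics the options described in the log."""
    parser = argparse.ArgumentParser(prog="feedWatcher_sketch")

    # -load and -delete each accept URLs but may not be used together
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-load", nargs="*", metavar="URL",
                       help="URLs of feeds to add")
    group.add_argument("-delete", nargs="*", metavar="URL",
                       help="URLs of feeds to remove")

    # Assumed mode names; the log only says auto, manual and no checking
    parser.add_argument("-check", choices=("auto", "manual", "none"),
                        default="none", help="copyright check mode")

    # File in which rejected URLs are saved
    parser.add_argument("-rejects", metavar="FILE",
                        help="file to receive rejected URLs")
    return parser
```

Given `-check=auto -rejects=rej.txt -load URL`, the parsed result carries the mode, the rejects file name and the list of URLs; giving both -load and -delete makes argparse report an error and exit.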
The database is currently being sent to the repo, but this may be unwise.
The script 'make_reports' is for making the various reports uploaded
here: HTML, JSON, OPML, Markdown and PDF. The PDF is built from the
Markdown with Pandoc. The HTML is generated from the template
'feedWatcher.tpl', which is the default.
The TT² template 'feedWatcher_5.tpl' is for dumping the URLs from the
database into a file so that they can be reloaded. Daily dumps of
the database are made on my workstation, and kept for 6 months.