Feed updates

Dave Morriss
2023-01-11 09:45:38 +00:00
parent d549c7bed0
commit 01ec2cf92f
7 changed files with 128 additions and 24 deletions

@@ -21,14 +21,17 @@
# (created with Pandoc) in the same directory as this script for
# details of what has been done to develop the original design.
#
# Further development is taking place in 2022/2023, again for
# FOSDEM.
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Dave Morriss (djm), Dave.Morriss@gmail.com
# VERSION: 0.1.1
# VERSION: 0.1.2
# CREATED: 2013-12-25 12:40:33
# REVISION: 2023-01-09 15:28:13
# REVISION: 2023-01-10 22:44:38
#
#-------------------------------------------------------------------------------
# Released under the terms of the GNU Affero General Public License (AGPLv3)
@@ -44,6 +47,10 @@ use utf8;
use feature qw{ postderef say signatures state };
no warnings qw{ experimental::postderef experimental::signatures } ;
#
# There's an issue in XML::RSS, so we're using a local version with a hack.
# It's in ./lib/ and FindBin::libs looks there to find it.
#
use FindBin::libs;
use XML::RSS;
@@ -85,7 +92,7 @@ use Data::Dumper;
#
# Version number (manually incremented)
#
our $VERSION = '0.1.1';
our $VERSION = '0.1.2';
#
# Script name
@@ -2538,7 +2545,7 @@ feedWatcher - watch a collection of podcast feeds
=head1 VERSION
This documentation refers to I<feedWatcher> version 0.1.1
This documentation refers to I<feedWatcher> version 0.1.2
=head1 USAGE
@@ -2547,10 +2554,12 @@ This documentation refers to I<feedWatcher> version 0.1.1
[-check[=mode]] [-out=FILE] [-json[=FILE]] [-opml[=FILE]] [-template=FILE]
[-[no]silent] [-config=FILE] [-debug=N] [URL ...]
# Load URLs from a file, perform checks and redirect output to standard
# output and a named file
./feedWatcher -load=feedWatcher_dumped_URLs.txt -check=auto | \
tee load_$(date +'%Y%m%d_%H%M%S')
# Load URLs from a file, perform checks, save the rejects and redirect output
# to a named file. Uses an alias:
# alias isostamp='date +"%Y%m%d_%H%M%S"'
#
./feedWatcher -load=feedWatcher_dumped_URLs.txt -check=auto \
-rej=output/rejects_$(isostamp).out > output/load_$(isostamp).out 2>&1
# Generate Markdown output with a template writing to a named file
./feedWatcher -tem=feedWatcher_3.tpl -out=feedWatcher.mkd
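The load example above calls the timestamp alias twice, so the two file names could differ if the clock ticks between the calls, and aliases are not expanded in non-interactive shells by default. A minimal sketch of one way round this, assuming a shell function in place of the alias and hypothetical variable names (`stamp`, `rejects`, `logfile`), might look like this. It prints the command rather than running it:

```shell
# Function equivalent of the alias (assumption: used inside a script,
# where aliases are not expanded by default)
isostamp() {
    date +"%Y%m%d_%H%M%S"
}

# Capture the stamp once so both files share the same value
stamp=$(isostamp)
rejects="output/rejects_${stamp}.out"
logfile="output/load_${stamp}.out"

# Print the command instead of running it; remove the echo and the
# quotes to run it for real where feedWatcher is installed
echo "./feedWatcher -load=feedWatcher_dumped_URLs.txt -check=auto -rej=${rejects} > ${logfile} 2>&1"
```

Capturing the stamp in a variable keeps the rejects file and the log file paired under the same name suffix.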
@@ -2607,26 +2616,69 @@ script, (B<-noscan>) omits the scan.
NOTE: This function is not implemented yet.
=item B<-[no]silent>
This option controls the amount of output written by the script. In
B<nosilent> mode the script reports on the processing of each URL it receives,
which can be fairly verbose. This can be turned off with this option, though
it is often wiser to redirect the output for later review rather than to
suppress it.
=item B<-out=FILE>
This option defines an output file to receive outputi from reporting
This option defines an output file to receive output from the reporting
functions. If the option is omitted the data is written to STDOUT, allowing it
to be redirected if required. This option does not cause transactional
listings to be captured.
to be redirected if required. See the 'Usage' section above for an example of
how transactional output can be redirected.
=item B<-[no]check>
=item B<-check[=MODE]>
This option (B<-check>) causes each feed which is being to be checked against
the script user to check that it's OK to add it. The script reports the
I<copyright> field and requests a I<y> or I<n> response.
This option (B<-check[=MODE]>) controls the mode used to check
the copyright setting of the current feed and decide whether to add it.
=item B<-[no]report>
Possible settings are: B<auto>, B<manual> and B<none>.
This option (B<-report>) causes a report of the contents of the database to be
generated. The negated form, which is also the default behaviour of the
script, (B<-noreport>) omits the report.
=over 4
NOTE: The report is currently very simple.
=item B<-check=auto> or B<-check>
An automatic check is made against a series of regular expressions looking for
something in the I<copyright> field which signifies that the feed is under
a Creative Commons licence. A blank field is currently considered to denote
this type of licence.
The option may be written as B<-check>, in which case it is interpreted as
B<-check=auto>.
=item B<-check=manual>
In this mode the script pauses after processing each feed to ask the script
user to check that it's OK to add it. The script reports the I<copyright>
field and requests a I<y> or I<n> response.
=item B<-check=none>
In this mode no checks are performed.
=back
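The automatic check described above can be illustrated with a rough shell sketch. This is not the script's own code: the helper name `looks_like_cc` and the pattern are hypothetical, chosen only to show the idea of accepting a blank I<copyright> field or one that mentions a Creative Commons licence.

```shell
# Hypothetical helper: succeeds (exit 0) if the copyright text is blank
# or looks like a Creative Commons licence. The pattern is illustrative
# only and is not the set of expressions used by feedWatcher.
looks_like_cc() {
    copyright=$1
    # A blank (or whitespace-only) field is treated as acceptable
    if [ -z "$(printf '%s' "$copyright" | tr -d '[:space:]')" ]; then
        return 0
    fi
    # Case-insensitive match against a couple of common spellings
    printf '%s\n' "$copyright" |
        grep -Eiq 'creative[ -]?commons|cc[ -](by|0)'
}

for text in "Creative Commons Attribution 4.0" "CC BY-SA 3.0" "" "All rights reserved"; do
    if looks_like_cc "$text"; then
        echo "accept: '$text'"
    else
        echo "reject: '$text'"
    fi
done
```

Treating a blank field as acceptable mirrors the behaviour the documentation describes; a stricter policy would return non-zero in that branch.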
=item B<-report[=title]>
This option (B<-report[=title]>) causes a fairly simplistic report to be
generated to enable the database contents to be examined. The I<title>
argument specifies a case-sensitive feed title or component of such a title.
So, for instance, B<-report=Hacker> currently reports on the database data
relating to the "Hacker Public Radio" feed.
If the argument is omitted the entire database is reported.
Reports consist of the details of the RSS (or Atom) channel with other
information about the site hosting the feed such as the IP address. The latest
episode in the feed is also reported.
Note that the feed information in the database is a snapshot made when the
feed details were last loaded. This is static information and does not get
updated unless the feed is deleted and reloaded, or the B<-scan> function is run
(not currently available).
=item B<-json[=FILE]>
@@ -2650,10 +2702,10 @@ file.
If the B<=FILE> portion is omitted a default name of 'feedWatcher.opml' is
used.
=item B<-template=FILE>
=item B<-template[=FILE]>
This option defines the template used to generate a form of the feed data. The
template is written using the B<Template> toolkit language.
template is written using the B<Template Toolkit> language.
If the file name is omitted then the script uses the file B<feedWatcher.tpl>
in the same directory as the script. If this file does not exist then the