<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title>My APOD Downloader</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<h1 id="my-apod-downloader">My APOD Downloader</h1>
<h2 id="astronomy-picture-of-the-day">Astronomy Picture of the Day</h2>
<p>You have probably heard of the <a href="http://apod.nasa.gov/apod/astropix.html">Astronomy Picture of the Day (APOD)</a> site. It has existed since 1995, is provided by <a href="http://en.wikipedia.org/wiki/NASA">NASA</a> and <a href="http://en.wikipedia.org/wiki/Michigan_Technological_University">Michigan Technological University (MTU)</a> and is created and managed by <a href="http://www.mtu.edu/physics/department/faculty/nemiroff/">Robert Nemiroff</a> and <a href="http://antwrp.gsfc.nasa.gov/htmltest/jbonnell/www/bonnell.html">Jerry Bonnell</a>. The FAQ on the site says <em>&quot;The APOD archive contains the largest collection of annotated astronomical images on the internet&quot;</em>.</p>
<h2 id="the-downloader">The Downloader</h2>
<p>Being a KDE user I quite like a moderate amount of bling, and I particularly like to have a picture on my desktop. I like to rotate my wallpaper pictures every so often, so I want to have a collection of images. To this end I download the APOD on my server every day and make the images available through an NFS-mounted volume.</p>
<p>In 2012 I wrote a Perl script to perform the download, using a fairly primitive HTML parsing method. This script has been improved over the intervening years and now uses the Perl module <a href="http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/TreeBuilder.pm"><code>HTML::TreeBuilder</code></a> which I believe is much better at parsing HTML.</p>
<p>The version of the script I use myself also includes the Perl module <code>Image::Magick</code> which interfaces to the awesome <a href="http://www.imagemagick.org/"><code>ImageMagick</code></a> image manipulation software suite. I use this to annotate the downloaded image with the title parsed from the HTML so I know what it is.</p>
<p>The script I am presenting here is called <code>collect_apod_simple</code> and does not use <code>ImageMagick</code>. I chose to omit it because the installation of this suite and the related Perl module can be difficult. Also, I do not feel that the annotation always works as well as it could, and I have not yet found the time to correct this shortcoming.</p>
<p>A version of the more advanced script (called <code>collect_apod</code>) is available in the same place as <code>collect_apod_simple</code> should you wish to give it a try. Both scripts are available on <em>GitLab</em> under the link <a href="https://gitlab.com/davmo/hprmisc" class="uri">https://gitlab.com/davmo/hprmisc</a>.</p>
<h2 id="the-code">The Code</h2>
<p>If you are acquainted with Perl you'll probably find this script quite simple. All it really does is:</p>
<ul>
<li><p>Get or compute the date string for building the APOD URL</p></li>
<li><p>Download the HTML on the selected APOD page</p></li>
<li><p>Look for an image being used as a link</p></li>
<li><p>Download the image being linked to and save it where requested</p></li>
</ul>
<p>The following is a numbered listing with annotations. There are several comments in the script itself, but the annotations try to make each section as clear as possible.</p>
<pre><code> 1 #!/usr/bin/env perl
2 #===============================================================================
3 #
4 # FILE: collect_apod_simple
5 #
6 # USAGE: ./collect_apod_simple [YYMMDD]
7 #
8 # DESCRIPTION: Downloads the current Astronomy Picture of the Day or that
9 # relating to the formatted date provided as an argument. In
10 # this context &quot;current&quot; can mean two URLs: .../astropix.html or
11 # .../apYYMMDD.html. We now *do not* download the
12 # .../astropix.html version since it has a different HTML
13 # layout.
14 #
15 # OPTIONS: ---
16 # REQUIREMENTS: ---
17 # BUGS: ---
18 # NOTES: Based on &#39;collect_apod&#39; but without the Image::Magick stuff,
19 # for simplicity and for release to the HPR community
20 # AUTHOR: Dave Morriss (djm), Dave.Morriss@gmail.com
21 # VERSION: 0.0.1
22 # CREATED: 2015-01-02 19:58:01
23 # REVISION: 2015-01-03 23:00:27
24 #
25 #===============================================================================
26
27 use 5.010;
28 use strict;
29 use warnings;
30 use utf8;
31
32 use LWP::UserAgent;
33 use DateTime;
34 use HTML::TreeBuilder 5 -weak;
35 </code></pre>
<hr />
<p>Lines 32-34 load the modules the script uses:</p>
<ul>
<li><a href="http://search.cpan.org/dist/libwww-perl/lib/LWP/UserAgent.pm"><code>LWP::UserAgent</code></a> used to perform the web downloads</li>
<li><a href="http://search.cpan.org/~drolsky/DateTime-1.18/lib/DateTime.pm"><code>DateTime</code></a> generates and formats the default date</li>
<li><a href="http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/TreeBuilder.pm"><code>HTML::TreeBuilder</code></a> parses HTML</li>
</ul>
<hr />
<pre><code> 36 #
37 # Version number (manually incremented)
38 #
39 our $VERSION = &#39;0.0.1&#39;;
40
41 #
42 # Set to 0 to be more silent
43 #
44 my $DEBUG = 1;
45
46 #
47 # Script name
48 #
49 ( my $PROG = $0 ) =~ s|.*/||mx;
50
51 #-------------------------------------------------------------------------------
52 # Edit this to your needs
53 #-------------------------------------------------------------------------------
54 #
55 # Where the script will download the picture. Edit this to where you want
56 #
57 my $image_base = &quot;$ENV{HOME}/Backgrounds/apod&quot;;
58
59 #-------------------------------------------------------------------------------
60 # Nothing needs editing below here
61 #-------------------------------------------------------------------------------
62
63 #
64 # Get the argument or default it
65 #
66 my $arg = shift;
67 unless ( defined($arg) ) {
68     #
69     # APOD wants a date in YYMMDD format
70     #
71     my $dt = DateTime-&gt;now;
72     $arg = sprintf( &quot;%02i%02i%02i&quot;,
73         substr( $dt-&gt;year, -2 ),
74         $dt-&gt;month, $dt-&gt;day );
75 }
76
77 #
78 # Check the argument is a valid date in YYMMDD format
79 #
80 die &quot;Usage: $PROG [YYMMDD]\n&quot; unless ( $arg =~ /^\d{6}$/ );
81 </code></pre>
<hr />
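<p>Lines 66-80 collect the date from the command line or, if none is given, generate the correctly formatted date. If the argument is not exactly six digits the script aborts with a usage message. The check covers only the format; it does not confirm that the digits form a real calendar date.</p>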
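<p>As a possible refinement, the sketch below builds the default date and also rejects impossible dates such as month 13. It is not part of <code>collect_apod_simple</code>: the helper names are hypothetical, it uses the core modules <code>POSIX</code> and <code>Time::Local</code> instead of <code>DateTime</code>, and it assumes that two-digit years 95-99 mean 1995-1999 (APOD began in 1995) while 00-94 mean 20xx.</p>

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Hypothetical helper: today's date (UTC) in the YYMMDD form APOD uses
sub default_apod_date {
    return strftime( '%y%m%d', gmtime() );
}

# Hypothetical helper: true only if the six digits form a real calendar date.
# Assumption: years 95-99 are 1995-1999, years 00-94 are 2000-2094.
sub is_valid_apod_date {
    my ($arg) = @_;
    return 0 unless defined $arg;

    # Split into year, month and day; fail if the format is wrong
    my ( $yy, $mm, $dd ) = $arg =~ /\A(\d\d)(\d\d)(\d\d)\z/ or return 0;
    my $year = $yy >= 95 ? 1900 + $yy : 2000 + $yy;

    # timegm dies on impossible dates such as month 13 or 30 February
    return eval { timegm( 0, 0, 0, $dd, $mm - 1, $year ); 1 } ? 1 : 0;
}

my $arg = shift // default_apod_date();
die "Usage: $0 [YYMMDD]\n" unless is_valid_apod_date($arg);
print "Using date $arg\n";
```

<p>Letting <code>timegm</code> do the range checking avoids hand-written tables of month lengths and leap years, at the cost of having to decide which century a two-digit year belongs to.</p>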
<hr />
<pre><code> 82 #
83 # Make an URL depending on the argument
84 #
85 my $apod_base = &quot;http://apod.nasa.gov/apod&quot;;
86 my $apod_URL = &quot;$apod_base/ap$arg.html&quot;;
87 </code></pre>
<hr />
<p>Lines 85-86 define the APOD URL for the chosen date. For 2015-01-06, for example, this will be http://apod.nasa.gov/apod/ap150106.html.</p>
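<p>The mapping from a date to a URL can be sketched as a pair of small functions. Both function names are hypothetical and do not appear in the script; the base URL is the one the script uses.</p>

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two assignments on lines 85-86
sub apod_url {
    my ($date) = @_;
    my $apod_base = 'http://apod.nasa.gov/apod';
    return "$apod_base/ap$date.html";
}

# Hypothetical helper: accept an ISO date (YYYY-MM-DD) and reduce it to YYMMDD
sub apod_url_for_iso {
    my ($iso) = @_;
    my ( $y, $m, $d ) = $iso =~ /\A(\d{4})-(\d\d)-(\d\d)\z/
        or die "Expected YYYY-MM-DD, got '$iso'\n";
    return apod_url( substr( $y, -2 ) . $m . $d );
}

print apod_url('150106'), "\n";
print apod_url_for_iso('2015-01-06'), "\n";
```

<p>Both calls should print the same URL as the example given above for 2015-01-06.</p>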
<hr />
<pre><code> 88 #
89 # General declarations
90 #
91 my ( $image_URL, $image_file );
92 my ( $tree, $title );
93 my ( $url, $element, $attr, $tag );
94
95 #
96 # Enable Unicode mode
97 #
98 binmode STDOUT, &quot;:encoding(UTF-8)&quot;;
99 binmode STDERR, &quot;:encoding(UTF-8)&quot;;
100
101 if ($DEBUG) {
102     print &quot;Base URL: $apod_base\n&quot;;
103     print &quot;APOD URL: $apod_URL\n&quot;;
104     print &quot;Image base: $image_base\n&quot;;
105     print &quot;\n&quot;;
106 }
107
108 #
109 # Get the HTML page, pretending to be some unknown User Agent
110 #
111 my $ua = LWP::UserAgent-&gt;new;
112 $ua-&gt;agent(&quot;MyApp/0.1&quot;);
113
114 my $req = HTTP::Request-&gt;new( GET =&gt; $apod_URL );
115
116 my $res = $ua-&gt;request($req);
117 if ( $res-&gt;is_success ) {
118     print &quot;GET request successful\n&quot; if $DEBUG;
119
120     #
121     # Parse the HTML we got back
122     #
123     $tree = HTML::TreeBuilder-&gt;new;
124     $tree-&gt;parse_content( $res-&gt;content_ref );
125 </code></pre>
<hr />
<p>Lines 111-116 set up the request and download the APOD web page. If the download was successful (line 117) then the HTML is parsed with <code>HTML::TreeBuilder</code> in lines 123 and 124.</p>
<hr />
<pre><code>126     #
127     # Get and display the title in debug mode
128     #
129     if ($DEBUG) {
130         if ( $title = $tree-&gt;look_down( _tag =&gt; &#39;title&#39; ) ) {
131             $title = $title-&gt;as_trimmed_text();
132             print &quot;Found title: $title\n&quot; if $title;
133         }
134     }
135
136     #
137     # Look for the image. This is expected to be the href attribute of an &lt;a&gt;
138     # tag. The image we see on the page is merely a link to this (usually)
139     # larger image.
140     #
141     for ( @{ $tree-&gt;extract_links(&#39;a&#39;) } ) {
142         ( $url, $element, $attr, $tag ) = @$_;
143         if ($DEBUG) {
144             print &quot;Found: $url\n&quot; if $url;
145         }
146         last unless defined($url);
147         last if ( $url =~ /\.(jpg|png)$/i );
148     }
149 </code></pre>
<hr />
<p>Lines 141-148 consist of a loop which walks through the links found in <code>&lt;a&gt;</code> tags in the parsed HTML. The loop ends as soon as a link's URL ends in <code>.jpg</code> or <code>.png</code> (in either case).</p>
<hr />
<pre><code>150     #
151     # Abort if no image (it might be a video or a GIF)
152     #
153     die &quot;Image URL not found\n&quot;
154         unless defined($url)
155         &amp;&amp; $url =~ /\.(jpg|png)$/i;
156 </code></pre>
<hr />
<p>Lines 153-155 check that an image URL was actually found. Some days the APOD site might host a YouTube video or some other animated display. The script is not interested in these since they are no use as wallpaper.</p>
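<p>The search loop and the check that follows it can be folded into one function, sketched here on plain lists of URLs so that it runs without any network access or HTML parsing. The function name and the sample link lists are made up for illustration; in the script the list would come from <code>extract_links</code>.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: return the first link that looks like a JPEG or PNG,
# or undef when the page has none (for example on a video day)
sub first_image_link {
    my @links = @_;
    for my $url (@links) {
        next unless defined $url;
        return $url if $url =~ /\.(?:jpg|png)\z/i;
    }
    return;
}

# Made-up link lists standing in for what extract_links('a') might yield
my @picture_day = ( 'archivepix.html', 'image/1501/example_big.jpg', 'lib/about_apod.html' );
my @video_day   = ( 'archivepix.html', 'https://www.youtube.com/embed/example' );

for my $links ( \@picture_day, \@video_day ) {
    my $found = first_image_link(@$links);
    print defined $found ? "Image link: $found\n" : "Image URL not found\n";
}
```

<p>Unlike the loop in the script, which leaves the last link examined in <code>$url</code> and therefore has to test the pattern a second time, this version returns <code>undef</code> when nothing matches, so a single <code>defined</code> test is enough.</p>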
<hr />
<pre><code>157     $image_URL = &quot;$apod_base/$url&quot;;
158
159     #
160     # Extract the final part of the URL for the file name. We usually get
161     # a JPEG, sometimes with a shouty extension, which we change.
162     #
163     ( $image_file = $image_URL ) =~ s|.*/||mx;
164     ( $image_file = &quot;$image_base/$image_file&quot; ) =~ s/JPG$/jpg/mx;
165
166     if ($DEBUG) {
167         print &quot;Image URL: $image_URL\n&quot;;
168         print &quot;Image file: $image_file\n&quot;;
169     }
170
171     #
172     # Abort if the file already exists (the script already ran?)
173     #
174     die &quot;File $image_file already exists\n&quot; if ( -f $image_file );
175 </code></pre>
<hr />
<p>Lines 157-174 prepare the image URL and make a file name to hold the image.</p>
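<p>The file name derivation on lines 163-164 can be tried in isolation with a sketch like the following. The function name, the sample URLs and the <code>/tmp/apod</code> directory are assumptions made for the example.</p>

```perl
use strict;
use warnings;

# Hypothetical helper mirroring lines 163-164: keep only the last path
# component of the URL, lower-case a trailing "JPG", and prepend the
# directory where images are stored
sub local_image_path {
    my ( $image_url, $image_base ) = @_;
    ( my $name = $image_url ) =~ s{.*/}{};
    $name =~ s/JPG\z/jpg/;
    return "$image_base/$name";
}

# Made-up URLs for illustration
for my $url (
    'http://apod.nasa.gov/apod/image/1501/example.JPG',
    'http://apod.nasa.gov/apod/image/1501/another.png',
    )
{
    print local_image_path( $url, '/tmp/apod' ), "\n";
}
```

<p>A caller could then test the returned path with <code>-f</code>, as the script does on line 174, before deciding whether to download.</p>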
<hr />
<pre><code>176     #
177     # Set up the GET request for the image
178     #
179     $req = HTTP::Request-&gt;new( GET =&gt; $image_URL );
180
181     #
182     # Download the image to the (possibly renamed) image file
183     #
184     $res = $ua-&gt;request( $req, $image_file );
185     if ( $res-&gt;is_success ) {
186         print &quot;Downloaded to $image_file\n&quot; if $DEBUG;
187     }
188     else {
189         #
190         # The image download failed
191         #
192         die $res-&gt;status_line, &quot; ($image_URL)\n&quot;;
193     }
194 </code></pre>
<hr />
<p>Lines 179-193 download the image to the file. If the download fails the script aborts, reporting the HTTP status line and the image URL.</p>
<hr />
<pre><code> 195 }
196 else {
197     #
198     # We failed to get the web page
199     #
200     die $res-&gt;status_line, &quot; ($apod_URL)\n&quot;;
201 }
202
203 exit;
204
205 # vim: syntax=perl:ts=8:sw=4:et:ai:tw=78:fo=tcrqn21:fdm=marker</code></pre>
<p>I hope you find the script interesting and/or useful.</p>
<h2 id="links">Links</h2>
<ul>
<li>Wikipedia entry <a href="http://en.wikipedia.org/wiki/Astronomy_Picture_of_the_Day" class="uri">http://en.wikipedia.org/wiki/Astronomy_Picture_of_the_Day</a></li>
<li>Astronomy Picture of the Day <a href="http://apod.nasa.gov/apod/astropix.html" class="uri">http://apod.nasa.gov/apod/astropix.html</a></li>
<li>NASA <a href="http://en.wikipedia.org/wiki/NASA" class="uri">http://en.wikipedia.org/wiki/NASA</a></li>
<li>Michigan Technological University (MTU) <a href="http://en.wikipedia.org/wiki/Michigan_Technological_University" class="uri">http://en.wikipedia.org/wiki/Michigan_Technological_University</a></li>
<li>Robert Nemiroff <a href="http://www.mtu.edu/physics/department/faculty/nemiroff/" class="uri">http://www.mtu.edu/physics/department/faculty/nemiroff/</a></li>
<li>Jerry Bonnell <a href="http://antwrp.gsfc.nasa.gov/htmltest/jbonnell/www/bonnell.html" class="uri">http://antwrp.gsfc.nasa.gov/htmltest/jbonnell/www/bonnell.html</a></li>
<li><code>HTML::TreeBuilder</code> Perl module <a href="http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/TreeBuilder.pm" class="uri">http://search.cpan.org/~cjm/HTML-Tree-5.03/lib/HTML/TreeBuilder.pm</a></li>
<li><code>ImageMagick</code> image manipulation software suite <a href="http://www.imagemagick.org/" class="uri">http://www.imagemagick.org/</a></li>
<li><em>GitLab</em> link <a href="https://gitlab.com/davmo/hprmisc" class="uri">https://gitlab.com/davmo/hprmisc</a></li>
</ul>
</body>
</html>