<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>Gnu Awk - Part 15 (HPR Show 2824)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Gnu Awk - Part 15 (HPR Show 2824)</h1>
<h2 class="subtitle">Redirection of input and output - part 2</h2>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#redirection-of-input">Redirection of input</a><ul>
<li><a href="#a-reminder-of-how-awk-processes-rules">A reminder of how <code>awk</code> processes rules</a></li>
<li><a href="#the-getline-command">The <code>getline</code> command</a><ul>
<li><a href="#simple-usage">Simple usage</a></li>
<li><a href="#reading-into-a-variable">Reading into a variable</a></li>
<li><a href="#reading-from-a-file">Reading from a file</a></li>
<li><a href="#reading-from-a-file-into-a-variable">Reading from a file into a variable</a></li>
<li><a href="#reading-from-a-pipe">Reading from a pipe</a></li>
<li><a href="#using-getline-with-a-coprocess">Using <code>getline</code> with a <em>coprocess</em></a></li>
</ul></li>
</ul></li>
<li><a href="#finale">Finale</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="introduction">Introduction</h2>
<p>This is the fifteenth episode of the “<a href="http://hackerpublicradio.org/series.php?id=94" title="Learning Awk">Learning Awk</a>” series which is being produced by <a href="http://hackerpublicradio.org/correspondents.php?hostid=300" title="b-yeezi">b-yeezi</a> and myself.</p>
<p>This is the second of a pair of episodes looking at <em>redirection</em> in Awk scripts.</p>
<p>In this episode I will spend some time looking at the <code>getline</code> command used for <em>explicit input</em> (as opposed to the usual <em>implicit</em> sort), often with redirection. The <code>getline</code> command is a complex subject which I will cover only relatively briefly. You are directed to the <a href="https://www.gnu.org/software/gawk/manual/gawk.html#Getline" title="Explicit Input with getline"><code>getline</code> section</a> of the GNU Awk User’s Guide for the full details.</p>
<h2 id="redirection-of-input">Redirection of input</h2>
<h3 id="a-reminder-of-how-awk-processes-rules">A reminder of how <code>awk</code> processes rules</h3>
<p>In this episode we are going to look at how <code>awk</code>’s normal input processing can be changed, so I thought it might be a good idea to revisit how things work in the normal course of events.</p>
<p>The <code>awk</code> script reads a line from a file or standard input and then scans the (non <code>BEGIN</code>/<code>END</code>) rules that make up the script in the sequence they are listed. If a rule matches then it is run, and the process of matching continues until all rules have been checked. It is entirely possible that multiple rules will match, and they will all be executed if so, in the sequence they are encountered.</p>
<p>I have prepared a data file <a href="hpr2824_awk15_testdata1">awk15_testdata1</a> and a simple script <a href="hpr2824_awk15_ex1.awk">awk15_ex1.awk</a> to demonstrate this, both downloadable. The data is generated with the <code>lorem</code><a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a> command thus:</p>
<pre><code>$ printf &quot;%s\n&quot; $(lorem -w 3) &gt; awk15_testdata1</code></pre>
<p>The two files are shown here:</p>
<pre><code>$ cat awk15_ex1.awk
#!/usr/bin/awk -f
# Downloadable example 1 for GNU Awk Part 15
{ print &quot;R1 ---&quot; }
{ print &quot;R2&quot;,$0 }
{ print &quot;R3&quot;,$0 }
$ cat awk15_testdata1
voluptatibus
quaerat
sunt</code></pre>
<p>Running the script gives the following result:</p>
<pre><code>$ ./awk15_ex1.awk awk15_testdata1
R1 ---
R2 voluptatibus
R3 voluptatibus
R1 ---
R2 quaerat
R3 quaerat
R1 ---
R2 sunt
R3 sunt
</code></pre>
<p>You can see that each rule is run for each line read from the data file. Rule 1 just prints some hyphens and does nothing with the data, but rules 2 and 3 print the line that was read. There is nothing to stop any of these rules from running.</p>
<h3 id="the-getline-command">The <code>getline</code> command</h3>
<p>So far we have encountered <code>awk</code> scripts which have read lines from a file or standard input and used them to match patterns which invoke various actions. That is different from the way many other programming languages handle input and is one of the great strengths of <code>awk</code>.</p>
<p>The <code>'getline'</code> command can be used to read lines explicitly outside the usual <em>read→pattern-match→action</em> cycle of <code>awk</code>.</p>
<h4 id="simple-usage">Simple usage</h4>
<p>The <code>'getline'</code> command used on its own (with no arguments) reads in the next line and splits it up into fields in the normal way. If used with normal input it affects how data is read and how rules are executed.</p>
<p>If <code>'getline'</code> finds a record it returns 1, and if it encounters the end of the file it returns 0. If there’s an error while reading it returns -1 (and the variable <code>'ERRNO'</code> will contain a description of the error).</p>
<p>The following script (<a href="hpr2824_awk15_ex2.awk">awk15_ex2.awk</a>) is the same as the one just looked at <em>except</em> that it now calls <code>'getline'</code> inside rule 2.</p>
<pre><code>$ cat awk15_ex2.awk
#!/usr/bin/awk -f
# Downloadable example 2 for GNU Awk Part 15
{ print &quot;R1 ---&quot; }
{ print &quot;R2&quot;,$0; getline }
{ print &quot;R3&quot;,$0 }
</code></pre>
<p>Running the script gives the following result:</p>
<pre><code>$ ./awk15_ex2.awk awk15_testdata1
R1 ---
R2 voluptatibus
R3 quaerat
R1 ---
R2 sunt
R3 sunt
</code></pre>
<p>Here it can be seen that rule 2 printed the first line read from the data file. The <code>'getline'</code> call then read the second line, replacing the first one, and rule 3 then printed it. The third line was then read in the normal way, but this time there was nothing left for the <code>'getline'</code> to read, so rules 2 and 3 both printed that last line.</p>
<p>The following downloadable example deals with a file of text where some lines have continuations. This is shown by the line ending with a hyphen. The script detects these lines and concatenates them with the next line. The data file (edited output from the <code>'lorem'</code> command again) is included with this show (<a href="hpr2824_awk15_testdata2">awk15_testdata2</a>) and is listed below.</p>
<pre><code>$ cat awk15_ex3.awk
#!/usr/bin/awk -f
# Downloadable example 3 for GNU Awk Part 15
{
    if ($NF == &quot;-&quot;) {
        $NF = &quot;&quot;
        line = $0
        getline
        print line $0
    }
    else {
        print $0
    }
}
$ cat awk15_testdata2
Dolore eum corporis excepturi. -
Dolorum nulla qui nemo at earum beatae. Laborum
quo hic rem aspernatur accusamus -
praesentium. Impedit eveniet ut reprehenderit
deleniti aut placeat. -
Laudantium sapiente eaque dolor.</code></pre>
<p>Running the script (<a href="hpr2824_awk15_ex3.awk">awk15_ex3.awk</a>) gives the following result:</p>
<pre><code>$ ./awk15_ex3.awk awk15_testdata2
Dolore eum corporis excepturi. Dolorum nulla qui nemo at earum beatae. Laborum
quo hic rem aspernatur accusamus praesentium. Impedit eveniet ut reprehenderit
deleniti aut placeat. Laudantium sapiente eaque dolor.
</code></pre>
<p>If the last field (<code>'$NF'</code>) is a hyphen then it’s deleted and the line is saved. The <code>'getline'</code> call then re-fills <code>'$0'</code> and it is printed preceded by the saved line. Using <code>'getline'</code> makes this type of processing simpler.</p>
<p>Note that this script is too simple for real use since it doesn’t deal with cases like the final <code>'-'</code> not being separated from the preceding word, and would fail if there was a hyphen ending the last line and so on.</p>
<p>See the more sophisticated example in the GNU Awk User’s Guide (<a href="https://www.gnu.org/software/gawk/manual/gawk.html#Plain-Getline" title="Using getline with no arguments">4.10.1 Using <code>getline</code> with No Arguments</a>).</p>
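<p>As a minimal sketch of how one of those cases might be handled, the value returned by <code>'getline'</code> can be tested so that a hyphen ending the last line does not cause that line to be printed twice. This variant is not one of the downloadable examples, the file name <code>continued.awk</code> is hypothetical, and it still assumes the hyphen is a separate field:</p>
<pre><code>$ cat continued.awk
#!/usr/bin/awk -f
# Hypothetical variant of example 3 (not one of the downloads)
{
    if ($NF == &quot;-&quot;) {
        $NF = &quot;&quot;
        line = $0
        if ((getline) &gt; 0)
            print line $0   # a following line was read, so join the two
        else
            print line      # end of input: print only the saved line
    }
    else {
        print $0
    }
}</code></pre>
<p>When <code>'getline'</code> returns 0 at the end of the input it leaves <code>'$0'</code> unchanged, which is why the <code>'else'</code> branch prints just the saved text.</p>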
<h4 id="reading-into-a-variable">Reading into a variable</h4>
<p>If <code>'getline var'</code> is used the next record is read from the main input stream into a variable (<em>var</em> in this example). The record is not split into fields, and variables like <code>'NF'</code> are not changed. Since the main input stream is being read, variables like <code>'NR'</code> (number of records) <strong>are</strong> changed.</p>
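<p>A small sketch may help to show this. It is not one of the downloadable examples, and the variable name <code>following</code> and the two lines of input are made up for the illustration:</p>
<pre><code>$ printf &#39;a b c\nd e\n&#39; | awk &#39;{ getline following; print NR, NF, $0; print following }&#39;
2 3 a b c
d e</code></pre>
<p>The first record is read in the normal way, then <code>'getline following'</code> reads the second. <code>'NR'</code> has moved on to 2, but <code>'NF'</code> and <code>'$0'</code> still describe the first record, and the second record is available only in the variable.</p>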
<h4 id="reading-from-a-file">Reading from a file</h4>
<p>This is another case of redirection:</p>
<pre><code>getline &lt; file</code></pre>
<p>Here <code>'file'</code> is a string expression that specifies the file name.</p>
<p>As mentioned earlier, the string expression used here can also be used to close the file with the <code>'close'</code> command, but has to be specified exactly. Saving the expression in a variable helps with this:</p>
<pre><code>input = path &quot;/&quot; filename
getline &lt; input
close(input)</code></pre>
<p>In this fragment it is assumed that <code>'path'</code> contains a file path, which is concatenated with a slash and a file name to produce the input specification.</p>
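<p>The following sketch (not one of the downloadable examples) shows the effect of this form. According to the GNU Awk User’s Guide, <code>'getline &lt; file'</code> sets <code>'$0'</code> and <code>'NF'</code> but leaves <code>'NR'</code> alone because the main input stream is not being read. The file name <code>extra.txt</code> and its contents are hypothetical:</p>
<pre><code>$ printf &#39;x y z\n&#39; &gt; extra.txt
$ echo &#39;main input&#39; | awk &#39;{ getline &lt; &quot;extra.txt&quot;; print NR, NF, $0; close(&quot;extra.txt&quot;) }&#39;
1 3 x y z</code></pre>
<p>The record count is still 1 (from the single line of main input), while the fields and <code>'$0'</code> now come from the file.</p>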
<h4 id="reading-from-a-file-into-a-variable">Reading from a file into a variable</h4>
<p>This is a concatenation of the previous two forms:</p>
<pre><code>getline var &lt; file</code></pre>
<p>As before <code>'file'</code> is a string expression that specifies the file name.</p>
<p>The following simple example (downloadable as part of this episode) reads the file <code>'fruit_names'</code> that we generated in episode 14.</p>
<pre><code>$ cat awk15_ex4.awk
#!/usr/bin/awk -f
# Downloadable example 4 for GNU Awk Part 15
BEGIN {
    if (ARGC != 2 ) {
        print &quot;Needs a file name argument&quot; &gt; &quot;/dev/stderr&quot;
        exit
    }
    data = ARGV[1]
    while ( (getline line &lt; data) &gt; 0 )
        print line
    close(data)
}
</code></pre>
<p><small>Note: I did not explain <code>'ARGC'</code> and <code>'ARGV'</code> very clearly in the audio. As is common on Unix-like systems, <code>'ARGC'</code> is a numeric variable containing the count of arguments given to the script when it is run from the command line. The arguments themselves are stored in the array <code>'ARGV'</code>, and element zero is always the name of the command or script, so <code>'ARGC'</code> is one greater than expected because of this. </small></p>
<p>Running the script (<a href="hpr2824_awk15_ex4.awk">awk15_ex4.awk</a>) simply lists the file.</p>
<pre><code>$ ./awk15_ex4.awk fruit_names
apple
banana
strawberry
grape
apple
plum
kiwi
potato
pineapple
</code></pre>
<p>This is (another) trivial script presented as an example of how this form of <code>'getline'</code> can be used. Everything runs in the <code>'BEGIN'</code> rule. First a check is made to ensure the script has been given an argument (the input file), and if so the name is stored in the variable <code>'data'</code>. If not an error message is written and the script exits. If all is well a <code>'while'</code> loop runs, reading lines from the file and printing them. Finally the file is closed.</p>
<p>As a seasoned <code>awk</code> user by now you will have realised that the above could have been achieved with the <em>much</em> simpler script:</p>
<pre><code>$ awk &#39;{print}&#39; fruit_names</code></pre>
<h4 id="reading-from-a-pipe">Reading from a pipe</h4>
<p>Using <code>'command | getline'</code> or <code>'command | getline var'</code> reads from a command. In the first case the record is split into fields in the usual way, and in the second case it is stored in a variable.</p>
<p>The following simple example (<a href="hpr2824_awk15_ex5.awk">awk15_ex5.awk</a> downloadable as part of this episode) runs <code>'wget'</code> to read the HPR statistics page:</p>
<pre><code>$ cat awk15_ex5.awk
#!/usr/bin/awk -f
# Downloadable example 5 for GNU Awk Part 15
BEGIN {
    cmd = &quot;wget -q http://hackerpublicradio.org/stats.php -O -&quot;
    while ((cmd | getline) &gt; 0) {
        if ($0 ~ /^Shows in Queue:/)
            printf &quot;Queued shows on HPR: %d\n&quot;, $4
    }
    close(cmd)
}
</code></pre>
<p>The statistics include a line <code>'Shows in Queue: x'</code> which the script checks for. If it is found then the number at the end is extracted (as a normal <code>awk</code> field) and it is displayed with different text. Running the script gives the following result (at the time of generating these notes):</p>
<pre><code>$ ./awk15_ex5.awk
Queued shows on HPR: 27
</code></pre>
<p>The following downloadable example (<a href="hpr2824_awk15_ex6.awk">awk15_ex6.awk</a>) is essentially the same as the previous one except that it uses <code>'command | getline var'</code>:</p>
<pre><code>$ cat awk15_ex6.awk
#!/usr/bin/awk -f
# Downloadable example 6 for GNU Awk Part 15
BEGIN {
    cmd = &quot;wget -q http://hackerpublicradio.org/stats.php -O -&quot;
    while ((cmd | getline line) &gt; 0);
    close(cmd)
    split(line,fields,&quot;,&quot;)
    printf &quot;Queued shows on HPR: %d\n&quot;, fields[10]
}
</code></pre>
<p>It loops through the lines returned, placing each in the variable <code>'line'</code> but doing nothing else. This means that the last line is left in the variable at the end. This contains comma-separated numbers which are separated into an array called <code>'fields'</code> using the <code>'split'</code> function. The 10<sup>th</sup> element contains the number of queued shows in this case.</p>
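<p>Since the two examples above depend on a web page, here is a minimal sketch of both pipe forms using a local command instead. It is not one of the downloadable examples; the file name <code>pipe_demo.awk</code> and the <code>echo</code> command are just assumptions for the illustration:</p>
<pre><code>$ cat pipe_demo.awk
BEGIN {
    cmd = &quot;echo hello world&quot;
    if ((cmd | getline) &gt; 0)        # record is split into fields
        print NF, $2
    close(cmd)                      # closing lets the command be run again
    if ((cmd | getline line) &gt; 0)   # record is stored in &#39;line&#39;
        print line
    close(cmd)
}
$ awk -f pipe_demo.awk
2 world
hello world</code></pre>
<p>The command string is kept in a variable so that exactly the same string can be given to <code>'close'</code>.</p>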
<h4 id="using-getline-with-a-coprocess">Using <code>getline</code> with a <em>coprocess</em></h4>
<p>This feature is provided by Gnu Awk and it allows a <em>coprocess</em> to be created which can be written to and read from. In the context of <code>print</code> and <code>printf</code> we send data to the <em>coprocess</em> with the <code>'|&amp;'</code> operator, as we have seen briefly already. Not surprisingly, <code>'getline'</code> can be used to read data back, either being split up into fields in the normal way, or being saved in a variable.</p>
<p>This subject is quite advanced and will not be discussed in much depth here. The GNU Awk User’s Guide can be used to find out more about <a href="https://www.gnu.org/software/gawk/manual/gawk.html#Getline_002fCoprocess" title="Using getline with a coprocess"><code>getline</code> and <em>coprocesses</em></a> and about the whole subject of <a href="https://www.gnu.org/software/gawk/manual/gawk.html#Two_002dway-I_002fO" title="Two-way I/O">Two-Way I/O</a>.</p>
<p>The following downloadable example (<a href="hpr2824_awk15_ex7.awk">awk15_ex7.awk</a>) demonstrates a use for this feature. In this case we have a SQLite database. This is a copy of one that I use to keep track of HPR episodes on the Internet Archive and is called <code>awktest.db</code> in this incarnation. It is not included with the show.</p>
<p>The command to interact with the database is simply <code>'sqlite3 awktest.db'</code> and this command can be fed an SQL query of the form:</p>
<pre><code>select id,title from episodes where id = ?;</code></pre>
<p>Here the <code>'?'</code> represents a show number that is inserted into the query (actually in the form of a <code>'printf'</code> template using <code>'%d'</code>, as you will see). On the command line you can do this type of thing in this way:</p>
<pre><code>$ printf &#39;select id,title from episodes where id = %d;\n&#39; {2796..2800} | sqlite3 awktest.db
2796 IRS,Credit Freezes and Junk Mail Ohh My!
2797 Writing Web Game in Haskell - Simulation at high level
2798 Should Podcasters be Pirates ?
2799 building an arduino programmer
2800 My YouTube Subscriptions #6</code></pre>
<p>Here is the script:</p>
<pre><code>$ cat awk15_ex7.awk
#!/usr/bin/awk -f
# Downloadable example 7 for GNU Awk Part 15
BEGIN {
    db = &quot;awktest.db&quot;
    cmd = &quot;sqlite3 &quot; db
    querytpl = &quot;select id,title from episodes where id = %d;\n&quot;
}
$0 ~ /^[0-9]+$/ {
    printf querytpl,$0 |&amp; cmd
    cmd |&amp; getline result
    print result
}</code></pre>
<p>In the <code>'BEGIN'</code> rule the variables <code>'db'</code>, <code>'cmd'</code> and <code>'querytpl'</code> are initialised with the database name, the command to interact with it and a template to be used to construct a query.</p>
<p>The main rule looks for numbers which are to be used in the query. If a number is detected a <code>'printf'</code> command uses the format string in <code>'querytpl'</code>, and the number just received, to generate the query and pass it to the coprocess which is running the database command.</p>
<p>Then we use <code>'getline'</code> to read the result from the database into a variable called <code>'result'</code> which is printed. Be aware that this is a simple script which does not cater for errors of any kind.</p>
<p>There are various ways in which this script could be run. One number could be echoed into it, a string of multiple lines containing numbers could be passed in, as could a file of numbers. It could also read from the terminal and process numbers as they are typed in. We will demonstrate it running with a file of show numbers which is listed before the script is run (but not included in downloadable form):</p>
<pre><code>$ cat awk15_ex5.data
2761
2789
2773
$ ./awk15_ex7.awk awk15_ex5.data
2761 HPR Community News for February 2019
2789 Pacing In Storytelling
2773 Lead/Acid Battery Maintenance and Calcium Charge Voltage
</code></pre>
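<p>Example 7 reads one line back after each query. Some commands, such as <code>sort</code>, produce no output until they have read all of their input, and for these the GNU Awk User’s Guide describes closing only the <code>&quot;to&quot;</code> end of the coprocess before reading from it. The following is a minimal sketch of that technique and is not one of the downloadable examples. The file name <code>coproc_sort.awk</code> is hypothetical, and it assumes that <code>gawk</code> is installed, since coprocesses are a Gnu Awk extension:</p>
<pre><code>$ cat coproc_sort.awk
BEGIN {
    cmd = &quot;sort&quot;
    print &quot;pear&quot;  |&amp; cmd
    print &quot;apple&quot; |&amp; cmd
    close(cmd, &quot;to&quot;)                    # tell sort there is no more input
    while ((cmd |&amp; getline line) &gt; 0)   # now read the sorted lines back
        print line
    close(cmd)
}
$ gawk -f coproc_sort.awk
apple
pear</code></pre>
<p>Without the <code>'close(cmd, &quot;to&quot;)'</code> call the script would wait for output from <code>sort</code> while <code>sort</code> was still waiting for more input.</p>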
<p>If this subject is of interest you could refer to <a href="http://hackerpublicradio.org/correspondents.php?hostid=311" title="clacke">clacke</a>’s HPR episode about coprocesses in Bash <a href=""><em>hpr2793 :: bash coproc: the future (2009) is here</em></a>.</p>
<h2 id="finale">Finale</h2>
<p>There is more that could be said about redirection of input and output, as well as about coprocesses. In fact there are many more subjects within Gnu Awk that could be examined. However, this series will soon be coming to an end.</p>
<p>My collaborator <a href="http://hackerpublicradio.org/correspondents.php?hostid=300" title="b-yeezi">b-yeezi</a> and I feel that the areas of Gnu Awk we have not covered in this series might be best left for you to investigate further if you have the need. We both feel that <code>awk</code> is a very useful tool in many respects, but does not stand comparison with more advanced scripting languages such as Python, Ruby and Perl. Perl in particular has borrowed many ideas from Awk but has extended them considerably. Ruby was designed with Perl in mind, and Python has innovated considerably too and is a very widely-used language. Even though Gnu Awk has advanced considerably since it was created it still shows its age and its usefulness is limited.</p>
<p>There are cases where quite complex scripts might be written in Awk, but the way most people seem to use it is as part of a pipeline or inside shell scripts of various sorts. Where you might write a complex script in Perl, Python or Ruby (for example), taking on a large project solely in Awk seems like a bad choice today.</p>
<p>Before we finish this series it is planned to produce one more episode, number 16. In it <a href="http://hackerpublicradio.org/correspondents.php?hostid=300" title="b-yeezi">b-yeezi</a> and I will record a show together. At the time of writing there is no timescale, but we will endeavour to do this as soon as our schedules allow.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://www.gnu.org/software/gawk/manual/html_node/index.html"><em>GNU Awk User’s Guide</em></a>
<ul>
<li><a href="https://www.gnu.org/software/gawk/manual/gawk.html#Getline">Explicit Input with <code>getline</code></a></li>
<li><a href="https://www.gnu.org/software/gawk/manual/gawk.html#Plain-Getline">Using <code>getline</code> with No Arguments</a></li>
<li><a href="https://www.gnu.org/software/gawk/manual/gawk.html#Getline_002fCoprocess">Using <code>getline</code> with a <em>coprocess</em></a></li>
<li><a href="https://www.gnu.org/software/gawk/manual/gawk.html#Getline-Notes">Getline notes</a></li>
<li><a href="https://www.gnu.org/software/gawk/manual/gawk.html#Two_002dway-I_002fO">Two-way I/O</a></li>
</ul></li>
</ul>
<!-- -->
<ul>
<li>Previous shows in this series on HPR:
<ul>
<li><a href="http://hackerpublicradio.org/eps/hpr2114"><em>Gnu Awk - Part 1</em></a> - episode 2114</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2129"><em>Gnu Awk - Part 2</em></a> - episode 2129</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2143"><em>Gnu Awk - Part 3</em></a> - episode 2143</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2163"><em>Gnu Awk - Part 4</em></a> - episode 2163</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2184"><em>Gnu Awk - Part 5</em></a> - episode 2184</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2238"><em>Gnu Awk - Part 6</em></a> - episode 2238</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2330"><em>Gnu Awk - Part 7</em></a> - episode 2330</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2438"><em>Gnu Awk - Part 8</em></a> - episode 2438</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2476"><em>Gnu Awk - Part 9</em></a> - episode 2476</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2526"><em>Gnu Awk - Part 10</em></a> - episode 2526</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2554"><em>Gnu Awk - Part 11</em></a> - episode 2554</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2610"><em>Gnu Awk - Part 12</em></a> - episode 2610</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2804"><em>Gnu Awk - Part 13</em></a> - episode 2804</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2816"><em>Gnu Awk - Part 14</em></a> - episode 2816</li>
</ul></li>
</ul>
<!-- -->
<ul>
<li>Resources:
<ul>
<li><a href="hpr2824_full_shownotes.epub">ePub version of these notes</a></li>
<li>Examples: <a href="hpr2824_awk15_testdata1">awk15_testdata1</a>, <a href="hpr2824_awk15_ex1.awk">awk15_ex1.awk</a>, <a href="hpr2824_awk15_ex2.awk">awk15_ex2.awk</a>, <a href="hpr2824_awk15_testdata2">awk15_testdata2</a>, <a href="hpr2824_awk15_ex3.awk">awk15_ex3.awk</a>, <a href="hpr2824_awk15_ex4.awk">awk15_ex4.awk</a>, <a href="hpr2824_awk15_ex5.awk">awk15_ex5.awk</a>, <a href="hpr2824_awk15_ex6.awk">awk15_ex6.awk</a>, <a href="hpr2824_awk15_ex7.awk">awk15_ex7.awk</a></li>
</ul></li>
</ul>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>The <em>Lorem Ipsum</em> text here is generated by the <code>'lorem'</code> command which is installed with the Perl module called <code>Text::Lorem</code>. You can generate words, sentences or paragraphs of pseudo-Latin with it. The module exists as a Debian package called <code>'libtext-lorem-perl'</code> amongst others.<a href="#fnref1" class="footnote-back">↩</a></p></li>
</ol>
</section>
</article>
</main>
</div>
</body>
</html>