<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>Gnu Awk - Part 8 (HPR Show 2438)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Gnu Awk - Part 8 (HPR Show 2438)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#recap-of-the-last-episode">Recap of the last episode</a></li>
<li><a href="#some-more-statements">Some more statements</a><ul>
<li><a href="#the-switch-statement">The <code>switch</code> statement</a></li>
<li><a href="#the-break-statement">The <code>break</code> statement</a></li>
<li><a href="#the-continue-statement">The <code>continue</code> statement</a></li>
<li><a href="#the-next-statement">The <code>next</code> statement</a></li>
</ul></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="introduction">Introduction</h2>
<p>This is the eighth episode of the “<a href="http://hackerpublicradio.org/series.php?id=94" title="Learning Awk">Learning Awk</a>” series that <a href="http://hackerpublicradio.org/correspondents.php?hostid=300" title="Mr. Young">Mr. Young</a> and I are doing.</p>
<h2 id="recap-of-the-last-episode">Recap of the last episode</h2>
<ul>
<li><p>The <code>while</code> loop: tests a condition and performs commands <em>while</em> the test returns true.</p></li>
<li><p>The <code>do while</code> loop: performs commands after the <code>do</code>, then tests afterwards, repeating the commands <em>while</em> the test is true.</p></li>
<li><p>The <code>for</code> loop (type 1): initialises a variable, performs a test, and increments the variable all together, performing commands while the test is true.</p></li>
<li><p>The <code>for</code> loop (type 2): sets a variable to successive indices of an array, performing a collection of commands for each index.</p></li>
</ul>
<p>These types of loops were demonstrated by examples in the <a href="http://hackerpublicradio.org/eps/hpr2330" title="Awk Part 7">last episode</a>.</p>
<p>Note that the example for <code>do while</code> was an infinite loop (perhaps as a test of the alertness of the audience!):</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
i=2;
do {
print &quot;The square of &quot;, i, &quot; is &quot;, i*i;
i = i + 1
}
while (i != 2)
exit;
}</code></pre>
<p>The condition in the <code>while</code> is always true:</p>
<pre><code>The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100
...
The square of 1269630 is 1611960336900
The square of 1269631 is 1611962876161
The square of 1269632 is 1611965415424
The square of 1269633 is 1611967954689
The square of 1269634 is 1611970493956
...</code></pre>
<p>The variable <code>i</code> is set to 2, the <code>print</code> is executed, then <code>i</code> is set to 3. The test “<code>i != 2</code>” is therefore true, and because <code>i</code> only ever increases it stays true <em>ad infinitum</em>.</p>
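<p>A minimal sketch of one way to make this loop terminate, assuming (this is only a guess at the intent, not something stated in the last episode) that the aim was to print the squares of 2 to 10. The test compares against an upper limit rather than the starting value:</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
    i = 2
    do {
        print &quot;The square of&quot;, i, &quot;is&quot;, i*i
        i = i + 1
    } while (i &lt;= 10)    # becomes false once i reaches 11
}</code></pre>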
<h2 id="some-more-statements">Some more statements</h2>
<p>We will come back to loops later in this episode, but first this seems like a good point to describe another statement: the <code>switch</code> statement.</p>
<h3 id="the-switch-statement">The <code>switch</code> statement</h3>
<p>This is specific to <code>gawk</code>, and is not available when <code>gawk</code> is run in compatibility mode (the <code>--traditional</code> option), which is used when compatibility with non-GNU <code>awk</code> is required. The <code>switch</code> statement in <code>gawk</code> is very similar to the one in <code>C</code> and many other languages.</p>
<p>The layout of the <code>switch</code> statement is as follows:</p>
<p><code>switch</code> (<em>expression</em>) {<br/>    <code>case</code> <em>value</em>:<br/>        <em>case-body</em><br/>    <code>default</code>:<br/>        <em>default-body</em><br/> }</p>
<p>The <code>expression</code> part is an expression, which returns a numeric or string result. The <code>value</code> part after the <code>case</code> is a numeric or string constant or a regular expression.</p>
<p>The <code>expression</code> is evaluated and the result matched against the case <code>value</code>s in turn. If there is a match the <code>case-body</code> statements are executed. If there is no match the <code>default-body</code> statements are executed.</p>
<p>The following example is included as one of the files associated with this show, called <code>switch_example.awk</code>:</p>
<div class="sourceCode"><pre class="sourceCode awk"><code class="sourceCode awk"><span class="co">#!/usr/bin/awk -f</span>
<span class="co">#</span>
<span class="co"># Example of the use of &#39;switch&#39; in GNU Awk.</span>
<span class="co">#</span>
<span class="co"># Should be run against the data file &#39;file1.txt&#39; included with the second</span>
<span class="co"># show in the series: http://hackerpublicradio.org/eps/hpr2129/file1.txt</span>
<span class="co">#</span>
<span class="bu">NR</span> <span class="op">&gt;</span> <span class="dv">1</span> <span class="kw">{</span>
<span class="kw">printf</span> <span class="st">&quot;The %s is classified as: &quot;</span><span class="op">,</span><span class="dt">$1</span>
switch (<span class="dt">$1</span>) <span class="kw">{</span>
case <span class="st">&quot;apple&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;a fruit, pome&quot;</span>
<span class="kw">break</span>
case <span class="st">&quot;banana&quot;</span><span class="op">:</span>
case <span class="st">&quot;grape&quot;</span><span class="op">:</span>
case <span class="st">&quot;kiwi&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;a fruit, berry&quot;</span>
<span class="kw">break</span>
case <span class="st">&quot;strawberry&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;not a true fruit, pseudocarp&quot;</span>
<span class="kw">break</span>
case <span class="st">&quot;plum&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;a fruit, drupe&quot;</span>
<span class="kw">break</span>
case <span class="st">&quot;pineapple&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;a fruit, fused berries (syncarp)&quot;</span>
<span class="kw">break</span>
case <span class="st">&quot;potato&quot;</span><span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;a vegetable, tuber&quot;</span>
<span class="kw">break</span>
default<span class="op">:</span>
<span class="kw">print</span> <span class="st">&quot;[unclassified]&quot;</span>
<span class="kw">}</span>
<span class="kw">}</span></code></pre></div>
<p>The result of running this script against the “fruit” file presented in show 2129 is the following (<code>switch_example.out</code>): <small></p>
<pre><code>The apple is classified as: a fruit, pome
The banana is classified as: a fruit, berry
The strawberry is classified as: not a true fruit, pseudocarp
The grape is classified as: a fruit, berry
The apple is classified as: a fruit, pome
The plum is classified as: a fruit, drupe
The kiwi is classified as: a fruit, berry
The potato is classified as: a vegetable, tuber
The pineapple is classified as: a fruit, fused berries (syncarp)</code></pre>
<p></small></p>
<p>What this simple example does is:</p>
<ul>
<li>It ignores the first line of the file (a header)</li>
<li>It prints the first field (the name of a fruit - mostly) in the string “The %s is classified as:”. There is no newline so whatever is printed next is appended to the line.</li>
<li>It uses the first field in a <code>switch</code> statement. Each <code>case</code> is an exact match with the contents of the field. If there is a match a <code>print</code> statement is used to print out the botanical classification. If there are no matches then the <code>default</code> instance would print “[unclassified]”, but that doesn't happen in this example.</li>
<li>Every <code>case</code> body ends with <code>break</code>. If the <code>break</code> hadn't been there, execution would have carried on into the statements of the next <code>case</code>, and so forth. This can be desirable in some instances. See the next section for a discussion of <code>break</code>.</li>
<li>Note that banana, grape and kiwi are all botanically classified as a berry, so there are three <code>case</code> parts associated with one <code>print</code>.</li>
</ul>
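<p>The example above uses only string constants. As a minimal sketch (not one of the files associated with this show, and using made-up input words), a <code>case</code> value can also be a regular expression, and leaving out <code>break</code> lets execution fall through into the next body:</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
    # Hypothetical test data, split into the array 'word'
    n = split(&quot;apple avocado kiwi&quot;, word, &quot; &quot;)
    for (i = 1; i &lt;= n; i++) {
        switch (word[i]) {
        case /^a/:
            print word[i] &quot;: starts with an a&quot;
            # No 'break' here, so execution falls through to 'default'
        default:
            print word[i] &quot;: reached the default body&quot;
        }
    }
}</code></pre>
<p>With this input “apple” and “avocado” should each produce two lines of output, and “kiwi” only one.</p>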
<h3 id="the-break-statement">The <code>break</code> statement</h3>
<p>This statement is mainly for “breaking out of” a <code>for</code>, <code>while</code> or <code>do-while</code> loop, though, as we have seen, it can also interrupt the flow of execution in a <code>switch</code> statement. Outside of these statements <code>break</code> has no meaning, and <code>gawk</code> treats it as an error.</p>
<p>In a loop a <code>break</code> statement is often used where it's not possible to determine the number of iterations of the loop beforehand. Invoking <code>break</code> terminates the innermost enclosing loop completely; when loops are nested (loops within loops), any outer loop carries on running.</p>
<p>The following example (available for download as <code>divisor.awk</code>) is from the Gnu Awk manual and shows a method of finding the smallest divisor:</p>
<div class="sourceCode"><pre class="sourceCode awk"><code class="sourceCode awk"><span class="co">#!/usr/bin/awk -f</span>
<span class="co"># find smallest divisor of num</span>
<span class="kw">{</span>
num <span class="op">=</span> <span class="dt">$1</span>
<span class="co">#</span>
<span class="co"># Make an infinite loop using the for loop</span>
<span class="co">#</span>
<span class="kw">for</span> (divisor <span class="op">=</span> <span class="dv">2</span><span class="op">;</span> <span class="op">;</span> divisor<span class="op">++</span>) <span class="kw">{</span>
<span class="co">#</span>
<span class="co"># If the number is divisible by &#39;divisor&#39; then we&#39;re done</span>
<span class="co">#</span>
<span class="kw">if</span> (num <span class="op">%</span> divisor <span class="op">==</span> <span class="dv">0</span>) <span class="kw">{</span>
<span class="kw">printf</span> <span class="st">&quot;Smallest divisor of %d is %d</span><span class="sc">\n</span><span class="st">&quot;</span><span class="op">,</span> num<span class="op">,</span> divisor
<span class="kw">break</span>
<span class="kw">}</span>
<span class="co">#</span>
<span class="co"># If the value of &#39;divisor&#39; has got too large the number has no</span>
<span class="co"># divisors and is therefore a prime number</span>
<span class="co">#</span>
<span class="kw">if</span> (divisor <span class="op">*</span> divisor <span class="op">&gt;</span> num) <span class="kw">{</span>
<span class="kw">printf</span> <span class="st">&quot;%d is prime</span><span class="sc">\n</span><span class="st">&quot;</span><span class="op">,</span> num
<span class="kw">break</span>
<span class="kw">}</span>
<span class="kw">}</span>
<span class="kw">}</span></code></pre></div>
<p>I have added some comments to this script to (hopefully) make it clearer.</p>
<p>Running this in a pipeline with the number presented to it as shown results in the following type of output (<code>divisor.out</code>):</p>
<pre><code>$ echo 67 | ./divisor.awk
67 is prime
$ echo 69 | ./divisor.awk
Smallest divisor of 69 is 3</code></pre>
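<p>To illustrate how <code>break</code> behaves in nested loops, here is a minimal sketch (not from the manual; the loop limits are arbitrary) in which <code>break</code> ends only the inner loop while the outer loop carries on:</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
    for (i = 1; i &lt;= 3; i++) {
        for (j = 1; j &lt;= 3; j++) {
            if (j &gt; i)
                break       # leaves the inner 'for' only
            printf &quot;%d,%d &quot;, i, j
        }
        print &quot;&quot;            # the outer loop continues from here
    }
}</code></pre>
<p>This should print one pair of numbers on the first line, two on the second and three on the third.</p>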
<h3 id="the-continue-statement">The <code>continue</code> statement</h3>
<p>This is similar to <code>break</code> in that it is used in a <code>for</code>, <code>while</code> or <code>do-while</code> loop. It is <strong>not</strong> relevant in <code>switch</code> statements, however.</p>
<p>Invoking <code>continue</code> skips the rest of the body of the innermost enclosing loop and begins the next iteration.</p>
<p>The following example (available for download as <code>continue_example.awk</code>) is from the Gnu Awk manual and demonstrates a possible use of <code>continue</code>:</p>
<div class="sourceCode"><pre class="sourceCode awk"><code class="sourceCode awk"><span class="co">#!/usr/bin/awk -f</span>
<span class="co">#</span>
<span class="co"># Loop, printing numbers from 0-20, except for 5</span>
<span class="co"># (From the GNU Awk User&#39;s Guide)</span>
<span class="co">#</span>
<span class="cf">BEGIN</span> <span class="kw">{</span>
<span class="kw">for</span> (x <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> x <span class="op">&lt;=</span> <span class="dv">20</span><span class="op">;</span> x<span class="op">++</span>) <span class="kw">{</span>
<span class="kw">if</span> (x <span class="op">==</span> <span class="dv">5</span>)
<span class="kw">continue</span>
<span class="kw">printf</span> <span class="st">&quot;%d &quot;</span><span class="op">,</span> x
<span class="kw">}</span>
<span class="kw">print</span> <span class="st">&quot;&quot;</span>
<span class="kw">}</span></code></pre></div>
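<p>In a <code>for</code> loop, <code>continue</code> still runs the increment part (<code>x++</code> here) before the next test. As a minimal sketch (adapted from the script above, not a separate download), the same loop written with <code>while</code> shows why that matters: the increment has to be done by hand before <code>continue</code>, otherwise <code>x</code> would stay at 5 and the loop would never end:</p>
<pre><code>#!/usr/bin/awk -f
BEGIN {
    x = 0
    while (x &lt;= 20) {
        if (x == 5) {
            x++         # without this the loop would never end
            continue
        }
        printf &quot;%d &quot;, x
        x++
    }
    print &quot;&quot;
}</code></pre>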
<h3 id="the-next-statement">The <code>next</code> statement</h3>
<p>This statement is not related to loops in the same way as <code>break</code> and <code>continue</code> but to the main record processing cycle of Awk. The <code>next</code> statement causes Awk to stop processing the current input record and go on to the next one.</p>
<p>As we know from earlier episodes in this series, Awk reads records from its input stream and applies rules to them. The <code>next</code> statement stops the execution of further rules for the current record, and moves on to the next one.</p>
<p>The following example (available for download as <code>next_example.awk</code>) demonstrates a use of <code>next</code>:</p>
<div class="sourceCode"><pre class="sourceCode awk"><code class="sourceCode awk"><span class="co">#!/usr/bin/awk -f</span>
<span class="co">#</span>
<span class="co"># Ignore the header</span>
<span class="co">#</span>
<span class="bu">NR</span> <span class="op">==</span> <span class="dv">1</span> <span class="kw">{</span> <span class="kw">next</span> <span class="kw">}</span>
<span class="co">#</span>
<span class="co"># If field 2 (colour) is less than 6 characters then save it with its line</span>
<span class="co"># number and skip it</span>
<span class="co">#</span>
<span class="fu">length</span>(<span class="dt">$2</span>) <span class="op">&lt;</span> <span class="dv">6</span> <span class="kw">{</span>
skip[<span class="bu">NR</span>] <span class="op">=</span> <span class="dt">$0</span>
<span class="kw">next</span>
<span class="kw">}</span>
<span class="co">#</span>
<span class="co"># It&#39;s not the header and the colour name is &gt;= 6 characters, so print the line</span>
<span class="co">#</span>
<span class="kw">{</span>
<span class="kw">print</span>
<span class="kw">}</span>
<span class="co">#</span>
<span class="co"># At the end show what was skipped</span>
<span class="co">#</span>
<span class="cf">END</span> <span class="kw">{</span>
<span class="kw">printf</span> <span class="st">&quot;</span><span class="sc">\n</span><span class="st">Skipped:</span><span class="sc">\n</span><span class="st">&quot;</span>
<span class="kw">for</span> (n <span class="kw">in</span> skip)
<span class="kw">print</span> n<span class="st">&quot;: &quot;</span>skip[n]
<span class="kw">}</span></code></pre></div>
<ul>
<li>The script uses <code>next</code> in the first rule to avoid the first line of the file (a header).</li>
<li>The second rule skips lines where the colour name is less than 6 characters long, but it also saves that line in an array called <code>skip</code> using the line number as the key (index).</li>
<li>The third rule prints anything it sees, but it will not be invoked if either rule 1 or rule 2 causes it to be skipped.</li>
<li>Finally, an <code>END</code> rule prints the contents of the array.</li>
</ul>
<p>Running this with the file we have used many times before, <code>file1.txt</code>, results in the following output (<code>next_example.out</code>): <small></p>
<pre><code>$ next_example.awk file1.txt
banana yellow 6
grape purple 10
plum purple 2
pineapple yellow 5

Skipped:
2: apple red 4
4: strawberry red 3
6: apple green 8
8: kiwi brown 4
9: potato brown 9</code></pre>
<p></small></p>
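<p>The order in which <code>for (n in skip)</code> visits the array indices is not defined by Awk, so the skipped lines are not guaranteed to appear in line-number order. A minimal sketch of one way to force the order in <code>gawk</code> (version 4.0 or later), using its <code>PROCINFO[&quot;sorted_in&quot;]</code> feature; this would replace the <code>END</code> rule above and is not part of the downloadable script:</p>
<pre><code>END {
    # gawk-specific: traverse the indices in ascending numeric order
    PROCINFO[&quot;sorted_in&quot;] = &quot;@ind_num_asc&quot;
    printf &quot;\nSkipped:\n&quot;
    for (n in skip)
        print n&quot;: &quot;skip[n]
}</code></pre>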
<h2 id="links">Links</h2>
<ul>
<li><a href="https://www.gnu.org/software/gawk/manual/html_node/index.html"><em>GNU Awk User's Guide</em></a></li>
<li>Previous shows in this series on HPR:
<ul>
<li><a href="http://hackerpublicradio.org/eps/hpr2114"><em>Gnu Awk - Part 1</em></a> - episode 2114</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2129"><em>Gnu Awk - Part 2</em></a> - episode 2129</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2143"><em>Gnu Awk - Part 3</em></a> - episode 2143</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2163"><em>Gnu Awk - Part 4</em></a> - episode 2163</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2184"><em>Gnu Awk - Part 5</em></a> - episode 2184</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2238"><em>Gnu Awk - Part 6</em></a> - episode 2238</li>
<li><a href="http://hackerpublicradio.org/eps/hpr2330"><em>Gnu Awk - Part 7</em></a> - episode 2330</li>
</ul></li>
<li>Resources:
<ul>
<li><a href="hpr2438_full_shownotes.epub">ePub version of these notes</a></li>
<li><a href="hpr2438_full_shownotes.pdf">PDF version of these notes</a></li>
<li>Demonstration of the <code>switch</code> statement:
<ul>
<li>Script: <a href="hpr2438_switch_example.awk">switch_example.awk</a></li>
<li>Output: <a href="hpr2438_switch_example.out">switch_example.out</a></li>
</ul></li>
<li>Demonstration of the <code>break</code> statement:
<ul>
<li>Script: <a href="hpr2438_divisor.awk">divisor.awk</a></li>
<li>Output: <a href="hpr2438_divisor.out">divisor.out</a></li>
</ul></li>
<li>Demonstration of the <code>continue</code> statement:
<ul>
<li>Script: <a href="hpr2438_continue_example.awk">continue_example.awk</a></li>
</ul></li>
<li>Demonstration of the <code>next</code> statement:
<ul>
<li>Script: <a href="hpr2438_next_example.awk">next_example.awk</a></li>
<li>Output: <a href="hpr2438_next_example.out">next_example.out</a></li>
</ul></li>
</ul></li>
</ul>
<!--
vim: syntax=markdown:ts=8:sw=4:ai:et:tw=78:fo=tcqn:fdm=marker
-->
</article>
</main>
</div>
</body>
</html>