<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="generator" content="pandoc">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <meta name="author" content="Dave Morriss">
  <title>Bash Tips - 15 (HPR Show 2699)</title>
  <style type="text/css">code{white-space: pre;}</style>
  <!--[if lt IE 9]>
    <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
  <![endif]-->
  <link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>

<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Bash Tips - 15 (HPR Show 2699)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>

<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#pitfalls-for-the-unwary-bash-loop-user">Pitfalls for the unwary Bash loop user</a></li>
<li><a href="#feeding-a-loop-from-a-pipe">Feeding a loop from a pipe</a><ul>
<li><a href="#what-is-a-pipeline">What is a pipeline?</a></li>
<li><a href="#piping-into-a-loop">Piping into a loop</a></li>
<li><a href="#avoiding-the-pipe-pitfalls">Avoiding the pipe pitfalls</a></li>
<li><a href="#using-find-instead-of-ls">Using <code>find</code> instead of <code>ls</code></a></li>
<li><a href="#using-extglob-enabled-extended-patterns">Using <code>extglob</code>-enabled extended patterns</a></li>
</ul></li>
<li><a href="#future-topics">Future topics</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="pitfalls-for-the-unwary-bash-loop-user">Pitfalls for the unwary Bash loop user</h2>
<p>This is the fifteenth episode covering useful tips for Bash users. In the last episode we looked at the <code>'for'</code> loop, and prior to that we looked at <code>'while'</code> and <code>'until'</code> loops. In this one I want to look at some of the loop-related issues that can trip up the unwary user.</p>
<p>Loops in Bash are extremely useful, and they are not at all difficult to use in their basic forms. However, there are some perhaps less than obvious issues that can result in unexpected behaviour.</p>
<h2 id="feeding-a-loop-from-a-pipe">Feeding a loop from a pipe</h2>
<h3 id="what-is-a-pipeline">What is a pipeline?</h3>
<p>Bash contains a feature known as a <em>pipeline</em> which is a sequence of one or more commands separated by a vertical bar (<code>'|'</code>) control operator where the output of one command is connected to the input of another. We will spend some time on this subject (and related areas) later in this series of Bash Tips, but for now I want to explain enough for this particular episode.</p>
<p>The series of commands and <code>'|'</code> control characters is called a <em>pipeline</em>. The connection of one command to another is called a <em>pipe</em>.</p>
<p>A typical example is:</p>
<pre><code>$ echo &quot;Hello World&quot; | sed -e &#39;s/^.\+$/\U&amp;/&#39;
HELLO WORLD</code></pre>
<p>Here the string <code>&quot;Hello World&quot;</code> is piped to <code>'sed'</code> which replaces all characters on the line by their upper case versions.</p>
<p>What is happening here is that the <code>'echo'</code> command writes the arguments it has been given to its standard output channel, and the pipe passes this data to the standard input channel of <code>'sed'</code>. In turn, <code>'sed'</code> writes the transformed version to its own standard output (the terminal), where it is displayed.</p>
<p>One of the key characteristics of the pipeline is that each command is executed in its own <em>subshell</em>. This is a separate process within the operating system which inherits settings from the parent shell (process) that created (spawned) it, but it cannot affect the parent environment. In particular, variables set or changed in the subshell cannot be passed back to the parent.</p>
<p>We’ll look at pipelines in more detail in later shows in the Bash Tips sub-series.</p>
<h3 id="piping-into-a-loop">Piping into a loop</h3>
<p>One of the common scenarios where data is piped to a loop is where the output from the <code>'ls'</code> command is being processed. For example:</p>
<pre><code>ls *.mp3 | while read name; do echo &quot;$name&quot;; done</code></pre>
<p>Although not really a pipeline issue it is a bad idea to use <code>'ls'</code> like this because the output it produces is meant to be displayed, and there are often settings, aliases or defaults which cause <code>'ls'</code> to add extra characters and colour codes to the file names.<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
<p>This type of pipeline <u>can</u> work if you ensure that you are using plain <code>'ls'</code> and not an alias, as shown<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a>:</p>
<pre><code>$ unalias ls
$ ls *.mp3 | while read name; do echo &quot;$name&quot;; done
astonish.mp3
birettas.mp3
dizzying.mp3
fabled.mp3
neckline.mp3
overtone.mp3
salamis.mp3
skunked.mp3
sniffing.mp3
theorize.mp3</code></pre>
<p>Regardless of this, the advice is usually to avoid the use of <code>'ls'</code> in this context.</p>
<p>(Note that this example does nothing useful since <code>'ls'</code> itself can list files. More realistically, instead of the <code>'echo'</code> such a loop might run a program or script on each of these files to do some useful work.)</p>
<p>Problems arise as a consequence of the loop running in a subshell when you want to work with variables in the loop. For example, you might want to count the files:</p>
<pre><code>$ count=0
$ ls *.mp3 | while read name; do ((count++)); done
$ echo &quot;$count&quot;
0</code></pre>
<p>The count is zero – why?</p>
<p>The answer is that the <code>'count'</code> variable being incremented in the loop is a copy of the one set to zero before the pipeline. Its value is being incremented in the subshell running the <code>'while'</code> command, but it is discarded when the pipeline ends. Bash cannot <em>pass back</em> the value from the subshell.</p>
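<p>The effect can be seen more directly with a minimal sketch like the one below. It is not one of the show's downloadable examples: the function <code>'list_names'</code> and the three file names are made up for illustration and simply stand in for <code>'ls *.mp3'</code>. The braces group the loop and an <code>'echo'</code> so that both run in the same subshell, which lets us print the copy of <code>'count'</code> before it is thrown away.</p>

```shell
#!/bin/bash

# Hypothetical stand-in for 'ls *.mp3': emits three made-up file names
list_names() {
    printf '%s\n' one.mp3 two.mp3 three.mp3
}

count=0

# Everything inside the braces runs in the subshell created for the
# right-hand side of the pipe, so this 'count' is a copy
list_names | {
    while read -r name; do
        ((count += 1))
    done
    echo "inside the pipeline: $count"
}

# Back in the parent shell the original variable is untouched
echo "in the parent shell: $count"
```

<p>Run as a script with Bash's default settings this should report 3 inside the pipeline and 0 in the parent shell.</p>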
<p>A similar case was highlighted by <a href="http://hackerpublicradio.org/correspondents.php?hostid=311" title="HPR Host &#39;clacke&#39;"><code>clacke</code></a> in the comments to show <a href="http://hackerpublicradio.org/eps/hpr2651" title="HPR Community News for September 2018">2651</a> (Community News for September 2018):</p>
<pre><code>items=()
produce_items | while read item; do items+=( &quot;$item&quot; ); done
do_stuff_with &quot;${items[@]}&quot;</code></pre>
<p>Here, <code>'items'</code> is an array (a subject we’ll be looking at in a forthcoming episode). It is assumed that <code>'produce_items'</code> is a program or function that generates individual strings or numbers which are read by the <code>'read'</code> in the loop and appended to the array. Then <code>'do_stuff_with'</code> deals with all of the elements of the array.</p>
<p>This is what <code>clacke</code> says about it:</p>
<blockquote>
<p><em><code>&quot;items&quot;</code> gets updated just fine, in a subshell, and then after the pipe has finished executing, execution continues in the parent shell where the array is still empty.</em></p>
</blockquote>
<p>What looks like instances of the same array outside and inside the loop are in fact separate arrays.</p>
<h3 id="avoiding-the-pipe-pitfalls">Avoiding the pipe pitfalls</h3>
<p>We looked at the subject of <a href="https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution" title="Bash Process Substitution">process substitution</a> in the Bash Tips series in show 6, episode <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a> (and also briefly considered the pipe problem which we’ve just examined in detail).</p>
<p>In that show we saw that the loop could be provided with data for a <code>'read'</code> command by such a process:</p>
<pre><code>$ unalias ls
$ count=0
$ while read name; do ((count++)); done &lt; &lt;(ls *.mp3)
$ echo &quot;$count&quot;
10</code></pre>
<p>Here the <code>'while'</code> loop runs in the parent process reading lines from the separate process containing the <code>'ls'</code> command. This time the count is correct because we’re not counting in the subshell of a pipeline and expecting the result to be available to the parent process.<a href="#fn3" class="footnote-ref" id="fnref3"><sup>3</sup></a></p>
<p>The example <code>clacke</code> mentioned could also be remodelled as:</p>
<pre><code>items=()
while read item; do items+=( &quot;$item&quot; ); done &lt; &lt;(produce_items)
do_stuff_with &quot;${items[@]}&quot;</code></pre>
<p>The downloadable script in <a href="hpr2699_bash15_ex1.sh">bash15_ex1.sh</a> demonstrates a simplified version of the above example using the (now probably infamous) <code>/usr/share/dict/words</code>:</p>
<pre><code>$ cat bash15_ex1.sh
#!/bin/bash

#-------------------------------------------------------------------------------
# Example 1 for Bash Tips show 15 - a working example similar to clacke&#39;s
# problem example in the comments to HPR episode 2651
#-------------------------------------------------------------------------------

#
# Initialise an array
#
items=()

#
# Populate the array with random words
#
while read -r item; do
    items+=( &quot;$item&quot; )
done &lt; &lt;(grep -E -v &quot;&#39;s$&quot; /usr/share/dict/words | shuf -n 5)

#
# Print the array with word numbers
#
for ((i = 0, j = 1; i &lt; ${#items[@]}; i++, j++)); do
    echo &quot;$j: ${items[$i]}&quot;
done</code></pre>
<p>Invoking the script results in a list of random words:</p>
<pre><code>1: thruways
2: crimsoning
3: destructing
4: cadaver
5: pocketknives
</code></pre>
<p>It’s also possible to do something similar using a <code>'for'</code> loop as in the following downloadable example <a href="hpr2699_bash15_ex2.sh">bash15_ex2.sh</a>:</p>
<pre><code>$ cat bash15_ex2.sh
#!/bin/bash

#-------------------------------------------------------------------------------
# Example 2 for Bash Tips show 15 - you can also use a &#39;for&#39; loop to load an
# array
#-------------------------------------------------------------------------------

#
# Initialise an array
#
items=()

#
# Populate the array with random words
#
for word in $(grep -E -v &quot;&#39;s$&quot; /usr/share/dict/words | shuf -n 5); do
    items+=( &quot;$word&quot; )
done

#
# Print the array with word numbers
#
for ((i = 0, j = 1; i &lt; ${#items[@]}; i++, j++)); do
    echo &quot;$j: ${items[$i]}&quot;
done</code></pre>
<p>I will leave you to try this one out; the result is the same as example 1 (with different words).</p>
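<p>One difference between the two approaches is worth a quick sketch. The <code>'for'</code> loop splits the output of the command substitution on any whitespace, whereas <code>'while read'</code> works a line at a time. With single dictionary words the result is the same, but it is not if a line contains spaces. The function <code>'produce_items'</code> below is a made-up stand-in (it is not part of either downloadable script) that emits two lines of two words each:</p>

```shell
#!/bin/bash

# Hypothetical producer: two lines, each containing a space
produce_items() {
    printf '%s\n' 'first item' 'second item'
}

# 'for' with command substitution: split on spaces as well as newlines
by_for=()
for word in $(produce_items); do
    by_for+=( "$word" )
done

# 'while read' with process substitution: one array element per line
by_read=()
while IFS= read -r line; do
    by_read+=( "$line" )
done < <(produce_items)

echo "for loop:   ${#by_for[@]} elements"
echo "while read: ${#by_read[@]} elements"
```

<p>With the default <code>'IFS'</code> setting the <code>'for'</code> loop should produce four elements and the <code>'while'</code> loop two.</p>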
<h3 id="using-find-instead-of-ls">Using <code>find</code> instead of <code>ls</code></h3>
<p>Another improvement to the earlier file counting example would be to avoid the use of <code>'ls'</code> and instead use <a href="https://www.gnu.org/software/findutils/manual/html_mono/find.html" title="Finding Files"><code>'find'</code></a>. This command (and a number of others in the <em>GNU Findutils</em> manual) warrants a whole show or set of shows because it is so full of features, but for now we’ll just look at how it can be used in this context.</p>
<p>The typical way of using <code>'find'</code> is like this:</p>
<pre><code>find directory options</code></pre>
<p>For example, to find all files in the current directory with a suffix of <code>'.mp3'</code> use:</p>
<pre><code>find . -name &#39;*.mp3&#39; -print</code></pre>
<p>The <code>'-name'</code> option defines a <em>glob</em> pattern to match the files we need returned. This must be quoted otherwise Bash will expand it on the command line, and we want <code>'find'</code> to do that. The <code>'-print'</code> option causes the file to be reported. In this case the path of the file (relative to the nominated or defaulted directory) is also reported.</p>
<pre><code>$ find . -name &#39;*.mp3&#39; -print
./theorize.mp3
./neckline.mp3
./sniffing.mp3
./fabled.mp3
./birettas.mp3
./salamis.mp3
./overtone.mp3
./dizzying.mp3
./skunked.mp3
./astonish.mp3</code></pre>
<p>Unlike <code>'ls'</code> the <code>'find'</code> command does not sort the files.</p>
<p>One other difference from <code>'ls'</code> is that <code>'find'</code> will search any subdirectories as well. The following example makes a sub-directory called <code>'subdir'</code> and creates a file within it. The <code>'find'</code> command limits the search to files that begin with <code>'a'</code> or <code>'i'</code> for brevity:</p>
<pre><code>$ mkdir subdir
$ touch subdir/ignorethisfile.mp3
$ find . -name &#39;[ai]*.mp3&#39; -print
./subdir/ignorethisfile.mp3
./astonish.mp3</code></pre>
<p>Another option <code>'-maxdepth'</code> can be used to limit searches to the current directory (this option must precede <code>'-name'</code>):</p>
<pre><code>$ find . -maxdepth 1 -name &#39;[ai]*.mp3&#39; -print
./astonish.mp3</code></pre>
<p>So, using <code>'find'</code> rather than <code>'ls'</code> the earlier example might be:</p>
<pre><code>$ count=0
$ while read name; do ((count++)); done &lt; &lt;(find . -maxdepth 1 -name &quot;*.mp3&quot;)
$ echo &quot;$count&quot;
10</code></pre>
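<p>File names containing spaces or even newlines can still upset a line-based <code>'read'</code>. A common way round this, which goes a little beyond what was covered in the show, is to have <code>'find'</code> separate the names with NUL characters using <code>'-print0'</code> and to tell <code>'read'</code> to use the same delimiter with <code>-d ''</code>. The sketch below assumes a throw-away directory made with <code>'mktemp'</code> and three invented file names:</p>

```shell
#!/bin/bash

# Throw-away directory holding made-up files, two with awkward names
workdir=$(mktemp -d)
touch "$workdir/plain.mp3" \
      "$workdir/with space.mp3" \
      "$workdir/two"$'\n'"lines.mp3"

count=0

# -print0 ends each path with a NUL byte; 'read -d ""' splits on NUL,
# so spaces and newlines inside a name are preserved
while IFS= read -r -d '' name; do
    ((count += 1))
done < <(find "$workdir" -maxdepth 1 -name '*.mp3' -print0)

echo "$count"

# Tidy up the temporary directory
rm -r "$workdir"
```

<p>A plain <code>'read'</code> would count the file with the embedded newline twice; with the NUL delimiter the count should be 3.</p>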
<h3 id="using-extglob-enabled-extended-patterns">Using <code>extglob</code>-enabled extended patterns</h3>
<p>Finally, let’s look at how the patterns available when the <code>'extglob'</code> option is turned on can help to find files in a loop.</p>
<p>Since doing show <a href="http://hackerpublicradio.org/eps/hpr2293" title="More supplementary Bash tips">2293</a>, where I looked at extended pattern matching features and the <code>'extglob'</code> option enabled by the <code>'shopt'</code> command, I have been using this capability a lot. As I mentioned in the show, my Debian system has <code>'extglob'</code> enabled by default as part of the <em>Bash completion</em> extension. If your operating system does not do this you can set the option as described in show 2293.</p>
<p>The following example uses the files mentioned above where the sub-directory created earlier is still present. It uses a <code>'for'</code> loop with the pattern <code>'+(i|sa|t)*.mp3'</code> which selects files beginning with <code>'i'</code>, <code>'sa'</code> or <code>'t'</code>. Note that the second alternative contains two letters, which is not something we can specify with simple <em>glob</em> patterns:</p>
<pre><code>$ for f in +(i|sa|t)*.mp3; do echo &quot;$f&quot;; done
salamis.mp3
theorize.mp3</code></pre>
<p>No files beginning with <code>'i'</code> were returned. The only such file is in the sub-directory, so we know that, unlike <code>'find'</code> in its default form, this pattern match does not descend into sub-directories.</p>
<p>Note also that the files are sorted this time and do not have the directory <code>'./'</code> on the front.</p>
<p>This is a good way to process files in a loop in some circumstances. For more complex requirements the big guns of the <code>'find'</code> command are often needed.</p>
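<p>In a script it is safest to turn <code>'extglob'</code> on explicitly, and to allow for the case where nothing matches, since Bash then leaves the pattern itself in the list. The following is only a sketch: it builds its own temporary directory with a few of the file names used above, and the array name <code>'matches'</code> is an arbitrary choice.</p>

```shell
#!/bin/bash

# Enable extended patterns; this has to be done on an earlier line than
# the one that uses them, because Bash parses a line before running it
shopt -s extglob

# Throw-away directory mimicking part of the layout used above
workdir=$(mktemp -d)
touch "$workdir"/{astonish,salamis,skunked,theorize}.mp3
mkdir "$workdir/subdir"
touch "$workdir/subdir/ignorethisfile.mp3"

matches=()
for f in "$workdir"/+(i|sa|t)*.mp3; do
    # If nothing matched, the unexpanded pattern is returned; skip it
    [[ -e $f ]] || continue
    # Store the name without the leading directory
    matches+=( "${f##*/}" )
done

printf '%s\n' "${matches[@]}"

# Tidy up the temporary directory
rm -r "$workdir"
```

<p>This should list <code>salamis.mp3</code> and <code>theorize.mp3</code> only; the file in <code>'subdir'</code> is not visited.</p>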
<h2 id="future-topics">Future topics</h2>
<p>There are other issues related to those we have examined here that need to be looked at in future episodes. For example:</p>
<ul>
<li>A guide to arrays in Bash; types of arrays, how to initialise them and how to access them</li>
<li>More about the <code>'find'</code> command</li>
<li>The features of the <code>'read'</code> command</li>
</ul>
<p>We will cover these topics in upcoming episodes of <em>Bash Tips</em>.</p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://www.gnu.org/software/bash/manual/bash.html">"<em>GNU BASH Reference Manual</em>"</a>
<ul>
<li><a href="https://www.gnu.org/software/bash/manual/bash.html#Process-Substitution">"Bash Process Substitution"</a></li>
</ul></li>
<li><p><a href="https://www.gnu.org/software/findutils/manual/html_mono/find.html">"<em>GNU Findutils</em>"</a></p></li>
<li><p><a href="http://hackerpublicradio.org/series.php?id=42">HPR series: <em>Bash Scripting</em></a></p></li>
<li>Previous episodes under the heading <em>Bash Tips</em>:
<ol>
<li><a href="http://hackerpublicradio.org/eps/hpr1648">HPR episode 1648 "<em>Bash parameter manipulation</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1843">HPR episode 1843 "<em>Some Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1884">HPR episode 1884 "<em>Some more Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1903">HPR episode 1903 "<em>Some further Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1951">HPR episode 1951 "<em>Some additional Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2045">HPR episode 2045 "<em>Some other Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2278">HPR episode 2278 "<em>Some supplementary Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2293">HPR episode 2293 "<em>More supplementary Bash tips</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2639">HPR episode 2639 "<em>Some ancillary Bash tips - 9</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2649">HPR episode 2649 "<em>More ancillary Bash tips - 10</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2659">HPR episode 2659 "<em>Further ancillary Bash tips - 11</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2669">HPR episode 2669 "<em>Additional ancillary Bash tips - 12</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2679">HPR episode 2679 "<em>Extra ancillary Bash tips - 13</em>"</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2689">HPR episode 2689 "<em>Bash Tips - 14</em>"</a></li>
</ol></li>
</ul>
<!--- -->
<ul>
<li>Resources:
<ul>
<li>Examples: <a href="hpr2699_bash15_ex1.sh">bash15_ex1.sh</a>, <a href="hpr2699_bash15_ex2.sh">bash15_ex2.sh</a></li>
</ul></li>
</ul>
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Also, Unix and Linux filenames can contain a wide range of characters which lead to complications which <code>'ls'</code> doesn’t help with.<a href="#fnref1" class="footnote-back">↩</a></p></li>
<li id="fn2"><p>In case it is of interest, a group of 10 dummy <code>*.mp3</code> files was generated for testing here. This was done by the following loop:</p>
<pre><code>for w in $(grep -E -v &quot;&#39;s$&quot; /usr/share/dict/words | grep -E &#39;^.{3,8}$&#39; | shuf -n 10); do
    touch &quot;${w}.mp3&quot;
done</code></pre>
<p>Inside the command substitution the first <code>'grep'</code> removes all possessive forms of words. The second one matches words between 3 and 8 characters in length, and <code>'shuf'</code> then extracts 10 random words from all of that. The <code>'touch'</code> command creates an empty file with the suffix <code>'.mp3'</code> using each word as the filename.<a href="#fnref2" class="footnote-back">↩</a></p></li>
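<li id="fn3"><p>It didn’t occur to me at the time, but the process substitution would be the better place to unalias <code>'ls'</code>. Using <code>&lt;(unalias ls; ls *.mp3)</code> means the alias is only removed in the sub-process, not the main login process. Note, though, that Bash expands aliases when a line is read rather than when it is executed, so an <code>'unalias'</code> on the same line as the <code>'ls'</code> may come too late to have any effect; writing <code>\ls</code> or <code>command ls</code> bypasses an alias without removing it.<a href="#fnref3" class="footnote-back">↩</a></p></li>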
</ol>
</section>
</article>
</main>
</div>
</body>
</html>