<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>More supplementary Bash tips (HPR Show 2293)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>

<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">More supplementary Bash tips (HPR Show 2293)</h1>
<h2 class="subtitle">Pathname expansion; part 2 of 2</h2>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>

<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#expansion">Expansion</a><ul>
<li><a href="#pathname-expansion---continued">Pathname expansion - continued</a></li>
<li><a href="#notes">Notes</a></li>
<li><a href="#examples">Examples</a><ul>
<li><a href="#example-1---match-zero-or-one-occurrence">Example 1 - “match zero or one occurrence”</a></li>
<li><a href="#example-2---match-zero-or-more-occurrences">Example 2 - “match zero or more occurrences”</a></li>
<li><a href="#example-3---match-one-or-more-occurrences">Example 3 - “match one or more occurrences”</a></li>
<li><a href="#example-4---match-one-of-the-given-patterns">Example 4 - “match one of the given patterns”</a></li>
<li><a href="#example-5---match-anything-but">Example 5 - “match anything but”</a></li>
<li><a href="#example-6---use-of-patterns-elsewhere">Example 6 - use of patterns elsewhere</a></li>
</ul></li>
</ul></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#links">Links</a></li>
<li><a href="#manual-page-extracts">Manual Page Extracts</a><ul>
<li><a href="#expansion-1">EXPANSION</a><ul>
<li><a href="#brace-expansion">Brace Expansion</a></li>
<li><a href="#tilde-expansion">Tilde Expansion</a></li>
<li><a href="#parameter-expansion">Parameter Expansion</a></li>
<li><a href="#command-substitution">Command Substitution</a></li>
<li><a href="#arithmetic-expansion">Arithmetic Expansion</a></li>
<li><a href="#process-substitution">Process Substitution</a></li>
<li><a href="#word-splitting">Word Splitting</a></li>
<li><a href="#pathname-expansion">Pathname Expansion</a><ul>
<li><a href="#pattern-matching">Pattern Matching</a></li>
</ul></li>
</ul></li>
</ul></li>
</ul>
</nav>
</header>
<h2 id="expansion">Expansion</h2>
<p>As we saw in the last episode <a href="http://hackerpublicradio.org/eps/hpr2278" title="Some supplementary Bash tips">2278</a> (and others in this sub-series), there are eight types of expansion applied to the command line, in the following order:</p>
<ul>
<li>Brace expansion (we looked at this subject in episode <a href="http://hackerpublicradio.org/eps/hpr1884" title="Some more Bash tips">1884</a>)</li>
<li>Tilde expansion (seen in episode <a href="http://hackerpublicradio.org/eps/hpr1903" title="Some further Bash tips">1903</a>)</li>
<li>Parameter and variable expansion (this was covered in episode <a href="http://hackerpublicradio.org/eps/hpr1648" title="Bash parameter manipulation">1648</a>)</li>
<li>Command substitution (seen in episode <a href="http://hackerpublicradio.org/eps/hpr1903" title="Some further Bash tips">1903</a>)</li>
<li>Arithmetic expansion (seen in episode <a href="http://hackerpublicradio.org/eps/hpr1951" title="Some additional Bash tips">1951</a>)</li>
<li>Process substitution (seen in episode <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a>)</li>
<li>Word splitting (seen in episode <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a>)</li>
<li>Pathname expansion (the previous episode <a href="http://hackerpublicradio.org/eps/hpr2278" title="Some supplementary Bash tips">2278</a> and this one)</li>
</ul>
<p>This is the last topic in the (sub-)series about expansion in Bash.</p>
<p>In this episode we will look at extended pattern matching, which is also described in the “<a href="#manual-page-extracts">Manual Page Extracts</a>” section at the end of the long notes.</p>
<h3 id="pathname-expansion---continued">Pathname expansion - continued</h3>
<p>As we saw in the last episode (<a href="http://hackerpublicradio.org/eps/hpr2278" title="Some supplementary Bash tips">2278</a>), enabling the option ‘<code>extglob</code>’ with the ‘<code>shopt</code>’ command turns on a number of additional <em>extended pattern matching</em> features<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>.</p>
<p>In the following description, a <em>pattern-list</em> is a list of one or more patterns separated by a ‘<b>|</b>’. Composite patterns may be formed using one or more of the following sub-patterns:</p>
<dl>
<dt><b>?(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches zero or one occurrence of the given patterns</p>
</dd>
<dt><b>*(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches zero or more occurrences of the given patterns</p>
</dd>
<dt><b>+(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches one or more occurrences of the given patterns</p>
</dd>
<dt><b>@(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches one of the given patterns</p>
</dd>
<dt><b>!(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches anything except one of the given patterns</p>
</dd>
</dl>
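<p>The following small script puts the five operators side by side without creating any files. It is a sketch, not part of the original notes. It tests a fixed list of names against each sub-pattern with Bash’s ‘<code>[[ … == … ]]</code>’ test. When the right-hand side of ‘<code>==</code>’ is an unquoted expansion, its value is treated as a pattern, and extended patterns are honoured once ‘<code>extglob</code>’ is on. The names mirror the ‘<code>test</code>’ directory used in the examples below. The function name ‘<code>matching</code>’ is made up for illustration.</p>
<pre><code>#!/bin/bash
# Enable extended patterns before any line that uses them is parsed
shopt -s extglob

names=(ac abc abbc axc)

# Print (space-separated) the names from the list that match pattern $1
matching() {
    local name
    local -a found=()
    for name in "${names[@]}"; do
        # $1 is unquoted on the right of ==, so it is treated as a pattern
        if [[ $name == $1 ]]; then
            found+=("$name")
        fi
    done
    echo "${found[*]}"
}

for pat in 'a?(b)c' 'a*(b)c' 'a+(b)c' 'a@(b)c' 'a!(b)c'; do
    printf '%-8s -> %s\n' "$pat" "$(matching "$pat")"
done</code></pre>
<p>For example, ‘<code>a!(b)c</code>’ should select ‘<code>ac</code>’, ‘<code>abbc</code>’ and ‘<code>axc</code>’. These are the same files that ‘<code>echo test/a!(b)c</code>’ returns in Example 5.</p>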
<h3 id="notes">Notes</h3>
<ol>
<li>This feature is less well known than ordinary globbing (the syntax was borrowed from the Korn shell)</li>
<li>It does not seem to be very well documented</li>
<li>There are some similarities to regular expressions</li>
</ol>
<p><b>Warning!</b>: It is not explained explicitly in the Bash manpage, but these patterns are applied to each <b>filename</b> as a whole. The pattern must match the entire name, not just part of it. So the pattern:</p>
<pre><code>a?(b)c</code></pre>
<p>matches a file whose name begins with ‘<code>a</code>’, is followed by zero or one instance of the letter ‘<code>b</code>’ and ends with ‘<code>c</code>’. This means it can match only the filenames ‘<code>abc</code>’ and ‘<code>ac</code>’. This is explained more completely below.</p>
<p>Some of the confusion this can cause can be seen in the Stack Exchange questions listed in the <a href="#links">Links</a> section below.</p>
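<p>The same kind of string test makes the “whole filename” rule easy to check. In this sketch, ‘<code>a?(b)c</code>’ fails to match ‘<code>xabcx</code>’ even though that string contains ‘<code>abc</code>’. Surrounding the pattern with ‘<code>*</code>’ wildcards is what allows a match anywhere in the name. The strings and the helper name ‘<code>check</code>’ are arbitrary.</p>
<pre><code>#!/bin/bash
shopt -s extglob

# Report whether string $1 matches pattern $2 (the whole string must match)
check() {
    if [[ $1 == $2 ]]; then
        echo "'$1' matches '$2'"
    else
        echo "'$1' does not match '$2'"
    fi
}

check abc   'a?(b)c'      # matches: a, one b, then c
check xabcx 'a?(b)c'      # no match: the extra x characters are not covered
check xabcx '*a?(b)c*'    # matches: the * wildcards absorb the x characters</code></pre>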
<h3 id="examples">Examples</h3>
<p>It turns out that the 33,800 files generated in the last episode are not particularly useful for demonstrating how this feature works. Unfortunately, I had not investigated extended glob patterns when I created them.</p>
<p>These files will still be used in some of the examples. We will also create another directory with a simpler set of files, and turn on ‘<code>extglob</code>’ (assuming it’s not on by default; see the footnote):</p>
<pre><code>$ cd Pathname_expansion
$ mkdir test
$ touch test/{abbc,abc,ac,axc}
$ touch test/{x,xx,xxx}.dat
$ ls -1 test/
abbc
abc
ac
axc
x.dat
xx.dat
xxx.dat
$ shopt -s extglob</code></pre>
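<p>If you are not sure whether ‘<code>extglob</code>’ is already enabled, ‘<code>shopt</code>’ can tell you. Here is a minimal sketch. With ‘<code>-q</code>’ the builtin prints nothing and only sets its exit status, which suits an ‘<code>if</code>’ test in a script. Running ‘<code>shopt extglob</code>’ on its own prints the current setting.</p>
<pre><code>#!/bin/bash
# -q: print nothing, just set the exit status (0 if the option is on)
if shopt -q extglob; then
    echo "extglob is already on"
else
    echo "extglob is off; enabling it"
    shopt -s extglob
fi

# Without -s or -u, shopt reports the current setting
shopt extglob</code></pre>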
<p>(Some examples here are derived from the Stack Exchange articles mentioned earlier and listed in the <a href="#links">Links</a> section.)</p>
<h4 id="example-1---match-zero-or-one-occurrence">Example 1 - “match zero or one occurrence”</h4>
<p><b>?(</b><em>pattern-list</em><b>)</b></p>
<p>In the first demonstration we are asking for zero or one occurrence of ‘<code>b</code>’ between the ‘<code>a</code>’ and ‘<code>c</code>’. We get the files ‘<code>abc</code>’ and ‘<code>ac</code>’ because they match the one and zero cases respectively.</p>
<pre><code>$ echo test/a?(b)c
test/abc test/ac</code></pre>
<p>Next we have asked for zero or one letter ‘<code>b</code>’ or letter ‘<code>x</code>’ in the centre, so in this case we also see ‘<code>axc</code>’.</p>
<pre><code>$ echo test/a?(b|x)c
test/abc test/ac test/axc</code></pre>
<p>Note that the <em>pattern-list</em> has become a little more complex, since it now contains an alternative character.</p>
<p>Now we will move to a more complex example using the large collection of test files.</p>
<p>Here we search through the directories whose names are vowels. The first alternative finds files that have ‘a’ or ‘b’ as the second letter, followed by two digits matching ‘[01][01]’. In practice these are ‘01’, ‘10’ or ‘11’, since there are no files numbered ‘00’. The second alternative (<b>or</b>) finds files whose second letter is ‘a’ or ‘b’ followed by the digits ‘50’:</p>
<pre><code>$ ls -w 50 -x [aeiou]/?(?[ab][01][01]*|?[ab]50*)
a/aa01.txt a/aa10.txt a/aa11.txt a/aa50.txt
a/ab01.txt a/ab10.txt a/ab11.txt a/ab50.txt
e/ea01.txt e/ea10.txt e/ea11.txt e/ea50.txt
e/eb01.txt e/eb10.txt e/eb11.txt e/eb50.txt
i/ia01.txt i/ia10.txt i/ia11.txt i/ia50.txt
i/ib01.txt i/ib10.txt i/ib11.txt i/ib50.txt
o/oa01.txt o/oa10.txt o/oa11.txt o/oa50.txt
o/ob01.txt o/ob10.txt o/ob11.txt o/ob50.txt
u/ua01.txt u/ua10.txt u/ua11.txt u/ua50.txt
u/ub01.txt u/ub10.txt u/ub11.txt u/ub50.txt</code></pre>
<p><small><em>The ‘<code>-w 50</code>’ option to ‘<code>ls</code>’ limits the output width for better readability in these notes. We also use ‘<code>-x</code>’, which lists files in row order rather than the default column order, so you can read left to right.</em></small></p>
<p>There are some important points to understand in this example:</p>
<ul>
<li><p>Although we are using the “match zero or one occurrence” sub-pattern, there are <b>no</b> cases where there are zero matches. The main benefit we get from this feature is alternation (the vertical bar).</p></li>
<li><p>Use of the ‘<code>*</code>’ wildcard in the sub-pattern avoids the need to be explicit about the ‘<code>.txt</code>’ suffix on the files. The same effect would be achieved with the following:</p>
<pre><code>[aeiou]/?(?[ab][01][01]|?[ab]50).txt</code></pre></li>
<li><p>Adding a ‘<code>*</code>’ wildcard to the <b>end</b> results in the sub-pattern having no effect, and all files in the directories are returned. That is because the sub-pattern is allowed to match zero occurrences, leaving the final wildcard to match the whole name! The difference is shown below:</p>
<pre><code>$ echo [aeiou]/?(?[ab][01][01]*|?[ab]50*) | wc -w
40
$ echo [aeiou]/?(?[ab][01][01]*|?[ab]50*)* | wc -w
6500
$ echo [aeiou]/* | wc -w
6500</code></pre></li>
</ul>
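<p>The trailing ‘<code>*</code>’ swallows everything because ‘<code>?(…)</code>’ is allowed to match nothing at all. That leaves the final wildcard to match the entire name. This sketch tests two names from the collection as plain strings. ‘<code>aa01.txt</code>’ fits the sub-pattern and ‘<code>az49.txt</code>’ does not. The variable names are made up for illustration.</p>
<pre><code>#!/bin/bash
shopt -s extglob

without='?(?[ab][01][01]*|?[ab]50*)'   # the sub-pattern from the first command
with_star="${without}*"                # the same pattern with a trailing *

for name in aa01.txt az49.txt; do
    for pat in "$without" "$with_star"; do
        # Unquoted $pat on the right of == is treated as a pattern
        if [[ $name == $pat ]]; then
            echo "$name matches $pat"
        else
            echo "$name does not match $pat"
        fi
    done
done</code></pre>
<p>‘<code>aa01.txt</code>’ matches both patterns. ‘<code>az49.txt</code>’ matches only the version with the trailing ‘<code>*</code>’, which is why every file is returned in that case.</p>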
<!-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<h4 id="example-2---match-zero-or-more-occurrences">Example 2 - “match zero or more occurrences”</h4>
<p><b>*(</b><em>pattern-list</em><b>)</b></p>
<p>In the next demonstration we are asking for zero or more occurrences of ‘<code>b</code>’ between the ‘<code>a</code>’ and ‘<code>c</code>’. We get the files ‘<code>abbc</code>’, ‘<code>abc</code>’ and ‘<code>ac</code>’ because they match the zero and more-than-zero cases.</p>
<pre><code>$ echo test/a*(b)c
test/abbc test/abc test/ac</code></pre>
<p>Not surprisingly, adding ‘<code>x</code>’ to the list in the sub-pattern also returns ‘<code>axc</code>’.</p>
<pre><code>$ echo test/a*(b|x)c
test/abbc test/abc test/ac test/axc</code></pre>
<p>There are files in the ‘<code>test</code>’ directory with one to three ‘<code>x</code>’ characters at the start of their names. We can search for them as follows:</p>
<pre><code>$ echo test/*(x).dat
test/x.dat test/xx.dat test/xxx.dat</code></pre>
<p>There is no file consisting of zero ‘<code>x</code>’s followed by ‘<code>.dat</code>’. A file called ‘<code>.dat</code>’ would match, though it would only be shown if ‘<code>dotglob</code>’ was set.</p>
<p>Applying this sub-pattern to the large collection of test files from the last episode, we might want to find all files in directory ‘<code>a</code>’ which begin with two ‘<code>a</code>’s followed by numbers in the range 1-3:</p>
<pre><code>$ ls -w 50 -x a/*(a)*([1-3]).txt
a/aa11.txt a/aa12.txt a/aa13.txt a/aa21.txt
a/aa22.txt a/aa23.txt a/aa31.txt a/aa32.txt
a/aa33.txt</code></pre>
<p>You might expect to get back only ‘<code>a/aa11.txt</code>’, ‘<code>a/aa22.txt</code>’ and ‘<code>a/aa33.txt</code>’. What is actually returned is ‘<code>aa</code>’ followed by two digits, each in the range 1-3. This is because ‘<code>*([1-3])</code>’ matches any run of such digits, not repeats of one particular digit. This is the same as:</p>
<pre><code>$ ls -w 50 -x a/aa[1-3][1-3].txt
a/aa11.txt a/aa12.txt a/aa13.txt a/aa21.txt
a/aa22.txt a/aa23.txt a/aa31.txt a/aa32.txt
a/aa33.txt</code></pre>
<p>Just to demonstrate how these sub-patterns work, the following example returns the three files in the first column above:</p>
<pre><code>$ ls -1 a/?(*(a)*(1)|*(a)*(2)|*(a)*(3)).txt
a/aa11.txt
a/aa22.txt
a/aa33.txt</code></pre>
<p>However, it does not seem very practical!</p>
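<p>A more practical way to pick out only the names where both digits are the same is to list the alternatives explicitly with ‘<code>@(…)</code>’. This sketch applies the pattern to the names shown above as strings, so it does not need the test files. Against the real files, the equivalent glob would be ‘<code>a/aa@(11|22|33).txt</code>’.</p>
<pre><code>#!/bin/bash
shopt -s extglob

names=(aa11.txt aa12.txt aa13.txt aa21.txt aa22.txt
       aa23.txt aa31.txt aa32.txt aa33.txt)

# Keep only the names whose two digits are 11, 22 or 33
matched=()
for name in "${names[@]}"; do
    if [[ $name == aa@(11|22|33).txt ]]; then
        matched+=("$name")
    fi
done
printf '%s\n' "${matched[@]}"</code></pre>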
<!-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<h4 id="example-3---match-one-or-more-occurrences">Example 3 - “match one or more occurrences”</h4>
<p><b>+(</b><em>pattern-list</em><b>)</b></p>
<p>The next demonstration requests one or more instances of the letter ‘<code>b</code>’ between the other letters. It returns the files ‘<code>abbc</code>’ (two ‘<code>b</code>’s) and ‘<code>abc</code>’ (one ‘<code>b</code>’):</p>
<pre><code>$ echo test/a+(b)c
test/abbc test/abc</code></pre>
<p>As before, adding ‘<code>x</code>’ as an alternative adds file ‘<code>axc</code>’ to the list:</p>
<pre><code>$ echo test/a+(b|x)c
test/abbc test/abc test/axc</code></pre>
<p>The following example looks in directories ‘<code>a</code>’ and ‘<code>b</code>’ for files that begin with an ‘<code>a</code>’ or a ‘<code>b</code>’ and end with ‘<code>01.txt</code>’:</p>
<pre><code>$ ls -w 50 -x [ab]/+(a|b)*01.txt
a/aa01.txt a/ab01.txt a/ac01.txt a/ad01.txt
a/ae01.txt a/af01.txt a/ag01.txt a/ah01.txt
a/ai01.txt a/aj01.txt a/ak01.txt a/al01.txt
a/am01.txt a/an01.txt a/ao01.txt a/ap01.txt
a/aq01.txt a/ar01.txt a/as01.txt a/at01.txt
a/au01.txt a/av01.txt a/aw01.txt a/ax01.txt
a/ay01.txt a/az01.txt b/ba01.txt b/bb01.txt
b/bc01.txt b/bd01.txt b/be01.txt b/bf01.txt
b/bg01.txt b/bh01.txt b/bi01.txt b/bj01.txt
b/bk01.txt b/bl01.txt b/bm01.txt b/bn01.txt
b/bo01.txt b/bp01.txt b/bq01.txt b/br01.txt
b/bs01.txt b/bt01.txt b/bu01.txt b/bv01.txt
b/bw01.txt b/bx01.txt b/by01.txt b/bz01.txt</code></pre>
<p>This could just as well have been achieved with:</p>
<pre><code>$ ls -w 50 -x [ab]/[ab]*01.txt</code></pre>
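<p>Every file in directories ‘<code>a</code>’ and ‘<code>b</code>’ begins with ‘<code>a</code>’ or ‘<code>b</code>’. So the listing above cannot show the difference between ‘<code>*(a|b)</code>’, which may match nothing, and ‘<code>+(a|b)</code>’, which must match at least one letter. This sketch uses two made-up names to show where they part company:</p>
<pre><code>#!/bin/bash
shopt -s extglob

for name in ab01.txt cd01.txt; do
    star='no'
    plus='no'
    # Zero or more a/b letters, then anything, then 01.txt
    if [[ $name == *(a|b)*01.txt ]]; then star='yes'; fi
    # At least one a/b letter, then anything, then 01.txt
    if [[ $name == +(a|b)*01.txt ]]; then plus='yes'; fi
    echo "$name  *(a|b)*01.txt: $star  +(a|b)*01.txt: $plus"
done</code></pre>
<p>‘<code>cd01.txt</code>’ matches the ‘<code>*(a|b)</code>’ form because zero occurrences are allowed and the following ‘<code>*</code>’ absorbs ‘<code>cd</code>’. It fails the ‘<code>+(a|b)</code>’ form.</p>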
<!-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<h4 id="example-4---match-one-of-the-given-patterns">Example 4 - “match one of the given patterns”</h4>
<p><b>@(</b><em>pattern-list</em><b>)</b></p>
<p>This demonstration requests exactly one instance of the letter ‘<code>b</code>’ between the other letters and returns one file, ‘<code>abc</code>’:</p>
<pre><code>$ echo test/a@(b)c
test/abc</code></pre>
<p>Again, adding ‘<code>x</code>’ as an alternative adds file ‘<code>axc</code>’ to the list:</p>
<pre><code>$ echo test/a@(b|x)c
test/abc test/axc</code></pre>
<p>To make some better search targets I ran the following commands:</p>
<pre><code>$ mkdir words
$ while read -r word; do
> word=${word%[^a-zA-Z]*}
> word=${word,,}
> touch "words/$word"
> done < <(shuf -n100 /usr/share/dict/words)</code></pre>
<ul>
<li>A directory ‘<code>words</code>’ was created</li>
<li>A ‘<code>while</code>’ loop was started to read data into a variable called ‘<code>word</code>’. This starts a multi-line command, so the prompt changes to ‘<code>></code>’ until the entire loop has been typed in. The ‘<code>-r</code>’ option stops ‘<code>read</code>’ from treating backslashes as escape characters.</li>
<li>The expansion ‘<code>${word%[^a-zA-Z]*}</code>’ removes the last non-alphabetic character in ‘<code>word</code>’ and everything after it. This strips trailing apostrophes or ‘<code>'s</code>’ sequences.</li>
<li>The ‘<code>word</code>’ variable is converted to lower case with ‘<code>${word,,}</code>’</li>
<li>The ‘<code>touch</code>’ command makes an empty file named after the contents of variable ‘<code>word</code>’</li>
<li>The loop ends with ‘<code>done</code>’ and is “fed” with data by a process substitution (see show <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a>). This runs the ‘<code>shuf</code>’ command to return 100 random words from ‘<code>/usr/share/dict/words</code>’.</li>
</ul>
<p>If you try this you will get different words.</p>
<p>In my case I used the following command to return words containing one of ‘<code>ee</code>’, ‘<code>oo</code>’, ‘<code>th</code>’ or ‘<code>ss</code>’:</p>
<pre><code>$ ls -w 60 words/*@(ee|oo|th|ss)*
words/commandeering words/katherine words/woolly
words/eighteenths words/slathering
words/ingress words/thoughtlessly</code></pre>
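<p>Because ‘<code>shuf</code>’ picks different words each time, here is a self-contained sketch that applies the same ‘<code>*@(ee|oo|th|ss)*</code>’ pattern to a fixed list instead. The list is arbitrary, made up of words that appear in the listings in these notes.</p>
<pre><code>#!/bin/bash
shopt -s extglob

words=(commandeering exit katherine quits woolly yodel ingress pomade)

# Keep the words that contain one of the letter pairs ee, oo, th or ss
found=()
for w in "${words[@]}"; do
    if [[ $w == *@(ee|oo|th|ss)* ]]; then
        found+=("$w")
    fi
done
echo "${found[*]}"</code></pre>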
<!-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<h4 id="example-5---match-anything-but">Example 5 - “match anything but”</h4>
<p><b>!(</b><em>pattern-list</em><b>)</b></p>
<p>In the final demonstration we look for filenames which do not contain a single ‘<code>b</code>’ between the ‘<code>a</code>’ and ‘<code>c</code>’:</p>
<pre><code>$ echo test/a!(b)c
test/abbc test/ac test/axc</code></pre>
<p>Notice that this list includes ‘<code>abbc</code>’. The pattern excludes exactly one ‘<code>b</code>’, and the two ‘<code>b</code>’s in ‘<code>abbc</code>’ are not the same thing.</p>
<p>If we replace the ‘<code>b</code>’ in the pattern with a further pattern meaning “one or more ‘<code>b</code>’s” then we do not get ‘<code>abbc</code>’:</p>
<pre><code>$ echo test/a!(+(b))c
test/ac test/axc</code></pre>
<p>This again demonstrates that patterns can contain patterns!</p>
<p>As a more complex example to show how this sub-pattern works, we might search for files like this:</p>
<pre><code>$ ls -w 50 -x a/a!([c-z]*).txt
a/aa01.txt a/aa02.txt a/aa03.txt a/aa04.txt
a/aa05.txt a/aa06.txt a/aa07.txt a/aa08.txt
...
a/aa49.txt a/aa50.txt a/ab01.txt a/ab02.txt
a/ab03.txt a/ab04.txt a/ab05.txt a/ab06.txt
...
a/ab47.txt a/ab48.txt a/ab49.txt a/ab50.txt</code></pre>
<p>Here we’re looking for files in the directory ‘<code>a</code>’ where the first letter is ‘<code>a</code>’ (they all are). The rest of the name before ‘<code>.txt</code>’ must <strong>not</strong> begin with a letter in the range ‘<code>[c-z]</code>’, so the second letter is ‘<code>a</code>’ or ‘<code>b</code>’. The output here shows a subset of what was returned.</p>
<p>Let’s finish with an example searching the directory of words. This time we have a pattern within a pattern. The inner pattern is a <b>@(</b><em>pattern-list</em><b>)</b> which contains a list of pairs of letters, mostly identical. This pattern is surrounded by asterisk wildcards. The effect of this is to select all words that contain one of the letter pairs.</p>
<p>This is enclosed in a <b>!(</b><em>pattern-list</em><b>)</b> pattern which negates the inner selection. The result matches words which <b>do not</b> contain any of the pairs of letters.</p>
<pre><code>$ ls -w 70 words/!(*@(bb|cc|dd|ee|gg|ll|oo|pp|tt|th|ss)*)
words/adela        words/falconers     words/protectively
words/adversest    words/frankie       words/quits
words/ails         words/gnomes        words/rashes
words/airline      words/haring        words/recites
words/alton        words/indianapolis  words/rescuers
...
words/dickson      words/pitchfork     words/weightlifting
words/elitist      words/pomade        words/whales
words/enactment    words/prepackaging  words/writings
words/épées        words/preview       words/yens
words/exit         words/profusion     words/yodel</code></pre>
<p>The result is 81 of the 100 words in the directory.</p>
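<p>The negated pattern selects exactly the words that the inner pattern rejects, so the two sets together always account for every word. This sketch checks that on a fixed, arbitrary list of words from the listings in these notes (not the random selection used above). It counts the matches for the inner pattern and for its negation.</p>
<pre><code>#!/bin/bash
shopt -s extglob

words=(adela alton commandeering exit katherine quits woolly yodel ingress pomade)
pairs='@(bb|cc|dd|ee|gg|ll|oo|pp|tt|th|ss)'

with=0
without=0
for w in "${words[@]}"; do
    # Words that contain at least one of the letter pairs
    if [[ $w == *$pairs* ]]; then
        with=$((with + 1))
    fi
    # Words that contain none of them (the negated pattern)
    if [[ $w == !(*$pairs*) ]]; then
        without=$((without + 1))
    fi
done
echo "total=${#words[@]} with=$with without=$without"</code></pre>
<p>For this list it should report 4 words containing one of the pairs and 6 without, adding up to the 10 in the list.</p>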
<h4 id="example-6---use-of-patterns-elsewhere">Example 6 - use of patterns elsewhere</h4>
<p>We have seen at various times in this series that <em>glob</em>-style patterns can be used in other contexts. One instance was when manipulating Bash parameters (<a href="http://hackerpublicradio.org/eps/hpr1648" title="Bash parameter manipulation">show 1648</a>):</p>
<pre><code>$ x="aaabbbccc"
$ echo ${x/a/-}
-aabbbccc</code></pre>
<p>Here we created a variable ‘<code>x</code>’ and used <em>pattern substitution</em> to replace the first ‘<code>a</code>’ with a hyphen.</p>
<pre><code>$ echo ${x/+(a)/-}
-bbbccc</code></pre>
<p>This time we have used the ‘<code>+(a)</code>’ pattern to match one or more ‘<code>a</code>’s. Note that the matched group is replaced by <b>one</b> hyphen. To replace each of the letters with a hyphen, we’d use the alternative form of <em>pattern substitution</em> (‘<code>//</code>’). It replaces every match in the string rather than just the first:</p>
<pre><code>$ echo ${x//a/-}
---bbbccc</code></pre>
<p>This time we didn’t want to match a group of letters, so we didn’t use extended pattern matching.</p>
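<p>Extended patterns are also handy in parameter expansion for tidying up strings. This sketch uses ‘<code>+([[:space:]])</code>’, meaning one or more whitespace characters. It trims leading and trailing whitespace and squeezes internal runs down to single spaces. The variable names are made up.</p>
<pre><code>#!/bin/bash
shopt -s extglob

text='   several   words   with  extra    spaces   '

# ## removes the longest matching prefix: all the leading whitespace
trimmed=${text##+([[:space:]])}
# %% removes the longest matching suffix: all the trailing whitespace
trimmed=${trimmed%%+([[:space:]])}
# // replaces every run of one or more whitespace characters with one space
squeezed=${trimmed//+([[:space:]])/ }

printf '[%s]\n' "$trimmed" "$squeezed"</code></pre>
<p>The first line printed is the trimmed string, which keeps its internal runs of spaces. In the second, each run is replaced by one space, just as ‘<code>+(a)</code>’ was replaced by one hyphen above.</p>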
<p>Another place where extended pattern matching can be used is in ‘<code>case</code>’ statements. I will not go into further detail about this here, but there is a Stack Exchange question about it listed in the <a href="#links">Links</a> section.</p>
<p>To summarise: anywhere a filename-style pattern is allowed, <em>extended</em> patterns can also be used (assuming ‘<code>extglob</code>’ is set).</p>
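<p>Here is a brief sketch of the ‘<code>case</code>’ usage mentioned above. The function name ‘<code>describe</code>’ and the categories are invented for the example. A function body is parsed as a whole when it is defined, so ‘<code>extglob</code>’ has to be enabled before the definition rather than inside the function.</p>
<pre><code>#!/bin/bash
shopt -s extglob    # must come before the function definition is parsed

# Classify a filename by its extension using extended patterns
describe() {
    case $1 in
        *.@(jpg|jpeg|png|gif)) echo "$1: image" ;;
        *.+([0-9]))            echo "$1: numbered backup" ;;
        !(*.*))                echo "$1: no extension" ;;
        *)                     echo "$1: something else" ;;
    esac
}

describe holiday.jpeg
describe notes.txt.1
describe README
describe notes.txt</code></pre>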
<h2 id="conclusion">Conclusion</h2>
<p>Until I started investigating these extended pattern matching features of Bash I did not think I would find them particularly useful. It also took me quite a while to understand how they worked.</p>
<p>Now I actually find them quite powerful, and I will use them in the scripts I write in future.</p>
<p>Bash extended patterns are similar in concept to Regular Expressions, although they are written quite differently. For example, the Bash pattern ‘<code>hot*(dog)</code>’ means the same as the RE ‘<code>hot(dog)*</code>’. They both match the words “hot” and “hotdog”. The difference is that ‘<code>*</code>’ in an RE means that the preceding expression may match zero or more times, and it can follow many sorts of expressions. The extended pattern is not quite so general.</p>
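<p>The comparison can be checked directly, because ‘<code>[[ … =~ … ]]</code>’ matches against an extended regular expression. One difference to bear in mind is that a glob pattern must match the whole string, whereas a regular expression matches anywhere unless anchored. So the RE is written here as ‘<code>^hot(dog)*$</code>’, and keeping it in a variable avoids quoting problems. Here is a sketch with arbitrary test words:</p>
<pre><code>#!/bin/bash
shopt -s extglob

re='^hot(dog)*$'     # ERE, anchored so it must match the whole word

for w in hot hotdog hotdogdog hotcake; do
    glob='no'
    regex='no'
    if [[ $w == hot*(dog) ]]; then glob='yes'; fi
    if [[ $w =~ $re ]]; then regex='yes'; fi
    echo "$w: glob=$glob regex=$regex"
done</code></pre>
<p>Both forms accept ‘<code>hot</code>’, ‘<code>hotdog</code>’ and ‘<code>hotdogdog</code>’, and both reject ‘<code>hotcake</code>’.</p>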
<p>I hope this episode has helped you understand these Bash features and that you also find them useful.</p>
<h2 id="links">Links</h2>
<!-- \_ -->
<ul>
<li>Previous shows in this series
<ol>
<li><a href="http://hackerpublicradio.org/eps/hpr1648">HPR episode 1648 “<em>Bash parameter manipulation</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1843">HPR episode 1843 “<em>Some Bash tips</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1884">HPR episode 1884 “<em>Some more Bash tips</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1903">HPR episode 1903 “<em>Some further Bash tips</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr1951">HPR episode 1951 “<em>Some additional Bash tips</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2045">HPR episode 2045 “<em>Some other Bash tips</em>”</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2278">HPR episode 2278 “<em>Some supplementary Bash tips</em>”</a></li>
</ol></li>
</ul>
<!-- - -->
<ul>
<li>Other HPR series referenced:
<ul>
<li><a href="http://hackerpublicradio.org/series.php?id=90">“<em>Learning sed</em>”</a> series on HPR</li>
<li><a href="http://hackerpublicradio.org/series.php?id=94">“<em>Learning Awk</em>”</a> series on HPR</li>
</ul></li>
</ul>
<!-- - -->
<ul>
<li>Wikipedia article on <a href="https://en.wikipedia.org/wiki/Glob_%28programming%29"><em>glob patterns</em></a></li>
<li>Advanced Bash Scripting Guide: <a href="http://www.tldp.org/LDP/abs/html/globbingref.html">“<em>Globbing</em>”</a></li>
<li>Article on <a href="http://mywiki.wooledge.org/glob"><em>Greg’s Wiki</em></a> entitled “<em>Globs</em>”</li>
<li>Questions about <em>Bash extended globbing</em> on Stack Exchange:
<ul>
<li>Question 1: <a href="https://unix.stackexchange.com/questions/168769/bash-extended-globbing">How to list just one file with ‘<code>ls</code>’</a></li>
<li>Question 2: <a href="https://unix.stackexchange.com/questions/203386/extended-glob-what-is-the-difference-in-syntax-between-list-list-list">What is the difference between…</a></li>
<li>Question 3: <a href="https://stackoverflow.com/questions/4554718/patterns-in-case-statement-in-bash-scripting">Patterns in case statements</a></li>
</ul></li>
</ul>
<hr />
<!-- {{{ -->
<h1 id="manual-page-extracts">Manual Page Extracts</h1>
<h2 id="expansion-1">EXPANSION</h2>
<p>Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: <em>brace expansion</em>, <em>tilde expansion</em>, <em>parameter and variable expansion</em>, <em>command substitution</em>, <em>arithmetic expansion</em>, <em>word splitting</em>, and <em>pathname expansion</em>.</p>
<p>The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.</p>
<p>On systems that can support it, there is an additional expansion available: <em>process substitution</em>. This is performed at the same time as tilde, parameter, variable, and arithmetic expansion and command substitution.</p>
<p>Only brace expansion, word splitting, and pathname expansion can change the number of words of the expansion; other expansions expand a single word to a single word. The only exceptions to this are the expansions of “<strong>$@</strong>” and “<strong>${name[@]}</strong>” as explained above (see <strong>PARAMETERS</strong>).</p>
<h3 id="brace-expansion">Brace Expansion</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr1884" title="Some more Bash tips">1884</a>.</p>
<h3 id="tilde-expansion">Tilde Expansion</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr1903" title="Some further Bash tips">1903</a>.</p>
<h3 id="parameter-expansion">Parameter Expansion</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr1648" title="Bash parameter manipulation">1648</a>.</p>
<h3 id="command-substitution">Command Substitution</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr1903" title="Some further Bash tips">1903</a>.</p>
<h3 id="arithmetic-expansion">Arithmetic Expansion</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr1951" title="Some additional Bash tips">1951</a>.</p>
<h3 id="process-substitution">Process Substitution</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a>.</p>
<h3 id="word-splitting">Word Splitting</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr2045" title="Some other Bash tips">2045</a>.</p>
<h3 id="pathname-expansion">Pathname Expansion</h3>
<p>See the notes for HPR show <a href="http://hackerpublicradio.org/eps/hpr2278" title="Some supplementary Bash tips">2278</a> for some of the material in this section.</p>
<p>After word splitting, unless the <strong>-f</strong> option has been set, <strong>bash</strong> scans each word for the characters <b>*</b>, <b>?</b>, and <b>[</b>. If one of these characters appears, then the word is regarded as a <em>pattern</em>, and replaced with an alphabetically sorted list of filenames matching the pattern (see <strong>Pattern Matching</strong> below). If no matching filenames are found, and the shell option <strong>nullglob</strong> is not enabled, the word is left unchanged. If the <strong>nullglob</strong> option is set, and no matches are found, the word is removed. If the <strong>failglob</strong> shell option is set, and no matches are found, an error message is printed and the command is not executed. If the shell option <strong>nocaseglob</strong> is enabled, the match is performed without regard to the case of alphabetic characters. Note that when using range expressions like [a-z] (see below), letters of the other case may be included, depending on the setting of <strong>LC_COLLATE</strong>. When a pattern is used for pathname expansion, the character “.” at the start of a name or immediately following a slash must be matched explicitly, unless the shell option <strong>dotglob</strong> is set. When matching a pathname, the slash character must always be matched explicitly. In other cases, the “.” character is not treated specially. See the description of <strong>shopt</strong> below under <strong>SHELL BUILTIN COMMANDS</strong> for a description of the <strong>nocaseglob</strong>, <strong>nullglob</strong>, <strong>failglob</strong>, and <strong>dotglob</strong> shell options.</p>
<p>The <strong>GLOBIGNORE</strong> shell variable may be used to restrict the set of filenames matching a pattern. If <strong>GLOBIGNORE</strong> is set, each matching filename that also matches one of the patterns in <strong>GLOBIGNORE</strong> is removed from the list of matches. The filenames “.” and “..” are always ignored when <strong>GLOBIGNORE</strong> is set and not null. However, setting <strong>GLOBIGNORE</strong> to a non-null value has the effect of enabling the <strong>dotglob</strong> shell option, so all other filenames beginning with a “.” will match. To get the old behavior of ignoring filenames beginning with a “.”, make “.*” one of the patterns in <strong>GLOBIGNORE</strong>. The <strong>dotglob</strong> option is disabled when <strong>GLOBIGNORE</strong> is unset.</p>
<h4 id="pattern-matching">Pattern Matching</h4>
<p>Any character that appears in a pattern, other than the special pattern characters described below, matches itself. The NUL character may not occur in a pattern. A backslash escapes the following character; the escaping backslash is discarded when matching. The special pattern characters must be quoted if they are to be matched literally.</p>
<p>The special pattern characters have the following meanings:</p>
<dl>
<dt><b>*</b></dt>
<dd><p>Matches any string, including the null string. When the <strong>globstar</strong> shell option is enabled, and <b>*</b> is used in a pathname expansion context, two adjacent <b>*</b>s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a <b>/</b>, two adjacent <b>*</b>s will match only directories and subdirectories.</p>
</dd>
<dt><strong>?</strong></dt>
<dd><p>Matches any single character.</p>
</dd>
<dt><b>[…]</b></dt>
<dd><p>Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a <em>range expression</em>; any character that falls between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched. If the first character following the <strong>[</strong> is a <strong>!</strong> or a <strong>^</strong> then any character not enclosed is matched. The sorting order of characters in range expressions is determined by the current locale and the values of the <strong>LC_COLLATE</strong> or <strong>LC_ALL</strong> shell variables, if set. To obtain the traditional interpretation of range expressions, where <strong>[a-d]</strong> is equivalent to <strong>[abcd]</strong>, set value of the <strong>LC_ALL</strong> shell variable to <strong>C</strong>, or enable the <strong>globasciiranges</strong> shell option. A <strong>-</strong> may be matched by including it as the first or last character in the set. A <strong>]</strong> may be matched by including it as the first character in the set.</p>
<p>Within <b>[</b> and <b>]</b>, character classes can be specified using the syntax <b>[:</b><em>class</em><b>:]</b>, where <em>class</em> is one of the following classes defined in the POSIX standard: <b>alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit</b>. A character class matches any character belonging to that class. The <b>word</b> character class matches letters, digits, and the character _.</p>
<p>Within <b>[</b> and <b>]</b>, an <em>equivalence class</em> can be specified using the syntax <b>[=</b><em>c</em><b>=]</b>, which matches all characters with the same collation weight (as defined by the current locale) as the character <em>c.</em></p>
<p>Within <b>[</b> and <b>]</b>, the syntax <b>[.</b><em>symbol</em><b>.]</b> matches the collating symbol <em>symbol</em>.</p>
</dd>
</dl>
<p>If the <b>extglob</b> shell option is enabled using the <b>shopt</b> builtin, several extended pattern matching operators are recognized. In the following description, a <em>pattern-list</em> is a list of one or more patterns separated by a <b>|</b>. Composite patterns may be formed using one or more of the following sub-patterns:</p>
<dl>
<dt><b>?(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches zero or one occurrence of the given patterns</p>
</dd>
<dt><b>*(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches zero or more occurrences of the given patterns</p>
</dd>
<dt><b>+(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches one or more occurrences of the given patterns</p>
</dd>
<dt><b>@(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches one of the given patterns</p>
</dd>
<dt><b>!(</b><em>pattern-list</em><b>)</b></dt>
<dd><p>Matches anything except one of the given patterns</p>
</dd>
</dl>
<!-- }}} -->
<section class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Note that on the versions of GNU/Linux that I run (Debian, KDE Neon and Raspbian) ‘<code>extglob</code>’ is on by default. It is actually set in <code>/usr/share/bash-completion/bash_completion</code>, which is invoked directly or from <code>/etc/bash_completion</code>, which in turn is invoked from the default <code>~/.bashrc</code>. These are all Debian-derived distributions, so I can’t speak for others.<a href="#fnref1">↩</a></p></li>
</ol>
</section>
</article>
</main>
</div>
</body>
</html>