<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>Introduction to sed - part 5 (HPR Show 2060)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Introduction to sed - part 5 (HPR Show 2060)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#commands">Commands</a><ul>
<li><a href="#finishing-off-less-frequently-used-commands">Finishing off less frequently used commands</a><ul>
<li><a href="#the-c-command">The <em>c</em> command</a></li>
<li><a href="#the-a-command">The <em>a</em> command</a></li>
<li><a href="#the-i-command">The <em>i</em> command</a></li>
</ul></li>
<li><a href="#guru-level-commands">&quot;Guru&quot; level commands</a><ul>
<li><a href="#defining-a-label">Defining a label</a></li>
<li><a href="#the-b-command">The <em>b</em> command</a></li>
<li><a href="#the-t-command">The <em>t</em> command</a></li>
</ul></li>
<li><a href="#commands-specific-to-gnu-sed">Commands specific to GNU <code>sed</code></a><ul>
<li><a href="#the-f-command">The <em>F</em> command</a></li>
</ul></li>
</ul></li>
<li><a href="#examples-from-the-gnu-manual">Examples from the GNU manual</a><ul>
<li><a href="#centering-lines">Centering lines</a></li>
<li><a href="#reverse-lines-of-files">Reverse lines of files</a></li>
<li><a href="#reverse-characters-of-lines">Reverse characters of lines</a></li>
</ul></li>
<li><a href="#my-answer-to-the-quiz-in-the-last-episode">My answer to the quiz in the last episode</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="introduction">Introduction</h2>
<p>This episode is the last one in the &quot;<em>Introduction to sed</em>&quot; series.</p>
<p>In the <a href="http://hackerpublicradio.org/eps/hpr2011" title="Introduction to sed - part 4">last episode</a> we looked at the full story of how <code>sed</code> works with the <em>hold</em> and <em>pattern</em> buffers. We looked at some of the commands that we had not yet seen and how they can be used to do more advanced processing using <code>sed</code>'s buffers.</p>
<p>In this episode we will look at a selection of the remaining commands, which might be described as quite obscure (even <em>very obscure</em>). We will also look at some of the example <code>sed</code> scripts found in the <a href="https://www.gnu.org/software/sed/manual/sed.html#Examples" title="GNU sed manual - examples">GNU sed manual</a>.</p>
<h2 id="commands">Commands</h2>
<h3 id="finishing-off-less-frequently-used-commands">Finishing off less frequently used commands</h3>
<p>We omitted a few commands in this group in the last episode. I will not cover everything in this category but there are some that might be useful, which we'll look at now.</p>
<h4 id="the-c-command">The <em>c</em> command</h4>
<p>This is one of the commands for inserting text in <code>sed</code>. The command is written as:</p>
<pre><code>c\
line1\
line2</code></pre>
<p>The <strong>c</strong> command itself must be followed by a backslash, as must every line of text after it except the last. Each backslash escapes the newline that follows it, marking that the text continues on the next line.</p>
<p>The command can be preceded by any of the address types we saw in <a href="http://hackerpublicradio.org/eps/hpr1986" title="Introduction to sed - part 2">episode 2</a>. The lines matching the address(es) are deleted and replaced by the line(s) associated with this command. If no addresses are given all lines are replaced.</p>
<p>Since the command deletes the <em>pattern space</em> a new cycle follows.</p>
<p>The <strong>c</strong> command can be used on the command line, but not very usefully. Everything after it in the expression is taken as its text, so it cannot be followed by any more <code>sed</code> commands, and another <code>-e</code> option has to be used instead:</p>
<pre><code>$ sed -e &#39;1c\Line removed&#39; -e &#39;3q&#39; sed_demo1.txt
Line removed
shows every weekday Monday through Friday. HPR has a long lineage going back to
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct</code></pre>
<p>Also, a backslash within the text does not start a new line, so only one line can be generated this way:</p>
<pre><code>$ sed -e &#39;1c\**Censored**\Do not read!&#39; -e &#39;3q&#39; sed_demo1.txt
**Censored**Do not read!
shows every weekday Monday through Friday. HPR has a long lineage going back to
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct</code></pre>
<p>However, escape sequences such as '<code>\n</code>' are recognised, so the following example generates two lines as intended:</p>
<pre><code>$ sed -e &#39;1c\**Censored**\nDo not read!&#39; -e &#39;3q&#39; sed_demo1.txt
**Censored**
Do not read!
shows every weekday Monday through Friday. HPR has a long lineage going back to
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct</code></pre>
<p>Both this one-line form and the use of escape sequences within it are GNU extensions.</p>
<p>The <strong>c</strong> command is best used in a file of <code>sed</code> commands. One has been prepared as <a href="hpr2060_demo5.sed" title="hpr2060_demo5.sed"><code>demo5.sed</code></a>, which is available on the HPR website. The example below lists the file with the <code>nl</code> command to show line numbers, then uses it as a <code>sed</code> script and shows the results:</p>
<pre><code>$ nl -w2 -ba -s&#39;: &#39; demo5.sed
1: 1c\
2: ------\
3: This line has been censored\
4: By the Department of Not Seeing Stuff\
5: ------
6:
7: 3q
$ sed -f demo5.sed sed_demo1.txt
------
This line has been censored
By the Department of Not Seeing Stuff
------
shows every weekday Monday through Friday. HPR has a long lineage going back to
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct</code></pre>
<p>Of course, this could all be done on one line using '<code>\n</code>' sequences, as we saw above, but that relies on GNU extensions.</p>
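<p>The multi-line form does not strictly need a separate file, because the shell allows real newlines inside a single-quoted argument. The following is a minimal sketch of this; the function name <code>censor_first_line</code> is invented for the example and <code>printf</code> stands in for <code>sed_demo1.txt</code>. Since it avoids '<code>\n</code>' it is assumed, but not verified here, to work with non-GNU versions of <code>sed</code> too.</p>
<pre><code># censor_first_line is a hypothetical helper name.
# Every text line except the last ends with a backslash; the text lines
# are deliberately not indented, to keep stray spaces out of the output.
censor_first_line() {
    sed -e '1c\
------\
This line has been censored\
------' -e '3q' "$@"
}

# Stand-in input instead of sed_demo1.txt
printf 'first\nsecond\nthird\nfourth\n' | censor_first_line</code></pre>
<p>This prints the three replacement lines followed by <code>second</code> and <code>third</code>, with <code>3q</code> stopping processing as before.</p>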
<h4 id="the-a-command">The <em>a</em> command</h4>
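<p>This command is part of standard <code>sed</code>; as with the <strong>c</strong> command, only its one-line form is a GNU extension. It has the same structure of lines as the <strong>c</strong> command.</p>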
<pre><code>a\
line1\
line2</code></pre>
<p>The command can be preceded by any of the address types we saw in <a href="http://hackerpublicradio.org/eps/hpr1986" title="Introduction to sed - part 2">episode 2</a>. The lines matching the address(es) are processed as normal, but are followed by the line(s) associated with this command, which are output at the end of the current cycle or when the next input line is read. If no addresses are given, every line processed by <code>sed</code> is followed by the line(s) of the <strong>a</strong> command.</p>
<p>If using the one-line form (as discussed with the <strong>c</strong> command) escape sequences like '<code>\n</code>' are allowed.</p>
<pre><code>$ sed -e &#39;1a\Chickens&#39; -e &#39;1q&#39; sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
Chickens</code></pre>
<p>Here the <strong>a</strong> command only applies to the first line, after which a line is added. The second <code>-e</code> expression stops processing after line 1, so we only see one original line and one added line.</p>
<p>The following example adds a line containing just a hyphen after each line of the file, but the second <code>-e</code> expression stops processing after line 3 so we only see three lines of the file:</p>
<pre><code>$ sed -e &#39;a\-&#39; -e &#39;3q&#39; sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
-
shows every weekday Monday through Friday. HPR has a long lineage going back to
-
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct
-</code></pre>
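<p>The <strong>a</strong> command is handy with the <strong>$</strong> address for adding a trailer after the last line. Here is a minimal sketch using the multi-line form inside a single-quoted shell argument; the function name <code>add_trailer</code> and the input are invented for illustration:</p>
<pre><code># add_trailer is a hypothetical helper name.
# '$a' appends the text block after the last input line only;
# the single quotes stop the shell treating $a as a variable.
add_trailer() {
    sed '$a\
--\
End of file' "$@"
}

printf 'line one\nline two\n' | add_trailer</code></pre>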
<h4 id="the-i-command">The <em>i</em> command</h4>
<p>This command is part of standard <code>sed</code>; only its one-line form is a GNU extension. It has the same structure of lines as the <strong>c</strong> and <strong>a</strong> commands.</p>
<pre><code>i\
line1\
line2</code></pre>
<p>The command can be preceded by any of the address types we saw in <a href="http://hackerpublicradio.org/eps/hpr1986" title="Introduction to sed - part 2">episode 2</a>. The lines matching the address(es) are preceded by the line(s) associated with this command. If no addresses are given, every line processed by <code>sed</code> is preceded by the line(s) of the <strong>i</strong> command.</p>
<p>If using the one-line form (as discussed with the <strong>c</strong> command) escape sequences like '<code>\n</code>' are allowed.</p>
<p>The following example adds a line containing just a hyphen before each line of the file, but the second <code>-e</code> expression stops processing after line 3 so we only see three lines of the file:</p>
<pre><code>$ sed -e &#39;i\-&#39; -e &#39;3q&#39; sed_demo1.txt
-
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
-
shows every weekday Monday through Friday. HPR has a long lineage going back to
-
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct</code></pre>
<p>This example is similar to the preceding one but it adds an open square bracket before each line and a close square bracket after it. It uses the <strong>i</strong> and <strong>a</strong> commands to do this.</p>
<pre><code>$ sed -e &#39;i\[&#39; -e &#39;a\]&#39; -e &#39;3q&#39; sed_demo1.txt
[
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
]
[
shows every weekday Monday through Friday. HPR has a long lineage going back to
]
[
Radio FreeK America, Binary Revolution Radio &amp; Infonomicon, and it is a direct
]</code></pre>
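<p>Any address type can control <strong>i</strong>, including a regular expression. As a minimal sketch (the function name <code>mark_chapters</code> and the input are invented), this inserts a separator line before every line that begins with "Chapter":</p>
<pre><code># mark_chapters is a hypothetical helper name.
# The text after 'i\' is written out before each line matching /^Chapter/.
mark_chapters() {
    sed '/^Chapter/i\
=====' "$@"
}

printf 'Preface\nChapter 1\nSome text\nChapter 2\n' | mark_chapters</code></pre>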
<h3 id="guru-level-commands">&quot;Guru&quot; level commands</h3>
<p>The commands we have not seen yet are quite obscure. Even the section on <a href="https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands" title="GNU sed manual - Commands for sed gurus">&quot;Commands for sed gurus&quot; in the GNU Manual</a> states:</p>
<blockquote>
<p><em>In most cases, use of these commands indicates that you are probably better off programming in something like awk or Perl. But occasionally one is committed to sticking with sed, and these commands can enable one to write quite convoluted scripts.</em></p>
</blockquote>
<p>I am including them in this episode because they will help with understanding some of the <a href="https://www.gnu.org/software/sed/manual/sed.html#Examples" title="GNU sed manual - examples">examples from the GNU Manual</a> later on.</p>
<h4 id="defining-a-label">Defining a label</h4>
<p>It is possible to create simple loops within <code>sed</code> but only by branching to a label conditionally or unconditionally. The label itself consists of a colon and a character sequence:</p>
<pre><code>: <em>label</em></code></pre>
<p>The label cannot be associated with an address (it makes no sense), and it serves no other purpose than to act as a point for transfer of execution.</p>
<h4 id="the-b-command">The <em>b</em> command</h4>
<p>This command takes the form:</p>
<pre><code>b <em>label</em></code></pre>
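<p>It causes an unconditional branch to a <em>label</em>. The label may be omitted, in which case <em>b</em> branches to the end of the script: the current cycle ends (with the usual auto-print, unless <code>-n</code> is in effect) and the next cycle starts.</p>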
<p>See the third example below &quot;<em><a href="#reverse-characters-of-lines">Reverse characters of lines</a></em>&quot; for an example of this command's use.</p>
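<p>As a more self-contained illustration, here is a minimal sketch (the function name and input are invented) which uses <strong>b</strong> to skip comment lines, so that the substitution after it only applies to the other lines:</p>
<pre><code># indent_code is a hypothetical helper name.
# Lines starting with '#' branch straight to the label 'done' at the end
# of the script, so the s command never sees them. Every other line is
# indented by four spaces.
indent_code() {
    sed '/^#/b done
s/^/    /
:done' "$@"
}

printf '# a comment\necho hello\n' | indent_code</code></pre>
<p>Since the label is at the very end of the script, a plain <code>/^#/b</code> with no label would have the same effect here.</p>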
<h4 id="the-t-command">The <em>t</em> command</h4>
<p>This command takes the form:</p>
<pre><code>t <em>label</em></code></pre>
<p>It causes a conditional branch to the <em>label</em>. The branch is taken only if there has been a successful substitution (<strong>s</strong> command) since the last input line was read or the last conditional branch was taken. The label may be omitted, in which case a <em>t</em> whose condition is met ends the current cycle and the next one starts.</p>
<p>See the third example below &quot;<em><a href="#reverse-characters-of-lines">Reverse characters of lines</a></em>&quot; for an example of this command's use.</p>
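<p>A common use of <strong>t</strong> is a loop that keeps applying a substitution until it no longer matches. The following minimal sketch inserts thousands separators into a number. It relies on the GNU-specific <code>\B</code> (not a word boundary) and <code>\&gt;</code> (end of word) operators, and the function name is invented:</p>
<pre><code># add_commas is a hypothetical helper name (GNU sed assumed).
# Each pass inserts one comma before the final three digits of a run of
# four or more digits; 't a' repeats the pass for as long as the
# substitution succeeds.
add_commas() {
    sed -e ':a' -e 's/\B\([0-9]\{3\}\)\>/,\1/' -e 't a' "$@"
}

echo 1234567 | add_commas</code></pre>
<p>The first pass gives <code>1234,567</code>, the second <code>1,234,567</code>, and on the third the substitution fails, so <strong>t</strong> does not branch and the line is printed.</p>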
<h3 id="commands-specific-to-gnu-sed">Commands specific to GNU <code>sed</code></h3>
<p>GNU <code>sed</code> has a number of commands that other versions lack, and we will look at just one of them here. For the full list refer to the <a href="https://www.gnu.org/software/sed/manual/sed.html#Extended-Commands" title="GNU sed manual - Commands Specific to GNU sed">GNU Manual</a>.</p>
<h4 id="the-f-command">The <em>F</em> command</h4>
<p>This command prints out the file name of the current input file (with a trailing newline).</p>
<p>This example contains a command group that is obeyed on line 1 of the input. The commands are an <strong>F</strong> which prints the filename, and a <strong>q</strong> which stops processing. Because <code>sed</code> is run in the default &quot;read and print&quot; mode the first line is printed:</p>
<pre><code>$ sed -e &#39;1{F;q}&#39; sed_demo1.txt
sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases</code></pre>
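<p><strong>F</strong> becomes more useful with several input files. In the following minimal sketch the GNU <code>-s</code> option makes <code>sed</code> treat each file separately, so the address <strong>1</strong> matches the first line of every file, while <code>-n</code> suppresses the normal output. The function name and the temporary files are invented for the example:</p>
<pre><code># list_inputs is a hypothetical helper name (GNU sed assumed).
# -s: treat files separately, so '1' means line 1 of each file
# -n: print nothing except what F writes
list_inputs() {
    sed -s -n '1F' "$@"
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'alpha\n' > "$tmpdir/a.txt"
printf 'beta\ngamma\n' > "$tmpdir/b.txt"
list_inputs "$tmpdir/a.txt" "$tmpdir/b.txt"</code></pre>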
<h2 id="examples-from-the-gnu-manual">Examples from the GNU manual</h2>
<h3 id="centering-lines">Centering lines</h3>
<p>This example is from the <a href="https://www.gnu.org/software/sed/manual/sed.html#Centering-lines" title="GNU sed manual - 4.1">GNU manual</a> and centres all lines of a file in a width of 80 columns.</p>
<p>The script, called <a href="hpr2060_centre.sed" title="hpr2060_centre.sed"><code>centre.sed</code></a>, has been made available on the HPR site, and is reproduced below with line numbers for easy reference. Note that the path to <code>sed</code> has been changed from the original since many Linux distributions store it in <code>/bin</code> rather than <code>/usr/bin</code>.</p>
<p>Note that the <em>-f</em> option on the hash-bang line is needed to make <code>sed</code> read its commands from the rest of the file.</p>
<div class="sourceCode"><table class="sourceCode sed numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="sourceCode"><pre><code class="sourceCode sed"><span class="co">#!/bin/sed -f</span>
<span class="co"># Put 80 spaces in the buffer</span>
<span class="dv">1</span> {
<span class="kw">x</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^$</span><span class="st">/ /</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^.*$</span><span class="st">/</span><span class="ch">&amp;&amp;&amp;&amp;&amp;&amp;&amp;&amp;</span><span class="st">/</span>
<span class="kw">x</span>
}
<span class="co"># del leading and trailing spaces</span>
<span class="kw">y</span><span class="st">/\t/ /</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^</span><span class="st"> </span><span class="ch">*</span><span class="st">//</span>
<span class="kw">s</span><span class="st">/ </span><span class="ch">*$</span><span class="st">//</span>
<span class="co"># add a newline and 80 spaces to end of line</span>
<span class="kw">G</span>
<span class="co"># keep first 81 chars (80 + a newline)</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^\(.\{</span><span class="st">81</span><span class="ch">\}\).*$</span><span class="st">/</span><span class="ch">\1</span><span class="st">/</span>
<span class="co"># \2 matches half of the spaces, which are moved to the beginning</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^\(.*\)\n\(.*\)\2</span><span class="st">/</span><span class="ch">\2\1</span><span class="st">/</span></code></pre></td></tr></table></div>
<ul>
<li>Lines 4-9: This group of commands is executed on line 1 of the input stream.
<ul>
<li>Line 5: The <strong>x</strong> command exchanges the <em>pattern space</em> and the <em>hold space</em>. There will be data in the <em>pattern space</em> which will be stored in the <em>hold space</em>, but the <em>hold space</em> will have been empty originally, so now the <em>pattern space</em> is empty.</li>
<li>Line 6: Replaces the empty <em>pattern space</em> by 10 spaces.</li>
<li>Line 7: Replaces the 10 spaces in the <em>pattern space</em> with eight copies of themselves (each <code>&amp;</code> stands for the whole match), thereby creating 80 spaces.</li>
<li>Line 8: Exchanges the buffers again so that the 80 spaces are stored in the <em>hold space</em> and the <em>pattern space</em> is back as it was.</li>
</ul></li>
<li>Line 12: In the <a href="https://www.gnu.org/software/sed/manual/sed.html#Centering-lines" title="GNU sed manual - 4.1">GNU manual</a> the command is written as <code>y/tab/ /</code>, where <em>tab</em> stands for a literal tab character, since that is invisible. The copy used here uses the '\t' escape sequence instead, though this is GNU-specific. The <strong>y</strong> command replaces every tab with a space.</li>
<li>Line 13: This <strong>s</strong> command removes leading spaces.</li>
<li>Line 14: This <strong>s</strong> command removes trailing spaces.</li>
<li>Line 17: The <strong>G</strong> command appends the contents of the <em>hold space</em> to the <em>pattern space</em>, preceded by a newline. The contents of the <em>hold space</em> are not changed. Remember that the hold space contains 80 spaces.</li>
<li>Line 20: This <strong>s</strong> command replaces the <em>pattern space</em> by the first 81 characters, so this should consist of the original line, the newline and some of the newly added spaces.</li>
<li>Line 23: This <strong>s</strong> command captures the original line (everything before the newline) as group 1, then matches as many of the spaces after the newline as can be split into two equal halves, group 2 being one half. It replaces all of this with one half of the spaces (\2) followed by the line, centring it.</li>
</ul>
<p>This example is built for centring in 80 columns. To centre in a narrower width only the <strong>s</strong> command on line 20 needs to change, but a wider one would also need more spaces to be built up on lines 6 and 7. It will also truncate lines longer than 80 characters. However, it is a useful demonstration.</p>
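<p>The key step on line 23 relies on a back-reference used inside the regular expression itself: <code>\(.*\)\2</code> can only match text made of two identical halves. A minimal sketch of just that idea, with an invented function name, halving a run of <code>x</code> characters:</p>
<pre><code># halve_xs is a hypothetical helper name.
# \(x*\)\1 can only match an even number of x's, split into two equal
# halves; the replacement keeps one half. With the $ anchor an odd count
# cannot match at all, so such lines are left unchanged.
halve_xs() {
    sed 's/^\(x*\)\1$/\1/' "$@"
}

printf 'xxxxxx\nxxxxx\n' | halve_xs</code></pre>
<p>Line 23 of the centring script has no <code>$</code> anchor, so when the number of spaces is odd the spare one is simply left at the end of the line.</p>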
<h3 id="reverse-lines-of-files">Reverse lines of files</h3>
<p>This example is from the <a href="https://www.gnu.org/software/sed/manual/sed.html#tac" title="GNU sed manual - 4.6">GNU manual</a>. It emulates the Unix command <code>tac</code> which is a reverse version of <code>cat</code>. The example is quite well described in the manual, but it seemed desirable to look at it in even more detail.</p>
<p>The script, called <a href="hpr2060_tac.sed" title="hpr2060_tac.sed"><code>tac.sed</code></a>, has been made available on the HPR site, and is reproduced below with line numbers for easy reference. Note that the path to <code>sed</code> has been changed as before.</p>
<p>Note that in addition to the <em>-f</em> option, the hash-bang line has <em>-n</em> to suppress auto-printing.</p>
<div class="sourceCode"><table class="sourceCode sed numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="sourceCode"><pre><code class="sourceCode sed"><span class="co">#!/bin/sed -nf</span>
<span class="co"># reverse all lines of input, i.e. first line became last, ...</span>
<span class="co"># from the second line, the buffer (which contains all previous lines)</span>
<span class="co"># is *appended* to current line, so, the order will be reversed</span>
<span class="dv">1</span><span class="ot">!</span> <span class="kw">G</span>
<span class="co"># on the last line we&#39;re done -- print everything</span>
<span class="ot">$</span> <span class="kw">p</span>
<span class="co"># store everything on the buffer again</span>
<span class="kw">h</span></code></pre></td></tr></table></div>
<ul>
<li>Line 1: This is the usual <em>crunch-bang</em> or <em>hash-bang</em> line that is found on executable <code>sed</code> scripts.</li>
<li>Line 7: This is an address and a single command. The address is a line number, 1, but is <em>negated</em> so that it refers to all other lines. The <strong>G</strong> command appends a newline to the contents of the pattern space, and then appends the contents of the hold space to that of the pattern space.</li>
<li>Line 10: This is another command controlled by an address, a <strong>$</strong>, as we saw in <a href="http://hackerpublicradio.org/eps/hpr1997" title="Introduction to sed - part 3">episode 3</a>. The command is <strong>p</strong> which prints the pattern space. So, when the last input line has been reached the entire accumulated pattern space is printed.</li>
<li>Line 13: The <strong>h</strong> command replaces the contents of the hold space with the contents of the pattern space. This is done for every input line since it has no address.</li>
</ul>
<p>So, the algorithm used here is:</p>
<ul>
<li>The first line read by <code>sed</code> does not trigger anything other than the <strong>h</strong> command on line 13 of the script. This means that the line is stored in the hold space.</li>
<li>The second and subsequent input lines trigger the <strong>G</strong> command on line 7 of the script. For input line 2, for example, this command appends a newline to the pattern space, then appends input line 1 (previously stored in the hold space) to it. Then the <strong>h</strong> command on line 13 is invoked and the pattern space (in the order <em>line 2</em>/<em>line 1</em>) is stored in the hold space again. In this way, each new line is placed in front of the lines accumulated so far, building them up in reverse order.</li>
<li>When the last line is read the <strong>G</strong> command on line 7 will be triggered as before, appending the hold space contents again, with the result that the pattern space now holds the entire file in reverse order. Now, however, the <strong>p</strong> command on line 10 will trigger and the result of reversing everything will be printed.</li>
</ul>
<p>It bothers me slightly that the <strong>h</strong> command on line 13 will be run again after printing everything, but its effects will not be seen. I would have wanted to make line 10 into:</p>
<pre><code>$ {
p
q
}</code></pre>
<p>This would stop <code>sed</code> after printing. However, this is probably just obsessive thinking on my part!</p>
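<p>For anyone who wants to try this, here is a minimal sketch of the script with the change suggested above, given inline to <code>sed -n</code> rather than as a file (the function name is invented). Under <code>-n</code> the <strong>q</strong> prints nothing further, so the output is the same as with the original:</p>
<pre><code># sed_tac is a hypothetical helper name.
# 1!G  append the accumulated lines (hold space) to every line but the first
# $    on the last line print the result and quit, skipping the final h
# h    save the pattern space for the next cycle
sed_tac() {
    sed -n '1!G
$ {
p
q
}
h' "$@"
}

printf 'one\ntwo\nthree\n' | sed_tac</code></pre>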
<h3 id="reverse-characters-of-lines">Reverse characters of lines</h3>
<p>This example is from the <a href="https://www.gnu.org/software/sed/manual/sed.html#Reverse-chars-of-lines" title="GNU sed manual - 4.5">GNU manual</a> where <code>sed</code> is used to emulate the <code>rev</code> command. The script, called <a href="hpr2060_reverse_characters.sed" title="hpr2060_reverse_characters.sed"><code>reverse_characters.sed</code></a>, has been made available on the HPR site, and is reproduced below with line numbers for easy reference. Note that the path to <code>sed</code> has been changed from the original as before. I have also changed line 6, replacing implicit newlines with '\n' sequences, which might mean the modified script will not run on non-GNU <code>sed</code> versions.</p>
<div class="sourceCode"><table class="sourceCode sed numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="sourceCode"><pre><code class="sourceCode sed"><span class="co">#!/bin/sed -f</span>
<span class="st">/</span><span class="ch">..</span><span class="st">/</span><span class="ot">!</span> <span class="kw">b</span>
<span class="co"># Reverse a line. Begin embedding the line between two newlines</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^.*$</span><span class="st">/\n</span><span class="ch">&amp;</span><span class="st">\n/</span>
<span class="co"># Move first character at the end. The regexp matches until</span>
<span class="co"># there are zero or one characters between the markers</span>
<span class="kw">t</span><span class="fu">x</span>
<span class="fu">:x</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">\(\n.\)\(.*\)\(.\n\)</span><span class="st">/</span><span class="ch">\3\2\1</span><span class="st">/</span>
<span class="kw">t</span><span class="fu">x</span>
<span class="co"># Remove the newline markers</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">\n</span><span class="st">//</span><span class="dt">g</span></code></pre></td></tr></table></div>
<ul>
<li>Line 1: This is the usual <em>hash-bang</em> line that is found on executable <code>sed</code> scripts. This one does not suppress auto-printing.</li>
<li>Line 3: Here the <strong>b</strong> command is invoked on any line that does not contain at least two characters (the address <code>/../</code> matches lines of two or more characters, and <strong>!</strong> negates it). The <strong>b</strong> command normally invokes an unconditional branch to a label, but if the label is omitted it ends the current cycle. The effect here is that any line of fewer than two characters is simply printed and the rest of the commands are skipped. There is no point in reversing such a line!</li>
<li>Line 6: This <strong>s</strong> command replaces the current line by itself (<em>&amp;</em>) with a newline at the beginning and the end.</li>
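<li>Line 10: This is documented in the <a href="https://www.gnu.org/software/sed/manual/sed.html#Reverse-chars-of-lines" title="GNU sed manual - 4.5">GNU manual</a> as: &quot;This is often needed to reset the flag that is tested by the t command.&quot; The substitution on line 6 always succeeds and sets that flag, so this <strong>t</strong> branches to the label on the very next line purely to clear it. I have tried removing it and the script still works, which is to be expected here: the <strong>t</strong> on line 13 is only reached after the substitution on line 12 has already succeeded. The reset matters in scripts where the substitution inside a loop might fail on its first attempt.</li>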
<li>Line 11: This is a label '<em>x</em>' for branch commands.</li>
<li>Line 12: This <strong>s</strong> command uses the newlines added on line 6 to decide which characters to swap. Group 1 is the first newline and the character after it, group 3 is the character before the second newline together with that newline, and group 2 is whatever lies between them (possibly nothing). Replacing the match with the groups in the order 3, 2, 1 swaps the two end characters and moves each of them across its newline marker, so the part still to be reversed shrinks by two characters each time. Anything outside the two newlines has already been swapped and is left alone.</li>
<li>Line 13: The <strong>t</strong> command is a conditional branch to label '<em>x</em>'. It will only branch if the <strong>s</strong> command on line 12 performs a substitution. In this way lines 11-13 form a loop to repeat the action on line 12 until the regular expression stops matching.</li>
<li>Line 16: With the line reversed, this <strong>s</strong> command removes the newline markers, and the result is auto-printed before the next cycle begins.</li>
</ul>
<p>The processing of a line can be visualised by using the <strong>l</strong> command. I have provided another version of this script containing such commands to show what is happening:</p>
<div class="sourceCode"><table class="sourceCode sed numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="sourceCode"><pre><code class="sourceCode sed"><span class="co">#!/bin/sed -f</span>
<span class="co"># reverse_characters_debug.sed</span>
<span class="co">#</span>
<span class="co"># A version which prints what it&#39;s doing to help understand the process</span>
<span class="st">/</span><span class="ch">..</span><span class="st">/</span><span class="ot">!</span> <span class="kw">b</span>
<span class="co"># Reverse a line. Begin embedding the line between two newlines</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">^.*$</span><span class="st">/\n</span><span class="ch">&amp;</span><span class="st">\n/</span>
<span class="co"># List the line to see what the command above did to it</span>
<span class="kw">l</span>
<span class="co"># Move first character at the end. The regexp matches until</span>
<span class="co"># there are zero or one characters between the markers</span>
<span class="kw">t</span><span class="fu">x</span>
<span class="fu">:x</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">\(\n.\)\(.*\)\(.\n\)</span><span class="st">/</span><span class="ch">\3\2\1</span><span class="st">/</span>
<span class="co"># List the result of each loop iteration</span>
<span class="kw">l</span>
<span class="kw">t</span><span class="fu">x</span>
<span class="co"># Remove the newline markers</span>
<span class="kw">s</span><span class="st">/</span><span class="ch">\n</span><span class="st">//</span><span class="dt">g</span></code></pre></td></tr></table></div>
<p>It is available on the HPR site as <a href="hpr2060_reverse_characters_debug.sed" title="hpr2060_reverse_characters_debug.sed"><code>reverse_characters_debug.sed</code></a> and you can examine it yourself. Running it on a simple string gives output as follows:</p>
<pre><code>$ echo abcdefghijklmnopqrstuvwxyz | ./reverse_characters_debug.sed
\nabcdefghijklmnopqrstuvwxyz\n$
z\nbcdefghijklmnopqrstuvwxy\na$
zy\ncdefghijklmnopqrstuvwx\nba$
zyx\ndefghijklmnopqrstuvw\ncba$
zyxw\nefghijklmnopqrstuv\ndcba$
zyxwv\nfghijklmnopqrstu\nedcba$
zyxwvu\nghijklmnopqrst\nfedcba$
zyxwvut\nhijklmnopqrs\ngfedcba$
zyxwvuts\nijklmnopqr\nhgfedcba$
zyxwvutsr\njklmnopq\nihgfedcba$
zyxwvutsrq\nklmnop\njihgfedcba$
zyxwvutsrqp\nlmno\nkjihgfedcba$
zyxwvutsrqpo\nmn\nlkjihgfedcba$
zyxwvutsrqpon\n\nmlkjihgfedcba$
zyxwvutsrqpon\n\nmlkjihgfedcba$
zyxwvutsrqponmlkjihgfedcba</code></pre>
<p>The first line of the output shows the original line being embedded between two newlines.</p>
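<p>The second line shows the 'a' and 'z' being swapped as discussed in the explanation, and the following lines show further swaps based on the positions of the two newlines. The last of these listings appears twice because the <strong>l</strong> command also runs after the final, unsuccessful, attempt at the substitution, just before the loop ends.</p>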
<p>The auto-printed last line shows the final result after all swaps have been carried out.</p>
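<p>As noted above, the '<code>\n</code>' sequences in the replacement part of line 6 are a GNU extension. Standard <code>sed</code> instead puts a newline into a replacement by ending a line with a backslash, which is presumably how the implicit newlines of the original were written. The following minimal sketch (the function name is invented) is the same script given inline with line 6 written that way. The '<code>\n</code>' sequences in the regular expressions on lines 12 and 16 should be acceptable elsewhere, since POSIX defines <code>\n</code> in a regular expression as matching an embedded newline, but portability to non-GNU versions is an untested assumption here:</p>
<pre><code># reverse_chars is a hypothetical helper name.
# Same commands as reverse_characters.sed, except that the newline
# markers added by the first s command are written as backslash-newline
# pairs instead of '\n'.
reverse_chars() {
    sed '/../! b
s/^.*$/\
&amp;\
/
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
s/\n//g' "$@"
}

echo abcdef | reverse_chars</code></pre>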
<h2 id="my-answer-to-the-quiz-in-the-last-episode">My answer to the quiz in the last episode</h2>
<p>As promised here is my answer to the quiz I set in <a href="http://hackerpublicradio.org/eps/hpr2011" title="Introduction to sed - part 4">episode 4</a>. The request was to use <code>sed_demo1.txt</code>, taking the first line and converting it to <em>Pig Latin</em>. The brief rules were:</p>
<ul>
<li>Take the first letter of each word and place it at the end, followed by 'ay'. Thus 'pig' becomes 'igpay' and 'latin' becomes 'atinlay'.</li>
<li>Skip 1- and 2-letter words, since 'a' -&gt; 'aay' is not wanted.</li>
<li>Do not bother about capitals.</li>
</ul>
<p>Here's what I did:</p>
<pre><code>$ sed -ne &#39;1s/\(\b\w\)\(\w\{2,\}\)/\2\1ay/gp&#39; sed_demo1.txt
ackerHay ublicPay adioRay (PRHay) is an nternetIay adioRay howsay (odcastpay) hattay eleasesray</code></pre>
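<p>The same substitution can be checked against the examples given in the rules. In this minimal sketch (the function name is invented; <code>\b</code> and <code>\w</code> are GNU extensions) the first letter of each word of three or more letters moves to the end followed by 'ay', while shorter words are left alone:</p>
<pre><code># pig_latin is a hypothetical helper name (GNU sed assumed).
# \(\b\w\)      the first letter of a word
# \(\w\{2,\}\)  at least two more word characters
pig_latin() {
    sed 's/\(\b\w\)\(\w\{2,\}\)/\2\1ay/g' "$@"
}

echo 'a pig latin is' | pig_latin</code></pre>
<p>This gives <code>a igpay atinlay is</code>.</p>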
<p>Sadly, there were no winners of this little competition because there were no entries. It's probably just as well that I am finishing this series here because I think I probably sent everyone to sleep several episodes back!!</p>
<h2 id="links">Links</h2>
<ul>
<li><em>Introduction to sed - part 1</em>: <a href="http://hackerpublicradio.org/eps/hpr1976" class="uri">http://hackerpublicradio.org/eps/hpr1976</a></li>
<li><em>Introduction to sed - part 2</em>: <a href="http://hackerpublicradio.org/eps/hpr1986" class="uri">http://hackerpublicradio.org/eps/hpr1986</a></li>
<li><em>Introduction to sed - part 3</em>: <a href="http://hackerpublicradio.org/eps/hpr1997" class="uri">http://hackerpublicradio.org/eps/hpr1997</a></li>
<li><em>Introduction to sed - part 4</em>: <a href="http://hackerpublicradio.org/eps/hpr2011" class="uri">http://hackerpublicradio.org/eps/hpr2011</a></li>
<li>GNU <code>sed</code> manual:
<ul>
<li>Index: <a href="https://www.gnu.org/software/sed/manual/sed.html" class="uri">https://www.gnu.org/software/sed/manual/sed.html</a></li>
<li>Commands for <code>sed</code> gurus: <a href="https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands" class="uri">https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands</a></li>
<li>Commands Specific to GNU <code>sed</code>: <a href="https://www.gnu.org/software/sed/manual/sed.html#Extended-Commands" class="uri">https://www.gnu.org/software/sed/manual/sed.html#Extended-Commands</a></li>
</ul></li>
<li>Wikipedia entry for <code>sed</code>: <a href="https://en.wikipedia.org/wiki/Sed" class="uri">https://en.wikipedia.org/wiki/Sed</a></li>
<li>&quot;<em>Sed - An Introduction and Tutorial</em>&quot; by Bruce Barnett: <a href="http://www.grymoire.com/Unix/Sed.html" class="uri">http://www.grymoire.com/Unix/Sed.html</a></li>
<li>Wikibooks sed wiki: <a href="https://en.wikibooks.org/wiki/Sed" class="uri">https://en.wikibooks.org/wiki/Sed</a></li>
<li>Example files:
<ul>
<li>Using the <strong>c</strong> command: <a href="hpr2060_demo5.sed" class="uri">hpr2060_demo5.sed</a></li>
<li>Centring lines: <a href="hpr2060_centre.sed" class="uri">hpr2060_centre.sed</a></li>
<li>Reverse lines of files: <a href="hpr2060_tac.sed" class="uri">hpr2060_tac.sed</a></li>
<li>Reverse characters of lines (original and debug): <a href="hpr2060_reverse_characters.sed" class="uri">hpr2060_reverse_characters.sed</a> <a href="hpr2060_reverse_characters_debug.sed" class="uri">hpr2060_reverse_characters_debug.sed</a></li>
</ul></li>
</ul>
<!--
vim: syntax=markdown:ts=8:sw=4:ai:et:tw=78:fo=tcqn:fdm=marker
-->
</article>
</main>
</div>
</body>
</html>