<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>Bash snippet - extglob and scp (HPR Show 2317)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Bash snippet - extglob and scp (HPR Show 2317)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#the-problem">The Problem</a></li>
<li><a href="#test-environment">Test Environment</a></li>
<li><a href="#what-works">What Works</a></li>
<li><a href="#what-fails">What Fails</a></li>
<li><a href="#alternatives">Alternatives</a><ul>
<li><a href="#first-try---attempting-to-use-extended-globs">First try - attempting to use extended globs</a></li>
<li><a href="#second-try---just-use-simpler-globs">Second try - just use simpler globs</a></li>
<li><a href="#third-try---use-rsync-with-a-filter">Third try - use <code>rsync</code> with a filter</a><ul>
<li><a href="#making-a-filter-file">Making a filter file</a></li>
<li><a href="#running-rsync-with-the-filter">Running rsync with the filter</a></li>
<li><a href="#caution">Caution</a></li>
</ul></li>
</ul></li>
<li><a href="#another-digression">Another digression</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="the-problem">The Problem</h2>
<p>Following on from my last show on <a href="http://hackerpublicradio.org/eps/hpr2293" title="More supplementary Bash tips">filename expansion</a>, concentrating on extended patterns and the <code>extglob</code> option, I was asked a question by <a href="http://hackerpublicradio.org/correspondents.php?hostid=238" title="HPR Host Jon Kulp">Jon Kulp</a> in the comment section.</p>
<p>Jon was using <code>ls *(*.mp3|*.ogg)</code> to find all OGG and MP3 files in a directory which also held other files. However, when he wanted to copy this subset of files elsewhere he had problems using this expression in an <code>scp</code> command.</p>
<p>Having done some investigations to help solve this I thought I'd put what I found into an HPR episode and share it, and this is the show.</p>
<h2 id="test-environment">Test Environment</h2>
<p>On one of my Raspberry Pis (<code>rpi4</code>) I made some empty test files for the purposes of this show:</p>
<pre><code>$ mkdir scptest
$ touch scptest/{a..c}{00..10}.{mkd,mp3,ogg}
$ ls -x -w 80 scptest/
a00.mkd a00.mp3 a00.ogg a01.mkd a01.mp3 a01.ogg a02.mkd a02.mp3 a02.ogg
a03.mkd a03.mp3 a03.ogg a04.mkd a04.mp3 a04.ogg a05.mkd a05.mp3 a05.ogg
.
.
.
c05.mkd c05.mp3 c05.ogg c06.mkd c06.mp3 c06.ogg c07.mkd c07.mp3 c07.ogg
c08.mkd c08.mp3 c08.ogg c09.mkd c09.mp3 c09.ogg c10.mkd c10.mp3 c10.ogg</code></pre>
<p>So, we have made files with the extensions <code>mkd</code>, <code>ogg</code> and <code>mp3</code> and these are shown with an <code>ls</code> command.</p>
<p>If we move into the directory and use the glob pattern Jon did we see just the <code>mp3</code> and <code>ogg</code> files:</p>
<pre><code>$ cd scptest/
$ ls -x -w 80 *(*.mp3|*.ogg)
a00.mp3 a00.ogg a01.mp3 a01.ogg a02.mp3 a02.ogg a03.mp3 a03.ogg a04.mp3
a04.ogg a05.mp3 a05.ogg a06.mp3 a06.ogg a07.mp3 a07.ogg a08.mp3 a08.ogg
.
.
.
c05.mp3 c05.ogg c06.mp3 c06.ogg c07.mp3 c07.ogg c08.mp3 c08.ogg c09.mp3
c09.ogg c10.mp3 c10.ogg</code></pre>
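<p>The <code>*(...)</code> form is only recognised when the <code>extglob</code> option is on. The following is a minimal sketch of the same selection, assuming Bash and using a hypothetical scratch directory made with <code>mktemp</code> rather than the <code>scptest</code> directory above:</p>
<pre><code>#!/bin/bash
# Enable extended patterns before any line that uses one is parsed
shopt -s extglob

# Hypothetical scratch directory, not the scptest directory used above
demo=$(mktemp -d)
touch "$demo"/a00.mkd "$demo"/a00.mp3 "$demo"/a00.ogg
cd "$demo" || exit 1

# *(LIST) matches zero or more occurrences of the patterns in LIST
matches=( *(*.mp3|*.ogg) )
printf '%s\n' "${matches[@]}"</code></pre>
<p>Only the <code>mp3</code> and <code>ogg</code> names are stored in the array; the <code>mkd</code> file is left out.</p>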
<h2 id="what-works">What Works</h2>
<p>I ran the following command on <code>rpi4</code> to copy selected files from the <code>scptest</code> directory to another Raspberry Pi called <code>rpi5</code> where I have created a directory called <code>test</code> for the purpose. I have copied my ssh key to that machine already so no password is prompted for.</p>
<pre><code>$ scp *(*.mp3|*.ogg) dave@rpi5:test/
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.
c10.mp3 100% 0 0.0KB/s 00:00
c10.ogg 100% 0 0.0KB/s 00:00</code></pre>
<p>All of the requested (empty) files were copied.</p>
<h2 id="what-fails">What Fails</h2>
<p>If I try the equivalent from the other host, pulling the files from <code>rpi4</code> to <code>rpi5</code>, I don't get what I might expect:</p>
<pre><code>$ scp dave@rpi4:scptest/*(*.mp3|*.ogg) .
bash: -c: line 0: syntax error near unexpected token `(&#39;
bash: -c: line 0: `scp -f scptest/*(*.mp3|*.ogg)&#39;</code></pre>
<p>Running the command again with the <code>-v</code> option we can see that the line <code>scp -f scptest/*(*.mp3|*.ogg)</code> is being executed on <code>rpi4</code> and this is causing the error. The conclusion is that <code>scp</code> itself is doing something that's not compatible with this expression.</p>
<p>My later investigations revealed that <code>extglob</code> is apparently off when this command is being executed, but more of this anon.</p>
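<p>The state of <code>extglob</code> in a Bash started with <code>-c</code> (the form visible in the error message above) can be checked directly. This is a sketch to run on either machine, assuming a stock Bash where the option is off unless requested:</p>
<pre><code># Report the option in a fresh non-interactive Bash.
# shopt returns non-zero when the option is off, hence the "|| true".
bash -c 'shopt extglob' || true

# With -O extglob the option is set before the command string is parsed
bash -O extglob -c 'shopt extglob'</code></pre>
<p>The first command would normally report the option as off and the second as on, which matches the syntax error seen when the pattern reaches the remote end.</p>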
<h2 id="alternatives">Alternatives</h2>
<h3 id="first-try---attempting-to-use-extended-globs">First try - attempting to use extended globs</h3>
<p>I found an article about this issue on <a href="https://unix.stackexchange.com/questions/103058/exclude-characters-for-scp-filepattern" title="StackExchange question about scp">StackExchange</a> with a very comprehensive (if impenetrable) answer.</p>
<p>The answer points out that <code>scp</code> simply hands the filename (or expression) to the remote machine where it's interpreted by the shell at that end. This could be any shell.</p>
<p>The answer suggests that the remote filename could be a command for the remote system, but that doesn't seem to be the case in my very simple test:</p>
<pre><code>$ scp dave@rpi4:&#39;ls&#39; .
scp: ls: No such file or directory</code></pre>
<p>This is probably too naive to work as it is however.</p>
<p>It is suggested that the following command will work though. Note that the command contains a newline inside the string passed to <code>scp</code> before the word <code>bash</code>. This is necessary for the command to work:</p>
<pre><code>$ LC_SCPFILES=&#39;scptest/*(*.mp3|*.ogg)&#39; scp -o SendEnv=LC_SCPFILES &quot;dave@rpi4:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot; .
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.</code></pre>
<!-- \* -->
<p>This does work, though understanding why is a challenge.</p>
<p>A more manageable solution is the following function based on the same idea:</p>
<pre><code>safer_scp() (
file=$1; shift
export LC_SCPFILES=&quot;${file#*:}&quot;
exec scp -o SendEnv=LC_SCPFILES &quot;${file%%:*}:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot; &quot;$@&quot;
)</code></pre>
<blockquote>
<hr>
</blockquote>
<p><u>You might want to skip this part since it gets into deep deep Bash and <code>scp</code> magic!</u></p>
<p>This all hinges on the fact that in this case <code>scp</code> works by doing the following:</p>
<ol type="1">
<li><p>It connects to the remote machine using the remote username and host name. It does this using <code>ssh</code>, creating a “tunnel” between the two and running a shell at the remote end.</p></li>
<li><p>Over the tunnel it issues a command to be run on the remote machine which consists of <code>scp -f FILENAME</code>. The <code>-f</code> option runs <code>scp</code> in “remote” mode. This option is undocumented but can be seen in the source code.</p></li>
<li><p>The remote end copies the file (or files) back to the local end. It interprets the filename or glob expression using the shell opened on the remote machine.</p></li>
</ol>
<p>The <code>safer_scp</code> function takes advantage of these features. Note that the body of a function can be any compound command. A series of commands enclosed in parentheses is such a compound command, BUT it executes in a sub-shell where the more usual compound command in braces does not. I am not 100% clear why it is written this way but experimentation has shown that without a body in parentheses running the function will disconnect from the remote machine!</p>
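<p>A likely explanation, offered here as an assumption rather than something the StackExchange answer states, is that <code>exec</code> replaces whichever shell runs it. With a body in parentheses only the sub-shell is replaced. A small sketch with a hypothetical function name:</p>
<pre><code># Body in parentheses: exec replaces only the sub-shell running the function
in_subshell() (
    exec echo "exec ran inside a sub-shell"
)

in_subshell
echo "the calling shell is still running"

# With a body in braces the exec would replace the calling shell itself,
# so an interactive session would end when the command finished.</code></pre>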
<p>In the function the variable <code>file</code> is set to the first argument. This is then removed from the function argument list with <code>shift</code>.</p>
<p>The variable <code>LC_SCPFILES</code> is defined, being set to the piece of the contents of the <code>file</code> variable following the colon.</p>
<p>The <code>exec</code> command runs the rest of the function as a command which replaces the currently executing shell. The command invoked is an <code>scp</code> command which passes the environment variable <code>LC_SCPFILES</code> to the remote end (using the <code>-o</code> option with <code>SendEnv=LC_SCPFILES</code>).</p>
<p>The arguments to <code>scp</code> are two strings. The first is:</p>
<pre><code>&quot;${file%%:*}:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot;</code></pre>
<p>The second argument consists of the remaining arguments to <code>safer_scp</code> (<code>&quot;$@&quot;</code>).</p>
<p>The first argument expands variable <code>file</code>, returning the first part (by removing the colon and everything after it). It then adds a colon and takes input from <code>/dev/null</code>. This is then followed by a newline.</p>
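<p>The two parameter expansions used by the function can be tried on their own. A sketch using the remote specification from the earlier examples:</p>
<pre><code>file='dave@rpi4:scptest/*(*.mp3|*.ogg)'

# ${file%%:*} removes the longest suffix matching ":*", leaving user and host
echo "${file%%:*}"

# ${file#*:} removes the shortest prefix matching "*:", leaving the path or pattern
echo "${file#*:}"</code></pre>
<p>The quotes around each expansion stop the local shell from trying to expand the pattern itself.</p>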
<p>The rest of the string invokes Bash, setting the <code>extglob</code> option with the <code>-O</code> option and reading the following string as a command as specified by the <code>-c</code> option. The command is a further <code>exec</code> which runs <code>scp</code>.</p>
<p>This instance of <code>scp</code> uses the undocumented option <code>-f</code> (as mentioned earlier). This tells <code>scp</code> that it is running as the remote instance.</p>
<p>The <code>--</code> (double hyphen) is a convention to tell a program that the options have ended. This protects the following filename (in variable <code>LC_SCPFILES</code>) from possibly being interpreted as options.</p>
<p>So, going back to the entire string being handed to the first <code>scp</code>, this does the following:</p>
<ul>
<li>It receives the username and host string (as in <code>dave@rpi4</code>) with a colon at the end. The rest of the first line of the remote file specification is <code>&lt;/dev/null</code>, a redirection of input from <code>/dev/null</code>, and when this is processed the usual remote <code>scp</code> exits.</li>
<li>The part after the newline is then executed. It runs Bash with <code>extglob</code> on and invokes another <code>scp</code> which simulates the one which is normally run - but now guaranteed to be in a Bash shell and with <code>extglob</code> on. This then sends the file or files back to the local end after expanding the extended glob pattern in variable <code>LC_SCPFILES</code>.</li>
<li>The <code>exit</code> after the Bash process ensures the process invoked at the remote end shuts down.</li>
</ul>
<p>This complex set of events compensates for deficiencies of <code>scp</code> and allows extended glob patterns to be passed through. However, it's still error-prone, as will be seen later.</p>
<p>The function does actually work, but it's <strong>so</strong> obscure and reliant on what seem like edge conditions or hidden features I don't think it should be used.</p>
<blockquote>
<hr>
</blockquote>
<h3 id="second-try---just-use-simpler-globs">Second try - just use simpler globs</h3>
<p>If the requirement is to use an extended glob expression in the solution then this one will not suit. However, if the goal is to copy files, then it will!</p>
<pre><code>$ scp dave@rpi4:scptest/*.{mp3,ogg} .
a00.mp3 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a02.mp3 100% 0 0.0KB/s 00:00
a03.mp3 100% 0 0.0KB/s 00:00
.
.
.</code></pre>
<p>This does the job. Bash performs the brace expansion locally, so <code>scp</code> receives two source arguments, each containing a simple glob pattern, and neither relies on <code>extglob</code> being on at the remote end. It may not work if the glob uses Bash-specific patterns and the remote account uses a shell other than Bash though.</p>
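<p>What the local Bash makes of the braces can be seen without running <code>scp</code> at all. A sketch, assuming default shell options and that no local file names happen to match, so the <code>*</code> stays as typed:</p>
<pre><code># Collect the words the local shell would hand to scp as source arguments
args=( dave@rpi4:scptest/*.{mp3,ogg} )

# One word per line: the braces are gone, the simple globs remain
printf '%s\n' "${args[@]}"</code></pre>
<p>Two words result, one ending <code>*.mp3</code> and one ending <code>*.ogg</code>, which would also account for all the <code>mp3</code> files being listed first in the transfer above.</p>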
<h3 id="third-try---use-rsync-with-a-filter">Third try - use <code>rsync</code> with a filter</h3>
<p>I have never encountered this issue with <code>scp</code> myself when moving files around between servers. I do a lot of file moving both for myself and as an HPR “<em>janitor</em>”. The reason I haven't seen it is because I usually use <code>rsync</code>.</p>
<p>There is a way of using <code>rsync</code> to achieve what was wanted here, though it does not use extended glob patterns.</p>
<p>The <code>rsync</code> command can be told to copy files from a directory, including those that match a pattern and to exclude the rest. This is done with filters.</p>
<p>The <code>rsync</code> command is very powerful and hard to master. In fact there is scope for a whole HPR series on its intricacies. However, we'll just restrict ourselves to the use of filters here to solve this problem.</p>
<p>Here's what I do:</p>
<ol type="1">
<li>Make a filter stored in a file</li>
<li>Run <code>rsync</code> with the filter</li>
</ol>
<h4 id="making-a-filter-file">Making a filter file</h4>
<p>I created a file called <code>.rsync_test</code>:</p>
<pre><code>$ cat .rsync_test
+ *.mp3
+ *.ogg
- *</code></pre>
<p>Lines beginning with + are rules for inclusion. Those beginning with - are exclusions. The order is significant.</p>
<p>These rules tell <code>rsync</code> to include all files ending <code>.mp3</code> and <code>.ogg</code>. Anything else is to be excluded.</p>
<h4 id="running-rsync-with-the-filter">Running rsync with the filter</h4>
<p>The command would be:</p>
<pre><code>$ rsync -vaP -e ssh --filter=&quot;. .rsync_test&quot; dave@rpi4:scptest/ test/
receiving incremental file list
./
a00.mp3
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=65/67)
a00.ogg
0 100% 0.00kB/s 0:00:00 (xfr#2, to-chk=64/67)
a01.mp3
0 100% 0.00kB/s 0:00:00 (xfr#3, to-chk=63/67)
a01.ogg
0 100% 0.00kB/s 0:00:00 (xfr#4, to-chk=62/67)
a02.mp3
0 100% 0.00kB/s 0:00:00 (xfr#5, to-chk=61/67)
a02.ogg
0 100% 0.00kB/s 0:00:00 (xfr#6, to-chk=60/67)
.
.
.
c10.mp3
0 100% 0.00kB/s 0:00:00 (xfr#65, to-chk=1/67)
c10.ogg
0 100% 0.00kB/s 0:00:00 (xfr#66, to-chk=0/67)
sent 1,310 bytes received 3,809 bytes 10,238.00 bytes/sec
total size is 0 speedup is 0.00</code></pre>
<p>The options are:</p>
<pre><code>-vaP                       select verbose mode (v), archive mode (a, shorthand for many
                           other options) and show progress (P)
-e ssh                     use ssh to transfer files
--filter=&quot;. .rsync_test&quot;   use a filter</code></pre>
<p>The filter expression is <code>. .rsync_test</code> where the leading . is short for merge and tells <code>rsync</code> to read filter rules from the file.</p>
<p>The arguments are:</p>
<pre><code>dave@rpi4:scptest/   the remote host and directory to copy from
test/                the local directory to copy to</code></pre>
<p>It is a good idea to use the <code>-n</code> option when setting up such a command, to check that everything works as it should, before running it for real. This option turns on dry-run mode where the process is run without actually copying anything.</p>
<p>You don't have to use the filter file. The following command does the same:</p>
<pre><code>$ rsync -vaP -e ssh -f &quot;+ *.mp3&quot; -f &quot;+ *.ogg&quot; -f &quot;- *&quot; dave@rpi4:scptest/ test/</code></pre>
<p>Here <code>-f</code> is the short form of <code>--filter</code>.</p>
<p>I prefer the filter file myself.</p>
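<p>The filter rules can be tried out without a second machine. The following is a minimal sketch, assuming <code>rsync</code> is installed locally; the scratch directories and the temporary filter file are hypothetical stand-ins for <code>rpi4:scptest/</code>, <code>test/</code> and <code>.rsync_test</code>:</p>
<pre><code># Hypothetical scratch directories standing in for the source and destination
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src"/a00.mkd "$src"/a00.mp3 "$src"/a00.ogg

# The same three rules as .rsync_test, written to a temporary filter file
rules=$(mktemp)
printf '%s\n' '+ *.mp3' '+ *.ogg' '- *' > "$rules"

# Dry run first: -n lists what would be copied without copying anything
rsync -n -va --filter=". $rules" "$src"/ "$dst"/

# Run it for real, then look at what arrived
rsync -a --filter=". $rules" "$src"/ "$dst"/
ls "$dst"</code></pre>
<p>Only the <code>mp3</code> and <code>ogg</code> files should appear in the destination; the final <code>- *</code> rule excludes the <code>mkd</code> file.</p>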
<h4 id="caution">Caution</h4>
<p>The <code>rsync</code> tool is a beast and needs careful treatment! Things to be aware of if you want to go further than this simple guide:</p>
<ul>
<li><code>rsync</code> will traverse a directory hierarchy (it's recursive)</li>
<li>the presence of a trailing slash on the <em>source</em> directory makes it transfer the <u>contents</u> of the directory. Without it the directory itself and its contents will be copied</li>
<li><code>rsync</code> compares source and destination files. If a file already exists at the destination it will not copy it. However, if the source copy is different from the destination copy <code>rsync</code> will transfer differences</li>
</ul>
<h2 id="another-digression">Another digression</h2>
<p>Since I am already well off the rails with this episode I thought I'd go looking at another area commented on by <a href="http://hackerpublicradio.org/correspondents.php?hostid=311" title="HPR Host clacke">clacke</a> in the context of <a href="http://hackerpublicradio.org/eps/hpr2293" title="More supplementary Bash tips">show 2293</a>.</p>
<p>You are probably aware that file names containing spaces (and other unusual characters) can be difficult to use with commands and programs in Unix and Linux. The question was how <code>scp</code> would behave. I thought I'd do some experimentation with filenames containing spaces.</p>
<blockquote>
<hr>
</blockquote>
<p><u>You might want to skip this part since it gets into more of the guts of <code>scp</code></u></p>
<p>I created a file on <code>rpi4</code> called “<code>what a horrible filename.txt</code>” and tried to pull it across to <code>rpi5</code>. In each case I used the <code>-v</code> option to <code>scp</code> in order to see all the details of what was going on. Be warned that this generates a lot of output.</p>
<ol type="1">
<li><p><code>scp -v dave@rpi4:'scptest/what a horrible filename.txt' test/</code><br />
This normally is one way filenames with spaces can be dealt with but it fails here because the quotes are removed in the transfer.</p></li>
<li><p><code>scp -v dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/</code><br />
Another way of protecting spaces is to escape each of them with a backslash. This time I have used these inside the string. This works. The quotes are removed but the backslashes remain to protect the spaces.</p></li>
<li><p><code>scp -v dave@rpi4:&quot;scptest/what\ a\ horrible\ filename.txt&quot; test/</code><br />
Double quotes are equivalent to single ones in this context, so this works in the same way as example 2.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\ a\ horrible\ filename.txt test/</code><br />
This is normally another way that spaces can be protected, but this one fails because the backslashes are removed in the first pass. It is logically equivalent to example 1.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\\ a\\ horrible\\ filename.txt test/</code><br />
Since the <code>scp</code> process removes quotes and backslashes first time round, we'll try doubling them. This does not work because the remote end gets the filename with literal backslashes and rejects it.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\\\ a\\\ horrible\\\ filename.txt test/</code><br />
Since the last test failed we'll try trebling the backslashes. This works - rather counter-intuitively I find.</p></li>
<li><p><code>scp -v dave@rpi4:'&quot;scptest/what a horrible filename.txt&quot;' test/</code><br />
Enclosing one sort of quotes in another should work, and indeed it does. Nested quotes are another solution. However, they must be different types of quotes - single inside double or vice versa.</p></li>
</ol>
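<p>The two rounds of shell processing behind these results can be imitated locally. This is a simulation only, with a hypothetical helper called <code>second_pass</code>: no <code>scp</code> is involved, and a second Bash stands in for the shell that runs the command at the remote end:</p>
<pre><code># Print, in brackets, each word a second shell sees after parsing the string
second_pass() {
    bash -c "printf '[%s]\n' $1"
}

# Example 1: the local shell removes the quotes, so the second shell
# splits the name on its spaces
second_pass 'scptest/what a horrible filename.txt'

# Example 2: the backslashes survive the first pass and protect the spaces
second_pass 'scptest/what\ a\ horrible\ filename.txt'

# Example 6: three backslashes leave one backslash per space after the first pass
second_pass scptest/what\\\ a\\\ horrible\\\ filename.txt

# Example 7: the inner double quotes survive the first pass
second_pass '"scptest/what a horrible filename.txt"'</code></pre>
<p>The first call prints four bracketed words; the other three each print the whole filename in a single pair of brackets.</p>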
<p>You might wonder how the <code>safer_scp</code> function we saw earlier deals with such filenames. I could not get it to transfer the file using any of these formats.</p>
<p>However, by modifying it slightly (removing the backslash in front of <code>$LC_SCPFILES</code>) it worked:</p>
<pre><code>$ safer_scp() (
&gt; file=$1; shift
&gt; export LC_SCPFILES=&quot;${file#*:}&quot;
&gt; exec scp -o SendEnv=LC_SCPFILES &quot;${file%%:*}:&lt;/dev/null
&gt; bash -O extglob -c &#39;exec scp -f -- $LC_SCPFILES&#39;;exit&quot; &quot;$@&quot;
&gt; )
$ safer_scp dave@rpi4:&#39;scptest/what\ a\ horrible\ filename.txt&#39; test/
what a horrible filename.txt 100% 0 0.0KB/s 00:00</code></pre>
<p>I wasn't clear what the backslash was for anyway!</p>
<p>This modified function passed all of the tests of plain filenames and glob patterns which I tried. I am still not sure that I'd use it myself though.</p>
<blockquote>
<hr>
</blockquote>
<h2 id="conclusion">Conclusion</h2>
<p>The <code>scp</code> command is built on the original BSD Unix command <code>rcp</code>. I don't know if this is why it has the quirks we have looked at here, but it does seem to suffer some deficiencies. However, I find it useful and usable most of the time.</p>
<p>Using <code>rsync</code> solves a number of the problems <code>scp</code> shows, though it has its own shortcomings. I think a good working knowledge of <code>scp</code> and <code>rsync</code> is important in a Sysadmin's toolkit and can be of great use to all Unix/Linux users.</p>
<h2 id="links">Links</h2>
<ul>
<li>HPR Show 2293: <a href="http://hackerpublicradio.org/eps/hpr2293">More supplementary Bash tips</a></li>
<li>StackExchange question: <a href="https://unix.stackexchange.com/questions/103058/exclude-characters-for-scp-filepattern">Exclude characters for SCP-filepattern</a></li>
</ul>
<!--
vim: syntax=markdown:ts=8:sw=4:ai:et:tw=78:fo=tcqn:fdm=marker
-->
</article>
</main>
</div>
</body>
</html>