<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Dave Morriss">
<title>Bash snippet - extglob and scp (HPR Show 2317)</title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>
<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Bash snippet - extglob and scp (HPR Show 2317)</h1>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>
<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#the-problem">The Problem</a></li>
<li><a href="#test-environment">Test Environment</a></li>
<li><a href="#what-works">What Works</a></li>
<li><a href="#what-fails">What Fails</a></li>
<li><a href="#alternatives">Alternatives</a><ul>
<li><a href="#first-try---attempting-to-use-extended-globs">First try - attempting to use extended globs</a></li>
<li><a href="#second-try---just-use-simpler-globs">Second try - just use simpler globs</a></li>
<li><a href="#third-try---use-rsync-with-a-filter">Third try - use <code>rsync</code> with a filter</a><ul>
<li><a href="#making-a-filter-file">Making a filter file</a></li>
<li><a href="#running-rsync-with-the-filter">Running rsync with the filter</a></li>
<li><a href="#caution">Caution</a></li>
</ul></li>
</ul></li>
<li><a href="#another-digression">Another digression</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="the-problem">The Problem</h2>
<p>Following on from my last show on <a href="http://hackerpublicradio.org/eps/hpr2293" title="More supplementary Bash tips">filename expansion</a>, concentrating on extended patterns and the <code>extglob</code> option, I was asked a question by <a href="http://hackerpublicradio.org/correspondents.php?hostid=238" title="HPR Host Jon Kulp">Jon Kulp</a> in the comment section.</p>
<p>Jon was using <code>ls *(*.mp3|*.ogg)</code> to find all OGG and MP3 files in a directory which also held other files. However, when he wanted to copy this subset of files elsewhere he had problems using this expression in an <code>scp</code> command.</p>
<p>Having done some investigations to help solve this I thought I'd put what I found into an HPR episode and share it, and this is the show.</p>
<h2 id="test-environment">Test Environment</h2>
<p>On one of my Raspberry Pis (<code>rpi4</code>) I made some empty test files for the purposes of this show:</p>
<pre><code>$ mkdir scptest
$ touch scptest/{a..c}{00..10}.{mkd,mp3,ogg}
$ ls -x -w 80 scptest/
a00.mkd a00.mp3 a00.ogg a01.mkd a01.mp3 a01.ogg a02.mkd a02.mp3 a02.ogg
a03.mkd a03.mp3 a03.ogg a04.mkd a04.mp3 a04.ogg a05.mkd a05.mp3 a05.ogg
.
.
.
c05.mkd c05.mp3 c05.ogg c06.mkd c06.mp3 c06.ogg c07.mkd c07.mp3 c07.ogg
c08.mkd c08.mp3 c08.ogg c09.mkd c09.mp3 c09.ogg c10.mkd c10.mp3 c10.ogg</code></pre>
<p>So, we have made files with the extensions <code>mkd</code>, <code>ogg</code> and <code>mp3</code> and these are shown with an <code>ls</code> command.</p>
<p>If we move into the directory and use the glob pattern Jon did we see just the <code>mp3</code> and <code>ogg</code> files:</p>
<pre><code>$ cd scptest/
$ ls -x -w 80 *(*.mp3|*.ogg)
a00.mp3 a00.ogg a01.mp3 a01.ogg a02.mp3 a02.ogg a03.mp3 a03.ogg a04.mp3
a04.ogg a05.mp3 a05.ogg a06.mp3 a06.ogg a07.mp3 a07.ogg a08.mp3 a08.ogg
.
.
.
c05.mp3 c05.ogg c06.mp3 c06.ogg c07.mp3 c07.ogg c08.mp3 c08.ogg c09.mp3
c09.ogg c10.mp3 c10.ogg</code></pre>
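<p>A minimal sketch for trying the pattern in a scratch directory without touching real files. The file names and the variables <code>work</code>, <code>star_form</code> and <code>at_form</code> are made up for illustration, and it assumes Bash is installed. The option <code>extglob</code> is turned on for a child shell only, using <code>-O extglob</code>. The second pattern, <code>*.@(mp3|ogg)</code>, is an alternative spelling which should select the same files here:</p>
<pre><code>work=$(mktemp -d)              # scratch directory standing in for scptest
cd "$work" || exit 1
touch a.mkd a.mp3 a.ogg b.mkd b.mp3 b.ogg

# The child shell has extglob on before it parses the pattern
star_form=$(bash -O extglob -c 'echo *(*.mp3|*.ogg)')
at_form=$(bash -O extglob -c 'echo *.@(mp3|ogg)')

echo "$star_form"              # a.mp3 a.ogg b.mp3 b.ogg
echo "$at_form"                # a.mp3 a.ogg b.mp3 b.ogg</code></pre>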
<h2 id="what-works">What Works</h2>
<p>I ran the following command on <code>rpi4</code> to copy selected files from the <code>scptest</code> directory to another Raspberry Pi called <code>rpi5</code> where I have created a directory called <code>test</code> for the purpose. I have copied my ssh key to that machine already so no password is prompted for.</p>
<pre><code>$ scp *(*.mp3|*.ogg) dave@rpi5:test/
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.
c10.mp3 100% 0 0.0KB/s 00:00
c10.ogg 100% 0 0.0KB/s 00:00</code></pre>
<p>All of the requested (empty) files were copied.</p>
<h2 id="what-fails">What Fails</h2>
<p>If I try the equivalent from the other host, pulling the files from <code>rpi4</code> to <code>rpi5</code>, I don't get what I might expect:</p>
<pre><code>$ scp dave@rpi4:scptest/*(*.mp3|*.ogg) .
bash: -c: line 0: syntax error near unexpected token `(&#39;
bash: -c: line 0: `scp -f scptest/*(*.mp3|*.ogg)&#39;</code></pre>
<p>Running the command again with the <code>-v</code> option we can see that the line <code>scp -f scptest/*(*.mp3|*.ogg)</code> is being executed on <code>rpi4</code> and this is causing the error. The conclusion is that <code>scp</code> itself is doing something that's not compatible with this expression.</p>
<p>My later investigations revealed that <code>extglob</code> is apparently off when this command is being executed, but more of this anon.</p>
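<p>The effect can be reproduced without a second machine. This is only a local stand-in, assuming that the remote command is run by something like <code>bash -c</code>; the variable names <code>off_result</code> and <code>on_result</code> are made up for illustration. Here <code>+O extglob</code> forces the option off and <code>-O extglob</code> forces it on:</p>
<pre><code>off_result=syntax-error
on_result=syntax-error

# extglob off: the parenthesis is not valid syntax for this shell
if bash +O extglob -c 'echo *(*.mp3|*.ogg)'; then off_result=parsed; fi

# extglob on before the command string is read: the pattern is accepted
if bash -O extglob -c 'echo *(*.mp3|*.ogg)'; then on_result=parsed; fi

echo "extglob off: $off_result"
echo "extglob on:  $on_result"</code></pre>
<p>The first child shell should complain about an unexpected <code>(</code> in much the same way as the remote end did, while the second accepts the pattern.</p>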
<h2 id="alternatives">Alternatives</h2>
<h3 id="first-try---attempting-to-use-extended-globs">First try - attempting to use extended globs</h3>
<p>I found an article about this issue on <a href="https://unix.stackexchange.com/questions/103058/exclude-characters-for-scp-filepattern" title="StackExchange question about scp">StackExchange</a> with a very comprehensive (if impenetrable) answer.</p>
<p>The answer points out that <code>scp</code> simply hands the filename (or expression) to the remote machine where it's interpreted by the shell on that machine (the login shell of the remote account). This could be any shell.</p>
<p>The answer suggests that the remote filename could be a command for the remote system, but that doesn't seem to be the case in my very simple test:</p>
<pre><code>$ scp dave@rpi4:&#39;ls&#39; .
scp: ls: No such file or directory</code></pre>
<p>This is probably too naive to work as it is however.</p>
<p>It is suggested that the following command will work though. Note that the command contains a newline inside the string passed to <code>scp</code> before the word <code>bash</code>. This is necessary for the command to work:</p>
<pre><code>$ LC_SCPFILES=&#39;scptest/*(*.mp3|*.ogg)&#39; scp -o SendEnv=LC_SCPFILES &quot;dave@rpi4:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot; .
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.</code></pre>
<!-- \* -->
<p>This does work, though understanding why is a challenge.</p>
<p>A more manageable solution is the following function based on the same idea:</p>
<pre><code>safer_scp() (
file=$1; shift
export LC_SCPFILES=&quot;${file#*:}&quot;
exec scp -o SendEnv=LC_SCPFILES &quot;${file%%:*}:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot; &quot;$@&quot;
)</code></pre>
<blockquote>
<hr>
</blockquote>
<p><u>You might want to skip this part since it gets into deep deep Bash and <code>scp</code> magic!</u></p>
<p>This all hinges on the fact that in this case <code>scp</code> works by doing the following:</p>
<ol type="1">
<li><p>It connects to the remote machine using the remote username and host name. It does this using <code>ssh</code>, creating a “tunnel” between the two and running a shell at the remote end.</p></li>
<li><p>Over the tunnel it issues a command to be run on the remote machine which consists of <code>scp -f FILENAME</code>. The <code>-f</code> option runs <code>scp</code> in “remote” mode. This option is undocumented but can be seen in the source code.</p></li>
<li><p>The remote end copies the file (or files) back to the local end. It interprets the filename or glob expression using the shell opened on the remote machine.</p></li>
</ol>
<p>The <code>safer_scp</code> function takes advantage of these features. Note that the body of a function can be any compound command. A series of commands enclosed in parentheses is such a compound command, BUT it executes in a sub-shell where the more usual compound command in braces does not. I am not 100% clear why it is written this way but experimentation has shown that without a body in parentheses running the function will disconnect from the remote machine!</p>
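<p>A likely explanation is that <code>exec</code> replaces whichever shell runs it. With a body in parentheses that is only the sub-shell, but with a body in braces it is the shell that called the function. The following sketch tries both in throwaway child shells, assuming Bash is installed; the function name <code>f</code> and the variables <code>paren_out</code> and <code>brace_out</code> are made up for illustration, and <code>true</code> stands in for <code>scp</code>:</p>
<pre><code># Body in parentheses: exec replaces only the sub-shell, the caller carries on
paren_out=$(bash -c 'f() ( exec true ); f; echo "caller still running"')

# Body in braces: exec replaces the calling shell, so the echo is never reached
brace_out=$(bash -c 'f() { exec true; }; f; echo "caller still running"')

echo "parentheses: [$paren_out]"    # parentheses: [caller still running]
echo "braces:      [$brace_out]"    # braces:      []</code></pre>
<p>If the function were defined with braces in an interactive login shell on a remote machine, the <code>exec</code> would replace that login shell, which would explain the disconnection.</p>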
<p>In the function the variable <code>file</code> is set to the first argument. This is then removed from the function argument list with <code>shift</code>.</p>
<p>The variable <code>LC_SCPFILES</code> is defined, being set to the piece of the contents of the <code>file</code> variable following the colon.</p>
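<p>The two parameter expansions which the function uses to split its first argument can be tried on their own. This is a small sketch using the example argument from this show; the variable names <code>host_part</code> and <code>path_part</code> are made up for illustration:</p>
<pre><code>file='dave@rpi4:scptest/*(*.mp3|*.ogg)'

host_part=${file%%:*}    # remove the longest suffix beginning with a colon
path_part=${file#*:}     # remove the shortest prefix ending with a colon

echo "$host_part"        # dave@rpi4
echo "$path_part"        # scptest/*(*.mp3|*.ogg)</code></pre>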
<p>The <code>exec</code> command runs the rest of the function as a command which replaces the currently executing shell. The command invoked is an <code>scp</code> command which passes the environment variable <code>LC_SCPFILES</code> to the remote end (using the <code>-o</code> option with <code>SendEnv=LC_SCPFILES</code>).</p>
<p>The arguments to <code>scp</code> are two strings. The first is:</p>
<pre><code>&quot;${file%%:*}:&lt;/dev/null
bash -O extglob -c &#39;exec scp -f -- \$LC_SCPFILES&#39;;exit&quot;</code></pre>
<p>The second argument consists of the remaining arguments to <code>safer_scp</code> (<code>&quot;$@&quot;</code>).</p>
<p>The first argument expands variable <code>file</code>, returning the first part (by removing the colon and everything after it). It then adds a colon and takes input from <code>/dev/null</code>. This is then followed by a newline.</p>
<p>The rest of the string invokes Bash, setting the <code>extglob</code> option with the <code>-O</code> option and reading the following string as a command as specified by the <code>-c</code> option. The command is a further <code>exec</code> which runs <code>scp</code>.</p>
<p>This instance of <code>scp</code> uses the undocumented option <code>-f</code> (as mentioned earlier). This tells <code>scp</code> that it is running as the remote instance.</p>
<p>The <code>--</code> (double hyphen) is a convention to tell a program that the options have ended. This protects the following filename (in variable <code>LC_SCPFILES</code>) from possibly being interpreted as options.</p>
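<p>The same convention can be seen with everyday commands. A minimal sketch in a scratch directory, where the file name <code>-n.ogg</code> and the variables <code>work</code> and <code>listed</code> are made up for illustration:</p>
<pre><code>work=$(mktemp -d)
cd "$work" || exit 1
touch -- -n.ogg          # create a file whose name starts with a hyphen

# After the double hyphen, ls treats -n.ogg as a file name, not as options
listed=$(ls -- -n.ogg)
echo "$listed"           # -n.ogg</code></pre>
<p>Without the <code>--</code> the name would typically be taken as a set of options and rejected.</p>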
<p>So, going back to the entire string being handed to the first <code>scp</code>, this does the following:</p>
<ul>
<li>It receives the username and host string (as in <code>dave@rpi4</code>) with a colon at the end. The rest of the remote file specification is <code>&lt;/dev/null</code>, a redirection of standard input, and when this is processed the usual remote <code>scp</code> exits.</li>
<li>The part after the newline is then executed. It runs Bash with <code>extglob</code> on and invokes another <code>scp</code> which simulates the one which is normally run - but now guaranteed to be in a Bash shell and with <code>extglob</code> on. This then sends the file or files back to the local end after expanding the extended glob pattern in variable <code>LC_SCPFILES</code>.</li>
<li>The <code>exit</code> after the Bash process ensures the process invoked at the remote end shuts down.</li>
</ul>
<p>This complex set of events compensates for deficiencies of <code>scp</code> and allows extended glob patterns to be passed through. However, it's still error-prone, as will be seen later.</p>
<p>The function does actually work, but it's <strong>so</strong> obscure and reliant on what seem like edge conditions or hidden features I don't think it should be used.</p>
<blockquote>
<hr>
</blockquote>
<h3 id="second-try---just-use-simpler-globs">Second try - just use simpler globs</h3>
<p>If the requirement is to use an extended glob expression in the solution then this one will not suit. However, if the goal is to copy files, then it will!</p>
<pre><code>$ scp dave@rpi4:scptest/*.{mp3,ogg} .
a00.mp3 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a02.mp3 100% 0 0.0KB/s 00:00
a03.mp3 100% 0 0.0KB/s 00:00
.
.
.</code></pre>
<p>This does the job. Bash expands the braces locally, before <code>scp</code> runs, so each expression passed to the remote end is a simple glob pattern which does not rely on <code>extglob</code> being on there. It may not work if the glob uses Bash-specific patterns and the remote account uses a shell other than Bash though.</p>
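<p>To see what <code>scp</code> is actually given, <code>printf</code> can stand in for it and print one argument per line. This sketch assumes the local shell is Bash and is run in an empty scratch directory so that nothing matches locally; the variables <code>work</code> and <code>args</code> are made up for illustration:</p>
<pre><code>work=$(mktemp -d)
cd "$work" || exit 1     # empty directory, so the globs match nothing here

# The braces are expanded by the local shell into two separate arguments
args=$(bash -c 'printf "%s\n" dave@rpi4:scptest/*.{mp3,ogg}')
echo "$args"
# dave@rpi4:scptest/*.mp3
# dave@rpi4:scptest/*.ogg</code></pre>
<p>Each argument then reaches the remote end as an ordinary glob, which would account for all the <code>mp3</code> files being listed before any <code>ogg</code> files in the output above.</p>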
<h3 id="third-try---use-rsync-with-a-filter">Third try - use <code>rsync</code> with a filter</h3>
<p>I have never encountered this issue with <code>scp</code> myself when moving files around between servers. I do a lot of file moving both for myself and as an HPR “<em>janitor</em>”. The reason I haven't seen it is because I usually use <code>rsync</code>.</p>
<p>There is a way of using <code>rsync</code> to achieve what was wanted here, though it does not use extended glob patterns.</p>
<p>The <code>rsync</code> command can be told to copy files from a directory, including those that match a pattern and to exclude the rest. This is done with filters.</p>
<p>The <code>rsync</code> command is very powerful and hard to master. In fact there is scope for a whole HPR series on its intricacies. However, we'll just restrict ourselves to the use of filters here to solve this problem.</p>
<p>Here's what I do:</p>
<ol type="1">
<li>Make a filter stored in a file</li>
<li>Run <code>rsync</code> with the filter</li>
</ol>
<h4 id="making-a-filter-file">Making a filter file</h4>
<p>I created a file called <code>.rsync_test</code>:</p>
<pre><code>$ cat .rsync_test
+ *.mp3
+ *.ogg
- *</code></pre>
<p>Lines beginning with + are rules for inclusion. Those beginning with - are exclusions. The order is significant.</p>
<p>These rules tell <code>rsync</code> to include all files ending <code>.mp3</code> and <code>.ogg</code>. Anything else is to be excluded.</p>
<h4 id="running-rsync-with-the-filter">Running rsync with the filter</h4>
<p>The command would be:</p>
<pre><code>$ rsync -vaP -e ssh --filter=&quot;. .rsync_test&quot; dave@rpi4:scptest/ test/
receiving incremental file list
./
a00.mp3
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=65/67)
a00.ogg
0 100% 0.00kB/s 0:00:00 (xfr#2, to-chk=64/67)
a01.mp3
0 100% 0.00kB/s 0:00:00 (xfr#3, to-chk=63/67)
a01.ogg
0 100% 0.00kB/s 0:00:00 (xfr#4, to-chk=62/67)
a02.mp3
0 100% 0.00kB/s 0:00:00 (xfr#5, to-chk=61/67)
a02.ogg
0 100% 0.00kB/s 0:00:00 (xfr#6, to-chk=60/67)
.
.
.
c10.mp3
0 100% 0.00kB/s 0:00:00 (xfr#65, to-chk=1/67)
c10.ogg
0 100% 0.00kB/s 0:00:00 (xfr#66, to-chk=0/67)
sent 1,310 bytes received 3,809 bytes 10,238.00 bytes/sec
total size is 0 speedup is 0.00</code></pre>
<p>The options are:</p>
<pre><code>-vaP select verbose mode (v), archive mode (a, shorthand for many
other options) and show progress (P)
-e ssh use ssh to transfer files
--filter=&quot;. .rsync_test&quot; use a filter</code></pre>
<p>The filter expression is <code>. .rsync_test</code> where the leading . is short for merge and tells <code>rsync</code> to read filter rules from the file.</p>
<p>The arguments are:</p>
<pre><code>dave@rpi4:scptest/ the remote host and directory to copy from
test/ the local directory to copy to</code></pre>
<p>It is a good idea to use the <code>-n</code> option when setting up such a command, to check that everything works as it should, before running it for real. This option turns on dry-run mode where the process is run without actually copying anything.</p>
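<p>The dry-run advice can be rehearsed locally before involving a remote host. This is a sketch which assumes <code>rsync</code> is installed; the scratch directories <code>src</code> and <code>dst</code>, the three file names and the variables <code>after_dry_run</code> and <code>after_real_run</code> are made up for illustration, and both ends are local directories:</p>
<pre><code>if command -v rsync; then      # only run where rsync is installed
    work=$(mktemp -d)
    mkdir "$work/src" "$work/dst"
    touch "$work/src/a.mp3" "$work/src/b.ogg" "$work/src/c.mkd"

    # Dry run: -n lists what would be sent but copies nothing
    rsync -avn --filter="+ *.mp3" --filter="+ *.ogg" --filter="- *" "$work/src/" "$work/dst/"
    after_dry_run=$(ls "$work/dst")

    # Real run with the same rules
    rsync -av --filter="+ *.mp3" --filter="+ *.ogg" --filter="- *" "$work/src/" "$work/dst/"
    after_real_run=$(ls "$work/dst")

    echo "after dry run:  [$after_dry_run]"
    echo "after real run: [$after_real_run]"
fi</code></pre>
<p>After the dry run the destination should still be empty; after the real run it should hold <code>a.mp3</code> and <code>b.ogg</code> but not <code>c.mkd</code>.</p>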
<p>You don't have to use the filter file. The following command does the same:</p>
<pre><code>$ rsync -vaP -e ssh -f &quot;+ *.mp3&quot; -f &quot;+ *.ogg&quot; -f &quot;- *&quot; dave@rpi4:scptest/ test/</code></pre>
<p>Here <code>-f</code> is the short form of <code>--filter</code>.</p>
<p>I prefer the filter file myself.</p>
<h4 id="caution">Caution</h4>
<p>The <code>rsync</code> tool is a beast and needs careful treatment! Things to be aware of if you want to go further than this simple guide:</p>
<ul>
<li>with the <code>-a</code> option <code>rsync</code> will traverse a directory hierarchy (it's recursive)</li>
<li>the presence of a trailing slash on the <em>source</em> directory makes it transfer the <u>contents</u> of the directory. Without it the directory itself and its contents will be copied</li>
<li><code>rsync</code> compares source and destination files. If a file already exists at the destination it will not copy it. However, if the source copy is different from the destination copy <code>rsync</code> will transfer differences</li>
</ul>
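<p>The trailing slash rule in particular is easy to check with two local copies. This is a sketch which assumes <code>rsync</code> is installed; the directory names <code>src</code>, <code>with_slash</code> and <code>without_slash</code> and the file <code>a.mp3</code> are made up for illustration:</p>
<pre><code>if command -v rsync; then      # only run where rsync is installed
    work=$(mktemp -d)
    mkdir "$work/src"
    touch "$work/src/a.mp3"

    rsync -a "$work/src/" "$work/with_slash/"       # copies the contents of src
    rsync -a "$work/src"  "$work/without_slash/"    # copies src itself

    with_slash=$(ls "$work/with_slash")
    without_slash=$(ls "$work/without_slash")
    echo "with slash:    $with_slash"               # with slash:    a.mp3
    echo "without slash: $without_slash"            # without slash: src
fi</code></pre>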
<h2 id="another-digression">Another digression</h2>
<p>Since I am already well off the rails with this episode I thought I'd go looking at another area commented on by <a href="http://hackerpublicradio.org/correspondents.php?hostid=311" title="HPR Host clacke">clacke</a> in the context of <a href="http://hackerpublicradio.org/eps/hpr2293" title="More supplementary Bash tips">show 2293</a>.</p>
<p>You are probably aware that file names containing spaces (and other unusual characters) can be difficult to use with commands and programs in Unix and Linux. The question was how <code>scp</code> would behave. I thought I'd do some experimentation with filenames containing spaces.</p>
<blockquote>
<hr>
</blockquote>
<p><u>You might want to skip this part since it gets into more of the guts of <code>scp</code></u></p>
<p>I created a file on <code>rpi4</code> called “<code>what a horrible filename.txt</code>” and tried to pull it across to <code>rpi5</code>. In each case I used the <code>-v</code> option to <code>scp</code> in order to see all the details of what was going on. Be warned that this generates a lot of output.</p>
<ol type="1">
<li><p><code>scp -v dave@rpi4:'scptest/what a horrible filename.txt' test/</code><br />
This normally is one way filenames with spaces can be dealt with but it fails here because the quotes are removed in the transfer.</p></li>
<li><p><code>scp -v dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/</code><br />
Another way of protecting spaces is to escape each of them with a backslash. This time I have used these inside the string. This works. The quotes are removed but the backslashes remain to protect the spaces.</p></li>
<li><p><code>scp -v dave@rpi4:&quot;scptest/what\ a\ horrible\ filename.txt&quot; test/</code><br />
Double quotes are equivalent to single ones in this context, so this works in the same way as example 2.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\ a\ horrible\ filename.txt test/</code><br />
This is normally another way that spaces can be protected, but this one fails because the backslashes are removed in the first pass. It is logically equivalent to example 1.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\\ a\\ horrible\\ filename.txt test/</code><br />
Since the <code>scp</code> process removes quotes and backslashes first time round, we'll try doubling them. This does not work because the remote end gets the filename with literal backslashes and rejects it.</p></li>
<li><p><code>scp -v dave@rpi4:scptest/what\\\ a\\\ horrible\\\ filename.txt test/</code><br />
Since the last test failed we'll try trebling the backslashes. This works - rather counter-intuitively I find.</p></li>
<li><p><code>scp -v dave@rpi4:'&quot;scptest/what a horrible filename.txt&quot;' test/</code><br />
Enclosing one sort of quotes in another should work, and indeed it does. Nested quotes are another solution. However, they must be different types of quotes - single inside double or vice versa.</p></li>
</ol>
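<p>These results are consistent with the file name being parsed twice: once by the local shell and again by a shell on the remote machine. A rough local stand-in, assuming <code>sh -c</code> behaves like that second shell; the function <code>second_pass</code> and the variables <code>plain</code>, <code>escaped</code> and <code>nested</code> are made up for illustration, and no <code>scp</code> is involved:</p>
<pre><code>work=$(mktemp -d)
cd "$work" || exit 1
touch 'what a horrible filename.txt'

# The text in $1 is what survives the local shell; a new shell parses it again
second_pass() {
    if sh -c "test -e $1"; then echo works; else echo fails; fi
}

plain=$(second_pass 'what a horrible filename.txt')
escaped=$(second_pass 'what\ a\ horrible\ filename.txt')
nested=$(second_pass '"what a horrible filename.txt"')

echo "plain:   $plain"         # plain:   fails
echo "escaped: $escaped"       # escaped: works
echo "nested:  $nested"        # nested:  works</code></pre>
<p>The three cases correspond to examples 1, 2 and 7. In the first the second shell sees four separate words, while in the other two the inner backslashes or quotes are still there to protect the spaces.</p>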
<p>You might wonder how the <code>safer_scp</code> function we saw earlier deals with such filenames. I could not get it to transfer the file using any of these formats.</p>
<p>However, by modifying it slightly (removing the backslash in front of <code>$LC_SCPFILES</code>) it worked:</p>
<pre><code>$ safer_scp() (
&gt; file=$1; shift
&gt; export LC_SCPFILES=&quot;${file#*:}&quot;
&gt; exec scp -o SendEnv=LC_SCPFILES &quot;${file%%:*}:&lt;/dev/null
&gt; bash -O extglob -c &#39;exec scp -f -- $LC_SCPFILES&#39;;exit&quot; &quot;$@&quot;
&gt; )
$ safer_scp dave@rpi4:&#39;scptest/what\ a\ horrible\ filename.txt&#39; test/
what a horrible filename.txt 100% 0 0.0KB/s 00:00</code></pre>
<p>I wasn't clear what the backslash was for anyway!</p>
<p>This modified function passed all of the tests of plain filenames and glob patterns which I tried. I am still not sure that I'd use it myself though.</p>
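<p>One possible reading of the backslash is that it decides which machine expands the variable. The sketch below only builds the two command strings locally and prints them; the value given to <code>LC_SCPFILES</code> and the variable names <code>with_backslash</code> and <code>without_backslash</code> are made up for illustration:</p>
<pre><code>LC_SCPFILES='scptest/*.mp3'

with_backslash="bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES'"
without_backslash="bash -O extglob -c 'exec scp -f -- $LC_SCPFILES'"

printf '%s\n' "$with_backslash"
# bash -O extglob -c 'exec scp -f -- $LC_SCPFILES'
printf '%s\n' "$without_backslash"
# bash -O extglob -c 'exec scp -f -- scptest/*.mp3'</code></pre>
<p>With the backslash the name of the variable travels to the remote end and is expanded there, which presumably depends on the variable arriving through <code>SendEnv</code>. Without it the local shell has already substituted the value, so the remote Bash parses the file name text itself, including any backslashes protecting spaces.</p>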
<blockquote>
<hr>
</blockquote>
<h2 id="conclusion">Conclusion</h2>
<p>The <code>scp</code> command is built on the original BSD Unix command <code>rcp</code>. I don't know if this is why it has the quirks we have looked at here, but it does seem to suffer some deficiencies. However, I find it useful and usable most of the time.</p>
<p>Using <code>rsync</code> solves a number of the problems <code>scp</code> shows, though it has its own shortcomings. I think a good working knowledge of <code>scp</code> and <code>rsync</code> is important in a Sysadmin's toolkit and can be of great use to all Unix/Linux users.</p>
<h2 id="links">Links</h2>
<ul>
<li>HPR Show 2293: <a href="http://hackerpublicradio.org/eps/hpr2293">More supplementary Bash tips</a></li>
<li>StackExchange question: <a href="https://unix.stackexchange.com/questions/103058/exclude-characters-for-scp-filepattern">Exclude characters for SCP-filepattern</a></li>
</ul>
<!--
vim: syntax=markdown:ts=8:sw=4:ai:et:tw=78:fo=tcqn:fdm=marker
-->
</article>
</main>
</div>
</body>
</html>