The Problem
Following on from my last show on filename expansion, concentrating on extended patterns and the extglob option, I was asked a question by Jon Kulp in the comment section.
Jon was using ‘ls *(*.mp3|*.ogg)’ to find all OGG and MP3 files in a directory which also held other files. However, when he wanted to copy this subset of files elsewhere he had problems using this expression in an scp command.
Having done some investigations to help solve this I thought I’d put what I found into an HPR episode and share it, and this is the show.
Test Environment
On one of my Raspberry Pis (rpi4) I made some empty test files for the purposes of this show:
$ mkdir scptest
$ touch scptest/{a..c}{00..10}.{mkd,mp3,ogg}
$ ls -x -w 80 scptest/
a00.mkd a00.mp3 a00.ogg a01.mkd a01.mp3 a01.ogg a02.mkd a02.mp3 a02.ogg
a03.mkd a03.mp3 a03.ogg a04.mkd a04.mp3 a04.ogg a05.mkd a05.mp3 a05.ogg
.
.
.
c05.mkd c05.mp3 c05.ogg c06.mkd c06.mp3 c06.ogg c07.mkd c07.mp3 c07.ogg
c08.mkd c08.mp3 c08.ogg c09.mkd c09.mp3 c09.ogg c10.mkd c10.mp3 c10.ogg
So, we have made files with the extensions mkd, ogg and mp3 and these are shown with an ls command.
If we move into the directory and use the same glob pattern as Jon, we see just the mp3 and ogg files:
$ cd scptest/
$ ls -x -w 80 *(*.mp3|*.ogg)
a00.mp3 a00.ogg a01.mp3 a01.ogg a02.mp3 a02.ogg a03.mp3 a03.ogg a04.mp3
a04.ogg a05.mp3 a05.ogg a06.mp3 a06.ogg a07.mp3 a07.ogg a08.mp3 a08.ogg
.
.
.
c05.mp3 c05.ogg c06.mp3 c06.ogg c07.mp3 c07.ogg c08.mp3 c08.ogg c09.mp3
c09.ogg c10.mp3 c10.ogg
What Works
I ran the following command on rpi4 to copy selected files from the scptest directory to another Raspberry Pi called rpi5 where I have created a directory called test for the purpose. I have copied my ssh key to that machine already so no password is prompted for.
$ scp *(*.mp3|*.ogg) dave@rpi5:test/
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.
c10.mp3 100% 0 0.0KB/s 00:00
c10.ogg 100% 0 0.0KB/s 00:00
All of the requested (empty) files were copied.
What Fails
If I try the equivalent from the other host, pulling the files from rpi4 to rpi5, I don’t get what I might expect:
$ scp dave@rpi4:scptest/*(*.mp3|*.ogg) .
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `scp -f scptest/*(*.mp3|*.ogg)'
Running the command again with the -v option we can see that the line ‘scp -f scptest/*(*.mp3|*.ogg)’ is being executed on rpi4 and this is causing the error. The conclusion is that scp itself is doing something that’s not compatible with this expression.
My later investigations revealed that extglob is apparently off when this command is being executed, but more of this anon.
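The failure can be reproduced without a second machine. The sketch below is only an approximation: it assumes, as the error message suggests, that the command line ends up in a non-interactive ‘bash -c’ with extglob unset, and it uses printf in place of scp so nothing is copied. The scratch directory and the three file names are made up for the demonstration.

```shell
# Scratch directory with a few empty files (names are illustrative only)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a00.mkd a00.mp3 a00.ogg

# With extglob explicitly off, bash cannot even parse the pattern
off_result=$(bash +O extglob -c 'printf "%s\n" *(*.mp3|*.ogg)' 2>&1)
off_status=$?
echo "extglob off: exit status $off_status"
echo "$off_result"

# With extglob on, the same command line expands to the two matching files
on_result=$(bash -O extglob -c 'printf "%s\n" *(*.mp3|*.ogg)')
echo "extglob on:"
echo "$on_result"

# Tidy up the scratch directory
cd / && rm -rf "$demo_dir"
```

The first run should report a syntax error much like the one scp produced, and the second should list only the mp3 and ogg files.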
Alternatives
First try - attempting to use extended globs
I found an article about this issue on StackExchange with a very comprehensive (if impenetrable) answer.
The answer points out that scp simply hands the filename (or expression) to the remote machine where it’s interpreted by the shell of the remote account. This could be any shell.
The answer suggests that the remote filename could be a command for the remote system, but that doesn’t seem to be the case in my very simple test:
$ scp dave@rpi4:'ls' .
scp: ls: No such file or directory
This test is probably too naive to work as it stands, however.
It is suggested that the following command will work though. Note that the command contains a newline inside the string passed to ‘scp’, before the word ‘bash’. This is necessary for the command to work:
$ LC_SCPFILES='scptest/*(*.mp3|*.ogg)' scp -o SendEnv=LC_SCPFILES "dave@rpi4:</dev/null
bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit" .
a00.mp3 100% 0 0.0KB/s 00:00
a00.ogg 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a01.ogg 100% 0 0.0KB/s 00:00
.
.
.
This does work, though understanding why is a challenge.
A more manageable solution is the following function based on the same idea:
safer_scp() (
file=$1; shift
export LC_SCPFILES="${file#*:}"
exec scp -o SendEnv=LC_SCPFILES "${file%%:*}:</dev/null
bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit" "$@"
)
You might want to skip this part since it gets into deep deep Bash and scp magic!
This all hinges on the fact that in this case scp works by doing the following:
1. It connects to the remote machine using the remote username and host name. It does this using ssh, creating a “tunnel” between the two and running a shell at the remote end.
2. Over the tunnel it issues a command to be run on the remote machine which consists of scp -f FILENAME. The -f option runs scp in “remote” mode. This option is undocumented but can be seen in the source code.
3. The remote end copies the file (or files) back to the local end. It interprets the filename or glob expression using the shell opened on the remote machine.
The safer_scp function takes advantage of these features. Note that the body of a function can be any compound command. A series of commands enclosed in parentheses is such a compound command, BUT it executes in a sub-shell where the more usual compound command in braces does not. I am not 100% clear why it is written this way but experimentation has shown that without a body in parentheses running the function will disconnect from the remote machine!
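One explanation that fits this behaviour is the ‘exec’ inside the function: exec replaces whichever shell runs it. With a body in parentheses that shell is a throwaway sub-shell; with braces it is the shell that called the function, so a login session would end as soon as the command finished. Here is a minimal sketch, using two hypothetical functions (in_parens and in_braces) and echo standing in for scp:

```shell
# Body in parentheses: exec replaces only the sub-shell
in_parens() (
    exec echo "in_parens: exec ran in a sub-shell"
)

# The caller carries on after the function returns
parens_output=$(in_parens; echo "still here after in_parens")
echo "$parens_output"

# Body in braces: exec replaces the shell that called the function.
# This is run inside a child bash so that the current session survives.
braces_output=$(bash -c '
in_braces() {
    exec echo "in_braces: exec replaced the calling shell"
}
in_braces
echo "never printed"
')
echo "$braces_output"
```

The line "never printed" does not appear because the child shell no longer exists once exec has run.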
In the function the variable ‘file’ is set to the first argument. This is then removed from the function argument list with ‘shift’.
The variable ‘LC_SCPFILES’ is defined, being set to the piece of the contents of the ‘file’ variable following the colon.
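The two parameter expansions used in the function can be tried on their own. This is just an illustration with the remote specification from earlier in the show; the variable names remote_path and remote_host are made up here and do not appear in the function.

```shell
file='dave@rpi4:scptest/*(*.mp3|*.ogg)'

# ${file#*:} removes the shortest leading match of "*:",
# leaving everything after the first colon
remote_path=${file#*:}

# ${file%%:*} removes the longest trailing match of ":*",
# leaving everything before the first colon
remote_host=${file%%:*}

printf 'path: %s\n' "$remote_path"
printf 'host: %s\n' "$remote_host"
```

This prints the path as scptest/*(*.mp3|*.ogg) and the host as dave@rpi4.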
The ‘exec’ command runs the rest of the function as a command which replaces the currently executing shell. The command invoked is an ‘scp’ command which passes the environment variable ‘LC_SCPFILES’ to the remote end (using the -o option with ‘SendEnv=LC_SCPFILES’).
The arguments to ‘scp’ are two strings. The first is:
"${file%%:*}:</dev/null
bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit"
The second argument consists of the remaining arguments to safer_scp ("$@").
The first argument expands variable ‘file’, returning the first part (by removing the colon and everything after it). It then adds a colon and takes input from ‘/dev/null’. This is then followed by a newline.
The rest of the string invokes Bash, setting the ‘extglob’ option with the -O option and reading the following string as a command as specified by the -c option. The command is a further ‘exec’ which runs ‘scp’.
This instance of scp uses the undocumented option -f (as mentioned earlier). This tells scp that it is running as the remote instance.
The -- (double hyphen) is a convention to tell a program that the options have ended. This protects the following filename (in variable LC_SCPFILES) from possibly being interpreted as options.
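The effect of the double hyphen is easy to see locally. The sketch below uses ls rather than scp, and a deliberately awkward file name invented for the purpose, in a scratch directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A file whose name starts with a hyphen (made up for this demo)
touch -- '-v.mp3'

# After "--" the expanded name is treated as a file, not as options.
# Without it, ls would be handed "-v.mp3" as if it were a set of options.
listing=$(ls -- *.mp3)
echo "$listing"

# Tidy up the scratch directory
cd / && rm -rf "$demo_dir"
```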
So, going back to the entire string being handed to the first scp, this does the following:
- It receives the username and host string (as in dave@rpi4) with a colon at the end. The rest of the remote file specification is ‘</dev/null’, a redirection from /dev/null, and when this is processed the usual remote scp exits.
- The part after the newline is then executed. It runs Bash with extglob on and invokes another scp which simulates the one which is normally run - but now guaranteed to be in a Bash shell and with extglob on. This then sends the file or files back to the local end after expanding the extended glob pattern in variable LC_SCPFILES.
- The exit after the Bash process ensures the process invoked at the remote end shuts down.
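The role of the newline can be sketched locally. This assumes, as described above, that the remote side builds a command line of the form ‘scp -f’ followed by whatever came after the colon, and passes it to a shell. In this stand-in ‘echo scp -f’ replaces the real scp and a harmless echo replaces the ‘bash -O extglob -c ...’ line; the variable name after_colon is made up.

```shell
# Everything after the colon in the scp source, in the same shape as the
# function builds it: a redirection, a newline, then a second command
after_colon='</dev/null
echo "second command runs"'

# Stand-in for the remote side: the shell sees two separate lines,
# so it runs two separate commands
remote_output=$(bash -c "echo scp -f $after_colon")
echo "$remote_output"
```

The first line of output comes from the stand-in for the usual remote scp, the second from the command smuggled in after the newline.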
This complex set of events compensates for deficiencies of scp and allows extended glob patterns to be passed through. However, it’s still error-prone, as will be seen later.
The function does actually work, but it’s so obscure and reliant on what seem like edge conditions or hidden features I don’t think it should be used.
Second try - just use simpler globs
If the requirement is to use an extended glob expression in the solution then this one will not suit. However, if the goal is to copy files, then it will!
$ scp dave@rpi4:scptest/*.{mp3,ogg} .
a00.mp3 100% 0 0.0KB/s 00:00
a01.mp3 100% 0 0.0KB/s 00:00
a02.mp3 100% 0 0.0KB/s 00:00
a03.mp3 100% 0 0.0KB/s 00:00
.
.
.
This does the job. Note that the brace expansion is carried out by the local shell before scp starts, so scp is given two remote sources, each a simple glob pattern, and neither relies on extglob being on at the remote end. The same approach may not work if the glob uses Bash-specific patterns and the remote account uses a shell other than Bash, though.
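To see what scp is actually given in this case, the same argument can be handed to printf, which prints each argument on its own line. This is only a local illustration: it assumes nothing in the current directory matches the patterns, in which case Bash leaves the globs untouched after expanding the braces.

```shell
# Run in bash explicitly, since brace expansion is a Bash feature.
# printf shows one argument per line, exactly as scp would receive them.
scp_args=$(bash -c 'printf "%s\n" dave@rpi4:scptest/*.{mp3,ogg}')
echo "$scp_args"
```

Two lines are printed, one ending in *.mp3 and one ending in *.ogg, which also accounts for all the mp3 files arriving before any of the ogg files.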
Third try - use ‘rsync’ with a filter
I have never encountered this issue with ‘scp’ myself when moving files around between servers. I do a lot of file moving both for myself and as an HPR “janitor”. The reason I haven’t seen it is because I usually use ‘rsync’.
There is a way of using rsync to achieve what was wanted here, though it does not use extended glob patterns.
The ‘rsync’ command can be told to copy files from a directory, including those that match a pattern and to exclude the rest. This is done with filters.
The ‘rsync’ command is very powerful and hard to master. In fact there is scope for a whole HPR series on its intricacies. However, we’ll just restrict ourselves to the use of filters here to solve this problem.
Here’s what I do:
- Make a filter stored in a file
- Run ‘rsync’ with the filter
Making a filter file
I created a file called ‘.rsync_test’:
$ cat .rsync_test
+ *.mp3
+ *.ogg
- *
Lines beginning with ‘+’ are rules for inclusion. Those beginning with ‘-’ are exclusions. The order is significant.
These rules tell ‘rsync’ to include all files ending ‘.mp3’ and ‘.ogg’. Anything else is to be excluded.
Running rsync with the filter
The command would be:
$ rsync -vaP -e ssh --filter=". .rsync_test" dave@rpi4:scptest/ test/
receiving incremental file list
./
a00.mp3
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=65/67)
a00.ogg
0 100% 0.00kB/s 0:00:00 (xfr#2, to-chk=64/67)
a01.mp3
0 100% 0.00kB/s 0:00:00 (xfr#3, to-chk=63/67)
a01.ogg
0 100% 0.00kB/s 0:00:00 (xfr#4, to-chk=62/67)
a02.mp3
0 100% 0.00kB/s 0:00:00 (xfr#5, to-chk=61/67)
a02.ogg
0 100% 0.00kB/s 0:00:00 (xfr#6, to-chk=60/67)
.
.
.
c10.mp3
0 100% 0.00kB/s 0:00:00 (xfr#65, to-chk=1/67)
c10.ogg
0 100% 0.00kB/s 0:00:00 (xfr#66, to-chk=0/67)
sent 1,310 bytes received 3,809 bytes 10,238.00 bytes/sec
total size is 0 speedup is 0.00
The options are:
-vaP select verbose mode (v), archive mode (a, shorthand for many
other options) and show progress (P)
-e ssh use ssh to transfer files
--filter=". .rsync_test" use a filter
The filter expression is ‘. .rsync_test’ where the leading ‘.’ is short for ‘merge’ and tells rsync to read filter rules from the file.
The arguments are:
dave@rpi4:scptest/ the remote host and directory to copy from
test/ the local directory to copy to
It is a good idea to use the ‘-n’ option when setting up such a command, to check that everything works as it should, before running it for real. This option turns on ‘dry-run’ mode where the process is run without actually copying anything.
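The filter and the dry-run option can be tried safely with two local directories before involving a remote host. The following is a sketch rather than a recipe: the scratch directory and file names are invented, the ‘-e ssh’ and ‘-P’ options are left out because nothing goes over the network, and the whole thing is skipped if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    # Scratch area with a source and a destination directory
    work=$(mktemp -d)
    mkdir "$work/scptest" "$work/test"
    touch "$work/scptest/a00.mkd" "$work/scptest/a00.mp3" "$work/scptest/a00.ogg"

    # The same three rules as the filter file shown in the text
    printf '%s\n' '+ *.mp3' '+ *.ogg' '- *' > "$work/.rsync_test"

    # Dry run first: -n reports what would be sent but copies nothing
    rsync -van --filter=". $work/.rsync_test" "$work/scptest/" "$work/test/"
    after_dry_run=$(ls "$work/test")

    # The real run: only the mp3 and ogg files should arrive
    rsync -va --filter=". $work/.rsync_test" "$work/scptest/" "$work/test/"
    after_real_run=$(ls "$work/test")
    echo "$after_real_run"

    rm -rf "$work"
fi
```

One thing to bear in mind with these particular rules is that the final ‘- *’ excludes sub-directories as well as files, so nothing below the top level of the source directory is visited.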
You don’t have to use the filter file. The following command does the same:
$ rsync -vaP -e ssh -f "+ *.mp3" -f "+ *.ogg" -f "- *" dave@rpi4:scptest/ test/
Here ‘-f’ is the short form of ‘--filter’.
I prefer the filter file myself.
Caution
The ‘rsync’ tool is a beast and needs careful treatment! Things to be aware of if you want to go further than this simple guide:
- ‘rsync’ will traverse a directory hierarchy (it’s recursive when run with the -a option, as here)
- the presence of a trailing slash on the source directory makes it transfer the contents of the directory. Without it the directory itself and its contents will be copied
- ‘rsync’ compares source and destination files. If a file already exists at the destination it will not copy it. However, if the source copy is different from the destination copy ‘rsync’ will transfer differences
Another digression
Since I am already well off the rails with this episode I thought I’d go looking at another area commented on by clacke in the context of show 2293.
You are probably aware that file names containing spaces (and other unusual characters) can be difficult to use with commands and programs in Unix and Linux. The question was how scp would behave. I thought I’d do some experimentation with filenames containing spaces.
You might want to skip this part since it gets into more of the guts of scp
I created a file on rpi4 called “what a horrible filename.txt” and tried to pull it across to rpi5. In each case I used the -v option to scp in order to see all the details of what was going on. Be warned that this generates a lot of output.
1. scp -v dave@rpi4:'scptest/what a horrible filename.txt' test/
   This normally is one way filenames with spaces can be dealt with but it fails here because the quotes are removed in the transfer.
2. scp -v dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/
   Another way of protecting spaces is to escape each of them with a backslash. This time I have used these inside the string. This works. The quotes are removed but the backslashes remain to protect the spaces.
3. scp -v dave@rpi4:"scptest/what\ a\ horrible\ filename.txt" test/
   Double quotes are equivalent to single ones in this context, so this works in the same way as example 2.
4. scp -v dave@rpi4:scptest/what\ a\ horrible\ filename.txt test/
   This is normally another way that spaces can be protected, but this one fails because the backslashes are removed in the first pass. It is logically equivalent to example 1.
5. scp -v dave@rpi4:scptest/what\\ a\\ horrible\\ filename.txt test/
   Since the scp process removes quotes and backslashes first time round, we’ll try doubling them. This does not work because the remote end gets the filename with literal backslashes and rejects it.
6. scp -v dave@rpi4:scptest/what\\\ a\\\ horrible\\\ filename.txt test/
   Since the last test failed we’ll try trebling the backslashes. This works - rather counter-intuitively I find.
7. scp -v dave@rpi4:'"scptest/what a horrible filename.txt"' test/
   Enclosing one sort of quotes in another should work, and indeed it does. Nested quotes are another solution. However, they must be different types of quotes - single inside double or vice versa.
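The pattern in these results is consistent with the name being parsed twice: once by the local shell and once by a shell at the remote end. That second pass can be imitated locally. The helper below, count_remote_words, is hypothetical and only counts how many words a second shell makes of the string it is given; one word means the name survived intact.

```shell
# Hypothetical helper: re-parse the argument in a second shell, standing in
# for the remote end, and report how many words it was split into
count_remote_words() {
    bash -c "set -- $1; echo \$#"
}

# Example 1: the local shell strips the quotes, the second shell splits on spaces
plain=$(count_remote_words 'scptest/what a horrible filename.txt')

# Example 2: the backslashes survive the first pass and protect the spaces
escaped=$(count_remote_words 'scptest/what\ a\ horrible\ filename.txt')

# Example 6: the local shell reduces each "\\\ " to "\ ", giving the same string
tripled=$(count_remote_words scptest/what\\\ a\\\ horrible\\\ filename.txt)

# Example 7: the inner quotes survive the first pass
nested=$(count_remote_words '"scptest/what a horrible filename.txt"')

echo "plain spaces:        $plain word(s)"
echo "backslashes:         $escaped word(s)"
echo "tripled backslashes: $tripled word(s)"
echo "nested quotes:       $nested word(s)"
```

The plain version arrives as four words, while the other three arrive as a single word, matching the examples that worked.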
You might wonder how the safer_scp function we saw earlier deals with such filenames. I could not get it to transfer the file using any of these formats.
However, by modifying it slightly (removing the backslash in front of $LC_SCPFILES) it worked:
$ safer_scp() (
> file=$1; shift
> export LC_SCPFILES="${file#*:}"
> exec scp -o SendEnv=LC_SCPFILES "${file%%:*}:</dev/null
> bash -O extglob -c 'exec scp -f -- $LC_SCPFILES';exit" "$@"
> )
$ safer_scp dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/
what a horrible filename.txt 100% 0 0.0KB/s 00:00
I wasn’t clear what the backslash was for anyway!
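A likely reading is that the backslash decides where the variable is expanded. The string sits inside double quotes, so ‘\$’ sends a literal dollar sign to the remote end, whereas a bare ‘$’ is replaced by the local shell before anything is sent. The sketch below just builds the two command strings and prints them; with_backslash and without_backslash are names invented for the illustration.

```shell
LC_SCPFILES='scptest/what\ a\ horrible\ filename.txt'

# Original form: \$ keeps the dollar sign, so the variable is only
# expanded later, by the shell at the remote end
with_backslash="bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES'"

# Modified form: the local shell substitutes the value straight away;
# the single quotes are ordinary characters inside the double quotes
without_backslash="bash -O extglob -c 'exec scp -f -- $LC_SCPFILES'"

printf '%s\n' "$with_backslash" "$without_backslash"
```

In the modified form the file name becomes part of the command text, so the remote shell parses its backslashes as escapes. In the original form the remote shell expands the variable itself, and the result of an expansion is split on spaces without any escape processing, which would explain why none of the quoting styles worked there.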
This modified function passed all of the tests of plain filenames and glob patterns which I tried. I am still not sure that I’d use it myself though.
Conclusion
The scp command is built on the original BSD Unix command rcp. I don’t know if this is why it has the quirks we have looked at here, but it does seem to suffer some deficiencies. However, I find it useful and usable most of the time.
Using rsync solves a number of the problems scp shows, though it has its own shortcomings. I think a good working knowledge of scp and rsync is important in a Sysadmin’s toolkit and can be of great use to all Unix/Linux users.
Links
- HPR Show 2293: More supplementary Bash tips
- StackExchange question: Exclude characters for SCP-filepattern