Overview
This is the fourth show about the Bash functions I use, and it may be the last unless I come up with something else that I think might be of general interest.
There is only one function to look at this time, but it’s fairly complex so needs an entire episode devoted to it.
As before it would be interesting to receive feedback on this function and would be great if other Bash users contributed ideas of their own.
The range_parse function
The purpose of this function is to read a string containing a range or ranges of numbers and turn it into the actual numbers intended. For example, a range like 1-3 means the numbers 1, 2 and 3.
I use this a lot. It’s really helpful when writing a script to select from a list. The script can show the list with a number against each item, then ask the script user to select which items they want to be deleted, or moved or whatever.
For example, I manage the podcasts I am listening to this way. I usually have two or three players with playlists on them. When the battery on one needs charging I can pick up another and continue listening to whatever is on there. I have a script that knows which playlists are on which player, and it asks me which episode I am listening to by listing all the playlists. I answer with a range. Another script then asks which of the episodes that I was listening to have finished. It then deletes the episodes I have heard.
Parsing a collection of ranges then is not particularly difficult, even in Bash, though dealing with some of the potential problems complicates matters a bit.
The function range_parse takes three arguments:
- The maximum value allowed in the range (the minimum is fixed at 1)
- The string containing the range expression itself
- The name of the variable to receive the result
An example of using the function might be:
$ source range_parse.sh
$ range_parse 10 '1-4,7,3,7' parsed
$ echo $parsed
1 2 3 4 7
The function has dealt with the repetition of 7 and the fact that the 3 is already in the range 1-4. It has also sorted the numbers and returned them as a string that can be placed in an array or used in a for loop.
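As a small illustration of that last point, the sketch below puts such a string into an array and also walks over it in a for loop. The value of parsed is typed in by hand here, standing in for what range_parse returned above, and the array name numbers is made up for the example.

```bash
#!/usr/bin/env bash
# Stand-in for the string range_parse produced in the example above
parsed='1 2 3 4 7'

# Split the string on whitespace into an array (hypothetical name 'numbers')
read -r -a numbers <<< "$parsed"
echo "Count: ${#numbers[@]}"

# Or loop over the string directly; it is left unquoted on purpose so that
# the shell splits it into separate words
for n in $parsed; do
    echo "Selected item $n"
done
```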
Algorithm
The method used for processing the range presented to the function is fairly simple:
- The range string is stripped of spaces
- It is checked to ensure that the characters it contains are digits, commas and hyphens. If not then the function ends with an error
- The comma-separated elements are selected one by one
- Elements consisting of groups of digits (i.e. numbers) are stored away for later
- If the element contains a hyphen then it is checked to ensure it consists of two groups of digits separated by the hyphen, and it is split up and the range of numbers between its start and end is determined
- The results of the step-by-step checking of elements are accumulated for the next stage
- The accumulated elements are checked to ensure they are each in range. Any that are not are rejected and an error message produced showing what was rejected.
- Finally all of the acceptable items are sorted and any duplicates removed and returned as a list in a string. If any errors occurred in the analysis of the range the function returns a ‘false’ value to the caller, otherwise ‘true’ is returned. This allows it to be used where a true/false value is expected, such as in an if statement, if desired.
Analysis of function
Here is the function itself, which may be downloaded from the HPR website as range_parse.sh:
[Listing of range_parse.sh, with lines numbered 1 to 106; the line references below are to this listing.]
Line 11: There are two ways of declaring a function in Bash. The function name may be followed by a pair of parentheses and then the body of the function (usually enclosed in curly braces). Alternatively the word function is followed by the function name, optional parentheses and the function body. There is no significant difference between the two methods.
Lines 12 and 13: The first two arguments for the function are stored in local variables max (the maximum permitted number in the range) and range (the string holding the range expression to parse). In both cases we use the parameter expansion feature which halts the script with an error message if these arguments are not supplied.
Line 14: Here local -n is used for the local variable result which is to hold the name of a variable external to the function which will receive the result of parsing the expression. Using the -n option makes it a nameref; a reference to another variable. The definition in the Bash manual is as follows:
Whenever the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value. A nameref is commonly used within shell functions to refer to a variable whose name is passed as an argument to the function.
There is more to talk about with nameref variables, but we will leave that for another time.
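To show these two features away from range_parse, here is a minimal sketch. The function double_it and the variable names value, out and answer are invented for the illustration; only the ${1:?...} expansion and local -n come from the description above. Note that local -n needs Bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Hypothetical function: doubles a number and hands the answer back
# through a variable whose name the caller supplies.
double_it() {
    # ${1:?message} stops with an error if the argument is missing or empty
    local value="${1:?Usage: double_it number varname}"
    # 'out' becomes a reference to the variable named in argument 2
    local -n out="${2:?Usage: double_it number varname}"

    # Assigning to the nameref really assigns to the caller's variable
    out=$((value * 2))
}

double_it 21 answer
echo "$answer"
```

The echo prints 42, even though the function never mentions a variable called answer. One thing to watch for: the caller's variable should not have the same name as the nameref inside the function (out in this sketch), or Bash complains about a circular name reference.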
Line 16: Some other variables local to the function are declared here, and one (exitcode) is given an initial value.
Line 21: Here all spaces are being removed from the range list in variable range.
Lines 26 to 29: In this test the range variable is being checked against a regular expression which matches only the digits 0-9, commas and hyphens. These are the only characters allowed in the range list. If the match fails an error message is written and the function returns with a ‘false’ value.
Lines 36-61: This is the loop which chops up the range list into its component parts. Each time it iterates a comma-separated element is removed from the range variable, which grows shorter, and the test:
until [[ -z $range ]]
will become true when nothing is left.
- Lines 40-46: This if statement looks to see if the range variable contains a comma, using a regular expression.
  - If it does a variable called item is filled with the characters of range up to the first comma. Then range is set to its previous contents without the part up to the first comma.
  - If there was no comma then item is set to the entirety of range and range is emptied. This is because this must be the last (or only) element.
- Lines 51-59: At this point the element in item is either a plain number or a range expression of the form ‘number-number’. This pair of nested if statements determines if it is the latter and attempts to expand the range. The outer if tests item against a regular expression consisting of a hyphen, and if the result is true the inner if is invoked (see the footnote at the end of these notes).
  - Line 52: compares the contents of item against a more complex regular expression. This one looks for one or more digits, a hyphen, and one or more digits.
    - If found then item is edited to replace the hyphen by a pair of dots. This is inside braces as the argument to an echo statement. So, given 1-5 in item the echo will be given {1..5}, a brace expansion expression. The echo is the command of an eval statement (needed to actually execute the expansion), and this is inside a command substitution. The result should be that item is filled with the numbers from the expansion so 1-5 becomes ‘1 2 3 4 5’!
    - If the regular expression does not match then this is not a valid range, so this is reported in the else branch and item is cleared of its contents. Also, since we want this error reported to the caller we set exitcode to 1 for later use.
- Line 60: Here a variable called selection is being used to accumulate the successive contents of item on each iteration. We use the += form of assignment to make it easier to do this accumulation. Notice that a trailing space is added to ensure none of the numbers collide with one another in the string.
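The first half of the analysis, from space removal to the accumulation in selection, can be sketched on its own as below. This is not the code from range_parse.sh, just a minimal reconstruction of the steps described. The function name expand_items and the wording of the error messages are assumptions made for the example, and selection is left as a global variable to keep it short.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand a range list such as '1-3,7' into the global
# variable 'selection' as a string of space-separated numbers.
expand_items() {
    local range="${1:?Usage: expand_items rangelist}"
    local item exitcode=0
    selection=''

    # Remove all spaces, then allow only digits, commas and hyphens
    range="${range// /}"
    if [[ ! $range =~ ^[0-9,-]+$ ]]; then
        echo "Invalid range list: $range" >&2
        return 1
    fi

    # Peel off one comma-separated element per iteration
    until [[ -z $range ]]; do
        if [[ $range =~ , ]]; then
            item="${range%%,*}"     # everything before the first comma
            range="${range#*,}"     # everything after the first comma
        else
            item="$range"           # last (or only) element
            range=''
        fi

        if [[ $item =~ - ]]; then
            if [[ $item =~ ^[0-9]+-[0-9]+$ ]]; then
                # '1-5' becomes '{1..5}'. eval is needed because brace
                # expansion happens before variables are substituted. It is
                # only reached after the test above has confirmed that item
                # holds nothing but digits and one hyphen.
                item="$(eval "echo {${item/-/..}}")"
            else
                echo "Invalid range: $item" >&2
                item=''
                exitcode=1
            fi
        fi
        selection+="$item "
    done
    return $exitcode
}

expand_items '7, 9-6,2' && echo "$selection"
```

At this stage selection holds the numbers 7 9 8 7 6 2, still in the order they were written and with the 7 repeated. Sorting, removing duplicates and checking against the maximum all come later.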
Lines 66-97: This is an if statement which tests to see if the variable selection contains anything. If it does then the contents are validated.
- Lines 71-77: This is a loop which cycles through the numbers in the variable. It is a feature of this form of the for loop that it operates on a list of space-separated items, and that’s what selection contains.
  - Lines 72-76: This if statement checks each number to ensure that it is in range between 1 and the value in the variable max.
    - If it is not in range then the number is appended to the variable err
    - If it is in range it is appended to the variable sel
- Lines 82-87: This if statement tests to determine whether there is anything in the err variable. If it contains anything then there have been one or more errors, so we want to report this. The test used here seems very strange. The reason for it is discussed below in the Explanations section, explanation 1.
  - Line 83: The variable msg is filled with the list of errors. This is done with a command substitution expression where a for loop is used to list the numbers in err using an echo command and these are piped to the sort command. The sort command makes what it receives unique and sorts the lines numerically. This rather involved pipeline is needed because sort requires a series of lines, and these are provided by the echo. This deals with the possible duplication of the errors and the fact that they are not necessarily in any particular order.
  - Line 84: Because the process of sorting the erroneous numbers and making them unique has added newlines to them all we use this statement to remove them. This is an example of parameter expansion, and in this one the entire string is scanned for a pattern and each one is replaced by a space. There is a problem with replacing newlines in a string however, since there is no simple way to represent them. Here we use $'\n' to do this. See the Explanations section below (explanation 2) for further details.
  - Lines 85 and 86: The string of erroneous numbers is printed here and exitcode is set to 1 so the function can flag that there has been an error when it exits. It doesn’t exit though since some uses will simply ignore the returned value and carry on regardless.
- Lines 92-96: At this point we have extracted all the valid numbers and stored them in sel and we want to sort them and make them unique as we did with err before returning the result to the caller. We start by emptying the variable selection in anticipation.
  - Line 93: This if statement checks that the sel variable actually contains anything. This test uses the unusual construct ${sel+"${sel}"}, which was explained for an earlier test. (See explanation 1 in the Explanations section below).
  - Lines 94 and 95: These rebuild selection by extracting the numbers from sel, sorting them and making them unique, and then removing the newlines this process has added. See the notes for lines 82-87 above and explanation 2 below.
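The validation stage can be sketched in the same spirit. Again this is a reconstruction from the description rather than the original code: the function name check_selection and the global variable checked are made up for the example. Two liberties are taken. The variables err and sel are initialised to empty strings, so an ordinary -n test works and the unusual test mentioned above is not needed, and the comparison uses a 10# prefix so that a number written as 08 is not treated as octal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the numbers in the list that lie between 1 and
# 'max', report the rest, and leave the good ones sorted and unique in the
# global variable 'checked'.
check_selection() {
    local max="${1:?Usage: check_selection max numbers}"
    local selection="${2:-}"
    local i msg err='' sel='' exitcode=0
    checked=''

    for i in $selection; do
        # 10# forces base 10 so that a number like 08 is not read as octal
        if (( 10#$i < 1 || 10#$i > max )); then
            err+="$i "
        else
            sel+="$i "
        fi
    done

    if [[ -n $err ]]; then
        # One number per line for sort; -n numeric order, -u drop duplicates
        msg="$(for i in $err; do echo "$i"; done | sort -un)"
        msg="${msg//$'\n'/ }"       # turn the newlines back into spaces
        echo "Value(s) out of range: $msg" >&2
        exitcode=1
    fi

    if [[ -n $sel ]]; then
        checked="$(for i in $sel; do echo "$i"; done | sort -un)"
        checked="${checked//$'\n'/ }"
    fi
    return $exitcode
}

check_selection 10 '7 9 8 7 6 2 11 0 11' || echo "Some values were rejected"
echo "$checked"
```

With this input the error message lists 0 and 11 once each, and the final echo prints 2 6 7 8 9.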
Line 102: Here the variable result is set to the contents of selection. Now, since result is a nameref variable containing the name of a variable passed in when the range_parse function was called it is that variable that receives the result.
Line 104: Here the function returns to the caller. The value returned is whatever is in exitcode. By default this is zero, but if any sort of error has occurred it will have been set to 1, as discussed earlier.
Explanations
1. The expression ${err+"${err}"} (see Lines 82-87 above), also ${sel+"${sel}"} (see Line 93 above): As far as I can determine this strange expression is needed because of a bug in the version of Bash I am running.
In all of my scripts I include the line set -o nounset (set -u is equivalent) which has the result of treating the use of unset variables in parameter expansion as a fatal error. The trouble is that either err or sel might be unset in this function in some circumstances. This will result in the function stopping with an error. It should be possible to test a variable to see whether it is unset without the function crashing!
This expression is a case of a parameter expansion of the ${parameter:+word} type, but without the colon. Without the colon the only question asked is whether the parameter is set: if it is unset the expansion yields nothing, and if it is set (even to an empty string) it yields the word, which here is the variable’s own contents. So the result is a null string if the parameter is unset or null, or the contents if it has any, and it does so without triggering the unset variable alarm.
I don’t like resorting to “magic” solutions like this but it seems to be a viable way of avoiding this issue.
2. The expression $'\n' (see Line 84 above): This is an example of ANSI-C quoting. See the GNU Bash Reference Manual in the ANSI-C Quoting section for the full details.
The construct must be written as $'string' which is expanded to whatever characters are in the string with certain backslash sequences being replaced according to the ANSI-C standard. This allows characters such as newline (\n) and carriage return (\r) as well as Unicode characters to be easily inserted. For example echo $'\U2192' produces → (in a browser and in many terminals).
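The behaviour in explanation 1 can be tried out with a few lines like these. The function demo is invented for the purpose, and wrapping the expression in a [[ -n ... ]] test is my assumption about how it might be used, not a copy of the test in range_parse.sh.

```bash
#!/usr/bin/env bash
set -o nounset      # same as 'set -u'

# Hypothetical function showing the test on a declared but unset variable
demo() {
    local err       # declared, but given no value, so still unset

    # A plain "$err" here would stop the script with 'err: unbound variable'
    if [[ -n ${err+"${err}"} ]]; then
        echo "err contains: $err"
    else
        echo "err is unset or empty"
    fi

    err='11 12'
    if [[ -n ${err+"${err}"} ]]; then
        echo "err contains: $err"
    fi
}

demo
```

The first test takes the else branch and the second reports the contents. Another form often used for the same purpose is ${err:-}, which expands to an empty string when the variable is unset and also keeps nounset quiet.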
Possible improvements
This function has been around the block for quite a few years. I wrote it originally for a script I developed at work in the 2000s and have been refining and using it in many other projects since. Preparing it for this episode has resulted in some further refinements!
- The initial space removal means that '7,1-5' and '7 , 1 - 5 ' are identical as far as the algorithm is concerned. It also means that '4 2', which might have been written that way because a comma was omitted, is treated as '42' which might be a problem.
- The command substitutions which sort lists of numbers and make them unique have to make use of the sort command. Ideally I’d like to avoid using external programs in my Bash scripts, but trying to do this type of thing in Bash where sort does a fine job seems a little extreme!
- The reporting of all of the numbers which are out of range could lead to a slightly bizarre error report if called with arguments such as 20 '5-200' (where the second zero was added in error). Everything from 21-200 will be reported as an error! The function could be cleverer in this regard.
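One possible way of being cleverer about the last point would be to look at the two ends of a range before expanding it, and to reject the whole range with a single message if either end is out of bounds. The sketch below is only an idea, not something the current function does. The name range_in_bounds and the message wording are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: is the range 'low-high' entirely within 1..max?
range_in_bounds() {
    local max="${1:?Usage: range_in_bounds max low-high}"
    local item="${2:?Usage: range_in_bounds max low-high}"

    if [[ $item =~ ^([0-9]+)-([0-9]+)$ ]]; then
        # BASH_REMATCH holds the two captured numbers; 10# forces base 10
        local low=$((10#${BASH_REMATCH[1]})) high=$((10#${BASH_REMATCH[2]}))
        if (( low < 1 || low > max || high < 1 || high > max )); then
            echo "Range out of bounds: $item (maximum is $max)" >&2
            return 1
        fi
        return 0
    fi
    echo "Not a range: $item" >&2
    return 1
}

range_in_bounds 20 '5-200' || echo "Rejected as a whole"
range_in_bounds 20 '5-20' && echo "Accepted"
```

Whether rejecting the whole range is better than keeping 5 to 20 and reporting the rest is a design choice; this version simply trades the long list of rejected numbers for one line.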
Examples of use
Simple command line usage
$ source range_parse.sh
$ range_parse 10 '1-3,9,7' mylist
$ echo "$mylist"
1 2 3 7 9
$ range_parse 10 '9-6,1,11' mylist
Value(s) out of range: 11
$ echo "$mylist"
1 6 7 8 9
$ range_parse 10 1,,2 somevar
$ echo "$somevar"
1 2
The range_parse function does not care what order the numbers and ranges are organised in the comma-separated list. It does not care about range overlaps either, nor does it care about empty items in the list. It flags items which are out of range but still prepares a final list.
A simple demo script
The simple script called range_demo.sh, which may be downloaded from the HPR website, is as follows:
#!/bin/bash -
#
# Test script to run the range_parse function
#
set -o nounset # Treat unset variables as an error
#
# Source the function. In a real script you'd want to provide a path and check
# the file is actually there.
#
source range_parse.sh
#
# Call range_parse with the first two arguments provided to this script. Save
# the output in the variable 'parsed'. The function is called in an 'if'
# statement such that it takes different action depending on whether the
# parsing was successful or not.
#
if range_parse "$1" "$2" parsed; then
    echo "Success"
    echo "Parsed list: ${parsed}"
else
    echo "Failure"
fi
exit

An example call might be:
$ ./range_demo.sh 10 1,9-7,2
Success
Parsed list: 1 2 7 8 9
If you download these files and test the function and find any errors please let me know!!
Links
- The GNU Bash Reference Manual
- ANSI-C Quoting section
- Previous HPR episodes in this group Useful Bash functions:
- Download the range_parse function and the range_demo.sh test script.
1. Why do it this way? I did a double-take while preparing these notes wondering why I had organised the logic here in this way.
The first part of the loop is concerned with getting the next item from a comma-separated list. At that point $item contains either a bare number or a 'number-number' range. The differentiator between the two is a hyphen, so checking for that character allows the complex regular expression on line 52 to be omitted if it is not there.
If you can think of a better way of doing this please let me know in the comments or by email.