Gnu Awk - Part 14 (HPR Show 2816)
Redirection of input and output - part 1
Dave Morriss

Introduction

This is the fourteenth episode of the “Learning Awk” series which is being
produced by b-yeezi and myself.

In this episode and the next I want to start looking at redirection within Awk
programs. I had originally intended to cover the subject in one episode, but there
is just too much.

So, in this first episode I will be starting with output redirection and then in the
next episode will spend some time looking at the getline command used for
explicit input, often with redirection.

Redirection of output

So far we have seen that when an awk script uses print or printf the output is
written to the standard output (the screen in most cases). The redirection feature
in awk allows output to be written elsewhere.

How this is achieved is described in the following sections.

Redirecting to a file

    print items > output-file
    printf format, items > output-file

Here, 'items' denotes the items to be printed, 'format' is the format expression
for 'printf', and 'output-file' is an expression which is converted to a string and
contains the name of the output file.

Here’s a simple example. It uses the file of fruit data introduced in episode
number 2. This data file is included with this show (awk14_fruit_data.txt):

    $ awk 'NR > 1 {print $1 > "fruit_names"}' awk14_fruit_data.txt
    $ cat fruit_names
    apple
    banana
    strawberry
    grape
    apple
    plum
    kiwi
    potato
    pineapple

Here the script skips the first line of headers, then prints the fruit name in
field 1 to the file called 'fruit_names'. Notice the file name is enclosed in
quotes because it is a string.

The script will loop once per line of the input file, executing the redirection each
time. However, the file contains all of the names in the same order as the input
file. This is because of the following behaviour:

• The output file is erased before the first output is written to it.
• Subsequent writes to the same file do not erase it but append to it.

It is important to be aware that redirection in Awk is similar to but not the same
as that in shell scripts.
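
To see the difference side by side, here is a minimal sketch which writes the same three lines once from awk and once from a shell loop, both using a single '>'. The file names 'awk_out.txt' and 'shell_out.txt' are made up for the illustration, and everything happens in a scratch directory.

```shell
# Work in a scratch directory so that nothing of yours is overwritten
cd "$(mktemp -d)" || exit 1

# Awk: the file is emptied when it is first opened, then stays open,
# so all three lines end up in it
printf 'one\ntwo\nthree\n' | awk '{ print $0 > "awk_out.txt" }'

# Shell: every redirection reopens the file and empties it again,
# so only the last line survives
for word in one two three; do
    echo "$word" > shell_out.txt
done

# Count the lines in each file (with awk, naturally)
awk_lines=$(awk 'END { print NR }' awk_out.txt)
shell_lines=$(awk 'END { print NR }' shell_out.txt)
echo "awk wrote $awk_lines lines, the shell loop left $shell_lines"
```

The awk file should hold all three lines while the shell file holds only 'three'.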

What we have done here is not really different from running the following
command where the shell deals with redirection:

    $ awk 'NR > 1 {print $1}' awk14_fruit_data.txt > fruit_names

Here Awk is writing to the standard output stream and the shell is capturing this
stream and redirecting it to a file. However, things get more complex if the
requirement is to write to more than one file from a script.

The following downloadable script (awk14_ex1.awk) writes to a collection of
output files:

    $ cat awk14_ex1.awk
    #!/usr/bin/awk -f

    # Downloadable example 1 for GNU Awk Part 14

    NR > 1 {
        colour = $2
        fname = "awk14_" colour "_fruit"

        printf "Writing %s to %s\n",$1,fname
        print $1 > fname
    }

Running the script writes to files called 'awk14_brown_fruit' and similar in the
current directory:

    $ ./awk14_ex1.awk awk14_fruit_data.txt
    Writing apple to awk14_red_fruit
    Writing banana to awk14_yellow_fruit
    Writing strawberry to awk14_red_fruit
    Writing grape to awk14_purple_fruit
    Writing apple to awk14_green_fruit
    Writing plum to awk14_purple_fruit
    Writing kiwi to awk14_brown_fruit
    Writing potato to awk14_brown_fruit
    Writing pineapple to awk14_yellow_fruit

The script announces what it’s doing, which is a little superfluous but helps to
visualise what’s going on.

Notice that since the output file names are generated dynamically and are liable
to change between each line read from the input file, the script is doing what was
described earlier: creating them (or emptying them if they already exist) and
then appending to them once open. All the files are closed when the script exits,
of course.

The files created are shown below and the contents of one displayed:

    $ ls awk14_*_fruit
    awk14_brown_fruit  awk14_green_fruit  awk14_purple_fruit
    awk14_red_fruit    awk14_yellow_fruit
    $ cat awk14_purple_fruit
    grape
    plum
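
With only five colours, leaving every output file open until the script exits is no problem. If the second field could take hundreds of different values, though, some awk implementations may complain about too many open files (gawk, as far as I know, works around this by itself). The following is a sketch of one way round that, not part of the downloadable examples: it uses 'close', which is described later in these notes, together with the appending form of redirection, '>>', so that reopening a file does not empty it. The data and the 'by_colour_' file names are invented for the illustration.

```shell
# Work in a scratch directory so that nothing of yours is overwritten
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in data: a header line, then name and colour
printf 'name colour\napple red\nbanana yellow\nstrawberry red\n' > sample.txt

awk 'NR > 1 {
    fname = "by_colour_" $2 ".txt"
    # Append, so that reopening the file later does not empty it
    print $1 >> fname
    # Close straight away: only one output file is open at any time
    close(fname)
}' sample.txt

red=$(cat by_colour_red.txt)
yellow=$(cat by_colour_yellow.txt)
printf 'red:\n%s\nyellow:\n%s\n' "$red" "$yellow"
```

The price of using '>>' is that files left over from an earlier run are added to rather than replaced, so in real use they would need to be removed first.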

Redirecting and appending to an existing file

The next type of redirection uses two greater than signs:

    print items >> output-file
    printf format, items >> output-file

In this case the output file is expected to exist already. If it does then its contents
are not erased but are appended to. If the file does not exist then it is created and
written to as before.
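
Here is a small sketch of the two forms acting on a file which already has something in it. The file name 'notes.txt' and its contents are invented for the illustration.

```shell
# Work in a scratch directory so that nothing of yours is overwritten
cd "$(mktemp -d)" || exit 1

# A hypothetical file that already has something in it
echo "existing line" > notes.txt

# Two greater than signs: keep what is there and add to the end
printf 'first\nsecond\n' | awk '{ print $0 >> "notes.txt" }'
appended=$(cat notes.txt)

# One greater than sign: the old contents go when awk first opens the file
printf 'first\nsecond\n' | awk '{ print $0 > "notes.txt" }'
replaced=$(cat notes.txt)

printf 'After appending:\n%s\n\nAfter overwriting:\n%s\n' "$appended" "$replaced"
```

After the first awk command the file holds three lines, starting with 'existing line'. After the second it holds only 'first' and 'second'.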

When redirecting to a file in a shell script it’s common to see something like
this:

    echo "Script starting" > script.log
    echo "Script ending" >> script.log

The use of '>>' in the second case is necessary because otherwise the file would
have been cleared out before the message was written. Each redirection like this
in Bash involves opening and closing the output file.

In an awk script on the other hand - as we have seen - the file is kept open by the
script until it is closed on exit. There is a 'close' command which will do this
explicitly, and we will look at this shortly.

Redirecting to another program

This type of redirection uses a pipe symbol to send output to a command (or
commands) for the shell, given as a string.

    print items | command
    printf format, items | command

The following example shows the fruit names being written to a pair of
commands in a shell pipeline:

    $ awk 'NR > 1 {print $1 | "sort -u | nl"}' awk14_fruit_data.txt
         1  apple
         2  banana
         3  grape
         4  kiwi
         5  pineapple
         6  plum
         7  potato
         8  strawberry

The names are sorted using the 'sort' command, requesting that the results be
made unique ('-u'). The output from the sort is run through 'nl' which
numbers the lines.

The first time the redirection is executed, awk starts a sub-process running the
two commands. The first name is then sent to this process, and this repeats with
each successive name. The sub-process finishes when the script finishes.

In this case the 'sort' command will have accumulated all the names, then on
the connection being terminated it will perform the sort and pass the results to
'nl'.

There is a 'close' command in awk which will close the redirection to the
command(s) or to a file. The argument to 'close' needs to be the exact
command(s) which define the process (or the exact file name). For this reason
it’s a good idea to store the commands or file name in an awk variable.
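
To show what closing a pipe actually achieves, here is a minimal sketch which sorts two batches of names separately. The input, with a line of three dashes between the batches, is invented for the illustration. The command string is held in 'cmd' so that exactly the same string is used for the pipe and for 'close'.

```shell
sorted=$(printf 'pear\napple\n---\nplum\nfig\n' | awk '
BEGIN { cmd = "sort" }

# A line of dashes ends a batch: closing the pipe lets sort finish
# and print what it has collected so far
/^---$/ { close(cmd); next }

# The same string names the pipe each time, so after a close
# this starts a fresh sort process
{ print $0 | cmd }
')
echo "$sorted"
```

Without the 'close' all four names would go to a single 'sort' and come out as one sorted list. With it, 'apple' and 'pear' appear first, followed by 'fig' and 'plum'.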

The following downloadable script (awk14_ex2.awk) shows the variable 'cmd'
being used to hold the shell commands. The connection is closed to show how it
would be done, though there is no actual need to do so here.

    $ cat awk14_ex2.awk
    #!/usr/bin/awk -f

    # Downloadable example 2 for GNU Awk Part 14

    BEGIN {
        cmd = "sort -u | nl"
    }

    NR > 1 {
        print $1 | cmd
    }

    END {
        close(cmd)
    }

Running the script gives the same result as before:

    $ ./awk14_ex2.awk awk14_fruit_data.txt
         1  apple
         2  banana
         3  grape
         4  kiwi
         5  pineapple
         6  plum
         7  potato
         8  strawberry

Here’s a more real world example (at least it’s real in my world). When I’m
preparing an HPR show like this which involves a number of example scripts I
need to run them for testing purposes. I have a main directory for HPR shows
and a sub-directory per show. I like to make soft links to the examples in this
sub-directory so I can run tests without hopping about between directories.

In general I make links in this way:

    ln -s -f PathToExample BasenameOfExample

I wrote an Awk script to help me which takes path names as input and constructs
shell commands which it pipes into 'sh'.

The following downloadable script (awk14_ex3.awk) shows the process.

    $ cat awk14_ex3.awk
    #!/usr/bin/awk -f

    # Downloadable example 3 for GNU Awk Part 14

    {
        # Split the path up into components
        n = split($0,a,"/")
        if (n < 2) {
            print "Error in path",$0 > "/dev/stderr"
            next
        }

        # Build the shell command so we can show it
        cmd = sprintf("[ -e %s ] && ln -s -f %s %s",$0,$0,a[n])
        print ">> " cmd

        # Feed the command to the shell
        printf("%s\n",cmd) | "sh"
    }

    END {
        close("sh")
    }

The script expects to be given one or more pathnames on standard input. It first
takes the path and splits it up based on the '/' character. Since 'split' returns
the number of elements, that number will index the last element. We check
that it’s sensible before proceeding. Note that the error message generated by the
'if' test is redirected to '/dev/stderr'. We’ll be looking at this shortly.
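
The way 'split' is used to find the last component can be tried on its own. This is just a sketch: the array name 'parts' is my choice for the illustration, and the second command shows the value of 'n' for a name with no '/' in it, which is the case the 'if' test rejects.

```shell
# A path with three components: n is 3 and parts[3] is the basename
base=$(echo "Gnu_Awk_Part_14/hpr2816/awk14_ex1.awk" | awk '{
    n = split($0, parts, "/")   # n is the number of pieces
    print n, parts[n]           # parts[n] is the last piece
}')
echo "$base"

# No slash at all: split returns 1, so the test n < 2 catches it
count=$(echo "awk14_ex1.awk" | awk '{ print split($0, parts, "/") }')
echo "$count"
```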

We use 'sprintf' to make the shell command. It first adds a test that the file
path leads to a file, then if so the shell command uses the 'ln' command to
make a soft link. We use the '-f' option which forces the creation to proceed
even if the link already exists. The first argument to 'ln' is the path and the
second the basename (last component) of the file path.

This command is printed for reference, then it is executed by printing to a
process running 'sh' (which will be the Bourne shell or similar by default).

Running the script can be achieved thus. We use 'printf' as a simple way of
adding a newline to each pathname. The paths come from a filename expansion
which includes a question mark. Running it gives the following results:

    $ printf "%s\n" Gnu_Awk_Part_14/hpr2816/awk14_ex?.awk | ./awk14_ex3.awk
    >> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex1.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex1.awk awk14_ex1.awk
    >> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex2.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex2.awk awk14_ex2.awk
    >> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex3.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex3.awk awk14_ex3.awk

This is a script which I can use in all sorts of other contexts, though it probably
needs some refinement to be completely foolproof.

Note that some caution is needed when writing shell commands in awk because
of the potential pitfalls when using quotes. See the GNU Awk User’s Guide
section 10.2.9 for hints.
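
As an illustration of the sort of pitfall involved, a path containing a space or an apostrophe would break a shell command built by simply pasting the path in. The following is a minimal sketch of a quoting helper. The function 'shell_quote' here is my own simplified version written for this illustration, not the one from the Guide. It writes the single quote as the octal escape \047 because the awk program itself sits inside single quotes on the command line.

```shell
quoted=$(printf '%s\n' "it's a file.awk" | awk '
# Hypothetical helper: wrap a string in single quotes for the shell.
# Each embedded single quote (\047) is replaced by the four characters
# quote, backslash, quote, quote so that the shell sees it literally.
function shell_quote(s) {
    gsub(/\047/, "\047\\\047\047", s)
    return "\047" s "\047"
}

{ print shell_quote($0) }
')
echo "$quoted"

# Let the shell parse the quoted form and check it gets the name back
eval "set -- $quoted"
echo "$1"
```

In a script which builds commands for 'sh', each pathname would be passed through such a function before being placed in the command string.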

Redirecting to a coprocess

This type of redirection uses a pipe symbol and an ampersand to send output to a
command (or commands) for the shell, given as a string.

    print items |& command
    printf format, items |& command

This is an advanced feature which is a gawk extension. Unlike the previous
redirection, which sends to a program, this form sends to a program and allows
the program’s output to be read back. That is why the command is referred to as
a coprocess.

Since it is necessary to use our next main topic 'getline' to achieve all of this
we’ll postpone discussing the subject until the next episode.

Redirecting to special files

There are three standard Unix channels that are known as standard input,
standard output, and standard error output (or more commonly standard error).
These are connected to keyboard and screen in the default case.

Normally a Unix program or script reads from standard input and writes to
standard output and generates any error messages on standard error. There is a
lot more to this than described here but this will suffice for the moment.

Gnu Awk can use three special file names to access these channels:

• /dev/stdin: standard input
• /dev/stdout: standard output
• /dev/stderr: standard error output

So, for example, a script can write explicitly to standard error with a command
of the form:

    print "Invalid number" > "/dev/stderr"

See the GNU Awk User’s Guide section 5.7 on this subject for more details.
There are also other special names available as described in the Guide in section
5.8.
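
The benefit of sending messages to standard error is that the shell can then keep them apart from the real output. Here is a minimal sketch: the input values and the file names 'results.txt' and 'errors.txt' are invented for the illustration.

```shell
# Work in a scratch directory so that nothing of yours is overwritten
cd "$(mktemp -d)" || exit 1

# Double each number; anything that is not a whole number is reported
# on standard error instead
printf '12\nabc\n7\n' |
    awk '{
        if ($1 ~ /^[0-9]+$/) {
            print $1 * 2
        } else {
            print "Invalid number: " $1 > "/dev/stderr"
        }
    }' > results.txt 2> errors.txt

results=$(cat results.txt)
errors=$(cat errors.txt)
printf 'Results:\n%s\nErrors:\n%s\n' "$results" "$errors"
```

The shell sends standard output to one file and standard error to the other, so the doubled numbers and the complaint about 'abc' end up in different places.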

Next episode

I will be continuing with the second half of this episode in a few weeks.

Links

• GNU Awk User’s Guide
    ◦ Redirecting output of print and printf
    ◦ Special Files for Standard Preopened Data Streams
    ◦ Special File names in gawk
• Previous shows in this series on HPR:
    ◦ “Gnu Awk - Part 1” - episode 2114
    ◦ “Gnu Awk - Part 2” - episode 2129
    ◦ “Gnu Awk - Part 3” - episode 2143
    ◦ “Gnu Awk - Part 4” - episode 2163
    ◦ “Gnu Awk - Part 5” - episode 2184
    ◦ “Gnu Awk - Part 6” - episode 2238
    ◦ “Gnu Awk - Part 7” - episode 2330
    ◦ “Gnu Awk - Part 8” - episode 2438
    ◦ “Gnu Awk - Part 9” - episode 2476
    ◦ “Gnu Awk - Part 10” - episode 2526
    ◦ “Gnu Awk - Part 11” - episode 2554
    ◦ “Gnu Awk - Part 12” - episode 2610
    ◦ “Gnu Awk - Part 13” - episode 2804
• Resources:
    ◦ ePub version of these notes
    ◦ Examples: awk14_fruit_data.txt, awk14_ex1.awk, awk14_ex2.awk,
      awk14_ex3.awk