Files
hpr_website/www/eps/hpr2816/hpr2816_full_shownotes_djvu.txt

477 lines
12 KiB
Plaintext
Executable File
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Gnu Awk - Part 14 (HPR Show 2816)
Redirection of input and output - part 1
Dave Morriss
Gnu Awk - Part 14 (HPR Show 2816)
Introduction
This is the fourteenth episode of the “ Learning Awk ” series which is being
produced by b-veezi and myself.
In this episode and the next I want to start looking at redirection within Awk
programs. I had originally intended to cover the subject in one episode, but there
is just too much.
So, in the first episode I will be starting with output redirection and then in the
next episode will spend some time looking at the get line command used for
explicit input, often with redirection.
Redirection of output
So far we have seen that when an awk script uses print or printf the output is
written to the standard output (the screen in most cases). The redirection feature
in awk allows output to be written elsewhere.
How this is achieved is described in the following sections.
Redirecting to a file
print items > output-file
printf format, items > output-file
Here, 'items' denotes the items to be printed, 'format' is the format expression
for 'printf', ' output-file ' is an expression which is converted to a string and
contains the name of the output file.
Heres a simple example. It uses the file of fruit data introduced in episode
number 2. This data file is included with this show ( awk!4 fruit data.txt ):
$ awk 'NR > 1 {print $1 > "fruit_names"}' awkl4_fruit_data.txt
$ cat fruit_names
apple
banana
strawberry
grape
apple
plum
kiwi
potato
pineapple
Here the script skips the first line of headers, then prints out the fruit name in
field 1 to the file called ' fruit_names'. Notice the file name is enclosed in
quotes because it is a string.
The script will loop once per line of the input file executing the redirection each
time. However the file contains all of the names in the same order as the input
file. This is because of the following behaviour:
• The output file is erased before the first output is written to it.
• Subsequent writes to the same file do not erase it but append to it.
It is important to be aware that redirection in Awk is similar to but not the same
as that in shell scripts.
What we have done here is not really different from running the following
command where the shell deals with redirection:
$ awk 'NR > 1 {print $1}' awkl4_fruit_data.txt > fruit_names
Here Awk is writing to the standard output stream and the shell is capturing this
stream and redirecting it to a file. However, things get more complex if the
requirement is to write to more than one file from a script.
The following downloadable script ( awk!4 exl.awk i writes to a collection of
output files:
$ cat awkl4_exl.awk
#!/usr/bin/awk -f
# Downloadable example 1 for GNU Awk Part 14
NR > 1 {
colour = $2
fname = "awkl4 " colour " fruit"
printf "Writing %s to %s\n",$1,fname
print $1 > fname
}
Running the script writes to files called ' awkl4_brown_fruit ' and similar in the
current directory:
$ ./awkl4_exl.awk awkl4_fruit_data.txt
Writing apple to awkl4_red_fruit
Writing banana to awkl4_yellow_fruit
Writing strawberry to awkl4_red_fruit
Writing grape to awkl4_purple_fruit
Writing apple to awkl4_green_fruit
Writing plum to awkl4_purple_fruit
Writing kiwi to awkl4_brown_fruit
Writing potato to awkl4_brown_fruit
Writing pineapple to awkl4_yellow_fruit
The script announces what its doing, which is a little superfluous but helps to
visualise whats going on.
Notice that since the output file names are generated dynamically and are liable
to change between each line read from the input file the script is doing what was
described earlier - creating them (or emptying them if they already exist) and
then appending to them once open. All the files are closed when the script exits
of course.
The files created are shown below and the contents of one displayed:
$ Is awkl4_*_fruit
awkl4_brown_fruit awkl4_green_fruit awkl4_purple_fruit
awkl4_red_fruit awkl4_yellow_fruit
$ cat awkl4_purple_fruit
grape
plum
Redirecting and appending to an existing file
The next type of redirection uses two greater than signs:
print items » output-file
printf format, items » output-file
In this case the output file is expected to exist already. If it does then its contents
are not erased but are appended to. If the file does not exist then it is created and
written to as before.
When redirecting to a file in a shell script its common to see something like
this:
echo "Script starting" > script.log
echo "Script ending" » script.log
The use of '» 1 in the second case is necessary because otherwise the file would
have been cleared out before the message was written. Each redirection like this
in Bash involves opening and closing the output file.
In an awk script on the other hand - as we have seen - the file is kept open by the
script until it is closed on exit. There is a ' close' command which will do this
explicitly, and we will look at this shortly.
Redirecting to another program
This type of redirection uses a pipe symbol to send output to a string containing
a command (or commands) for the shell.
print items | command
printf format, items | command
The following example shows the fruit names being written to a pair of
commands in a shell pipeline:
$ awk 'NR > 1 {print $1 | "sort -u | nl"}' awkl4_fruit_data.txt
1 apple
2 banana
3 grape
4 kiwi
5 pineapple
6 plum
7 potato
8 strawberry
The names are sorted using the ' sort' command, requesting that the results be
made unique (' -u'). The output from the sort is run through 'nl' which
numbers the lines.
As the awk script is run, a sub-process is executed with the two commands. The
first name is then sent to this process, and this repeats with each successive
name. The sub-process finishes when the script finishes.
In this case the 'sort' command will have accumulated all the names, then on
the connection being terminated it will perform the sort and pass the results to
' nl'.
There is a 'close' command in awk which will close the redirection to the
command(s) or to a file. The argument to 'close' needs to be the exact
command(s) which define the process (or the exact file name). For this reason
its a good idea to store the commands or file name in an awk variable.
The following downloadable script ( awk!4 ex2.awk ) shows the variable ' cmd'
being used to hold the shell commands. The connection is closed to show how it
would be done, though there is no actual need to do so here.
$ cat awkl4_ex2.awk
#!/usr/bin/awk -f
# Downloadable example 2 for GNU Awk Part 14
BEGIN {
cmd = "sort -u | nl"
}
NR > 1 {
print $1 | cmd
}
END {
close(cmd)
}
Running the script gives the same result as before:
$ ./awkl4_ex2.awk awkl4_fruit_data.txt
1 apple
2 banana
3 grape
4 kiwi
5 pineapple
6 plum
7 potato
8 strawberry
Heres a more real world example (at least its real in my world). When Im
preparing an HPR show like this which involves a number of example scripts I
need to run them for testing purposes. I have a main directory for HPR shows
and a sub-directory per show. I like to make soft links to the examples in this
sub-directory so I can run tests without hopping about between directories.
In general I make links in this way:
In -s -f PathToExample BasenameOfExample
I wrote an Awk script to help me which takes path names as input and constructs
shell commands which it pipes into ' sh '.
The following downloadable script ( awk!4 ex3.awk l shows the process.
$ cat awkl4_ex3.awk
#!/usr/bin/awk -f
# Downloadable example 3 for GNU Awk Part 14
{
# Split the path up into components
n = split($0,a,"/")
if (n < 2) {
print "Error in path",$0 > "/dev/stderr"
next
}
# Build the shell command so we can show it
cmd = sprintf("[ -e %s ] && In -s -f %s %s",$0,$0,a[n])
print "» " cmd
# Feed the command to the shell
printf("%s\n",cmd) | "sh"
}
END {
close("sh")
}
The script expects to be given one or more pathnames on standard input. It first
takes the path and splits it up based on the ' /' character. Since ' split' returns
the number of elements then that number will index the last element. We check
that its sensible before proceeding. Note that the error message generated by the
' if' test is redirected to ' /dev/stderr 1 . Well be looking at this shortly.
We use 'sprintf ' to make the shell command. It first adds a test that the file
path leads to a file, then if so the shell command uses the 'In' command to
make a soft link. We use the ' - f' option which forces the creation to proceed
even if the link already exists. The first argument to 'In' is the path and the
second the basename (last component) of the file path.
This command is printed for reference, then it is executed by printing to a
process running ' sh ' (which will be the Bourne shell or similar by default).
Running the script can be achieved thus. We use 'printf ' as a simple way of
adding a newline to each pathname. The paths come from a filename expansion
which includes a question mark. Running it gives the following results:
$ printf "%s\n" Gnu_Awk_Part_14/hpr2816/awkl4_ex?.awk | ./awkl4_ex3.awk
» [ -e Gnu_Awk_Part_14/hpr2816/awkl4_exl.awk ] && In -s -f
Gnu_Awk_Part_14/hpr2816/awkl4_exl.awk awkl4_exl.awk
» [ -e Gnu_Awk_Part_14/hpr2816/awkl4_ex2.awk ] && In -s -f
Gnu_Awk_Part_14/hpr2816/awkl4_ex2.awk awkl4_ex2.awk
» [ -e Gnu_Awk_Part_14/hpr2816/awkl4_ex3.awk ] && In -s -f
Gnu_Awk_Part_14/hpr2816/awkl4_ex3.awk awk!4_ex3.awk
This is a script which I can use in all sorts of other contexts, though it probably
needs some refinement to be completely foolproof.
Note that some caution is needed when writing shell commands in awk because
of the potential pitfalls when using quotes. See the GNU Awk Users Guide
section 10.2.9 for hints.
Redirecting to a coprocess
This type of redirection uses a pipe symbol and an ampersand to send output to a
string containing a command (or commands) for the shell.
print items |& command
printf format, items |& command
This is an advanced feature which is a gawk extension. Unlike the previous
redirection, which sends to a program, this form sends to a program and allows
the programs output to be read back. That is why the command is referred to as
a coprocess.
Since it is necessary to use our next main topic ' get line' to achieve all of this
well postpone discussing the subject until the next episode.
Redirecting to special files
There are three standard Unix channels that are known as standard input,
standard output, and standard error output (or more commonly standard error).
These are connected to keyboard and screen in the default case.
Normally a Unix program or script reads from standard input and writes to
standard output and generates any error messages on standard error. There is a
lot more to this than described here but this will suffice for the moment.
Gnu Awk can use three special file names to access these channels:
• /dev/stdin: standard input
• /dev/stdout: standard output
• /dev/stderr: standard error output
So, for example, a script can write explicitly to standard error with a command
of the form:
print "Invalid number" > "/dev/stderr"
See the GNU Awk Users Guide section 5.7 on this subject for more details.
There are also other special names available as described in the Guide in section
T8.
Next episode
I will be continuing with the second half of this episode in a few weeks.
Links
• GNU Awk Users Guide
o Redirecting output of print and printf
° Special Files for Standard Preopened Data Streams
° Special File names in gawk
• Previous shows in this series on HPR:
° “ Gnu Awk - Part 1 ” - episode 2114
° “ Gnu Awk - Part 2 ” - episode 2129
o
“ Gnu Awk - Part 3 ” - episode 2143
o “ Gnu Awk - Part 4 ” - episode 2163
° “ Gnu Awk - Part 5 ” - episode 2184
° “ Gnu Awk - Part 6 ” - episode 2238
° “ Gnu Awk - Part 7 ” - episode 2330
° “ Gnu Awk - Part 8 ” - episode 2438
° “ Gnu Awk - Part 9 ” - episode 2476
o “ Gnu Awk - Part 10 ” - episode 2526
° “ Gnu Awk - Part 11 ” - episode 2554
° “ Gnu Awk - Part 12 ” - episode 2610
° “ Gnu Awk - Part 13 ” - episode 2804
Resources:
° ePub version of these notes
° Examples: awk!4 fruit data.txt . awk!4 exl.awk . awk!4 ex2.awk .
awk!4 ex3.awk