Move under www to ease rsync
476
www/eps/hpr2816/hpr2816_full_shownotes_djvu.txt
Executable file
@@ -0,0 +1,476 @@
Gnu Awk - Part 14 (HPR Show 2816)

Redirection of input and output - part 1

Dave Morriss

Introduction

This is the fourteenth episode of the “Learning Awk” series, which is being
produced by b-yeezi and myself.

In this episode and the next I want to start looking at redirection within Awk
programs. I had originally intended to cover the subject in one episode, but
there is just too much.

So, in this first episode I will start with output redirection, and in the next
episode I will spend some time looking at the getline command, which is used for
explicit input, often with redirection.

Redirection of output

So far we have seen that when an awk script uses print or printf the output is
written to the standard output (the screen in most cases). The redirection
feature in awk allows output to be written elsewhere.

How this is achieved is described in the following sections.

Redirecting to a file

print items > output-file
printf format, items > output-file

Here, 'items' denotes the items to be printed, 'format' is the format expression
for 'printf', and 'output-file' is an expression which is converted to a string
and contains the name of the output file.

Here’s a simple example. It uses the file of fruit data introduced in episode
number 2. This data file is included with this show (awk14_fruit_data.txt):

$ awk 'NR > 1 {print $1 > "fruit_names"}' awk14_fruit_data.txt
$ cat fruit_names
apple
banana
strawberry
grape
apple
plum
kiwi
potato
pineapple

Here the script skips the first line of headers, then prints the fruit name in
field 1 to the file called 'fruit_names'. Notice that the file name is enclosed
in quotes because it is a string.

The script loops once per line of the input file, executing the redirection each
time. However, the file ends up containing all of the names in the same order as
the input file. This is because of the following behaviour:

• The output file is erased before the first output is written to it.
• Subsequent writes to the same file do not erase it but append to it.

It is important to be aware that redirection in Awk is similar to, but not the
same as, that in shell scripts.
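
The erase-once-then-append behaviour can be checked with a small experiment.
This is only a sketch: the scratch file comes from 'mktemp' and the variable
name 'out' is made up for the illustration, not part of the show's examples.

```shell
# Scratch file with some old contents in it (hypothetical name via mktemp)
out=$(mktemp)
printf 'stale contents\n' > "$out"

# Three writes with ">" in a single awk run: the file is emptied once,
# when it is first opened, and the later writes are appended
awk -v out="$out" 'BEGIN {
    print "first"  > out
    print "second" > out
    print "third"  > out
}'

cat "$out"
```

The old contents should be gone and all three new lines present. Three separate
'echo ... > file' commands in a shell script would have left only the last one.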

What we have done here is not really different from running the following
command where the shell deals with redirection:

$ awk 'NR > 1 {print $1}' awk14_fruit_data.txt > fruit_names

Here Awk is writing to the standard output stream and the shell is capturing
this stream and redirecting it to a file. However, things get more complex if
the requirement is to write to more than one file from a script.

The following downloadable script (awk14_ex1.awk) writes to a collection of
output files:

$ cat awk14_ex1.awk
#!/usr/bin/awk -f

# Downloadable example 1 for GNU Awk Part 14

NR > 1 {
    colour = $2
    fname = "awk14_" colour "_fruit"

    printf "Writing %s to %s\n",$1,fname
    print $1 > fname
}

Running the script writes to files called 'awk14_brown_fruit' and similar in the
current directory:

$ ./awk14_ex1.awk awk14_fruit_data.txt
Writing apple to awk14_red_fruit
Writing banana to awk14_yellow_fruit
Writing strawberry to awk14_red_fruit
Writing grape to awk14_purple_fruit
Writing apple to awk14_green_fruit
Writing plum to awk14_purple_fruit
Writing kiwi to awk14_brown_fruit
Writing potato to awk14_brown_fruit
Writing pineapple to awk14_yellow_fruit

The script announces what it’s doing, which is a little superfluous but helps to
visualise what’s going on.

Notice that the output file names are generated dynamically and can change with
each line read from the input file. For each distinct name the script does what
was described earlier: it creates the file (or empties it if it already exists)
on the first write and then appends to it while it remains open. All the files
are closed when the script exits, of course.
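
The script builds the file name in a variable first. The name can also be built
directly in the redirection, but then it is safest to wrap the concatenation in
parentheses, since an unparenthesised concatenation after '>' is ambiguous. The
sketch below assumes a small inline sample rather than the show's data file, and
the directory variable 'dir' is made up for the illustration.

```shell
# Scratch directory for the output files (hypothetical, created by mktemp)
dir=$(mktemp -d)

# Inline sample data: a header line, then name and colour columns
printf 'name colour\napple red\nplum purple\ngrape purple\n' |
awk -v dir="$dir" 'NR > 1 {
    # Parentheses make the whole concatenation the target file name
    print $1 > (dir "/" $2 "_fruit")
}'

cat "$dir/purple_fruit"
```

This should leave 'plum' and 'grape' in the purple file and 'apple' in the red
one.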

The files created are shown below and the contents of one displayed:

$ ls awk14_*_fruit
awk14_brown_fruit  awk14_green_fruit  awk14_purple_fruit
awk14_red_fruit  awk14_yellow_fruit

$ cat awk14_purple_fruit
grape
plum

Redirecting and appending to an existing file

The next type of redirection uses two greater than signs:

print items >> output-file
printf format, items >> output-file

In this case the output file is expected to exist already. If it does, its
contents are not erased and new output is appended to them. If the file does not
exist, it is created and written to as before.
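
A quick way to see the difference from '>' is to write to a file that already
has something in it. This is a sketch only; the scratch file and the variable
name 'logfile' are invented for the illustration.

```shell
# Scratch file that already holds one line (hypothetical name via mktemp)
logfile=$(mktemp)
printf 'existing line\n' > "$logfile"

# ">>" leaves the existing contents alone and adds to the end
awk -v logfile="$logfile" 'BEGIN { print "added by awk" >> logfile }'

cat "$logfile"
```

Both lines should be present afterwards, the original one first.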

When redirecting to a file in a shell script it’s common to see something like
this:

echo "Script starting" > script.log
echo "Script ending" >> script.log

The use of '>>' in the second case is necessary because otherwise the file would
have been cleared out before the message was written. Each redirection like this
in Bash involves opening and closing the output file.

In an awk script on the other hand - as we have seen - the file is kept open by
the script until it is closed on exit. There is a 'close' command which will do
this explicitly, and we will look at this shortly.
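
As a small preview of 'close', the sketch below shows that once a file has been
closed, the next '>' to the same name starts again with an empty file. The
scratch file and the variable 'f' are made up for the illustration.

```shell
# Scratch file (hypothetical name via mktemp)
f=$(mktemp)

awk -v f="$f" 'BEGIN {
    print "one" > f      # first write: file is emptied, then written
    print "two" > f      # file still open, so this is appended
    close(f)             # explicitly close the file
    print "three" > f    # reopened with ">", so it is emptied again
}'

cat "$f"
```

Only the last line should survive. Using '>>' for the write after the 'close'
would have kept all three.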

Redirecting to another program

This type of redirection uses a pipe symbol to send output to a command (or
commands) for the shell, given as a string.

print items | command
printf format, items | command

The following example shows the fruit names being written to a pair of commands
in a shell pipeline:

$ awk 'NR > 1 {print $1 | "sort -u | nl"}' awk14_fruit_data.txt
     1  apple
     2  banana
     3  grape
     4  kiwi
     5  pineapple
     6  plum
     7  potato
     8  strawberry

The names are sorted using the 'sort' command, requesting that the results be
made unique ('-u'). The output from the sort is run through 'nl' which numbers
the lines.

As the awk script runs, a sub-process is started to run the two commands. The
first name is then sent to this process, and this repeats with each successive
name. The sub-process finishes when the script finishes.

In this case the 'sort' command will have accumulated all the names, then on the
connection being terminated it will perform the sort and pass the results to
'nl'.

There is a 'close' command in awk which will close the redirection to the
command(s) or to a file. The argument to 'close' needs to be the exact
command(s) which define the process (or the exact file name). For this reason
it’s a good idea to store the commands or file name in an awk variable.

The following downloadable script (awk14_ex2.awk) shows the variable 'cmd' being
used to hold the shell commands. The connection is closed to show how it would
be done, though there is no actual need to do so here.

$ cat awk14_ex2.awk
#!/usr/bin/awk -f

# Downloadable example 2 for GNU Awk Part 14

BEGIN {
    cmd = "sort -u | nl"
}

NR > 1 {
    print $1 | cmd
}

END {
    close(cmd)
}

Running the script gives the same result as before:

$ ./awk14_ex2.awk awk14_fruit_data.txt
     1  apple
     2  banana
     3  grape
     4  kiwi
     5  pineapple
     6  plum
     7  potato
     8  strawberry
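
The script above has no real need for 'close', but it starts to matter when the
script wants to print something of its own after the command's output. The
sketch below assumes a short inline list instead of the fruit file; the trailer
text and the shell variable 'result' are invented for the illustration.

```shell
# Capture everything awk and its sub-process write to standard output
result=$(printf 'pear\napple\npear\nfig\n' | awk '
BEGIN { cmd = "sort -u" }

# Every input line is sent to the one sort process
{ print $1 | cmd }

END {
    # Closing the pipe makes awk wait until sort has written its output
    close(cmd)
    print "-- end of list --"
}')

printf '%s\n' "$result"
```

Because 'close(cmd)' waits for 'sort' to finish, the sorted names should come
out before the trailer line. Without the 'close' the trailer could well appear
first, since 'sort' only produces output once its input ends.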

Here’s a more real world example (at least it’s real in my world). When I’m
preparing an HPR show like this which involves a number of example scripts I
need to run them for testing purposes. I have a main directory for HPR shows and
a sub-directory per show. I like to make soft links to the examples in this
sub-directory so I can run tests without hopping about between directories.

In general I make links in this way:

ln -s -f PathToExample BasenameOfExample

I wrote an Awk script to help me which takes path names as input and constructs
shell commands which it pipes into 'sh'.

The following downloadable script (awk14_ex3.awk) shows the process.

$ cat awk14_ex3.awk
#!/usr/bin/awk -f

# Downloadable example 3 for GNU Awk Part 14

{
    # Split the path up into components
    n = split($0,a,"/")
    if (n < 2) {
        print "Error in path",$0 > "/dev/stderr"
        next
    }

    # Build the shell command so we can show it
    cmd = sprintf("[ -e %s ] && ln -s -f %s %s",$0,$0,a[n])
    print ">> " cmd

    # Feed the command to the shell
    printf("%s\n",cmd) | "sh"
}

END {
    close("sh")
}

The script expects to be given one or more pathnames on standard input. It first
takes the path and splits it up based on the '/' character. Since 'split'
returns the number of elements, that number indexes the last element. We check
that it’s sensible before proceeding. Note that the error message generated by
the 'if' test is redirected to '/dev/stderr'. We’ll be looking at this shortly.

We use 'sprintf' to make the shell command. It first adds a test that the file
path leads to a file, then if so the shell command uses the 'ln' command to make
a soft link. We use the '-f' option which forces the creation to proceed even if
the link already exists. The first argument to 'ln' is the path and the second
the basename (last component) of the file path.

This command is printed for reference, then it is executed by printing to a
process running 'sh' (which will be the Bourne shell or similar by default).

The script can be run as shown below. We use 'printf' as a simple way of adding
a newline to each pathname. The paths come from a filename expansion which
includes a question mark. Running it gives the following results:

$ printf "%s\n" Gnu_Awk_Part_14/hpr2816/awk14_ex?.awk | ./awk14_ex3.awk
>> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex1.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex1.awk awk14_ex1.awk
>> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex2.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex2.awk awk14_ex2.awk
>> [ -e Gnu_Awk_Part_14/hpr2816/awk14_ex3.awk ] && ln -s -f Gnu_Awk_Part_14/hpr2816/awk14_ex3.awk awk14_ex3.awk

This is a script which I can use in all sorts of other contexts, though it
probably needs some refinement to be completely foolproof.

Note that some caution is needed when writing shell commands in awk because of
the potential pitfalls when using quotes. See the GNU Awk User’s Guide section
10.2.9 for hints.
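
One common way round the quoting pitfalls is to wrap each value in single quotes
before it goes into the command string. The sketch below is a simplified helper
written for these notes, not the Guide's own code: the function name
'shell_quote' and the sample file name are assumptions, and '\047' is used for
the single quote so the awk program can itself sit inside shell single quotes.

```shell
# A name containing both a space and a single quote (made-up example)
result=$(printf '%s\n' "it's a file.txt" | awk '
# Wrap s in single quotes; each embedded quote is closed, escaped, reopened
function shell_quote(s,    parts, n, i, out) {
    n = split(s, parts, "\047")
    out = "\047" parts[1] "\047"
    for (i = 2; i <= n; i++)
        out = out "\\\047" "\047" parts[i] "\047"
    return out
}

{
    # Build a harmless command that just echoes its argument back
    cmd = "printf \"%s\\n\" " shell_quote($0)
    print cmd | "sh"
}

END { close("sh") }')

printf '%s\n' "$result"
```

The shell should hand the name back unchanged. A script such as awk14_ex3.awk
could apply the same sort of helper to the path and the basename before building
its 'ln' command, so that paths with spaces are not split into separate words.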

Redirecting to a coprocess

This type of redirection uses a pipe symbol and an ampersand to send output to a
command (or commands) for the shell, given as a string.

print items |& command
printf format, items |& command

This is an advanced feature which is a gawk extension. Unlike the previous
redirection, which sends to a program, this form sends to a program and allows
the program’s output to be read back. That is why the command is referred to as
a coprocess.

Since it is necessary to use our next main topic, 'getline', to achieve all of
this we’ll postpone discussing the subject until the next episode.

Redirecting to special files

There are three standard Unix channels that are known as standard input,
standard output, and standard error output (or more commonly standard error).
These are connected to keyboard and screen in the default case.

Normally a Unix program or script reads from standard input and writes to
standard output and generates any error messages on standard error. There is a
lot more to this than described here but this will suffice for the moment.

Gnu Awk can use three special file names to access these channels:

• /dev/stdin: standard input
• /dev/stdout: standard output
• /dev/stderr: standard error output

So, for example, a script can write explicitly to standard error with a command
of the form:

print "Invalid number" > "/dev/stderr"

See the GNU Awk User’s Guide section 5.7 on this subject for more details. There
are also other special names available as described in the Guide in section 5.8.
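
To see the two output channels being kept apart, the error channel can be sent
to a file by the shell while the normal output is captured separately. This is a
sketch under made-up assumptions: the sample input, the doubling of valid
numbers and the variable names 'errs' and 'good' are all invented for the
illustration.

```shell
# Scratch file to receive standard error (hypothetical name via mktemp)
errs=$(mktemp)

# Valid numbers are doubled on standard output; anything else is reported
# on standard error, which the shell redirects into the scratch file
good=$(printf '12\nabc\n7\n' | awk '
/^[0-9]+$/ { print $0 * 2; next }
{ print "Invalid number: " $0 > "/dev/stderr" }
' 2> "$errs")

printf 'stdout: %s\n' "$good"
printf 'stderr: %s\n' "$(cat "$errs")"
```

The doubled values should end up in 'good' and the complaint about 'abc' in the
scratch file, showing that the message did not get mixed in with the results.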

Next episode

I will be continuing with the second half of this episode in a few weeks.

Links

• GNU Awk User’s Guide
  ◦ Redirecting output of print and printf
  ◦ Special Files for Standard Preopened Data Streams
  ◦ Special File names in gawk
• Previous shows in this series on HPR:
  ◦ “Gnu Awk - Part 1” - episode 2114
  ◦ “Gnu Awk - Part 2” - episode 2129
  ◦ “Gnu Awk - Part 3” - episode 2143
  ◦ “Gnu Awk - Part 4” - episode 2163
  ◦ “Gnu Awk - Part 5” - episode 2184
  ◦ “Gnu Awk - Part 6” - episode 2238
  ◦ “Gnu Awk - Part 7” - episode 2330
  ◦ “Gnu Awk - Part 8” - episode 2438
  ◦ “Gnu Awk - Part 9” - episode 2476
  ◦ “Gnu Awk - Part 10” - episode 2526
  ◦ “Gnu Awk - Part 11” - episode 2554
  ◦ “Gnu Awk - Part 12” - episode 2610
  ◦ “Gnu Awk - Part 13” - episode 2804
• Resources:
  ◦ ePub version of these notes
  ◦ Examples: awk14_fruit_data.txt, awk14_ex1.awk, awk14_ex2.awk, awk14_ex3.awk