Episode: 2824
Title: HPR2824: Gnu Awk - Part 15
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2824/hpr2824.mp3
Transcribed: 2025-10-19 17:21:30
---
This is HPR episode 2,824 entitled "Gnu Awk - Part 15", and is part of the series Accessibility.
It is hosted by Dave Morris and is about 32 minutes long and carries an explicit flag.
The summary is: Redirection of input and output, part 2.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello everybody, this is Dave Morris for Hacker Public Radio.
It's a nice day, I've got the door open, so you might hear background noises from the birds and stuff.
Hopefully nobody in the vicinity is going to start up a lawnmower, let's see.
So this is Gnu Awk part 15, and it's part of a series that b-yeezi and myself are doing.
I'm doing the second of a pair of episodes looking at redirection in awk scripts.
This one I'm going to talk primarily about the getline command, which is used for explicit input
as opposed to the usual implicit sort that we've seen up to now, and it can include redirection.
Now the getline command and its uses is quite a complex subject.
This show is going to be a bit longer than usual, but there's no way it's going to cover
all of the ins and outs of this subject, so I've directed you to the GNU Awk User's Guide
for the full details. There are links in the show notes; there are long notes for this particular
episode. So let's start off with a reminder of how awk processes its rules. I think we alluded to
this, but we maybe didn't go into enough detail about it as we've been going through the
series. We're looking today at how you can change the default rules, the default methods,
but I thought it was worthwhile just to look at the standard approach to this sort of stuff.
So when the awk script reads a line from a file or from standard input, it scans it, and
that causes it to go through all of the rules except for the ones which have a BEGIN or END in
front of them. And the rules are the things that make up the script: there's some sort of a test
followed by bits of awk inside curly brackets. If a rule matches, then it's going to be run,
and that process will continue until all of the rules have been checked. So it's entirely possible
that multiple rules will match, and if so they will all be executed in the sequence that they're
encountered. It's important to bear in mind that they are actually run in that sequence.
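A minimal sketch of that cycle, assuming a three-line input file of placeholder words (the words and the temporary file are illustrative, not taken from the show). None of the three rules has a pattern, so each one runs for every line, in the order written:

```shell
# Hypothetical three-line input; the words are placeholders.
tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmp/data"

# Three rules with no patterns: every rule runs for every input line,
# in the order in which the rules appear in the script.
out=$(awk '
{ print "R1 ---" }
{ print "R2 " $0 }
{ print "R3 " $0 }
' "$tmp/data")

printf '%s\n' "$out"
rm -r "$tmp"
```

Each input line produces three output lines, one per rule, so three lines in give nine lines out.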
So what I've done for this show is to prepare a very, very simple data file with three lines in it
and a very simple script which runs against it. Just as an aside, I'm using a command called
lorem; there's a thing called lorem ipsum, which is sort of fake Latin that tends to be used to
fill out forms or just to use as placeholder text in blogs or something like that. And I've mentioned
how you can get hold of this if you want to use it. So I've actually noted how I used it on the
shell command line: printf, space, and then in double quotes
percent s, backslash n, close double quotes, space, then a command substitution, dollar, open
parenthesis, lorem, space, hyphen w, space, three, close parenthesis. So what that does is to run this
lorem command and ask it to generate three words. Then there's a redirection, a greater-than sign,
to a file called awk15_testdata1, and I've provided this particular file with the show.
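The command line read out above can be sketched as follows. This is an approximation: `echo` with three fixed placeholder words stands in for the `lorem` command so that the sketch runs without that tool installed, and the file is written to a temporary directory.

```shell
tmp=$(mktemp -d)

# The unquoted command substitution is split into separate words, and
# printf reuses its "%s\n" format for each one, giving one word per line.
# Assumption: echo replaces the lorem command used in the episode.
printf "%s\n" $(echo one two three) > "$tmp/awk15_testdata1"

cat "$tmp/awk15_testdata1"
lines=$(wc -l < "$tmp/awk15_testdata1" | tr -d ' ')
rm -r "$tmp"
```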
So the script which I've shown here, and which is again downloadable if you want to play with it, is a
standalone awk script and it contains three rules. None of the rules have any matching things in
front of them, so there are no tests being carried out. They're just three rules that will
be obeyed. The first one simply prints out the string R1, for rule one, followed by three
hyphens, just as a sort of delimiter. The second rule prints out R2 and then that's followed by the
contents of $0. I won't read these out in minute detail because I think you should
know how to do this by now. The third rule prints out R3 followed by the contents of $0
again. So it's really the same as R2 except that it's got a different rule number. And we've got
the data file, which contains three nonsense Latin words. I think they're nonsense. Some of them
are not, actually, but anyway it doesn't make a lot of sense. I learned Latin at school but I've
erased it all from my head since then. So when you run it, it's not very exciting: it simply prints out
R1, three hyphens; R2, first word, voluptatibus; R3 prints out the same word. Then R1 again; for the
second word, and I'm living on the edge by trying to pronounce these, quaerat; and then third time
round, three hyphens, sunt. So wow, gosh. So basically what it's showing is that each rule is run
for each line read from the data file. The first rule doesn't do anything at all with the data, but
it's still going to be triggered because there are no criteria for triggering it. It's going to happen
whatever's been read in and it's going to happen for every line that comes in. So there's nothing
to stop any of these rules from running. So that's how the basic thing works. I think you probably
knew this, but I thought it was worth saying. If you'd asked me, before I started writing
this particular episode, how does this work? I'd have probably scratched my head a bit. So I just
thought it was worth making it entirely clear how it works. The getline command then is a way of
changing how awk reads lines. Normally they're all being read one at a time from
whatever the data source is, and there's all that stuff about matching patterns and invoking rules,
etc. This is different from the way that other programming languages handle input, though some
can be coerced to do stuff in a similar way. But the way that awk reads its data and processes
it is one of its great strengths, I think. Now the getline command can be used to read lines
explicitly, outside the usual read, pattern-match, action cycle. So this is an example of its
use in a simple way. If it's used on its own with no arguments, it just reads in the next line
and splits it up into fields in the normal way. If you use it on the normal input, it affects how the
data is read and how rules are executed. So if getline finds a record it returns a one. So
there are flags that it returns, and if it encounters the end of file it returns zero.
If there's an error while it's reading it returns a minus one and sets a variable called ERRNO
in gawk, which contains a description of what went wrong. So I've given you another script which
is basically the same as the first one. It's called awk15_ex2.awk and the only
difference is that, of the same three rules, rule two also contains a getline.
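A sketch of what such a script could look like, run against an assumed three-line file of placeholder words (the data and temporary file are illustrative). The only change from a plain three-rule script is the `getline` at the end of the second rule:

```shell
tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmp/data"

out=$(awk '
{ print "R1 ---" }
{ print "R2 " $0; getline }   # getline replaces $0 with the next line, if any
{ print "R3 " $0 }
' "$tmp/data")

printf '%s\n' "$out"
rm -r "$tmp"
```

With this input, rule three prints the second line on the first pass. On the second pass `getline` hits end of file, returns zero and leaves `$0` alone, so rule three prints the third line again.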
So if we run that script against the same set of data we've already used, then you get a different
output. For the first line, R1 is triggered so you get the three hyphens. R2 is triggered and you
see the first word of the file, which is this voluptatibus. But then the getline is invoked
and that goes and gets the second line out of the file, and R3 is then triggered because it's
the next one in sequence, and it simply prints out that line. So the getline has caused the normal
sequence of reading to change. Then on the next iteration, R1, three hyphens; R2 contains the last
line of the file, sunt, and the getline will not get anything back. So $0, which is printed by R3,
will not be different, as it was in the previous iteration. So the script simply prints out
the same line again. Hopefully that helps to clarify the effects of getline as against the normal
way that awk works. So I've written a slightly more useful, or perhaps it's not at all
useful, but a script anyway which demonstrates a thing that might be more useful, though it needs
work to make it generic. What we've got here is a file of text, another one of these files of
lorem text, where I've simply written out a number of lines and I've then split the lines and
put a hyphen at the end of the first one. So there are actually six lines in
the file and they're in pairs, the first one of which has got a hyphen as the last character.
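To make the pairing concrete, here is a sketch of a file of that shape and of one way `getline` could be used to join each pair. The data lines are invented and only four lines are used; the script is an approximation of the idea, not the script provided with the show.

```shell
tmp=$(mktemp -d)
# Invented data: the first line of each pair ends in a separate hyphen.
printf '%s\n' 'lorem ipsum -' 'dolor sit' \
              'amet consectetur -' 'adipiscing elit' > "$tmp/data"

out=$(awk '
{
    if ($NF == "-") {
        sub(/-$/, "")          # delete the trailing hyphen from $0
        line = $0              # save the first half of the pair
        if ((getline) > 0)     # refill $0 with the following line
            print line $0
        else
            print line         # no following line to join
    } else {
        print                  # a line without a hyphen is printed as it is
    }
}' "$tmp/data")

printf '%s\n' "$out"
rm -r "$tmp"
```

The test on `$NF` only matches when the hyphen is a field of its own, that is, when there is a space in front of it.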
What this is meant to signify is that it's a continued line and you want the script to stick the pair
together. The script detects that a line finishes with the hyphen and then it concatenates them,
and you can see from running it what it's produced. So, the general rule; I won't go into detail of what's
in here, but in general if the last field of a line is a hyphen then that hyphen is deleted
and the line is saved in a variable called line. Then a getline call refills
$0, and then that is printed, preceded by the saved line. That's how you join two lines
together. If there was a line without a hyphen on the end, which is entirely possible, then it
would just be printed. I didn't actually put that in this example. I should have done, but
I'll let you play with that. Like I said, this is a very simplistic script. It doesn't cater for
errors in the way in which the file is laid out, and if you put a hyphen on as the last element but you've
not left a space in front of it, so it's concatenated to the previous word, then this algorithm won't
spot it, and really you should be doing that if you were trying to make it into something
actually useful. There's quite a sophisticated example in the GNU Awk User's Guide, and I've
given a link to it, section 4.10.1, where something vaguely similar is being done in a more elegant
and resilient way. So getline can be followed by the name of a variable, in which case the record
is read from the main input stream into that variable. Now the record is not split into fields
under these circumstances, and variables like NF, the number of fields, are not changed because the field
splitting process has not been invoked. However, since the main input stream is being read, things
like NR, the variable NR which is the number of records, will be changed because these are being counted
by awk. I haven't gone into great detail about the side effects of this. You can find more about
it in the manual. There's also a possibility of reading from a file, not too dissimilar from the
way print and printf work, as we saw in the last episode. You would write getline, then a less-than
sign and the name of a file. The name of the file has to be a string expression or a variable,
and the expression representing the file can also be used to close that file. So there's a little
snippet here which sets a variable input to some other variable path, a slash in double quotes,
and a variable filename. So the assumption is that path and filename are two bits that get you to
a particular file, and you put a slash between them if you're on a Unix system. Then getline,
less-than sign, input will open that file and read from it, and then once that's happened you can type
close and, in parentheses, input, and it will close that very file. And using variables for this is
extremely wise, because otherwise you'd have to rely on your ability to type exactly the same string
twice. Sorry about the noises off. Okay, so you can also of course read from a file into a variable.
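The snippet just described might look something like this. The directory, the file name and its contents are assumptions made up for the sketch; `path` and `filename` are passed in with `-v` purely for convenience.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple banana > "$tmp/fruit"

out=$(awk -v path="$tmp" -v filename="fruit" '
BEGIN {
    input = path "/" filename   # build the file name once, in a variable
    getline < input             # opens the file and reads its first line into $0
    print $0
    close(input)                # the same variable identifies the file to close
    getline < input             # after the close, reading starts again at line one
    print $0
}')

printf '%s\n' "$out"
rm -r "$tmp"
```

Because the file is closed between the two reads, both `getline` calls return the first line.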
So that's reading one line at a time, as we said, so you can read from that file into a variable.
I've got an example, which is awk15_ex4.awk, which actually consists of a script that
reads from fruit names, the file of fruit names that we created in the previous episode. These two
episodes were actually one originally, so they sort of refer to one another a bit. But what it
actually does is all done in the BEGIN rule. There are no other rules in this script, and what it's doing
is just reading in the file and printing it out. I did add another fiddly bit into it,
so if you look at it, it's looking at a variable called ARGC, all in capitals, ARGC. So
we need that to be two because it actually includes the name of the program as the first element,
and so when the script is invoked we need it to have an argument referring to the file
you want to process. So it checks to see if it is two, and if it's not it prints out "Needs a
file name argument", sends it to stderr, the standard error output, and exits. I just put that in
because I thought it would be useful to show how you can do that type of thing. Then the
actual data file is picked up from the array ARGV, in capitals, square brackets, one, so that's that
first argument. Did I say? Yeah, there need to be two
elements in it, but they're addressed as zero and one. I think I didn't make that clear enough.
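Putting the argument check together with the reading loop that is described next, a script along these lines would do the job. It is a sketch rather than the script from the show: the file names, the sample fruit and the exit status of 1 are all assumptions.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/fruit_names"

cat > "$tmp/ex4.awk" <<'EOF'
BEGIN {
    # ARGC counts the program name as well, so one file argument means 2
    if (ARGC != 2) {
        print "Needs a file name argument" > "/dev/stderr"
        exit 1
    }
    data = ARGV[1]

    # getline returns 1 for each record read and 0 at end of file
    while ((getline line < data) > 0)
        print line
    close(data)
}
EOF

out=$(awk -f "$tmp/ex4.awk" "$tmp/fruit_names")
# Run again with no argument, capturing only the error message
err=$(awk -f "$tmp/ex4.awk" 2>&1 >/dev/null) || true

printf '%s\n' "$out" "$err"
rm -r "$tmp"
```

Comparing the result of `getline` with `> 0` rather than testing it directly also stops the loop if the file cannot be read, since that case returns -1.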
So we have a while loop, and in the while loop we have, in parentheses, getline line less-than data.
So data's got the name of the file, so it's going to be reading from that file, and after the
parenthesised getline with its various arguments we have a greater-than zero. So we're looking to see
if the value that comes back from getline is one or zero, because when it's zero
there's no more data; it's the end of the file. And the loop just has one command that it
invokes, which is print line. So getline's read into a variable called line and it's simply
printed out. And then after that while loop there's a close command which, in parentheses, uses the
variable data, so it closes that file. So, very, very trivial: it simply reads the file and prints it,
because as a seasoned awk user you will be aware that you could simply have written this on
the command line as awk, single quote, open curly bracket, print, close curly bracket, close single
quote, space, fruit names, and it would have done exactly the same thing without anywhere near as much
fuss. But this was for demonstration purposes. Okay, the getline facility gets a bit more
sophisticated, and you can read from a pipe in awk. Now the way you do this is to provide a
command, a vertical bar and getline, or command, vertical bar, getline and then the name of a variable
to read into. What happens is that awk runs the command as a subprocess and it gets
lines from that command, and either does the usual field splitting or it stores the line in a variable.
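Both forms can be seen side by side in a small sketch. The command here is an arbitrary pair of `echo` calls chosen for illustration; any command that writes lines to standard output would do.

```shell
out=$(awk '
BEGIN {
    cmd = "echo one two; echo three four"

    cmd | getline            # first line: sets $0, NF and the fields
    print "fields: " NF ", second: " $2

    cmd | getline line       # next line: stored in line, $0 is left alone
    print "variable: " line ", $0 still: " $0

    close(cmd)
}')

printf '%s\n' "$out"
```

The command is started once, on the first `getline`, and each later `getline` from the same string carries on reading from the same subprocess until it is closed.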
So awk15_ex5.awk is a simple awk script which runs a command which is stored in a variable; it's
all within a BEGIN rule. The command, stored in a variable called cmd, is a wget command, so you need to have wget
installed on your Linux system, or indeed a BSD system if you wish. It's wget, space, hyphen,
lower-case q, then a URL, which is the Hacker Public Radio stats page; I won't read it out, it's here in the notes.
Then hyphen, capital O, which means output to, and that's followed by a file name which is simply
a hyphen, in which case it means to output it to its standard output channel. Well, that's
all in double quotes, so it's a string for awk. So then there's a while loop which does a similar
thing: inside the parentheses of the test that's done every time the loop runs, each iteration,
it's got, in parentheses, cmd, vertical bar, getline, close parentheses, and then we compare the output
from that to zero. We want it to be greater than zero, because once the output ends getline
will return zero, which means stop. So inside the loop, which has got a body with curly
brackets enclosing it because it's a bit more complicated than the previous while loop we used,
we've got an if statement where it's testing to see if $0, and then a tilde meaning
compare this with a regular expression, and the regular expression is caret, that up-arrow thing,
Shows in Queue, colon. So we're looking for a line that begins with "Shows in Queue:", close parenthesis there. Then
if that matches, we want to print "Queued shows on HPR: %d"; we're using printf, did I say that?
Percent d, backslash n, and we want to print out field number four. Once the loop has completed, then
we close the pipe, which we do by giving close the command that we set up earlier in the variable cmd.
So the statistics page, there are a number of lines of stats that you get from it, which
contain various attributes of the current state of HPR. One of them is the number of
shows in the queue, and what this does is it picks out just that particular piece of text.
So when you run it, and I've just run it in real time, it comes back and says "Queued shows on HPR:
27", because there are 27 in the queue just at this precise moment, which is the 23rd of April.
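The loop can be tried without network access by swapping the wget command for a stand-in. In this sketch the command is a few `echo` calls that imitate a stats listing; the other lines and their wording are invented, and only the "Shows in Queue:" line with the number in field four follows the description above.

```shell
out=$(awk '
BEGIN {
    # Stand-in for the wget command used in the episode
    cmd = "echo Started: 2005; echo Shows in Queue: 27; echo Hosts: 300"

    while ((cmd | getline) > 0) {
        # Each line from the command arrives in $0 and is split into fields
        if ($0 ~ /^Shows in Queue:/)
            printf "Queued shows on HPR: %d\n", $4
    }
    close(cmd)
}')

printf '%s\n' "$out"
```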
So I did another example which is essentially the same but uses a slightly different approach,
and this is awk15_ex6, but we're using getline var, a named variable, to store the stuff. So the
command is the same and there's a while loop. What the while loop does is simply get lines from the
server, and it doesn't do anything at all with them. It simply gets them one at a time
until they've all been collected and then the connection is shut down. But what that means is
that the last line that came back is still stored in the variable line. So we use split
to chop that up into an array called fields, using a comma as the delimiter. Then we can print out
"Queued shows on HPR: %d\n" as the format spec for printf, comma, fields, square
bracket, 10. The 10th element of this last line, which is a comma-separated line, contains
the number of shows that are in the queue, so you get back the same answer, 27, just to demonstrate that there's
a different way of doing it. So the last thing I want to say about getline is that awk, or rather
this is GNU Awk, some of the other awk variants don't offer this, provides the capability of
accessing a coprocess. A coprocess is a subprocess, but it can be written to
and read from. So in the context of the print and printf commands, we can send data to the
coprocess with the sequence vertical bar, ampersand as an operator; not just a plain pipe
but with an ampersand after it. And I already mentioned this in the last show, number 14.
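A small sketch of the two-way operator, covering both the write and the matching read with getline. It needs gawk, so the block checks for it first, and it uses `cat` as a stand-in coprocess that simply echoes back whatever it is sent. That choice is an assumption made for the sketch; a coprocess that buffers its own output can leave gawk waiting.

```shell
if command -v gawk >/dev/null 2>&1; then
    out=$(printf '%s\n' 2761 2789 | gawk '
    BEGIN { cmd = "cat" }            # stand-in coprocess: echoes its input
    {
        print $0 |& cmd              # write the line to the coprocess
        cmd |& getline result        # read one line of its output back
        print "reply: " result
    }
    END { close(cmd) }               # explicit close of the coprocess
    ')
else
    out="gawk not installed"
fi

printf '%s\n' "$out"
```

The coprocess is started by the first `|&` and stays running between records, so each input line is a separate question and answer with the same subprocess.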
And not too surprisingly you can use getline to read this data back using the same operator.
You can bring it back as fields or you can put it in a variable. So I'm not going to go
into a lot of depth; this is quite advanced and there's a lot of information about it
in the GNU Awk User's Guide. There's a getline and coprocesses section and there's a whole
subject of two-way I/O. You can write some quite sophisticated stuff using this. So I've written a
simple thing which I've called awk15_ex7, and it demonstrates a thing that you could do
with this feature. Now in this particular example I've got an SQLite database, which I haven't
provided for download. This is a copy of one that I use to keep track of the HPR episodes on
the Internet Archive. This is going to be added to the next database design, but for now it's
a standalone database, and for the purposes of this example it's called awktest.db. Now the way
that you talk to the database is by sending it commands in Structured Query Language. I have
mentioned this in other shows; you might be aware of it. But the essence of what we're
going to do here is to send it a command which consists of SELECT, which is the sort of
verb used in SQL, or Structured Query Language, which lets you get data out of a database.
SELECT, space, then id, comma, title; these are two fields of the database that I have defined: id is the
show number, title is the show title. FROM is the next part of the sequence, and episodes is the
name of the table. Then follow that with WHERE id equals and then some placeholder, semicolon.
We don't actually type the placeholder in this particular case, but what we're going to do is
use a printf to generate it. So whatever goes in that placeholder, you'll get back
the answer in the form of the show number and the title for a given HPR show. So what we have in this
script is two rules. I've got a BEGIN rule where we're declaring things, and we're declaring
a variable called db, which is being set to awktest.db, the name of the file
holding the little database. Then there's a variable cmd; the command is sqlite3, that's the command
which you use on the command line, which must be followed by the name of a database, which you can
then either use interactively or you can feed it commands through that route. And then the
third variable is called querytpl. I tend to use tpl to mean template, and it's a
string; it's actually a template for printf, a format template, and that "SELECT id, title FROM
episodes WHERE id equals" thing I mentioned before is in it, and the placeholder is percent d and
a semicolon, backslash n. So that's the BEGIN rule, and it sets these variables up. Then what we want
to do is to read numbers; the script wants to read numbers, and these numbers will be show numbers that
it's to interrogate the database for. So the test that we're using for this rule is that $0,
the entire line, matches a regular expression which consists of the digits nought to nine, one or
more times, with nothing else on the line: it starts the line and the line ends after
the last digit. It could have been more sophisticated and allowed spaces around it, but I didn't think
it was worth the trouble for this demo. So this particular rule then uses printf with the format
that we already declared, called querytpl, and we feed it $0 as the value that's
going to be fed into that SQL command. We send that to the variable cmd, which is
running as a coprocess, and we do it through a vertical bar and ampersand. So what that will do
the first time it's invoked is it will cause the coprocess to start up;
the coprocess will be running SQLite on the database, expecting individual commands to come in,
and the first command it will get will be generated by this printf. Then the next line is using
the command on the left side and a vertical bar and an ampersand with getline following it,
and getline's followed by the name of a variable, which is result. So command, vertical bar, ampersand,
getline, space, result. So what that will be doing is it will be talking to the coprocess and will
be pulling back anything that is produced by that query on the database into the variable result.
And the last line is print, space, result, which prints its contents. So what I've actually done, there are
many ways that this could be run; the simplest one for demonstration purposes would be to feed it
some numbers in a file, which is what I did. I called it awk15_ex5 data, but I haven't included it
in the show because there's no point; it's just a file with three lines in it, and
I've included the numbers, one per line: 2761, 2789 and 2773. So when you run it with this
data file it just simply returns 2761, HPR Community News for February 2019; 2789, Pacing in Storytelling; 2773,
Lead-Acid Battery Maintenance and Calcium Charge Voltage. That's all that it does. I mean, it looks pretty
simple. The coprocess will just keep running until the awk script runs
out of data. When the awk script runs out of data it will simply exit; when it exits the coprocess
will be killed off by awk. You could, if you wish, do an explicit close on that coprocess and
that would make it go away. I didn't do that here because it didn't seem to be entirely
necessary, but you get some sort of idea of how you could be running a coprocess
which is just sitting there waiting for stuff to be thrown at it and coming back with answers,
and you can write a script which will converse with it. Okay, that's all I'm going to say then
about getline in this particular show. I'm going to finish off with a finale, which is pretty much
an announcement. Now there's a lot more that could be said about this redirection subject, input
and output, as well as about coprocesses as we said, and there are many more subjects within
GNU Awk that could be examined, but we feel that now's the time to bring this series to an end.
b-yeezi and I feel that the areas of awk, of GNU Awk, that we've not covered in this series might be
best left for you to investigate further if you have the need. We both feel that awk is a
very useful tool in many respects, but doesn't stand comparison with more advanced scripting languages
such as Python, Ruby and Perl. Perl in particular borrowed many ideas from awk and has extended
them considerably over the years, and Ruby was designed with Perl in mind, although it's
probably done some of the things as a language better than Perl. And Python, which came at the
subject from a different angle, has innovated enormously and is an extremely widely used language.
So there are others which I won't go into, but just to give you a flavour of the fact that there are
many other languages which are good for text processing other than awk. So although GNU
Awk has advanced considerably since it was created, I think it shows its age quite a lot and its
usefulness is a bit limited now. There are cases where quite complex scripts might be written in awk,
but the way most people tend to use it is as part of a pipeline or inside shell scripts of various
sorts, whereas you might write a complex script in Perl, Python or Ruby, for example. Taking on a large
project solely in awk seems like a pretty bad choice today. So before we wind up this series it's
planned to produce one more episode, number 16, and in it b-yeezi and I will record a show together.
Exactly how, I'm not sure; Mumble perhaps, but something more sophisticated perhaps. At the time of
writing, at the time of recording, there's no time scale, though we don't want to let it sit for
too long, but we'll endeavour to do this as soon as our schedules allow. And we really wanted to
review what has got us here and give a bit more information about why we feel it's not worth
carrying on any further with the series, and just sort of give you our two different
views on what we've been doing over these years. Now we've been doing this for a couple of years,
a bit more; I'm not sure, I don't have the dates to hand. But anyway, that's the plan. So I hope
you've enjoyed the series as a whole and have found it useful. Okay, that's it. Bye bye.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community
podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our
shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a comment
on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is
released under a Creative Commons Attribution-ShareAlike 3.0 license.