Episode: 2824
Title: HPR2824: Gnu Awk - Part 15
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2824/hpr2824.mp3
Transcribed: 2025-10-19 17:21:30
---
This is HPR episode 2,824 entitled Gnu Awk - Part 15, and is part of the series Learning Awk. It is hosted by Dave Morris, is about 32 minutes long, and carries an explicit flag. The summary is: redirection of input and output, part 2.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hello everybody, this is Dave Morris for Hacker Public Radio. It's a nice day and I've got the door open, so you might hear background noises from the birds and stuff. Hopefully nobody in the vicinity is going to start up a lawnmower; let's see.

So this is Gnu Awk part 15, and it's part of a series that b-yeezi and myself are doing. This is the second of a pair of episodes looking at redirection in Awk scripts. In this one I'm going to talk primarily about the getline command, which is used for explicit input, as opposed to the usual implicit sort that we've seen up to now, and it can include redirection.

Now the getline command and its uses is quite a complex subject. This show is going to be a bit longer than usual, but there's no way it's going to cover all of the ins and outs of this subject, so I'll refer you to the GNU Awk User's Guide for the full details. There are links in the show notes, and there are long notes for this particular episode.

So let's start off with a reminder of how Awk processes its rules. I think we alluded to this, but maybe we didn't go into enough detail about it as we've been going through the series. We're looking today at how you can change the default methods, but I thought it was worthwhile just to look at the standard approach to this sort of stuff.
So when the Awk script reads a line from a file or from standard input, it scans it, and that causes it to go through all of the rules except for the ones which have BEGIN or END in front of them. The rules are the things that make up the script: there's some sort of a test, followed by bits of Awk inside curly brackets. If a rule matches then it's going to be run, and that process will continue until all of the rules have been checked. So it's entirely possible that multiple rules will match, and if so they will all be executed in the sequence in which they're encountered. It's important to bear in mind that they are actually run in that sequence.

So what I've done for this show is to prepare a very, very simple data file with three lines in it, and a very simple script which runs against it. Just as an aside, I'm using a command called lorem. There's a thing called lorem ipsum, which is sort of fake Latin that tends to be used to fill out forms or just to use as a placeholder in blogs or something like that, and I've mentioned how you can get hold of this if you want to use it. I've noted the shell command line that I used: printf, space, then in double quotes percent s backslash n, close double quotes, space, then a command substitution, dollar, open parenthesis, lorem, space, hyphen w, space, three, close parenthesis. What that does is to run this lorem command and ask it to generate three words. Then there's a redirection, a greater-than sign, to a file called awk15_testdata1, and I've provided this particular file with the show.

So the script, which I've shown here and which is again downloadable if you want to play with it, is a standalone Awk script, and it contains three rules. None of the rules has any matching thing in front of it, so there are no tests being carried out. They're just three rules that will be obeyed.
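As a rough illustration of the point about several rules matching the same line, here is a minimal sketch. It is not one of the show's scripts: the two input lines and the rule labels are made up for the example, and it assumes an awk is available as `awk`.

```shell
# Two made-up input lines: "ab" matches both patterns, "b" only the second.
out=$(printf '%s\n' ab b | awk '
/a/ { print "rule 1 (has a): " $0 }   # runs only for lines containing a
/b/ { print "rule 2 (has b): " $0 }   # runs only for lines containing b
    { print "rule 3 (always): " $0 }  # no pattern, so it runs for every line
')
printf '%s\n' "$out"
```

For the line `ab` all three rules fire, in the order they are written; for the line `b` only rules 2 and 3 do.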
The first one simply prints out the string R1, for rule one, followed by three hyphens just as a sort of delimiter. The second rule prints out R2, and that's followed by the contents of dollar zero. I won't read these out in minute detail because I think you should know how to do this by now. The third rule prints out R3 followed by the contents of dollar zero again, so it's really the same as R2 except that it's got a different rule number. The data file contains three nonsense Latin words. I think they're nonsense. Some of them are not, actually, but anyway it doesn't make a lot of sense. I learned Latin at school but I've erased it all from my head since then.

So when you run it, it's not very exciting. It simply prints out R1 and three hyphens, then R2 and the first word, voluptatibus, then R3 prints out the same word. Then R1 again for the second word, and I'm living on the edge by trying to pronounce these, quaerat, and then the third time round, three hyphens, then sunt. So wow, gosh.

So basically what it's showing is that each rule is run for each line read from the data file. The first rule doesn't do anything at all with the data, but it's still going to be triggered because there are no criteria for triggering it. It's going to happen whatever's been read in, and it's going to happen for every line that comes in. So there's nothing to stop any of these rules from running.

So that's how the basic thing works. I think you probably knew this, but if you'd asked me before I started writing this particular episode, how does this work, I'd have probably scratched my head a bit. So I just thought it was worth making it entirely clear how it works.

The getline command, then, is a way of changing how Awk reads lines. Normally they're all being read one at a time from whatever the data source is, and there's all that stuff about matching patterns and invoking rules, etc.
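The three-rule script and its data file, as described above, might look something like this. This is a sketch from the spoken description rather than the downloadable file: the exact output format is a guess, the words are written out by hand instead of coming from the lorem command, and the second word is an assumption about what was said.

```shell
# Scratch directory so nothing of yours is overwritten.
workdir=$(mktemp -d)

# Stand-in for the three-word data file (second word is an assumption).
printf '%s\n' voluptatibus quaerat sunt > "$workdir/awk15_testdata1"

# Three rules with no patterns: every rule runs for every input line.
out=$(awk '
{ print "R1 ---" }
{ print "R2 " $0 }
{ print "R3 " $0 }
' "$workdir/awk15_testdata1")
printf '%s\n' "$out"
```

With three input lines and three unconditional rules you get nine lines of output, three per record.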
This is different from the way that other programming languages handle input, though some can be coerced into doing things in a similar way. But the way that Awk reads its data and processes it is one of its great strengths, I think.

Now the getline command can be used to read lines explicitly, outside the usual read, pattern match, action cycle. So this is an example of its use in a simple way. If it's used on its own with no arguments, it just reads in the next line and splits it up into fields in the normal way. Because it's using the normal input, it affects how the data is read and how rules are executed.

If getline finds a record it returns a one. So there are values that it returns: if it encounters the end of file it returns zero, and if there's an error while it's reading it returns minus one and sets a variable called ERRNO, in capitals, which contains a description of what went wrong.

So I've given you another script which is basically the same as the first one. It's called awk15_ex2.awk. It has the same three rules, and the only difference is that rule two also contains a getline. If we run that script against the same set of data we've already used, then you get a different output. For the first line R1 is triggered, so you get the three hyphens. R2 is triggered and you see the first word of the file, which is this voluptatibus. But then the getline is invoked, and that goes and gets the second line out of the file. R3 is then triggered because it's the next one in sequence, and it simply prints out that line. So the getline has caused the normal sequence of reading to change.

Then on the next iteration you get R1 and three hyphens, and R2 contains the last line of the file, sunt. The getline will not get anything back, so dollar zero, which is printed by R3, will not have changed the way it did in the previous iteration, and the script simply prints out the same line again.
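A sketch of that second script, again reconstructed from the description rather than copied from the download. The extra line showing getline's return value is my addition for illustration, and the input words are the same stand-ins as before.

```shell
out=$(printf '%s\n' voluptatibus quaerat sunt | awk '
{ print "R1 ---" }
{
    print "R2 " $0
    rc = getline                         # plain getline: next record into $0
    print "   getline returned " rc      # 1 = got a record, 0 = end of file
}
{ print "R3 " $0 }
')
printf '%s\n' "$out"
```

The first pass shows R2 and R3 printing different words because getline replaced dollar zero in between. On the second pass getline hits end of file, returns zero and leaves dollar zero alone, so R3 repeats what R2 printed.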
Hopefully that helps to clarify the effects of getline as against the normal way that Awk works.

So I've written a slightly more useful script, or perhaps it's not all that useful, but a script anyway which demonstrates a thing that might be more useful, though it needs work to make it generic. What we've got here is a file of text, another one of these files of lorem ipsum text, where I've simply written out a number of lines, and I've then split the lines and put a hyphen at the end of the first one. So there are actually six lines in the file and they're in pairs, the first one of which has got a hyphen as the last character. What this is meant to signify is that it's a continued line, and you want the script to stick the pair together. The script detects that a line finishes with the hyphen and then it concatenates them, and you can see from running it what it's produced.

I won't go into detail about what's in here, but in general, if the last field of a line is a hyphen then that hyphen is deleted and the line is saved in a variable called line. A getline call then refills dollar zero, and that is printed, preceded by the saved line. That's how you join two lines together. If there was a line without a hyphen on the end, which is entirely possible, then it would just be printed. I didn't actually put that in this example. I should have done, but I'll let you play with that.

Like I said, this is a very simplistic script. It doesn't cater for errors in the way in which the file is laid out. If you put a hyphen on as the last element but have not left a space in front of it, so it's concatenated to the previous word, then this algorithm will not spot it, and really it should be doing that if you were trying to make it into something actually useful.
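The joining idea can be sketched like this. It is not the show's script: the input text is invented, and like the original it assumes the hyphen stands alone as the last field, with a space in front of it.

```shell
out=$(printf '%s\n' 'lorem ipsum -' 'dolor sit' 'amet' | awk '
$NF == "-" {
    sub(/ *-$/, "")          # delete the trailing hyphen and the space before it
    line = $0                # save the first half
    if ((getline) > 0)       # refill $0 with the next line, if there is one
        print line " " $0
    else
        print line           # hyphen on the very last line: nothing to join
    next
}
{ print }                    # lines without a trailing hyphen pass through
')
printf '%s\n' "$out"
```

Here the first two input lines come out joined as one, and the line with no hyphen is printed unchanged.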
There's quite a sophisticated example in the GNU Awk User's Guide, and I've given a link to it, section 4.10.1, where something vaguely similar is being done in a more elegant and resilient way.

So getline can be followed by the name of a variable, in which case the record is read from the main input stream into that variable. Now the record is not split into fields under these circumstances, and variables like NF, the number of fields, are not changed because the field splitting process has not been invoked. However, since the main input stream is being read, things like the variable NR, which is the number of records, will be changed because these are being counted by Awk. I haven't gone into great detail about the side effects of this. You can find more about it in the manual.

There's also the possibility of reading from a file, not too dissimilar from the way print and printf work, as we saw in the last episode. You would write getline, then a less-than sign and the name of a file. The name of the file has to be a string expression or a variable, and the expression representing the file can also be used to close that file.

So there's a little snippet here which sets a variable input to some other variable path, a slash in double quotes, and a variable filename. The assumption is that path and filename are two bits that get you to a particular file, and you put a slash between them if you're on a Unix system. Then getline, less-than sign, input will open that file and read from it, and once that's happened you can type close and, in parentheses, input, and it will close that very file. Using variables for this is extremely wise, because otherwise you'd have to rely on your ability to type exactly the same string twice. Sorry about the noises off.

Okay, so you can also of course read from a file into a variable. It's reading one line at a time, as we said, so you can read from that file into a variable.
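A small sketch of that snippet in use. The file contents here are invented stand-ins (the real fruit names file came from the previous episode), and the variable names first and again are mine. It reads one line into a variable, closes the file, and reads again to show that close makes the next getline start from the top.

```shell
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/fruit_names"   # stand-in data

out=$(awk -v path="$workdir" -v filename="fruit_names" '
BEGIN {
    input = path "/" filename   # build the name once, reuse it for close()
    getline first < input       # first line of the file, into a variable
    close(input)                # same string, so it is the same file
    getline again < input       # the file is reopened from the beginning
    close(input)
    print first, again
}')
printf '%s\n' "$out"
```

Both reads return the first line, because the close in between resets the file.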
I've got an example, which is awk15_ex4.awk, and it consists of a script that reads from fruit_names, the file that we created in the previous episode. These two episodes were actually one originally, so they sort of refer to one another a bit. It's all done in the BEGIN rule, there are no other rules in this script, and what it's doing is just reading in the file and printing it out.

I did add another fiddly bit into it. If you look at it, it's looking at a variable called ARGC, all in capitals. We need that to be two, because it actually includes the name of the script as the first element, and when the script is invoked we need it to have an argument referring to the file you want to process. So it checks to see if it is two, and if it's not it prints out "Needs a file name argument", sends it to stderr, the standard error output, and exits. I just put that in because I thought it would be useful to show how you can do that type of thing.

Then the actual data file is picked up from the array ARGV, in capitals, square brackets, one, so that's the first argument. Did I say? Yeah, there need to be two elements in it, but they're addressed as zero and one. I think I didn't make that clear enough.
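Here is a sketch of a script along those lines: the argument check described above together with a minimal read loop, all in a BEGIN rule. It is a reconstruction, not the downloadable file. The script name readfile.awk, the exit status of 1, the use of /dev/stderr and the data are all assumptions for the example.

```shell
workdir=$(mktemp -d)

# Hypothetical script file; the quoted heredoc keeps the shell from touching it.
cat > "$workdir/readfile.awk" <<'EOF'
BEGIN {
    # ARGC counts the program name too, so one file argument gives 2.
    if (ARGC != 2) {
        print "Needs a file name argument" > "/dev/stderr"
        exit 1
    }
    data = ARGV[1]                      # ARGV[0] is the program name
    while ((getline line < data) > 0)   # 1 per record, 0 at end of file
        print line
    close(data)
}
EOF

printf '%s\n' apple banana > "$workdir/fruit_names"   # stand-in data

out=$(awk -f "$workdir/readfile.awk" "$workdir/fruit_names")
err=$(awk -f "$workdir/readfile.awk" 2>&1 >/dev/null) || true
printf '%s\n' "$out" "$err"
```

The first run prints the file; the second, with no argument, prints only the complaint on standard error.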
So we have a while loop, and in the while loop we have, in parentheses, getline line less-than data. The variable data has got the name of the file, so it's going to be reading from that file. After the parentheses around getline with its various arguments we have greater than zero, so we're looking to see if the value that comes back from getline is one or zero, because when it's zero there's no more data; it's reached the end of the file. The loop has just one command that it invokes, which is print line. So getline reads into a variable called line and it's simply printed out. Then after that while loop there's a close command which, in parentheses, uses the variable data, so it closes that file.

So it's very, very trivial; it simply reads the file and prints it. As a seasoned Awk user you will be aware that you could simply have written this on the command line as awk, single quote, open curly bracket, print, close curly bracket, close single quote, space, fruit_names, and it would have done exactly the same thing without anywhere near as much fuss, but this was for demonstration purposes.

Next, the getline facility gets a bit more sophisticated, and you can read from a pipe in Awk. The way you do this is to write a command, a vertical bar and getline, or command, vertical bar, getline and then the name of a variable. What happens is that Awk runs the command as a subprocess and gets lines from that command, and either does the usual field splitting or stores the line in a variable.

So awk15_ex5.awk is a simple Awk script, and it's all within a BEGIN rule. The command, which is stored in a variable called cmd, is a wget command, so you need to have wget installed on your Linux system, or indeed a BSD system if you wish. It's wget, space, hyphen lower-case q, then a URL, which is the Hacker Public Radio stats page (I won't read it out, it's here in the notes), then hyphen capital O, which means output to, and that's followed by a file
name, which is simply a hyphen, in which case it means to output to its standard output channel. That's all in double quotes, so it's a string for Awk.

Then there's a while loop which does a similar thing. Inside the parentheses of the test that's done on each iteration of the loop, it's got cmd, vertical bar, getline, in parentheses, and then we compare the result of that with zero. We want it to be greater than zero, because once the output ends getline will return zero, which means stop. Inside the loop, which has got a body with curly brackets enclosing it because it's a bit more complicated than the previous while loop we used, we've got an if statement. It's testing dollar zero, then a tilde, meaning compare this with a regular expression, and the regular expression is caret, that up-arrow thing, then Shows in Queue, colon. So we're looking for a line that begins with 'Shows in Queue:'. If that matches then we printf (we're using printf, did I say that?) 'Queued shows on HPR: percent d, backslash n', and we want to print out field number four. Once the loop has completed we close the pipe, which we do by giving close the command that we set up earlier in the variable cmd.

So the statistics you get from HPR are a number of lines which contain various attributes of the current state of HPR. One of them is the number of shows in the queue, and what this does is pick out just that particular piece of text. When you run it, and I've just run it in real time, it comes back and says 'Queued shows on HPR: 27', because there are 27 in the queue just at this precise moment, which is the 23rd of April.

So I did another example which is essentially the same but uses a slightly different approach. This is awk15_ex6, and here we're using getline with a named variable to store the stuff. The command is the same and there's a while loop; what the while loop does is simply get
lines from the server, and it doesn't do anything at all with them. It simply gets them one at a time until they've all been collected, and then the connection is shut down. What that means is that the last line that came back is still stored in the variable line. So we use split to chop that up into an array called fields, using a comma as the delimiter. Then we can printf with 'Queued shows on HPR: percent d, backslash n' as the format spec, comma, fields, square bracket, 10. The 10th element of this last line, which is a comma-separated line, contains the number of shows that are in the queue. So you get back the same answer, 27. That's just to demonstrate a different way of doing it.

So the last thing I want to say about getline is that Awk provides, or rather this is GNU Awk, some of the other Awk variants don't offer this, the capability of accessing a coprocess. A coprocess is a subprocess, but it can be both written to and read from. In the context of the print and printf commands we can send data to the coprocess with the sequence vertical bar, ampersand as an operator; not just a plain pipe, but with an ampersand after it. I already mentioned this in the last show, number 14. Not too surprisingly, you can use getline to read data back using the same operator. You can bring it back as fields or you can put it in a variable.

I'm not going to go into a lot of depth; this is quite advanced, and there's a lot of information about it in the GNU Awk User's Guide. There's a getline and coprocesses section, and there's a whole subject of two-way I/O. You can write some quite sophisticated stuff using this.

So I've written a simple thing which I've called awk15_ex7, and it demonstrates a thing that you could do with this feature. In this particular example I've got an SQLite database, which I haven't provided for download. This is a copy of one that I used to keep track of the HPR
episodes on the Internet Archive. This is going to be added to the next database design, but it was once a standalone database, and for the purposes of this example it's called awktest.db.

Now the way that you talk to the database is by sending it commands in Structured Query Language. I have mentioned this in other shows, so you might be aware of it. The essence of what we're going to do here is to send it a command which consists of SELECT, which is the verb used in SQL that lets you get data out of a database. It's SELECT, space, then id, comma, title; these are two fields of the database that I have defined, where id is the show number and title is the show title. FROM is the next part of the sequence, and episodes is the name of the table. Then follow that with WHERE id equals, then some placeholder, and a semicolon. We don't actually type the placeholder in this particular case; we're going to use a printf to generate it. So whatever goes in that placeholder, you'll get back the answer in the form of the show number and the title for a given HPR show.

What we have in this script is two rules. I've got a BEGIN rule where we're declaring things. We're declaring a variable called db, which is set to awktest.db, the name of the file holding the little database. Then there's cmd, with the command. The command is sqlite3, which is the command you use on the command line, and which must be followed by the name of a database. You can then either use it interactively or feed it commands through that route. The third variable is called querytpl. I tend to use tpl to mean template, and it's a string; it's actually a format template for printf. That 'SELECT id, title FROM episodes WHERE id equals' thing I mentioned before is in it, and the placeholder is percent d, then a semicolon and backslash n. So that's the BEGIN rule, and it sets these variables up. Then what we want to do is to read,
the script wants to read, numbers, and these numbers will be the show numbers that it's to interrogate the database for. The test that we're using for this rule is that dollar zero, the entire line, matches a regular expression which consists of the digits nought to nine, one or more times, with nothing else on the line; it's anchored at the start of the line, and the line ends after the last digit. It could have been more sophisticated and allowed spaces around it, but I didn't think it was worth the trouble for this demo.

This particular rule then uses printf with the format that we already declared, called querytpl, and we feed it dollar zero as the value that's going to be substituted into that SQL command. We send that to the command in the variable cmd, which runs as a coprocess, and we do it through a vertical bar and ampersand. What that will do the first time it's invoked is cause the coprocess to start up. The coprocess will be running SQLite on the database, expecting individual commands to come in, and the first command it gets will be the one generated by this printf.

The next line uses the command on the left side, then a vertical bar and an ampersand with getline following it, and getline is followed by the name of a variable, which is result. So it's cmd, vertical bar, ampersand, getline, space, result. What that will be doing is talking to the coprocess and pulling back anything that is produced by that query on the database into the variable result. The last line is print, space, result, which prints its contents.

There are many ways that this could be run. The simplest one for demonstration purposes is to feed it some numbers in a file, which is what I did. I called it awk15_ex5.data, but I haven't included it with the show because there's no point; it's just a file with three lines in it. I've listed the numbers, one per line: 2761, 2789 and 2773. So when you run it
with this data file it just simply returns: 2761, HPR Community News for February 2019; 2789, Pacing In Storytelling; 2773, Lead/Acid Battery Maintenance and Calcium Charge Voltage. That's all. I mean, it looks pretty simple.

The coprocess will just keep running until the Awk script runs out of data. When the Awk script runs out of data it will simply exit, and when it exits the coprocess will be killed off by Awk. You could, if you wish, do an explicit close on that coprocess, and that would make it go away. I didn't do that here because it didn't seem to be entirely necessary. But you get some sort of idea of how you could be running a coprocess which is just sitting there waiting for stuff to be thrown at it and coming back with answers, and you can write a script which will converse with it.

Okay, that's all I'm going to say about getline in this particular show. I'm going to finish off with a finale, which is pretty much an announcement. Now there's a lot more that could be said about this redirection subject, input and output, as well as about coprocesses as we said, and there are many more subjects within GNU Awk that could be examined, but we feel that now's the time to bring this series to an end. b-yeezi and I feel that the areas of GNU Awk that we've not covered in this series might be left for you to investigate further if you have the need. We both feel that Awk is a very useful tool in many respects, but it doesn't stand comparison with more advanced scripting languages such as Python, Ruby and Perl.
Perl in particular borrowed many ideas from Awk and has extended them considerably over the years. Ruby was designed with Perl in mind, and it's probably done some of the things as a language better than Perl. And Python, which came at the subject from a different angle, has innovated enormously and is an extremely widely used language. There are others which I won't go into, but that's just to give you a flavour of the fact that there are many other languages which are good for text processing other than Awk.

So although GNU Awk has advanced considerably since it was created, I think it shows its age quite a lot, and its usefulness is a bit limited now. There are cases where quite complex scripts might be written in Awk, but the way most people tend to use it is as part of a pipeline or inside shell scripts of various sorts. Where you might write a complex script in Perl, Python or Ruby, for example, taking on a large project solely in Awk seems like a pretty bad choice today.

So before we wind up this series it's planned to produce one more episode, number 16, and in it b-yeezi and I will record a show together. Exactly how, I'm not sure, perhaps something more sophisticated. At the time of recording there's no time scale, though we don't want to let it sit for too long, and we'll endeavour to do this as soon as our schedules allow. We really wanted to review what has got us here, give a bit more information about why we feel it's not worth carrying on any further with the series, and just sort of give you our two different views on what we've been doing over these years. We've been doing this for a couple of years, a bit more; I'm not sure, I don't have the dates to hand. But anyway, that's the plan. So I hope you've enjoyed the series as a whole and have found it useful. Okay, that's it. Bye bye.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every
weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.