Episode: 2824
Title: HPR2824: Gnu Awk - Part 15
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2824/hpr2824.mp3
Transcribed: 2025-10-19 17:21:30

---

This is HPR episode 2,824, entitled "Gnu Awk - Part 15", and is part of the series Accessibility. It is hosted by Dave Morris, is about 32 minutes long, and carries an explicit flag. The summary is: Redirection of input and output, part 2.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Hello everybody, this is Dave Morris for Hacker Public Radio. It's a nice day and I've got the door open, so you might hear background noises from the birds and so on. Hopefully nobody in the vicinity is going to start up a lawnmower; let's see.

So this is Gnu Awk Part 15, in the series that b-yeezi and I are doing. This is the second of a pair of episodes looking at redirection in Awk scripts.

In this one I'm going to talk primarily about the getline command, which is used for explicit input, as opposed to the usual implicit input that we've seen up to now, and which can include redirection.

Now, the getline command and its uses are quite a complex subject. This show is going to be a bit longer than usual, but there's no way it's going to cover all of the ins and outs of the subject, so I'll redirect you to the GNU Awk User's Guide for the full details. There are links in the show notes, and there are long notes for this particular episode.

So let's start off with a reminder of how Awk processes its rules. I think we alluded to this, but maybe we didn't go into enough detail as we've been going through the series. Today we're looking at how you can change the default methods, but I thought it was worthwhile first to look at the standard approach.

When an Awk script reads a line from a file or from standard input, it goes through all of the rules except the ones labelled BEGIN or END. The rules are the things that make up the script: some sort of test (the pattern) followed by bits of Awk inside curly brackets (the action). If a rule matches, its action is run, and that process continues until all of the rules have been checked. So it's entirely possible that multiple rules will match, and if so they will all be executed, in the sequence in which they're encountered. It's important to bear in mind that they are actually run in that sequence.

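A minimal sketch of this behaviour follows. It assumes a POSIX awk on the PATH and uses three made-up input lines in place of the Lorem Ipsum file from the show. None of the three rules has a pattern, so every one of them fires, in script order, for every record read:

```shell
# Three made-up input lines (not the data file from the show)
data=$(mktemp)
printf 'first\nsecond\nthird\n' > "$data"

# Three rules with no pattern: each one runs, in order, for every record
out=$(awk '
{ print "R1 ---" }
{ print "R2", $0 }
{ print "R3", $0 }
' "$data")
printf '%s\n' "$out"
rm -f "$data"
```

Each input line produces one R1, one R2 and one R3 line, nine lines in all.
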
So for this show I've prepared a very simple data file with three lines in it, and a very simple script which runs against it. Just as an aside, there's a thing called Lorem Ipsum, which is a sort of fake Latin that tends to be used to fill out forms, or as placeholder text in blogs and the like, and there's a command which generates it. I've mentioned in the notes how you can get hold of it if you want to use it, and I've noted the shell command line I used: printf "%s\n" $(lorem -w 3) > awk15_testdata1. What that does is run the lorem command and ask it to generate three words. printf prints each word on its own line, and the greater-than sign redirects the result to a file called awk15_testdata1. I've provided this particular file with the show.

The script, which again you can download if you want to play with it, is a standalone Awk script containing three rules. None of the rules has a pattern in front of it, so no tests are carried out; they're just three rules that will be obeyed. The first one simply prints out the string R1 (for rule 1) followed by three hyphens, as a sort of delimiter. The second rule prints out R2 followed by the contents of $0. I won't read these out in minute detail, because I think you should know how to do this by now. The third rule prints out R3 followed by the contents of $0 again, so it's really the same as rule 2 except that it has a different rule number. The data file contains three nonsense Latin words. (I think they're nonsense. Some of them are not, actually, but anyway it doesn't make a lot of sense. I learned Latin at school, but I've erased it all from my head since then.)

So when you run it, it's not very exciting. It prints R1 and three hyphens, then R2 and the first word, voluptatibus, then R3 and the same word. Then it's R1 again, followed by R2 and R3 with the second word (I'm living on the edge by trying to pronounce these), and then the third time round it's the three hyphens and then sunt. Wow, gosh!

So basically what it's showing is that each rule is run for each line read from the data file. The first rule doesn't do anything at all with the data, but it's still triggered, because there are no criteria for triggering it: it runs whatever has been read in, and it runs for every line that comes in. There's nothing to stop any of these rules from running. So that's how the basic mechanism works. I think you probably knew this, but if you'd asked me how it works before I started writing this episode, I'd probably have scratched my head a bit. So I thought it was worth making it entirely clear.

The getline command, then, is a way of changing how Awk reads lines. Normally they're all read one at a time from whatever the data source is, and there's all that business of matching patterns and invoking rules. This is different from the way that other programming languages handle input, though some can be coerced into doing something similar. The way that Awk reads and processes its data is one of its great strengths, I think.

Now, the getline command can be used to read lines explicitly, outside the usual read, pattern-match, action cycle. In its simplest form, used on its own with no arguments, it just reads in the next line and splits it up into fields in the normal way. Because it uses the normal input stream, it affects how the data is read and how the rules are executed. getline also returns a status value: if it finds a record it returns 1, and if it encounters the end of the file it returns 0.

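Here is a quick sketch that shows the three return values, assuming any POSIX awk. It uses the getline var < file form covered later in the show. It reads a one-line temporary file twice, then tries a path that is assumed not to exist:

```shell
one=$(mktemp)
printf 'only line\n' > "$one"

out=$(awk -v f="$one" 'BEGIN {
    r1 = (getline line < f)                    # 1: a record was read
    r2 = (getline line < f)                    # 0: end of file reached
    r3 = (getline line < "/no/such/file.txt")  # -1: the file could not be opened
    print r1, r2, r3                           # (gawk would also set ERRNO here)
    close(f)
}')
printf '%s\n' "$out"
rm -f "$one"
```

Because these reads are inside a BEGIN rule and there are no other rules, awk never touches its main input.
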
If there's an error while it's reading, it returns -1 and sets a variable called ERRNO (in capitals), which contains a description of what went wrong.

So I've given you another script, which is basically the same as the first one. It's called awk15_ex2.awk, and it has the same three rules; the only difference is that rule 2 also contains a getline.

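Here is a sketch of that second script, using the same made-up three-line input as before rather than the show's data file. Rule 2 prints the current record and then calls getline with no arguments. That replaces $0 with the next record before rule 3 gets to run:

```shell
data=$(mktemp)
printf 'first\nsecond\nthird\n' > "$data"

out=$(awk '
{ print "R1 ---" }
{ print "R2", $0; getline }   # plain getline: the next record replaces $0
{ print "R3", $0 }            # so R3 sees the record getline fetched;
                              # at end of file getline returns 0 and $0 is left alone
' "$data")
printf '%s\n' "$out"
rm -f "$data"
```
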
So if we run that script against the same data that we used before, the output is different. For the first line, R1 is triggered, so you get the three hyphens. R2 is triggered and you see the first word of the file, voluptatibus. But then getline is invoked. It fetches the second line from the file, and R3 is triggered next because it's the next rule in sequence, so it prints out that second line. So getline has changed the normal sequence of reading. On the next iteration, R1 prints its three hyphens and R2 prints the last line of the file, sunt, and then getline gets nothing back, because it has hit the end of the file. So $0, which is printed by R3, is unchanged, and the script simply prints out the same line again. Hopefully that helps to clarify the effect of getline compared with the normal way that Awk works.

So I've written a slightly more useful script, or perhaps not all that useful, which demonstrates something that might be, though it would need work to make it generic. What we've got here is a file of text, another of these Lorem Ipsum files, where I've simply written out a number of lines, then split each line in two and put a hyphen at the end of the first part. So there are actually six lines in the file, and they're in pairs, the first of each pair having a hyphen as its last character.

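The joining logic described next can be sketched as follows. This is a minimal version under my own assumptions, not the show's script. A lone "-" as the last field marks a continued line, the hyphen and any space before it are removed, and the two halves are joined with a single space. Like the original, it's deliberately simplistic, so a hyphen glued to the previous word is not recognised:

```shell
text=$(mktemp)
# Made-up input: two continued pairs, one ordinary line, and a hyphen glued to a word
printf 'Lorem ipsum dolor -\nsit amet\nplain line\nconsectetur -\nadipiscing elit\nglued-\nword\n' > "$text"

out=$(awk '
$NF == "-" {                     # last field is a lone hyphen: a continued line
    sub(/[ \t]*-[ \t]*$/, "")     # delete the hyphen and the space around it
    line = $0                     # save the first half
    if ((getline) > 0)            # refill $0 with the next line of the main input
        print line " " $0
    else
        print line                # hyphen on the very last line: nothing to join
    next
}
{ print }                         # every other line passes through unchanged
' "$text")
printf '%s\n' "$out"
rm -f "$text"
```

The "glued-" line comes out untouched, which is exactly the weakness discussed in the show.
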
What this is meant to signify is that it's a continued line, and you want the script to stick the two parts together. The script detects that a line finishes with a hyphen and concatenates it with the next one, and you can see what it produces when you run it. I won't go into the details of what's in the script, but in general, if the last field of a line is a hyphen, then that hyphen is deleted and the line is saved in a variable called line. A getline call then refills $0, and that is printed, preceded by the saved line. That's how you join two lines together. If there were a line without a hyphen on the end, which is entirely possible, it would just be printed. I didn't actually put one in this example (I should have done), but I'll let you play with that.

Like I said, this is a very simplistic script. It doesn't cater for errors in the way the text is laid out. If you put a hyphen on as the last element but didn't leave a space in front of it, so that it's concatenated to the previous word, then this algorithm won't spot it, and you really should be handling that if you were trying to make it into something actually useful. There's quite a sophisticated example in the GNU Awk User's Guide, section 4.10.1, and I've given a link to it; it does something vaguely similar in a more elegant and resilient way.

getline can also be followed by the name of a variable, in which case the record is read from the main input stream into that variable. Under these circumstances the record is not split into fields, and variables like NF (the number of fields) are not changed, because the field-splitting process has not been invoked. However, since the main input stream is being read, variables like NR (the number of records) will be changed, because the records are still being counted by Awk. I haven't gone into great detail about these side effects; you can find more about them in the manual.

It's also possible to read from a file, not too dissimilar from the way print and printf work, as we saw in the last episode. You write getline, then a less-than sign and the name of a file. The name of the file has to be a string expression or a variable, and the same expression can also be used to close that file. There's a little snippet in the notes which sets a variable input to another variable, path, followed by a slash in double quotes and a variable filename (input = path "/" filename). The assumption is that path and filename are the two bits that get you to a particular file, and you put a slash between them because you're on a Unix system. Then getline < input will open that file and read from it, and once that's done you can type close(input) and it will close that very file. Using variables for this is extremely wise, because otherwise you'd have to rely on your ability to type exactly the same string twice. (Sorry about the noises off.) Okay, so you can also, of course, read from a file into a variable.

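To make these side effects concrete, here's a sketch that contrasts getline var on the main input with getline < file. It assumes a POSIX awk. The directory, the file names and the variables path and filename are made up for the demonstration, and they mirror the snippet just described:

```shell
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/main.txt"     # main input (hypothetical)
printf 'x1 x2\n' > "$dir/other.txt"         # a second file read explicitly (hypothetical)

out=$(awk -v path="$dir" -v filename="other.txt" '
NR == 1 {
    input = path "/" filename    # build the name once; reuse it for close()
    getline nxt                  # main input into a variable: NR goes up, $0 and NF do not change
    print "getline var:", NR, NF, $0, nxt
    getline < input              # a file into $0: NF is recomputed, NR is not touched
    print "getline < file:", NR, NF, $0
    close(input)
}' "$dir/main.txt")
printf '%s\n' "$out"
rm -rf "$dir"
```
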
That reads one line at a time from the file, as we said, but into a variable. I've got an example, awk15_ex4.awk, which reads from the fruit names file that we created in the previous episode. (These two episodes were originally one, so they refer to one another a bit.) It's all done in the BEGIN rule. There are no other rules in this script, and all it does is read in the file and print it out. I did add another fiddly bit: if you look at it, you'll see it checks a variable called ARGC (all in capitals). We need that to be 2, because the argument list includes the name of the program (awk itself) as its first element, and when the script is invoked we need it to have an argument naming the file you want to process. So it checks whether ARGC is 2, and if it's not, it prints out "Needs a file name argument", sends it to /dev/stderr, the standard error output, and exits. I just put that in because I thought it would be useful to show how you can do that type of thing. Then the actual data file name is picked up from the array ARGV (in capitals), element 1. To be clear: ARGV needs to have two elements, but they're addressed as 0 and 1. I don't think I made that clear enough.

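Here is a sketch along the lines of awk15_ex4.awk, including the while loop that's described next. It assumes a POSIX awk, and the fruit list is a made-up stand-in for the previous episode's file:

```shell
fruit=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$fruit"   # hypothetical stand-in for the fruit names file

prog='BEGIN {
    # ARGV[0] is the program name, so one file argument means ARGC == 2
    if (ARGC != 2) {
        print "Needs a file name argument" > "/dev/stderr"
        exit 1
    }
    data = ARGV[1]
    # > 0 stops at end of file (0) and also on a read error (-1)
    while ((getline line < data) > 0)
        print line
    close(data)
}'

out=$(awk "$prog" "$fruit")
printf '%s\n' "$out"
awk "$prog" || echo "exit status: $?"   # no argument: message on stderr, status 1
rm -f "$fruit"
```

Since everything happens in BEGIN, awk never tries to read the file argument as normal input. The script reads it explicitly instead.
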
So we have a while loop, and its test, in parentheses, contains getline line < data. data holds the name of the file, so getline is going to read from that file. After the parenthesised getline with its arguments we have > 0, so we're checking that the value coming back from getline is greater than zero: 1 means a line was read, and 0 means there's no more data because we're at the end of the file. The loop has just one command, which is print line: getline has read the record into a variable called line, and it's simply printed out. After the while loop there's a close command, with the variable data in its parentheses, so it closes that file. So it's very, very trivial: it simply reads the file and prints it. As a seasoned Awk user you'll be aware that you could simply have written awk '{ print }' followed by the name of the fruit file on the command line, and it would have done exactly the same thing without anywhere near as much fuss. But this was for demonstration purposes.

Next, the getline facility gets a bit more sophisticated: you can read from a pipe in Awk. The way you do this is to write a command, a vertical bar and getline (command | getline), or a command, a vertical bar, getline and the name of a variable (command | getline var). What happens is that Awk runs the command as a subprocess and gets lines from it, and either does the usual field splitting or stores each line in the variable.

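Here is a minimal sketch of both pipe forms in a POSIX awk. The command is sort, run on a small made-up file, chosen only because it's local and deterministic. The same command string is passed to close() afterwards:

```shell
list=$(mktemp)
printf 'pear 3\napple 10\nfig 7\n' > "$list"   # made-up sample data

out=$(awk -v f="$list" 'BEGIN {
    cmd = "sort " f                  # the command string is also the name used by close()
    while ((cmd | getline) > 0)      # no variable: each line goes into $0 and is split into fields
        print $2 ": " $1
    close(cmd)                       # closing lets the same command be run again
    while ((cmd | getline line) > 0) # with a variable: $0 and NF are left alone
        n++
    close(cmd)
    print "lines read the second time:", n
}')
printf '%s\n' "$out"
rm -f "$list"
```

Without the first close(cmd), the second loop would find the pipe already at end of file and read nothing.
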
So awk15_ex5.awk is a simple Awk script, all within the BEGIN rule, which runs a command stored in a variable called cmd. It's a wget command, so you need to have wget installed on your Linux system (or indeed a BSD system, if you wish). The command is wget -q (hyphen, lowercase q), then the URL of the Hacker Public Radio stats page (I won't read it out; it's in the notes), then -O (hyphen, capital O), meaning "output to", followed by a file name which is simply a hyphen, meaning standard output. That's all in double quotes, so it's a string as far as Awk is concerned.

Then there's a while loop which does a similar thing to the previous one. The test inside its parentheses, evaluated on each iteration, is (cmd | getline), and we compare the result with zero. We want it to be greater than zero, because once the output ends getline returns 0, which means stop. The loop has a body in curly brackets, because it's a bit more complicated than the previous while loop. Inside it there's an if statement testing $0 with a tilde, which means compare it with a regular expression. The regular expression is a caret (our old friend, meaning the start of the line) followed by "Shows in Queue:", so we're looking for a line that begins with "Shows in Queue:". If that matches, we use printf with the format "Queued shows on HPR: %d\n" to print out field number 4. Once the loop has completed, we close the pipe, which we do by giving close the command that we set up earlier in the variable cmd.

The stats page returns a number of lines containing various attributes of the current state of HPR, one of which is the number of shows in the queue, and this script picks out just that particular piece of text. When I ran it just now, in real time, it came back with "Queued shows on HPR: 27", because there are 27 shows in the queue at this precise moment, which is the 23rd of April.

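The real stats page needs a network connection, so this sketch uses cat on a local file as a stand-in for the wget command. The file's contents are invented for the demonstration. Only the "Shows in Queue:" line, with the count as its fourth field, reflects the format described in the show, and even that layout is an assumption:

```shell
stats=$(mktemp)
# Hypothetical stand-in for the stats page; only the "Shows in Queue:" line matters
printf 'Some other statistic: 123\nShows in Queue: 27\n' > "$stats"

out=$(awk -v f="$stats" 'BEGIN {
    cmd = "cat " f                   # in the show this is the wget -q -O - command
    while ((cmd | getline) > 0) {
        if ($0 ~ /^Shows in Queue:/) # $1="Shows" $2="in" $3="Queue:" $4=the count
            printf "Queued shows on HPR: %d\n", $4
    }
    close(cmd)
}')
printf '%s\n' "$out"
rm -f "$stats"
```
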
So I did another example, awk15_ex6.awk, which is essentially the same but uses a slightly different approach: it uses getline var, with a named variable to store the data. The command is the same, and there's a while loop which simply gets lines from the server without doing anything at all with them. It just reads them one at a time until they've all been collected and the connection is shut down. What that means is that the last line that came back is still stored in the variable line. So we use split to chop that line up into an array called fields, using a comma as the delimiter. Then we use printf with "Queued shows on HPR: %d\n" as the format and fields[10] as the value. The tenth element of this last line, which is comma-separated, contains the number of shows in the queue, so you get back the same answer, 27. That's just to demonstrate a different way of doing it.

So the last thing I want to say about getline is that GNU Awk provides the capability of accessing a coprocess (some of the other Awk variants don't offer this). A coprocess is a subprocess which can be both written to and read from. With the print and printf commands, we can send data to a coprocess using the operator |&, a vertical bar followed by an ampersand, rather than a plain pipe. I already mentioned this in the last show, number 14.

And, not too surprisingly, you can use getline to read the data back using the same operator, either splitting it into fields or putting it into a variable. I'm not going to go into a lot of depth, because this is quite advanced, and there's a lot of information about it in the GNU Awk User's Guide. There's a section on using getline with coprocesses, and a whole section on two-way I/O; you can write some quite sophisticated stuff using this.

So I've written a simple script, awk15_ex7.awk, which demonstrates something you could do with this feature. In this particular example I've got an SQLite database, which I haven't provided for download. It's a copy of one that I use to keep track of the HPR episodes on the Internet Archive. That one is going to be added to the next database design, but for now it's a standalone database, and for the purposes of this example it's called awktest.db.

The way you talk to the database is by sending it commands in Structured Query Language, SQL. I have mentioned this in other shows, so you might be aware of it, but the essence of what we're going to do here is send it a command consisting of SELECT, the SQL verb that gets data out of a database. That's followed by id,title, two fields that I have defined in the database: id is the show number and title is the show title. FROM is the next part of the sequence, and episodes is the name of the table. That's followed by WHERE id =, then a placeholder and a semicolon. We don't actually type the placeholder in. Instead, we use printf to fill it in, so whatever goes in the placeholder, you get back the show number and the title for that HPR show.

So in this script we have two rules. First there's a BEGIN rule where we're declaring things. There's a variable called db, which is set to awktest.db, the name of the database file.
Then there's a variable called cmd holding the command: sqlite3, which is the command you use on the command line, followed by the name of the database. You can then either use sqlite3 interactively or feed it commands through that route. The third variable is called query_tpl (I tend to use tpl to mean template). It's a string, and it's actually a template, a format string for printf, containing the select id,title from episodes where id = statement I mentioned before. The placeholder is %d, followed by a semicolon and \n. So that's the BEGIN rule, and it sets these variables up.

The script then wants to read numbers, which will be the show numbers that it's to look up in the database. So the pattern for the second rule is that $0, the entire line, matches a regular expression consisting of the digits 0 to 9, one or more times, with nothing else on the line: it starts at the beginning of the line, and the line ends after the last digit (/^[0-9]+$/). It could have been more sophisticated and allowed spaces around the number, but I didn't think it was worth the trouble for this demo. This rule then uses printf with the format that we already declared, query_tpl, and we feed it $0 as the value to be inserted into the SQL command. We send the result to the command in the variable cmd, which is running as a coprocess, through a vertical bar and an ampersand. The first time this is invoked, it causes the coprocess to start up. The coprocess will be running sqlite3 on the database, expecting individual commands to come in, and the first command it gets will be the one generated by this printf. The next line has the command on the left, then a vertical bar and an ampersand, then getline followed by the name of a variable, result: cmd |& getline result. This talks to the coprocess and pulls back whatever that query produces from the database into the variable result.
And the last line is print result, which prints its content. There are many ways this could be run, but the simplest one for demonstration purposes is to feed it some numbers in a file, which is what I did. I haven't included that file with the show, because there's no point: it's just a file with three lines in it. I've included the numbers in the notes, one per line: 2761, 2789 and 2773. When you run the script with this data file it simply returns 2761, HPR Community News for February 2019; 2789, Pacing in storytelling; and 2773, Lead acid battery maintenance and calcium charge voltage. That's all it does; it looks pretty simple. The coprocess just keeps running until the Awk script runs out of data. When that happens the script simply exits, and when it exits, the coprocess is killed off by Awk. You could, if you wish, do an explicit close on the coprocess, and that would make it go away. I didn't do that here because it didn't seem to be necessary. But you get some idea of how you could have a coprocess sitting there waiting for stuff to be thrown at it and coming back with answers, and how you can write a script which will converse with it.

Okay, that's all I'm going to say about getline in this particular show. I'm going to finish off with a finale, which is pretty much an announcement. There's a lot more that could be said about redirecting input and output, and about coprocesses, and there are many more subjects within GNU Awk that could be examined, but we feel that now's the time to bring this series to an end. b-yeezi and I feel that the areas of GNU Awk that we've not covered in this series can be left for you to investigate further if you have the need. We both feel that Awk is a very useful tool in many respects, but it doesn't stand comparison with more advanced scripting languages such as Python, Ruby and Perl. Perl in particular borrowed many ideas from Awk and has extended them considerably over the years. Ruby was designed with Perl in mind, and as a language has probably done some things better than Perl. Python came at the subject from a different angle, has innovated enormously, and is an extremely widely used language. There are others, which I won't go into, but that gives you a flavour of the fact that there are many languages other than Awk which are good for text processing.

So although GNU Awk has advanced considerably since it was created, I think it shows its age quite a lot, and its usefulness is a bit limited now. There are cases where quite complex scripts might be written in Awk, but the way most people tend to use it is as part of a pipeline or inside shell scripts of various sorts, whereas a complex script might be written in Perl, Python or Ruby, for example. Taking on a large project solely in Awk seems like a pretty bad choice today.

So before we wind up this series, we plan to produce one more episode, number 16, in which b-yeezi and I will record a show together. Exactly how, I'm not sure: Mumble perhaps, or something more sophisticated. At the time of recording there's no timescale, though we don't want to let it sit for too long, and we'll endeavour to do it as soon as our schedules allow. We want to review what has got us here, explain a bit more about why we feel it's not worth carrying on any further with the series, and give you our two different views on what we've been doing over these years. We've been doing this for a couple of years, or a bit more (I haven't got the dates to hand), but anyway, that's the plan. So I hope you've enjoyed the series as a whole and have found it useful. Okay, that's it. Bye bye.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contributing link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.