Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
hpr_transcripts/hpr2163.txt
@@ -0,0 +1,300 @@
Episode: 2163
Title: HPR2163: Gnu Awk - Part 4
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2163/hpr2163.mp3
Transcribed: 2025-10-18 15:08:36

---

This is HPR episode 2,163 entitled Gnu Awk Part 4 and is part of the series Learning Awk.
It is hosted by Dave Morris and is about 31 minutes long.
The summary is: recapping the last episode and looking at variables in an Awk program.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hi everyone, this is Dave Morris and this is episode 4 in the Gnu Awk series.
b-yeezi and I are progressing with this and, well, this is episode 4 as I say,
so it means we now have a series, which we've called Learning Awk, so the episodes are all joined together
and you can find them more easily, and that sort of thing.
Okay, so what I'm going to start with this time is a recap of the previous episode,
and then I'm going to go into a bit more detail about variables in Awk.
So in the last episode you saw logical operators; they're also called Boolean operators,
if that means anything to you, Boolean algebra and that type of thing.
Boolean algebra has NOT, AND and OR operators.
Well, in Awk the double ampersand (&&) means AND, and the double vertical bar or pipe symbol (||) means OR.
One that wasn't covered was the NOT operator, which is an exclamation mark (!).
So we can generate some quite complex Boolean expressions with these, but
we'll leave that till later, I'm not sure who's going to do it, because we want to deal with it in the context of other Awk statements
in an Awk program or Awk script, so we'll expand on this a bit later on.
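As a rough illustration of the three logical operators just recapped, here is a minimal sketch. The sample records are made up for this note (they are not the series' file1.txt), and the variable names are only for illustration.

```shell
# Hypothetical sample data: name, colour, amount (not the episode's file)
data='apple red 4
banana yellow 6
kiwi brown 4
grape purple 10'

# && (and): colour is red AND amount is greater than 3
and_out=$(printf '%s\n' "$data" | awk '$2 == "red" && $3 > 3 { print $1 }')

# || (or): colour is brown OR amount is 10 or more
or_out=$(printf '%s\n' "$data" | awk '$2 == "brown" || $3 >= 10 { print $1 }')

# ! (not): every record whose colour is NOT yellow
not_out=$(printf '%s\n' "$data" | awk '!($2 == "yellow") { print $1 }')

printf '%s\n' "and: $and_out" "or: $or_out" "not: $not_out"
```

Each pattern is just an expression; a record is printed only when the whole expression is true.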
You also saw last episode the next statement, and this, we discovered, is a way of stopping
processing on the current input record, so it really does abort everything.
No more patterns are tested against it.
The actions in the current rule, I should say, are finished with, stopped at that point.
It's a statement in a similar way to things like print, so you can't use it anywhere
other than in the action part of a rule, and you can't use it in BEGIN or END rules either,
and I'm going to talk about those in a minute.
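A small sketch of how next behaves, assuming a made-up three-line input with a header: once next runs, no later rule sees that record.

```shell
# Hypothetical input: a header line followed by two data lines
data='name color amount
apple red 4
kiwi brown 4'

next_out=$(printf '%s\n' "$data" | awk '
NR == 1 { next }              # abandon record 1 and read the next record
        { print NR ": " $1 }  # only reached for records 2 onwards
')
printf '%s\n' "$next_out"
```

The second rule has no pattern, so it would normally run for every record; the header never reaches it because the first rule has already moved on.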
So BEGIN and END: BEGIN and END are actually patterns.
They're in capitals, capital B E G I N and E N D.
They're patterns which are special, and they have to work with an action.
You can't have either of them without an action.
The action is in curly brackets, as you know, and the whole shebang,
BEGIN and action, END and action, makes up a rule in the same way as we've seen with the pattern-action sequences.
So the BEGIN rules are run before the main pattern-action rules are processed,
that is, before the input file or files are read.
END rules are run after everything's been read and processed. You can have more than one
BEGIN and more than one END, and it doesn't actually matter which order they occur in in terms of
the BEGINs versus the ENDs, but if you have multiple BEGINs then they are executed in the order
that they are encountered, and similarly with END.
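The ordering rules for BEGIN and END can be checked with a sketch like the following. The rules are deliberately written out of order, and the two-line input is invented for the purpose.

```shell
begin_end_out=$(printf 'one\ntwo\n' | awk '
END   { print "end 1, records read:", NR }
BEGIN { print "begin 1" }
      { print "record", NR, $0 }
BEGIN { print "begin 2" }
END   { print "end 2" }
')
printf '%s\n' "$begin_end_out"
```

Both BEGIN rules run first, in the order written, then the main rule runs once per record, then both END rules run in the order written, even though an END rule appears first in the script.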
So in the last episode we also started to look at variables.
It's difficult when describing this sort of subject.
This is effectively a language we're talking about here,
and you can't really start at the beginning and work through it to the end, because there
isn't really a beginning, you know; it's quite difficult to find a linear path through it.
So we're sort of going ahead into areas that haven't really been explained yet, just to
demonstrate certain functions and processes and so on.
So there was a bunch of things that were commented on and shown last episode:
variables, arrays and loops and that sort of thing.
So we're going to look at all of these in a bit more detail in this episode, so I'm trying
to consolidate them all. Okay, that was a quick recap of where we were from last episode,
and I now want to start talking about variables in relation to Awk.
You've already seen things like NR and NF, capital NR and capital NF, which are the record number and the
number of fields, in the early part of the series, and in the last episode you saw that you can
create your own variables too. So what's a variable? Well, as you find in most other programming
languages, it's a named storage area that can hold a value, and there are certain rules about how
you construct the name. It consists of letters, digits and the underscore in the case of Awk,
and it mustn't start with a digit. The case of the letters is significant, so lower-case sum,
Sum with a capital S, and SUM all in capitals are three variables that you would
speak the same, but they're different. The other name for these types of variables, which
can just hold a single value, is scalars. I'm mentioning
this because you might see that name if you look in the manual. So a variable in Awk can contain a numeric
value or a string value. Awk deals with the conversion of one of these to the other as appropriate.
Sometimes it might mistake, if you like to put it that way, what it was you intended, and it might
need some assistance, but we'll refer to that later. Now, one of the things you learn as
somebody learning programming, or did so back in the day when I was learning this sort of stuff, is that
when you create a variable in the language you need to initialize it, because there's no
definition of what it contains before you use it. But in Awk that's not so. All variables begin
as an empty string, and an empty string is the equivalent of zero if you need to use it as a number.
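The two points above, that an unset variable is both an empty string and zero, and that case matters in names, can be sketched like this. The variable name never_set is just a placeholder chosen for this note.

```shell
uninit_out=$(awk 'BEGIN {
    # never_set has not been assigned, so as a string it is empty
    print "as string: [" never_set "]"
    # and it counts as zero when used as a number
    print "as number:", never_set + 0
    # sum, Sum and SUM are three separate variables
    sum = 1; Sum = 2; SUM = 3
    print sum, Sum, SUM
}')
printf '%s\n' "$uninit_out"
```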
So how do you set variables to values? Well, you do it as you do in most languages: you
use an assignment. I've given an example here, count equals three (count = 3). That's an assignment: count is
the name of the variable, the equals is the assignment operator, and three is the value you're going to put
into it. The last episode showed an assignment like used plus equals dollar three (used += $3). What this actually
means is increment the contents of the variable with the name used by the contents
of field three. So used is the variable, plus equals is this special type of assignment, and dollar three, as
you already know, means field three. There is an assumption here that dollar three contains a
numeric value, but we'll come on to what would happen if it didn't a bit later on. It's a
shorthand version of used equals used plus dollar three (used = used + $3), so what that means is add the contents of
used to the contents of field three and then save the result back in the variable used. So the first
time the variable is incremented its contents are taken to be zero. As I've said, it used to be
that if you were writing in C or Fortran or Pascal, or one of those sorts of older
compiled languages, you could not get away with that, but in Awk and many other scripting languages
these days it's not a problem. So we've started down the road of looking at arithmetic operators.
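A minimal sketch of the used += $3 idea, run against three invented records (only the third field matters here):

```shell
# Hypothetical three-field records
used_out=$(printf 'a x 10\nb y 20\nc z 12\n' | awk '
{ used += $3 }      # same as: used = used + $3
END { print used }  # used started out empty, which counted as zero
')
printf '%s\n' "$used_out"
```

No initialization of used is needed; the first addition treats it as zero.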
So I thought we would stop and look at the whole list. It's a pretty short list, but I'll
just go through them briefly. There's a table in the long notes which you can refer to
if you need to, but if you've had any experience of programming most of these will be very, very obvious
to you. One thing to note before we proceed is that all numbers in Awk are floating point numbers,
that is, they can have a decimal point in them. This can catch you out in some edge cases, because comparing
floating point numbers for equality doesn't always give you the result that you would expect,
but we'll highlight these as we go along. What I've done here is to put together a list
based on what's in the GNU Awk User's Guide, and as before there's a reference to it if you
want to go and examine it yourself. I've listed them, as they do, in the order of their precedence
from highest to lowest. So the first one is the circumflex character (^), which is exponentiation, so
x circumflex y means x raised to the power of y. So something like two circumflex three, that's
two to the power of three, which has the value eight. In Awk there is a double asterisk operator (**) which
does the same job, but it's not in the standard version; GNU Awk is slightly different from
standard Awk, and we're trying to stick to pretty much the mainstream stuff as much as possible,
because otherwise you might get caught out if you try and run your Awk script on a different
machine, a different system, a BSD system or perhaps a Mac or something. So we're not
going to use the double asterisk operator. A minus sign in front of a variable or a number
obviously negates it. A plus sign in front of one is unary plus, and that's actually a way in which
you can tell Awk to treat a variable as a number. As I was typing this out I was trying to think
of cases where you'd want to do that and I couldn't come up with any, but hopefully some will
occur to me as we go along. The asterisk (*) is multiplication. The forward slash (/) is division, and there's
a note here, which is that because all numbers in Awk are floating point the result is not rounded
to an integer, so three divided by four, which would be written as three forward slash four, has
the value 0.75. Whereas if you did the same thing in Bash, for example, whose arithmetic is purely integer, and you
typed something like echo dollar open parenthesis open parenthesis three slash four close parenthesis close parenthesis (echo $((3/4))),
you'd get the answer zero, because it's truncated to an integer, to a whole number. The percent symbol (%)
is the remainder after division, so x percent y is the remainder after x has been divided by
y. So three percent four is three, because three can't be divided by four, so the
remainder is three. Five percent two is one, because two goes into five twice, leaving one remainder. The
plus sign is also addition, x plus y, with the obvious meaning, and the hyphen,
the minus sign, is subtraction, x minus y, so pretty obvious. You've already seen
the plus equals operator. This is an assignment operator, and these are shorthand forms of more verbose
assignments, as we've already looked at in one particular case. So I put together a table,
which is a modification of the GNU Awk User's Guide table, showing all of these operators. So you might
use plus equals, minus equals, asterisk equals, slash equals, percent equals and circumflex equals (+=, -=, *=, /=, %= and ^=), and
I think you probably get what that means in all of the cases. Let's just look at the last one,
circumflex equals. If you wrote variable circumflex equals power, so you
might type x circumflex equals two (x ^= 2), what that means is raise x to the power of two, so x becomes x
squared. I wrote a little script just to demonstrate these things, and it's available if you want it.
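The following is not the script from the show notes, just a minimal sketch in the same spirit. It puts everything in a BEGIN rule so that no input is read, and the starting value and operands are arbitrary choices for this note.

```shell
arith_out=$(awk 'BEGIN {
    print 2 ^ 3          # exponentiation
    print 3 / 4          # division is not truncated to an integer
    print 3 % 4, 5 % 2   # remainders
    x = 42
    x += 8;  print x     # x = x + 8
    x -= 20; print x     # x = x - 20
    x *= 2;  print x     # x = x * 2
    x %= 7;  print x     # x = x % 7
    x ^= 2;  print x     # x = x ^ 2
    x /= 5;  print x     # x = x / 5
}')
printf '%s\n' "$arith_out"

# For comparison, shell arithmetic is integer only
shell_out=$((3 / 4))
printf '%s\n' "$shell_out"
```

The awk division gives 0.75 where the shell gives 0, which is the contrast described above.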
It's called arithmetic assignment operators dot awk, and I've listed its contents. It's
simply a bunch of statements which use these various operators and print out the
result. The whole thing is in a BEGIN rule because we don't want the script to actually
do any file processing; it's just doing a little demonstration of its internal computation
capabilities. As I've written it, the first line after the BEGIN, for example, is x equals 42
semicolon print quotes x is close quotes comma x. So there are two statements there: one
is the assignment statement, which sets x to 42, and the second one is a print, which prints out
the string x is followed by the contents of x. There's a semicolon between them: if you
write two statements on a line then you need a semicolon between them. They could have been written
on two successive lines, but I just thought it was a little bit neater doing it this way. So you need
semicolon statement separators if there are multiple statements on a line, but you don't need them
if there's only one statement per line, so there's no semicolon on the end. If you're used to
other languages where this is necessary, then Awk doesn't make it so. It doesn't matter if you
put a semicolon on the end of the line as well if you want to; there's something to be said for
doing that, I guess, but you don't need to. Okay, so I've got an example here of what happens when
you run this, and I'm not reading you that because it's pretty obvious. So let's talk about type
conversion. A variable can contain a numeric value or a string at any point in time, as we've seen.
When converting from a number to a string, what you get is a string containing the number.
There's a little bit more to it than that, but we'll leave that for another time. Converting from a string
to a number, on the other hand, well, there needs to be something that can be interpreted as a number
within the string; in other words, it needs to begin with a digit sequence. So my little example here
uses the string 9gag.com. I set it into a variable called s, and then I set x equal
to s plus one and print x. The answer is ten, because Awk pulled the nine off the front of this
address and simply added one to it. So the nine off the front was converted to a
number and then one was added to it. If there's no valid number in a string when you come to do
this type of conversion, then Awk will treat it as zero. Awk will handle strings containing
all sorts of numbers, so it'll handle integer numbers like 42, floating point numbers like
4.2, and also exponential numbers, with the notation for this which is common in many languages,
1E3. I've used a capital E in this case, but it could also be a lower-case e. 1E3
means one times ten to the power three, so it's a thousand. So I've got a little example of
these three strings being fed to a printf statement and printed out, and the printf uses the
g format control letter, which we haven't really looked at. We're going to spend some time on these
control letters a bit later on, but the g one is for printing general numbers. So it prints 42
as 42, 4.2 as 4.2, and 1E3 comes out as a thousand. Also, in the last episode
b-yeezi used some operators which consisted of two operators together, plus plus I think he used,
and these are called increment and decrement operators. They increment or decrement the value
of a variable by one, and if you've been following my series on Bash and parameter expansion,
or various expansions, I covered arithmetic expansion, where I talked about these in the Bash
context. You can look at episode 1951 if you've forgotten or if you're interested. So again I've produced
a list of the various operators. So, for example, plus plus
variable name (++variable) means increment the variable, returning the new value as the value of the expression.
So plus plus variable is different from variable plus plus, because in the first one plus plus
means add one to it and then return the result. Variable plus plus (variable++), called a post-increment,
returns the contents of the variable before it's had one added to it, then adds one to it. Okay,
so there is a similar pair: minus minus variable (--variable), which decrements it and then returns that value,
and variable minus minus (variable--), which returns the value and then subtracts one from it. There are some
examples of how this might be used a little bit later on in the notes. So that's scalar variables.
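The last two topics, string-to-number conversion and the increment and decrement operators, can be tried with a sketch like the one below. It follows the examples as described in the audio rather than reproducing the show notes; the variable n and the string "hello" are additions for illustration.

```shell
conv_out=$(awk 'BEGIN {
    s = "9gag.com"
    x = s + 1              # the leading 9 is used, the rest is ignored
    print x
    print ("hello" + 1)    # no leading number, so the string counts as 0
    printf "%g %g %g\n", "42", "4.2", "1E3"
}')
printf '%s\n' "$conv_out"

incr_out=$(awk 'BEGIN {
    n = 5
    print ++n    # pre-increment: n becomes 6, then 6 is returned
    print n++    # post-increment: 6 is returned, then n becomes 7
    print n
    print --n    # pre-decrement: n becomes 6, then 6 is returned
    print n--    # post-decrement: 6 is returned, then n becomes 5
    print n
}')
printf '%s\n' "$incr_out"
```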
But there's also a whole bunch of other capabilities in the shape of arrays.
Awk provides one-dimensional arrays. Now there's a little note here to the effect that Awk does
actually allow you to have multi-dimensional arrays. Traditional Awk offers this by a sort of
a hacky solution. GNU Awk provides true arrays of arrays, but I'm not sure that we're going to
cover that in this particular series because it's pretty much on the edge. If I wanted to do this
personally I would not be using Awk to do it, but you may think otherwise, of course. I
think it might be better if we simply point you at the manual to go further with this, but
I thought it was worth just pointing out that there's quite a lot in GNU Awk. The thing about
arrays in Awk is that they are so-called associative arrays, which are also known as hashes. So let's talk
about what an array is. It has a name, and its name has got to conform to the rules we talked about
for scalars. You can't have an array called the same thing as a scalar variable. An array can
store multiple values, and to get at them you use an index. Since this is a scripting language
it's different from compiled languages: the arrays can be any length and can be expanded and
contracted at will. So given an array, let's call it a, we might store a value in it, so we type a
open square bracket one close square bracket equals and then a string (a[1] = "HPR"). I've put HPR in double
quotes; double quotes is the way you define a string in Awk, by the way. So the array name is a,
the index is one, and the contents of a square brackets one is the string HPR. If you're
used to using arrays in other languages you might assume that the index is numeric, but it's not,
it's a string. All array indices are strings, because Awk arrays are these types of things, they're
associative: you use a string as the index into it, so it's an associative array or a hash.
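To make the point about string indices concrete, here is a small sketch. The in test and the delete statement are not covered in this episode; they are standard Awk features used here only to show an array growing and shrinking.

```shell
arr_out=$(awk 'BEGIN {
    a[1] = "HPR"         # the index 1 is stored as the string "1"
    print a["1"]         # so this finds the same element
    a["brown"] = 2       # any string can be an index
    print a["brown"]
    if ("purple" in a) print "purple present"; else print "purple absent"
    delete a[1]          # arrays can shrink as well as grow
    if (1 in a) print "1 present"; else print "1 absent"
}')
printf '%s\n' "$arr_out"
```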
They're indexed by arbitrary string values and they make up a sort of a lookup table. It's actually
quite a powerful capability. In one of the examples in the last episode we saw this, which is just an extract
from an example: NR not equal to one, that was the pattern, open curly bracket a square bracket
dollar two close square bracket plus plus close curly bracket (NR != 1 { a[$2]++ }). Here the
Awk script was being used to produce a frequency count of colors, and we were looking through the file
file1.txt, which you already have a copy of, I would imagine. Field two in this file is the name
of a color, so what we're doing here is we're using the color name as an index and we're simply
incrementing that array element. I've tried to explain it in text, and here is what I've typed: it
means index the array a by the string contents of field two; if the element doesn't exist,
create it. So this thing can be used even before there is an array a, or an array
a with that particular element in it. Since Awk is very relaxed about initialization, this array element
will be taken to be zero when it's created, and then the plus plus on the end will increment it to
one. If the element already exists then its previous value will be incremented. So if you ran
this particular bit of code, which was in the last episode, and it went through all of the rows in the
file1.txt file, then if you could look at the insides of the array when it
had finished you'd find an index with the string brown, and the contents would be two, meaning
that there were two instances of the color brown. So that means there's
an element a open square bracket open double quotes brown close double quotes close square
bracket (a["brown"]), and in that array element there's the number two. I also noted that a square bracket dollar
two close square bracket plus plus is the same as a square bracket dollar two close square bracket plus equals one (a[$2] += 1);
both mean the same thing. No doubt you're already there ahead of me. We also saw last time
the concept of looping through an array to print it out, and we had, I'll just read this out quickly
without going into a lot of detail, for b in a, print b, a brackets b. This was a case of sort of
rushing ahead into areas that we hadn't really explained yet, but it was necessary to get some of
the precursor concepts sorted out. We haven't looked at looping and other such statements in Awk
yet, but we need to look at this one now so we can understand how you would process an entire array.
So briefly, the for statement provides a way to repeat a given set of statements a number of times.
We'll have a look at this and other related things like while and do and so forth later on.
This particular variant of the for statement allows the processing of arrays, and it consists of
the word for followed by, in parentheses, a variable name followed by the word in followed by the name
of an array, so it's saying for every element of this array. Then the for statement is followed
by one or more statements which are being controlled by it. So the expression variable in array
results in all of the index values in the nominated array being provided one at a time, and while the
loop runs the variable is set to the successive index values and the body part, the dependent
statements, are executed. Now the body part can be a single statement or a group. If a group is used
then you have to put curly braces around them, but if there's only one statement
you don't need any curly braces. Now one thing about the way Awk works is that the order in which
the array index values are provided is not defined, so they come out in a sort of random
order. It's not really random, but it's an arbitrary, undefined order. Different Awk versions
will use different orders in the way they process this. Now GNU Awk does have extensions which
allow the ordering of the index values to be controlled, but we'll deal with this later. So
let's just look at the example from the last episode. I've made some modifications to it,
changed names and that type of thing, and laid it out slightly differently, just to demonstrate the concept
a little bit more clearly. This particular example is in a file which you can download from
the HPR website, and it's called color underscore count dot awk (color_count.awk). I've used the American spelling because
b-yeezi had used it throughout his example, and it's basically his example that I've stolen
and hacked around. So the array has been renamed from a to count, because it holds counts or frequencies
of the number of times colors are encountered. The array is indexed by the names of the colors in field
two, and when we loop through the array in the END rule we use the variable color to store the
latest index. I took out semicolons and curly braces that were not really necessary, just to
really demonstrate that they could be removed without any problem. So I'll not read this one
out because you've seen the gist of it last time, but you might want to check this one out just to
see how different it is in terms of its layout and use of variable names and so on.
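The script itself is not read out, so what follows is only a sketch of a frequency count along the lines described: an array called count indexed by field two, a loop variable called color, and a header line skipped with NR != 1. The input is an invented stand-in, not the real file1.csv.

```shell
# Hypothetical CSV data: a header line, then name,color,amount
csv='name,color,amount
kiwi,brown,4
grape,purple,10
potato,brown,9
plum,purple,2'

color_out=$(printf '%s\n' "$csv" | awk -F "," '
NR != 1 { count[$2]++ }            # tally each colour, skipping the header
END {
    print "color,count"
    for (color in count)           # index order is not defined
        print color "," count[color]
}')
printf '%s\n' "$color_out"
```

Because the loop order is undefined, the lines after the heading may appear in either order; pipe the output through sort if a stable order matters.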
So when it runs it does the same as the previous version does. It actually runs
against the file called file1.csv, and it prints out a CSV list, a comma separated variable list
that should be, consisting of the headline color comma count, and brown comma two, purple comma two,
etc., so it's just the count of the number of occurrences of those colors. So to finish, then, I want to
just mention the built-in variables that we've seen so far, and we saw another one added to the list
in the last episode. This one is called FS, capital F capital S, and it stands for field
separator. It's the internal variable within the Awk script that matches the minus capital
F option that you give to the command. So, for example, minus capital F and then in double quotes
a comma (-F ",") on the command line is the same as assigning FS equals double quotes comma close double quotes (FS = ",")
inside the script. The statement FS equals etc. needs to be in a BEGIN rule in order that it can be
set early enough in the script. You can't put it in a pattern-action style rule because that happens
too late: you've already grabbed the first record by then, most likely, or a record at least,
and that was separated out based on whatever the default separator is, which is a space.
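The timing issue with FS can be seen in a sketch like this one, which uses two invented CSV lines. It assumes an Awk that follows the POSIX rule that a change to FS takes effect from the next record.

```shell
# -F on the command line
dash_f=$(printf 'a,b,c\nd,e,f\n' | awk -F "," '{ print $2 }')

# FS set in a BEGIN rule: in force before the first record is split
early=$(printf 'a,b,c\nd,e,f\n' | awk 'BEGIN { FS = "," } { print $2 }')

# FS set in an ordinary rule: record 1 has already been split on whitespace
late=$(printf 'a,b,c\nd,e,f\n' | awk '{ FS = "," } { print $2 }')

printf '%s\n' "-F:    $dash_f" "early: $early" "late:  $late"
```

In the late case the first record has only one whitespace-separated field, so its field two is empty; only the second record is split on commas.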
So I've just put a little example here, which is awk hyphen capital F double quotes comma double quotes, quote, BEGIN curly
bracket print quotes FS is close quotes comma FS close curly bracket, close quote, and you get the answer
FS is comma. So you can see there it's just to demonstrate the point, really. We also saw
OFS, the variable which is all in capitals, OFS, which is the output field separator, and it controls
the format of the output record produced when you use the print statement. Normally it's set
to a space. So we're giving an example here where Awk is run and there's a BEGIN rule within it,
and it simply consists of print followed by, in double quotes, hello, a comma, and then, in double
quotes, world. So there are two arguments to the print command, separated by a comma, and the answer
produced is hello space world, because whenever you put a comma in one of these
things it tells the print statement to output the OFS variable contents, which is a space
by default. If you were to do my second example, which is pretty much the same except that instead
of there being a comma between hello and world, the two separate strings, there's nothing, then you get
helloworld with no space in between them, and that's because Awk has seen these two strings as
the arguments to print and it has concatenated them together and given print one argument, which
consists of the string hello followed by world, no spaces. The OFS variable can be set to a string if
you want to, so I did a rather silly example where I set, in a BEGIN rule, OFS equals double quotes space
blog space close double quotes, semicolon, print hello comma world as before, and then you get
out, instead of a space between hello and world, the word blog. It just proves the point; it can be
useful sometimes. Now the OFS variable only affects the behavior of print, not printf. So there's an
example here showing OFS being set to a tab character, then we print out using printf hello world,
and it comes out without a tab in it; it has no effect. With this we're just reiterating that printf is
always followed by at least one argument, and that first argument has got to be
the format string, which specifies what it's to print out and how it's to be formatted, and then
it can be followed by any number of further arguments separated by commas. So this one has got,
as the first argument, this string that's got percent s space percent s backslash n, in order to get
a newline on the end of it, and then comma, then the string hello, comma, and then the string world,
so that's three arguments in total.
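The four OFS and printf examples are only described in the audio, so the commands below are a reconstruction from that description rather than a copy of the show notes. The shell variable names are made up for this note.

```shell
# Comma between the arguments: print inserts OFS (a space by default)
with_comma=$(awk 'BEGIN { print "hello", "world" }')

# No comma: the two strings are concatenated into a single argument
no_comma=$(awk 'BEGIN { print "hello" "world" }')

# OFS set to another string
custom_ofs=$(awk 'BEGIN { OFS = " blog "; print "hello", "world" }')

# printf ignores OFS; only the format string controls the layout
printf_tab=$(awk 'BEGIN { OFS = "\t"; printf "%s %s\n", "hello", "world" }')

printf '%s\n' "$with_comma" "$no_comma" "$custom_ofs" "$printf_tab"
```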
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday, Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a comment
on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is
released under the Creative Commons Attribution-ShareAlike 3.0 license.