hpr_transcripts/hpr2554.txt

Episode: 2554
Title: HPR2554: Gnu Awk - Part 11
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2554/hpr2554.mp3
Transcribed: 2025-10-19 05:27:31

---

This is HPR episode 2,554 entitled Gnu Awk Part 11, and is part of the series Learning Awk.
It is hosted by me and is about 28 minutes long, and carries a clean flag.
The summary is: in part 11 of the series, we look at string and number built-in functions.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15.
That's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello Hacker Public Radio fans, this is b-yeezi once again.
Coming in with another episode in the series on the awk language, I guess you can call it.
It's more than just a command.
It's a type of scripting language.
And in this episode, I want to at least give an introduction to functions in awk.
Now there are two basic types of functions.
There are built-in functions and there are user-defined functions.
And I'm not going to talk about user-defined functions.
I'll let that rest on another episode, but instead I want to talk about some built-in functions.
And then two sets of built-in functions in particular.
I want to talk about numeric and string functions.
But if you look at the awk documentation, there are also I/O functions, time functions, bitwise functions, type functions, and functions for doing string translation,
which are used for internationalization (i18n).
And I'm going to leave all of that out for now.
I'm going to focus on just, like I said, numeric and string functions.
And these types of functions are common types of functions that you will find in many programming languages.
And so let's just get right into it.
I'm going to do this episode a little bit differently than the previous ones.
I'm going to try to stay on the path of, I guess, my last episode, which is keeping out a lot of the commands.
And trying to do a summary of it and do an explanation instead, because I find, at least for me, when I'm listening to myself talk, I get a little bit bored with that.
So I'm going to go right into it.
So let's start out with some of the built-in functions.
Now the documentation puts the numeric functions first, and I'll do the same thing.
They include atan2, cos, exp, int, log, rand, sin, sqrt, and srand.
And if you've noticed the names of those, they are similar to a lot of other functions that you might have heard of.
So take atan2, cos, and sin.
Those are all trigonometry functions.
So, you know, it's arctangent, cosine, and sine.
So those should be familiar to you already.
And they're called in a similar way to anything else in awk.
So instead of just saying $2 in your awk statement, if you know that second field is a number, you could do cos($2), with the $2 in round brackets after cos.
And you can also do things like first set a variable, say n = $2, and then say cos(n), and that works just fine.
And the way these trigonometric functions work, they are all in radians.
So atan2 has two arguments, y and x, and arguments are the things that go inside the parentheses.
If you do trigonometry, you know that atan2(y, x) is the arctangent of y divided by x.
Don't make me go back to learning math again. I don't want to have to do it.
But all these things are in radians, and that's one thing to know.
And one of the things that's really cool in the documentation is that it tells you that you can calculate pi by using atan2(0, -1).
And that's going to give awk's closest approximation to the number pi.
So cos(x), like I said, where x is the angle in radians.
And same thing with sin(x), where x is in radians.
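To make the calling convention concrete, here is a minimal sketch. The shell function name trig_demo and the one-line sample input are made up for illustration, and it assumes the second field holds an angle in radians.

```shell
# trig_demo is a hypothetical name; field 2 is assumed to be an angle in radians.
trig_demo() {
  awk 'BEGIN { pi = atan2(0, -1) }   # awk has no built-in pi constant
       { n = $2; printf "%s %.4f %.4f\n", $1, cos(n), sin(n) }
       END { printf "pi %.5f\n", pi }'
}

# Made-up sample input: a label and an angle of 0 radians.
printf 'zero 0\n' | trig_demo
```

Setting n = $2 first and calling cos(n) behaves the same as calling cos($2) directly.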
So now let's go to some of the other ones. We have exp(x), so exp with x in parentheses.
And that is the exponential, so use it if you want to do e to the x.
It is not the one for scientific notation, like 13 times 10 to the 7. That's a different thing. exp is e to the x.
So it's the exponential of x.
And the range of values x can have depends on the floating-point representation available on your machine.
Then int(x) turns x into the nearest integer,
and it does a thing you might not expect, where it doesn't round up.
Values are always truncated toward zero. So if you do int(3.9), the answer is 3, not 4, as you might think, and int(-3.9) is -3.
There's also log(x), which gives the natural log of x.
And if you remember your math, you can get logs in other bases from it
by dividing the natural log of x by the natural log of the base you want. I'm not going to go into the math right now.
But if you need more than just the natural log of x, there's a way you can get that.
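As a rough illustration of exp, int, and log together, here is a small sketch. The function name num_demo and the numbers in it are chosen only for this example. The last line shows the change-of-base trick: dividing one natural log by another to get a base-10 log.

```shell
# num_demo is a hypothetical name used only for this sketch.
num_demo() {
  awk 'BEGIN {
    printf "%.4f\n", exp(1)                # e to the power 1
    print int(3.9), int(-3.9)              # truncates toward zero
    printf "%.0f\n", log(1000) / log(10)   # log base 10 of 1000
  }'
}

num_demo
```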
One of the ones that I actually use in my awk programs sometimes is rand().
And that returns a random number
that's in a uniform distribution between 0 and 1.
It can be 0, but it's never 1.
So it's always going to be some number from 0 up to, but not including, 1.
And if you want to get a random integer, you just do int(n * rand()).
And rand() doesn't take any arguments.
So if you just do rand(), it's going to give you a random number.
But you can make another function, and I'll leave it to the next episode to talk about how to do custom functions.
You can make a custom function called randint with the argument n, and you can make it return int(n * rand()).
And you'll see that in other programming languages too. If your programming language doesn't already have a random int function,
this is a common way of getting a random integer.
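A minimal sketch of that idea follows. The name randint is the one used in the talk, but it is a user-defined helper, not an awk built-in. The wrapper name randint_demo and the choice of printing five values are assumptions for illustration.

```shell
# randint_demo is a hypothetical wrapper; $1 is the exclusive upper bound n.
randint_demo() {
  awk -v n="$1" '
    # randint is user-defined, not built in: returns an integer from 0 to n-1
    function randint(n) { return int(n * rand()) }
    BEGIN {
      srand()                               # seed from the clock
      for (i = 1; i <= 5; i++) print randint(n)
    }'
}

randint_demo 10
```

Each run prints five integers between 0 and 9. The values differ from run to run because the seed comes from the current time.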
Another way I've done it, since int always truncates,
is to use something that rounds instead, so that a value can go up to the next number.
Keep in mind that if you round, the lowest and highest values come up only about half as often as the ones in between, so truncating is the way that keeps the distribution uniform.
That's beside the point of this particular talk.
So you can do sqrt(x), or square root of x.
And that's just what it sounds like.
So the square root of four is two, the square root of nine is three, and so forth.
There's another one called srand.
That one doesn't give you a random number itself. It sets the seed, the starting point, for the random numbers that rand() gives you.
It takes an optional number as the seed, and if you leave it out, it uses the current date and time.
And the way the documentation explains it is that each seed value leads to a particular sequence of random numbers.
So say I want to generate random numbers in several runs,
and I want some of the runs to use the same seed.
I can use the seeds four, three, four, four, and the last two runs will repeat the sequence from the first.
And one thing that's important to know is what setting that seed does not guarantee.
And this is different from something like Python, where if you set a seed and generate a random number, and then you run that on another computer, you can expect to get the same numbers.
In awk, you're not guaranteed to get the same values across different versions of awk, even if you use the same seed.
And I'm not going to go deep into what seeding means.
The basic idea is that the way a lot of programming languages implement random numbers is not truly random.
They use some tricks to produce what they call pseudo-random numbers.
And traditionally, they'll just pick a kind of arbitrary seed, like the current time, and then use that as the starting point to generate the random numbers.
Or you specify that seed, and then it will use that seed every time.
But yeah, that's another conversation. And actually, there's a good Wikipedia article.
Maybe I'll put a link to that in the show notes.
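Here is a small sketch of what a seed does, under the assumption that all runs use the same awk on the same machine. The function name seeded is made up for this example. Running it with the seeds 4, 3, 4 should show the first and third lines matching.

```shell
# seeded is a hypothetical name; $1 is the seed handed to srand.
seeded() {
  awk -v seed="$1" 'BEGIN { srand(seed); print rand(), rand(), rand() }'
}

seeded 4
seeded 3
seeded 4   # same seed as the first call, so the same three numbers
```

The actual numbers are left out on purpose, since they depend on which awk implementation is installed.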
But yeah, that's pretty much it for the numeric functions.
Now, as for the string functions, they are a little bit different.
And I was not very familiar with these until I started working on this podcast.
And so there are a couple of different types.
There are functions that manipulate strings directly, and then there are ones that work on arrays of strings.
I'll go into all of those in a little bit of detail, but not too much detail,
because I don't want this to go on too long. But the basic list is: asort, asorti,
gensub, gsub, index, length, match,
patsplit, split, sprintf, strtonum,
sub, substr, tolower, and toupper.
Some of those I'm not going to go into because they're pretty self-explanatory. Like toupper,
you can imagine what that does. And tolower, you can probably imagine what that does.
And then length, you can probably imagine what that does.
But for the rest, let's go into it.
So asort and asorti are things that I didn't really understand until I read this.
So say you have an array, and let's call that array a.
Well, it's not really an array, it's more of a dictionary,
or hash table, as they call it in some languages.
But say a["last"] = "de".
This is the example out of the documentation.
And then a["first"] = "sac",
and a["middle"] = "cul".
So as you know, if they were in the right order, it would spell
cul-de-sac.
But anyway, if you had those last, first, and middle keys in your dictionary
with the values de, sac, and cul respectively,
and if you did an asort on that a variable,
you would end up with an array where a[1]
is cul, a[2] is de, and a[3] is sac.
What it did is it sorted them using the values as the sort order,
and it renamed the keys to one, two, and three.
I don't know exactly when that would be useful, but it's really cool that it does that.
And then asorti is similar to that, except that
instead of using the value, it uses the key, or what they call the index,
and that's what the i stands for, to do the sorting.
So asorti on that same variable a would give a[1] as first, a[2] as last, and a[3] as middle.
Because it's going to sort the keys and make those the values.
Hopefully that makes sense.
So that's something that you might find useful.
And if you notice, first ends up in a[1], because f is before l, and l is before m, for middle.
So, like I said, I haven't seen many reasons to use these, especially asorti, but I have seen reasons to use asort.
Now, gensub and gsub are pretty similar.
They are ways to do replacement of text.
With gensub, you use regular expressions to do the replacements.
So you have the regular expression that you want to find, then you have the replacement text.
So it's regular expression, comma, replacement text, comma,
and then the how as the last part, which is like the flags that you put on a regular expression: g to replace every match, or a number to replace only that one occurrence.
And then you can also have a comma and then the target after that, if you want to work on a particular string instead of the whole current record.
And in the example they have, a is the string abc def, so abc, a space, and then def.
And then b is the gensub of a regular expression on a, and it gives you back the result, with the two words swapped.
And so the difference between gensub and gsub is that with gsub, first of all, you don't have to worry about a how.
And second of all, it's always going to replace all of the matches. It searches for the longest, leftmost, non-overlapping matches of what you're looking for.
So it's like doing gensub with g as the how. One more difference is that gsub changes the target in place and returns the number of replacements, while gensub leaves the target alone and returns the new string.
Anyway, if you want to read more about that string function, it's in the documentation.
And this is actually pretty useful. I've never really used gensub that much, but I do use gsub quite a bit.
It's a lot simpler. A lot of times I'll just use it kind of like they do in this example, where I'm really just looking for a specific piece of text or a really simple regular expression and replacing it with a standard piece of text.
And this is useful when you have messy data that you're trying to clean up. You could use sed to do this,
but if the data is in columns, you might want to use awk instead, where you don't want to replace the value every time it shows up, but only when it's in column three.
That is where this is useful.
And like I said, there are ways to do this in sed that are a little bit more clean,
but awk allows you to do both regular expression finding and replacing.
So gsub and gensub are ways to do replacement, and then when I get to match and index, you'll see functions for just finding stuff.
So the next one is index. Index, like I said, is a way to find the starting position of a piece of text inside a string.
The first argument is the string you're searching in, the target,
and the second argument is the thing you want to find.
So take this example from the documentation, which I'll put a link to in the show notes.
It's the index of the word peanut, looking for an in peanut,
so the two letters a, n. And if you know how to spell peanut, the third character of peanut is where an starts. So it returns three.
The next one is length. Like I said, that's pretty self-explanatory. It'll give you the length of that string. Then there's match.
And one thing I don't like about how awk does this is that it's not consistent about where you put the text that you're looking for.
At least for me, I always have to look back and see.
But for match, you put the string and then you put the regular expression, not a replacement.
And it's going to find the longest, leftmost substring that matches what you're looking for.
So instead of doing a replacement, you're going to be able to find a matching substring.
And if I look at the example in the documentation, it's pretty self-explanatory, where you're going to find
the longest, leftmost substring matched by the regular expression.
And it's going to be similar to index, where it returns the character position at which the substring begins.
And it'll return zero if it doesn't find it.
That is different from index, where what you search for is an exact string.
I wondered whether index gives you an array
if there is more than one occurrence, but the documentation says it
returns the position, in characters, where the first occurrence begins.
So no, it'll just be the first time it finds it.
And there are no regular expressions in index, but there are for match.
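To compare the two finding functions, here is a minimal sketch. The peanut lookup comes from the documentation example mentioned in the talk. The vowel regular expression and the function name find_demo are made up. It also prints RSTART and RLENGTH, the two variables that match sets as a side effect.

```shell
# find_demo is a hypothetical name used only for this sketch.
find_demo() {
  awk 'BEGIN {
    print index("peanut", "an")          # plain substring search
    pos = match("peanut", /[aeiou]+/)    # regular expression search
    print pos, RSTART, RLENGTH           # match also sets RSTART and RLENGTH
    print index("peanut", "xyz"), match("peanut", /xyz/)   # not found: both 0
  }'
}

find_demo
```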
So I'm not going to go into patsplit, because it confuses me,
and I've never wanted to use it.
So I don't feel like I'd do a good job describing it.
But split, on the other hand, I can do.
If you've used Perl or Python, you'll be familiar with split, where you have a string
and you want to split it up into an array using
a separator.
I'm not going to use the example in the documentation, because I don't find it that useful.
But a lot of times you might have, say, a pipe-delimited file.
And then in each row, one of the columns might contain a list of items that are separated by commas.
So if you're using awk you can say, we'll use the field separator of pipe.
But on that fourth column, I'm going to do a split, where I'm going to split using the comma.
And that makes an array, and I can then manipulate that array of data from inside that column.
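The pipe-delimited case could be sketched like this. The sample row, the array name parts, and the function name split_col4 are invented for the example.

```shell
# split_col4 is a hypothetical name; input is assumed to be pipe-delimited
# with a comma-separated list in the fourth column.
split_col4() {
  awk -F'|' '{
    n = split($4, parts, ",")    # fills parts[1] .. parts[n], returns n
    for (i = 1; i <= n; i++) print $1, i, parts[i]
  }'
}

# Made-up sample row.
printf 'id7|x|y|red,green,blue\n' | split_col4
```

This prints one line per item in the fourth column, each tagged with the first column and its position in the list.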
And so the next one is sprintf.
And that returns, without printing, the string that a printf
would have printed.
So it's a way to store the results of a printf statement in a variable.
And as I went over before, printf is a way that you can
put some formatting into the output that you get from an awk statement.
Instead of printing that to the screen, you can store it in a variable by using sprintf.
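A quick sketch of storing formatted output in a variable. The format string, the variable name label, and the function name sprintf_demo are assumptions chosen for the example.

```shell
# sprintf_demo is a hypothetical name; input is assumed to be "name number".
sprintf_demo() {
  awk '{
    label = sprintf("%-6s|%8.2f", $1, $2)   # nothing is printed here
    print "[" label "]"                     # the stored string is used later
  }'
}

printf 'tea 3.14159\n' | sprintf_demo
```

This should print [tea   |    3.14], with the name padded to six characters and the number right-aligned in eight.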
strtonum makes sense from the name.
You have some string and you want to turn it into a number,
and this is the way you do that.
One of the things that it handles is the string representation of an octal or hexadecimal number.
So the input of strtonum can be an octal or a hexadecimal number.
The example uses 0x11, which is hexadecimal, and if you pass 0x11 to strtonum, it returns 17.
I've never had a use for octal or hex, but I'm sure some people do.
It's probably useful for reading machine data that reports values in octal or hexadecimal format.
Actually, it only assumes hexadecimal if the string leads with 0x, and octal if it leads with a zero.
If you don't have a zero or a 0x in front, it'll just treat it as a regular decimal number.
So if we had a string of the number 4, it will convert that into the integer 4.
Or if we had a string of the number 2.47, it will turn that into a floating-point number.
The next one is sub, which is going to search a string for the longest, leftmost substring matching the regular expression that you put in, and replace just that first match with your replacement text, which is obviously very useful.
So I've used sub a lot, not just in awk, but its equivalents in other languages too.
substr is slightly different. Instead of looking for an expression, you're just using numbers.
You're using the position. You're saying, I want a substring starting at the third character and going for whatever length you want.
So you can see the difference if I have a string like supercalifragilisticexpialidocious.
If that's the string, and I use sub and look for super,
and I want to replace super with duper, it'll become dupercalifragilisticexpialidocious.
But if I use substr, with that as the string, comma, one, comma, five, it'll just give the word super, because that is the substring.
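The super and duper example can be run as the following sketch. The function name sub_demo and the variable name s are made up for illustration.

```shell
# sub_demo is a hypothetical name used only for this sketch.
sub_demo() {
  awk 'BEGIN {
    s = "supercalifragilisticexpialidocious"
    print substr(s, 1, 5)       # five characters starting at position 1
    sub(/super/, "duper", s)    # replaces the first match, edits s in place
    print s
  }'
}

sub_demo
```

Note that substr only reads the string, while sub changes it, so the order of the two calls matters here.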
So sub, gsub, and gensub are related.
And substr is kind of like match and index. They're in the same family.
And then the last two are tolower and toupper, which do what they sound like.
They turn whatever case your string is currently in into either lowercase or uppercase.
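For completeness, here is a tiny sketch of the self-explanatory ones, toupper, tolower, and length, in a single line. The function name case_demo and the sample text are made up.

```shell
# case_demo is a hypothetical name used only for this sketch.
case_demo() {
  awk '{ print toupper($0), tolower($0), length($0) }'
}

printf 'Hacker Public Radio\n' | case_demo
```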
So that's pretty much it.
I'm not going to go into any more detail. There's a lot more detail about how to do regular expressions,
and whether they're going to be greedy or not greedy when they match.
And of course, it's not the same as in other languages that you might have used.
It handles escape characters differently, for one.
And I do not want to turn this into a regular expression podcast.
So I'll leave that for another day.
And wow, I'm already at 27 minutes.
So I'm going to call it a day right here and say, thank you for tuning into another episode of Hacker Public Radio, and continue to hack.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly,
leave a comment on the website or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons
Attribution-ShareAlike 3.0 license.