Episode: 2330
Title: HPR2330: Awk Part 7
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2330/hpr2330.mp3
Transcribed: 2025-10-19 01:24:34
---
This is HPR episode 2,330 entitled Awk Part 7, and is part of the series Learning Awk.
It is hosted by b-yeezi and is about 21 minutes long, and carries a clean flag.
The summary is: looping in awk explained by a sleep-deprived host.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello Hacker Public Radio, this is your boy b-yeezi, once again, with a little update.
So it's been a little while since the last time I was on. I've been incredibly busy at
work, doing a bunch of stuff that I don't normally do, or haven't had to do in a while.
And I'll probably do another episode about some of the stuff I'm doing with web frameworks
with Python, and with REST APIs, if people are interested in that kind of thing.
And one of the things I want to talk about when I get there is how to use Wireshark to
reverse engineer an API that's going over HTTP and not HTTPS.
So that's just a future thought, something that I might want to do.
But in the meantime, we have one more episode for the awk series.
Now in this one I want to just do a brief discussion about the looping mechanisms in awk. Just like
any other programming language, awk has the built-in ability to loop over a list of values or arrays.
Now you might think, well, isn't that what awk is doing the whole time when you have
a whole list of things that are separated by columns, isn't it just doing a loop through
those?
Yes, it is.
But there might be times when you might want to loop in other ways.
And I'm going to go over some of them right now.
So first I'm just going to talk about the basic syntax of the awk loop structures.
There is the while loop, there's the for loop, and then there's the do while loop.
And I want to start with while.
So a while loop in most programming languages is very similar to how it is in awk.
You have some conditional statement, and you're going to continue to loop for as long as that
conditional statement is met.
So for instance, in the example that I'm giving, I'm going to have just a number that is represented
by the variable i, and i is going to start off as one.
And while i is less than or equal to 10, we're going to continue to loop through and process
the thing that I'm going to process.
The first thing I'm going to process is just plain text.
I'm just going to use awk to do a while loop as an example.
So for the syntax, I want to do a BEGIN statement in my awk file. I'm going to just begin with
the open curly bracket, and then inside of there, I have i equals one, semicolon, and
then on the next line, while, in parentheses, i is less than or equal to 10.
So the less than sign is, on the US keyboard, the one next to the M while holding shift.
So it's the comma, but with the shift down, then equals 10, close parentheses, and then open
a new curly bracket. On the next line, print, inside of quotes, the square of, comma, i, and then inside
of quotes again, is, and then comma, i times i.
So what I'm basically doing is saying the square of i, the square of the value that is represented
by i, which on the first iteration is one, is one times one. And then on the last line,
it says i equals i plus one, so that I'm incrementing i by one.
And so it's very simple syntax. You might have seen it in C or C++ or some
other language where it works similarly.
So if I run awk dash f on my while dot awk file, which will be
in the show notes, it's going to write on different lines: the square of one is one,
the square of two is four, the square of three is nine, four, sixteen, five, twenty-five,
and so on.
So it goes until it gets to ten, which is a hundred.
And then on the last line, I have exit, and then close the last curly bracket.
So very simple, it's a way that you can go through values, and I want to go and do a
little bit more meaningful example in the future.
But let's just continue on with the examples I'm looking at right now.
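The while loop described above can be sketched as a small shell snippet that runs awk inline. This is a reconstruction from the spoken description, not the file from the show notes; the shell variable name squares is only there so the output can be inspected afterwards.

```sh
# Run the while loop inside a BEGIN block (no input file is needed) and keep the output.
squares=$(awk 'BEGIN {
    i = 1
    while (i <= 10) {            # keep looping for as long as i is at most 10
        print "The square of", i, "is", i * i
        i = i + 1                # without this line the loop would never end
    }
    exit
}')
printf '%s\n' "$squares"
```

The first line printed is "The square of 1 is 1" and the last is "The square of 10 is 100". The commas in the print statement join the pieces with a single space, which is awk's default output field separator.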
So a do while loop is a little different in that it does the thing first, and then checks
the loop condition.
And so for the example here, I'm going to say BEGIN, and then i equals one, and then
the word do, and then open curly bracket, print the square of, comma, i, comma,
is, comma, i times i, and then i equals i plus one, and then after that close the curly bracket
and then say while, in parentheses, i is not equal to two. So I don't have to say it's
less than two or anything. And I go exit after that.
And so when it first starts, when I first get into the program, i is one. There's a do statement,
so it's just going to do what it says to do, which is print that one statement. It's going
to increment i, so i is now two, and now I get to the while condition, and now that i is two,
it stops.
So if that makes sense, i started out as one, and by the time it got to the while condition, it
is two, so it stops doing the loop.
And I used the example of while i is not equal to on purpose, so you could see that it's not
using a less than or equal to, it's not using just a less than. You can do it on just
the equals sign, or not equal.
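A minimal sketch of the do while loop as it is described above, assuming i starts at one and the condition is i not equal to two. The second command is an extra illustration, not from the episode, of the body running once even when the condition is false from the start. The shell variable names looped and once are placeholders.

```sh
# The body runs first; the condition is only checked after the body has run.
looped=$(awk 'BEGIN {
    i = 1
    do {
        print "The square of", i, "is", i * i
        i = i + 1
    } while (i != 2)             # i is already 2 here, so the loop stops after one pass
    exit
}')
printf '%s\n' "$looped"

# Contrast: the condition (i < 5) is false from the start, yet the body still runs once.
once=$(awk 'BEGIN { i = 5; do { print "body ran with i =", i } while (i < 5) }')
printf '%s\n' "$once"
```

One caution with a not-equal test: if i were to start at two, it would already be three by the time of the first check and would never equal two again, so the loop would not end. A less-than comparison is the safer choice when the counter could overshoot.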
So as you can see, that's the basic structure of a do while loop. Not a very creative example,
but I just want to explain how it works, and you might be able to think of some ways that
you can use this in awk. So for instance, if you wanted to sum up a whole bunch of things
and at the end of a grouping, write some information, or at the beginning of a grouping, you would
do the do, you would do a bunch of stuff and then break or continue at some point.
Now the for loop. For loops work very similarly. The example I want to give is basically the
same example that we just had, where the square of i is i times i. In this case
we're going to say BEGIN, and we're going to go for, in parentheses, i equals one, semicolon,
i is less than or equal to ten, semicolon, i plus plus, close parentheses, and then
open curly bracket and then say print, the square of, i, is, i times i, close the
curly bracket, exit, close the curly bracket again.
So this one, it's a lot like how you do it in C++ or in C#, how you do
a for loop, where inside of the for, inside of the parentheses, you have three
statements usually. This is the typical way you do it: you have the setter, you have
the conditional, and then you have the incrementer, or the decrementer. And in
awk you can use the plus plus symbol to increase the value by one, or you can use the minus
minus symbol to decrease the value by one.
In other languages you might see plus equals one or minus equals one, which does something similar.
Something to know about awk, and I'm not sure if I've ever seen an example in awk where
you can do plus plus i, or minus minus i. In some languages, if you put the increment
or decrement operator before the variable, it'll do the decrement or increment before the
process starts.
I don't think awk has that functionality, and please comment if you know otherwise.
But in this one, as you can see, it does the same thing.
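A short sketch of the for loop described above, followed by a small check of the increment operators. On the question raised in the episode: POSIX awk does accept both the prefix forms (++i, --i) and the postfix forms (i++, i--), as well as += and -=. The variable names post, pre and j, and the shell variables squares_for and incr, are made up for this illustration.

```sh
# Same squares as before, with the setter, condition and incrementer on one line.
squares_for=$(awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        print "The square of", i, "is", i * i
    }
    exit
}')
printf '%s\n' "$squares_for"

# Postfix versus prefix increment.
incr=$(awk 'BEGIN {
    i = 5; post = i++            # post gets the old value 5, then i becomes 6
    j = 5; pre  = ++j            # j becomes 6 first, so pre gets 6
    print post, i, pre, j
}')
printf '%s\n' "$incr"
```

The second command prints "5 6 6 6". Inside the for header the two forms behave the same, because the value of the increment expression is not used there.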
Now for has another usage, and if you have ever done anything in Perl or Python, it's
really a lot more like Python in how it handles a for loop over an array.
So you can say for a in b, or for b in a, where if you say for a in b, b is
an array, and a is an instance, or an individual item of it, sort of like a for each
loop in Perl.
So for the example I want to go over here, I'm actually going to use an old example from one of my
earlier episodes that has to do with doing a distinct count on a file.
So in the earlier episode, I had this distinct dot awk file that had at the beginning, NR
not equal to one, so the record number is not equal to one, open curly bracket, and then I'm creating
an array right now called a, with square brackets.
So this wasn't explained very well before, because it was just an example,
but there's an array called a, and the second column's entry in a, I am incrementing right now.
So a covers the whole file of data where NR is not equal to one.
And so I have this value in column two that I want to add, I want to sum.
So I'm doing a, in brackets, dollar sign two, close the bracket, plus plus. What that means
is I'm creating this array and I'm taking the second column of this file that I'm processing
and I'm summing it.
So at the beginning it's at zero. You don't have to do a setter in awk to say that it is zero,
but it's zero at first. When it hits the first value where NR is not equal to one, where the row number
is not one,
it's adding that value to zero, so it's giving it a value.
And then on the second row, it's adding this value to the last value, and on the third
record it's going to add this value to the sum of the ones that came before it, and so on.
So then after that plus plus, I close the curly brace and go to an END statement.
And inside of curly braces here, I do a for, inside of parentheses, b in a. So as I was
just talking about before, we're declaring a new variable called b for every
entry in the new array that I created called a, and I'm going to print b. So what you're
going to get, since this is happening at the end, in the END block, is
just the distinct values, b in a. So I don't know how well I'm explaining that, but the
point being that every time you see that value, you're going to do an increment.
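A sketch of the distinct idea described above. The input rows are invented for this illustration and are not the episode's data file; the shell variable distinct is likewise a placeholder. Two details are worth spelling out: a[$2]++ adds one to the entry for each colour, so it counts occurrences rather than summing values, and for (b in a) hands back the array's indices (here the colours) in no guaranteed order, which is why the output is piped through sort.

```sh
distinct=$(printf '%s\n' \
    'name,color,amount' \
    'apple,red,4' \
    'banana,yellow,6' \
    'strawberry,red,3' \
    'grape,purple,10' |
awk -F, '
NR != 1 { a[$2]++ }                        # skip the header row, count each colour
END     { for (b in a) print b, a[b] }     # b is the index (the colour), a[b] its count
' | sort)
printf '%s\n' "$distinct"
```

As described in the episode, the original script prints only b, which gives the distinct list on its own; printing a[b] next to it shows the counts that the plus plus was building up.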
I want to do one other one that might be a little bit more clear.
Yeah, I like this one.
So there's another one called sum column dot awk. I think this one is a little bit more clear
than the distinct one is.
So I have a bunch of things that I'm declaring in my BEGIN statement
at the beginning.
I'm doing FS equals, in double quotes, comma, and OFS equals comma.
That's saying that the field separator in my input file is a comma and the field separator
in my output file is a comma.
And then, before anything else, I want to print color, comma,
sum.
So that's my header in the new file I'm creating. I'm closing the BEGIN and then
I'm done with that.
NR not equal to one: as we talked about in the previous episodes, whenever
you do a conditional before a bracket like that, you're basically stating the conditional
statement that you want to put on doing processing over this file.
So NR not equal to one is saying when the row number is not equal to one, so
skip the first row.
So while you're not on the first row, do a, inside of brackets, dollar
sign two, plus equals dollar sign three.
So I'm saying I'm making this new variable inside of this a
array that is the sum of the third column in my file. And then I go to END and I'm
saying for b in a, very similar to what happened before, print b, comma, a, in brackets, b.
And so what this one's doing, it's doing a similar thing as the distinct one,
where I am printing b, which is the color.
And so as we saw in the other example, where we did a distinct list of the colors, this
is getting that same distinct list of the colors.
And so I'm printing that.
And then I'm also taking that array and looking at the b entry in that array.
So, meaning, looking at the color that's called, I don't
know, blue, what is the sum of all of the numbers where the color was blue.
So once again, I don't know how well I'm explaining that.
I'm pretty sure Dave is going to come on top of this and explain it a little bit better.
But the basic idea, and it's really simple if you look at the examples
and you apply it to other examples, is you're really just saying I'm taking
this value, I'm making this new array called a, and from the second column
I'm getting all the distinct colors, and with the column that's next to the color
column, which is the third column, I'm summing them.
And then I can print out both the color and the sum amount.
And if you remember, for those files, we were using this file one
dot TXT or the file one dot CSV.
So if I did, which I'm about to do, awk dash f,
sum column dot awk, on file one dot CSV, I get the first row saying color, sum.
And then the next ones saying brown, 13, purple, 12, red, seven, yellow, 11, green, eight.
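A sketch of the grouping-and-summing script as it is described above. The input rows are an assumption: they were chosen so that the per-colour totals match the ones read out in the episode (brown 13, purple 12, red 7, yellow 11, green 8), and the real file one dot CSV may contain different rows. The shell variable sums is a placeholder.

```sh
sums=$(printf '%s\n' \
    'name,color,amount' \
    'apple,red,4' \
    'banana,yellow,6' \
    'strawberry,red,3' \
    'grape,purple,10' \
    'apple,green,8' \
    'plum,purple,2' \
    'kiwi,brown,4' \
    'potato,brown,9' \
    'pineapple,yellow,5' |
awk '
BEGIN   { FS = ","; OFS = ","; print "color", "sum" }   # comma in, comma out, header first
NR != 1 { a[$2] += $3 }                                  # add column 3 to the total for the colour in column 2
END     { for (b in a) print b, a[b] }                   # one line per distinct colour
')
printf '%s\n' "$sums"
```

The header line color,sum always comes first because it is printed in BEGIN. The order of the colour lines after it depends on the awk implementation, so pipe the result through sort if a stable order matters.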
I use stuff like this a lot, though nowadays, since I'm doing a lot more stuff
programmatically and doing a lot more stuff with data analysis, I'll do this stuff
in either Python or R a lot more, because those are systems that
are a lot more robust for doing this type of analysis. But for certain situations where
either I don't have those tools available or it'll take more time getting those tools
out and starting up a process or a session to somewhere else, awk is a great thing
to use for this.
For instance, I've used this same exact process to look through a server's
access log, so going into the nginx or Apache access log and looking for people
that are really slamming on the server and then reporting those.
You can use basically this exact function that I did with the sum column
dot awk file, where, I believe, those logs are
separated by colon.
So instead of saying FS equals comma, you say FS equals colon.
And then you just pick whichever field is going to be the one with the IP address. And
so instead of a, in brackets, dollar two, it'd be a, in brackets, whatever column the
IP address is in, and then plus equals whatever column, if you want a sum. You can do
a sum or you can just do a plus plus. So if you just say plus plus right there, that
means you're just getting the count. And then you could do the same print
statement, where instead of the color you would print the IP address and the count right
next to it.
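A sketch of the access-log count described above, under a stated assumption: in the common and combined log formats that Apache and nginx write by default, the client address is the first whitespace-separated field, so awk's default field separator is enough and no colon separator is set here. A log in a custom format would need a different field number or separator. The log lines are invented and use documentation-only IP addresses; the shell variable hits is a placeholder.

```sh
hits=$(printf '%s\n' \
    '203.0.113.7 - - [19/Oct/2025:01:00:01 +0000] "GET /admin HTTP/1.1" 404 153' \
    '198.51.100.4 - - [19/Oct/2025:01:00:02 +0000] "GET / HTTP/1.1" 200 512' \
    '203.0.113.7 - - [19/Oct/2025:01:00:03 +0000] "GET /login HTTP/1.1" 404 153' \
    '203.0.113.7 - - [19/Oct/2025:01:00:04 +0000] "POST /login HTTP/1.1" 404 153' |
awk '
    { a[$1]++ }                              # one more hit for this client address
END { for (ip in a) print a[ip], ip }        # count first, so the list can be sorted numerically
' | sort -rn)
printf '%s\n' "$hits"
```

Putting the count first and sorting in reverse numeric order brings the busiest addresses to the top, which is usually what you want when looking for someone hammering the server. Against a real log, the printf would be replaced by the log file name given as an argument to awk.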
That is really, really useful.
So you'd say, okay, well, I know my IP address, I SSH into the server all the time.
I'm there 15 times.
There's a coworker of mine that SSHes in a bunch of times.
Here's the normal traffic that you get on the site, which, you know, there's like a really
long tail of regular traffic.
But there's 1500 hits from these two IP addresses that are like one number
apart, and you do a lookup on them and they're from like, um, Ukraine or something. You're
like, oh, well, that's probably people just, you know, scanning for open ports.
And sometimes you can actually look at that deeper and see, oh, look,
they're looking at the domain slash admin, or slash
login, and seeing if those screens exist and trying to, you know, brute force the login.
So it's interesting to see that kind of stuff and see people trying to get into
your sites.
Same type of thing for looking at different types of customer data sometimes.
And, like I said, a lot of times I do it in R or Python now, but it's
a really easy thing to do, to be able to do grouping and counting or grouping and
summing.
That's what I usually use the for loops for. While loops, I really don't have that much
of a use for, because if I'm going to write something that needs that much
language support, I'm going to reach for a different tool like Python or R, or
if I really had to, Perl, I guess, C, I guess C#.
So I tend to stay away from everything that needs to be compiled if I can help
it.
It's because, for what I do, it doesn't matter how long it takes to run, it
matters how long it takes to write.
So that's just my personal opinion, don't hate me for that.
Anyway, thanks for listening to this episode and stay tuned for more.
You've been listening to Hacker Public Radio at Hacker Public Radio dot org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out
how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club
and is part of the Binary Revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on
the website or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons Attribution-
ShareAlike 3.0 license.