Episode: 2330
Title: HPR2330: Awk Part 7
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2330/hpr2330.mp3
Transcribed: 2025-10-19 01:24:34

---

This is HPR episode 2,330 entitled Awk Part 7, and is part of the series Learning Awk.
It is hosted by b-yeezi and is about 21 minutes long, and carries a clean flag.
The summary is: looping in awk explained by a sleep-deprived host.
This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
Hello Hacker Public Radio, this is your boy b-yeezi, once again, with a little update.
So it's been a little while since the last time I was on. I've been incredibly busy at
work, doing a bunch of stuff that I don't normally do, or that I haven't had to do in a while.
And I'll probably do another episode about some of the stuff I'm doing with web frameworks
in Python, and with REST APIs, if people are interested in that kind of thing.
And one of the things I want to talk about when I get there is how to use Wireshark to
reverse engineer an API that's going over HTTP and not HTTPS.
So that's just a future thought, something I might want to do.
But in the meantime, we have one more episode for the awk series.
Now in this one I just want to do a brief discussion about the looping mechanisms in awk. Just like
any other programming language, awk has built-in ways to loop over a list of values or arrays.
Now you might think, well, isn't that what awk is doing the whole time? When you have
a whole list of records that are separated into columns, isn't it just doing a loop through
those?
Yes, it is.
But there might be times when you want to loop in other ways.
And I'm going to go over some of them right now.
So first I'm just going to talk about the basic syntax of awk's loop structures.
There is the while loop, there's the for loop, and then there's the do-while loop.
And I want to start with while.
So a while loop in most programming languages is very similar to how it is in awk.
You have some conditional statement, and you're going to keep looping for as long as that
conditional statement is true.
So for instance, in the example that I'm giving, I'm going to have just a number that is represented
by the variable i, and i is going to start off as one.
And while i is less than or equal to 10, we're going to continue to loop through and process
the thing that I'm going to process.
The first thing I'm going to process is just plain text.
I'm just going to use awk to do a loop as an example.
So for the syntax, I want to do a BEGIN statement in my awk file. I'm going to begin with
the open curly bracket, and then inside of there I have i equals one, semicolon, and
then on the next line, while, in parentheses, i is less than or equal to 10.
So the less-than sign is, on the US keyboard, the one next to the M while holding shift.
So it's the comma key, but with shift held down. Then equals, 10, close parentheses, and then open
a new curly bracket. On the next line, print, in quotes, "The square of", comma, i, and then inside
of quotes again, "is", and then comma, i times i.
So what I'm basically doing is saying the square of i, the value that is represented
by i, which on the first iteration is one, is one times one. And then on the last line,
it says i equals i plus one, so that I'm incrementing i by one.
And so it's very simple syntax. You might have seen it in C or C++ or some
other languages, where it goes similarly.
So if I print that out, if I run awk -f on my while.awk file, which will be right
in the show notes, it's going to write on different lines: the square of 1 is 1,
the square of 2 is 4, the square of 3 is 9, 4, 16, 5, 25,
and so on.
So it gets up to 10, and 100.
And then on the last line I have exit, and then close the last curly bracket.
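The while.awk file itself is not reproduced in this transcript. A minimal sketch of it, reconstructed from the spoken description, might look like the following; the heredoc wrapper is only there so the whole thing can be pasted into a shell, and the exact spacing and comments are assumptions.

```shell
# Write the awk program to a file. Quoting 'EOF' stops the shell from
# expanding anything inside the heredoc.
cat > while.awk <<'EOF'
BEGIN {
    i = 1;
    # Keep looping for as long as i is less than or equal to 10.
    while (i <= 10) {
        print "The square of", i, "is", i * i;
        i = i + 1;    # increment i, otherwise the loop never ends
    }
    exit;
}
EOF

awk -f while.awk
```

Because the whole program lives in a BEGIN block, awk never waits for any input; it prints the ten lines and exits.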
So very simple. It's a way that you can go through values, and I want to do a
more meaningful example later on.
But let's just continue on with the examples I'm looking at right now.
So a do-while loop is a little different in that it does the thing first, and then checks
the condition to decide whether to loop.
And so for the example here, I'm going to say BEGIN, and then i equals one, and then
the word do, d-o, and then open curly bracket, print "The square of", comma, i, comma,
"is", comma, i times i, and then i equals i plus one. And then after that, close the curly bracket
and then say while, in parentheses, i is not equal to two. So I don't have to say it's
less than two or anything. Then I go exit after that.
And so when it first starts, when I first get into the program, i is one. There's a do statement,
so it's just going to do what it says to do, which is print that one statement. It's going
to increment i, so i is now two, and now I get to the while condition, and now that i is two,
it stops.
So if that makes sense: i started out as one, by the time it got to the while condition it
is two, so it stops looping.
And I used the example of while i is not equal to on purpose, so you could see that it's not
limited to less than or equal to, or to just less than. You can do it on just
equal, or not equal.
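A sketch of the do-while example as described, again reconstructed from the spoken walk-through and not copied from the show notes. The file name do_while.awk is an assumption.

```shell
cat > do_while.awk <<'EOF'
BEGIN {
    i = 1;
    do {
        # The body always runs once before the condition is looked at.
        print "The square of", i, "is", i * i;
        i = i + 1;
    } while (i != 2);   # i is already 2 here, so the loop stops
    exit;
}
EOF

awk -f do_while.awk
```

With these values the body runs exactly once and prints a single line for i equal to 1.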
So as you can see, that's the basic structure of a do-while loop. Not a very creative example,
but I just want to explain how it works, and you might be able to think of some ways that
you can use this in awk. So for instance, if you wanted to sum up a whole bunch of things
and at the end of a grouping, write some information, or at the beginning of a grouping, you would
do the do: you would do a bunch of stuff and then break or continue at some point.
Now for loops work very similarly. The example I want to give is basically the
same example that we just had, with the square of i is i times i. In this case
we're going to say BEGIN, and we're going to go for, in parentheses, i equals one, semicolon,
i is less than or equal to ten, semicolon, i plus plus, close parentheses, and then
open curly bracket, and then the print: print the square of i is i times i. Close the
curly bracket, exit, and close the curly bracket again.
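A sketch of the for-loop version, reconstructed from the description above (the file name for.awk is an assumption). It should print the same ten lines as the while version.

```shell
cat > for.awk <<'EOF'
BEGIN {
    # initialiser; condition; increment
    for (i = 1; i <= 10; i++) {
        print "The square of", i, "is", i * i;
    }
    exit;
}
EOF

awk -f for.awk
```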
So this one, it's a lot like how you do it in C++ or in C#, how you do
a for loop, where inside of the for, inside of the parentheses, you have three
statements usually. This is the typical way you do it: you have the setter, you have
the conditional, and then you have the incrementer or the decrementer. And in
awk you can use the plus plus symbol to increase the value by one, or you can use the minus
minus symbol to decrease the value by one.
In other languages you might see plus equals one or minus equals one, which does something similar.
Something to know about awk: I'm not sure if I've ever seen an example in awk where
you do plus plus i, or minus minus i. In some languages, if you put the increment
or decrement operator before the variable, it'll do the decrement or increment before the
value is used.
I don't think awk has that functionality, so please comment if you know otherwise.
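On the question raised here: the POSIX awk specification does list both the prefix and the postfix forms of ++ and --, so the prefix form should be available. A small sketch showing the difference (the file name incr.awk and the variable names are arbitrary):

```shell
cat > incr.awk <<'EOF'
BEGIN {
    i = 5; j = 5;
    a = i++;   # postfix: a gets the old value (5), then i becomes 6
    b = ++j;   # prefix: j becomes 6 first, then b gets the new value (6)
    print a, i, b, j;
}
EOF

awk -f incr.awk    # prints: 5 6 6 6
```

In the third slot of a for loop the result of the expression is thrown away, so i++ and ++i behave the same there.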
But as you can see, it does the same thing.
Now for has another usage, and if you have ever done anything in Perl or Python, it's
really a lot more like Python in how it handles a for loop over an array.
So you can say for a in b, or for b in a, where if you say for a in b, b is
an array, and a is each index of b in turn, sort of like a foreach
loop in Perl.
So for the example I want to go over here, I'm actually going to use an old example from one of my
earlier episodes that has to do with doing a distinct count on a file.
So in the earlier episode, I had this distinct.awk file that had at the beginning NR
not equal to one, so the record number is not equal to one, open curly bracket, and then I'm creating
an array called a, indexed in square brackets.
So this wasn't explained very well before, because that was just an example,
but there's an array called a, and the entry of a for the second column is what I am incrementing.
So a is built up over the whole file of data where NR is not equal to one.
And so I have this value in column two that I want to count.
So I'm doing a, in brackets, dollar sign two, close the bracket, plus plus. What that means
is I'm creating this array, and I'm taking the second column of this file that I'm processing,
and I'm counting how many times each value shows up.
So at the beginning it's zero. You don't have to do a setter in awk to say that it is zero.
It's zero first, and when it hits the first record where NR is not equal to one, where the row number
is not one,
it's adding one to zero, so it's giving it a value.
And then the next time it sees that same value, it's adding one to the last count, and the third
time it's going to add one to the count from the ones that came before it, and so on.
So then after that plus plus, I close the curly brace and go to an END statement.
And inside of curly braces here, I do a for, inside of parentheses, b in a. So as I was
just talking about before, we're declaring a new variable called b for every
index in the array that I created called a, and I'm going to print b. So what you're
going to get, since this is happening in the END block, is
just the distinct values of b in a. So I don't know how well I'm explaining that, but the
point being that every time you see that value, you're going to do an increment.
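A sketch of the distinct.awk idea described above. The sample file is made up for illustration, since the real input is not reproduced in this transcript. Passing -F, on the command line is an assumption about how the fields were split, and printing the count next to each value is an addition to show what the array holds. The order in which for (b in a) visits the indices is unspecified in awk, so the output is piped through sort.

```shell
# Hypothetical sample data: a header row, then name,color,amount.
cat > sample.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
EOF

cat > distinct.awk <<'EOF'
# Skip the header row; count how many times each value of column 2 appears.
NR != 1 {
    a[$2]++
}
END {
    # b takes each index of a in turn, i.e. each distinct value seen.
    for (b in a) {
        print b, a[b]
    }
}
EOF

awk -F, -f distinct.awk sample.csv | sort
```

For this sample the result is two lines, red with a count of 2 and yellow with a count of 1.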
I want to do one other one that might be a little bit more clear.
Yeah, I like this one.
So there's another one called sum_column.awk. I think this one is a little bit more clear
than the distinct one is.
So I have a bunch of things that I'm declaring in my BEGIN statement
at the beginning.
I'm doing FS equals, in double quotes, comma, and OFS equals comma.
That's saying that the field separator in my input file is a comma and the field separator
in my output file is a comma.
And then before anything else, I want to print color, comma,
sum.
So that's the header in the new file I'm creating. I'm closing the BEGIN and then
I'm done with that.
NR not equal to one: as we talked about in the previous episodes, whenever
you put a conditional before a bracket like that, you're basically stating the condition
that you want to put on processing this file.
So NR not equal to one is saying when the row number is not equal to one, so
skip the first row.
So while you're not on the first row, do a, then inside of brackets, dollar
sign two, then plus equals, dollar sign three.
So I'm saying I'm making this new entry inside of this a
array that is the sum of the third column in my file. And then I'm going to END, and I'm
saying for b in a, very similar to what happened before, print b, comma, a in brackets b.
And so what this one's doing, it's doing a similar thing as distinct,
where I am printing b, which is the index.
And so as we saw in the other example, where we did a distinct list of the colors, this
is getting that same distinct list of the colors.
And so I'm printing that.
And then I'm also taking that array and looking at the b entry in that array.
So b being, looking at the color that's called, I don't
know, blue: what is the sum of all of the numbers where the color was blue?
So once again, I don't know how well I'm explaining that.
I'm pretty sure Dave is going to come on after this and explain it a little bit better.
But the basic idea, and it's really simple if you look at the examples
and apply it to other examples, is you're really just saying:
I'm making this new array called a, and keyed on the second column
I'm getting all the distinct colors, and with the column that's next to the color
column, which is the third column, I'm summing them.
And then I can print out both the color and the sum amount.
And if you remember, for those files, we were using this file1.txt
or file1.csv.
So if I do, which I'm about to do, awk -f
sum_column.awk file1.csv, I get a first row saying color, sum.
And then the next ones saying brown, 13; purple, 12; red, 7; yellow, 11; green, 8.
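A sketch of sum_column.awk as described above. The rows in file1.csv below are made up, chosen only so that the per-color totals match the ones read out; the real file is not reproduced in this transcript.

```shell
# Made-up rows (name,color,amount) whose totals match the ones read out above.
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

cat > sum_column.awk <<'EOF'
BEGIN {
    FS = ",";            # input field separator
    OFS = ",";           # output field separator
    print "color,sum";   # header row
}
NR != 1 {
    a[$2] += $3;         # add column 3 to the running total for this color
}
END {
    for (b in a) {
        print b, a[b]
    }
}
EOF

awk -f sum_column.awk file1.csv
```

The header always comes first, but the order of the color rows depends on how the awk implementation walks the array, so it may differ from the order read out in the episode.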
I use stuff like this a lot, though nowadays, since I'm doing a lot more stuff
programmatically and doing a lot more stuff with data analysis, I'll do this stuff
in either Python or R a lot more, because those are systems that
are a lot more robust for doing this type of analysis. But for certain situations where
either I don't have those tools available or it'll take more time getting those tools
out and starting up a process or a connection to somewhere else, awk is a great thing
to use for this.
For instance, I've used this same exact process to look through a server's
access log, so going into the nginx or Apache access log and looking for people
that are really slamming on the server and then reporting those.
You can use basically this exact function that I did with the sum_column.awk
file, where you would say, I believe those are
separated by colon.
So instead of saying FS equals comma, you say FS equals colon.
And then you just pick whichever field is going to be the one with the IP address, and
so instead of a in brackets dollar sign two, it'd be a in brackets, whatever column the
IP address is in, and then plus equals whatever column if you want a sum. You can do
a sum, or you can just do a plus plus. So if you just say plus plus right there, that
means you're just getting the count. And then you could do the same print
statement, where instead of the color you would print the IP address and the count right
next to it.
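A sketch of the access-log count. It assumes the usual nginx or Apache log layout, where fields are separated by spaces and the client address is the first field, which differs from the colon separator guessed at above. The log lines are invented and use documentation-range addresses.

```shell
# Invented log lines in the common "combined" style layout.
cat > access.log <<'EOF'
203.0.113.7 - - [10/Jul/2017:10:00:01 +0000] "GET /admin HTTP/1.1" 404 152
198.51.100.4 - - [10/Jul/2017:10:00:02 +0000] "GET / HTTP/1.1" 200 1043
203.0.113.7 - - [10/Jul/2017:10:00:03 +0000] "GET /login HTTP/1.1" 404 152
203.0.113.7 - - [10/Jul/2017:10:00:04 +0000] "POST /login HTTP/1.1" 404 152
EOF

# Count hits per client address ($1), then list the busiest addresses first.
awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' access.log | sort -rn
```

Putting the count first makes it easy to hand the result to sort -rn, so the noisiest addresses end up at the top of the list.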
That is really, really useful.
So you'd say, okay, well, I know my IP address, I SSH into the server all the time.
I'm there 15 times.
There's a coworker of mine that SSHes in a bunch of times.
Here's the normal traffic that you get on the site, which, you know, there's like a really
long tail of regular traffic.
But there's 1500 hits from these two IP addresses that are like one number
apart, and you do a lookup on them and they're from, like, Ukraine or something, and you're
like, oh, well, that's probably people just, you know, scanning for open ports or looking
for something. And sometimes you can actually look at that deeper and see, oh, look,
they're looking at the domain slash admin, or slash login,
and seeing if those screens exist and trying to, you know, brute force the login.
So it's interesting to see that kind of stuff and see people trying to get into
your sites.
Same type of thing for looking at different types of customer data sometimes.
And like I said, a lot of times I do it in R or Python now, but it's
a really easy thing to do, to be able to do grouping and counting, or grouping and
summing.
That's what I usually use the for loops for. While loops, I really don't have that much
of a use for, because if I'm going to write something that needs that much
language support, I'm going to reach for a different tool like Python or R, or
if I really had to, Perl, I guess, or I guess C#.
So I tend to stay away from everything that needs to be compiled if I can help
it.
It's because, for what I do, it doesn't matter how long it takes to run, it
matters how long it takes to write.
So that's just my personal opinion, don't hate me for that.
Anyway, thanks for listening to this episode and stay tuned for more.
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out
how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
and is part of the binary revolution at binrev.com.
If you have comments on today's show, please email the host directly, leave a comment on
the website or record a follow-up episode yourself.
Unless otherwise stated, today's show is released under the Creative Commons
Attribution ShareAlike 3.0 license.