Episode: 2476
Title: HPR2476: Gnu Awk - Part 9
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2476/hpr2476.mp3
Transcribed: 2025-10-19 03:49:35

---

This is HPR Episode 2476 entitled Gnu Awk - Part 9, and is part of the series Learning Awk. It is hosted by b-yeezi, is about 33 minutes long, and carries an explicit flag. The summary is: In part 9 of the series, we discuss the printf function.

This episode of HPR is brought to you by archive.org. Support universal access to all knowledge by heading over to archive.org forward slash donate.

Hello Hacker Public Radio fans, this is b-yeezi once again with another exciting episode about the very well-known, but often not very well understood, utility awk. In this episode we're going to talk about the function called printf. If you're familiar with C, printf will already be familiar to you. If you program in other languages like Python, the idea is very similar there as well, and I'm going to go into it a little bit in this episode. So without any further ado.

We're using the GNU Awk manual, which is at www.gnu.org/software/gawk/manual/gawk.html#Printf.

I have some notes, but I'm also going to point you directly to the guide, and I have some examples to go over as well. According to the documentation, printf gives you greater control over output than print does. To follow along, you can either go through the manual or just listen and take any pointers as you hear them. The basic syntax of printf in an awk statement is printf, a space, then the format, a comma, and then the items you want to format: item1, item2, item3, item4, however many items you need, all listed after the format. It's similar to the printf function in C and to the .format string method in Python, so if you have an idea of what those are, you'll have a good idea of what printf is.

One of the big differences between printf and print in awk is that print automatically adds a newline to the end of its output. printf does not do that, so if you want every row on its own line, you have to add the escape sequence for a newline, which is backslash n (\n).

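A minimal sketch of the syntax and the newline difference, assuming any POSIX awk is on the PATH; the strings are made up for illustration:

```shell
#!/bin/sh
# print ends its output with a newline; printf never adds one on its own.
awk 'BEGIN {
    print "print adds a newline"
    # printf FORMAT, ITEM1, ITEM2: each %s is filled by the next item, in order.
    printf "%s and %s ", "item1", "item2"
    # No newline was written above, so this continues the same line,
    # and the \n in this format is what finally ends it.
    printf "share one line\n"
}'
```

The second line of output shows both printf calls joined together, because only the last format contains \n.
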
If you remember the previous episodes, I had an example CSV file called file1.csv, which has a few columns: the name of a fruit, its color, and the amount, or quantity. In this example I run awk -F, (dash, capital F, comma), which sets the field separator to a comma, and then, inside single quotes, the pattern NR != 1, meaning "not the first row". The action is printf "Color %s has %s.\n", $2, $3. That prints "Color blank has blank" for every row, filling the first blank from the second column and the second blank from the third column. If that doesn't quite make sense, it works like similar features in other languages: you put placeholders in for the parts of the string that aren't fixed, and the blanks are filled in for every row using the dollar-sign field notation you already know from awk.

Two main things happen in that placeholder: control letters and format modifiers. From the documentation, and from my understanding, the control letter determines what kind of value the placeholder outputs, a bit like casting it to a specific type, where by type I mean the types you know from programming, like integer, string, or float. You can use it to turn values into floats, into characters, into hexadecimal, and all kinds of other things. I'm not going to go over every control letter, because there are a good number of them, but I want to cover the ones I've found most useful, plus a few you probably haven't heard of. As I said, there are ones for hexadecimal and other things, but %c turns the output into a character. So printf "%c", 65 prints character number 65 in the character table, which is the letter A. I don't know if you'll ever have a use for that, but it's available. Something more useful is %i or %d, which cast to an integer.

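Here is the file1.csv command described above written out as a runnable sketch. The real file1.csv from earlier episodes isn't reproduced here, so the script writes a small hypothetical stand-in with the same three columns (name, color, amount) into a temporary directory:

```shell
#!/bin/sh
# Work in a scratch directory so the sample file doesn't clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data standing in for file1.csv:
# a header row, then name,color,amount on each line.
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
EOF

# -F, makes the comma the field separator; NR != 1 skips the header row.
# The two %s placeholders are filled, in order, by $2 (color) and $3 (amount).
awk -F, 'NR != 1 { printf "Color %s has %s.\n", $2, $3 }' file1.csv
```

With this sample data, the first line printed reads "Color red has 4." and every following row fills the same two blanks from its own columns.
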
If I say printf "%i", 3.4, it prints 3, because it turns 3.4 into an integer. If I use %f, it formats the value as a float: printf "%f", 65 prints 65.000000 instead of just 65, with a bunch of trailing zeros at the end. We can control how many digits come after the decimal point using the format modifiers, which we'll talk about in a little bit. There are also %e and %E, which use scientific notation, meaning a number times ten to some power. Out of the box, printf "%e", 65 prints 6.500000e+01. But if you'd rather leave 65 as just 65, and only switch larger numbers like 6,500 to scientific notation, there are ways to control when scientific notation is used and when it isn't. That's useful because most people can read 65 just fine, but it might be useful to show something like 8,327 in scientific notation.

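The control letters covered so far can be tried one at a time in a BEGIN block, with no input file. A sketch, with the outputs expected from a typical awk noted in the comments:

```shell
#!/bin/sh
# Each printf applies one control letter to a literal number.
awk 'BEGIN {
    printf "%c\n", 65     # character code 65: A
    printf "%i\n", 3.4    # integer, fractional part dropped: 3
    printf "%d\n", 3.4    # %d behaves the same as %i: 3
    printf "%f\n", 65     # floating point, six decimals by default: 65.000000
    printf "%e\n", 65     # scientific notation: 6.500000e+01
    printf "%E\n", 65     # same, with a capital E: 6.500000E+01
}'
```
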
To do that, you use another control letter, %g. Combined with the format modifiers, it lets you say how many significant digits to use before it switches to scientific notation. If I did printf "%2.2g", 65, it would leave 65 alone. But if I gave it 6,650, it would switch to scientific notation. Or if I changed it to %.1g and gave it 65, it would print 6e+01. I hope that's clear. The basic idea is that you can control when numbers switch to scientific notation and how they are rounded when they do, basically how many significant figures you want to keep.

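A sketch of %g, again in a BEGIN block. For %g, the number after the point is the count of significant digits, and (as in C's printf) plain notation is kept only while the number's exponent is at least -4 and smaller than that count. The value 8327 is taken from the discussion above; the other precisions are arbitrary choices for illustration:

```shell
#!/bin/sh
# With %g, the precision is the number of significant digits.
awk 'BEGIN {
    printf "%2.2g\n", 65     # width 2, precision 2: 65 fits, so it stays 65
    printf "%.2g\n", 8327    # needs 4 digits, only 2 allowed: 8.3e+03
    printf "%.1g\n", 8327    # one significant digit: 8e+03
    printf "%.4g\n", 8327    # four significant digits: plain 8327 again
}'
```
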
Another one is %s, which outputs the value as a string. The difference between %c and %s shows up when the value in the placeholder is a full string, like the word statement. %c would print only the first letter, because it outputs a single character, whereas %s prints the string in its entirety.

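A short sketch of that difference, using the word statement from the example:

```shell
#!/bin/sh
awk 'BEGIN {
    word = "statement"
    printf "%c\n", word    # a string argument: only its first character, s
    printf "%s\n", word    # the whole string: statement
}'
```
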
%u outputs an unsigned integer, which has some really weird properties. Here it behaves as a 64-bit integer, so if you use a negative number, it wraps around from the top of that range and counts back down. So printf "%u", 6 gives you 6, but printf "%u", -6 actually gives 18446744073709551610, which is 2 to the 64th power minus 6. It's a weird thing. I don't know when you'd ever want to use that, and I'm sure there are some use cases for it, but there you go. There are a lot of other control letters, and you can see them in the documentation.

Now, as I said, there are two big parts of printf: one is the control letters, the other is the formatting. I use formatting sometimes in awk when I want to make something pretty, but I use it all the time when I'm programming in other languages and want to build a command-line utility.

awk is actually a perfect candidate for it too, whenever you want to control how many spaces a block of text takes up. In the show notes, you'll see a crude illustration of what I mean. First I'll start with the statement, and then I'll talk about the crude examples. Let me see.

I just want to pull up my examples. I have printf basic, and then I have the fancy printf with sums and counts. If you remember, in one of my previous episodes I talked about how to write an awk script in a file, and how to use BEGIN and END to do sums and counts. In that previous version, I had a file with a BEGIN block that set FS to "," and OFS to ",", so the input field separator is a comma and the output field separator is a comma. On the next line it printed "color,sum", which is just a header line. After that came a[$2] += $3, and then, in the END block, for (b in a) print b, a[b]. If you remember from that episode, that gives you one column with all the colors and one column with the sum of the amounts for each color, column three being the amount.

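The earlier print-based script, reconstructed from the description above as a sketch rather than the original file. The NR != 1 guard (to keep the header row out of the sums) is an assumption here, and the data is a hypothetical stand-in for file1.csv:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical sample data: name,color,amount.
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
EOF

awk 'BEGIN {
    FS = ","              # input field separator
    OFS = ","             # output field separator, placed between print items
    print "color,sum"     # header line
}
NR != 1 { a[$2] += $3 }   # add column 3 (amount) to the total for its color
END {
    for (b in a)          # the order of for (b in a) is unspecified
        print b, a[b]
}' file1.csv
```

The rows come out as plain comma-separated text such as red,7 and purple,12, so the numbers start at different positions depending on the length of the color name.
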
If you just print that in the default format, you'll see that some of the words are longer than others: purple has more letters than red. Red uses up three characters before the comma and purple uses up six, so if you look at where the numbers are, they won't line up in a straight column. That's basically the limitation of the print statement. If you want to get around that and make a nice, neat little table, you can use printf. In my example, which will be in the show notes, I take it a step further. I start with a similar BEGIN statement: BEGIN { FS = "," }.

After the BEGIN statement comes NR != 1, just like we were doing before, so I'm not looking at the first row. Then I'm going to have four variables, a, b, c, and d. That's just what I chose; it's not very descriptive, but the example is simple enough that it shouldn't be too confusing. First, a[$2] += $3, so a is indexed by all the colors and holds the sum of the amounts for each color. Then c += $3, which is the total amount, not by color but overall. Then d += 1, which gives me the count: not dollar sign one, just the number one, so every row adds one to the variable d. So far, a holds the per-color sums, indexed by color, c is the total sum, and d is the total count. After that comes the END block. Inside END, I do for (b in a), like in the other examples, but this time, instead of a plain print, I use printf with the format "Color: %-7s Sum: %2i\n", followed by b and a[b]. That's the end of the for statement.

After that, I print a line of 22 dashes. Then I do printf "%-18s %3i\n", "Total Sum:", c. To explain that (and I'll get into what the formatting means in a little bit): there are two placeholders, one %s and one %i. The first is replaced with the words Total Sum, and the second with the value held in the total variable c. On the next line I do something similar: printf "%-18s %3i\n", "Total Count:", d. Here the first placeholder is replaced with the words Total Count, and the second with the value stored in d, which is an integer. On the last line, I do printf "%-18s %3.1f\n", "Mean:", c / d. That one works the same way: it reserves 18 spaces to hold the word Mean, then divides c by d to get the average, and I only want one digit after the decimal point. That's a lot of words to describe only a few statements, which will be available to you in the show notes. When you run it, you get a nice little table that is exactly 22 characters wide. You'll see Color, then the color, then Sum, then the sum, for every different color, and the sums line up in a straight column. Then you'll see Total Sum with the total, Total Count with the count, and Mean with the mean, shown as two digits with the decimal point in between. So when you get the output, it's a nice, clean table, and that's what you want a lot of the time.

The way I like to think of it, if I go back to my crude little illustration, is that I count the spaces. Sometimes I'm really precise and sometimes I'm not, but if I look at my data and know that the longest value is going to be eight characters, I'll make a placeholder of width 10 so that everything fits, and then I'll make a placeholder for the next variable, and the next, depending on how long they're going to be.

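Here is the fancy printf script as reconstructed from the walkthrough above: a sketch, not the exact file from the show notes. The formats follow the description, and the single space before Sum: is an assumption, chosen because it makes every line exactly 22 characters wide, matching the 22 dashes. The data is the same hypothetical stand-in for file1.csv used earlier:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical sample data: name,color,amount.
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
EOF

awk 'BEGIN { FS = "," }
NR != 1 {
    a[$2] += $3   # sum of the amounts for each color
    c += $3       # total of all amounts
    d += 1        # row count: the number 1, not the field $1
}
END {
    # "Color: " (7) + %-7s (7) + " Sum: " (6) + %2i (2) = 22 characters
    for (b in a)
        printf "Color: %-7s Sum: %2i\n", b, a[b]
    print "----------------------"            # 22 dashes
    # %-18s (18) + " " (1) + a 3-character number (3) = 22 characters
    printf "%-18s %3i\n", "Total Sum:", c
    printf "%-18s %3i\n", "Total Count:", d
    printf "%-18s %3.1f\n", "Mean:", c / d    # one digit after the decimal point
}' file1.csv
```

With this sample data, the color lines (in whatever order for (b in a) produces) include "Color: purple  Sum: 12", and the table ends with Total Sum 33, Total Count 6 and Mean 5.5, each number right-aligned at column 22. Sums of 100 or more would overflow the %2i field, so the widths are tied to the data.
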
If I don't know how long they are, because it's a really big data set, I'll just reserve a generous number of spaces, depending on how much screen space I'm willing to give up. So if I'm willing to use 80 characters and I only have three variables to look at, and I know one of them is something like a website URL, I'll give that one 35 characters just to be safe. You'll see in the illustrations how you leave a certain amount of room for each variable, and then you can use little tricks to make it either right-justified or left-justified.

I want to go through how to do that right now. I wanted to give the example first, because I learn by example: just seeing the example, I can often work out how it works. But for those who want a little more depth, here's what the format modifiers really do. (I'll talk about modifying some of the control letters in a little bit, but let's continue with the modifiers for now.)

Putting just a number in the placeholder sets the width: how many spaces to reserve for the value, whatever the string is, with any extra spaces added on the left. If you put a minus sign in front of the number, all the extra spaces go on the right instead.

That's why in my example I used -7 and -18: I wanted my first column to be left-justified. Taking seven as an example, if the value is the word red, I want all of the spaces to come after the d. If I just used a plain 7, all the spaces would come before the r, so the word would be right-justified.

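A sketch of the two justifications using the word red from the example; the square brackets are only there to make the padding visible:

```shell
#!/bin/sh
awk 'BEGIN {
    printf "[%7s]\n", "red"     # width 7, padding on the left (right-justified): [    red]
    printf "[%-7s]\n", "red"    # minus flag, padding on the right (left-justified): [red    ]
}'
```
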
That's the basic idea, and it will take you a long way. The other part that will take you a long way is, for floating-point numbers, a point followed by a number, which sets how many digits to show after the decimal point.

So if you want the whole number to take up four spaces, and you only want to allow three digits after the decimal point, you would use %4.3f. The 4 says how many spaces the whole field should use, and the .3 says how many digits go after the decimal point. Let me go over that again, because it's easy to get wrong: for %f, the number before the point is the minimum width of the entire field, counting the digits before the point, the point itself, and the digits after it, and the number after the point is how many digits after the decimal point your float is rounded to. Let me go to my terminal here and do a separate awk one-liner inside single quotes.

Let me quickly write the printf. I'm doing the example right out of the documentation: inside double quotes, %4.3f, then close the quotes, then a comma, then 1950, then close the brace and close the single quote. Actually, I don't need some of that; I can simplify it. Then you'll see that I get... oh, that didn't work.

So basically, if you look at the documentation, printf "%4.3f", 1950 prints 1950.000. The 4 is only a minimum width, so a number that needs more room simply takes more room. If I did 95, it would print 95.000 with no padding, because that's already six characters; to get two spaces in front of 95.000, I would need a width of 8, as in %8.3f.

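A sketch of width and precision with %f, using the 1950 value from the documentation example mentioned above plus the 95 case; brackets mark the edges of each field:

```shell
#!/bin/sh
awk 'BEGIN {
    printf "[%4.3f]\n", 1950    # [1950.000]: already longer than 4, so no padding
    printf "[%4.3f]\n", 95      # [95.000]: six characters, still no padding
    printf "[%8.3f]\n", 95      # [  95.000]: width 8 adds two leading spaces
    printf "[%.1f]\n", 3.14159  # [3.1]: precision alone rounds to one decimal
}'
```
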
It's a lot easier when you just look at the examples. If you want leading zeros instead of spaces in front, you put a zero in front of the width: %05i, for example, pads an integer with zeros to five characters. That lets you put leading zeros in front of the numeric placeholders, much like what you'd do in Python with zfill. Now let me go back and talk about a couple more modifiers, because a lot of times you'll want to repeat the same item over and over again.

For that, you can use a positional specifier to choose which item a placeholder takes. For instance, if I have three columns and I want to print the first column, the third, the second, the first again, and the third again, this modifier lets you do that. If I write %2$s, the s says the placeholder is formatted as a string, and 2$ says I want the second item. If I write %1$s, I want the first item. In my mind this is a bit counterintuitive, because with a regular print statement you write the dollar sign first and then the number, as in $2, and it refers to a field, whereas here the number comes first and refers to the items listed after the format.

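A sketch of the reordering described above. Positional specifiers like %2$s are documented in the GNU Awk manual, but other awks may not accept them, so this sketch calls gawk explicitly and skips the demo if gawk isn't installed. The three items are placeholders standing in for three columns:

```shell
#!/bin/sh
# %N$s means "the Nth item after the format", not field $N of the input line.
if command -v gawk >/dev/null 2>&1; then
    # Items: 1 = "first", 2 = "second", 3 = "third".
    # Print item 1, item 3, item 2, item 1, then item 3 again.
    gawk 'BEGIN { printf "%1$s %3$s %2$s %1$s %3$s\n", "first", "second", "third" }'
else
    echo "gawk not found; skipping the positional-specifier demo" >&2
fi
```

When gawk is available this prints "first third second first third".
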
That's one of the things that's a little confusing, at least for me. Just one or two more examples, because we're getting a little long now. You can put a plus sign before the width, which says the value should always be printed with a sign: with %+i, or with a float, the number always has its sign in front of it. I talked about the leading zeros, and I talked about the width and precision modifiers. So I think that's it.

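A closing sketch of the plus-sign and leading-zero flags recapped above; the numbers are arbitrary:

```shell
#!/bin/sh
awk 'BEGIN {
    printf "%+i\n", 5           # + flag: the sign is always shown: +5
    printf "%+i\n", -5          # negative numbers keep their minus sign: -5
    printf "%+.1f\n", 2.5       # works for floats too: +2.5
    printf "%05i\n", 42         # 0 flag: pad to width 5 with zeros: 00042
    printf "%06.2f\n", 3.14159  # zeros plus precision: 003.14
}'
```
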
There's a lot to talk about here, and a lot more to go through: the section on printf in the documentation alone is several printed pages long. I've only scratched the surface, but hopefully I've given you a taste of what it can do.

Hopefully I didn't confuse you too much, because hearing these things in audio form, rather than reading them, isn't always the clearest way to understand them. I encourage you to look both at the GNU documentation and at Grymoire's documentation on sed and awk: go to www.grymoire.com/Unix/Awk.html. The author, and I actually don't know who this is, gives a lot of explanation of how to use awk, with lots of different examples. That was the first resource I really used to learn awk.

I also suggest, if you're really interested in learning not just this but anything, that you go ahead and try it, starting with some kind of use case for what you need to do. So if you have a CSV file, or an Excel file you can convert to CSV, and there's something you'd normally do in Excel, like selecting all the lines and doing a sum or a pivot table, try to use one of these tools instead.

You might ask yourself at the beginning: why would you want to do that?

After you've done it and seen the power you get out of it, without ever having to open a program that takes a long time to start and whose work isn't easily repeatable, you'll see the value: a script can be rerun, and it works on huge data sets. Sometimes I work with data that's either too big to fit in Excel or LibreOffice Calc, which is what I use most of the time, or, if it does fit, the spreadsheet is going to be crashy or slow, while awk just flies through that data. So definitely check it out. Also, if there's a book I could recommend, it's called Data Science at the Command Line.

I'll put that in the show notes as well. It also has some examples of awk and a whole bunch of other command-line utilities you can use to process data really quickly.

All right, that's it for this episode of Hacker Public Radio. And as we like to say here in B Easyland, keep hacking.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.