Episode: 2476
Title: HPR2476: Gnu Awk - Part 9
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2476/hpr2476.mp3
Transcribed: 2025-10-19 03:49:35
---
This is HPR Episode 2476 entitled Gnu Awk - Part 9, and it is part of the series Learning Awk.
It is hosted by b-yeezi, is about 33 minutes long, and carries an explicit flag.
The summary is: in part 9 of the series, we discuss the printf function.
This episode of HPR is brought to you by archive.org.
Support universal access to all knowledge by heading over to archive.org forward slash donate.
Hello Hacker Public Radio fans, this is b-yeezi once again with another exciting episode
about the very well-known but often not very well-understood utility awk.
In this episode we're going to be talking about some functionality of the function
called printf. If you're familiar with C, printf is a familiar statement to you.
If you program in other languages like Python,
the idea of printf is very similar there as well, and I'm going to go into it a little bit
in this episode. So, without any further ado.
What we're using is the Gnu Awk man page, which is at www.gnu.org slash software
slash gawk slash manual slash gawk.html hashtag Printf.
I have some notes, but I'm also going to point you directly to the guide, and I have
some examples to go over as well. From the documentation: the printf statement allows
for greater control over output in comparison to print. To follow along, you can either go along with
the manual or just listen and take any pointers as you hear them. The basic syntax of
printf in an awk statement is printf, space, then the format string, a comma, and then the items that
you want to format. So it could be item1, item2, item3, item4. However many items you
want to format, they go after the formatting comma. It's similar to
the printf function in C and the .format string method in Python.
So if you have an idea of what those things are, you'll have a good idea of what printf is.
One of the big differences between print and printf in awk is that
when you use a print statement, it automatically adds a newline to the end of the output.
printf does not do that. So if you want a separate line for every row, you have to add
the escape sequence for a newline, which is backslash n.
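As a minimal sketch of the syntax and the newline difference just described: the strings and the three CSV rows below are made-up sample values, not the contents of the episode's files.

```shell
# print appends a newline on its own
awk 'BEGIN { print "apple is red" }'

# printf takes a format string, then a comma-separated list of items;
# the \n has to be written explicitly
awk 'BEGIN { printf "%s is %s\n", "apple", "red" }'

# without \n, successive printf calls run together on one line
awk 'BEGIN { printf "a"; printf "b"; printf "\n" }'    # ab

# the same pattern applied to each data row of a small, hypothetical CSV
printf 'name,color,amount\napple,red,4\ngrape,purple,10\n' |
    awk -F, 'NR != 1 { printf "Color %s has %s\n", $2, $3 }'
```

The last command should print one line per data row, "Color red has 4" and "Color purple has 10", because NR != 1 skips the header row.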
If you remember any of the previous episodes, I had this example CSV file called file1.csv,
which just has a bunch of columns with the name of a fruit, the color of it, the amount,
and I guess the quantity or something like that. So in this example, I do awk,
space, dash capital F comma, which says the field delimiter is a comma. Then,
in single quotes, NR, exclamation point, equals sign, 1, so NR not equals one. So if you're not
on the first row: printf, open double quotes, Color, space, %s, space, has, space, %s, dot,
and then close the double quotes, comma,
$2, comma, $3. What it's going to do is say "Color blank has blank",
and it's going to look in the second column for the first blank, and in the
third column for the second blank. I don't know if that makes sense, but if you're familiar with other
systems, it's very similar to that. So the idea is you put in placeholders
for the things that are not fixed strings, and then you fill in the blanks for every
row using the dollar sign notation that you know already from using awk. So that little
placeholder, there are two main things that happen in that placeholder, and those are control
letters and formatting options. Control letters, from the documentation
and from my understanding, control, or cast, the output of that placeholder
to a specific type, and when I say type, I mean what in programming you think of as types, like integer,
string, float. You can use this as a way to convert ints to floats, ints to chars, you know,
ints to hexadecimal, or hexadecimal to chars, all types of things that you can do
with this. I'm not going to go over every single control letter because there's a good
amount of them, but I want to go over the ones I've used or found most useful, and ones that I
didn't know of that you probably wouldn't hear of. So like I said, there are ones for hex
and other things, but %c will turn the output into a character. So printf, in double quotes
%c, and then outside the double quotes comma 65, will print the 65th character in the character
table, which is the letter A. I don't know if we'd ever have a use for that, but that's something
that is available. Something that's more useful is using %i or %d to cast
to integer. So if I say printf, in double quotes %i, outside
the double quotes comma 3.4, it's going to print 3, because it's going to change that 3.4 to an
integer. If I use %f, it's going to cast it to float. So if I do printf, inside of the
double quotes %f, comma 65, instead of just printing 65, it's going to print 65.000000, with a bunch of
trailing zeros at the end. We can control how many zeros are at the end using the formatting,
which we'll talk about in a little bit. There's also %e, lowercase e, and %E, uppercase E,
which do scientific notation, so e to the number. And then you can control how many digits
you want to use before it starts to use the e notation. So out of the box,
if you do printf %e with 65, it's going to give 6.500000e+01. But if you
want to just leave 65 as 65, but have 6,500 in scientific notation, there are ways to control when to
actually use the scientific notation and when not to, which is useful because most people can
just read 65, but it might be useful to have, you know, 8,327 using scientific notation.
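The control letters mentioned so far can be tried directly from the command line. This is a small sketch using the same values as the discussion; the trailing comments show what a typical gawk build on Linux prints.

```shell
awk 'BEGIN { printf "%c\n", 65 }'     # A
awk 'BEGIN { printf "%i\n", 3.4 }'    # 3
awk 'BEGIN { printf "%f\n", 65 }'     # 65.000000
awk 'BEGIN { printf "%e\n", 65 }'     # 6.500000e+01
awk 'BEGIN { printf "%E\n", 65 }'     # 6.500000E+01
```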
So like I was saying, to do that, you use this other control letter, %g.
You can use that in combination with the formatting options to
say how many digits you want to use before you start doing
scientific notation. So if I did printf, in double quotes, %2.2G, and then comma 65,
it'll leave 65 alone. But if I did 6,650, it would do 6.5 E.
Or if I changed it and said use .1G and did 65, it would now go 6E to the first.
So I hope that's clear. The basic idea is that you can control scientific notation:
the number of digits at which you want to switch to scientific notation,
and how you want to round it, basically how many significant figures you want to have.
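A short sketch of how %g switches between plain and scientific notation, assuming the usual C-style rule that the precision counts significant figures. The input values are chosen for illustration so that none of them sits exactly on a rounding boundary.

```shell
awk 'BEGIN { printf "%.2g\n", 65 }'      # 65
awk 'BEGIN { printf "%.2g\n", 6500 }'    # 6.5e+03
awk 'BEGIN { printf "%.1g\n", 68 }'      # 7e+01
awk 'BEGIN { printf "%.4g\n", 8327 }'    # 8327
awk 'BEGIN { printf "%.2G\n", 8327 }'    # 8.3E+03
```

When the number needs more digits than the precision allows, %g falls back to the e notation and rounds to that many significant figures.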
Another thing that you can do is %s, which casts to a string.
The difference between %c and %s is this: say the thing that you want to put
in the placeholder is a full string, like the word statement.
%c would only print the first letter, because it's going to cast it to a char,
which is just a single character, whereas %s would cast it as a string in its entirety.
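The %c versus %s difference can be seen with the same word used in the explanation above:

```shell
awk 'BEGIN { printf "%c\n", "statement" }'    # s
awk 'BEGIN { printf "%s\n", "statement" }'    # statement
```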
%u casts things to an unsigned int, which has some really weird properties.
I think this is a 32 or 64 bit integer.
If you use a negative number, it's going to start at the top of that unsigned range
and work its way back. So printf, in double quotes, %u, comma
6 is going to give you the number 6. But if you do negative 6,
it's going to actually give something like 18446744073709551610,
which is 2 to the 64th power minus 6: it wraps around from the end of the unsigned range.
It's a weird thing. I don't know when you'd ever want to use that. I'm sure there are some
use cases for that. But there you go.
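A hedged sketch of the %u behaviour: the positive case is straightforward, but what a negative value prints depends on the awk implementation and platform, so no output is claimed for it here beyond what the episode reports.

```shell
awk 'BEGIN { printf "%u\n", 6 }'     # 6

# Implementation dependent: the episode reports 18446744073709551610,
# which is 2^64 - 6, for this call.
awk 'BEGIN { printf "%u\n", -6 }'
```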
There are a lot of others, but you can go see them in the documentation.
Now, like I said, there are two big parts of printf. One is the control letters,
the other part is the formatting. And I use this all the time.
I use it sometimes in awk when I want to make something pretty, but I use it all the time
when I'm programming in other languages where I want to make a command line utility.
And actually, awk is a perfect candidate for this as well, where you want to be able to control
how many spaces there are in the block of text that you want to control. In the show
notes, you'll see a crude illustration of what I mean by this. First, I'll start out
with the statement, and then I'll talk about the crude examples. So let me see.
I just want to pull up my example. I have printf basic, and then I have the fancy
printf. Yes, some calls. So if you remember, in one of my previous episodes I talked about
how you write an awk file and how you use things like BEGIN and END
to do sums and counts. Let me see, I want to get these
two files side by side. In the previous version, I had a file that did a BEGIN with FS equals,
in double quotes, comma, and OFS equals, in double quotes, comma. So I'm saying the input
field separator is comma and the output field separator is comma. And then after that, on the next
line, print, in double quotes, color, comma, sum. So I'm just doing a header line. And then after
that, I'm doing that little thing where I do a, inside of square
brackets, $2, plus equals $3. And then after that, I do a for loop,
for b in a, print b, comma, a bracket b. So if you remember from that episode, that's the way that
I can say one column will have all the colors and one column will have all the sums for the colors,
a sum of the amount for that color. Column three is the amount.
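The earlier sum-by-color script can be reconstructed roughly as follows. This is a sketch from the spoken description, not the file from the show notes: the function name sum_by_color, the sample rows, and the NR != 1 header skip are assumptions added for illustration.

```shell
# sum_by_color is a hypothetical wrapper; it reads CSV on standard input
sum_by_color() {
    awk '
    BEGIN {
        FS = ","                 # input field separator
        OFS = ","                # output field separator used by print
        print "color", "sum"     # header line
    }
    NR != 1 { a[$2] += $3 }      # assumed: skip the header row, add amount to its color
    END {
        for (b in a) print b, a[b]    # iteration order is not guaranteed
    }'
}

# made-up sample data in the name,color,amount layout
printf 'name,color,amount\napple,red,4\nstrawberry,red,3\ngrape,purple,10\n' | sum_by_color
```

With plain print, "red,7" and "purple,10" put their numbers in different columns, which is the alignment problem printf addresses.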
So if I just have it print in any format, what you'll see is that
when you print it out, the word purple has more letters than
the word red. Red is going to use up three spaces before the comma and purple is going to use
up, what is that, six. So if you look at where the
numbers are, they're not going to be in a straight line. That's basically the limitation when
you use the print statement. If you want to get around that and make a nice neat little table,
you can use printf to do that. In my example, I'm going
another step forward. This example will be in the show notes, where I'm doing a
similar BEGIN statement, where I go BEGIN, FS equals, in double quotes, comma.
And then after the BEGIN statement is over, NR not equals one, just like we were doing before,
so I'm not looking at the first row. And then I'm going to have four variables:
a, b, c, and d. That's just what I chose to use. It's not very descriptive,
but in a simple example it shouldn't be too confusing. So I'm doing a, inside of square
brackets, $2, plus equals $3. So I'm going to have
all the colors in a, and for each one a sum of the amounts. And then I'm going to do
c plus equals $3. So that is looking at the total amount, not by color.
And then I do d plus equals one, and that's going to give me the count. So not dollar sign one,
but just one, the number one. So for every row, just add one to this variable called d.
So far, you see I have a, c, and d, where a is all the colors in a group,
and inside of the brackets is going to be the sum of the amounts
in that color for all the different fruits.
c is going to be the total sum and d is going to be the
total count. Then after that, I do an END. Inside my END statement, I do a for b in a,
like we did in the other examples. But this time, instead of doing just a print, I'm doing a
printf. Inside of double quotes, I have Color, colon, space, %-7s,
Sum, colon, space, %2i, and then backslash n, and close the double quotes.
And then comma, b, comma, a bracket b. That's the end of the for statement.
Then after that, I'm doing a print of just 22 dashes. And then after that,
I'm doing a printf with, in double quotes, %-18s, space, %3i,
backslash n, close the double quotes, comma, the words Total Sum inside of double quotes,
comma, c. If I can explain that, and I'll get into what the formatting
means in a little bit: you'll see that I have these two
placeholders, one is an s and one is an i. I'm going to replace the first one with the words
Total Sum, and I'm going to replace the second one with the
value held in the variable c. Then on the next line, I'm doing something similar:
printf, and then inside of the double quotes, %-18s,
%3i, backslash n, and I'm doing Total Count, colon, and d. So on this line, you can see I'm
going to replace the first thing with the words Total Count, and the second thing with
the value that's stored in d, which is an integer. And then on the last line, I do a printf
similarly, with %-18s, %3.1f, backslash n, close the double
quotes, Mean, colon, comma, c divided by d. If I can explain that one, I'm doing a similar thing,
where, I'll give away the answer, I'm going to hold 18 spaces for
the word Mean. And then I'm going to do c divided by d to get the average. And
I'm going to use one placeholder, and I want to get only one digit
after the decimal. Wow, a lot of words to describe only a 69 statement, which will be available to
you in the show notes. But what's going to happen after you do that is you're going to have a nice little
table that is exactly 22 characters wide. You're going to see Color, then the color,
then Sum, and then the sum, and for every different
color, you're going to have the sum for that color. Then you're going to have a straight line, and then
you're going to have Total Sum, and all the way at the end you're going to see
the total, and then Total Count, and at the end you're going to see the count,
and then Mean, and then you're going to see the mean with only two digits, and the decimal point
in between. You'll see that when you get the output, it's a nice clean table, and that's
what you want a lot of times. The way I like to think of it, if I go back to my crude
little illustration, is that I count the spaces.
Sometimes I'll be really precise, sometimes I won't, but if I look at
my data and I know that the longest value is going to be eight characters,
I'll make a placeholder of length 10 so that everything will fit in that length, and then I will make
a placeholder for the next variable, and the next variable, depending on how long they're going to be.
If I don't know how long they are, because it's a really big data set,
I'll just hold a good amount of spaces, depending on how much screen space I'm willing to give up.
So if I'm going to give up 80 characters, and I only have three variables that I want to look at,
and I know one of them is like a website URL, I'll give that one like 35
characters just to be safe. You'll see in the illustrations how
you leave a certain amount of room for that variable, and then
you can use little tricks to make it either right justified or left justified.
I want to go through how to do that right now. I wanted to give the example first,
because I learn by example, so just seeing the example, I'm off and running, I can get to work.
But for those who want to hear a little bit more in depth, here's what the formatting
items do, really. Oh, actually, I'm going to talk about
modifying some of the control letters in a little bit. But let's just continue
with the formatters.
Just giving a number tells it how many spaces
to reserve to hold this text, or whatever the string is, with any extra spaces put on the left.
If you add a negative sign, it's going to put all the extra spaces on the right.
That's why in my example I did negative 7 and negative 18, because I want
my first thing to be left justified. So using seven,
for example, if the thing I want to hold is the word red, I want all of the spaces to be
after the d. If I just did a regular seven, all the spaces would be before the r,
so that would be right justified.
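The two justifications are easy to compare side by side. In this sketch the square brackets are only there to make the padding visible:

```shell
awk 'BEGIN { printf "[%-7s]\n", "red" }'    # [red    ]  left justified
awk 'BEGIN { printf "[%7s]\n", "red" }'     # [    red]  right justified
```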
So that's the basic idea of how to use it. And actually, that will take you a long way.
The other part that will take you a long way is, for floating point numbers,
you do a point and then a number, and that is how many decimal places to use after the point.
So if you want this whole thing to hold four spaces, so you only want to reserve four
spaces for it, and if there's a decimal, only allow three places, you would do %4.3f. The
four is saying how many spaces to use, and the point three is telling you how many
places to use for the numbers after the decimal point. Excuse me, I'm
going to go over that again. For %f, the number before
the point is the minimum number of characters for the whole field, and the number after the point is
how many decimal places you want to round your float to. So if I just go to my terminal here and I go
awk, I'm just going to do a separate one. I'm just going to do it inside single quotes.
Oh, it's quickly where I get print.
Print F.
Persons. I'm just doing the example right out of the documentation. It's percent inside of
double quotes percent 4.3 F. Then close. And then comma. And then 1950. Then close the bracket.
And close the single quote.
Uh, yep. Actually, I don't need, I don't need those. I can just do that.
Yep. Then you're going to see that I will get. Oh, that didn't work.
Yep. So basically, the idea is that if you look at the documentation, if you do a 4.3 F,
and then you do, um, and then you do the, um,
the, the number is going to do 1950.000. And so if I did, um, if I did 95, it would go
space space 9.5.000. That's right. Okay. So I'll explain that right now.
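For reference, here is a quick check of how width and precision interact for %f under the standard printf rules: the width is a minimum for the whole field, so a value that needs more room is never truncated, and padding only appears when the field is wider than the formatted number. The extra values beyond 1950 and 95 are illustrative.

```shell
awk 'BEGIN { printf "[%4.3f]\n", 1950 }'      # [1950.000]
awk 'BEGIN { printf "[%4.3f]\n", 95 }'        # [95.000]
awk 'BEGIN { printf "[%8.3f]\n", 9.5 }'       # [   9.500]
awk 'BEGIN { printf "[%3.1f]\n", 17 / 3 }'    # [5.7]
```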
It's a lot easier when you just look at the examples. If, instead of putting
spaces in front, you want to put leading zeros, you use a zero in front of
the item. So if I did a zero before the i,
that would allow me to put leading zeros in front of any of the
placeholders. It's similar to what you do in Python, specifically with
zfill. That's the zero.
So let me go back and talk about a couple of these modifiers as well.
A lot of times you'll want to repeat the same item over and over again,
and so you can use a positional modifier to pick which item goes into that placeholder.
So for instance, if I have three columns, and I wanted to print the first column,
the third column, the second column, the first column, and the third column again,
you can use this modifier to do that. So if I did
%2$s, that's going to say I want this placeholder
to be a string, and the 2$ is saying I want to
use the second item. And then if I say %1$s, I want this to be
the first item. Now this is kind of, in my mind,
counterintuitive, because when you use a regular print statement, you do the dollar sign and then the number.
So that's one of the things that's a little bit confusing, at least for me.
Going through just one or two more examples, because we're getting a little long now:
you can put a plus sign before the modifier, which says that
this item should always be signed. So if you do a plus i, it's always going to be signed, or
a plus float, it's always going to have the sign in front of it. I talked about the leading zeros.
And then I talked about the width and the precision modifiers. So I think that's it.
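The remaining modifiers can be sketched as follows. The values are made up for illustration. Positional specifiers such as %2$s are a gawk extension, so that line is guarded and only runs when a gawk binary is installed.

```shell
# leading zeros: pad to a width of 5 with zeros instead of spaces
awk 'BEGIN { printf "%05i\n", 42 }'          # 00042

# plus flag: always show the sign
awk 'BEGIN { printf "%+i %+i\n", 5, -5 }'    # +5 -5
awk 'BEGIN { printf "%+.1f\n", 2.5 }'        # +2.5

# positional specifiers: N$ selects the Nth item, and items can repeat
if command -v gawk >/dev/null 2>&1; then
    gawk 'BEGIN { printf "%2$s %1$s %2$s\n", "first", "second" }'    # second first second
fi
```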
There's a lot to talk about here, and there's a lot more to go through: just the section
on printf in the documentation is several printed pages long. So I've scratched the
surface. Hopefully I've given you just a little bit of a taste of what it can do.
And hopefully I didn't confuse you too much, because sometimes hearing these things in
audio form rather than written is not the clearest thing to understand. But
I encourage you to look both at the GNU documentation and also at the Grymoire's
documentation on sed and awk. So if you go to www.grymoire.com
and then slash Unix slash Awk.html, he or she, I don't know who this is,
does a lot of explanation of how to use awk, and it has lots of different examples.
That was the first thing that I really used to learn awk.
I suggest also, you know, if you're really interested in learning, not just this,
but learning anything, go ahead and try it, and start with some type of use case
for what you need to do. So if you have a CSV file, or an Excel file
you can convert to CSV, and there's something that you normally just do in Excel, where you
select all the lines and do a sum or do a pivot table, try to use one of these tools instead.
You might ask yourself at the beginning, well, why would you want to do that?
After you've done it, and you see the power that you can get out of it without ever
having to open up a program that takes, you know, lots of time to open up and is
not easily repeatable, versus making a script that you can repeat and run on
huge sample sets, you'll get the value. Sometimes I work with data that's
either too big to fit in Excel or LibreOffice Calc, which is what I use most of the time,
or if it is going to fit in there, it's going to be crashy or slow. And
awk just flies through that data. So definitely check it out. Also,
if there's a book I could recommend, there's a book called Data Science at the Command Line.
I'll put that in the show notes as well. That also has some examples of awk and
a whole bunch of other command line utilities that you can use to process data really quickly.
All right, so that's it for this episode of Hacker Public Radio. And as we like to say
here in b-yeezi land, keep hacking.
You've been listening to Hacker Public Radio at Hacker Public Radio dot org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever thought of recording a podcast, then click on our contribute link to find out
how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the
Infonomicon Computer Club, and it's part of the binary revolution at binrev.com. If you have
comments on today's show, please email the host directly, leave a comment on the website
or record a follow-up episode yourself. Unless otherwise stated, today's show is released
under a Creative Commons, Attribution, ShareAlike 3.0 license.