Episode: 2114
Title: HPR2114: Gnu Awk - Part 1
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2114/hpr2114.mp3
Transcribed: 2025-10-18 14:30:26

---

This is HPR episode 2114, entitled "Gnu Awk - Part 1", and is part of the series Bash Scripting. It is hosted by b-yeezi and is about 23 minutes long. The summary is: an introduction to the awk text parsing tool.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Welcome, Hacker Public Radio fans, this is b-yeezi once again. This time I'm going to do a series of tutorials, you can call them, working in collaboration with the famous Dave Morriss, which makes me really excited. He's allowing me to do the intro, and he'll intro himself as well as go into a deep dive as we proceed, but we are going to be doing a little tutorial on awk. In particular I'm going to be focusing on GNU Awk, which is very similar to the original Unix version and has some additional features. I don't know if we're going to go into the differences between awk and GNU Awk, but we are going to at least start with some of the basics right now. So without any further ado, here's awk.

So from its man page: gawk is the GNU Project's implementation of the AWK programming language. It conforms to the definition of the language in the POSIX 1003.1 standard. This version is in turn based on the description in The AWK Programming Language by Aho, Kernighan, and Weinberger. Gawk provides the additional features found in the current version of Brian Kernighan's awk and a number of GNU-specific extensions.

So that's the beginning of the description of GNU Awk in the man page. Awk is a powerful text parsing tool, to be specific, and like the description says, it is its own language.

Now Dave especially, but also myself, we're going to go into how to put the awk language inside of a text file, a .awk file if you will. But I'm going to start off with some basic commands to get our feet wet with awk, because you can just do it on the command line with simple inline coding. I use this tool all the time, both inline and in files. The good thing about putting it in files is that it's easy to go back to and run the same command over and over again on different files. But inline is really handy if you don't feel like opening up, or you don't want to open up, or you can't open up a tool like LibreOffice to parse CSV files, or if you just have some really complex stuff that you might be pulling in from a pipe, from like a sed command or a wget command where you're getting stuff off the internet, and you want to parse it in real time and put it into a file or pass it into another tool that's going to do more processing on it later.

So I'm going to try to see if I can get a file uploaded, but if not, I have example files right in the show notes. So all you have to do really is just copy and paste the example files right from inside the show notes and put them into a text file, and you should be on your way.

So the basic syntax of awk is: awk, and then some options, and then inside of single quotes a pattern, and, still inside the single quotes, inside of curly brackets, actions, before you end the single quotes, and then the file that you want to do that to, or the group of files that you want to do that to. So it kind of sounds hard, but it really is pretty simple to get started. You're just going to do awk, dash something, a pattern to search for (but the pattern is optional), and then the action that you're going to do, and then file.txt, .tsv, whatever you're working with.

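As a minimal sketch of the command shape just described, assuming a hypothetical two-line file called sample.txt (it is not one of the episode's example files), a pattern followed by an action in curly braces looks roughly like this:

```shell
# General shape: awk [options] 'pattern { action }' file ...
# sample.txt is a made-up file used only for this illustration.
printf 'alpha 1\nbeta 2\n' > sample.txt

# Pattern: first field equals "beta". Action: print the second field.
awk '$1 == "beta" { print $2 }' sample.txt
# prints: 2
```

Leaving the pattern out would run the action on every line of the file.
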
So for example purposes I created a file called file1.txt and a companion file with all the same data, file1.csv. The difference between the two is that one is space delimited, or rather whitespace delimited, and the other one is comma delimited. "Delimited" means the way that you're going to separate the different fields in the file. So a comma-separated file, CSV, means that your delimiter, the limit of each column, is the comma. In a whitespace-delimited one, the fields are separated by any whitespace, and that's the default in awk: it's going to parse whatever text string it's looking at by the whitespace, and it's going to put it into columns that way.

So if you look at the file that I have in file1.txt, the first row is the headers: name, color, and amount. Under name I have a bunch of different fruit: apple, banana, strawberry, grape, apple again, plum, kiwi, potato (I guess that's not a fruit), and pineapple. In the next column over I have different colors: red for apple, yellow for banana, strawberry red, grape purple, and then for that second apple I have green in this column, so we have a green apple and a red apple. Then for the plum I have purple, then brown for kiwi, brown for potato, and yellow for pineapple. And then in my third column I have the amount of each one of those items, so I have four apples, six bananas, three strawberries, 10 grapes, eight green apples, two plums, four kiwis, nine potatoes, and five pineapples.

Now this is going to be a cool file, because we're going to be able to do a lot of things with it, and in later episodes we're going to be able to do arithmetic on these and do some aggregate functions on it. But for now we're going to do something really simple. We're going to just do the command awk, and then inside of single quotes you also put curly brackets. So: single quote, open curly bracket, print dollar sign two, close curly bracket, second single quote, space, file1.txt. So what that is, is: awk, print column 2 of file1.txt. Like I said before, the actions go inside the curly braces. Since we didn't have anything before the curly braces, there was no pattern to match, so it's just going to look in the entire file, and it's going to look in that second column. And since I didn't give it any way to delimit the file other than its default, it's going to use whitespace. In my example file I lined up the whitespace so that it looks nice, but awk doesn't care about that; it will just parse on whitespace no matter what. So whether it's one space or ten spaces in one column, or three spaces in another column and 25 spaces in another column, it doesn't care. It's going to parse them all the same and put them all into even columns, starting on the first non-whitespace character.

So a couple of things that you can see: it's kind of intuitive that it starts with 1; it doesn't start with 0 like other programming languages. So print dollar sign one is going to print the first column, and print dollar sign two is going to print the second column. So in this file example, if I say print dollar sign two, I'm going to print out all the colors. It's going to first put out the header row, color, and then it's going to say red, yellow, red, purple, green, purple, brown, brown, yellow. One special column number is 0: if you do dollar sign 0, it's going to print all the columns. So that's just something to know.

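The file and the two print commands described above can be sketched as follows. The contents of file1.txt are reconstructed from the spoken description, so the exact spacing is an assumption; the real file is in the show notes.

```shell
# file1.txt as described in the episode (column alignment is assumed).
cat > file1.txt <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# No pattern, so the action runs on every line: print the second column.
# Output: color, red, yellow, red, purple, green, purple, brown, brown,
# yellow (one per line).
awk '{ print $2 }' file1.txt

# $0 is the whole input line, so this prints the file unchanged.
awk '{ print $0 }' file1.txt
```

The uneven runs of spaces make no difference here, because the default field separator treats any amount of whitespace as a single break between columns.
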
So going back to our example, I'm going to add to it a little bit. I'm going to say awk, and now inside the first single quote I want to say dollar sign two, equals equals, and then in double quotes yellow, and then you can put a space or not, open the curly bracket, print dollar sign one, close the curly bracket, close the single quote, file1.txt. What this is doing: since we now have something before the curly brackets, before our action, we have our pattern, and our pattern is dollar sign two equals equals yellow. So look in the second column for the word yellow, and print column 1 of file1.txt. If you remember the file, we had bananas and pineapples; I have both of those in there as yellow. So it's going to just print out banana and pineapple. It's going to skip the header row, because the header row didn't have the term yellow in it. That's one thing to understand about awk: it's not going to automatically print the headers unless you tell it to, and we'll talk about that a little bit later in another episode.

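Written out, the pattern-plus-action command just described looks roughly like this. The file contents are again reconstructed from the spoken description and should be treated as an assumption.

```shell
# file1.txt as described in the episode (column alignment is assumed).
cat > file1.txt <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Pattern: column 2 is exactly "yellow". Action: print column 1.
awk '$2 == "yellow" { print $1 }' file1.txt
# prints:
# banana
# pineapple
```

The header line is skipped only because its second field is "color", not "yellow"; awk gives it no special treatment.
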
Now, right now we've been working with this file that is space separated, which has a lot of uses, especially on the command line when you're going to pipe other commands into it. You might want to do ls dash l and then pipe that into awk, and then you can separate by the columns that way. That's fine, but a lot of times when you're working with data, you're going to be working with either tab-separated files or comma-separated files. (I like to do pipe separated a lot of times, because then you don't have to worry about double quotes around the text fields to get around commas inside of a text.) So if you're not using a plain whitespace-separated file, you might want to use a different field separator. There are different ways to do field separation in awk. I'm going to go over the most apparent, which is using an option, the dash capital F option. The character or characters that follow dash capital F are your separator. So if you just do dash capital F comma, that's going to tell awk to use commas for the separator. You don't need to put a space between them, and you don't need any other characters: if you just put dash capital F comma, it's going to do that. If you do dash capital F period, it's going to do a dot-separated file. However, sometimes you might want to do more complicated field separators that are more than one character. In that case you want to put your field separator inside of double quotes. You might see that sometimes in other people's examples: when they are just using commas, they'll do dash capital F, double quote, comma, double quote, with no spaces in between. That's going to do the same thing as dash capital F comma.

So I have a similar file called file1.csv, which is the same exact file but taking out the spaces and putting a comma in between. And if we run the same command, this time awk, dash capital F and inside of double quotes comma, space, inside of single quotes dollar sign two equals equals, inside of double quotes, yellow, space, inside of curly brackets print dollar sign one, end the single quotes, file1.csv, it's going to give us the same exact output as if we were doing the whitespace-delimited one without the dash capital F option, which is banana and pineapple.

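A short sketch of the dash capital F option as described above. The contents of file1.csv are reconstructed from the episode's description of file1.txt with commas in place of the spaces, which is an assumption about the real file.

```shell
# file1.csv: the same data as file1.txt, comma separated (reconstructed).
cat > file1.csv <<'EOF'
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
EOF

# Quoted separator, as spoken in the episode.
awk -F"," '$2 == "yellow" { print $1 }' file1.csv
# prints:
# banana
# pineapple

# Unquoted form; for a single comma the shell passes the same argument.
awk -F, '$2 == "yellow" { print $1 }' file1.csv
```

The double quotes are consumed by the shell, so awk receives the same separator in both commands; quoting starts to matter once the separator contains spaces or characters the shell would interpret.
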
Inside of those patterns you can also use regular expressions. I have an example here that's awk, inside of single quotes, dollar sign two and tilde, which on a US keyboard layout is the key right above the tab if you hit shift. So tilde, space, and then inside of forward slashes. Awk, for regular expressions, kind of like Perl, likes the tilde to say this is going to be a regular expression, and inside of forward slashes goes the expression that you want to evaluate. I'm not going to go into regular expressions, that's a whole other topic, but in this example I'm doing p dot plus p, so I'm looking for a p, any one or more characters in between, and then another p. After that, inside of curly brackets, our action is print dollar sign zero, then close single quote, file1.txt. So I'm looking in column two for any words that have the pattern of p, anything in between, p. It returns the entire line of grape purple 10, because purple, which is in column two, has two letters in between the first p and the second p, and then also plum, whose second column is also purple. So it's matching purple in both cases.

Numbers can be evaluated in the pattern as well, and it does this kind of intuitively. In our example we have numbers in our third column, so if I say awk, dollar sign three greater than five, and then inside of our action print dollar sign one, comma, space, dollar sign two, close the action, close the single quote, file1.txt, I'm going to print both the first and the second column if the value in the third column is greater than five. It's a good idea to go look at that example, but it's pretty intuitive: you're going to say if column three is greater than five, print column one and column two. I'm sure you can see applications for this if you ever have to work with data that you have to manipulate. So continuing along with this, the output you're going to find is banana, grape, apple, and potato, because those are all the ones that had values that were higher than five in our example file. You could also take that and redirect the output into a file by doing that same exact thing with a redirect at the end of the awk command.

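The regular-expression and numeric patterns described above can be sketched like this, again assuming the reconstructed file1.txt (its spacing is a guess; the real file is in the show notes).

```shell
# file1.txt as described in the episode (column alignment is assumed).
cat > file1.txt <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Regex match on column 2: a "p", one or more characters, another "p".
awk '$2 ~ /p.+p/ { print $0 }' file1.txt
# prints:
# grape      purple 10
# plum       purple 2

# Numeric comparison on column 3. The header line is printed too: its
# third field, "amount", is not a number, so awk compares it to 5 as a
# string, and "amount" sorts after "5".
awk '$3 > 5 { print $1, $2 }' file1.txt
# prints:
# name color
# banana yellow
# grape purple
# apple green
# potato brown
```

The comma between $1 and $2 in the print statement joins the two fields with a single space in the output, regardless of the spacing in the input.
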
So for this example, I just want to show it doesn't matter, because it's still going to print it out space delimited: awk, dash capital F comma, inside of the single quotes dollar sign three greater than five, inside of our action curly braces print dollar sign one, comma, space, dollar sign two, end the action, file1.csv, then greater-than sign, output.txt. It's going to put name color in the first line, then banana yellow, grape purple, apple green, potato brown, in a file called output.txt. So that's a nice way to be able to filter out things that you want from a file and put them into another file.

And here's a cool trick that I learned from one of the references that I give at the end of the episode. If you do this command: awk, and inside of the single quotes, inside the curly braces, print, greater-than sign, dollar sign two, and then right next to the dollar sign two, inside of double quotes, dot txt, close the curly brace, close the single quote, file1.txt. (I recommend for any of these episodes that we're going to be doing in the series that, if you really want to follow along and you don't want to just listen to our lovely voices, you probably get out the show notes, because it's really helpful.) But anyway, in that awk command we're actually doing a redirect inside of our print statement. That's what that print greater-than sign means: we're doing a redirect inside of our print statement. It's dollar sign two dot txt, so we're looking at column two, and whatever is in there, all matching ones are going to go into their own file.

I'm not explaining this very well, so I'll do it again. Print, greater-than sign, dollar sign two, and then in double quotes dot txt, file1.txt, is going to create a group of files: one yellow.txt, one red.txt, one color.txt, one brown.txt, one green.txt, because those are all the different things that you can find in that second column.

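Both redirects can be sketched as follows, assuming the reconstructed example data. The parentheses around the output file name are an addition for portability and are not part of the command as spoken; some awk implementations parse the unparenthesized form differently.

```shell
# file1.txt as described in the episode (column alignment is assumed).
cat > file1.txt <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Derive the comma-separated companion file from the same data.
awk '{ print $1 "," $2 "," $3 }' file1.txt > file1.csv

# Shell redirect: the filtered, space-joined output goes to output.txt.
awk -F, '$3 > 5 { print $1, $2 }' file1.csv > output.txt

# Redirect inside print: each whole line is written to a file named
# after its second column (red.txt, yellow.txt, purple.txt, ...).
# The header line ends up in color.txt.
awk '{ print > ($2 ".txt") }' file1.txt

cat yellow.txt
# prints:
# banana     yellow 6
# pineapple  yellow 5
```

With this data the second command also produces purple.txt, since purple appears in the second column for grape and plum.
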
And in my example it's going to print out all the data, all the columns that are in there, and it's going to go into their own files. So it's a really quick way to take a whole bunch of data that might be all intermingled and separate it all into individual files of like information. If you were going to do this in Excel, you'd have to do a filter, uncheck the boxes that you don't want, pick the only one that you do want, highlight all those, copy it, paste it into another file, and save that file, and then do the same thing for the next option in your filter, and the next option in your filter, and the next option in your filter. This, in one command, automatically makes a whole series of files based off of the pattern that you're matching. It's really cool, I mean, at least to me. Maybe I'm just a dork, that's fine. But that's some of the commands that you can do.

Now one other thing I'm going to introduce, but I'm not going to go into right now, is that sometimes with awk you can get really complicated in how you set up how you're going to parse the file. If you want to do some pre-processing, and then do some more processing on it, and then do some counts and some sums and some division and all that kind of stuff, it's going to get really cumbersome on the command line. So you're going to want to put all that in a file, and a lot of times the convention is that it'll be the file name dot awk. Then to get access to it you'll do awk, dash lowercase f, file name dot awk, and then file1.txt. I'm pretty sure that for the remainder of our episodes we're going to be using files, because as we get more advanced in awk, it really does, like I said, get cumbersome to deal with awk on the command line when you have, you know, 15 lines of commands that you want to put in. So that's the introduction.

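A minimal sketch of the dash lowercase f usage mentioned above. The program file name yellow.awk and its contents are hypothetical, chosen only to reuse the earlier yellow example; the episode does not name a specific program file.

```shell
# file1.txt as described in the episode (column alignment is assumed).
cat > file1.txt <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# yellow.awk is a hypothetical program file: the same pattern and action
# as before, but without the surrounding single quotes.
cat > yellow.awk <<'EOF'
# Print the name of every yellow item.
$2 == "yellow" { print $1 }
EOF

# -f (lowercase) reads the awk program from a file.
awk -f yellow.awk file1.txt
# prints:
# banana
# pineapple
```

Because the program lives in its own file, the same command can be rerun against other data files without retyping or re-quoting anything.
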
I'm excited to get into this series with Dave. Hopefully we are able to enlighten some people and teach some new things, and hopefully I'll learn a couple of new things as we go. I've already learned this new technique of separating things into individual files based on the match, so it's pretty cool.

I also have a couple of resources that I found online to help. I don't know if anyone knows about linux.die.net: linux.die.net/man is like the man page for everything in Linux, so linux.die.net/man/1/awk is the man 1 page of awk. Another really cool tutorial, and I'll be doing some of my examples following this, is from www.theunixschool.com, and then some other ones are from Tecmint.

Upcoming in our series, we will be talking about more of the other options besides dash lowercase f and dash capital F. We will also be talking about some of the built-in variables that are in awk, and we will do some arithmetic operations and some fancy text manipulation, as much as we can without going into sed, and going over the awk language and its syntax. Once again, thank you for listening to Hacker Public Radio. This is b-yeezi signing out.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club, and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.