Episode: 1501
Title: HPR1501: AWK
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1501/hpr1501.mp3
Transcribed: 2025-10-18 04:19:04
---
MUSIC
This is Leandier for Hacker Public Radio, recording Thursday, April 10, 2014: AWK.
I'm recording today in the car on my drive home from work, so I apologize for the
recording quality. Today's show is in response to a request from Ken Fallon for
a show explaining the AWK text processing language. Now in my previous episode
for Hacker Public Radio, I had used AWK to help in translating some audio into
Morse code. Now if that doesn't really sound like a job for a text processing language,
you'd be right. Even though AWK is traditionally looked at only for processing text,
it really is a Turing-complete, general-purpose programming language. In many ways,
it has a lot of similarity to JavaScript. And in fact, they do sort of have a common lineage:
JavaScript was heavily influenced by Perl, which was in turn heavily influenced by AWK.
So I'm going to go ahead and give a little bit of history on AWK. AWK was developed by
three individuals working at Bell Labs during the early Unix days. That was Brian Kernighan,
Peter Weinberger, and Alfred Aho. Now Brian Kernighan, many of you will recognize as
co-author, with Dennis Ritchie, of The C Programming Language.
Now while Weinberger and Aho will be less familiar to some of you, those with a background in
text-based search, or some other algorithmic fields, might recognize Aho's name from the
Aho-Corasick string search algorithm, which, with a little pre-processing,
can very efficiently search for a large number of strings in a stream of input in a single pass.
The basic structure of an AWK program is written as a list of rules.
Each rule is made up of an optional pattern and an optional action.
When AWK starts up, it loads whatever the program is, then iterates through
each file provided on the command line, or standard input if no file is given.
And then for each file,
it runs any rules with the BEGIN pattern.
Now excuse me, it only runs the BEGIN rules once, after loading the program.
Then for each file, it iterates through each record, which is by default a line, but can be
any content separated by a given regular expression, which goes in the special record separator variable, RS.
So for each line, AWK goes through each normal rule in the file,
checks to see if the pattern matches, and if it does, it runs the corresponding action.
Then after all files are processed, it runs any END pattern rules.
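As a minimal sketch of that flow, here is a program with one BEGIN rule, one ordinary rule, and one END rule, wrapped in a shell function so it can be run at a prompt. The function name count_errors and the sample input are made up for illustration.

```shell
# Hypothetical helper: report input lines that contain the word "error".
count_errors() {
  awk '
    BEGIN   { print "scanning" }                 # runs once, before any input is read
    /error/ { n++; print NR ": " $0 }            # runs for every record matching the regex
    END     { printf "%d matching lines\n", n }  # runs once, after all input is consumed
  '
}

printf 'ok\ndisk error\nok\nnet error\n' | count_errors
# scanning
# 2: disk error
# 4: net error
# 2 matching lines
```

No file name is given to awk here, so it reads standard input, as described above.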
Now for normal patterns, there are a few different kinds.
The most common is a single regular expression enclosed in slashes,
and if the line matches the regular expression,
AWK will run the action.
Patterns can also be any Boolean expression,
or any string or numeric expression, which will be evaluated in a Boolean context,
which I'll get to a little later,
or a range pattern, which I believe can only be regular expressions, and that will run
from a line matching the first expression until a line matching the second expression.
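The three kinds of pattern can be sketched side by side. This is only an illustration: the function name show_patterns, the START and STOP markers, and the sample input are all assumptions.

```shell
# Hypothetical helper: label each input line by the kind of pattern it matched.
show_patterns() {
  awk '
    /^#/              { print "comment: " $0 }   # regular expression pattern
    NF > 2            { print "long:    " $0 }   # Boolean expression pattern (more than 2 fields)
    /^START/, /^STOP/ { print "range:   " $0 }   # range pattern, including both end lines
  '
}

printf '# note\nSTART\na b c\nSTOP\nafter\n' | show_patterns
# comment: # note
# range:   START
# long:    a b c
# range:   a b c
# range:   STOP
```

A line can match more than one rule, which is why "a b c" is printed twice: every rule is tried against every record, in order.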
Now within an action, the syntax is really very similar to the C programming language.
That's actually one of the reasons I like AWK so much.
I like my languages bracy. But unlike C, AWK has a few special capabilities
that really make it nice as a kind of quick one-liner or prototyping kind of language.
And the main feature, the main difference between AWK and C,
is that AWK is essentially very loosely typed. There are only two types in AWK.
There is an array type, which is essentially an associative array or hash table,
and then there are scalar types, which are essentially stored as strings,
but depending on the context in which they are evaluated,
they can represent a string, a number, or a Boolean.
Now a string context is essentially any operation that expects a string:
functions that operate on a string, string concatenation, and so on.
And since that's the internal representation of the variables, that works just fine.
Now a numeric context is anything that expects a number.
Like, once again, functions that take a number,
printf format specifiers that expect a number,
and mathematical operators. There are actually no bitwise operators in AWK.
A few AWK variants do provide some extensions that give bitwise functions,
but I'm going to avoid those and try to stick to the AWK specified by POSIX,
and I'll provide a link to their specification in the show notes.
While that description is somewhat technical, it really does give a very good
and very precise description of how AWK is supposed to function.
So any string evaluated in a numeric context will try to parse
a base 10, or decimal, number from the beginning of the string.
If that fails, it will essentially be evaluated as not a number,
otherwise it will be converted to a floating-point representation.
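A small sketch of the same scalar being used in string and numeric contexts, plus the associative array type. The function name coercion_demo and the variable names are invented for the example.

```shell
# Hypothetical helper: show one value used in different contexts.
coercion_demo() {
  awk 'BEGIN {
    x = "12abc"            # assigned as a string
    print x + 1            # numeric context: the leading "12" is parsed, prints 13
    print x " items"       # string context: concatenation, prints "12abc items"
    y = 3                  # assigned as a number
    print length(y "4")    # string context: y becomes "3", "34" has length 2
    count["apple"]++       # arrays are associative: any string can be a key
    count["apple"]++
    print count["apple"]   # prints 2
  }'
}

coercion_demo
```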
Now a Boolean context is anything where a Boolean is expected:
conditions in a branch or a loop,
the expression for a pattern,
or anything used with the Boolean operators.
Now in that context, any uninitialized variable is considered false.
Any variable that has been assigned a value is essentially
treated first in a numeric context,
in which case zero is considered false and anything else is considered true,
and if that fails, it's considered in a string context, in which case the empty string is false
and all other strings are true.
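A rough illustration of those rules for a handful of values. The function name truthiness and the variable names are assumptions, and never_set is deliberately left unassigned.

```shell
# Hypothetical helper: print how a few values behave in a Boolean context.
truthiness() {
  awk 'BEGIN {
    zero = 0; five = 5; empty = ""; word = "abc"
    print "never_set: " (never_set ? "true" : "false")   # uninitialized: false
    print "zero:      " (zero      ? "true" : "false")   # numeric zero: false
    print "five:      " (five      ? "true" : "false")   # non-zero number: true
    print "empty:     " (empty     ? "true" : "false")   # empty string: false
    print "word:      " (word      ? "true" : "false")   # non-empty string: true
  }'
}

truthiness
```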
Standard AWK provides a very small set of built-in functions.
There are some to do regular expression-based substitution on strings.
There's one for splitting a string into an array.
There are various formatted printing functions
and a handful of mathematical functions:
square root, trigonometric functions, and so on.
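A short sketch touching one function from each of those groups: gsub, split, printf, and sqrt. The function name builtins_demo and the sample date string are made up for the example.

```shell
# Hypothetical helper: exercise a few of the standard built-in functions.
builtins_demo() {
  awk 'BEGIN {
    s = "2014-04-10"
    gsub(/-/, "/", s)            # regex substitution in place: s is now "2014/04/10"
    n = split(s, part, "/")      # fills part[1], part[2], part[3] and returns 3
    printf "%s has %d parts, year %s\n", s, n, part[1]
    printf "sqrt(16) = %d\n", sqrt(16)
  }'
}

builtins_demo
# 2014/04/10 has 3 parts, year 2014
# sqrt(16) = 4
```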
Now AWK is rather famous for appearing in extremely
terse, difficult-to-read one-liners entered directly at the command prompt.
But I would argue that the best way to learn AWK is actually to treat it like any other
programming language and have a nicely formatted AWK program in a file,
where you've got your rules all laid out and well-indented.
Not only does that make it easier to read and maintain your AWK program,
but in the GNU AWK package, known as gawk,
they provide dgawk, which is an interactive AWK debugger.
Now it's somewhat limited compared to more featureful debuggers like GDB,
but it does allow you to set breakpoints, step through an AWK program,
display the contents of line fields and variables,
and it really has proven quite useful to me in working on some more complicated programs.
But once again, line-based breakpoints are essentially useless if your program is all on one line,
so again, the need for readable formatting.
Another resource that may be of use if anyone is trying to learn AWK
would be the book The AWK Programming Language, written by the authors of the language.
Occasionally you can find this book available for free download as a PDF online;
otherwise I'm sure various bookstores will carry it,
Amazon, and so forth. This book, while ostensibly specific to AWK,
comes up time and time again in recommendations for general programming books.
And that's due to the fact that the book is written not just as a reference to the language,
although it does provide one, but it gives examples of real-world problems
that can be addressed with programming and explains how those problems can be solved using AWK,
and really tackles some issues of programming as a whole: modularity, code reuse, and so forth.
So I've skimmed this book, I've culled it for ideas,
and I think I've seen enough that I can give it a recommendation: if you're interested
in the AWK language, this is definitely a book worth reading.
And just as an example of some of the things that can be done with AWK:
running formulas over tables of data, similar to things you might do in a spreadsheet program;
calculating relational joins between files, so essentially taking flat files and treating them
like database tables; and, in a somewhat more personal example, using it for processing
Morse code audio. So if that piqued your interest, I would definitely encourage you to check out
this language. I've found it quite useful, and I hope you will find the same. Enjoy.
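As one possible sketch of the spreadsheet-style and join-style uses mentioned above, the following treats two flat files as tables, joins them on the first column, and sums a numeric column per key. The function name join_totals, the file names, and the sample data are all invented for illustration.

```shell
# Hypothetical helper: join_totals NAMES_FILE ORDERS_FILE
# NAMES_FILE lines are "id name"; ORDERS_FILE lines are "id amount".
join_totals() {
  awk '
    NR == FNR { name[$1] = $2; next }   # first file only: remember id -> name
              { total[$1] += $2 }       # second file: add up amounts per id
    END {
      for (id in total) printf "%s %s %d\n", id, name[id], total[id]
    }
  ' "$1" "$2" | sort                    # for-in order is unspecified, so sort the result
}

dir=$(mktemp -d)
printf '1 alice\n2 bob\n' > "$dir/names.txt"
printf '1 10\n2 5\n1 7\n' > "$dir/orders.txt"
join_totals "$dir/names.txt" "$dir/orders.txt"
# 1 alice 17
# 2 bob 5
rm -r "$dir"
```

The NR == FNR test is true only while the first file is being read, which is how one program can treat the two files differently.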
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon
Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are
proudly sponsored by Lunarpages. From shared hosting to custom private clouds,
go to lunarpages.com for all your hosting needs. Unless otherwise stated, today's show is
released under a Creative Commons Attribution-ShareAlike 3.0 license.