Episode: 1501
Title: HPR1501: AWK
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1501/hpr1501.mp3
Transcribed: 2025-10-18 04:19:04

---

MUSIC
This is Leandier for Hacker Public Radio, recording Thursday, April 10, 2014. AWK.
I'm recording today in the car on my drive home from work, so I apologize for the
recording quality. Today's show is in response to a request from Ken Fallon for
a show explaining the AWK text processing language. Now in my previous episode
for Hacker Public Radio, I had used AWK to help in translating some audio into
Morse code. Now if that doesn't really sound like a job for a text processing language,
you'd be right. Even though AWK is traditionally looked at only for processing text,
it really is a Turing-complete, general-purpose programming language. In many ways,
it has a lot of similarity to JavaScript. And in fact, they do sort of have a common lineage:
JavaScript was heavily influenced by Perl, which was in turn heavily influenced by AWK.
So I'm going to go ahead and give a little bit of history on AWK. AWK was developed by
three individuals working at Bell Labs during the early Unix days: Brian Kernighan,
Peter Weinberger, and Alfred Aho. Now Brian Kernighan, many of you will recognize as
co-author, with Dennis Ritchie, of The C Programming Language.
Now while Weinberger and Aho will be less familiar to some of you, those with a background in
text-based search, or some other algorithmic fields, might recognize Aho's name from the
Aho-Corasick string search algorithm, which, with a little preprocessing,
can very efficiently search for a large number of strings in a stream of input in a single pass.
The basic structure of an AWK program is written as a list of rules.
Each rule is made up of an optional pattern and an optional action.
When AWK starts up, it loads whatever the program is and runs any rules with the
BEGIN pattern. It only runs the BEGIN rules once, after loading the program and before reading any input.
Then it iterates through each file provided on the command line, or standard input if no file is given.
For each file, it iterates through each record, which is by default a line, but can be
any content delimited by a given separator, which goes in the special record separator variable, RS
(POSIX specifies a single character there; some implementations, such as gawk, also accept a regular expression).
So for each line, AWK goes through each normal rule in the program,
checks to see if the pattern matches, and if it does, it runs the corresponding action.
Then after all files are processed, it runs any END pattern rules.
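
The flow just described can be sketched as a small program. This is only an illustration, assuming a POSIX shell with awk on the PATH; the function name count_errors and the "error" pattern are hypothetical and not from the episode.

```sh
# Hypothetical helper: count lines containing "error" in the given files,
# or in standard input when no file is named.
count_errors() {
  awk '
    # BEGIN rules run once, before any input is read.
    BEGIN { count = 0 }

    # A normal rule: the action runs for every record the pattern matches.
    /error/ { count++ }

    # END rules run once, after all input has been processed.
    # NR holds the total number of records read.
    END { print count, "of", NR, "lines matched" }
  ' "$@"
}
```

For example, `printf 'ok\nerror one\nfine\nerror two\n' | count_errors` prints `2 of 4 lines matched`.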
Now for normal patterns, there are a few different kinds.
The most common is a single regular expression enclosed in slashes,
and if the line matches the regular expression,
AWK will run the action.
A pattern can also be any expression,
string or numeric, which will be evaluated in a Boolean context,
which I'll get to a little later,
or a range pattern, which is two patterns separated by a comma, commonly regular expressions, and that will run
from a line matching the first pattern through a line matching the second.
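
The three kinds of pattern might look like this side by side. It is a minimal sketch, assuming a POSIX shell and awk; the function name show_patterns, the TODO marker, the 20-character threshold, and the BEGIN_BLOCK/END_BLOCK markers are all made up for illustration.

```sh
# Hypothetical helper: label each input line by the pattern that matched it.
show_patterns() {
  awk '
    # Regular expression pattern: matches records containing "TODO".
    /TODO/ { print "regex: " $0 }

    # Expression pattern: true for records longer than 20 characters.
    length($0) > 20 { print "long: " $0 }

    # Range pattern: from a record matching the first pattern through
    # the next record matching the second, inclusive.
    /^BEGIN_BLOCK$/, /^END_BLOCK$/ { print "range: " $0 }
  ' "$@"
}
```

A record can match more than one rule, in which case each matching action runs in program order.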
Now within an action, the syntax is really very similar to the C programming language.
That's actually one of the reasons I like AWK so much.
I like my languages bracey. But unlike C, AWK has a few special capabilities
that really make it nice as a kind of quick one-liner or prototyping kind of language.
And the main difference between AWK and C
is that AWK is essentially very loosely typed. There are only two types in AWK.
There is an array type, which is essentially an associative array or hash table,
and then there are scalar types, which are essentially stored as strings,
but depending on the context in which they are evaluated,
can represent a string, a number, or a Boolean.
Now a string context is essentially any operation that expects a string:
a function that operates on a string, a string concatenation, and so on.
And since that's the internal representation of the variables, that works just fine.
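
As a rough illustration of the array type and of a scalar being used in a string context, here is a word counter. It assumes a POSIX shell, awk, and sort; the function name count_words is hypothetical.

```sh
# Hypothetical helper: count how often each whitespace-separated word occurs.
count_words() {
  awk '
    {
      # Arrays are associative: any string can serve as a key, and
      # elements come into existence on first use.
      for (i = 1; i <= NF; i++)
        seen[$i]++
    }
    END {
      # Placing values side by side concatenates them as strings,
      # so the numeric count is converted to text here.
      for (word in seen)
        print word "=" seen[word]
    }
  ' "$@" | sort
}
```

The output is piped through sort because the order in which `for (word in seen)` visits keys is unspecified.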
Now a numeric context is anything that expects a number.
Once again, that's functions that take a number,
printf format specifiers that expect a number,
and mathematical operators. There are actually no bitwise operators in AWK.
A few AWK variants do provide some extensions that give bitwise functions,
but I'm going to avoid those and try to stick to the AWK specified by POSIX,
and I'll provide a link to their specification in the show notes.
While that description is somewhat technical, it really does give a very good
and very precise description of how AWK is supposed to function.
So any string evaluated in a numeric context will try to parse
a base 10, or decimal, number from the beginning of the string.
If that fails, it will essentially be evaluated as zero;
otherwise it will be converted to a floating-point
representation.
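
The string-to-number conversion can be observed by forcing a numeric context. This is a small sketch, assuming a POSIX shell and awk; the function name to_number is hypothetical.

```sh
# Hypothetical helper: print the numeric value awk derives from each input line.
to_number() {
  awk '
    # Adding zero forces the whole record into a numeric context.
    # A leading decimal number is parsed; anything else yields 0.
    { print $0 + 0 }
  '
}
```

For example, `printf '42\n3.5kg\nabc\n12 monkeys\n' | to_number` prints 42, 3.5, 0, and 12 on separate lines.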
Now a Boolean context is anything where a Boolean is expected:
the condition in a branch or a loop,
the expression for a pattern,
or an operand of any of the Boolean operators.
Now in that context, any uninitialized variable is considered false.
Any variable that has been assigned a value is essentially
treated in a numeric context if it holds a number or numeric-looking input,
in which case a zero is considered false and anything else is considered true,
and otherwise it's considered in a string context, in which case the empty string is false
and all other strings are true.
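
These truth rules can be checked with a short program. It is only a sketch, assuming a POSIX shell and awk; the names truthiness, show, and nothing are hypothetical.

```sh
# Hypothetical helper: report how a few values behave in a Boolean context.
truthiness() {
  awk '
    # The conditional operator evaluates its first operand as a Boolean.
    function show(label, value) {
      print label ": " (value ? "true" : "false")
    }
    BEGIN {
      show("uninitialized", nothing)   # never assigned, so false
      show("number 0", 0)              # numeric zero is false
      show("number 2", 2)              # any other number is true
      show("empty string", "")         # the empty string is false
      show("string abc", "abc")        # any other string is true
    }
  '
}
```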
Standard AWK provides a very small set of built-in functions.
There are some to do regular expression-based substitution on strings.
There's one for splitting a string into an array.
There are various formatted printing functions
and a handful of mathematical functions:
square root, trigonometric functions, and so on.
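
A few of those built-ins, used together on comma-separated input, might look like the following. This is a sketch under assumptions: a POSIX shell and awk, a made-up function name format_record, and an input format invented for the example.

```sh
# Hypothetical helper: tidy a comma-separated line and report on it.
format_record() {
  awk '
    {
      line = $0
      # gsub replaces every match of the regular expression in place
      # and returns the number of replacements made.
      gsub(/[ \t]+/, " ", line)

      # split breaks a string into array elements numbered from 1
      # and returns how many pieces it produced.
      n = split(line, part, ",")

      # printf formats output; sqrt is one of the math functions.
      printf "%d fields, second = [%s], sqrt of first = %.2f\n",
             n, part[2], sqrt(part[1])
    }
  ' "$@"
}
```

For example, `printf '16,hello   world,x\n' | format_record` prints `3 fields, second = [hello world], sqrt of first = 4.00`.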
Now AWK is rather famous for appearing in extremely
terse, difficult-to-read one-liners entered directly at the command prompt.
But I would argue that the best way to get to learn AWK is actually to treat it like any other
programming language and have a nicely formatted AWK program in a file,
where you've got your rules all laid out and well-indented.
Not only does that make it easier to read and maintain your AWK program,
but in the
GNU AWK package, known as gawk,
they provide dgawk, which is an interactive AWK debugger.
Now it's somewhat limited compared to more featureful debuggers like GDB,
but it does allow you to set breakpoints, step through an AWK program,
display the contents of lines, fields, and variables,
and really has proven quite useful to me in working on some more complicated programs.
But once again, line-based breakpoints are essentially useless if your program is all on one line,
so again the need for readable formatting.
Another resource that may be of use if anyone is trying to learn AWK
would be the book The AWK Programming Language, written by the authors of the language.
Occasionally you can find this book available for free download as a PDF online;
otherwise I'm sure various bookstores will carry it,
Amazon, and so forth. This book, while ostensibly specific to AWK,
comes up time and time again in recommendations for general programming books.
And that's due to the fact that the book is written not just as a reference to the language,
although it does provide one, but it gives examples of real-world problems
that can be addressed with programming and explains how those problems can be solved using AWK,
and really tackles some issues of programming as a whole: modularity, code reuse, and so forth.
So I've skimmed this book, I've culled it for ideas,
but I think I've seen enough that I can give it a recommendation: if you're interested
in the AWK language, this is definitely a book worth reading.
And just as an example of some of the things that can be done with AWK:
running formulas over tables of data, similar to things you might do in a spreadsheet program;
calculating relational joins between files, so essentially taking flat files and treating them
like database tables; and, in a somewhat more personal example, using it for processing
Morse code audio. So if that piqued your interest, I would definitely encourage you to check out
this language. I've found it quite useful, and I hope you will find the same. Enjoy.
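
The relational join mentioned above can be sketched in a few lines. This is one common idiom rather than anything shown in the episode, and it assumes a POSIX shell and awk; the function name join_files and the two-column file layouts are hypothetical.

```sh
# Hypothetical helper: join two whitespace-separated files on their first field.
# $1: lookup file with "id name" lines; $2: data file with "id amount" lines.
join_files() {
  awk '
    # While reading the first file, NR (overall record number) equals
    # FNR (record number within the current file): remember each name.
    NR == FNR { name[$1] = $2; next }

    # For the second file, print rows whose key was seen in the first.
    $1 in name { print $1, name[$1], $2 }
  ' "$1" "$2"
}
```

The `NR == FNR` test assumes the first file is not empty; with an empty lookup file the first rule would also match every record of the second file.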
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon
Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are
crowd-sponsored by Lunar Pages. From shared hosting to custom private clouds,
go to lunarpages.com for all your hosting needs. Unless otherwise stated, today's show is
released under a Creative Commons Attribution-ShareAlike 3.0 license.