Episode: 4114
Title: HPR4114: Introduction to jq - part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4114/hpr4114.mp3
Transcribed: 2025-10-25 19:48:39
---
This is Hacker Public Radio Episode 4114 for Thursday 9 May 2024.
Today's show is entitled Introduction to jq - part 2.
It is hosted by Dave Morris and is about 25 minutes long.
It carries an explicit flag.
The summary is: options to jq; learning about filters.
Hello everybody, my name is Dave Morris, welcome to Hacker Public Radio.
This show is called An Introduction to jq, and it's part 2 of a series.
Today I'm going to be talking about the options to the jq command and
start talking about filters.
That's where most of the important stuff is when using jq: in the filters.
So in the last episode we looked at how JSON data is structured.
Fairly simple, but there are rules, of course.
And we saw how you could feed JSON data through jq and get reformatted output
which could be printed or displayed or whatever.
Now we're going to look at a few of the options to the jq command.
There's quite a large number of them, which you can find with man jq,
but I'm not going to cover them all because some of them are really quite obscure.
If I think they are necessary, I will talk about them when we get to that part of the subject.
jq is invoked in the usual sort of way for Unix commands: jq [options...] filter [files...].
jq is the command, and it can be followed by options.
Then it needs to be followed by a filter.
We've already seen the simplest filter in the last episode.
And then, optionally, it can be followed by the names of files that contain the JSON data.
It's quite usual to run it against a file, because that's a good way to capture stuff.
But you can also run jq against data coming down from the web
and being delivered on the STDIN channel, standard input.
We saw how to do that with curl last time.
In terms of options, there's obviously a --help option, which is useful for checking a summary of what's available.
Then -f followed by a filename, or --from-file followed by a filename,
will read the filter from a file.
And this is quite important.
It's a bit like the Awk series which I did with Mr. Young a few years ago,
where you can prepare quite a complex program, because that's basically what a filter is,
and put it in a file.
You can also put comments in there, which makes it easier to read and so forth.
Sometimes filters can be quite large,
so this is a great way of doing things.
Then you point jq at the file and on you go.
Note that this is a file containing the filter, as opposed to the files containing the data, of course.
Then you can use the option --compact-output,
or the alternative, -c (lowercase c).
Now, by default, as we already saw, jq pretty-prints its JSON output.
It lays it all out with lots of newlines in it, and it also colors it,
which I didn't show because I can't really do it justice in these notes.
But you can write it out in a more compact form,
which you could then store away after you've done some work on it:
some changes to it, perhaps, or additions.
You could store it in a file so that whatever needs it can just read it
from a more compact file, rather than having to read all the layout meant for human readability.
Then there's another option, --color-output,
or alternatively -C (capital C).
And there's a corresponding option, --monochrome-output,
or -M (capital M).
These are all about the colors that jq produces
to highlight the elements of the JSON it's displaying.
If it's writing to a terminal, by default it will generate colored output.
You can also force it to produce color even if you're writing to a pipe or to a file,
although putting it in a file might not be a smart move.
So you can enable color with -C and disable it with -M.
I'll talk very soon about how useful that can be.
You can also change the way the indentation is done.
There are two ways of doing this.
One is by using the option --tab.
That tells jq to use a tab for each indentation level
instead of two spaces.
Personally, I don't find that useful at all, but you might.
Then there's --indent followed by a number.
That's the number of spaces for indentation, and you can't have more than seven.
So you can go above the default of two,
or below it, I suppose.
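As a rough, unofficial illustration of the layout options above, the sketch below approximates them with Python's json module. This is not how jq itself works; the helper name render is made up for this sketch, and it assumes json.dumps output is close enough to jq's for simple data like this.

```python
import json

def render(value, compact=False, indent=2, tab=False):
    """Roughly mimic jq's output layouts with Python's json module.

    compact=True  is like jq -c / --compact-output
    tab=True      is like jq --tab
    indent=n      is like jq --indent n
    """
    if compact:
        # No newlines, and no spaces after ',' or ':'.
        return json.dumps(value, separators=(",", ":"))
    if tab:
        # One tab character per nesting level.
        return json.dumps(value, indent="\t")
    if not 1 <= indent <= 7:
        # jq caps --indent at 7; this sketch also rejects 0 for simplicity.
        raise ValueError("indent must be between 1 and 7")
    return json.dumps(value, indent=indent)


podcast = {"name": "Hacker Public Radio", "type": "podcast"}
print(render(podcast))                # two-space layout, like plain jq
print(render(podcast, compact=True))  # {"name":"Hacker Public Radio","type":"podcast"}
print(render(podcast, indent=4))      # like jq --indent 4
print(render(podcast, tab=True))      # like jq --tab
```

The compact form is the one you would store away for other programs to read; the indented forms are for people.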
I made a note here just to enlarge on this business of color.
What I often do is to use jq to take a file and display it just using the dot filter,
but I force it to produce color with -C,
and then I pipe it to less.
less won't display the colors unless you use its -R option.
So the formula is something like jq -C '.' file.json | less -R, and it's quite a useful thing to remember,
I find anyway, because then you can page through a large piece of JSON
and still see all the colors, which are quite useful for identifying the structure.
Okay, that's enough about options, I think.
Let's look at filters.
This is where most of the content in this episode will be.
And in fact, the series will largely be about filters.
It's going to take a few shows to get through a good proportion of them.
In the last episode we talked about JSON containing arrays and objects.
Arrays, if you remember, are enclosed in square brackets,
and their elements can be any of the data types we saw.
I listed them out in the notes last time.
So you can have an array of arrays, an array of objects, an array of both objects and arrays.
All of these are possible.
They can contain simpler items too, of course: numbers and strings.
Objects, on the other hand, contain collections of key/value pairs,
where the keys are strings and the values they are associated with
can be any of the data types.
We looked at an example of that moderately closely last episode.
So I've put some examples here just to remind you. There are some simple arrays.
First, [1,2,3],
which is just a simple three-element array containing three integers.
Then I've done the same again, except that there's a fourth element, and the fourth element is an array:
[1,2,3,[4,5,6]].
So it ends with a double closing square bracket.
There's one containing the three strings: ["Hacker","Public","Radio"].
Remember that strings in JSON have to be enclosed in double quotes.
And there's another one containing the names of all of the days of the week.
And if we look at simple objects,
well, objects tend to be a bit more complicated.
But here's one that contains two elements:
{"name": "Hacker Public Radio", "type": "podcast"}.
The first has the key name.
Remember the keys are strings.
They're enclosed in double quotes and followed by a colon.
In this particular case, the value for name is Hacker Public Radio,
as a string.
Then the next key is type, with its colon, of course,
and the string associated with it is podcast.
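To make the shape of these examples concrete, the sketch below parses them with Python's standard json module. Python is only a stand-in for jq here: JSON arrays come back as lists, objects as dicts, and a single-quoted string is rejected because it isn't valid JSON.

```python
import json

# The sample arrays and the simple object described above, as JSON text.
samples = [
    '[1,2,3]',
    '[1,2,3,[4,5,6]]',
    '["Hacker","Public","Radio"]',
    '["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]',
    '{"name": "Hacker Public Radio", "type": "podcast"}',
]

for text in samples:
    value = json.loads(text)
    # JSON arrays become Python lists; JSON objects become dicts.
    print(type(value).__name__, value)

# Strings must use double quotes; single quotes are not valid JSON.
try:
    json.loads("['Hacker']")
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```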
Then I thought I probably wouldn't come up with anything really interesting in the way of objects myself.
So how about looking around on the internet for places that will generate bits of sample JSON to play with?
I found one called the Random User Generator API.
It's really for people who are testing out software that collects information
about users on their system.
Maybe they want to register them and give them usernames and passwords and stuff.
I ran this, and it's quite cool.
Actually, if this sounds interesting to you,
you'll probably find it's quite an entertaining thing to run.
It generates a lot of information for each person.
I think it returns an array
of objects if you ask for more than one; maybe I asked for one, I don't remember.
But what I did was to extract one as an object,
and I only kept certain parts of it.
So what we have is the key gender,
and that's followed by the value female.
Then we've got name, which is an object in itself:
title Mrs, first Jenny, last Silver.
So those are the components of the person's name,
this imaginary person's name.
Then dob consists of date and age, that is, the date of birth and the age.
The age is 74, and 1950 is the year of birth.
Curiously enough, the date includes fractions of a second,
and I don't think anybody's birth date is likely to be stored to that precision.
But this is randomly generated stuff, so we forgive it.
nat, which presumably means nationality, is GB.
I asked for only people who could be in the United Kingdom.
Have a look at that.
You'll probably realize that the components of it
are a string, two objects and another string.
That will give you some sort of feel for what JSON can look like.
I found another source as well.
This is a project on GitHub where people are collecting together country information.
I think there's an interface to it,
but I didn't actually look into it.
I just found that there was a file called countries.json,
which I grabbed and pulled bits out of.
What I've put here is the entry for the country of Mexico.
So there's a name object, which contains the common name
for the country and the official name, and then
the native name of the country, which is in Spanish, and it shows that here.
Then there's an array for the capital.
When I looked, I found there are several countries that have multiple capitals,
which I wasn't aware of.
So there's some fascinating information there if you're interested in geography and so on.
Yeah, I could waste hours fiddling with this sort of data.
I also included an array.
There's a key called borders,
leading to an array, just like capital.
borders is an array consisting of, in this case, three three-letter strings,
which are the short-form codes of the bordering countries.
So BLZ is Belize, GTM is Guatemala, and USA is, obviously, the USA.
So those are the bordering countries.
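As an illustration of that nesting, here is a heavily trimmed Python stand-in for the Mexico entry. It is an assumption-laden sketch: only the keys mentioned above are kept, the real countries.json entry has many more, and the code_names lookup is written just for this example and is not part of the file.

```python
# A heavily trimmed, illustrative stand-in for the Mexico entry: only keys
# mentioned above are kept; the real countries.json entry has many more.
mexico = {
    "name": {"common": "Mexico"},
    "borders": ["BLZ", "GTM", "USA"],
}

# Code-to-name lookup written just for this example; it is not in the file.
code_names = {"BLZ": "Belize", "GTM": "Guatemala", "USA": "United States of America"}

# "name" is an object nested inside an object...
print(mexico["name"]["common"])
# ...and "borders" is an array of three-letter strings.
for code in mexico["borders"]:
    print(code, "->", code_names[code])
```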
There are other entries, which I didn't copy, about whether it's got a coast or not,
whether it's landlocked and so on.
You might find it interesting; I certainly did.
I'm going to be using this again.
So that's really just talking about JSON again,
just so that we've got things to look at,
and to do a little bit of filtering on, that we can refer back to.
So let's get into filters then.
The first one is called the identity filter,
and it's the simplest filter.
We've already encountered it: it's a dot, a full stop.
It's usually written in single quotes, remember,
because other filters might contain characters which are meaningful to Bash,
so you want to be really careful that you are not accidentally triggering Bash to interpret them.
Now, we already know this filter
simply takes the input from a file or from standard input and produces the same value,
but it pretty-prints it by default.
So I've got an example here where I echo the array ["Hacker","Public","Radio"],
as I did before, and pipe that into jq.
Because the filter is just a dot, it does actually work without quotes.
But if you don't get into the habit of quoting your filters,
then you're going to get caught out.
So that was just a proof that it works.
It lays the array out with the open square bracket on one line,
then "Hacker", "Public" and "Radio" on separate lines, each followed by a comma except the last,
and then the close square bracket on the last line.
So that's great.
And if coloring is relevant, it will be colored too.
But I'm not catering for colors in these notes.
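Modelled in Python purely as an illustration, the identity filter is a function that returns its input unchanged; the reformatting happens afterwards, when the result is printed. The names identity and raw are made up for this sketch, and json.dumps with indent=2 only approximates jq's layout.

```python
import json

def identity(value):
    """The identity filter '.': hand back exactly what came in."""
    return value

raw = '["Hacker","Public","Radio"]'
result = identity(json.loads(raw))

# jq then pretty-prints the result, roughly like this:
print(json.dumps(result, indent=2))
```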
If you're using this technique to display numbers,
there are issues about the way in which jq stores numbers and how it then represents them.
It can end up using exponential notation in many cases,
so it can be a teeny bit confusing.
I didn't think it was worth going into in the notes,
but I've pointed to where you can find more in the documentation.
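One likely source of that confusion is that jq has traditionally stored numbers as IEEE 754 double-precision values. As I understand it, jq 1.7 and later keep the original text of large number literals that pass through unchanged, so check the manual for your version. Python's float is also a double, so this sketch can show the same kind of effect; it illustrates doubles in general, not jq's own code.

```python
big = 2**53 + 1            # 9007199254740993, exact as a Python int
as_double = float(big)     # a double has a 53-bit significand
print(big, int(as_double))  # the double rounds to 9007199254740992

# Large doubles are usually shown in exponent form.
print(float(10**20))       # 1e+20
```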
So that's the most basic filter.
The next one is the object identifier-index filter.
I'm using the terminology from the jq documentation.
It doesn't mean a huge amount to me,
but I think you'll get it in a minute when we get a bit further.
This form of filter refers to object keys,
and a key is usually referenced with a full stop
followed by the name of the key.
So in the HPR statistics data that we looked at in the last show,
there's a top-level key, hosts,
which holds the number of currently registered hosts.
If you have run curl and written the output to a file,
which I recommend rather than running curl every time you run jq,
and in my case I've assumed that it's in a file called stats.json,
then you can type the command line jq '.hosts' stats.json,
and you will get 357.
Well, you will on the day I'm recording this, but it will change.
Hopefully it will change.
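A Python approximation of .hosts looks like the sketch below. The stats fragment is illustrative, containing only the hosts value quoted above, and the function name object_identifier_index is made up to echo the documentation's term. It uses dict.get so that a missing key gives None, much as jq gives null.

```python
import json

# Illustrative fragment of stats.json: only "hosts" (357, as quoted above)
# is included; the real file has many more keys.
stats = json.loads('{"hosts": 357}')

def object_identifier_index(value, key):
    """Rough equivalent of jq's .key applied to an object.

    dict.get returns None for a missing key, much as jq returns null.
    """
    return value.get(key)

print(object_identifier_index(stats, "hosts"))    # 357, like jq '.hosts' stats.json
print(object_identifier_index(stats, "missing"))  # None
```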
There's also a key which I didn't mention in the last show,
which is the first one that you see when you look at the JSON.
It's stats_generated.
This is a Unix time, which is the number of seconds since the Unix epoch,
which is midnight UTC on the first of January 1970.
Now, you think, oh, yeah, that's all fine,
but how do I turn seconds since 1970 into a date?
Well, you can actually do it in jq,
but I'm not going to talk about that until later.
If you want to do it now, you can feed what is returned by jq into the date command.
So I've got the example here: date -d "@$(jq '.stats_generated' stats.json)" +'%F %T'.
The argument to -d is in double quotes because we're doing a command substitution:
an @ sign, then $( ), containing the jq command that extracts stats_generated from stats.json.
So that's saying, here's a date, I want you to print it in this format.
The format that's been requested (these formats always have to begin with a plus)
is +'%F %T'.
What that returned when I was preparing these notes was
2024-04-18 15:30:07.
So if you give the -d option to date a Unix time, preceded by an @ sign,
which says this is a Unix time, date will convert it into a proper date,
because it's read as an epoch time.
As it stands, it gave me a time relative to my local time, which is UTC plus one hour;
in other words, it's daylight saving time for the UK, which is called BST.
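jq's own date handling comes later in the series; this is just Python's datetime doing what date -d @... does above. The timestamp is hypothetical: it was chosen so that it corresponds to the date quoted in the episode, not copied from the real stats.json. BST is modelled as a fixed UTC+1 offset.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stats_generated value, chosen to match the date quoted above.
stats_generated = 1713450607

utc = datetime.fromtimestamp(stats_generated, tz=timezone.utc)
bst = utc.astimezone(timezone(timedelta(hours=1), "BST"))  # UK summer time, UTC+1

print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-04-18 14:30:07
print(bst.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-04-18 15:30:07
```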
Referring to a key by putting a full stop and then its name only works in jq
when the key contains only ASCII letters, digits and underscores, and doesn't start with a digit.
So if you want to use other characters, or you want to start with a digit for that matter,
then you have to enclose the key in double quotes, or in square brackets and double quotes.
So imagine the JSON file you're processing has got stats-generated as a key.
Then you'd have to write ."stats-generated",
that is, a dot followed by the key in double quotes, and that would work, because the double quotes effectively protect the fact that it's
not a standard key. Or you could put square brackets around the quoted key, so it would be
.["stats-generated"].
This general form, dot, open square bracket, string, close square bracket, is valid in all contexts; it's the basic way in which you refer to a key,
but there are some nice shortcuts to avoid having to type all that. String
in this context, obviously, means a JSON string in double quotes. This form is referred to in the
documentation as an object index.
Here's another example. When we were looking at the
HPR statistics last time, there was a field next_free, which is the number of
shows until the next free slot: how close we're getting to falling off that cliff that I see
looming so often. The file isn't in these notes, but it's in the
previous ones. If you look at an example of it, you'll find that next_free is actually a key
within an object, where the object's key is slot. So if you use the command jq '.slot' stats.json,
you will get back an object: an open curly bracket, then the
string "next_free", a colon, the number 8, and so forth. There's another
key in there too. So we got back an object, but we actually want a value inside it. We went
to the object and asked for the key slot, which gave us an object, so we actually want to get into
that object and get the next thing out. What we can do in jq filters is chain the filters.
So if you give it the filter expression '.slot | .next_free', that is, dot slot, followed by a pipe
symbol, and then dot next_free, the pipe symbol means: run the first
filter and then pass its output to the second filter. Running the first filter gets back the object that
we just saw, and the second filter gets that specific item out of
that object. Luckily, or maybe not, you can write this in a
shorthand way: your filter can be '.slot.next_free',
which chains the two together without the need for the pipe. You will probably find
that there are cases where you need to use the pipe, because the shorthand doesn't get you
where you want to be, and we'll be looking at some of those cases in the next episode, probably.
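Pulling the key-quoting rule and chaining together, here is a Python sketch. The function names are made up for illustration. path_filter builds jq filter text, using the plain .key shorthand only for identifier-like keys and the bracketed string form otherwise. get_path follows the same keys through parsed data, which is what .slot | .next_free does. The stats fragment is hypothetical apart from the value 8.

```python
import json
import re

# Keys that jq lets you write as plain .key: ASCII letters, digits and
# underscores, not starting with a digit.
SIMPLE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def path_filter(keys):
    """Build jq filter text that follows a list of object keys.

    Simple keys use the .key shorthand; anything else uses the bracketed
    string form, written .["key"] at the start and ["key"] after a step.
    """
    parts = []
    for i, key in enumerate(keys):
        if SIMPLE_KEY.fullmatch(key):
            parts.append("." + key)
        else:
            quoted = json.dumps(key)  # a double-quoted JSON string
            parts.append(("." if i == 0 else "") + "[" + quoted + "]")
    return "".join(parts)

def get_path(value, keys):
    """Follow the keys through parsed data, like .slot | .next_free."""
    for key in keys:
        value = value[key]
    return value

# Hypothetical fragment of stats.json; only next_free = 8 comes from the episode.
stats = {"slot": {"next_free": 8}}

print(path_filter(["slot", "next_free"]))      # .slot.next_free
print(path_filter(["stats-generated"]))        # .["stats-generated"]
print(get_path(stats, ["slot", "next_free"]))  # 8
```

The bracketed form is used for awkward keys because, as mentioned above, it is valid in every context.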
This is the thing that Mr. X was doing in one of his recent shows that I
mentioned last time: getting out that number. I think he wanted to alert himself to the fact
that HPR is running out of shows. You can do that with jq on the command line, and you could write
a Bash script that flashes a message or rings a bell or something, based on that value.
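The kind of alert just described could also be sketched in Python rather than Bash. This is only an illustration, not Mr. X's script. It assumes stats.json has the slot/next_free layout described above, and THRESHOLD plus both function names are hypothetical choices.

```python
import json

THRESHOLD = 10  # hypothetical cut-off; pick whatever suits you

def read_next_free(path):
    """Same value that jq '.slot.next_free' stats.json prints."""
    with open(path, encoding="utf-8") as f:
        stats = json.load(f)
    return stats["slot"]["next_free"]

def check_queue(path, threshold=THRESHOLD):
    """Warn (and ring the terminal bell) when the queue is running low.

    Returns True if things look fine, False if a warning was printed.
    """
    shows_left = read_next_free(path)
    if shows_left < threshold:
        # "\a" is the ASCII BEL character; many terminals beep or flash.
        print(f"\aWarning: only {shows_left} shows until the next free slot")
        return False
    return True
```

Calling check_queue("stats.json") after fetching the statistics with curl would be one way to use it.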
I like to think of this .slot.next_free thing as being a bit like a file system path,
where you put directory names and separate them with slashes. It's
a hierarchical reference to objects within objects and so forth. It makes the extraction of the
desired data easier to visualize, I think; I really like that capability. So the last filter for this
episode is the array index. Really, this is pretty simple, as you'll find if you've been
involved with programming languages that have arrays, which is most of them, isn't it?
We saw .[string], where string is a key in an object, so it makes
sense for array indexing to be .[number]: dot, open square bracket, number, close square bracket. The number
is an integer starting at zero, or a negative integer, which is interesting. A
negative number means count backwards from the last element of the array, and a positive
integer is the element number, counting from zero. So if you run the example here,
you echo the array which contains the names of the days of the week, pipe
that into jq, and use the filter '.[1]', and it will return element one,
which is Monday, because the array starts with Sunday in this particular case. Then I do another example
where we're echoing another array, but this time it has the abbreviated names of the days.
This time the filter is '.[-1]', and it returns Sat,
because minus one means the last element, as I said already; minus two would be Fri.
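The indexing rule can be mimicked in Python as a sketch, using the days-of-the-week arrays and the nested array from earlier. The function name array_index is made up. Python lists already count from zero and accept negative indexes; the explicit range check is there so that an out-of-range index gives None, the way jq gives null, instead of raising an error.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
short_days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
nested = [1, 2, 3, [4, 5, 6]]

def array_index(array, n):
    """Rough equivalent of jq's .[n] on an array.

    Indexes start at zero; negative ones count back from the end.
    An index that is out of range gives None, the way jq gives null.
    """
    if -len(array) <= n < len(array):
        return array[n]
    return None

print(array_index(days, 1))         # Monday
print(array_index(short_days, -1))  # Sat
print(array_index(short_days, -2))  # Fri
print(array_index(nested, -1))      # [4, 5, 6]
print(array_index(days, 7))         # None
```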
And the last example is the nested array I referenced earlier on. So I echo
[1,2,3,[4,5,6]],
pipe that into jq, and ask it for .[-1].
The last element in that array is itself an array, so you get back an array,
which is laid out one element per line, with the brackets on separate lines and so on.
Hopefully that is all quite clear. I think if you're a programmer, say in Python, this sort of
concept is not going to be particularly difficult, though jq does have its own idiosyncrasies, which we'll
look at more later. So we're going to end it there. There are some links to the various things I've
talked about, documentation and so forth, in case you need to follow through. All right then.
Thanks very much. Bye.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself. If you ever thought of recording a
podcast, click on our contribute link to find out how easy it really is. Hosting for HPR has
been kindly provided by AnHonestHost.com, the Internet Archive and rsync.net. Unless otherwise
stated, today's show is released under a Creative Commons Attribution 4.0 International
license.