Episode: 4114
Title: HPR4114: Introduction to jq - part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4114/hpr4114.mp3
Transcribed: 2025-10-25 19:48:39

---

This is Hacker Public Radio Episode 4114 for Thursday 9 May 2024.
Today's show is entitled Introduction to jq Part 2.
It is hosted by Dave Morris and is about 25 minutes long.
It carries an explicit flag.
The summary is: Options to jq, learning about filters.
Hello everybody, my name is Dave Morris, welcome to Hacker Public Radio.
This show is called An Introduction to jq and it's Part 2 of a series.
Today I'm going to be talking about the options to the jq command and
start talking about filters.
That's where most of the important stuff is for the use of jq, in the filters.
So in the last episode we looked at how JSON data is structured.
Fairly simple, but there are rules of course.
And we saw how you could feed JSON data through jq and get a reformatted output
which could be printed or displayed or whatever.
Now we're going to look at a few of the options to the jq command.
There's quite a large number of them, which you can get from man jq,
but I'm not going to cover them all because some of them are really quite obscure.
And if I think that they are necessary, I will talk about them when we get to that part of the subject.
jq is invoked in the usual sort of way for Unix commands.
jq is the command, and it can be followed by options.
And then it needs to be followed by a filter.
We've already seen the simplest filter in the last episode.
And then optionally it can be followed by the names of files that contain the JSON data.
So it's quite usual to be running this against a file, because it's a good way to capture stuff.
But you can also run jq against data coming down from the web
and being delivered on the STDIN channel, standard in.
And we saw how to do that with curl last time.
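
The general shape described above (jq, then options, then a filter, then optional file names) might be sketched like this. The file name `sample.json` and its contents are made up for illustration; the second command feeds the same data on standard input instead of naming a file.

```shell
# Create a small, made-up JSON file to work with (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' '{"name":"Hacker Public Radio","type":"podcast"}' > "$workdir/sample.json"

# Form 1: jq filter file
from_file=$(jq '.' "$workdir/sample.json")

# Form 2: no file name given, so jq reads the JSON from standard input
from_stdin=$(jq '.' < "$workdir/sample.json")

printf '%s\n' "$from_file"
```

Both forms should print the same pretty-printed object, which is why capturing data in a file first and re-running jq against it is a convenient way to experiment.
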
So, in terms of options.
There's obviously a help option, which is useful for checking out a summary of what's available.
Then hyphen f followed by a file name, or hyphen hyphen from hyphen file
followed by a file name, will read the filter from a file.
And this is quite important.
It's a bit like the awk stuff which I did with Mr. Young a few years ago,
where you can prepare quite a complex program, because that's what a filter is basically,
and put it in a file.
You can also put comments in there so it makes it easier to read and so forth.
Sometimes filters can be quite large.
So this is a great way of doing things.
And then you can point jq at the file and on you go.
So that's an option, as opposed to the file that's got the data in, of course.
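
As a rough sketch of reading a filter from a file, assuming made-up file names `filter.jq` and `sample.json`: the filter file below holds only a comment and the simplest filter, but the same mechanism applies to much larger ones.

```shell
workdir=$(mktemp -d)
printf '%s\n' '{"name":"Hacker Public Radio","type":"podcast"}' > "$workdir/sample.json"

# A filter file; jq treats text from '#' to the end of the line as a comment.
cat > "$workdir/filter.jq" <<'EOF'
# The identity filter: pass the input through unchanged
.
EOF

# -f (or --from-file) names the file holding the filter; the data file comes last.
short_form=$(jq -f "$workdir/filter.jq" "$workdir/sample.json")
long_form=$(jq --from-file "$workdir/filter.jq" "$workdir/sample.json")

printf '%s\n' "$short_form"
```
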
Then you can use the option hyphen hyphen compact hyphen output,
and the alternative is hyphen lowercase c.
Now by default, as we already saw, jq pretty-prints the JSON output.
It lays it all out with lots of newlines in it and it also colors it,
which I didn't show because I can't really do it justice in these notes.
But you can write it out in a more compact form,
which you could then store away after you've done some work on it.
Some changes to it perhaps,
some additions. You could store it in a file so that whatever needs it can just get it
from a more compact file, rather than having to read all the laid-out-for-human-readability stuff.
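
To make the difference concrete, here is a small sketch comparing the default layout with compact output. The input array is just sample data chosen for the illustration.

```shell
# Default: pretty-printed, one element per line
pretty=$(echo '[1, 2, 3, [4, 5, 6]]' | jq '.')

# -c (or --compact-output): everything on a single line, no extra spaces
compact=$(echo '[1, 2, 3, [4, 5, 6]]' | jq -c '.')

printf '%s\n' "$compact"    # [1,2,3,[4,5,6]]
```
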
Then there's another option, hyphen hyphen color hyphen output,
or alternatively hyphen capital C.
And then there's a corresponding option, hyphen hyphen monochrome hyphen output,
or hyphen capital M.
So this is all about the colors that jq produces
to highlight the elements of the JSON it's displaying.
If it's writing to a terminal, by default it will generate colored output.
You can also force it to produce color even if you're writing to a pipe or to a file,
although putting it in a file might not be a smart move.
So you can enable the color with hyphen capital C and disable it with hyphen capital M.
I'll talk a little bit, very soon, about how useful that can be.
You can also change the way the indentation is done.
There's two real ways of doing this.
One is by using the option hyphen hyphen tab.
That causes jq to use a tab for each indentation level
instead of two spaces.
Personally, I don't find that useful at all, but you might do.
Then there's hyphen hyphen indent followed by a number.
That's the number of spaces for indentation, and you can't have more than seven.
You can have more than two.
You can have less, I suppose.
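
A minimal sketch of the two indentation options, using a tiny made-up array so the effect is easy to see:

```shell
# --tab: one tab character per indentation level
tabbed=$(echo '[1, 2]' | jq --tab '.')

# --indent 4: four spaces per level instead of the default two
four=$(echo '[1, 2]' | jq --indent 4 '.')

printf '%s\n' "$four"
# [
#     1,
#     2
# ]
```
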
So I made a note here just to enlarge on this business of color.
What I often do is to use jq to take a file and display it just using the dot filter.
But I switch on color, force it to produce color, with the hyphen capital C.
And then I pipe it to less.
less won't display colors unless you use the hyphen capital R option.
But this formula that I've written down in the notes here is quite a useful thing to remember,
I find anyway, because then you can page through a large piece of JSON
and still see all the colors, which are quite useful for identifying the start of things.
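
The formula described here could be wrapped up as below. The function name `page_json` is a hypothetical convenience, not something from the episode; the pipeline inside it is the part that matters.

```shell
# Hypothetical helper name; wraps the "jq -C ... | less -R" formula.
page_json() {
    # -C forces color even though jq is writing to a pipe;
    # less -R passes the color escape sequences through to the terminal.
    jq -C '.' "$1" | less -R
}
```

It would be used as `page_json stats.json`, where `stats.json` stands for whatever JSON file is being examined.
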
Okay, that's enough about options, I think.
Let's look at filters.
This is where most of the content in this episode will be.
And in fact, the series as a whole will be about filters, largely.
It's going to take a few shows to get through a good proportion of them.
So in the last episode we talked about JSON containing arrays and objects.
Arrays, if you remember, are enclosed in square brackets,
and their elements can be any of the data types we saw.
I listed them out in the notes last time.
So you can have an array of arrays, an array of objects, an array of both objects and arrays.
All of these are possible.
There can be simpler items in your array, of course, numbers and strings.
Objects, on the other hand, contain collections of key-value items,
where the keys are strings of various kinds, and the values they are associated with
can be any of the data types.
We saw an example of that.
We looked at it moderately closely last episode.
So I've put some examples here, just to remind you. There's some simple arrays.
So there's open square bracket, 1, 2, 3,
close square bracket, which is just a simple three-element array with three integers.
And I've done the same again, except that there's a fourth element, and the fourth element is an array.
So that's in square brackets, 4, 5, 6.
So it ends with a double close square bracket.
There's one containing the three strings Hacker, comma, Public, comma, Radio.
Remember that strings in JSON have to be enclosed in double quotes.
And there's another one containing the names of all of the days of the week.
And if we look at simple objects,
well, objects tend to be a bit more complicated.
But here's one where it contains two elements.
One has got the key name.
Remember the keys are strings.
They're enclosed in double quotes, and then they're followed by a colon.
And in this particular case, name is the key to Hacker Public Radio
as a string.
And then the next key is type, and the colon, of course.
And the string associated with it is podcast.
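
Written out, the sample values read aloud above would look roughly like the lines in this sketch. The exact spelling and capitalization of the day names is an assumption; the order (starting at Sunday) follows what is said later in the episode. Passing them through jq is simply a way to confirm that each one is well-formed JSON.

```shell
# Each line is one of the JSON values described above.
samples='[1, 2, 3]
[1, 2, 3, [4, 5, 6]]
["Hacker", "Public", "Radio"]
["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
{"name": "Hacker Public Radio", "type": "podcast"}'

# jq reads a stream of JSON values; -c prints each one back on a single line.
# A malformed value would make jq report a parse error instead.
checked=$(printf '%s\n' "$samples" | jq -c '.')

printf '%s\n' "$checked"
```
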
So I thought, oh, this is great.
I probably won't come up with anything really interesting in the way of objects.
So how about looking around on the internet for places that will generate you bits of sample JSON to play with?
I found one called the Random User Generator API.
So it's really for people who are testing out software that collects information
about users on their system.
Maybe they want to register them and give them usernames and passwords and stuff.
I ran this, which is quite cool.
Actually, if this sounds at all interesting to you,
you'll probably find it's quite an entertaining thing to run.
It generates a lot of information for each person.
What I did with it was, I think it makes an array
of objects if you ask for more than one, or maybe even if you ask for one, I don't remember.
But what I did was to extract one as an object,
and only return certain parts of it.
So what we have is the key gender,
and that's followed by the word female.
And then we've got name, which is an object in itself.
Title Mrs, first Jenny, last Silver.
So that's the components of the person's name,
this imaginary person's name.
DOB consists of date and age, so the date of birth and the age.
The age is 74, 1950 is the date of birth.
Funnily enough, this date contains, is that microseconds?
And I don't think anybody's birth date is likely to be stored in terms of microseconds.
But this is sort of randomly generated stuff, so we forgive it.
N-A-T, which presumably means nationality, is GB.
I asked for only people that could be in the United Kingdom.
But have a look at that.
You'll probably look at that and realize the components of it
are a string, two objects and another string.
That will give you some sort of feel for what JSON can look like.
I did another one, found another source.
This is a project on GitHub, where people are collecting together country information.
So they have a, I think there's an interface to it,
but I didn't actually look into it.
I just found that there was a file called countries.json,
which I grabbed and pulled bits out of.
And what I've put here is the entry for the country of Mexico.
So there's a name object which contains the common name
for the country, the official name, and then the native one.
The native reference to the name of the country is in Spanish, and it shows it here.
And then there's an array for the capital.
I think when I looked, there's several countries that have multiple capitals,
which I wasn't aware of.
So there's some fascinating information, if you're interested in geography and everything.
Yeah, I could waste hours fiddling with this sort of data.
And then I included in this an array.
So there's a key,
it's a key leading to an array, just like with the capital, called borders.
And borders is an array consisting of, in this case, three strings, three-letter strings.
And the strings are the short names, short-form names, of the bordering countries.
So BLZ is Belize, GTM is Guatemala, and USA is obviously the USA.
So those are the bordering countries.
There's other entries which I didn't copy, about whether it's got a coast or not,
whether it's landlocked and so on.
You might find it interesting, I certainly did.
I'm going to be using this again.
So that's really just talking about JSON again,
just so that we've got things to look at,
and also to do a little bit of filtering on, that we can reference back to.
So let's get into filters then.
The first one is called the identity filter.
And it's the simplest filter.
We've already encountered it. It's a dot, it's a full stop.
Usually in single quotes, remember,
because other bits of filters might contain characters which are relevant to Bash.
So you want to be really careful that you are not accidentally triggering Bash to interpret them.
So what I've done here is, well, we already know this filter
simply takes the input from a file or from standard in and produces the same value,
but it pretty-prints it by default.
So I've got an example here where I echo, in square brackets,
Hacker, Public, Radio, as I did before, and pipe that into jq.
This time I didn't use the quotes, and it does actually work.
But if you don't get into the habit of putting quotes around your filters,
then you're going to get caught out.
So that was just a proof that it works.
And it lays it out with the open square bracket on one line,
then Hacker, comma, Public, comma, Radio on separate lines,
and then the close square bracket on the last line.
So that's great.
And if coloring is relevant, it will do it.
But I'm not catering for colors in these notes.
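
The identity filter example described above can be sketched as follows, once without quotes around the dot and once with them. The exact quoting of the echoed array is an assumption based on the spoken description.

```shell
# Unquoted dot: works here, because a bare "." means nothing special to the shell.
unquoted=$(echo '["Hacker","Public","Radio"]' | jq .)

# Quoted dot: the safer habit, since longer filters contain characters
# such as |, [ ] and " that the shell would otherwise interpret.
quoted=$(echo '["Hacker","Public","Radio"]' | jq '.')

printf '%s\n' "$quoted"
# [
#   "Hacker",
#   "Public",
#   "Radio"
# ]
```
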
If you're using this technique to display numbers,
there are issues about the way in which jq stores numbers and how it then represents them.
So it tries to use exponential notation in many cases.
So it can be a teeny bit confusing.
I didn't think it was worth going into in the notes,
but I've pointed to where you can find more in the documentation.
So that's the most basic filter.
And the next one is the object identifier-index filter.
I'm using the terminology from the jq documentation.
It doesn't mean a huge lot to me,
but I think you'll get it in a minute when we get a bit further.
So this form of filter refers to object keys.
A key is usually referenced with a full stop
followed by the name of the key.
So in the HPR statistics data that we looked at in the last show,
there's a top-level key, hosts,
which refers to the number of currently registered hosts.
And if you have run curl and written the output to a file,
which I recommend, rather than running curl every time you run jq,
then in my case, I've assumed that it's in a file called stats.json.
Then you can type the command line jq, single quote, dot hosts, close quote, stats.json.
And you will get 357.
Well, you will do on the day I'm recording this, but it will change.
Hopefully it will change.
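
Here is a sketch of that command. So that it can be run without downloading anything, it creates a cut-down stand-in for `stats.json` holding only the `hosts` key with the value quoted above; the real file contains many more keys.

```shell
# Stand-in for the real stats.json (hypothetical, heavily cut down).
workdir=$(mktemp -d)
printf '%s\n' '{"hosts": 357}' > "$workdir/stats.json"

# A dot followed by the key name returns the value stored under that key.
hosts=$(jq '.hosts' "$workdir/stats.json")

echo "$hosts"    # 357
```
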
There's also a key, which I didn't mention in the last show,
which is the first one that you see when you look at the JSON.
And it's stats underscore generated.
So this is a Unix time, which is the seconds since the Unix epoch,
which is the first of January 1970, midnight, I think.
Now, you think, oh, yeah, that's all fine.
But how do I turn seconds from 1970 into a date?
Well, the answer is you can do it in jq, actually.
But I'm not going to talk about that until later.
But if you wanted to do this, you can feed what is returned by jq into the date command.
So I've got the example here: date, hyphen d, then in double quotes,
because we're doing a command substitution,
first of all an at sign, then dollar, open parenthesis, jq, single quote, dot stats underscore generated,
close quote, stats dot json, close parenthesis, close double quotes.
So that's saying, here's a date, I want you to print it in this format.
The format that's been requested is, and these formats always have to begin with a plus,
plus, single quote, percent capital F, space, percent capital T, close quote.
So what that returns, again, this is in the sample I had when I was preparing these notes, is
2024, hyphen 04, hyphen 18, space, 15, colon 30, colon 07.
So you give the hyphen d option to date a Unix time, and you precede it with an at sign,
which says this is a Unix time, and it will be converted by date into a proper date,
because it's read as an epoch time.
As it stands, it gave me a time relative to my local time, which is UTC plus 100,
no, not 100, one hour. In other words, it's daylight saving time for the UK, which is called BST.
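
Assembled from the spoken description, the command might look like the sketch below. It assumes GNU `date`, since the `-d @seconds` form is not available in every `date` implementation. The stand-in `stats.json` and the epoch value in it are made up; the value was picked so that it corresponds to the 15:30:07 BST result quoted above. `TZ=UTC` is an addition that pins the time zone, so the result here is one hour behind the BST figure.

```shell
workdir=$(mktemp -d)
printf '%s\n' '{"stats_generated": 1713450607}' > "$workdir/stats.json"

# The $( ... ) command substitution runs jq first; the double quotes let the
# shell expand it, and the leading @ tells date the number is an epoch time.
stamp=$(TZ=UTC date -d "@$(jq '.stats_generated' "$workdir/stats.json")" +'%F %T')

echo "$stamp"    # 2024-04-18 14:30:07
```

Without the `TZ=UTC` prefix, `date` reports the time in the local time zone, which is what happened in the episode.
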
So doing it this way, where you put a full stop and then the name of a key, only works in jq
when the keys contain only ASCII letters, digits and underscores, and don't start with a digit.
So if you want to use other characters, or you want to start with a digit for that matter,
then you have to enclose the key in double quotes, or square brackets and double quotes.
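
The quoting rule just stated can be sketched with a made-up one-key document whose key, `stats-generated`, contains a hyphen and so cannot be written in the plain dot form. The value is arbitrary.

```shell
doc='{"stats-generated": 1713450607}'

# Double quotes around the key, directly after the dot
dq=$(echo "$doc" | jq '."stats-generated"')

# The general object-index form: dot, square brackets, a JSON string
idx=$(echo "$doc" | jq '.["stats-generated"]')

echo "$dq"    # 1713450607
```

Both filters return the same value. Written without any quoting, `.stats-generated` would be read by jq as a subtraction rather than as a single key name.
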
So imagine if the JSON file you're processing has got stats hyphen generated as a key,
then you'd have to put dot, open double quotes, stats hyphen generated,
close quotes, and that would work, because the double quotes effectively protect the fact that it's
not a sort of standard key. Or you could put square brackets around the whole thing, so it'd be
dot, open square bracket, double quote, stats hyphen generated, close double quote,
close square bracket. This general form of dot, open square bracket,
string, close square bracket is valid in all contexts. That's the sort of basic way in which you refer to a key,
but there are some nice shortcuts to avoid having to type all that stuff, and string
in this context obviously means a JSON string in double quotes. And this is referred to in the
documentation as an object index. So here's another example. When we were looking at
the HPR statistics last time, there's a field next underscore free, which is the number of
shows until the next free slot, how close we're getting to falling off that cliff that I see
looming so often. So if you look at the file, I haven't got it in these notes, but it's in the
previous ones, if you look at an example of it, you'll find that next free is actually a key
within an object, where the object is called slot, sorry. So if you used the command jq, quote,
dot slot, close quote, stats dot json, you will get back an object: open curly bracket, then the
string next underscore free, colon, the number eight, and so forth. There's another one there,
another key in there. So we got back an object, but we actually want the value in it. So we went
to the object, we asked for the key slot, which gave us an object. So we actually want to get into
that object and get the next thing. So what we can do in jq filters is we can chain the filters.
So if you give it the filter expression, open single quote, dot slot, then follow that by a pipe
symbol, and then dot next underscore free, what will happen is the pipe symbol means run the first
filter and then pass its output to the second filter. So running the first filter gets back the object that
we just saw previously on the page, and the second filter gets that specific item out of
that object. Luckily, well, maybe it's lucky or not, but you can write this in a
shorthand way. So your filter can be single quote, dot slot dot next underscore free, close quote,
and that chains the two together without the need for the pipe. You will probably find
that there will be cases where you need to use the pipe because the shorthand doesn't get you
where you want to be. And we'll be looking at some of those cases in the next episode probably.
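
The steps above, from `.slot` alone to the piped form to the shorthand, can be sketched like this. The stand-in `stats.json` is made up and cut down: the real `slot` object has at least one more key, which is left out here.

```shell
workdir=$(mktemp -d)
cat > "$workdir/stats.json" <<'EOF'
{ "hosts": 357, "slot": { "next_free": 8 } }
EOF

# Step 1: .slot on its own returns the whole nested object
slot=$(jq -c '.slot' "$workdir/stats.json")

# Step 2: run .slot, then pipe its result into .next_free
piped=$(jq '.slot | .next_free' "$workdir/stats.json")

# Shorthand: the same chain written as a single path
short=$(jq '.slot.next_free' "$workdir/stats.json")

echo "$slot"     # {"next_free":8}
echo "$short"    # 8
```
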
So you can see that, and this is the thing that Mr. X was doing in one of his recent shows that I
mentioned last time, he was getting out that number. So I think he wanted to alert himself to the fact
that HPR's running out of shows. You can do that with jq on the command line, and you could write
a Bash script around it that flashes a message or rings a bell or something or other based on that value.
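
One way such a script could look is sketched below. Everything in it is an assumption made for illustration: the function name `check_queue`, the threshold of 10, the message wording, and the cut-down stand-in for `stats.json`. It is not the script from the show being referred to.

```shell
# Hypothetical helper: warn when the next free slot is close.
# $1 = path to a stats file, $2 = threshold (number of shows).
check_queue() {
    next_free=$(jq '.slot.next_free' "$1")
    if [ "$next_free" -lt "$2" ]; then
        echo "WARNING: next free slot is only $next_free shows away"
    else
        echo "OK: next free slot is $next_free shows away"
    fi
}

# Made-up sample data standing in for the downloaded statistics.
workdir=$(mktemp -d)
printf '%s\n' '{"slot": {"next_free": 8}}' > "$workdir/stats.json"

message=$(check_queue "$workdir/stats.json" 10)
echo "$message"
```
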
I like to think of this dot slot dot next underscore free thing as a bit like a file system path,
where you put directory names, but you separate them with slashes. So it is like that. It's sort of
a hierarchical reference to objects within objects and so forth. It makes the extraction of the
desired data easier to visualize, I think. I do really like that capability. So the last filter for this
episode is the array index. Really, this is pretty simple, I think you'll find, if you've been
involved with programming languages with arrays. That's everything, isn't it? Most languages,
anyway. So we saw the dot square bracket string, where string is a key in an object. So it makes
sense for array indexing to be dot, square bracket, number, close square bracket. The number
represents an integer starting at zero, or a negative integer, which is interesting. The meaning of a
negative number is to count backwards from the last element of the array. And obviously a positive
integer is the element number, but it starts at zero. So if you run the example here,
echo, and then there's that array which contains the names of the days of the week, and you pipe
that into jq, and the filter is quote, dot, square brackets, one, then it will return element one,
which is Monday. So it starts at Sunday in this particular case, yeah. Then I do another example
where we're echoing another array, but we've got the abbreviated names of the days. This time,
the filter is quote, dot, open bracket, minus one, close bracket, quote, and it returns Sat,
because minus one means the last element, as I said already. Minus two would be Friday.
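
The two array index examples can be sketched as below. The exact contents of the arrays are assumptions based on the spoken description: full day names starting at Sunday, and three-letter abbreviations in the same order.

```shell
days='["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]'
short_days='["Sun","Mon","Tue","Wed","Thu","Fri","Sat"]'

# Index 1 is the second element, because counting starts at zero.
second=$(echo "$days" | jq '.[1]')

# Negative indices count back from the end: -1 is the last element.
last=$(echo "$short_days" | jq '.[-1]')
before_last=$(echo "$short_days" | jq '.[-2]')

echo "$second"    # "Monday"
echo "$last"      # "Sat"
```

jq prints the results as JSON strings, so the double quotes are part of the output.
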
And so the last example is the nested array thing I referenced earlier on. So echo,
quote, open bracket, one, two, three, each with a comma, then another bracket, four, five, six,
and then close the two square brackets, pipe that into jq, and ask it for dot, square bracket,
minus one. So the last element in that array is an array. So you get back an array,
which is laid out one element per line, and the brackets on separate lines and so on. So
hopefully that is all quite clear. I think if you're a programmer in, say, Python, this sort of
concept is not going to be particularly difficult. jq does have its own idiosyncrasies, which we'll
look at more. So we're going to end it there. And there's some links to the various things I've
talked about, documentation and so forth, in case you need to follow through. All right then.
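
The nested array example mentioned near the end can be sketched as follows, assuming the array is written the way it was described earlier in the episode:

```shell
# The last element of the outer array is itself an array,
# so .[-1] returns that inner array, pretty-printed.
inner=$(echo '[1, 2, 3, [4, 5, 6]]' | jq '.[-1]')

printf '%s\n' "$inner"
# [
#   4,
#   5,
#   6
# ]
```
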
Thanks very much. Bye.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
Today's show was contributed by an HPR listener like yourself. If you ever thought of
recording a podcast, then click on our contribute link to find out how easy it really is. Hosting for HPR has
been kindly provided by AnHonestHost.com, the Internet Archive, and rsync.net. Unless
otherwise stated, today's show is released under a Creative Commons Attribution 4.0 International
license.