Episode: 3228
Title: HPR3228: YAML basics
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr3228/hpr3228.mp3
Transcribed: 2025-10-24 19:11:38

---

This is Hacker Public Radio Episode 3228 for Wednesday the 16th of December 2020. Today's show is entitled "YAML basics" and is part of the series Programming 101. It is hosted by Klaatu, is about 34 minutes long, and carries a clean flag. The summary is: Learn about sequence and mapping in YAML.

This episode of HPR is brought to you by AnHonestHost.com. Get 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

Everyone, this is Klaatu. You're listening to Hacker Public Radio.

This episode is about YAML. YAML Ain't Markup Language. Well, if it's not a markup language, what is it? It is a data serialization format. That's how it describes itself. YAML is for data serialization. And what that means is that it is a text format with a specific structure.

We all know that you can have text formats with no structure: just plain text. Type some ASCII symbols into a file, save it. Now you've got text without structure. And that works for some things, for beat poetry and for random thoughts. That works fine. For programmatic things that you want to parse and understand and process, that usually does not work so well. It is a lot easier to ingest data into a computer, or an application running on a computer, if you have some kind of predictability, some kind of preset structure. You can, of course, invent your own structure. It's not rocket science.

You could list, for instance, a series of animals in a text file by placing the animal type or family or genus (I don't really know the scientific terms) in one column. Then, delimited by, I don't know, a space, you list the other thing that animals are called, like their proper name, their other name, their common name. So, for instance, you might have penguin space emperor, penguin space gentoo, penguin space rockhopper, and so on. And then you might switch over to a different species. For instance, cat space house, cat space lion, cat space... I'm quickly realizing that I know nothing about animals or how they're categorized. But anyway, the point is you've got this column of an animal type, and then some delimiting character, and then the more common name for that animal.

That seems pretty simple, and you could do that and parse it pretty quickly, probably with cut or awk or whatever you want to use. And that would work. That would be fine. But the problem is that when you're inventing your own schema like that, you frequently find that you've not accounted for something. In the moment, you're thinking, okay, well, I'll just make this up as I go along, and it seems to be working for the first five, and then suddenly you hit the line that says penguin space little space blue. And now your parser, which believes that each field is delimited by a space, thinks that this penguin is simply called little, because it sees the space between little and blue. (Little blue is the name of a New Zealand penguin.) And it throws the rest out, because it wasn't told about a third potential field. So it takes the first two and moves on to the next line, and so on.

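The failure described above can be sketched in a few lines of Python. This is only an illustration: the record text and the `parse_two_fields` helper are made up for this note, and the helper imitates what a two-field `cut` on spaces would keep.

```python
# Hypothetical home-made format: "<animal type> <common name>", one record per line.
RECORDS = """\
penguin emperor
penguin gentoo
penguin little blue
"""


def parse_two_fields(line):
    """Split on single spaces and keep the first two fields, much like `cut -d' ' -f1,2`."""
    fields = line.split(" ")
    return fields[0], fields[1]


parsed = [parse_two_fields(line) for line in RECORDS.splitlines()]
for animal_type, common_name in parsed:
    print(f"{animal_type}: {common_name}")
```

The last record comes out as "penguin: little". The word "blue" is silently dropped, because the parser was only ever told about two fields.
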
YAML, or any widely known structured data format, helps you prevent those kinds of mistakes. You learn that schema, you learn that method of serializing data, and then you use it, and then you are able to leverage libraries and applications and other things that other people have developed to help you parse the data that you've entered, or that you want to ingest one way or another. That's the main advantage, I think, to YAML, for instance. There are other advantages, but just the fact that it exists, and that there are other people using it, is one of the main advantages.

People also like YAML because it appears to be relatively intuitive, and that's kind of what I wanted to talk about today. It's actually deceptively not as intuitive as you think. Or, said a different way, it's very intuitive, but the thing that you figured out through your intuition might be wrong.

I'll start, then, with something that I rarely start with, which is the wrong way to interpret YAML. I don't generally like to talk about the wrong way of doing something, because that generally just confuses people. But in this case, I kind of feel like it's important to get this out in the open, because when you look at YAML, you think that it looks more familiar than it actually is.

Here's what I mean. Let's say that you've got a YAML file, and you want to list those penguins. You might think, okay, well, I get the gist, I get the idea here. I do a dash and a space, and I put an item, and then when I want to (and remember, this is wrong) show a child-parent relationship, I indent and continue with my dashes. It's just like a bullet list, right? No, it's not. I'm telling you the wrong thing. I'm telling you lies right now. Don't listen to me. Don't internalize any of this. So you might think that YAML is sort of a structured bullet list, and as long as you've done a bullet list in your notebook, like on a scrap piece of paper when you're going out to do groceries or something, you might think that it's essentially the same thing. It is not, though.

If you were to do that, you would, for instance, make a YAML file that opened with dash space penguins, and then on the next line space space (so you're indenting now) dash space emperor, next line space space dash space gentoo, next line space space dash, and so on. So you're indenting things as you would if you were going out to the store. You would bullet point hardware store, and then under that you would put a little dash, maybe hammer, dash nails, dash screwdriver, dash screws, and so on. When you look at that, you think, okay, well, I see that the heading here is hardware store, and under that are all the things that I need to get from within the hardware store. And then the next heading, which I'll de-dent, will be the grocery store. And so then I'll put under that all of the things that I need within the grocery store, and so on. That's a very intuitive, natural way that most of us have learned how to list things. Heading, sub-item, sub-item, sub-item, heading, sub-item, sub-item, sub-item. That is not the structure of a YAML file, though. So let's talk about YAML correctly.

Let's talk about what it actually is, now that we know what it is not. Luckily, you'll be happy to know, there are two data structures in YAML. That's all there are. Now, you can embed those data structures within one another, but of the building blocks that you have for YAML files, there are only two.

For this exercise, or when working with YAML, you want to have two things available to you. One, you want yamllint. yamllint is, I think, a Python script, and it looks at a YAML file and tells you whether or not it is valid. You might be thinking, well, I can just throw my YAML file at my application, and if my application crashes, then I'll know that it's not valid, right? Or my parser, or whatever. But more important than a yes or no, yamllint gives you a really good description of what is wrong. Generally, it tells you exactly what's wrong with your YAML syntax. You can install it with pip install yamllint, and that's a double L there in the middle: YAML, L-I-N-T, yamllint. So install that. You'll do yourself that favor, you'll thank yourself later. You'll also need a text editor.

That's how you make YAML: you type it into a text editor. So that's pretty simple as well. Now, there's another thing that I tend to use, which you're welcome to use yourself. It is called yaml2json. This is a Python script that I wrote myself for my own use. It's online. I'll link to it in the show notes. You can use that one, or I imagine there must be half a dozen other ones out there online that you could find. But I do like to use this because with YAML, again, intuitively, you look at it, and it may look to you like it's one thing. But seeing it with delimiters, or rather scope characters, to define the scope of things sometimes changes how you see the data structure, or makes it a little bit more apparent. Now, that might just be me. It might just be what my eyes prefer to see, so that may or may not be important to you. But I do find it useful to have a YAML-to-JSON converter.

JSON is a subset of YAML, technically speaking. I mean, I don't know that the creators of JSON talked to the creators of YAML. I don't know what the overlap there is. But when looking at the structure of data serialization, I guess people generally consider JSON a subset of YAML. It is kind of wildly different when you are looking at it. Visually, it's quite different. But the two translate, or at least YAML translates to JSON, more or less naturally.

And certainly with the Python YAML library, it's literally just one method that you call, or function that you call, and then you have all your data in JSON. So you can compare the two and make sure that your logic, the way that you're thinking of your data, is reflected in the way that you have structured it in YAML.

Okay, let's talk YAML. So first of all, YAML has, as I said, two data structures. There are sequences and there are mappings. I guess I should really say those as singular: there is a sequence and there is a mapping. So let's talk about each one.

A sequence is exactly what it sounds like. It is a sequential list. It is a list of items in a sequence. These items are indicated to YAML by a dash followed by a space. So it's a dash, a space, and then some string, or some value, I should say; it doesn't have to be a string. That's a sequence. So in a YAML file, if you wanted to create a list, a sequence of things, you could do, for instance, let's do emacs list.yaml. Well, now that we're typing, I have to mention that the first line of a YAML file needs to be three dashes. That is what they call a YAML document delimiter. When YAML sees three dashes all on one line, it knows that this has started a new record, essentially. Okay, so we've got our three dashes on their own line, and then I'll hit return. And then we'll just do, like I say, a sequence. So we'll do dash space emperor, and then a new line, dash space gentoo, new line, dash space little blue.

And that's where we'll end it, sort of. Now, YAML really, really likes to see a newline character at the end of a record. So I'm going to not do that right now, just to prove to you how useful yamllint is. And then I'm going to run my new file through yamllint. That's yamllint space list.yaml.

And it gives me an error. It says no new line character at the end of file, new-line-at-end-of-file. So there, it's telling me exactly what the problem is. And it's giving me the opportunity to fix that. Well, it's not giving me the opportunity; it's refusing to continue until I fix it. So I'm going to open the file back up. I'm just going to hit return at the end, on the very last line. And then I'll run it through yamllint again. And it says that it's valid. So now I know that I've got good YAML. And that's the sequence. That's a list of items in a YAML document. That's valid YAML. Looking at it, honestly, it is pretty obvious what that is. We understand that that's a list. But if we wanted to see it in a different structure, just to really, really drive home this point, we can do a yaml2json list.yaml.

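A converter of this kind can be very small. The sketch below is not the yaml2json script from the show notes; it is a guess at the general shape, assuming the third-party PyYAML package (`yaml.safe_load`) is what does the parsing. The `yaml_to_json` name and the `SAMPLE` text are made up for this note, and the code falls back to a message if PyYAML is not installed.

```python
import json

try:
    import yaml  # PyYAML, a third-party package (an assumption, not confirmed by the episode)
except ImportError:
    yaml = None


def yaml_to_json(text):
    """Parse a single YAML document and return the same data as a JSON string."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    return json.dumps(yaml.safe_load(text))


# The three-item sequence typed into list.yaml in the episode.
SAMPLE = "---\n- emperor\n- gentoo\n- little blue\n"

if yaml is None:
    print("PyYAML not available; nothing to convert")
else:
    print(yaml_to_json(SAMPLE))
```

Printing the JSON form next to the YAML source is the whole point: the brackets and quotes make the boundaries of each element explicit.
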
Remember, this is that little YAML-to-JSON converter that I wrote. But like I say, there's probably a dozen others out there online, and I'll also link to this one. The output of this script shows my YAML in JSON format, and it kind of confirms what I said. This is a list, right? Well, as it happens, in JSON this looks, to my eyes, exactly like a Python list: square bracket, quote emperor close quote, comma, quote gentoo close quote, comma, quote little space blue close quote, close square bracket. There you go. It's a list.

Each item is a distinct element in this simple array. So if I were to take that data and pass it to something like Python or Java or Lua and say, hey, treat this as an array and give me the, I don't know, first element, then I am pretty confident that I would get gentoo back. If I asked for the zeroth element, I'm pretty confident I'd get emperor back, and so on. And that's what we want.

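As a small illustration of the indexing just described, here is the JSON form of the sequence loaded with Python's standard `json` module. The YAML text is kept alongside for reference; the optional cross-check at the end assumes the third-party PyYAML package and is skipped if it is not installed.

```python
import json

# The sequence from list.yaml, kept here as text for reference.
LIST_YAML = "---\n- emperor\n- gentoo\n- little blue\n"

# The JSON form read out in the episode.
LIST_JSON = '["emperor", "gentoo", "little blue"]'

penguins = json.loads(LIST_JSON)
print(penguins[0])    # zeroth element
print(penguins[1])    # first element
print(len(penguins))  # three distinct elements

try:
    import yaml  # PyYAML (third-party); used only if it happens to be installed
except ImportError:
    yaml = None
if yaml is not None:
    # Cross-check: parsing the YAML text should give the same list.
    print(yaml.safe_load(LIST_YAML) == penguins)
```
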
So now, just to prove to you that a list is very specific in how it can be structured in YAML, I'm going to second-guess ourselves. I'm going to go back into this list.yaml. I've got my three dashes. That's good. We'll keep that. We know that's necessary.

And now, this is wrong. So be alert that this is incorrect. I'm going to do a dash space penguins. So in my incorrect thinking, I'm pretending like this is my header, penguins. It's not. This isn't going to work. And then I'll go to the next line and indent emperor, gentoo, and little blue. So now I've got penguins on its own line, and then, indented, I've got dash space emperor, dash space gentoo, dash space little blue. I'm going to save that. We're going to run that through yamllint. It says it's valid YAML. So I guess we're good to go, right? I could end the demonstration here. Well, no, I can't. So I'm going to do yaml2json list.yaml again.

And now the data structure looks really different. I've got the square brackets and I've got the quotes. And inside the quotes, I've got penguins space dash space emperor space dash space gentoo space dash space little space blue. So now my array, my list, contains exactly one item, which is penguins, emperor, gentoo, little blue. That's the item that it contains, right? So if I passed this to a parser of some sort, or to a language of some sort, and said, hey, give me the first element of this array, it would tell me that there was an index error. There is no first element. If I asked it for the zeroth element of this array, it would return the entire string: penguins dash emperor dash gentoo dash little blue. So these are not distinct elements any longer. As far as YAML, and now JSON, as far as those two markup, not markup languages, data serializers, understand, this is all one element, and that is not what we want.

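The one-element result can be reproduced from the JSON that the converter is described as printing. The sketch below uses only the standard `json` module; the YAML text is included for reference, and the final cross-check assumes the third-party PyYAML package, so it only runs if that is installed.

```python
import json

# The "bullet list" layout: valid YAML, but not the structure it appears to be.
BAD_YAML = "---\n- penguins\n  - emperor\n  - gentoo\n  - little blue\n"

# The JSON form described in the episode: a list holding a single string.
BAD_JSON = '["penguins - emperor - gentoo - little blue"]'

items = json.loads(BAD_JSON)
print(len(items))
print(items[0])

try:
    items[1]
except IndexError as err:
    # There is no first element, only a zeroth one.
    print("IndexError:", err)

try:
    import yaml  # PyYAML (third-party); optional cross-check only
except ImportError:
    yaml = None
if yaml is not None:
    print(yaml.safe_load(BAD_YAML) == items)
```

The indented lines are treated as a continuation of the text after the first dash, which is why everything collapses into one string.
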
So a sequence is exactly a single, I guess, column of items, each delimited by a dash space. You cannot just indent things willy-nilly, as they say, in order to suggest, as your brain wants to do, a parent-child relationship between those items. That's not actually talking to YAML the way that you think it's talking to YAML. So that is one kind of data element, a sequence. And that's as simple as it gets. It's just a sequence of items with a dash space in front of each item. Each item is on its own line, and there is no indentation happening here.

So that seems a little bit too simple. Luckily, there's another kind of data, and it's called a mapping. So I'm going to do emacs map.yaml. Let's do that. And we open with three dashes. If you said that, then you have learned the first important stage of YAML. So that's good. Three dashes on their own line. And then we'll do something like, I don't know, penguin, and I'm just literally typing p-e-n-g-u-i-n, and then colon, emperor. And that's an element. Penguin, colon, emperor. That is a mapping element, according to YAML. It has a key, which is penguin, and a value, which is emperor.

Now, what if I went to the second line? Well, actually, let's stop there. So I'll go to the second line just to get that newline character. Remember, I said that YAML really likes those newline characters at the end. We'll run that through yamllint map.yaml, and I get no errors. We'll run it through my fancy little YAML-to-JSON conversion program, yaml2json map.yaml. And I get a JSON data structure back that is a brace with quote penguin close quote, colon, and then quote emperor close quote, close brace. So to me, that looks more or less like a Python dictionary, for instance. You've got your key and your value. It's a key and value pair. It's a pretty common structure in, well, certainly configuration files, but lots of different things use key and value pairs: databases, spreadsheets, and so on. That's pretty common. And that's what that gives you.

Now, let's explore a subtlety of YAML, which is arguably not necessarily related to this, but it's something that I want to bring up. So you've got three dashes, then penguin, colon, emperor, and then I'm going to do, on the next line, penguin, colon, gentoo, and then a blank line, or a carriage return, or whatever it's called, a newline character, at the end of that string. Now I'm going to pipe it through yamllint, and it tells me that there's an error: duplication of key penguin in mapping. So what that's saying is that there are two identical keys in the same record. Now, interestingly, if I pipe this through my little YAML-to-JSON converter, it actually doesn't error out. It just gives me the most recent key and value pair. So I pipe that through and it says penguin, gentoo. And I mean, that's not wrong. It's just that my emperor penguin got eaten.

So I'm going to go back into my map here. Remember, I said those three dashes are really important to YAML, and that they delineate these YAML documents. In YAML lingo, they're documents. I mean, they're in the same text file, so to you and me, it probably feels like, well, that is one document. But you can separate it for YAML with these three dashes so that you can have two documents in one file. So I've got three dashes, penguin, colon, emperor, three dashes, penguin, colon, gentoo, and then my newline character. Oh, and I think I actually have to close that with three dashes. Pretty darn sure that I have to do that. Pipe that through yamllint. I get no errors. And then I pipe it through my little converter and I get penguin emperor, penguin gentoo, and then null. So that's just kind of a point of order: in any YAML document, or if you want to think of them as records, you cannot validly double up on your keys. There must be unique keys within each record or each document.

Okay. So the point is, we've got this new mechanism available to us now, which is a word followed by a colon and then a space and then another word or another value. And this maps a value to a specific key. It's very much like assigning a variable in a programming language. You're just saying, well, here's a term that I'll use broadly to represent whatever I need to represent at any given time. So when we say penguin, sometimes we mean emperor, sometimes we mean gentoo, sometimes we mean little blue, whatever. And we can specify what kind of penguin by mapping that specific value to this generic term that we've chosen. It doesn't have to be penguin. But in this case, it makes sense that it is penguin, because that is what we're talking about. And descriptive keys certainly tend to be very advantageous. So we'll keep it at penguin, colon, space, emperor.

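The mapping, the duplicated key, and the two-document file can be sketched together with Python's standard `json` module. Two things here are choices made for this note rather than anything from the episode: Python's JSON parser stands in for the converter when showing that the last duplicate wins, and the two documents are modeled by hand as two dictionaries. The cross-check at the end assumes the third-party PyYAML package and is skipped if it is not installed.

```python
import json

# map.yaml from the episode, as JSON: one key, one value.
MAP_JSON = '{"penguin": "emperor"}'
record = json.loads(MAP_JSON)
print(record["penguin"])

# The same key twice in one record. Python's json module, like the converter
# in the episode, keeps only the most recent value.
DUP_JSON = '{"penguin": "emperor", "penguin": "gentoo"}'
dup = json.loads(DUP_JSON)
print(dup)

# Two documents in one file, each free to use the key "penguin".
DOCS_YAML = "---\npenguin: emperor\n---\npenguin: gentoo\n"
documents = [{"penguin": "emperor"}, {"penguin": "gentoo"}]
print([doc["penguin"] for doc in documents])

try:
    import yaml  # PyYAML (third-party); optional cross-check only
except ImportError:
    yaml = None
if yaml is not None:
    print(list(yaml.safe_load_all(DOCS_YAML)) == documents)
```
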
And that's a mapping. That's all you need to know about mappings, to be honest. Again, that's as complex as it gets. Now you know the two data types, the two building blocks of YAML. You know a sequence, which is a bunch of different lines with a dash and a space in front of each value, and you know a mapping, which is a key, a colon, and then a value.

You're done, practically. But of course, you're not really done, because these things are building blocks, meaning that you can combine them in new and interesting ways. So let's say that you do want to list those penguins again. You might have, for instance, in map.yaml, three dashes, and then we'll open it up with just the word penguin, colon. But then instead of giving just one value, emperor, what if we entered a sequence instead? Let's try that. So we're going to do penguin, colon, and then new line. So no value for this key. And then we're going to indent. I'm just going to do two spaces. You can actually use more or fewer spaces of indentation, but I think the convention seems to be two spaces, although, I don't know, maybe some Python people prefer four spaces or something. I'm not sure. But I'm going to do a space space. And then I'm going to do my dash space to open up a sequence, and I'm going to type in the word emperor. And then I'm going to go to the next line, which emacs automatically indents for me. So I'll just do dash space gentoo, next line, dash space little blue. So now, if you can picture it, I've got penguin, colon, and then an indented block that is a sequence. I'm terminating that with a newline character, of course. I'm going to run that through yamllint. I get no errors. And then I'm going to pass it through my JSON converter, just so that we can see it in a different context.

And this is exactly what I'd hoped for. So now I have a JSON entity here that, again, to my eyes, I would call a dictionary, from Python experience. So quote penguin close quote, colon, and then within this curly brace section, I have an embedded square bracket list: emperor, comma, gentoo, comma, little blue. So I have a dictionary element, penguin, colon, some value. But the value is a list of three different values: emperor, gentoo, and little blue. And those are all distinct.

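Here is the sequence-inside-a-mapping structure as a short Python sketch, using the standard `json` module on the JSON form described above. The YAML text is shown for reference; the optional cross-check assumes the third-party PyYAML package and is skipped if it is not installed.

```python
import json

# A sequence embedded as the value of a mapping key.
NESTED_YAML = "---\npenguin:\n  - emperor\n  - gentoo\n  - little blue\n"

# The JSON form described in the episode: a dictionary whose value is a list.
NESTED_JSON = '{"penguin": ["emperor", "gentoo", "little blue"]}'

data = json.loads(NESTED_JSON)
print(type(data["penguin"]).__name__)  # the value is a list
print(data["penguin"][2])              # zero in on one kind of penguin

try:
    import yaml  # PyYAML (third-party); optional cross-check only
except ImportError:
    yaml = None
if yaml is not None:
    print(yaml.safe_load(NESTED_YAML) == data)
```
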
So I could identify, again with some programming language of my choosing, the key of penguin, and then zero in on which kind of penguin I wanted from this list of all possible penguin types. And just to build on this example really quick, we could go back into this file and add another key. So we have penguin, and then the sequence of penguins: emperor, gentoo, little blue. We could also do, for instance, I don't know, demon, colon, space space dash space, and we could do Beastie. That's the name of the BSD guy, right? Beastie.

And then new line, space space dash space imp, and then space space dash space glabrezu. And then end that with a new line. Then we can do a yamllint on that, just to make sure that we're still valid. Yep, no problems there. And then run that through my little converter. And sure enough, we've got curly braces, quote penguin close quote, colon, and then a list of all the penguins, emperor, gentoo, little blue. And then a comma, and then a new key, quote demon close quote, colon, and then a list assigned to that key, with Beastie, imp, and glabrezu. So you can have different keys, each with its own little sequence embedded into it, all within the same YAML document. Now, if we did that with, for instance, penguin again, then we would have to make a new document, right? That would be a different data set. But the keys here are unique: penguin, demon, wildebeest. I don't know what kind of animal a wildebeest is. A gnu? Bovine? Cow? Are they cows, bulls? I'm not sure. Whatever. Shouldn't have chosen animals for my example set here.

The point, though, is that as long as you use unique keys, you can fill up your YAML document with whatever you need. You just can't have the same key appearing more than once.

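A sketch of the two-key document, again using the standard `json` module on the JSON form described above. The spellings of the demon names are an assumption based on the audio, and the final cross-check relies on the third-party PyYAML package, so it only runs if that is installed.

```python
import json

# Two unique keys in one document, each with its own sequence.
# The demon names are assumed spellings.
TWO_KEYS_YAML = (
    "---\n"
    "penguin:\n"
    "  - emperor\n"
    "  - gentoo\n"
    "  - little blue\n"
    "demon:\n"
    "  - Beastie\n"
    "  - imp\n"
    "  - glabrezu\n"
)

TWO_KEYS_JSON = (
    '{"penguin": ["emperor", "gentoo", "little blue"], '
    '"demon": ["Beastie", "imp", "glabrezu"]}'
)

data = json.loads(TWO_KEYS_JSON)
print(sorted(data))      # the unique keys in this document
print(data["demon"][0])  # each key has its own independent list

try:
    import yaml  # PyYAML (third-party); optional cross-check only
except ImportError:
    yaml = None
if yaml is not None:
    print(yaml.safe_load(TWO_KEYS_YAML) == data)
```
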
Okay, so now let's talk about embedding, I don't know, maps into a sequence. Let's try that, shall we? Yeah, why not? Let's try it: emacs map.yaml. This time, let's make our key just animal. And then we'll do a colon, new line. And then on the next... wait, what am I doing? I'm embedding maps into a sequence. Okay, got it. So I've got animal, colon, then space space dash space penguin, colon, emperor, then penguin, colon, gentoo, then penguin, colon, little blue. Close it with a newline character, of course, and then run it through yamllint. So before I hit return here on this yamllint, do you think that's going to work? Or do you think that's going to fail? So I had animal as my key. And then as the value, I had a sequence. And in each sequence item, I had the key penguin, followed by the type of penguin. There's no wrong answer. Well, there is a wrong answer, but you shouldn't fear. There's no penalty for having the wrong answer. Just think about it for a moment. It is a bit tricky.

Well, it turns out, if you run yamllint, it does not fail. It succeeds. That's some valid stuff there. And again, I feel like, I don't know, for my brain, YAML doesn't necessarily make it super obvious as to why. I mean, it's kind of obvious now, but when I was first grappling with YAML, it was very confusing to me. And it has everything to do with scope. When you're separating the different keys with those three dashes, you're scoping out where that key appears. You're saying, well, this key is valid here. And then I'm going to put three dashes, and now I can have that key again with a different value, and nobody cares, because we all understand those three dashes mean we're in a different document. Well, it's kind of similar in this setup, where we have animal as the key, but then we embed a list into that value. So the value of this key is its own list. Well, that's one thing, but to make it even more complex, within that list we've embedded mappings, which have their own little scope, because they're each in their own little item. So you've got penguin emperor enclosed in curly braces, you've got penguin gentoo enclosed by curly braces, you've got penguin little blue enclosed by curly braces.

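The scoping argument is easier to see with the structure written out. This sketch loads the JSON form with Python's standard `json` module; the YAML text is shown for reference, and the cross-check at the end assumes the third-party PyYAML package and is skipped if it is not installed.

```python
import json

# Mappings embedded in a sequence, which is itself the value of one key.
SCOPED_YAML = (
    "---\n"
    "animal:\n"
    "  - penguin: emperor\n"
    "  - penguin: gentoo\n"
    "  - penguin: little blue\n"
)

SCOPED_JSON = (
    '{"animal": [{"penguin": "emperor"}, '
    '{"penguin": "gentoo"}, {"penguin": "little blue"}]}'
)

data = json.loads(SCOPED_JSON)

# Each list item is its own small mapping, so the key "penguin" never collides.
for item in data["animal"]:
    print(item["penguin"])

try:
    import yaml  # PyYAML (third-party); optional cross-check only
except ImportError:
    yaml = None
if yaml is not None:
    print(yaml.safe_load(SCOPED_YAML) == data)
```

The key penguin appears three times, but each time inside a different pair of curly braces, so no single mapping holds it twice.
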
So the exact JSON of that, and you can try this on your own if you need to see it to kind of wrap
|
||
|
|
your mind around it, but it's curly brace, quote, animal, quote, quote, colon space square bracket
|
||
|
|
curly brace, quote, penguin, close quote, quote, colon, quote, emperor, quote, quote,
|
||
|
|
close curly brace, comma, and so on. Until you get to the very end, and then you close your
|
||
|
|
final curly brace, which would be one after little blue, you close your square bracket, which kind
|
||
|
|
of closes the value out for the animal, and then finally, close the whole thing, the whole document,
|
||
|
|
which is the curly brace. So I don't know how well that comes across through audio, but you can
|
||
|
|
try it on your own, and you can kind of see, you'll see why, oh yeah, okay, I get why those
|
||
|
|
keys can be distinct from one another in this setup while they couldn't be distinct in some other
|
||
|
|
setup. And that's this underscores, even if you don't sort of internalize what I just said,
|
||
|
|
this does underscore the flexibility of those simple two little building blocks that you can embed
|
||
|
|
in one another, and it completely changes sort of the structure or the scope of the data, the way
|
||
|
|
that the data relates to each other. In one format, you're only allowed to have one kind of penguin,
|
||
|
|
and then you embed it into a mapping and into a sequence, and suddenly you can have all the
|
||
|
|
penguins you want. It can be confusing to look at when you're looking at, for instance, an
|
||
|
|
Ansible Playbook, and you're looking at this thing, and you see name, name, name, name, you know,
|
||
|
|
you see all these these repeating keys, and you just think, why are these here? Well, they're
|
||
|
|
there because they they've been scoped into a different data element, and so they're not
|
||
|
|
interfering with one another. And that's a very important thing to realize when looking, especially
|
||
|
|
I find at Ansible Playbooks, because you do see a lot of what appears to be repetition in an
|
||
|
|
Ansible Playbook. And I think mentioning Ansible Playbooks is important in a way, because it also
|
||
|
|
exposes the fact that there really isn't a right or wrong way to structure
your valid YAML data. There is a wrong way to write YAML, and yamllint will tell you if
you've done that. But as long as it is valid, there isn't really a wrong structure. I mean, a data scientist
or rather an information scientist might very much argue with me on that concept, and I would
gladly concede, but in general, for your own purposes, I mean, there might be optimal ways,
right? But as long as it's valid YAML, you can structure your data practically any way that you
want as long as you have anticipated that in your parser, or as long as you account for it in your
parser. So again, going back to Ansible, which is what made me think of this, there are
certain structures in an Ansible Play that you look at, and you just think, well, wait a minute,
now that is neither a mapping nor a sequence. So why is that valid? And it kind of goes back to
that wrong example that I did where we were trying to map a list item to a sequence,
which is not possible, right? You can map a key to a sequence,
or a sequence to a key, I guess, but you can't just indent things whenever you want in hopes of
there being some kind of suggestion of inheritance. And the reason
that Ansible sometimes tends to allow that sort of thing is that the parser knows that that's
what it's going to get. So in other words, if I go back to the bad YAML example here, which I think
I called bad.yaml, if I go back to that and pipe it back through my converter, remember we got
this big huge chunk of a list that is one item: penguins, dash, emperor, dash, gentoo, dash,
little blue. Well, as long as my parser knows that I am going to feed it an array that contains one
item, but that that one item contains four strings separated by space, dash, space, then there's
no problem. So I'm saying this because I want you to know that there are no YAML police. They're
not going to come knocking on your door, making sure that your YAML makes the most
sense and is the most optimal and logical order of data that you could possibly have written.
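To make the bad.yaml point concrete, here is a sketch of what such a file might look like. The exact contents are an assumption reconstructed from the description above (a list item with no key, with more list items indented beneath it), not a copy of the file used in the episode.

```yaml
---
# Hypothetical bad.yaml: "penguins" is a list item, not a key, so the
# indented lines below it cannot become a nested sequence.
- penguins
  - emperor
  - gentoo
  - little blue

# A YAML parser should fold all four lines into one plain string,
# giving a one-item sequence, equivalent to:
#   ["penguins - emperor - gentoo - little blue"]
```

A parser that expects this shape could split that single string on space, dash, space to recover the four names, which is the kind of accommodation being described here.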
If you're writing YAML for your own data from scratch, and you're having to think about how you're
structuring it yourself, then go for it. Do whatever makes sense for you and then have your parser
process that accordingly. If you have to adapt it or change it later
because you've realized, well, that really shouldn't just be a list, there should be a parent item
there, then of course, because you know the two YAML building-block
data types, you'd know that, well, okay, if I want a heading item there, I need to make a mapping.
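As a rough sketch of that change, assuming the penguin names mentioned earlier: the first document below is a bare sequence, and the second adds a parent item by turning it into a mapping whose one key has the sequence as its value. The key name penguins is chosen for illustration.

```yaml
# Before: a bare sequence with no parent item.
---
- emperor
- gentoo
- little blue

# After: a mapping with one key; the indented sequence is its value.
---
penguins:
  - emperor
  - gentoo
  - little blue
```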
And in that mapping, there will be a list. So I need to indent that list under this mapping
that has a key and no value, or rather a key whose value is the sequence that is then
indented. That sort of thing. As long as you're making valid YAML choices, then your YAML is valid
and you will be able to parse it with lots and lots of different pre-written YAML libraries
without fear. As long as your parser knows how to then interpret what you're giving it, you are
good as gold. So that's it. That's YAML. I hope that really does help. YAML, like I say, can be
deceptively simple when you look at it. Then you start thinking about it and figuring out how to
recreate it and you realize you have no idea what the structure is. It looks like you do, but you
don't. No worries. Now you really do. Sequence, mapping, and several combinations of those. That's
all you need to know. Good luck. Have fun. Thanks for listening.
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded
by the Digital Dog Pound and the Infonomicon Computer Club and is part of the Binary Revolution
at binrev.com. If you have comments on today's show, please email the host directly, leave a comment
on the website or record a follow-up episode yourself. Unless otherwise stated, today's show is
released under the Creative Commons Attribution-ShareAlike 3.0 license.