|
|
Episode: 1291
|
||
|
|
Title: HPR1291: Parsing an ISO8601 formatted duration field with Perl
|
||
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1291/hpr1291.mp3
|
||
|
|
Transcribed: 2025-10-17 23:06:52
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
Hi everybody, my name is Ken Fallon and I'm joined by Dave Morris and we're going to
|
||
|
|
do a joint episode tonight. How are you, Dave? Hi Ken, I'm good, thank you. It's traditional
|
||
|
|
that we do some playful banter, so let's do playful banter. Okay, enough of that. That
|
||
|
|
was it. It's just hot and I have all the windows open so if you're here. Yeah, yes, pretty warm
|
||
|
|
here. Even for Edinburgh, it's, it's pretty amazing. Okay, we want to tackle the most boring
|
||
|
|
of topics. Well, to be honest, we're in the middle of the winter drought or the summer
|
||
|
|
drought or winter drought here in Australia. Here in HPR, there are no shows and I want
|
||
|
|
to ask Dave a load of questions about a Perl script that he has written for me and
|
||
|
|
Dave, you've been so kind as to write the script and to come on to talk to me about it. Okay,
|
||
|
|
so parse 8601. First of all, let's find out what 8601 is. It's an ISO 8601 and it is
|
||
|
|
a date format and if you go to xkcd.com/1179, link in the show notes, you will
|
||
|
|
find a humorous cartoon depicting that all dates should be 8601. And basically what
|
||
|
|
it is is a four-digit year, dash, two-digit month, dash, two-digit day of month,
|
||
|
|
the letter T, two-digit hour, colon, two-digit minute, colon, two-digit second. More or less,
|
||
|
|
there are some abbreviations that you can do and there will also be a link in the show
|
||
|
|
notes to the W3C specification for that and also for the Wikipedia page on that. So that's
|
||
|
|
the first thing. Any comments on that, Dave? No, no, no, the only comment I had was I've already
|
||
|
|
started on some show notes, so the definition of the two definitions of the spec we're
|
||
|
|
talking about tonight are in that. That's cool. But that is the date format which everybody
|
||
|
|
should be using and ironically enough, as I look in your Perl script, Dave Morris, pointing
|
||
|
|
at you, you are not using ISO 8601. I know, I wish to submit a patch. I'm actually, okay,
|
||
|
|
I'll give you my excuses now. The way that I write Perl scripts, I use a Vim plug-in, which is
|
||
|
|
whose name I've forgotten just for a moment, it's called some Perl thingy. And it has the capability
|
||
|
|
of pointing at the field and going click or typing in a bunch of funny characters and it will
|
||
|
|
then update the date. So I keep the created date and the revision date updated by that means,
|
||
|
|
and it does a more human legible but less standard format in there. So maybe I should nag the
|
||
|
|
the creator of this thing, say, how about we have a different date format? Actually, come
|
||
|
|
to think of it, there may well be a way of fiddling that under the configuration file. So,
|
||
|
|
I've not looked at that yet. Yes, the point I can ask, and this highlights exactly the issue that I have
|
||
|
|
with any other date format, other than 8601, is that we have 04/07/2013 as when it was created.
|
||
|
|
And if somebody from the States is looking at that, they'll go, okay, that's the 7th of March,
|
||
|
|
March, no April. No, April, yes. Yeah. So if you put the fields the other way round for a start,
|
||
|
|
everybody knows, oh, this is a four-digit year, therefore it's ISO, it's 8601.
|
||
|
|
Oh, I do understand the, whenever you're doing anything machine readable, then definitely 8601
|
||
|
|
is the way to go. Also, maybe a little bit of debate when it comes to human readable.
|
||
|
|
No, no, if human readable, I dismiss that argument. If people start using that format,
|
||
|
|
it's completely within a week or two, you're completely converted.
|
||
|
|
It's, and the right sense, it's the year, month, day. Okay, as I read it, I have the date on the
|
||
|
|
bottom right hand corner of my PC, and that's the format there. It gets very, very easy to read.
|
||
|
|
Plus, it also means if you're saving files or something, you can sort them by date without having
|
||
|
|
that. Oh, absolutely, yes. Yes, yes. Yes, yes. Yes, I have certainly done this for many years.
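The sorting point can be shown in a few lines of Perl. This is only a sketch: the file names are made up for illustration, and the timestamp layout is the common year-month-day, T, hour:minute:second form described earlier, shown here in UTC.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time as an ISO 8601 UTC timestamp,
# e.g. 2013-07-04T12:30:00Z
sub iso8601 {
    my ($epoch) = @_;
    return strftime( '%Y-%m-%dT%H:%M:%SZ', gmtime($epoch) );
}

# Hypothetical file names that start with an ISO 8601 date
my @files = (
    '2013-07-04_notes.txt',
    '2012-12-25_notes.txt',
    '2013-01-15_notes.txt',
);

# A plain string sort is also a date sort, because the most
# significant field (the year) comes first and every field is
# zero-padded to a fixed width.
my @sorted = sort @files;

print iso8601(0), "\n";
print "$_\n" for @sorted;
```

With day-first or month-first names the same plain sort would interleave the years, which is the problem being described.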
|
||
|
|
Yes. Okay, but that is beside the point. That is the format for the
|
||
|
|
dates, but there's such a thing as a duration, and that is also covered by the same specification,
|
||
|
|
and that duration follows a slightly different format. It kind of works in the same way,
|
||
|
|
but it is missing. It starts, first of all, with the letter P, and I believe it originally
|
||
|
|
started meaning period, but now it's just a start identifier. Followed by zero or more years,
|
||
|
|
zero or more months, zero or more weeks, zero or more days, followed by the T for time designator
|
||
|
|
if it's there, followed by zero or more hours, zero or more minutes, zero or more seconds. Anything you want
|
||
|
|
to say about that? The weeks thing I'm not sure about, where did you get that from? That's
|
||
|
|
from the Wikipedia page. Yeah, yeah. I saw it in the Wikipedia reference, but not in what I
|
||
|
|
took to be the original specification of the thing. So whether the original has been embellished a bit
|
||
|
|
to include weeks, I don't know. Do you know anything more about it? I think what has happened
|
||
|
|
here is that the ISO standards organization, being an international standards body, charges
|
||
|
|
hardcore money for their specs. So therefore, these specs in order to use them, you need to
|
||
|
|
purchase them. And they're very, very, very expensive. And what they do, smart as they are, is they,
|
||
|
|
if you buy one, then they will reference four others in this. And then the one that you actually
|
||
|
|
looking for might be four deep down. And we're not talking cheap, you know, they are six or
|
||
|
|
700 euros for a, for one of these specs. And what you were looking at was the W3C consortium
|
||
|
|
version, which is publicly available. And I think for all intents and purposes, that's
|
||
|
|
the closest thing we need to base it on because I think they also cover the XML specification as well.
|
||
|
|
So everything goes back to that and then they go back to the ISO. It's excellent. But that
|
||
|
|
particular one didn't seem to include a week spec that I could see anyway. Personally,
|
||
|
|
I don't like the idea of weeks anyway because it's real. I don't like the idea of months either
|
||
|
|
to be brutally honest with you. Well, a week is a more clear definition than a month.
|
||
|
|
It would be yes. A week would be more clear as well. There's a lot of wobbliness in this,
|
||
|
|
this whole business of date specs. I mean, duration specs because you know, this is a month long,
|
||
|
|
but which month? Yeah, exactly. Or a year long, but which year? Is it a leap year or not?
|
||
|
|
And it's all rather messy. So what I've seen myself in an application for this, which has been,
|
||
|
|
in my experience, people tend to use just multiples of days. So if they want to say a year,
|
||
|
|
they'll say 365 days. So many hours, so many minutes, so many seconds to avoid that
|
||
|
|
disambiguity, is that a word? Ambiguity, I would imagine. Yes. Yes. Yes. Turn that upside down.
|
||
|
|
And you avoid the ambiguity of that. They explicitly do it. However, if you're writing
|
||
|
|
specifications yourself at home and want to make sure that people are using this format,
|
||
|
|
duration format, you should also highlight in your documentation that you are forbidding the use
|
||
|
|
of months and possibly years, and possibly weeks as well. Yeah. Yeah. Okay. Okay. Yeah. Days, I think
|
||
|
|
I've put some comments in the script that I've written to that general effect already. So
|
||
|
|
just for the future readers to ponder. Yes. If anybody's interested enough to read it. Good.
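One way to enforce that restriction is sketched below. It is not the script under discussion: the function name `duration_to_seconds` and the decision to reject years, months and weeks outright are assumptions made for illustration. Days, hours, minutes and seconds have a fixed length, so they can be converted to seconds without ambiguity.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Convert a restricted ISO 8601 duration (days, hours, minutes and
# whole seconds only) to a number of seconds. Anything else, including
# years, months and weeks, is rejected by returning undef.
sub duration_to_seconds {
    my ($duration) = @_;

    # Insist on at least one number, so a bare "P" or "PT" is rejected
    return unless $duration =~ /\d/;

    return unless $duration =~ m{
        \A P
        (?: (\d+) D )?          # days
        (?: T
            (?: (\d+) H )?      # hours
            (?: (\d+) M )?      # minutes
            (?: (\d+) S )?      # seconds
        )?
        \z
    }x;

    # Fields that were left out come back undefined; treat them as zero
    my ( $d, $h, $m, $s ) = map { $_ // 0 } ( $1, $2, $3, $4 );
    return ( ( $d * 24 + $h ) * 60 + $m ) * 60 + $s;
}

say duration_to_seconds('P365D');         # a "year" written as days
say duration_to_seconds('P1DT2H3M4S');
say defined duration_to_seconds('P1M') ? 'accepted' : 'rejected';
```

Here `P1M` (one month) is refused rather than guessed at, which matches the advice to forbid the ambiguous units in your own documentation.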
|
||
|
|
Anyway, so you have a Perl script. Tell us about Perl, what it is. Well, Perl's a wonderful
|
||
|
|
language, which the origin of which I can't really remember. I'm afraid, but it goes back quite a
|
||
|
|
long way. And it is a language which was brought together from the ideas that already existed in
|
||
|
|
the Unix operating system. So the author looked at things like grep and awk and sed and thought,
|
||
|
|
oh, it would be nice if you had a language that incorporated them all. And he created
|
||
|
|
Perl. And there it was. So it's a strange... Larry Wall, yes indeed. He was a linguist, which is
|
||
|
|
an interesting position to start from. It has its, guess you call them strong advantages.
|
||
|
|
And others would call them idiosyncrasies, which have come about as a consequence of
|
||
|
|
the origins from a linguist. But it does explain a lot actually. And it kind of helps you
|
||
|
|
to remember it as you're learning. If you know that to start off, if you don't, it's
|
||
|
|
you'll be banging your head against the wall. Well, it's, yes. Anyway, that's Perl in a very
|
||
|
|
small nutshell, which is actually a book. Why? That's probably Randal. Randal Schwartz has
|
||
|
|
got his finger in there somewhere, I'm not sure. But yes. So it's the fact that I would like to
|
||
|
|
think that the fact that the the O'Reilly books use animal symbols on the front, animal logos
|
||
|
|
on the front. And the Perl one tends to be a camel. And a camel is traditionally an animal
|
||
|
|
that was created by committee. It gives you some indication of what you're in for when it comes
|
||
|
|
to Perl. However, Perl is a wonderful language and it's very, very powerful. Many people say,
|
||
|
|
oh, I hate Perl because every Perl script I ever look at is a mess and I can't make
|
||
|
|
head or tail of it. And but the same thing could apply to any language. I've seen
|
||
|
|
the many. Okay, any not, not any, but many. I've seen some pretty dire C programs and some appalling
|
||
|
|
Pascal programs and et cetera, et cetera. Python, perhaps, there's, they're more
|
||
|
|
trained. They're required. Yeah, yeah. But you know, it's my point really is that it's the writer
|
||
|
|
who you have to complain to rather than the language. Yes. And this is actually why we're having
|
||
|
|
this discussion. But before you go, before we go further, I actually want to recommend a,
|
||
|
|
if you're lobbed with some Perl scripts or have to do some Perl work yourself and want to get
|
||
|
|
off the ground pretty quick, I found Sams Teach Yourself Perl in 24 Hours, third edition, by
|
||
|
|
Clinton Pierce to be quite a very easy read just covers the basics and goes, doesn't go too deep
|
||
|
|
or try to be too smarmy and smart as certain books do. It's very down-to-earth and
|
||
|
|
nice examples, but that's a by-the-by. Good, good. But another hint perhaps is if you are
|
||
|
|
able to and you're allowed to, feeding the script through the perltidy utility is a damn good
|
||
|
|
thing to do because it takes it and formats it in a standardized style and makes it look a lot
|
||
|
|
prettier and more readable. I always do with mine. Okay, I fear greatly what would happen
|
||
|
|
what was that? We have somebody, somebody just joined the channel.
|
||
|
|
I'm just going to mute them if that's okay, sorry plumber user, but you're making
|
||
|
|
recording a show at the moment. So yes, but it's yeah, go on. You're going to deny my
|
||
|
|
recommendation of using perltidy, perhaps? No, no, not at all. I'm just very worried about what
|
||
|
|
would happen if I ran through some of my scripts. You see, I labored under the illusion that my
|
||
|
|
my Perl-fu was improving, but then I asked Dave here about a problem with some regular
|
||
|
|
expressions to do this ISO 8601 duration thing and then I realized how far from that goal I have
|
||
|
|
drifted. So I want to walk through your script and don't worry, I know everybody, this is relatively
|
||
|
|
short scripts, it won't be that painful. And hey, if it is that painful, the reason you're listening
|
||
|
|
to this show is because you didn't sit down on the mic and record the show and send it in. Yeah,
|
||
|
|
so don't complain to us. Thank you very much. Anyway, Dave, your script starts with hash
|
||
|
|
exclamation mark /usr/bin/perl. I think we get that: as in bash scripts, the first line defines
|
||
|
|
the interpreter. Now, then what you have is 28 lines of comments, which are boilerplate: the file,
|
||
|
|
what is called the usage, what is called the description options, some notes about just
|
||
|
|
author and copyright and stuff. So tell me, do you start off with that first or do you,
|
||
|
|
where does that come? Is that the last thing you do or the first thing you do?
|
||
|
|
Firstly, first thing, because I use this Perl plugin, whose name I've completely forgotten
|
||
|
|
number three. Oh, Perl-Support, it's called. I did talk about it in my episode about Vim plug-ins.
|
||
|
|
It's a simple matter to simply open up, open a window and say,
|
||
|
|
bung the standard template in there and for a program. And you get a script template like this,
|
||
|
|
comment template like this, I mean, and another thing is if you wish, because you can modify
|
||
|
|
the template to your own desires. Yeah, okay, yeah, perfect. And you, yeah, but you start off
|
||
|
|
them before how knowing what you're going to code and not just sit down and try and hack something
|
||
|
|
like I do. Well, okay, the real answer is at the point, point before I actually sit down with
|
||
|
|
an editor, I would probably try writing some stuff on the command line, because Perl can be
|
||
|
|
invoked from the command line. If you type at the command line, perl space minus e and then
|
||
|
|
open quote, single quote, preferably, the stuff you type between the single quotes is a little
|
||
|
|
Perl script. So if you're playing around with experimenting with the regular expressions, which
|
||
|
|
are the nastiest things to prepare, it's often a brilliant thing to do just to try out your ideas
|
||
|
|
through that means, you know. You with me? I get you. So, for example, when I was
|
||
|
|
designing the regular expression here, I did that using the perl minus e feature and then simply
|
||
|
|
put an example of a duration time spec into a variable and threw it at the regular expression
|
||
|
|
to see what happened. So it was a fair bit of 10, 15 minutes of playing around like that before,
|
||
|
|
before I actually resorted to editing something. Okay, so then everything else is just a wrapper
|
||
|
|
around the regular expression that you've written. Effectively, yes, yes. I just heed the regular
|
||
|
|
expression into the file and then start structuring some Perl around it and played with it from there.
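That kind of command-line experiment might look like the sketch below. The one-liner in the comment and the small script beneath it do the same thing; the sample duration and the deliberately simple pattern are made up for illustration and are not taken from the script being discussed.

```perl
#!/usr/bin/perl
# At the shell prompt the experiment can be typed as a one-liner:
#
#   perl -e '$d = "P1DT2H"; print "match\n" if $d =~ /^P\d+DT\d+H$/'
#
# Single quotes around the code stop the shell from touching the
# dollar signs. The same test written out as a small script:
use strict;
use warnings;

my $d       = 'P1DT2H';                          # sample duration to throw at the pattern
my $matched = $d =~ /^P\d+DT\d+H$/ ? 1 : 0;      # first rough attempt at a pattern

print $matched ? "match\n" : "no match\n";
```

Once the pattern behaves on the command line it can be pasted into a file and the rest of the script built around it.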
|
||
|
|
Now, we're as far as line, oops, line 30, which is use 5.010. Does that mean you must use that
|
||
|
|
version of Perl, 5.010? That means that one or greater. Okay. Yeah, that one will
|
||
|
|
be higher. So then the use strict and use warnings, I think those are related to stopping you
|
||
|
|
take shortcuts. Yes, you have to declare variables, strict means declare your variables before you use them.
|
||
|
|
Perl is very, very easy going about these things. I hate that personally, I'd much sooner declare
|
||
|
|
everything, be rigorous about it. And the warnings just, just enables the warning level.
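A small sketch of what those three lines do. The variable names are hypothetical, and the string eval is only there so the demonstration can show strict rejecting code without stopping the whole script.

```perl
#!/usr/bin/perl
use 5.010;       # require Perl 5.10 or later; also enables 'say'
use strict;      # variables must be declared (with 'my') before use
use warnings;    # report questionable constructs at run time

my $duration = 'PT5M';
say "Duration is $duration";

# Under strict, assigning to an undeclared variable is a compile-time
# error. Compiling the offending code inside a string eval traps that
# error in $@ instead of killing the script.
my $ok = eval q{ $undeclared = 1; 1 };
say 'strict rejected the undeclared variable' unless $ok;
```

Without `use strict` the assignment would quietly create a global variable, which is exactly the easy-going behaviour being complained about.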
|
||
|
|
Yeah, sorry. That's part of the standard template that I just used to create, create
|
||
|
|
Perl scripts. That's what it is. I suspected that as obviously it's a pure and wordable location.
|
||
|
|
I'm surprised you don't use Data::Dumper at all, or is your Perl-fu way beyond the use of it?
|
||
|
|
Well, I wouldn't, I don't want to load modules into a script that I'm not going to use.
|
||
|
|
If I was going to use it, yes, but I mean, I could add it as part of the template and comment it out,
|
||
|
|
but I don't really want the script to be loading stuff that it doesn't need.
|
||
|
|
If I plan, if I plan to use it for debugging, then yes, but not otherwise.
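For anyone who has not met it, this is roughly how Data::Dumper gets used while debugging. The hash contents are invented for the example; `Sortkeys` is a real Data::Dumper setting that makes the output order predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;    # core module, loaded only because we want to debug

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Hypothetical data structure we want to inspect
my %duration = ( hours => 1, minutes => 30 );

# Dumper takes references and returns a printable string
my $dump = Dumper( \%duration );
print $dump;
```

When the debugging is finished, the `use Data::Dumper` line and the `print` can be removed again so the script loads nothing it does not need.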
|
||
|
|
Now, we need to explain to people here, there's a whole load of additional modules that you can
|
||
|
|
load into Perl to basically do everything you want. And there's a site out there called CPAN,
|
||
|
|
which contains all these modules. So rather than reinventing the wheel, I think the general
|
||
|
|
rule of thumb is you should, you should not reinvent the wheel. You should go to CPAN and get
|
||
|
|
your stuff, bring it in and reuse that. Did I get that about right?
|
||
|
|
You did indeed, yes, it's one of the great strengths of Perl. There is a huge archive of
|
||
|
|
libraries and modules that you can include in a script. And I would be extremely surprised if you
|
||
|
|
couldn't find anything that you need already done out there. Yes, that's where you declare them
|
||
|
|
at the beginning of your script. So we're at line 34, so you're going to say my dollar duration,
|
||
|
|
and the dollar duration, the dollar stands for scalar. Scalar variable, yes, is like a,
|
||
|
|
and he, Larry Wall picked the dollar because it looked like an S, so you would know it was a
|
||
|
|
scalar. Did you know that? Yes, and he picked an at sign for an array because it looks like an
|
||
|
|
A, an at sign for array, and he picked a percent for a hash because it looks a bit like an H.
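The three sigils side by side, as a short sketch with made-up variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $duration = 'P1DT2H';                     # $ marks a scalar: one value
my @labels   = ( 'days', 'hours' );          # @ marks an array: an ordered list
my %value    = ( days => 1, hours => 2 );    # % marks a hash: key/value pairs

# A single element of an array or hash is itself a scalar,
# so it is written with a $ even though the container is @ or %.
printf "%s = %d\n", $labels[0], $value{ $labels[0] };
printf "%d labels for %s\n", scalar(@labels), $duration;
```

The switch to `$` when picking out one element is the part that most often trips up newcomers.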
|
||
|
|
I didn't know all of those. No, no, no. And that's usually as far as I get into Perl books,
|
||
|
|
because there's also another excellent cartoon, you know, like that I've seen for webcomics,
|
||
|
|
where it goes. Here are two ducks. Add one duck is three ducks. And then using the same logic,
|
||
|
|
logarithmic for n over x, y, z, you know, really complicated algorithm. The note underneath is,
|
||
|
|
this is, this is how most computer books are written. The, you're really simple first two paragraphs,
|
||
|
|
and then it dives really deeply in, which is why that Sams Teach Yourself Perl,
|
||
|
|
it's actually quite good. But only I digress. It's not part of, but I'm not called it.
|
||
|
|
It's true. What you said is true, though. Okay, so you just have some other stuff. My sign,
|
||
|
|
which is a plus or minus, or which, yeah, I guess you're going to do that. Now, I want to go
|
||
|
|
on to you declaring an array here of labels. Yep. And you're using this qw thing, which I have not used
|
||
|
|
ever before, believe it or not, seen it used. So years, months, days, hours, minutes, seconds,
|
||
|
|
and correct me if I'm wrong. What the qw does is puts quotes, comma, quotes, comma, quotes, comma,
|
||
|
|
quotes, comma, comma. Why have I not used that before? Am I completely thick?
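The two spellings below build the same list; `qw` splits its contents on whitespace and quotes each word. The array names are made up for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The long way: quote every word and separate them with commas
my @labels_long = ( 'years', 'months', 'days', 'hours', 'minutes', 'seconds' );

# The qw way: "quote words", split on whitespace
my @labels_short = qw( years months days hours minutes seconds );

# Joining each list with spaces shows they hold the same words
print "same list\n" if "@labels_long" eq "@labels_short";
```

One caution: `qw` does not interpolate variables and does not understand commas, so a comma typed inside it becomes part of a word.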
|
||
|
|
It's a, it's a wonderful shortcut. It's definitely something to be, to be used. I'd recommend
|
||
|
|
you do so. And, but you see, I think what happens with me is I start off coding without any,
|
||
|
|
just rambling, you know, pick a piece of code here or hack it there, or I need another variable.
|
||
|
|
I'll put it up, quotes, comma, quotes now. And then I need another one. I'll pick it up and just
|
||
|
|
put it up. You know, it starts off as being one. And then it expands. Yeah, I know. What you see
|
||
|
|
with my coding is I'm incredibly finicky and a control freak when it comes to this sort of stuff.
|
||
|
|
So I have to keep going over it and tidy it up and make it prettier. So you've seen the result
|
||
|
|
of much prettification here. Oh, so it's worth it though, because when you come back and look at it
|
||
|
|
later on, it's clearer and cleaner. And then you can probably hack lumps out of it and stick it
|
||
|
|
in another script easily. And there you are. You've got the result of your previous tidying already
|
||
|
|
applied to it. Yeah. And then actually this is, this is specifically really why I wanted to do
|
||
|
|
this show. It's not necessarily about Perl. It's about making Perl, you know, essentially this
|
||
|
|
is a, a beautiful script. I spend quite a lot of the last few weeks looking at not beautiful
|
||
|
|
scripts that I've written myself. And then Dave sent this thing over to me and I nearly was
|
||
|
|
I was very impressed and very happy to get it because it answered my question, and I was
|
||
|
|
actually nearly crying at how little time it took you to write it and how pretty and
|
||
|
|
useful it was at the end. Yeah, yeah, well, there you go. Let's see. We'll move on. So what
|
||
|
|
what you're doing is you're defining the labels and you're defining the fields which is going to
|
||
|
|
be interesting in a minute. And something new I see in this version is you're defining a hash
|
||
|
|
of ISO duration. It's it's not for any great nothing very important. It's merely me messing around
|
||
|
|
and showing how it might be nice to stick all this into a hash with labels on it. And if you look
|
||
|
|
a bit later on there's an example of how it's how it's done. So I was there they're just I was
|
||
|
|
going to write some notes around this and say here's how you do this if you ever need to do it.
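A common Perl idiom for sticking a list of values into a hash with labels on it is a hash slice. The sketch below assumes two parallel arrays; the names `@labels`, `@fields` and `%iso_duration` and the sample values are illustrative and may not match the actual script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @labels = qw( years months days hours minutes seconds );

# Hypothetical values, in the same order as the labels,
# as they might come out of a regular expression match
my @fields = ( 0, 0, 1, 2, 30, 0 );

# Hash slice: assign all six key/value pairs in one statement.
# Note the @ sigil, because a list of elements is being assigned.
my %iso_duration;
@iso_duration{@labels} = @fields;

# Walk the labels array so the output keeps a sensible order
for my $label (@labels) {
    printf "%-8s %s\n", $label, $iso_duration{$label};
}
```

After this, `$iso_duration{hours}` reads better than `$fields[3]` and does not break if a field is ever added in the middle.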
|
||
|
|
I prefer using hashes over over I will we'll come to we'll come to this. I just need to remember
|
||
|
|
to to talk about this. Okay, so then we go down line 30 skipping over the comments which are
|
||
|
|
really useful. We've got 35 and 36 now you're going to do two regular expressions here. And we're
|
||
|
|
specifically even though the reason for this script is that I was having trouble with this. Well
|
||
|
|
I consider a fairly complex regular expression. I don't know if you agree or not. It's moderately so
|
||
|
|
taken while he's down. Well you know you're if you if you delve into regular expression you'll find
|
||
|
|
some some that will just completely blow your head off. Yes you will. So this is this once you get
|
||
|
|
to it. It's actually not that hard to do. Okay, so correct me if I'm wrong the first one you're
|
||
|
|
checking for an integer which is backslash d plus. So is that zero or more, or one or
|
||
|
|
more? There has to be one or more: the backslash d means digit and the plus that follows it
|
||
|
|
means one or more. Okay hold on to that thought now because later on I need to check for a digit
|
||
|
|
with the years in front of it. Okay, so and you're looking for a fraction which you define as
|
||
|
|
any square bracket zero dash nine so square bracket which means any number zero to nine. Why not
|
||
|
|
use the backslash d then there? No particular reason actually. No particular reason why not. I'm not I
|
||
|
|
think that I know what it was. I was actually that was that was me just being being a bit silly
|
||
|
|
there because I was actually going to include the decimal point within the within square brackets.
|
||
|
|
So nought hyphen nine full stop would mean it's quite happy to accept the digits nought to nine and
|
||
|
|
a full stop. And then I thought oh no actually that's not very good because we need there to be
|
||
|
|
only one decimal point where that would have allowed any number of them. So so I changed it but
|
||
|
|
I didn't yeah that that should be tidied. That missed a tidying pass. One is only human of course.
|
||
|
|
Okay so you guys what we have now in mind because I've cracked us from people with
|
||
|
|
men's profile is slash D plus so meaning one or more digits. And then we have parentheses open
|
||
|
|
parentheses question mark colon or escape. So like backslash period or full stop backslash D plus
|
||
|
|
close parentheses question mark. Okay so digits plus and then in the in between the two brackets
|
||
|
|
is take that store that as a variable later yes. In between the parentheses use that as a variable.
|
||
|
|
No no no. Here's the issue here when you put parentheses around an regular expression or a part
|
||
|
|
of a regular expression what that means is you want to capture the stuff that matches that bit
|
||
|
|
of the regular expression. Yes. So something something when the expression is fired up we'll grab
|
||
|
|
piece of the text that's fed to it. But there are times when you want to enclose things and then
|
||
|
|
apply some function to them and you don't want your brackets to be capturing so they're called.
|
||
|
|
Oh yes. This this expression open parentheses question mark colon followed by stuff and the
|
||
|
|
close parentheses means I want you to bracket these so that they're a unit but not use the capturing
|
||
|
|
process. Yeah. And the reason for that is because in this particular expression we want to say
|
||
|
|
we're looking for a digit one or more digits followed by a full stop followed by one or more
|
||
|
|
digits or just a straight digit. I mean the decimal here could either be something with a decimal
|
||
|
|
point and some following digits or not. So we want to put a put the the non capturing parentheses
|
||
|
|
around the dot and the digits and the question mark after it means that it's optional. So it applies
|
||
|
|
to that whole bracketed expression. So so the reason you put it in brackets in the first place
|
||
|
|
right assuming that is removing this capturing thing which was what I was referring to put
|
||
|
|
them into a variable. So it says then the whole thing is backslash d plus, so one or more digits,
|
||
|
|
so that'd be the before the fraction part. And then they then it just says these three characters
|
||
|
|
open brackets question mark colon is just to say this is evaluate this is one unit.
|
||
|
|
Yes exactly. And then this escaped dot means don't use the dot as a match for any character,
|
||
|
|
treat it as a regular dot, followed by a backslash d plus, which is digits, where there has to be at least one.
|
||
|
|
Close the bracket. Close the question. Close the brackets meaning that evaluation. So what you're
|
||
|
|
saying in there is a dot with one or more digits would be required if I didn't have the final
|
||
|
|
question mark at the end. Yes. Yes. Okay. It allows that to be optional. Now people driving along
|
||
|
|
on the law more perhaps brain has just blown up because because of this we strongly advise that
|
||
|
|
you follow along with this in fact we should put this in the as the title of the show.
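Written down, the pattern being read aloud is one or more digits, optionally followed by a dot and one or more digits. The sketch below tries it against a few sample strings chosen for illustration; the variable name `$decimal` and the anchors are assumptions and may differ from the script itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \d+          one or more digits
# (?: ... )    group without capturing
# \.           a literal dot (unescaped, a dot matches any character)
# \d+          one or more digits after the dot
# ?            the whole bracketed group is optional
my $decimal = qr/\d+(?:\.\d+)?/;

# \A and \z anchor the test to the whole string
for my $text (qw( 5 12.5 .5 3. 1.2.3 )) {
    my $verdict = $text =~ /\A$decimal\z/ ? 'matches' : 'does not match';
    print "$text $verdict\n";
}

# Because the inner group is non-capturing, wrapping the pattern in
# ordinary parentheses yields exactly one captured value.
my @captured = '12.5' =~ /\A($decimal)\z/;
print scalar(@captured), " capture: $captured[0]\n";
```

Putting the dot inside a character class such as `[0-9.]+` would also accept `1.2.3`, which is the problem with the earlier idea that was abandoned.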
|
||
|
|
Well see now what I've done in the show notes that I've started drafting right yeah but
|
||
|
|
well two things two things to say here. Number one is the script itself is available for
|
||
|
|
for download it will it's up on Gitorious and the URL of it is in the in the show notes right so
|
||
|
|
you can grab your own copy of it and sit and gaze at it while you're listening to the show
|
||
|
|
if you want to follow this through. The other thing I've done is in the show notes I've actually
|
||
|
|
started on a little tutorial about how to how to knit a regular expression or what I do to knit
|
||
|
|
a regular expression anyway. So I'm trying to sort of explain step by step how I arrived at the
|
||
|
|
expression that's going to follow. How did I do it? Well a simple way to approach this sort of
|
||
|
|
stuff I always find is don't get too bogged down in the detail right we've got a thing here where
|
||
|
|
you've got to have a P starting expression then it's followed by one or more digits and a Y
|
||
|
|
but maybe not you don't have to put that in if all you want to do is express the thing in days
|
||
|
|
then you can put P number D and that will express the number of days you can omit that bit entirely
|
||
|
|
and just put the T portion and then number of hours or minutes or seconds so all of these things
|
||
|
|
are optional and as you start to think of all the possible ways of expressing that your brain
|
||
|
|
explodes so yes the way I did it was simply to write a simple regular expression that says
|
||
|
|
taking the an example of one of these duration things with all of the fields filled in how do you
|
||
|
|
write an expression to match it so my expression just went P some numbers Y
|
||
|
|
followed by some numbers M etc yeah it's hard to express in words it's why I've tried to write
|
||
|
|
some notes about it and then I went over that okay well from there I need to capture bits of it
|
||
|
|
because as it was it wasn't being captured so put some brackets around the bits that want captured
|
||
|
|
and then some of these bits are optional so put some of these special non-capturing brackets
|
||
|
|
around that lot and put a question mark on the end to make them optional and and so on and so
|
||
|
|
forth so gradually it was built up layer by layer you with me I am completely with you so say we
|
||
|
|
were to do this with something simpler like an email address and you would first of all start with
|
||
|
|
writing joe at example.com try and capture that and then you might approach that say it's a
|
||
|
|
string of letters followed by an at followed by a string of letters but hang on no it's more than
|
||
|
|
letters so stick some numbers in there as well in the in the in the brackets that you're using to
|
||
|
|
match it say and then you say oh no hang on you can also have hyphens in there so bang them in as
|
||
|
|
well oh you can also have dots in there put them in can you have a plus is that valid you're not
|
||
|
|
yes it is valid but by some people's definitions of mail addresses and so on put what was that
|
||
|
|
if you can't use that a lot yes yes it's valid it's valid but a lot of mail servers don't
|
||
|
|
don't accept it it used to be a standard I used to use it a lot because it's a great way of
|
||
|
|
putting variants on your address so if you sign up to a mailing list you put plus
|
||
|
|
something rather on the end of your your normal address and then you know it's come from the
|
||
|
|
mailing list and you can filter it easily on the basis of that but some mail servers Microsoft
|
||
|
|
Exchange particularly does not like it I think you have to do stuff to switch it on
|
||
|
|
okay that's a by the by yeah what you're saying is we take the complicated thing we break
|
||
|
|
it down start with something and then work up build your way up let well let me express it a
|
||
|
|
different way when I was when I was learning to to to write programs and I was actually teaching
|
||
|
|
I was running an evening class at an education evening class many years ago in Pascal and there
|
||
|
|
was a book that had come out at that time written by a couple of guys at Glasgow University
|
||
|
|
Computer Science Department and they were using the technique of stepwise refinement so you start
|
||
|
|
with your specification possibly on a bit of paper in the simplest possible form you know program read
|
||
|
|
some data do some stuff with it write an answer that's your program then you look at each
|
||
|
|
piece of it and you say well what does that actually mean and you then expand it into maybe now you
|
||
|
|
can actually start writing some statements in your in your language and and then you take each each
|
||
|
|
non expanded piece of that and and refine that so you're going through it step by step refining
|
||
|
|
it layer by layer it's it's it's not a it's not a well regarded methodology these days but I think
|
||
|
|
it should be personally because it's a method I use but yeah I end up with crap code and you end up
|
||
|
|
with beautiful code I guess I'm missing a step somewhere along well maybe maybe I polish more I
|
||
|
|
don't know. Yeah, that could be it. All right, um, let's continue on. So you've defined two
variables, which are actually how to do a regular expression on the integer and on a
fraction. One thing I should say, hold on, before you move on. Yep. These things are enclosed in
qr brackets. qr is another thing like qw, but it means this is a regular expression.
What it does is it causes the Perl interpreter to compile the regular expression at that point in
time, but you can then go and, uh, bandy this thing around and use it in other contexts. Um, we
could get into what's a compiled regular expression and what's not, but, uh, I'm not
even sure I could answer that very well at the moment. But, uh, let's leave it as a marker that
that's quite a powerful feature of Perl, and that's why I've used it there. Okay, cool. Uh, of course
that fact completely, uh, went over my head. There you go. So now you have the
real regular expression, where you're bringing all of these sub-parts of regular
expressions together. Um, you're wrapping this in the qr, which obviously is compiled.
Again, you might wonder why the hell I put it there, because the regular expression is to be
used in an if statement, because we're going to check things against it: you know, does it match this
regular expression or not? Well, the reason is that if you put it in the if expression it looks
absolutely nightmarish, I think. I really like to generate these things as compiled
statements somewhere else, where I can, you know, fiddle about with it and make it look pretty,
and then use it later on, possibly more than once, elsewhere in the script.
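A minimal sketch of the qr idea just described: sub-patterns are compiled once, combined into a larger pattern, and then used in more than one place. The variable names and the patterns themselves are assumptions made up for this illustration; they are not taken from the script being discussed.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns, each compiled once with qr//
my $int  = qr/\d+/;
my $frac = qr/\d+(?:[.,]\d+)?/;

# A compiled pattern can be interpolated into a larger one ...
my $seconds = qr/^($frac)S$/;

# ... and the result can be reused as often as needed
for my $text ( '42S', '1.5S', 'S' ) {
    if ( $text =~ $seconds ) {
        print "$text: seconds = $1\n";
    }
    else {
        print "$text: no match\n";
    }
}

print "2013 is an integer\n" if '2013' =~ /^$int$/;
```

Keeping the pattern in a variable also keeps the later if statement short, which is the readability point made above.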
Yes, it does actually look... and actually that explains one of the questions,
because you have, uh, well, we'll get to it later on, um, how you're using it, how you were able to
avoid using it. Um, okay, cool, no, that explains that. Okay, now within this one you have open bracket,
question mark, x, close bracket. What's that? Well, Perl regular expressions have the capability of
lots of extra gubbins, which is, um, not available in many other regular
expression environments, uh, which you can put within the expression. So they're sort of
pragmas and, uh, extensions and so forth. Um, with simpler regular expressions you
normally put slashes around your regular expression, um, so slash, some expression, slash,
which can be followed by modifiers. There are a lot of modifiers. One of them is i,
which means for all of that regular expression, don't care about the case of anything you match in it.
Uh, the one that I'm using here is x, which means the regular expression can be formatted with spaces
in it, comments in it, laid out with newlines in it, and generally made to look prettier. You
can either stick that at the end, after the close, um, curly bracket, or you can stick it at the front
in this format, using the fancier version. I love this, because I think
this regular expression would actually give somebody a heart attack if they looked at it
if it was all on one line. Yes, it would definitely. You would have
a hell of a job to work out where one sub-expression began and the next ended and so forth.
So for those not following along reading the script, there are 11 lines of this regular expression,
which includes variables that we have already defined previously, which we have discussed
before, and each of those lines is kind of laid out in a nice column. So, tabbed in, you have
the regular expression, tab, the comment, and exactly what each part of the regular expression
does. So it's really, really tidy, really, really easy to read, and I wish more people would do this.
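To illustrate the x modifier, here is one small pattern written three ways: on a single line, with a trailing x, and with the inline (?x) form at the front. The pattern is only an example invented for this sketch and is much shorter than the 11-line expression in the script.

```perl
use strict;
use warnings;

# All on one line
my $compact = qr/^([+-])?P(\d+)D$/;

# Trailing x modifier: whitespace and comments inside the pattern are ignored
my $trailing = qr{
    ^([+-])?    # optional sign at the start of the string
    P           # the literal letter P
    (\d+)D      # a number of days
    $
}x;

# Inline (?x) at the front of the pattern has the same effect
my $inline = qr{(?x)
    ^([+-])?    # optional sign
    P           # literal P
    (\d+)D      # days
    $
};

for my $re ( $compact, $trailing, $inline ) {
    print "matched, days = $2\n" if '-P12D' =~ $re;
}
```

All three forms match the same strings; only the layout differs.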
So that's line 54. Line 55 is, uh, caret, open bracket, open square bracket, plus, minus,
close square bracket, question mark, close bracket. Now, let me assume: a string begins with the
optional sign. So the, uh, caret is start of line, correct? Yeah. And I think that's the correct
word for, like, the Chinese-hat-type symbol. And then the square brackets hold a plus and a minus symbol,
followed by a question mark, and the question mark means it may be there or not. One or, oh, zero?
One or zero, yeah, of the preceding expression. Yeah, got you. And you're enclosing that in a regular
bracket, and there's nothing special about those? Those are capturing brackets. Those are capturing, so that
means now the percent one, it is percent one, isn't it? Uh, yes. In this case this is captured, and
it must be a magic variable, because I'm not assigning a variable name to it.
Oh, Perl is very flexible. The aphorism that goes with Perl is T, W, A, M, I... I can't
remember what it is. "There's more than one way to do it", whatever the initial letters of that are.
"Tim Toady" or something, people pronounce it. Um, there are always multiple ways of
achieving a thing in Perl. Um, this particular one is, um, well, when
you're capturing elements in a regular expression, uh, yes, you're quite right, they go somewhere. Um,
they can go into what you could effectively call magical variables that are sort of behind the
scenes, that you can access after you've applied the regular expression to the string
that you're working with. Um, but you can also make it return a list, um, a list of items, which you
then assign to a list of variables or to an array. Um, a bit later on, in the if, this is what it's
going to do: it uses the assign-to-list type of, uh... And this is kind of what confused me
earlier on, because I was using the percent one, percent two, percent three, which isn't really handy if
you need to add in another line that you missed, because then you need to change the numbers.
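The two ways of getting at captured text can be put side by side. In Perl source the numbered capture variables are written with a dollar sign, $1, $2 and so on (they are spoken of as "percent one" above). The input string and pattern below are invented for the illustration.

```perl
use strict;
use warnings;

my $text = '12:34';                  # example input, not from the script
my $re   = qr/^(\d+):(\d+)$/;

# Way 1: numbered capture variables, set as a side effect of a successful match
if ( $text =~ $re ) {
    print "numbered: $1 and $2\n";
}

# Way 2: a match in list context returns the captures, so they can be
# assigned straight to named variables
if ( my ( $hours, $minutes ) = $text =~ $re ) {
    print "named: $hours and $minutes\n";
}
```

With the second form, adding another capture group means adding another name to the list rather than renumbering $1, $2, $3 throughout the code.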
Yes, yes, yes, it's better not to do that if you can help it. So assign directly to variable names
that you know, or an array? Yes. Okay. That's by far the best way of doing it, is
it not? Okay, all right. So the next line is an easy one: it's got the letter P. Now, I don't know if
that P should be... I've always seen it as an uppercase P, but I would need to read the spec to see
whether it would be. Yes, I don't know. I assumed all these letters were, uh,
uppercase. And if they weren't, would you put it... you would not put it in the bracket because you
don't want to capture it, you'd just follow it up with an i? Um, you can. How would
it make sense? Instead, you can do that thing where you put brackets around it with a question mark
and an i, uh, I think not followed by anything. I can't remember now what follows the
modifier. Yeah. Or you could simply modify the whole string, because you probably,
if you don't care about the case of that, you probably don't care about the case of
anything else. But yeah, you'd probably ignore case in the whole string, the whole
regular expression. Yeah, true. No, okay, cool, cool. But this is another example of, meh,
let me check the spec; do it step by step by step, as you were saying. Yes, yes. Okay, yes. And then we
go on to the next line, of which, one, two, three, four, five, five lines are very similar,
so don't worry, folks. The open parenthesis, colon, sorry, question mark, colon: what did we say that
was? Non-capture? This is an enclosure, but a non-capturing enclosure. Non-capturing
parentheses, okay. Then we have, following on, two parentheses surrounding the int, which was the
backslash d plus from before, which we compiled as a regular expression. Yep. Yeah. Um, so, what do you say,
in there, and then followed by the letter Y, close parenthesis, question mark. So the non-capturing
bit is the Y and the capturing bit is the int for year. Yes, yes. And the exact same lines are repeated
for months, days, hours and minutes, and in there we have... One thing you've missed, though:
you assumed that the last question mark was somehow balancing the first one.
Well, it isn't, I'm afraid. It's to make it optional. It makes the whole of that expression,
from the first parenthesis to the last, optional, because your, um, yes, duration might
omit the year spec completely. Absolutely. Yes, yes, correct. So a non-capturing,
uh, parenthesis is just terminated with a regular parenthesis, without anything. Yes, yes. Perfect.
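The shape of one of those lines, a non-capturing group that is optional as a whole with a capturing group inside it, might look like the sketch below. The definition of $int and the two-component pattern are assumptions made for the example; the real expression has more components.

```perl
use strict;
use warnings;

my $int = qr/\d+/;                   # assumed definition of the integer sub-pattern

# (?: ... )? groups "number followed by a letter" and makes the whole group
# optional; only the inner ( ... ) captures anything
my $re = qr/^P(?:($int)Y)?(?:($int)M)?$/;

for my $text ( 'P3Y6M', 'P6M', 'P' ) {
    my ( $years, $months ) = $text =~ $re;
    printf "%-6s years=%s months=%s\n", $text,
        ( defined $years  ? $years  : 'undef' ),
        ( defined $months ? $months : 'undef' );
}
```

The letter Y is matched but never captured, and a duration with no year part still matches, leaving the year capture undefined.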
Now you're doing a, uh, funky one here for the time, because you're encasing the
whole time thing in non-capturing parentheses as well. Yes, with an optional
question mark, because the whole, yes, the whole time thing might not exist,
and then if it does, then there must be a T to start it. Absolutely. Yeah, yeah, okay, I get that.
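The nesting described for the time part can be sketched as follows: an outer optional non-capturing group that must begin with T, with the optional time components inside it. This is a cut-down stand-in written for illustration, with assumed sub-patterns, and should not be read as the script's actual expression.

```perl
use strict;
use warnings;

my $int  = qr/\d+/;                      # assumed sub-patterns
my $frac = qr/\d+(?:[.,]\d+)?/;

my $re = qr{
    ^P
    (?:($int)D)?          # days, optional
    (?:T                  # if there is a time part it must start with T
        (?:($int)H)?      # hours
        (?:($int)M)?      # minutes
        (?:($frac)S)?     # seconds, which may carry a fraction
    )?                    # the whole time part is optional
    $
}x;

for my $text ( 'P1DT2H', 'PT1.5S', 'P2D', 'P1D2H' ) {
    if ( my @parts = $text =~ $re ) {
        print "$text: ", join( ',', map { defined $_ ? $_ : '-' } @parts ), "\n";
    }
    else {
        print "$text: not a duration\n";
    }
}
```

The last sample, P1D2H, is rejected because hours appear without the leading T.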
And then the fraction, the fraction regular expression, is just before the seconds, and that is more
to do with the specification than all of this. Okay, perfect. Now, halfway through the script, folks. Hope
we haven't put all those truck drivers to sleep. Wake up, wake up! Okay, now we're at line 70, if you're
following along with the riveting news here on HPR, and we're two minutes overdue for
the community news to start, but ah sure, we'll plough on ahead, nobody's online. Okay, then we have duration
equals shift. Right. Now, shift means something completely different, huh? I won't ask. No one
asks. Anything from a peck on the cheek to a snog. All right, okay, there you go. That's
the best definition I heard, from a good friend of ours. So this is obviously taking it from
where? Now, the argument string. Yep. Where did the arguments... how do we happen to find arguments? Because
they magically appear out of the door. Well, when the script runs, as with anything that you run from
the command line in a Linux or Unix system, you can follow your invocation with any number of
arguments, which are parsed by the shell and fed to the script or the program or whatever it
is. So this is the Perl mechanism for saying, get me the first one of these things. They're
presented as an array to the script. And this is where there's more than one way to do it,
because you've got that dollar ARG, capital A, R, G. Yep. Which I have been using because I'm an idiot,
apparently. Well, it's another way to do it. There are many ways of doing things
in Perl. This is the simple to look at, complex to explain method of doing it.
But yes, in this context, the top level of the script, shift means get the arguments
from the command line. And Perl is very much like that: it's very much
dependent on the context. Yes, because if we were doing a loop through an array, for instance, the
shift then will refer to the array as opposed to the arguments, correct? It can do, yes, and it can also
mean different things in a subroutine. Yes, which is why, I think, I mean, the
equivalent with the ARG dollar, because then it was very clear to me. Yes, well, I wouldn't argue with
you, because to me that's a sort of shorthand, and it's very meaningful to me because
I've done this so many times. But when I was an early Perl programmer I wouldn't have done
that, probably. I'd just have thought, nah, I'm going to be puzzled by what the hell that means when it comes
to rereading it. Well, what would concern me more is that if I put it in the wrong place, then
suddenly I'm shifting arguments from the outer program as opposed to the loop that I'm
going through. Or the loop, I then pick it up and put it into a subroutine because I want to make
it clear, and then suddenly the context of shift changes. Well, it's actually quite logical:
shift at the top level means shift from the argument array to the program; shift in a subroutine
means shift from the argument array given to the subroutine; shift with an argument itself,
an array argument itself, means shift off this array. Okay. It's pretty logical. Yes, yes,
to me anyway. Okay, folks, just as a reminder here that you can send in your own
programs and talk about them for hours if you wish as well. Anyway, you were going to say?
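The three meanings of shift listed above can be shown in one short sketch. In Perl the command-line arguments live in the array @ARGV, whose first element is $ARGV[0] (the "dollar ARG" form mentioned earlier). The sample arguments and the subroutine name are made up for the demonstration.

```perl
use strict;
use warnings;

# At the top level of a script, shift with no argument takes from @ARGV
@ARGV = ( 'P1D', 'PT2H' ) unless @ARGV;    # sample arguments for the demonstration
my $first = shift;                         # same as: shift @ARGV
print "first argument: $first\n";

# Inside a subroutine, shift with no argument takes from @_
sub describe {
    my $item = shift;                      # same as: shift @_
    return "got $item";
}
print describe('hello'), "\n";

# With an explicit array, shift takes from that array
my @queue = ( 'a', 'b', 'c' );
my $head  = shift @queue;
print "head=$head rest=@queue\n";
```

Writing the array explicitly, as in shift @ARGV, is one way to avoid the surprise described above when code is moved into a subroutine.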
Oh, I was really only going to say that this while loop is there so that you can feed the script a whole
bunch of these expressions and parse them all, rather than fire it up multiple times with
one at a time. I did not know that was why. Yes. The, would have been, under... This is actually a very,
very clean and elegant way of getting parameters through, I must say. Yes, money pennies and how do you...
Yes, the other thing is, along with it there's a file of example expressions, so all you
need to do is to put a cat statement on the command line, with them, as a command, so that
the contents of that file are simply offered as a command line to the script when it runs.
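A loop of the kind described, taking one argument at a time until none are left, could be sketched like this. The sample arguments are placeholders, and the defined() test is a choice made for this sketch; the script's own loop condition is not quoted in the discussion.

```perl
use strict;
use warnings;

# Sample arguments so the sketch runs on its own
@ARGV = ( 'P1Y', 'PT30M', 'P2DT3H' ) unless @ARGV;

my @seen;

# shift returns undef once @ARGV is empty, which ends the loop
while ( defined( my $duration = shift ) ) {
    print "parsing $duration\n";
    push @seen, $duration;
}
```

Feeding a file of examples to such a script uses the shell's command substitution, along the lines of `./parse_durations $(cat examples.txt)`, where both the script name and the file name are hypothetical.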
So it's a pretty common Bash convention. Yes. And if you want more information on the cat command, you
should go back to Dann Washko's Linux in the Shell series here. Okay: print duration. Nothing
special here, the variable duration, which will print whatever the first parameter, in this case,
goes... we'll run on to it the first time. Then we skip over, and then we get to the good
bit, the complicated if: open bracket, space, open bracket, dollar sign, comma, at fields, close bracket, equals,
open bracket, duration, space, equals tilde, dollar re, for regular expression, close bracket, close bracket,
open squiggly bracket. I don't know what that is. Open squiggly bracket for the if statement, yeah. Now,
question number one here is the sign. How do I know that, if the sign is optional, you said,
so will it not then... Okay, well, I want to explain to people, I want to ask you. This is my assumption:
starting over on the right here, you have duration, equals tilde, regular expression, and the
equals tilde is the format saying this is a regular expression, so all that junk,
perform it on the variable duration, which is the first argument, because we just shifted it. Correct?
Yeah, that's right. It's applying the regular expression to the contents of duration.
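Written out, the if line read aloud above has roughly the following shape. The pattern here is a cut-down stand-in invented for the sketch; only the structure of the assignment follows the description.

```perl
use strict;
use warnings;

# Cut-down stand-in for the real pattern: sign, years, months
my $re = qr/^([+-])?P(?:(\d+)Y)?(?:(\d+)M)?$/;

for my $duration ( '-P1Y2M', 'P3M', 'rubbish' ) {
    my ( $sign, @fields );

    # The match runs in list context and returns one item per capture group;
    # the first goes into $sign and the rest into @fields
    if ( ( $sign, @fields ) = ( $duration =~ $re ) ) {
        my @shown = map { defined $_ ? $_ : 'undef' } $sign, @fields;
        print "$duration -> @shown\n";
    }
    else {
        print "$duration -> no match\n";
    }
}
```

A failed match returns an empty list, so the assignment counts zero items and the if takes the else branch.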
Okay, and those two parentheses surrounding those explode that out to being one, one...
what's the word, one, quote, comma. So if we had one year, one month, one day, T, one hour, one
minute, one second, it would be one, comma, one, comma, one, comma, one, comma, one, and that
would pipe back into the other side of the equals sign. Correct or not? Yes, yes, essentially correct,
yeah. That expression, the regular expression application there, is operating in what Perl calls
list context. The fact it's in a bracket means: I'm a list, give me an answer back as a list. So it'll
come back as a list. Forget the commas, because a list is a sort of entity of items in a row,
if you like, a stack or something. Would it be an array? Yes, effectively, yes. And the
equals that precedes that: so effectively you've got two list expressions. You've got a bracketed
thing with the regular expression stuff in it, and you've got a bracketed thing before that with
variables in it. So it says: on the right-hand side generate a list, and on the left-hand side
stuff the results of that list generation into these variables. Okay, so the first item from the list
goes into sign; anything left goes into at fields, which is the array. So you're filling
a whole set of things together. And why did I split them? For people... you and I know. I just want
to make an explanation here for people who are following along. I don't know why you'd get this far
if you're not into programming, and I don't know why you'd get this far if you are into programming, but
hey, it's our show, we can do what we want. We live on the edge. An array would be a bit like a
chest of drawers, I imagine. So, you know, you put a value into one drawer, and then underneath
there's another one, underneath there's another one, underneath there's another. That's just how I
like to think of it. No, perfectly fine concept, yes. Okay. Or a stack of pigeonholes, or however you like.
A stack of pigeonholes? Not really, unless it's a two-dimensional array. Correct, yes. Yeah, picked up
something, don't know what it was, but I've picked it up, it's on the underside of my shoe. Okay, now,
have I found a bug in your program? Because on one hand you're going to do all this funky
stuff, so you're going to get all these values, multiple values, and then you're going to put them
onto the other side, and the first one you're going to assume is a sign. Yeah. And just as I
said it out loud, I realized that even if these things are optional, it still returns an empty
position. It returns the "don't know" value in Perl, which is called undefined.
So therefore the size of this chest of drawers is already defined by
the regular expression, by the number of capturing elements, correct, in the regular
expression? Yes. It almost sounds like I know what I'm doing, and, you know, for a moment there I have.
But the beauty of this is I recorded it, and I can prove to myself later on I didn't understand
at the time. All right, so sign you just want to put into a separate one itself, rather than
in fields? Yes, yes, because it's easier. It's not a field of a date, it's
a separate entity. Yeah, okay. And now you're doing a really cool Perl thing, which I must say
I did a little happy dance about. I knew about it, but I did a happy dance when it finally hit home.
And it is: sign, so the variable sign, is equal to the plus character unless sign,
which means if the sign isn't filled in, or is empty or undefined, then you make it a plus
by default. Yep, that's right. It is beautiful. So it defaults to plus. Yeah. And this again, just this
formatting of unless: something, do something, or die unless blah. It's kind of the opposite of the if.
It's another example of this Perl "do it many different ways", and it's also a linguist's
view of the world, I always think, because, you know, in English you say if and you also say unless,
so it makes sense that the programming language would do that as well. Yeah, but then I have the,
I struggle then with my Perl, you know: pick a way of doing it and be consistent, with
consistency of coding, because sometimes I will do, you know, for the same loop I will do it with an if
statement and otherwise, and sometimes I'll do it with an unless statement. Yeah, what I'm
tending to do now myself is I use an unless if it's a very simple one-liner, and I use an if
if there are going to be multiple options. Well, I tend to use the rule that if
it makes sense, if it's meaningful, then use the unless. You know, if you say "unless some test then
do something", sometimes that's more meaningful than "if not some test then do something". It
depends on the context, I think. But the fact you've got both options, optional
ways of achieving it, is one of the things I like about Perl. Yeah, but, okay, so
essentially what we're doing here is checking if a variable is defined or not, and yes, quite
often when I'm looking at data I'm checking to see whether it's defined or not. If it's defined I want
to do something; if it's not defined I also want to do something. But with that unless thing you can't
go into four different lines of things, it's one single line. You can, you can.
You can use it instead of an if, just your if test is reversed. You put it at the front.
Yeah, yeah, yeah. So you can say unless, brackets, dollar x equals one, equals equals one,
and that's the same as if it's not. Yeah, but can you then, therefore, like, if I want to
set x equal to one, I want to print out something, I want to run another, oh, I don't know,
subroutine, I want to do something else, which you can do with an if statement because you enclose
it in the curly brackets? Yes, same with the unless. Oh, but I haven't seen people do that. Well, I
suppose a lot of people don't like it. I personally find that quite fun, but it depends on the
context. You would only use it, I think, if it was appropriate. You know, if you've got
a variable that you're using as a Boolean flag, where, you know, something like
"unless light is on then put the light on" or something. If it reads well in a sort of
garbled English form, then sometimes that's more meaningful, I believe. Okay, yeah, but the only
thing that I fear is that when you're reading it, if it goes to multiple sentences and then the
unless is at the end, you're assuming that it's doing all these things, when in actual fact, oh yeah,
sorry about that, this is a negative statement. Oh yeah, it's true, it's true. Yes, the traps of
booleans, and the nots and the ands and the ors of it, are still there regardless.
Yeah, so you have to tread carefully. Yeah, you will have a bracket of code and it's only
at the end you find out that it's not an if statement, it's an unless statement.
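The forms of unless discussed above, the one-line statement modifier and the block form, are shown below next to an equivalent if. The flag and counter variables are invented for the illustration.

```perl
use strict;
use warnings;

my $sign;                                # undefined, as after an unmatched optional capture

# Statement-modifier form: the assignment runs only when $sign is false
$sign = '+' unless $sign;
print "sign is $sign\n";

# Block form: unless can guard several statements, like an if with the test negated
my $light_is_on = 0;
unless ($light_is_on) {
    print "switching the light on\n";
    $light_is_on = 1;
}

# The equivalent written with if and an explicit negation
my $count = 0;
if ( !$count ) {
    $count = 1;
}

print "light=$light_is_on count=$count\n";
```

In the block form the word unless comes first, so the negative sense is visible before the body; the reading hazard mentioned above applies to the trailing modifier form.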
Hmm, because on the other side... Okay, fine, we've spent enough time. Now the next question, and this is
the next line, and correct me if I'm wrong here: you're saying fields equals
a whole load of stuff, which we'll talk about later, then fields. The whole load of stuff which we'll
talk about later is the map command, which is between curly braces, and then in between that we have
two other brackets, which have got the word defined, open bracket, dollar underscore,
close bracket, question mark, dollar underscore, colon, zero, close bracket. Yeah. So the whole thing is:
the array fields is equal to the mapping of fields, wherein you're doing a check on each
of the fields to see if they're defined, and if not you set them to zero. Correct? Yes, it's forcing
everything that's undefined to be zero, because you actually want to
display these things, and displaying undefined values causes problems. Yeah, and it's
a lot clearer, yes. But why not just use the unless, or not the unless, but, you know,
the implicit statement, like we had before: if sign, if fields, if just the variable name,
then that just checks to see if the variable exists or not. Why use the defined?
Um, just a convention really, just in the deprecase, just, did you know that...
I don't know whether it's because I will then read... Oh, by the way, the dollar underscore is this
magic, I think, notepad that people are allowed to use, the kind of chocolate, temporary chocolate,
where you can throw a variable without... and of course this is all in context as well. Now what does
the map command do, pray? The map command applies an expression to each element of an array, so it's
a type of loop. You could write this same thing by simply going through a loop which looped
through every element of the array and said: is it defined? Yes, then okay, you carry on;
otherwise replace it with zero. You could write it in that way. Oh, that's just a quicker and more
convenient way of doing it. Yeah, which is why I've avoided it like the plague as well. It's a nice
one to get handy with, maybe. And you're also using this other short form of if-then statement.
Yes, that's a conditional expression, I think it's defined as. Okay, thanks.
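The map line read out above, and the explicit loop it stands in for, can be compared directly. The sample data is invented; the map expression follows the spoken description (defined, dollar underscore, question mark, dollar underscore, colon, zero).

```perl
use strict;
use warnings;

my @fields = ( 1, undef, 3, undef );     # sample data with gaps

# map runs the block once per element, with $_ set to that element,
# and collects the results into a new list
my @mapped = map { defined($_) ? $_ : 0 } @fields;

# The same thing written as an explicit loop
my @looped;
for my $value (@fields) {
    if ( defined $value ) {
        push @looped, $value;
    }
    else {
        push @looped, 0;
    }
}

print "mapped: @mapped\n";
print "looped: @looped\n";
```

On the question of using defined rather than a bare truth test: a plain test would also treat 0 and the empty string as false, whereas defined only singles out values that were never set.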
Conditional expression. So you've got this before the question mark, which is the if: if
time is equal to... I guess that's the test. There's a test, yes, followed by a question mark, and then
you've got the true branch and then the false branch, separated by a colon. The test is if it's
defined. Yes. And then the question mark is, if it's true then it's dollar underscore, so it is what
it is, that is returned. Yes, yeah, otherwise it's set to zero. The result of each iteration through
the map loop, the implicit loop, returns values which are then strung together and put into the fields
on the left-hand side of the sign. Which is what I want to talk to you about in a minute. But this,
again, the whole purpose of this show is kind of to explain to me why
you're picking this form of code at this point, because I always try to avoid this one.
I prefer to go if, then this, then equals. Yes. And I tend to do that, if it's a short one
I'll put it on a single line, but it makes for ugly code. It's, um, it's a matter of taste really,
I think. Uh, the first language I learned was ALGOL 60, to give you some idea how old I
am, probably, um, in which you couldn't do anything other than build a loop with a test
inside it, uh, and, um, so, you know, that was a fine convention. Um, you could easily bring
that sort of logic to Perl and do it in a case like this. Uh, one of the nice features and shortcuts
of Perl is things like map and so forth, and so I've just developed the personal
convention of using them wherever it's appropriate, now that I feel confident enough to do so. Um, I
would use them, I think. I wouldn't advise anybody to force themselves to do this if they didn't feel
happy with it. No, no, I understand. But, um, again, what I was trying to do in my own silly way
was, uh, be consistent in the code and not mix it, not mix a lot, but that results in very, very
ugly code, whereas you've got very pretty-looking code, but you're mixing these conventions, doing exactly
what I was trying to avoid doing. But when you do it, it makes for tidy code. It's also using the
power of Perl. I mean, as long as you understand what it's doing, then,
you know, this is one of the powerful features. There will be similar things in other
languages too, and, oh, Python's quite good at doing this type of thing, where you take a list and
zap through it doing a particular thing to stuff, um, within it. So, you know, it's
a feature, and it's a desirable feature. Okay. Now this is, uh, I want to go back to a question
that's been on my mind for some time. If you go back to our chest of drawers, so you have one drawer
stacked on top of another, what essentially you're doing here is you're going to take the value of the
years that I give in the argument, so one, more, more, more, more, and you're going to put the first one
in the top drawer, which will be called zero. Yes. The second one into the second drawer, which
we call one, and the whole way down, and that array is called fields. So on the chest of drawers
we're writing in chalk "fields", at sign fields, and then shelf zero, shelf one, shelf two, three, all the way.
Now you're rolling in another chest of drawers and you're calling that fields. Uh, no, it's the same
one. Sorry, it's the same one. It's doing a juggling act here: it's grabbing
things out of the array and doing stuff to them and poking them back into the same array. Um,
um, how do you... labels. You have labels, sorry, labels, labels over the top. So you have an array,
so the fields one, sorry, I'm not talking about the fields one. What you were doing
there, to use our drawer analogy, was you were using the map command to go through each of the drawers,
check to see if there's something in there; if there isn't, put in a zero, and if there is, just go to
the next one. So essentially that's what you're doing. All right, but now with this labels thing that
you're doing, you're pulling in another chest of drawers and you're calling that at sign labels,
and then in the first drawer, which is drawer zero, you're putting in the word years; in the second
drawer, which is drawer one, you're putting in months, and so forth: days, hours, minutes, seconds.
And then you're using, by virtue of the fact that, you know, you're going to display these, print them
off in a loop, and you're going to go from zero to the end of the drawers, so you're going to
pull out both drawers at the same time: you're going to look in the first one and go "one", pull out the
second drawer and go "year". You're going to close those two, go to the next two, pull them out: "one"...
Okay, so that's kind of what's happening. Yeah, you're talking here about using two arrays and
using the fact that they're the same size. Yes. And, yes, then doing a loop from one to the
maximum and then using the second array as labels. Okay, why not use that? Well, you're talking about
this thing called ISO duration. I don't think you said that, did you? Well, no, I'm more referring to
the other program, the previous version of this, which didn't have the ISO duration thing. Oh, okay,
okay, sorry, you're talking about the for loop. Yeah, the for, and we're going down to the for loop
at line 97, and we're skipping, okay, conveniently over the hash way of doing this, which,
to be honest, I'm more comfortable with, because of the idea of maintaining two arrays in order: because
if you're doing some sorting on the first one, your second one goes completely haywire, so now,
if you had four years before, it's now three months, it's four months, you know what I mean? Yes. Well,
let me try and pull it all together. In the latest version of
the script I'm using a hash, merely as an example of how you could use a hash as a way of storing
together both the labels, things like years, months, etc., along with the values. So you end up with
a structure that contains labels and values. So that's quite a convenient way of doing things.
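Both layouts discussed above, two index-aligned arrays and a single hash holding labels with their values, can be sketched briefly. The sample values and the hash name %duration are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Two parallel arrays: position i in one corresponds to position i in the other
my @labels = qw(years months days hours minutes seconds);
my @fields = ( 1, 0, 3, 0, 30, 0 );      # sample values

for my $i ( 0 .. $#fields ) {
    print "$labels[$i]: $fields[$i]\n";
}

# One hash: each value is stored under its label
my %duration;
@duration{@labels} = @fields;            # hash slice assignment

# A value can be fetched by name without knowing its position
print "hours directly: $duration{hours}\n";
```

The hash keeps each label tied to its value, so reordering one array can no longer break the correspondence between the two.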
However, the problem with using that is that when you come to display its contents
you probably want to display them in the order of years, months, days, hours, minutes, seconds.
The way that hashes are put together, the labels are not in any particular order; they're
in completely arbitrary order. So if you simply cycle through the labels, you have no
easy way of ensuring they'll come out in the right order. Absolutely, so it could come out as,
in our case, one week, one year, one month, whatever. But that doesn't change the structure of the...
does it? Mind you, no, the relationship of the label to the value... Sorry, go on. Yeah, before
we go any further, using our analogy of what a hash is from before: if we take a chest of drawers and we
put some... and we get a marker and we write... another chest of drawers, not the other two that we have.
On the other side of the room we've got a chest of drawers that's got all
the shelves in it, and we call that, what are we calling this, ISO duration? Yeah. And then on the
front of each of the drawers we're writing: on the first one we're writing "years", and then on the second
one we're writing the word "months", then days, hours, minutes. Yeah. So using that you can go to that hash,
which is what's called in C a struct or something, another kind of...
so you can refer specifically to that hash and that named value. So you can go "that hash, hours"
and go directly to that drawer without having to cycle through that hash from zero. Yes, it's
called an associative array. Yeah, yeah, yeah, that's right, that's right. Okay, so I use these all the
time because I find them far cleaner to work with, whereas the thing you were saying before
is: I've got two arrays, one of which has got labels and the other one's got fields, and the only
reason that they're useful is that they're both positioned the same. It's like having two
chests of drawers side by side, with the labels in one and the values in the other, but it's
only the fact that they correspond one with the other that you've got
useful knowledge there. Yes. Yeah, but if you want to do... yeah, okay. But the fun thing about
hashes is they're multi-dimensional; you can make them as nested as you like. So you can have
a drawer that's called zero, one, two, three, and so on down along, and then inside of that
you can have a shoe box that's got the word "years" on it, and then say that's the value, and then
you can sort through from zero to the end, ignoring what the labels are, and then
get the values if you want. Yep. But I don't do that either, because usually I know exactly what
field I want to refer to. Yes, you can make some pretty hairy structures within Perl, but yes,
that's for another day, I'm sure. Yeah, but it won't be from me, all right, because trying to get them
working with multi-dimensional arrays and references and references to hashes of hashes, it gets
very hairy very quickly. But one thing I didn't know that you could do was just dump an array
of hashes like that using this one-liner. Well, that's why I did it. It's referred to
as zipping together two arrays into a hash. I think I put it in there just as a demo, really.
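One common Perl idiom for zipping two arrays into a hash is a hash slice. The sketch below assumes that is roughly what the one-liner does; the script's actual line may differ, and the array contents here are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two parallel arrays: position 0 of one belongs with position 0 of
# the other, and so on. Names and values are illustrative only.
my @labels = qw(years months days hours minutes seconds);
my @fields = ( 1, 2, 3, 4, 5, 6 );

# Hash slice: every element of @labels becomes a key, and the element
# at the same position in @fields becomes its value.
my %iso_duration;
@iso_duration{@labels} = @fields;

print "days = $iso_duration{days}\n";
```

The leading @ on the left-hand side is what makes it a slice: it refers to several elements of %iso_duration at once, not to a separate array.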
You know what would be really useful, if you wouldn't mind, and you've done it for some
of the code, is that you're using the Perl terminology for what you're doing, because I find
it very difficult in Perl to figure out what the terminology is that they're using to
describe the thing, and by the time I've figured out what the Perl terminology is, the thing that I'm
trying to find out I've figured out anyway, because I've read so much stuff. Then I figure out what
the terminology is. Yes, yes, yes. Well, we probably need some show notes
to go along with this that maybe explain some of this stuff, I guess. Cool. Um, yeah, okay. Okay, so now
you've assigned the sign to a single one and you've assigned ISO duration to labels and fields.
Oh, that's really cool. Is there any particular reason, do you think, why hashes are just in
some random order and not, um, ordered? I think it was more memory efficient or something,
from what I could read. Yes, I think so, um, because it is a hash table, which is the result of
doing some rather funky mathematical analyses on the contents. The whole concept
of associative arrays works that way: you can't be sure of the way in which things come
out. And I think Perl's gone even further with this and has ensured that the order of
labels within a hash is always randomized between invocations of a script, because it makes
the script less predictable if you're trying to hack it from outside. Yeah. So yes. Okay, so moving...
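Because the order of a hash's keys cannot be relied on, one way to print a hash in a fixed order is to keep the wanted order in an ordinary array and walk that instead. This is only a sketch under assumed names (%iso_duration, @order) and made-up values, not the script's own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative values only.
my %iso_duration = (
    years => 1, months  => 2, days    => 3,
    hours => 4, minutes => 5, seconds => 6,
);

# "keys %iso_duration" may come back in any order, so the display
# order is written down once in a plain array.
my @order = qw(years months days hours minutes seconds);

my @lines;
for my $label (@order) {
    # Left-justified label in 7 columns, right-justified value in 6.
    push @lines, sprintf( "%-7s: %6s", $label, $iso_duration{$label} );
}
print "$_\n" for @lines;
```

The hash still does the job of tying each label to its value; the array only decides which drawer gets opened first.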
Yeah, go on. Then we have the printf command, as a whole, the printf command, which is
taken directly from C, if I'm not mistaken. Indeed, yes, yes. And then we would have percent, dash,
seven, s, colon, space, percent, six, s, newline, quote, comma, double quote... sorry, double
quote, comma, double quote, Sign, double quote, comma, dollar sign... Tell me what that is.
That prints out, in a formatted way, in seven columns, the string "Sign". The minus seven means left
justify it always; otherwise it would be right justified within the seven columns. Um, and then
that's followed by a six, um, column, uh, string, which is, um, right justified, uh, so you would see...
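Written out, the format being read aloud here looks like the sketch below. The value given to $sign is an assumption for the example; sprintf is used alongside printf only so that the exact spacing can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sign = '+';    # hypothetical value for illustration

# %-7s : a string in a 7-column field, left-justified (the minus sign)
# %6s  : a string in a 6-column field, right-justified (the default)
my $format = "%-7s: %6s\n";

printf( $format, 'Sign', $sign );

# sprintf builds the same text without printing it.
my $line = sprintf( $format, 'Sign', $sign );
```

With these values the label "Sign" is padded on the right to seven columns and the "+" is padded on the left to six, so later lines printed with the same format line up underneath.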
See... so, one moment. Yeah, go on ahead. Right, we're good, we're good. So, the reason for
that is because there's a further printf a bit later on that uses the same format, um, for
printing out the values that we've fished out of the duration, and, um, it just wants to make the lines line
up. So the loop is simply stepping through the two arrays together using indexes into them, because
they're indexed from zero through to the maximum number of elements within them, and for each one
it prints out the contents of the label and the field, so you see "year" and a number, "month" and a number, and
so forth. Now, the first time you ever look at something like this, it does look a bit like
rocket science, but it's a standard C for loop, actually. It is, it is, yes. The only
difference is that $#fields business, which is Perl's way of, uh, specifying the length of
the thing. But that's not even Perl's originally; that's what Bash does as well for representing the
length of things. I think that comes from way back when. And the $i++
thing is a C thing, and it's all through Unix. Okay, so inside the for, those, uh,
two parentheses are enclosing three portions of the loop: you have the start, the, um, check to see
if it's the end, and the, uh, incrementation. The first part is my $i = 0, and it took
me ages to figure out why people picked $i. Yeah, well, again, that's convention.
It's i and j and k that tended to be the variables, I think because you see them in mathematical
expressions. Right, okay. So $i = 0, then a semicolon, which traditionally in Perl means a new line.
Is that actually a new line? A semicolon is merely a statement end. You can put multiple statements on
a line. In this structure of the for loop, it simply separates out the, uh, components of
the iterator of the for loop. Now, that's a C convention, I think. Okay, and then the next thing is
$i, which starts at zero, is less than or equal to the number of fields, uh, and then the first one
it's not, and the second one it's not, and then it goes up until six, whether it's six or eight, and then
it would be. And then a semicolon, and then $i++, meaning each time you do this thing, increment $i,
because if you don't, $i will always remain at zero and you'll be in this loop forever. Indeed.
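Putting the three parts of the for statement together, a loop over two parallel arrays might look like the following sketch. The array contents are invented for the example, and the lines are collected in @output (an assumption of this sketch) before being printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parallel arrays: $labels[$i] describes $fields[$i]. Example data only.
my @labels = qw(Years Months Days Hours Minutes Seconds);
my @fields = ( 1, 2, 3, 4, 5, 6 );

my @output;

# start: $i = 0; test: $i <= $#fields; step: $i++ after every pass.
for ( my $i = 0; $i <= $#fields; $i++ ) {
    push @output, sprintf( "%-7s: %6s\n", $labels[$i], $fields[$i] );
}

print @output;
```

One detail worth noting: $#fields is the index of the last element, which is one less than the number of elements, and that is why the test uses <= and not <. Leaving out the $i++ step would keep $i at zero and the loop would never end.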
Have you done it? On more than one occasion. So you wonder what's up, then you press Control-C.
Loops are a pain. Anyways, and then we have the printf statement using your %-7s
trick, and then labels, square bracket, dollar i, square bracket, comma, fields, square bracket, dollar
i, square bracket. So each time, with your labels and your fields chests of drawers, on one hand you're
taking out zero from the labels drawers and on the other hand you're taking out the field from the
fields chest of drawers. Perfect. And then we close the for loop, we close the if statement, and the if
statement, if you're still with us from three hours ago when you were driving through Cincinnati,
uh, that is the regular expression itself. So if the regular expression succeeded, uh, we've
done our thing, which is we've stored them and we've printed them all out, and the else would
just simply print "invalid duration". And the final line, almost the final line, is print a new line,
and then we have the end of the while loop. Right, otherwise... if you had put in one duration, space,
another duration, space, another duration, it will continue happily on. And in the end we have
exit, and at the very end you have one of your Vim marker things. It's a modeline; it's an
instruction to the editor to say, by the way, this is a Perl file, etc., etc.: tab stops, shiftwidth,
expandtab, etc., etc. I won't go into that; maybe that's for another podcast. Why have you picked
tabbing of four instead of two? That's the shiftwidth; that's what you get when you...
yeah, that's what you get when you press the Tab key. That's not the size of a tab; it's merely
what Vim does to move you, the number of columns it uses to, uh, skip when you press the Tab key
to line things up. And I've used that because, you know, it used to be the convention that you
indented by three, but three seemed to be a nasty number, so it's four. Yeah, well, it's, uh, a matter of taste, really.
Okay, and one other question is: why do you put your start bracket, uh, curly bracket,
on the same line as the statement and not directly underneath it? Again, it's convention, it's
style. It is the recommended style, uh, if you go to the style gurus of Perl, um, and if you feed
this particular script through perltidy it will enforce that particular rule. Um, but again,
you know, it's convention; you can break away from it if you want to. The real rule is:
if you're writing scripts and you're in a team all writing scripts, then use the same rules
throughout, otherwise you get into all sorts of difficulties. Perfect, Dave, that has explained things
to me that have been bugging me for a while, and stuff you can't actually look up, because, uh,
it's convention. Okay, with that I think we should, uh... we're 40 minutes over, so we're
now around the hour and 40 minutes. Oh my god. Uh, we'd better stop now for the community news as ever.
So, anything else you want to add? No, that's me. All right, thank you very much. And remember, folks,
we're running fairly low on shows, so if you want, you can record something, uh, I think,
yeah, and upload it. Uh, if you're looking for the information, send an email to admin at hackerpublicradio.org
or go on to oggcastplanet on irc.freenode.net and ask anyone there, and they will give you the
password for the FTP server. Uh, with that, thank you very much again, Dave. Uh, thanks very
much for doing this, and thanks very much for the, um, instructions on how to be a better programmer.
Oh, tune in tomorrow for another exciting episode.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday, Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever consider recording a podcast,
then visit our website to find out how easy it really is. Hacker Public Radio was founded by the
Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution
at binrev.com. All binrev projects are proudly sponsored by Lunar Pages. From shared hosting to
custom private clouds, go to lunarpages.com for all your hosting needs. Unless otherwise stated,
today's show is released under a Creative Commons Attribution-ShareAlike