Episode: 1997
Title: HPR1997: Introduction to sed - part 3
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1997/hpr1997.mp3
Transcribed: 2025-10-18 13:04:26

---

This episode of HPR is brought to you by AnHonestHost.com.
Get 15% discount on all shared hosting with the offer code HPR15, that's HPR15.
Better web hosting that's honest and fair at AnHonestHost.com.
This is Hacker Public Radio. My name is Dave Morris, and this time I'm talking about sed. This is part three of the series I've called Introduction to sed.
In the last episode we looked at sed at a more advanced level than perhaps most people have delved into. Certainly there was a point where I hadn't gone much further than that. We looked at all of the command line options that we're going to cover. There are a few more that you can look up in the manual if you want to. We continued concentrating on the s command for substitution, in a bit more detail, and we dug very deeply into regular expressions.
So it's time now to move on to more sed commands and give you some examples of how to use them.
As I said, we've concentrated on the s command because that's really the command most people want to use on the command line. But there are a lot more in sed. Remember, sed's an editor. It's not a programming language as such. There are some things in sed, I have to admit, which are very difficult to do, and there's a point at which you wonder whether sed is the right tool. I don't think we'll reach that point in this episode, but in the next episode and beyond that will be a question.
Let's look at some more of the commands within sed. The first point about them is that they operate on individual lines or ranges of lines within the file you're processing, so first of all we need to talk about line addressing, the subject of selecting lines.
I've put together a table of the different ways in which you can address a line. It's referred to as line addressing in the documentation. The table is just a summary for reference; what I'll do is dive into the more detailed explanation in this talk.
There are full show notes with this; there pretty much always are with what I do these days. So you can follow along with them and hopefully use them as a reference point later on if you want to get deeper into sed, or want to use it more often, let's say.
One of the paradoxes of doing stuff like this is that when you want to demonstrate something you need to come up with workable examples to demonstrate it with, and I've reached the point where I need to use a command I haven't told you about yet. So there's the p command. These are all single-letter commands in sed, as you've gathered. What it does is print the line or lines that we've addressed. It's similar to the p flag that we saw on the s command in the last episode: its function is the same, but it's a standalone command as well.
It makes sense to use the p command if you're using the -n option to the sed command itself. You'll remember that sed by default prints everything it processes, which is referred to as auto-printing. Often we want to switch that off, as we've seen already.
So, the first type of addressing: selecting a line by number. The address simply consists of a number, and it matches the line with that number in the input stream. My demonstration of how you would do that is to enter this on the command line:
sed -ne '1p' sed_demo2.txt
What that says is: on line one, print that line. Remember, I just said that p is for printing the line. The line we've addressed is one, so it will just print the first line of that file, and because we had the -n option it won't auto-print the rest.
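A minimal sketch of the line-number address, assuming a made-up three-line file as a stand-in for sed_demo2.txt (the real file's contents are not reproduced here). It also shows what happens if -n is left off.

```shell
# Hypothetical stand-in input, not the episode's sed_demo2.txt.
tmp=$(mktemp -d)
printf '%s\n' 'first line' 'second line' 'third line' > "$tmp/sample.txt"

# -n switches off auto-printing, so only the addressed line is printed by p.
only_first=$(sed -ne '1p' "$tmp/sample.txt")
echo "$only_first"

# Without -n every line is auto-printed, and line 1 is printed a second time by p.
with_autoprint=$(sed -e '1p' "$tmp/sample.txt")
echo "$with_autoprint"
```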
Now remember, you can give sed multiple input files, and it normally treats them as one continuous stream. In that mode a numeric address will match that line number once: however many files you give it, there will be just one line one in that stream. But we saw last time that there's a -i option and a -s option, and when you use either of those sed treats multiple input files separately.
I've given an example where I've used the -s option and asked for line five to be printed. I won't read this one out because it's fairly obvious. I've given it two files, sed_demo1.txt and sed_demo2.txt, and what you get is two lines printed out, each of which is line five of one of these files. It's the same line, because these two files are actually the same bit of text, but one's longer than the other.
So that's addressing a specific line by number.
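The difference between one continuous stream and separately treated files can be sketched like this. The two small files are invented for illustration, and -s is a GNU sed option, so GNU sed is assumed.

```shell
# Two hypothetical input files of different lengths.
tmp=$(mktemp -d)
printf '%s\n' a1 a2 a3 > "$tmp/a.txt"
printf '%s\n' b1 b2 b3 b4 > "$tmp/b.txt"

# Default: both files form one stream, so there is only one "line 2".
joined=$(sed -ne '2p' "$tmp/a.txt" "$tmp/b.txt")
echo "$joined"

# With -s each file has its own line numbering, so line 2 is printed per file.
separate=$(sed -s -ne '2p' "$tmp/a.txt" "$tmp/b.txt")
echo "$separate"
```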
There's also a way in which you can select every nth line starting at a given number. I've never needed to use this myself, though that's probably just my lack of imagination. Anyway, this is a GNU extension, so it's not in standard sed.
What you do is specify a starting line number and then the size of the step to the next line number. The lines are selected through the input stream by adding n times the size of the step to the starting point. You enter this as the starting number, then a tilde sign, then the step size. So 1~2 means start at line one, then step two to get line three, then another two to get line five, and so on. In other words, every odd-numbered line.
If you want to think of it in an arithmetic way, it's one plus one times two, where one is the starting line, two is the step and n is one, and the answer is three. The next calculation is the starting line, one, plus two times two, which is four plus one, which is five. So that's how it gets to line three and line five, then it would be seven, and so on. I don't know if it helps to think about it that way. Some people think better that way; I'm not one of them, but I thought I'd mention it anyway. I won't carry on with that because it's quite hard to visualise in speech, but it's written out that way in the notes.
If you used 2~3 as your address expression, it means start at line two, then step three lines to get to the next one, which will be five, then three more lines to get to eight, and so forth: every third line throughout the input stream. There is an example of doing this, but it's later on in these notes, in example two, so I've put in a reference to that.
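The first~step form is easy to check against numbered input. This sketch assumes GNU sed (the form is a GNU extension) and uses seq to generate the lines, which is not part of the episode's examples.

```shell
# Lines 1, 3, 5, ...: start at 1, step 2.
odd=$(seq 1 10 | sed -ne '1~2p' | tr '\n' ' ')
echo "$odd"

# Lines 2, 5, 8, ...: start at 2, step 3.
every_third=$(seq 1 10 | sed -ne '2~3p' | tr '\n' ' ')
echo "$every_third"
```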
The next type of addressing is selecting the last line of the file, and in this case you use the dollar symbol as the address. The dollar symbol matches the last line of a file or, more accurately, as I've said here, the last line of the stream of data. If you're giving it multiple files it means the last line of the last file, but if you use either -i or -s then the files are treated separately and every file will have a last line. Example three later on in the notes shows how this works.
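As a small illustration of the dollar address (not the notes' own example three), the following assumes GNU sed for -s and two invented files.

```shell
# Hypothetical input files.
tmp=$(mktemp -d)
printf '%s\n' a1 a2 a3 > "$tmp/a.txt"
printf '%s\n' b1 b2 > "$tmp/b.txt"

# One stream: $ is the last line of the last file only.
last_of_stream=$(sed -ne '$p' "$tmp/a.txt" "$tmp/b.txt")
echo "$last_of_stream"

# With -s every file has its own last line.
last_of_each=$(sed -s -ne '$p' "$tmp/a.txt" "$tmp/b.txt")
echo "$last_of_each"
```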
Next is selecting by regular expression. Here the address consists of a regular expression, so you would type slash, a regular expression, slash as your address. There's an example I've put in here:
sed -ne '/HPR/p' sed_demo1.txt
What that's saying is print every line which contains HPR, and you'll see in my example that there are four lines which contain the string HPR in capitals.
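A sketch of a regular expression address, using three invented lines in place of sed_demo1.txt (so the count of matching lines differs from the four mentioned above).

```shell
# Hypothetical stand-in input.
tmp=$(mktemp -d)
printf '%s\n' 'HPR is a podcast' 'anyone can contribute' 'listen to HPR' > "$tmp/sample.txt"

# Print only the lines containing the string HPR.
matches=$(sed -ne '/HPR/p' "$tmp/sample.txt")
echo "$matches"
```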
Now, I could have introduced the subject of delimiters when we were talking about the s command, but I didn't want to complicate matters. We're getting into complications now, though, so I'm going to have to talk about alternative delimiters. Normally we've seen regular expressions enclosed in slash characters. We've done that all the way up until now, but this can get difficult. Say you want to use a file path containing slashes as your regular expression: every slash you type in the regular expression needs to be preceded by a backslash so that sed doesn't think it's the closing delimiter.
So it's useful in those circumstances to use alternative delimiters. When you do this you have to precede the opening delimiter with a backslash. I've shown an example in the notes here. If you wanted to use something like etc/passwd as your regular expression you would type /etc\/passwd/ where you've escaped the slash that needs to exist in the middle. Or you could type a backslash and then your opening delimiter, which in this example is a hash mark, then etc/passwd, then the closing delimiter, which is a hash mark again: \#etc/passwd#
They're both the same length, so there's no particular advantage in using one over the other here, but you can see it will be useful to have these alternatives.
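The two address forms can be compared side by side. The input lines here are invented for the purpose.

```shell
lines='users are listed in etc/passwd
nothing to see here'

# Slash delimiters: the slash inside the expression must be escaped.
escaped=$(printf '%s\n' "$lines" | sed -ne '/etc\/passwd/p')

# Alternative delimiter in an address: a backslash introduces the opening #.
hashed=$(printf '%s\n' "$lines" | sed -ne '\#etc/passwd#p')

echo "$escaped"
echo "$hashed"
```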
Now, the need to put a backslash in front when you change the delimiter only applies in sed when you're using regular expressions as addresses. The s command is different, and there's another command we'll look at later that takes expressions with delimiters; I think it's just the one more, I've forgotten. Anyway, if you use the s command you don't have to put a backslash in front of your alternative delimiter. The backslash is needed for addresses really because sed isn't designed to be able to easily spot the fact that you're using an alternative delimiter when you give it just a straight regular expression. Other languages can do this, but sed, I guess due to its vintage, doesn't work that way.
So I've shown an example:
sed -ne 's|HPR|banana|p' sed_demo1.txt
The vertical bar is the delimiter, and what that's doing is replacing HPR with banana and printing the lines it changed.
So that works, that's fine. If, as in my second example, you put a backslash in front of that vertical bar, you get back an error from sed which says unterminated s command. When I tried this I wondered what it was doing; I wasn't quite clear about it until I experimented. What it's doing is assuming that the backslash is the opening delimiter. So I've included an example where the backslash is actually used as the delimiter, and another where the letter p is used as the delimiter, and both of those work fine, but the backslash vertical bar one does not.
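A quick check of both behaviours on one invented input line. The failing case is wrapped in an if so that the error status can be inspected; the exact wording of sed's error message is not relied on here.

```shell
# Alternative delimiter on the s command: no leading backslash needed.
piped=$(echo 'HPR rocks' | sed -ne 's|HPR|banana|p')
echo "$piped"

# With a backslash in front, sed takes the backslash itself as the delimiter
# and the command is rejected, so sed exits with a non-zero status.
if echo 'HPR rocks' | sed -ne 's\|HPR|banana|p' 2>/dev/null; then
    bad_status=0
else
    bad_status=1
fi
echo "status of the backslash form: $bad_status"
```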
I'm also going to include here the fact that an empty regular expression has a special meaning in sed. Again, we could have looked at that in the context of the s command, it's the same there, but I've left it till now so that hopefully it's not too much of a shock. An empty regular expression is sed's way of representing the last regular expression that matched.
I've got an example that uses three s commands. The first changes the first space on each line to an asterisk, the second changes the second space to an underscore, and the third changes the third space to a plus. The second and third s commands have empty regular expressions, so they use the previous matching one. I'll try and read this out, but you might be better just checking it out in the notes:
sed -e 's/ /*/;s//_/;s//+/' sed_demo1.txt
So s/ /*/ replaces the first instance of a space by an asterisk, then s//_/ has an empty regular expression and replaces with an underscore, and s//+/ does the same with a plus. That does the three replacements of three spaces with the different characters, and you'll see in the line that follows it that the three spaces have been changed in this way.
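Run on a single invented line, the three substitutions behave like this: each empty expression reuses the space pattern, so each command changes the first space still remaining.

```shell
result=$(echo 'one two three four five' | sed -e 's/ /*/;s//_/;s//+/')
echo "$result"
```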
There is a potential problem when you use empty regular expressions. I hadn't appreciated this until I read the documentation in a bit more detail and thought about it. I've included the comment from the GNU sed manual, which says: note that modifiers to regular expressions are evaluated when the regular expression is compiled, thus it is invalid to specify them together with the empty regular expression.
That makes sense, because the regular expression has got to be stored somewhere: if you use one and then keep referring back to it, it's got to be held somewhere. And regular expressions are sort of a language, so one is compiled in the sense of a programming language. So that's something just to be aware of. I'll try and refer to this later in another show to see if I can demonstrate it a bit more, but it seems a little too advanced for what we're doing at the moment.
So let's talk about modifiers in the context of regular expressions. There are two that we'll look at in this context. We've already seen that in the s command you can use a capital I or lowercase i flag to mean that the regular expression part of the s command is not case sensitive. Well, there's also a capital I modifier (but no lowercase i modifier) for address regular expressions, and it has the same effect.
I've shown a simple example which goes like this:
sed -ne '/hpr/Ip' sed_demo1.txt
What that's doing is selecting all of the lines that contain the string hpr, but using the don't-care-about-case modifier, so it returns all of the ones which have got upper case HPR. There are no lowercase ones, so it's a bit of a daft demonstration I suppose, but it makes the point anyway.
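With invented input that does contain a lowercase occurrence, the effect of the I modifier is a little easier to see. The modifier is a GNU extension, so GNU sed is assumed.

```shell
# Both the upper case and the lower case line are selected.
result=$(printf '%s\n' 'HPR is here' 'no match' 'hpr too' | sed -ne '/hpr/Ip')
echo "$result"
```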
Now there's another modifier to these regular expressions, which is capital M. This is relevant to stuff that we're not going to be doing today, but I thought I'd put it in because I wanted to keep all this regular expression material together in the same place. It's relevant to text in the pattern space that contains multiple newlines. Well, we've not seen how you construct such a circumstance yet, so we'll look at this in the next episode.
The next thing about addressing is that you can also specify a range. An address range matches the lines in the input data from a starting position up to and including, that's an important point, an ending position. The range is written as two addresses of any of the types you've seen, separated by a comma. My simple example is sed -ne '1,3p' and so on. What it means is print lines one, two and three inclusive, and I actually included the first three lines of the file to show you; probably a bit of overkill really, but never mind. I had the -n option there to prevent automatic printing, otherwise you'd have seen those lines twice.
But there are other forms. My next example is sed -ne '/^We/,$p'. There are two addresses there. The first is the regular expression for the word We, with a capital W, occurring at the beginning of the line. The second in the range is dollar, which is the end of the file. So it prints from a line beginning with the word We up to the end of the file, which I carefully chose so that it was just the next line, and you only get two lines in the example.
The next one is sed -ne '/^What/,/^produced/p'. The circumflex, or up arrow, anchors each expression, so this is a range that starts with the line that begins with the word What, with a capital W in the first position, and ends with the line that begins with produced, again in the first column. Again, these are two consecutive lines, just to prove the point.
So we've seen a numeric range, a range from a regular expression to the end of file, and a range between two regular expressions.
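The three kinds of range can be tried on a small invented file. Its lines are chosen only so that the patterns above have something to match; it is not sed_demo1.txt.

```shell
# Hypothetical stand-in input.
tmp=$(mktemp -d)
printf '%s\n' 'Intro line' 'What makes it different' 'produced by listeners' \
    'We welcome everyone' 'last line' > "$tmp/sample.txt"

# Numeric range: lines 1 to 3 inclusive.
numeric=$(sed -ne '1,3p' "$tmp/sample.txt")

# From the first line starting with "We" to the end of the input.
to_end=$(sed -ne '/^We/,$p' "$tmp/sample.txt")

# From a line starting with "What" to a line starting with "produced".
between=$(sed -ne '/^What/,/^produced/p' "$tmp/sample.txt")

printf '%s\n---\n%s\n---\n%s\n' "$numeric" "$to_end" "$between"
```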
There are some extras that GNU sed provides in terms of ranges. I'm not going to look at them in this series. I might change my mind on this, actually, because it depends whether I stumble over an example where it would be good to use them, but for the moment I'm not going to mention them. This episode is going to be pretty huge anyway. If I need to, I'll refer to them later on in the series.
So the next topic under the addressing heading is what's referred to as negating an address match. All of the address types that we've seen so far can be, quotes, negated. I'm not sure that's the best word, but it is what's used in the GNU manual so I'm sticking to it.
So for example, using a line number, where you're looking for a given line and you want to print it: if you negate that, it means match all the lines other than that one, all lines but the selected line. You perform this negation by adding an exclamation mark after the address.
In the long notes, example one contains a line number negation, and in this particular bit of the notes I've got a demonstration using the addressing form where we match every nth line starting at a specific line, the number tilde number form:
sed -ne '2~2!p' sed_demo1.txt
What that will do is print all the odd-numbered lines in this file, which has got 13 lines. I think you've seen this file often enough, so I've not actually included it in the notes. If you don't negate then you get all the even-numbered lines, so you're getting the logical opposite, I guess, is one way of putting it.
If you negate the last line of the file using the dollar, then it means all lines except the last line. Negating a regular expression, on the other hand, means all the lines that don't match it. There's an example here:
sed -ne '/[A-Z]/!p' sed_demo1.txt
What that will do is print out all of the lines of the file, which is the good old sed_demo1.txt again, that don't contain a capital letter.
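Negation of each address type can be sketched with generated or invented input. GNU sed is assumed for the tilde form, and LC_ALL=C is set as a precaution so that [A-Z] means only the ASCII capitals.

```shell
# Not every second line starting at 2, i.e. the odd-numbered lines.
odd=$(seq 1 6 | sed -ne '2~2!p' | tr '\n' ' ')

# Everything except the last line.
not_last=$(seq 1 3 | sed -ne '$!p' | tr '\n' ' ')

# Lines that contain no capital letter.
no_caps=$(printf '%s\n' 'Has Capital' 'all lower' | LC_ALL=C sed -ne '/[A-Z]/!p')

echo "$odd"
echo "$not_last"
echo "$no_caps"
```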
I did another example here using the same idea, just to add some more examples of sed doing stuff to a file. Let me read this one out again. It starts sed -ne '/[A-Z]/! so that's the addressing part, and that's then followed, we haven't seen this before but it's logical, by an s command:
s/\b\w/\u&/gp
So we've got backslash b, remember that's the word boundary, backslash w, that's a word character, then slash, that's the end of the regular expression. Then as the replacement part we've got backslash u, which means uppercase the first letter of the thing that follows, and we follow that with an ampersand, which means everything we matched in the regular expression goes into the replacement. So the effect is to find all the word boundaries in the line, the starts of all words, and replace the thing that matched, which will just be the first letter of each word, with its uppercase equivalent. After the closing slash of the replacement we've got a lowercase g and a lowercase p, and we're doing this on sed_demo1.txt again.
What we see then is that it finds all the lines that don't contain capitals, then goes through with the s command replacing every word with a capitalised version of the same word. So you get back two lines, and every word has got a capital first letter.
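Put together, the negated address and the s command look like the sketch below. The input is invented, and \b, \w and \u are GNU sed extensions, so GNU sed is assumed.

```shell
# Only the line without capitals is touched; each word gets a capital first letter.
result=$(printf '%s\n' 'Has Capital' 'all lower case' \
    | LC_ALL=C sed -ne '/[A-Z]/!s/\b\w/\u&/gp')
echo "$result"
```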
I don't know quite why you'd want to do that, but you never know. I just wanted to emphasise how you can associate addresses with pretty much all of the various commands. There may be one or two that don't take addresses in front of them, but the general rule is that commands can be preceded by addresses.
All the times we were using the s command in episodes 1 and 2 we didn't know about addresses yet. I decided not to introduce addresses until this episode because it seemed, you know, that if you didn't want to go beyond episode 3 then you'd still have enough sed to survive with.
If negation is used with an address range then it applies to the whole range. It's not possible to negate the individual addresses in the range. I'm not quite sure what that would mean anyway, because you can only negate the whole thing. So what it means is match all the lines outside the range.
I've got an example of what such a command would look like. It's a reiteration of the one whose range starts with the word What and ends with the word produced, except I've put a negation in there after the second regular expression. So it will match the rest of the file except for the two lines that we saw in the last instance of this example.
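A sketch of a negated range on invented input: the exclamation mark follows the second address and applies to the range as a whole.

```shell
# Print every line outside the What..produced range.
outside=$(printf '%s\n' 'Intro line' 'What makes it different' \
    'produced by listeners' 'We welcome everyone' 'last line' \
    | sed -ne '/^What/,/^produced/!p')
echo "$outside"
```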
So I'm hoping that if you're not clear about any of these you'll be able to just go and type them in yourself and see what happens. The sed_demo1.txt file is obviously available for download; I think we saw that in episode 1.
So that's addressing, under the heading of other commands. I needed to tell you about that before I started on the various other commands that we're going to look at today.
So we're now going back to commands and the like, and the topic now is comments in scripts. It's possible to add comments to a sed script. It doesn't really make a lot of sense when you're building a script on the command line; it's more appropriate when you're putting stuff in a file. The hash mark is used for this: the hash character begins a comment, and the comment continues to the end of the line, to the newline in other words.
Now, I thought what I'd do at this point is add a reference to the fact that you can build sed script files. We've seen how you can put commands in a file, but I've got an example here of how you can create a file which is itself executable. Hopefully you're aware that if you put the so-called crunch bang, which is a hash mark and an exclamation mark, on the first line of such a file and follow it with the path to the actual command, and you make the file executable, then the Unix or Linux system will, if you call it as if it's a program, invoke that particular command interpreter, or whatever it is, to process it. The same applies to sed.
I've got an example here where I'm using a cat command which takes what it reads from standard input and redirects it into a file I'm calling demo.sed. What I'm trying to represent here is me typing in these lines. The first line is hash, exclamation mark, /bin/sed as the path to my sed. It might be different for you, but that's the sort of generic place sed lives. Follow that with a space and -f. You need to have the -f because otherwise sed is not invoked in the mode to read a script from a file, and the file it's going to read is the very file that this hash bang line has been put into.
My next line in this file is a comment: hash, space, then Be 1337, which is leet speak for leet, so "be leet" is what I typed. Then there's just one command in this file, and it is s/Hacker/h4x0r/g, with a capital H on Hacker, and then the leet speak version of hacker, "haxor" I suppose you'd say, I don't know. So that's going to find all instances of Hacker and replace them with the leet version.
I represent the fact that I've stopped feeding stuff to this file through cat by showing a Control-D; you'd actually press Control-D as the end-of-file signal to cat.
Then comes chmod, which is the thing with which you change the permissions on a file, and I'm using the simpler form, u+x, that is, I want to give execute access to me, or whoever's doing it. Follow that with a space and then the name of the file, demo.sed.
The next line is me invoking the command that I've created on the command line:
./demo.sed sed_demo1.txt
It's in the current directory, and dot slash means the current directory. So it runs this, which is just a simple thing, but I'm just making the point, and it finds Hacker somewhere, and finds it lots of times. I don't think I put all of the instances in; I don't know why I didn't add them all. Anyway, you can see for yourself in the notes what's happened: it's changed Hacker into its leet form.
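The steps above can be sketched non-interactively. This version makes a few assumptions of its own: it writes the script with a here-document instead of typing into cat and pressing Control-D, it looks up the path to sed with command -v rather than hard-coding /bin/sed, and it works in a temporary directory with a one-line invented input file.

```shell
tmp=$(mktemp -d)
sed_path=$(command -v sed)    # e.g. /bin/sed or /usr/bin/sed, depending on the system

# Write the script: interpreter line with -f, a comment, and one s command.
cat > "$tmp/demo.sed" <<EOF
#!$sed_path -f
# Be 1337
s/Hacker/h4x0r/g
EOF

# Give the owner execute permission.
chmod u+x "$tmp/demo.sed"

printf '%s\n' 'Hacker Public Radio' > "$tmp/sample.txt"

# Run it as a program; this can fail if the temporary directory forbids execution.
direct=$("$tmp/demo.sed" "$tmp/sample.txt") || direct='(could not execute here)'
echo "$direct"

# The same script run explicitly through sed -f.
via_f=$(sed -f "$tmp/demo.sed" "$tmp/sample.txt")
echo "$via_f"
```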
So that's how you add comments. Now let's look at the quit command. This consists of the single lowercase q, and it causes sed to exit. You might say, well, sed exits anyway. Yes, but this causes it to exit early. It can take an address, but only a single address, meaning exit when this line is reached. It doesn't really make sense to give it a range: would it quit on every line in the range? No. I'm assuming, I haven't actually tested this, that the single address can be anything like a number, a dollar or a regular expression. I'm pretty certain it can, but like I said, I didn't actually test it. When sed quits, the current pattern space is printed unless you've got the -n option; it's a form of auto-printing, so it's prevented if you have that option.
So I've got an example here which is doing another one of these leet speak things. What this one does is replace Radio with what I assume is the leet form, R4d10. Let me read the thing out, because I don't want to summarise it too much:
sed -ne 's/Radio/R4d10/gp;3q' sed_demo1.txt
The s command, with Radio having a capital R, is going to do all instances on all lines and print each line after it's done. The next command, separated by a semicolon within the same quotes, is 3q. So it will find Hacker Public Radio and change it, and it will do Radio FreeK America and Binary Revolution Radio, where there are two instances of Radio on the one line. Then it will reach line three, and since we've told it not to auto-print with -n, it will just quit. It doesn't go any further: if there are any other instances of Radio it will ignore them, because it's aborted the sed session.
I should have said that the q command in GNU sed has been enhanced to allow an exit code on the end. So if you were writing something in a Bash script, for example, you could capture the termination code when sed quit, and you could take some action on the basis of which quit was triggered, or something like that. I've never done it myself, but you could if you wanted to.
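Both points can be checked on invented input: q stopping the run at line three, and the GNU-only exit code. The value 5 and the pattern used to trigger it are arbitrary choices for this sketch.

```shell
# The substitution runs on lines 1 to 3; sed quits at line 3, so line 4 is never seen.
out=$(printf '%s\n' 'Radio one' 'Radio two' 'Radio three' 'Radio four' \
    | sed -ne 's/Radio/R4d10/gp;3q')
echo "$out"

# GNU extension: q followed by an exit code. Quit with status 5 on the line "4".
status=0
seq 1 10 | sed -n '/^4$/q5' || status=$?
echo "sed exited with status $status"
```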
Here's an example of exactly the same thing without using q, where the expression, I won't read the whole line, is '1,3s/Radio/R4d10/gp'. That says just do the replacements on lines one through three.
Why would you use the q command? Why would you bother? Well, it came up a while ago in the context of looking for stuff in a big file. If you have a gigantic log file, for example, and you want to pick out a given line and do something with it, or a range of lines, and that line or range occurs fairly early, you've still got several million more lines to go through. You don't really want to leave sed to walk through the rest of the file and not find anything if you know there's only one instance of this particular thing in the file. It makes a lot more sense to say: when you've done that, just quit. So that's where it can be useful. Your command prompt will come back much quicker as a consequence. There will be examples later, I'm hoping, I haven't written them yet, in maybe episode four or even five, that use these types of techniques and demonstrate such things.
The next command we have in this list of fairly commonly used sed commands is one that deletes the pattern space. It's the lowercase d. It deletes the pattern space and causes sed to start the next cycle by reading the next line, if there is one, that is. This command can be preceded by any of the addressing types that we've seen earlier on, and the effect is to omit the lines in question from the output stream, effectively deleting them as you move from one file to another.
My little example here is one which will omit, stroke delete, all lines beginning with a capital H:
sed -e '/^H/d' sed_demo1.txt
There's no -n here. So all lines beginning with H will not be shown. Now, why is there no -n, you might ask? Well, what we're doing here is just letting sed walk through the file and automatically print everything that it sees, except that when the script sees a line beginning with H it deletes it. It deletes it from the pattern space, and auto-printing is the process that happens at the end of a cycle: when there are no more commands to apply to a line, it prints the pattern space. Well, there won't be a pattern space when there's a line starting with an H. I'm saying this because I had to think twice about it myself.
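A sketch of the d command on three invented lines. Auto-printing is left on, so whatever is not deleted comes out.

```shell
# Lines beginning with H are deleted; the rest are auto-printed.
remaining=$(printf '%s\n' 'Hacker Public Radio' 'is a podcast' 'HPR for short' \
    | sed -e '/^H/d')
echo "$remaining"
```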
I've put in another example which uses negation with an exclamation mark. Taking exactly the same command and putting an exclamation mark before the d will delete all lines that don't begin with an H, so you end up with just the lines that match.
The next one is the p command, which we've already seen, and which prints the pattern space. It's equivalent to the p flag that we saw with the s command, but it's a command all by itself, a lowercase p. We've seen it several times now, and it can be preceded by any of the addresses that we've seen. It's only really relevant when you're using the -n option to switch off auto-printing. If you leave that on, you just see the same lines twice, because they get auto-printed as well as being printed by this command.
So for example, to print lines one to five of a file you would do sed -ne '1,5p' and the name of the file. That's equivalent to the head command: head -5 will show you the first five lines of a file. So sed can do the same thing, with a bit more typing.
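Both claims are easy to verify on invented or generated input. head -n 5 is used here as the portable spelling of head -5.

```shell
# Negated delete: only the lines beginning with H survive.
kept=$(printf '%s\n' 'Hacker Public Radio' 'is a podcast' 'HPR for short' \
    | sed -e '/^H/!d')
echo "$kept"

# Printing lines 1 to 5 with sed gives the same result as head.
via_sed=$(seq 1 20 | sed -ne '1,5p')
via_head=$(seq 1 20 | head -n 5)
echo "$via_sed"
```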
Now it's the last command I want to talk about today and it's a command that's a little bit odd it's classified as a commonly used frequently used command I have to admit that I have not used this in my previous usage of set of use it now since I've learned about it but I couldn't immediately see the use for it but I've
|
|
come up with an example a bit later on it's the end command it's a single N and it can be it can be preceded by addresses if auto printing is enabled that is there's no minus N then it prints the pattern space so it doesn't override
|
|
the -n option. It prints the pattern space and then goes and gets the next line from the input stream. You say, well, why? Well, that was my thought as well. I copied in the
|
|
description from the GNU manual in the hope that this might make more sense, and I'm going to read it to you: if auto-print is not disabled, print the pattern space, then, regardless,
|
|
replace the pattern space with the next line of input. If there's no more input then sed exits without processing any more commands. So I'm not completely sure why you'd use that, but I've got an example coming up in a moment which I think is valid. We're probably going to look at this in more detail in
|
|
later episodes, I should say. Actually, what I'm planning to do is to look through some of the more hairy examples of sed which are available in the GNU
|
|
documentation and try to explain them. I will work out how they work first, but my plan is to explain them to you and demonstrate the use of these commands that way.
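To make the behaviour of the n command a little more concrete, here is a small sketch on generated input (seq is assumed to be available). It is only one possible use, not one taken from the episode's notes.

```shell
# With -n: n silently swaps in the next line and p prints it,
# so only the even-numbered lines appear.
evens=$(seq 6 | sed -n 'n;p')

# Without -n: n prints the current line before fetching the next,
# and d then deletes that next line, leaving the odd-numbered lines.
odds=$(seq 6 | sed 'n;d')

printf '%s\n' "$evens"
printf '%s\n' "$odds"
```

In both cases the commands after n operate on the newly fetched line, not on the line that started the cycle.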
|
|
Okay, final point in this episode before I go through the examples, and that is the subject of grouping commands. You might want to perform several sed commands on a given
|
|
input line or set of lines, so you need a means of grouping them. It's done by putting an open brace (open curly bracket) and a close curly bracket around the set of commands that you want to group
|
|
together. I've got a little example which does this, just to make the point really. It's a command line example which uses a range of
|
|
addresses, the same range we saw before: the line that begins with the word What, with a capital W, up to the line that begins with
|
|
Produced in column one. The thing that follows that range is an open brace (open curly bracket) followed by an
|
|
s command. The s command is s/^/>\t/ and what that means is replace the start of the line;
|
|
there are no actual physical characters there. Remember the circumflex means the start of the line; it's the anchor for that. So the
|
|
effect is to put stuff on the front of the line and the stuff that I'm putting on the front of the line is the greater than sign followed by
|
|
\t, which is a tab character. This s command is then followed by ;p and then the close curly bracket (close brace). So for each
|
|
line in this range I want to put a greater-than sign and a tab on the front and print it. Since we've got the -n
|
|
option we are not seeing any other lines, and you'll see that what happens is you get the two lines that we saw
|
|
before in the other variants of this example with the stuff prepended to them. So my final demonstration of grouping
|
|
is, as I put in the notes, fairly useless; it's just a demonstration of what can be done. It shows the contents of a command file
|
|
and I've included this file on the HPR site. It's called demo2.sed and it contains two groups.
|
|
It's quite complex to read out, actually: it's got two groups, each with a regular expression address.
|
|
The first regular expression matches a line that contains the letter a followed by b within five characters and then c within five characters,
|
|
and the second regular expression is similar but matches only A and B with up to five characters in between them
|
|
The first group uses four s commands: one to mark the line with G1, so you can see it was processed by group one, and then the others to highlight all of the characters a, b and c
|
|
by putting square brackets around them to make them stand out then it's going to print the result
|
|
The second group does the same: it uses G2 to signal which group matched, and just highlights a and b.
|
|
So rather than just go through this line by line I thought I'd try to explain the components in a general way.
|
|
Let's look at the first group. The regular expression is lowercase a, dot, backslash open curly bracket, 1,5, backslash close curly bracket, that is a.\{1,5\};
|
|
that's the first bit of it. What that means is a letter a followed by an arbitrary set of characters,
|
|
from 1 to 5 arbitrary characters. After that comes b.\{1,5\} and then a c,
|
|
so saying A followed by any arbitrary characters from 1 to 5, a B followed by any arbitrary characters from 1 to 5 followed by a C
|
|
so it's looking for the sequence a, b and c in a line with between one and five characters between each of them.
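The interval expressions just described can be tried out on a few lines. The input below is invented for the purpose; the two regular expressions are the ones read out above.

```shell
# Hypothetical test lines (an assumption, not from the episode's files).
lines=$(printf 'a--b--c\nabc\na123456b-c\na-b only\n')

# a, then 1 to 5 arbitrary characters, b, 1 to 5 more, then c.
three=$(printf '%s\n' "$lines" | sed -n '/a.\{1,5\}b.\{1,5\}c/p')

# The shorter form: a, then 1 to 5 arbitrary characters, then b.
two=$(printf '%s\n' "$lines" | sed -n '/a.\{1,5\}b/p')

printf '%s\n' "$three"
printf '%s\n' "$two"
```

Note that abc matches neither expression, because at least one character is required between the letters, and the line with six characters between a and b is rejected as well.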
|
|
After the regular expression is an open brace (open curly bracket) which starts the group, and the group simply consists of a bunch of s commands.
|
|
So the first one is s/^/G1: /
|
|
so it's saying put G1 on the start of the line. Then we've got s/a/[a]/g
|
|
so all instances of a are replaced by an a in square brackets. Then the same for b and c: three different s commands.
|
|
Follow that with p. This is being done on the assumption that you're running sed with the -n option, so
|
|
that only these lines are being printed. I then followed that with an n command (lowercase n).
|
|
The n command is there to ensure that a given line, if it matches one regular expression, is not then processed by the other regular expression,
|
|
since there are two and they sort of overlap one another. The second regular expression and group in this file
|
|
is almost the same except that it's using an a followed by from 1 to 5 arbitrary characters followed by a b.
|
|
and if you can hear the cat in the background she's always about, she's always intruding
|
|
Yeah, it's you I'm talking about. So anyway, the contents of this group are slightly different in that it prepends G2 on the front of the line
|
|
and it only deals with A's and B's it's also got a P and an N in it
|
|
So I've got an example of how one would run this, and you do it by calling the command sed -nf
|
|
and then space demo 2 dot said space said underscore demo dot TXT
|
|
so it just produces two lines the first one begins with the G2 so it matched the second group
|
|
and it's got A's and B's bracketed in square brackets the second line begins with G1
|
|
and it's got a's, b's and c's bracketed. So it's complex but fairly useless,
|
|
but you know this is all about learning to do bizarre things in order to learn I hope anyway
|
|
so as I say the demo file is available if you want to mess around with it and there are links to it in the notes
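The two-group idea can be sketched as follows. This is a reconstruction from the description above, not a copy of demo2.sed; the file name groups.sed and the four input lines are assumptions made for illustration.

```shell
tmp=$(mktemp -d)

# A command file with two groups, written from the spoken description.
cat > "$tmp/groups.sed" <<'EOF'
/a.\{1,5\}b.\{1,5\}c/{
    s/^/G1: /
    s/a/[a]/g
    s/b/[b]/g
    s/c/[c]/g
    p
    n
}
/a.\{1,5\}b/{
    s/^/G2: /
    s/a/[a]/g
    s/b/[b]/g
    p
    n
}
EOF

# Hypothetical input: matching lines separated by filler lines.
printf 'a-b\nplain\na-b-c\nother\n' > "$tmp/input.txt"

marked=$(sed -nf "$tmp/groups.sed" "$tmp/input.txt")
printf '%s\n' "$marked"
rm -r "$tmp"
```

One thing worth knowing when experimenting: n replaces the pattern space with the following input line, and only the commands after the n get to see that line. In this sketch the filler lines are the ones consumed that way.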
|
|
okay let's have a look at the examples I'm gonna zip through some of these fairly quickly because they're fairly simple
|
|
Example one is a couple of scripts which print all but line one of a file,
|
|
the file being the famous sed_demo1.txt. One way to do it is to put in the expression
|
|
1!p, meaning print everything but line one.
|
|
The other way of doing it is to use 2,$p, which means print from line 2 to the end of the file.
|
|
so that's just to really make the point that the two things are complementary
|
|
sort of logical complement is that the right term?
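The two forms can be compared directly. The sketch uses four generated lines rather than the episode's demo file, and assumes seq is available.

```shell
# Print everything except line one, in two complementary ways.
negated=$(seq 4 | sed -n '1!p')
ranged=$(seq 4 | sed -n '2,$p')

printf '%s\n' "$negated"
printf '%s\n' "$ranged"
```

The single quotes matter here: they stop the shell from interpreting the exclamation mark and the dollar sign.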
|
|
Example 2 is a demonstration of the first~step form of addressing, where we start with a line number
|
|
and then go steps which the second number represents
|
|
What I've done is set up a little example which starts at line one and then goes in steps of five,
|
|
and I thought it might be useful if you could actually see the line numbers that it generates
|
|
So what I did was create a pipeline which uses the nl command; nl is for numbering lines.
|
|
I used nl -w3, which means I want each number to be three characters wide,
|
|
and -ba, which means I want it to number all lines including blank lines.
|
|
Normally it doesn't do this; it skips blank lines when numbering.
|
|
I never quite understood that. Anyway, the file we're working on is sed_demo1.txt,
|
|
pipe that into sed -ne '1~5p'
|
|
So what you see is lines one, six and eleven of the file, and I made the note that if we use 1~5!p
|
|
that gives you the complement of that, the reverse of it or the negation of it: we'd see all lines except one, six and eleven.
|
|
I hope that helps. I certainly would have appreciated that in an example myself, so I hope you found it useful.
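A small sketch of the same pipeline, assuming GNU sed (the first~step address is a GNU extension) and twelve made-up lines in place of the episode's demo file:

```shell
# Twelve hypothetical lines of identical text, numbered by nl.
numbered=$(seq 12 | sed 's/.*/some text/' | nl -w3 -ba)

# Start at line 1 and take every fifth line from there.
stepped=$(printf '%s\n' "$numbered" | sed -ne '1~5p')

# The negated form gives all the other lines.
others=$(printf '%s\n' "$numbered" | sed -ne '1~5!p')

printf '%s\n' "$stepped"
```

The numbers added by nl make it easy to confirm which lines the address selected.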
|
|
example three is the demonstration of the dollar address last line of input thing and I think I've already demonstrated this elsewhere tonight
|
|
What I've done is to examine two files, sed_demo1.txt and sed_demo2.txt, which are really the same file;
|
|
one's just got more text in than the other. What I've asked for is $p, which is print me the last line.
|
|
So running sed -ne '$p' then the names of the two files,
|
|
that way they all appear as one stream and I see the last line of the last file: contribute one show a year
|
|
is the last line. If on the other hand I use a -s in that list of options, so it's -sne
|
|
(I usually concatenate them to save typing all the hyphens and spaces) and then I've got '$p' and the two file names
|
|
again then I see two lines one is detail on a topic and the next one is contribute one show a year
|
|
So the first line is the last line of the first file and the second line is the last line of the second file.
|
|
That's because we used the -s option, which sees the files as separate. You remember that, I'm sure.
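The difference the -s option makes can be sketched with two throwaway files. The file names and contents are invented for illustration, and -s is a GNU sed option.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\n' > "$tmp/first.txt"
printf 'three\nfour\n' > "$tmp/second.txt"

# Files treated as one stream: only the very last line is "last".
joined=$(sed -ne '$p' "$tmp/first.txt" "$tmp/second.txt")

# With -s each file is separate, so each contributes its own last line.
separate=$(sed -sne '$p' "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$joined"
printf '%s\n' "$separate"
rm -r "$tmp"
```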
|
|
example four demonstrates the regular expression form of addressing just uses a simple regular expression
|
|
which is looking for the string long, l-o-n-g. So you've got sed -ne
|
|
'/long/p' and we're using sed_demo1.txt, and we see three lines coming back
|
|
which have the word long in them so it's a bit like using grep on the file a little bit more typing though
|
|
Now that example uses the forward slash characters, as do most examples we've seen.
|
|
Remember I mentioned that you can change those to something else if you want to. So the second instance in this example
|
|
(I don't know what you'd call it) is sed -ne
|
|
'\#long#p', so the hash is the alternative
|
|
delimiter, but I had to introduce it with a backslash so that sed knew what I was doing. So that's pretty simple.
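A quick sketch of the two delimiter styles, assuming the regular expression being searched for is the word long, and using three invented lines rather than the demo file:

```shell
# Hypothetical sample lines.
sample=$(printf 'a long line\nshort\nlonger still\n')

# The usual slash delimiters.
slashes=$(printf '%s\n' "$sample" | sed -ne '/long/p')

# An alternative delimiter must be introduced with a backslash.
hashes=$(printf '%s\n' "$sample" | sed -ne '\#long#p')

printf '%s\n' "$slashes"
```

Alternative delimiters are mostly useful when the expression itself contains slashes, such as a file path.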
|
|
Example five shows an address range, and I'm using a few other bits and pieces here. I'm using the range one, comma, slash,
|
|
hpr: I've got lowercase hpr followed, after the closing slash, by a capital I. Remember that means in this regular expression
|
|
don't bother about the case. Then I've got a p. So the whole line is sed -ne
|
|
'1,/hpr/Ip' and then the file is sed_demo1.txt,
|
|
and I get back two lines hacker public radio is the one has got hpr in it that's the first line of the file I think
|
|
and then there's another instance of hpr that comes back and they're both in capitals we saw that before pretty much
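Here is a minimal sketch of a range that runs from a line number to a case-insensitive regular expression. It assumes GNU sed (the I modifier is a GNU extension) and uses made-up lines.

```shell
# Hypothetical sample lines; only the second one mentions HPR.
sample=$(printf 'Show notes\nWelcome to HPR\nmore text\n')

# From line 1 up to the first later line matching hpr in any case.
upto=$(printf '%s\n' "$sample" | sed -ne '1,/hpr/Ip')
printf '%s\n' "$upto"
```

When the second address is a regular expression, sed starts looking for it on the line after the one that opened the range.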
|
|
The second part of example five uses a range where we're using two regular expressions. I'll just read it out
|
|
and explain it after: sed -ne '/^what/I,/^PRODUCED/Ip'
|
|
with what in lowercase and PRODUCED all in uppercase, each followed by a capital I,
|
|
and the file is sed_demo1.txt. What I get back is five lines, which are the lines from the line beginning with what
|
|
and ending with produced, but I've told sed that I don't care what the case of the two actually is.
|
|
now what's actually happened here and it's really hard to see as I realize when I did this is that
|
|
it's found two instances, because it doesn't care about case, of a range beginning with what
|
|
and ending with produced, or at least beginning with what. So I thought, well, it'd be nice if there was a way of putting
|
|
a line number on the end of every line. On the end, obviously, because you need to do it before
|
|
you give it to sed, and otherwise it would mess up the regular expressions. So I digressed and used a
|
|
bit of another facility, another command called awk, a-w-k, which has the capability of processing text and
|
|
reformatting it and doing various things. I won't explain what it's doing. It might be a good idea if I think about doing
|
|
an awk series at some stage, or somebody else should do one perhaps. Anyway, the awk
|
|
command takes every line of the file and puts a line number on it in brackets, and it puts it
|
|
right at the far end, after column 75. Then I feed that (and this is sed_demo1.txt as usual), I feed
|
|
that to sed using exactly the same expression as I did last time, but now you can see that
|
|
it's returned lines 7 and 8 which are the ones that begin with what and end with produced and then it finds another
|
|
what on line 11 and it carries on through to 13 it never finds produced because there isn't another one
|
|
but it hits the end of the file at that point; the file is 13 lines long, if you recall back to episode 1.
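The restarting range can be reproduced on a small scale. The awk one-liner below is a stand-in written for this sketch, not the command from the episode's notes; it pads each line to 20 columns instead of 75. The input lines are invented, and GNU sed is assumed for the I modifier.

```shell
# Hypothetical six-line input with two lines that begin with "what".
sample=$(printf 'what is it\nbody\nPRODUCED by\nfiller\nWhat next\ntail\n')

# Append the line number in brackets after column 20.
tagged=$(printf '%s\n' "$sample" | awk '{ printf "%-20s (%d)\n", $0, NR }')

# A case-insensitive range; it opens again at the second "what" and,
# finding no closing line, runs on to the end of the input.
ranges=$(printf '%s\n' "$tagged" | sed -ne '/^what/I,/^produced/Ip')
printf '%s\n' "$ranges"
```

The bracketed numbers show the first range covering lines 1 to 3 and the second, unterminated range covering lines 5 and 6.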
|
|
so that's why it's doing this I thought it might be useful just for you to be able to visualize
|
|
what it's up to. Example 6 is me having a go at Ken's text in the about page, which is in sed_demo2.txt,
|
|
and its grammatical mistakes. He gave me a hard time in the last community news about doing this to his text,
|
|
but it's done with the best possible taste, of course. So I've got a file of commands called
|
|
example_6.sed, and it's included with this episode. This is an example of a group,
|
|
and the group in question is controlled by a single regular expression. It's fairly silly; it's just there for
|
|
demonstration purposes really. The address on the group is /^ *$/
|
|
and it's followed by an exclamation mark, meaning not that. What the regular expression matches is any line which contains zero or more spaces
|
|
and that's all, since it's got a start-of-line and an end-of-line marker. So I'm happy to cope with any lines
|
|
which contain spaces as well sometimes that happens and you can't see the spaces necessarily
|
|
unless you look very closely with an editor or something. And I've said do this for all lines which are not blank.
|
|
It's a sort of make-work type of exercise really, because there's no real need for it, but it demonstrates it anyway.
|
|
and in it I've got a bunch of S commands which are changing the various grammar things that I hammered on about in episode one and so on
|
|
you can do this yourself if you want to mess around with them and do other things with them
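A small sketch of a group that is applied to every non-blank line. The substitutions here are invented for illustration and are not the ones in example_6.sed; the file name fixes.sed is also an assumption.

```shell
tmp=$(mktemp -d)

# Apply the group to every line that is NOT empty or spaces-only.
cat > "$tmp/fixes.sed" <<'EOF'
/^ *$/!{
    s/teh/the/g
    s/$/ [edited]/
}
EOF

# Hypothetical input with an empty line and a spaces-only line.
printf 'teh cat\n\n   \nteh dog\n' > "$tmp/input.txt"

fixed=$(sed -f "$tmp/fixes.sed" "$tmp/input.txt")
printf '%s\n' "$fixed"
rm -r "$tmp"
```

The marker added to the end of each processed line makes it easy to see that the two blank lines passed through untouched.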
|
|
So you would invoke this with the command sed -f example_6.sed
|
|
sed_demo2.txt, so this should process the entire file. There are
|
|
some quote corrections in there that are not really correct; it's just me messing around with the text
|
|
to make it suit my taste, I suppose. Example seven is a case of an executable sed script,
|
|
and we looked at this idea before and what I've done here is to show you the contents of this thing
|
|
It's another file that's been included with this episode. The demonstration of what it contains
|
|
shows me typing the text into cat > example_7.sed, which is the name of the file.
|
|
So I would then type in #!/bin/sed -nf. We never saw -nf last time,
|
|
did we? I think we just did a -f. This one applies the do-not-auto-print flag as well,
|
|
so when you run this it will never print anything unless the sed commands themselves cause it to print.
|
|
I don't think I did that before. So what I've got here is a single regular expression
|
|
address, and it is /^.\{75,80\}$/
|
|
Interpreting that, what it means is any line which has got
|
|
between 75 and 80 characters on it between the start of line and end of line the those two anchors
|
|
so I'm interested in any of those lines then inside the group that this controls
|
|
I have an s command which says s/$/     / (that's dollar, then five spaces). What this is saying is
|
|
when you find such a line, add five spaces to the end of it. Then the next s command starts with
|
|
s/^\(.\{80\}\).*/
|
|
as far as the end of the regular expression, so that is
|
|
an s command with a rather nasty regular expression inside it. The heart of this regular
|
|
expression is a dot followed by one of those curly-bracketed numbers, so it's saying any
|
|
80 characters, and the 80 characters are to be counted from the start of the line, which is why
|
|
there's a circumflex. The 80 characters are enclosed in parentheses, which cause it to be a
|
|
regular expression group and the rest of the line I don't care about though it's actually referred
|
|
to with the .* at the end. So the replacement part of this s command is a vertical bar,
|
|
\1 and another vertical bar, then the closing slash, giving |\1|/. So the whole command is saying
|
|
find me a line (well, we've already found the line) and on this line take the first 80 characters starting
|
|
in the beginning of the line and put vertical bars around them I know there's going to be 80
|
|
characters there because I've deliberately padded each line with five spaces. So it'll either be 80
|
|
characters long, because I selected on the 75 to 80 range: it'll either be 80 characters long
|
|
because it started at 75 or it'll be somewhere between 80 and 85 characters long so I can definitely
|
|
chop 80 characters off the front of it and put vertical bars around them.
|
|
The last command in this group is P so we're printing the result.
|
|
Okay, so this is an executable file, and a bit later on, having explained what it contains, I've shown
|
|
chmod u+x (we saw that before) to make it executable, and it's called example_7.sed.
|
|
As I say, the next line is ./example_7.sed and we're running it against
|
|
sed_demo1.txt, so you get back four lines which have vertical bars at the beginning
|
|
and the end, and they're all in a nice neat vertical line. Okay, so that script is available
|
|
for you to play around with if you wish. It's another one of these sort of useless fairly useless things
|
|
but it's a fun demonstration of what can be done.
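The pad-and-trim technique can be sketched at a smaller scale. This is not the episode's example_7.sed: the numbers are scaled down (lines of 5 to 8 characters, padded with 3 spaces, trimmed to 8) so that the result is easy to check by eye, and the file name bars.sed and the input lines are assumptions.

```shell
tmp=$(mktemp -d)

# The first line is the interpreter line an executable sed script
# would carry; sed itself treats it as a comment.
cat > "$tmp/bars.sed" <<'EOF'
#!/bin/sed -nf
/^.\{5,8\}$/{
    s/$/   /
    s/^\(.\{8\}\).*/|\1|/
    p
}
EOF

# Hypothetical input: lines of 3, 5, 8 and 10 characters.
printf 'abc\nabcde\nabcdefgh\nabcdefghij\n' > "$tmp/input.txt"

# Run through sed -nf here; after chmod u+x the script could be run
# directly, assuming sed really lives at /bin/sed on the system.
boxed=$(sed -nf "$tmp/bars.sed" "$tmp/input.txt")
printf '%s\n' "$boxed"
rm -r "$tmp"
```

The padding has to be at least the difference between the two ends of the length range, which is why three spaces are enough here and five are needed for the 75 to 80 case.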
|
|
All right with that that's the end of this episode I hope you found it useful.
|
|
Okay bye.
|
|
You've been listening to Hacker Public Radio at HackerPublicRadio.org.
|
|
We are a community podcast network that releases shows every weekday Monday through Friday.
|
|
Today's show, like all our shows, was contributed by an HPR listener like yourself.
|
|
If you ever thought of recording a podcast then click on our contribute link to find out how easy it really is.
|
|
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club,
|
|
and it's part of the binary revolution at binrev.com.
|
|
If you have comments on today's show please email the host directly, leave a comment on the website
|
|
or record a follow-up episode yourself. Unless otherwise stated,
|
|
today's show is released under the Creative Commons Attribution-ShareAlike 3.0 license.
|