Episode: 944
Title: HPR0944: LITS 002: tr
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0944/hpr0944.mp3
Transcribed: 2025-10-08 05:19:17

---

Welcome to Linux in the Shell, episode 002. My name is Dan Washco and I will be your host.

We are going to talk about the transliterate or translate command, better known as tr.
If you have not read the entry on the website, I strongly encourage you to do so, either
before listening to this or afterwards, to solidify the command in your mind.

The tr command, the transliterate or translate command, is a quick and dirty
search-and-replace type of command. It takes standard in, replaces or deletes characters
in it, and writes the result to standard out.

For instance, you can use the echo command to echo the word apple and pipe that into tr.
So we can say echo apple, pipe, tr.

Now tr by itself, just the tr command and its sets, will take whatever characters you put
in set one and replace them with the characters in set two. So tr is going to have a set
one and a set two.

So for instance, if we wanted to change all the p's to ones in the word apple, we would
say echo apple, pipe, tr, where set one would be p and set two would be 1. When apple
gets echoed to tr, it's going to replace all the instances of p with the character 1.
So the output of this is going to be a11le.

Simple enough: tr, two sets. Set one is what you want to have replaced and set two is
what you want to replace it with, respectively.

So if you wanted to replace the numbers one, two, three, four, five with the letters
a, b, c, d, e, you would say tr, space, set one would be 12345 and set two would be abcde.
Then any 1 would be replaced by a, 2 by b, 3 by c, 4 by d, and 5 by e.

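The two replacements described so far can be tried at a shell prompt. This is a minimal
sketch: the variable names and the second input string (54321) are made up for
illustration and are not from the show.

```shell
# Replace every p in "apple" with the character 1.
first=$(echo apple | tr p 1)
echo "$first"     # a11le

# Position by position: 1->a, 2->b, 3->c, 4->d, 5->e.
second=$(echo 54321 | tr 12345 abcde)
echo "$second"    # edcba
```

The second example feeds the digits in reverse order to make it visible that each
character is looked up individually in set one, not matched as a whole string.
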
You might be asking yourself: what if I had the set one, two, three, four, five, but I
didn't specify a, b, c, d, e in set two, and instead I specified a smaller set like
a, b, c? Well, that depends on the version of tr that you're using. By version I mean,
are you using the BSD version or the System V version?

The GNU version of tr operates like the BSD version. What it does is pad set two,
repeating the last character in set two out to the length of set one. So in the case
where set one is 12345 and set two is abc, it will repeat the last character to match
each extra character in set one. So 1 would be a, 2 would be b, 3 would be c, 4 would
be c, and 5 would be c. So that's what it does: it repeats that last character.

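The padding behavior can be checked with a one-liner. This sketch assumes GNU tr (the
coreutils version found on most Linux systems); other implementations may behave
differently, as discussed above. The variable name is only for illustration.

```shell
# Set two (abc) is shorter than set one (12345).
# GNU tr repeats the last character of set two, so 4 and 5 both become c.
padded=$(echo 12345 | tr 12345 abc)
echo "$padded"    # abccc on GNU tr
```
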
The System V version instead truncates set one to match the length of set two. So 1
would be a, 2 would be b, 3 would be c, and 4 and 5 would be ignored. So be aware of
that.

The GNU version of tr, and also the BSD version of tr, has a -t option or flag, which is
truncate set one. That emulates the way the System V tr handles it, in that it truncates
set one to the length of set two. So in this case, if you use the -t option, set one
would match 1 to a, 2 to b, and 3 to c, and 4 and 5 would be ignored or truncated out of
there. So be aware of that.

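A quick way to see the truncation is to rerun the same translation with -t. This is a
sketch assuming GNU tr, where the long form of the flag is --truncate-set1; "ignored"
here means the characters pass through unchanged.

```shell
# -t shortens set one to the length of set two (abc),
# so only 1, 2 and 3 are translated; 4 and 5 are left alone.
truncated=$(echo 12345 | tr -t 12345 abc)
echo "$truncated"    # abc45 on GNU tr
```
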
If you're not using BSD or Linux or one of their derivatives and you're using a tr
command, your results might not be what you expected. Keep that in the back of your
mind, but chances are you'll never run into it. Length does matter.

One thing you can do, if you're not sure of the length of set one, or if you just want
to be sure that you're going to repeat a character, is use the asterisk. You put whatever
character you want repeated in between open bracket, that character, asterisk, close
bracket, and that will repeat that character to pad set two out to equal the total number
of characters in set one.

Now if you had the numbers one, two, three, four, five, and again we're using the letters
a, b, and c, the standard behavior is as if you had written a, b, open bracket, c,
asterisk, close bracket. That is like a symbolic representation of how it expands out
the last character to equal set one.

Another way that could be used: let's say set one is one through six, and you wanted
five and six to be b and c and every other character to be just a. You would do open
bracket, a, asterisk, close bracket, b, c. So then 1 would be a, 2 would be a, 3 would
be a, 4 would be a, 5 would be b, and 6 would be c. So it's smart enough to figure that
out by using those wildcards.

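Both uses of the repeat construct can be written out as follows. This is a sketch
assuming GNU tr; the sets are quoted so the shell does not try to expand the brackets
and asterisk as a filename pattern. The variable names are illustrative.

```shell
# Explicit form of the default padding: a, b, then c repeated as needed.
explicit=$(echo 12345 | tr 12345 'ab[c*]')
echo "$explicit"    # abccc

# Repeat at the front: a fills positions 1-4, then b and c take 5 and 6.
front=$(echo 123456 | tr 123456 '[a*]bc')
echo "$front"       # aaaabc
```
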
Now another thing that you can do is specify ranges. Instead of having set one be
1, 2, 3, 4, 5, 6 and set two be a, b, c, d, e, f, you could specify 1 dash 6, which gives
the range 1, 2, 3, 4, 5, 6, and then the range a to f, which gives a, b, c, d, e, f.
So then there would be a one-to-one match in there.

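The range form is shorter to type and behaves the same as listing every character. A
minimal sketch, with a made-up input string and variable name:

```shell
# 1-6 expands to 123456 and a-f expands to abcdef, matched one to one.
ranged=$(echo 615243 | tr 1-6 a-f)
echo "$ranged"    # faebdc
```
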
So you don't have to specify individual characters; you can specify ranges, which brings
up the question: what if you wanted to substitute the dash itself? Well, if you specify
the dash first, tr treats the character set like a flag and will probably throw an error.
If you specify the dash inside your character list, it's going to treat it as a range.
Therefore, if you do want to replace the dash, you need to put it at the end of the
character set, as the last character in the set.

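Putting the dash last can be sketched like this, assuming GNU tr. The input string and
the choice of underscore as the replacement are only for illustration.

```shell
# Set one is "a" followed by a literal dash; set two is "x" followed by "_".
# So a becomes x and every dash becomes an underscore.
dashed=$(echo a-b-c | tr 'a-' 'x_')
echo "$dashed"    # x_b_c
```
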
Placed at the end, the dash won't be treated as an option and it won't be treated as a
range.

As opposed to specifying individual characters or a range of characters in either set,
you can specify character classes. Those define a specific group of characters, like all
numbers, all letters, all numbers and letters, punctuation, printable characters, control
characters, a whole range of things. To get the full list, I encourage you to seek out
the man page or the info page, which lists all of them. The way a character class is
written is open bracket, colon, the name of the class, colon, close bracket. So the name
of the class sits between a bracket and a colon on each side.

So for instance, for all alphabetic characters, it's open bracket, colon, alpha, colon,
close bracket. They're handy ways to define a set of characters instead of having to
specify a range or all the characters in the set. The alpha class, all alphabetic
characters, is pretty much the same as saying the range a to z in the English language.
The digit character class is the same as saying the range 0 to 9. The lower class,
lowercase letters in the English language, would be the same as a to z in just lowercase.
Upper would be the same as uppercase A dash uppercase Z, which defines all uppercase
letters.

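To see that a class and the matching range select the same characters, the same
translation can be run both ways. This sketch assumes GNU tr and an English or C locale;
the input string and the # replacement are made up for the example.

```shell
# Replace every digit with #, first using the class, then the range.
# '[#*]' repeats # so set two is as long as set one.
by_class=$(echo 'room 101' | tr '[:digit:]' '[#*]')
by_range=$(echo 'room 101' | tr '0-9' '[#*]')
echo "$by_class"    # room ###
echo "$by_range"    # room ###
```
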
So a lot of the time it might be easier and more efficient to use the character classes.
As an example, if I had a file and I wanted to convert all the uppercase letters to
lowercase letters, I would specify tr, set one would be the upper class and set two would
be the lower class, and redirect standard in from that file. The output would have all
the uppercase letters it finds in there converted to their lowercase equivalents.

The tr command takes a couple of different flags which alter its behavior. One of those
flags is the -d or delete option. Instead of replacing one set of characters with
another, it will delete all the characters that occur in set one. The delete option does
not take a second set of characters. If you try to put a second set in there, it will
throw errors at you. So for instance, let's say we had a file and we wanted to get rid of
all the numbers in it. We could do tr -d and the character class digit, redirect that
file into tr, and the output would be the text of that file with all the numbers removed.
So that's the -d option: instead of replacing characters, it removes them.

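Both file-based examples can be sketched with a small temporary file. The file contents,
the use of mktemp, and the variable names are assumptions made for illustration; any
text file would do.

```shell
# Create a throwaway sample file.
sample=$(mktemp)
printf 'Hello World 2024\n' > "$sample"

# Uppercase to lowercase, reading the file on standard in.
lowered=$(tr '[:upper:]' '[:lower:]' < "$sample")
echo "$lowered"      # hello world 2024

# Delete every digit; note that -d takes only one set.
no_digits=$(tr -d '[:digit:]' < "$sample")
echo "$no_digits"    # "Hello World " (the space before the digits remains)

rm -f "$sample"
```
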
There's a -s option, which is for squeeze or squeeze repeats. It replaces multiple
consecutive instances of a character in the set with a single occurrence of that
character. If we had the word apple, a-p-p-l-e, and we piped it to tr with -s and said p
for set one, it would replace any repeated occurrences of the letter p with a single
occurrence of that letter. So apple would become aple. That's a way to get rid of
repeated characters in a file.

The last flag that I want to talk about is -c, or the complement. Now, if you remember
your mathematics growing up, the complement set is the set that contains the characters
that aren't in the original set. So for instance, for the set of all letters, one part
of the complement would be the digits.

So the way tr makes use of the complement is that it will replace or translate all the
characters that are not in the given set. If we were to pipe a file through to the tr
command and do -c, with the digit class as set one and the letter f as set two, so we
have tr -c, digit class, f, that is going to replace every non-digit character in that
file with the letter f, because we're doing the complement.

As an example, let's say we're passing apple12345 to it and we use tr -c, digit class, f.
What you would get from apple12345 is five f's followed by 12345, fffff12345, because it
has replaced every instance of the complement of the digit class, meaning everything
that is not a digit 0 through 9 in that stream, with the letter f.

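The squeeze and complement examples above can be reproduced as follows. This is a sketch
assuming GNU tr. One detail not mentioned in the show: echo appends a newline, which is
also a non-digit and would be turned into an f as well, so the complement example uses
printf to send the text without a trailing newline.

```shell
# Squeeze runs of p down to a single p.
squeezed=$(echo apple | tr -s p)
echo "$squeezed"        # aple

# Replace everything that is NOT a digit with f.
complemented=$(printf 'apple12345' | tr -c '[:digit:]' f)
echo "$complemented"    # fffff12345
```
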
Now you can combine some of these switches together. One example that I've seen is
combining -c and -s, the complement and squeeze, to get a list of words in a file. Well,
it's actually going to give you a list of words and letters in a file.

So that would be tr -cs. Squeeze usually takes only one set, but when squeeze is used
with -c and you have two sets, it works on the second set. So the complement translation
is performed first and the squeeze is performed second. For instance, to get the list of
words or letters in a file, the first set would be the alpha class, and the second set
would be the newline character.

As for the way you specify it: because set one is the list of all alphabetic characters,
set two, being just one character, has to be padded out by repeating itself to the size
of set one. So you would have to use the asterisk in this case. You specify the newline
character as open bracket, backslash n, asterisk, close bracket. What that does with the
-cs option is take the complement of the alpha class, which is anything that's not a
letter a through z in the English language, replace it with a newline character, and
then squeeze is used to squeeze out any duplicate newline characters.

Now, why would you need to do this? Well, let's say for instance there was a sentence in
there saying: I am 13. How old are you? Each one of those spaces between the words would
be converted to a newline character.
But when you got to 13 and the period, it would convert the 1, the 3, and the period
into newline characters too. So in the sentence, I am 13. How old are you?, after the
word am you would have one, two, three, four, five newline characters. If you use two
spaces after a period, it would be six newline characters. So what the squeeze option
does is, instead of leaving a bunch of newlines after those words, it squeezes out any
repeats of the newline character, so there would only be one. And then you would have
your list of words in that file. Pretty handy.

That is the tr command in a nutshell: a quick and dirty way to replace characters, delete
characters, or get rid of repeated characters in a stream. You can use the tr command
with redirection: either pipe the output of a command to tr, or use standard redirection
so that standard in is a file, using the less-than sign. That's the one where it's
pointing to the left, opening to the right. So that would be the less-than sign to
redirect a file into tr. And remember about redirecting output, too. If you're using the
tr command on a file and you want to save the output, you would have to redirect it using
the greater-than sign afterwards, to send the output of tr to a new file so that you can
save it.

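The word-list pipeline and both kinds of redirection can be combined in one short
sketch. It assumes GNU tr; the temporary files created with mktemp and the variable
names are stand-ins for whatever input and output files you actually use.

```shell
# Input file containing the sentence from the example, and an output file.
infile=$(mktemp)
outfile=$(mktemp)
printf 'I am 13. How old are you?\n' > "$infile"

# -c: work on everything that is not a letter; translate it to a newline.
# -s: squeeze each run of newlines down to one.
# < reads the input file on standard in, > saves the result to a new file.
tr -cs '[:alpha:]' '[\n*]' < "$infile" > "$outfile"

words=$(cat "$outfile")
echo "$words"    # one word per line: I, am, How, old, are, you

rm -f "$infile" "$outfile"
```

Without -s, the run of five non-letters after "am" would leave four empty lines in the
output, which is the situation the squeeze step is there to avoid.
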
Thank you very much for listening. I thank Hacker Public Radio for supporting this show.
I encourage you to go over to the website, if you haven't done so, to read the full
write-up on the tr command and also watch the video of some of the examples that I talk
about in action. We'll see you in two weeks. Have a great one.

You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a
community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself. If
you ever consider recording a podcast, then visit our website to find out how easy it
really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon
Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects
are proudly sponsored by Lunarpages. From shared hosting to custom private clouds, go to
lunarpages.com for all your hosting needs.

Unless otherwise stated, today's show is released under a Creative Commons
Attribution-ShareAlike 3.0 license.