Episode: 944
Title: HPR0944: LITS 002: tr
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0944/hpr0944.mp3
Transcribed: 2025-10-08 05:19:17
---
Welcome to Linux in the Shell Episode 002. My name is Dann Washko and I will be your
host.
We are going to talk about the Transliterate or Translate command, better known as TR.
If you have not read the entry on the website, I strongly encourage you to do so either
before listening to this or afterwards to solidify the command in your mind.
The TR command, the Transliterate or Translate command, is a quick and dirty search-and-replace
type command.
It reads standard input, replaces or deletes characters in it, and writes the result to
standard output.
For instance, you can use the echo command to echo the word apple and pipe that into TR
with the pipe character.
So we can say echo apple | tr.
Used with just two sets, TR takes whatever characters you put in set one and replaces them
with the corresponding characters in set two.
So TR takes a set one and a set two.
So for instance, if we wanted to change all the p's to ones in the word apple, we would
say echo apple | tr p 1: set one would be p and set two would be 1.
When apple gets echoed to TR, it replaces every instance of p with the character 1.
So the output is going to be a11le.
Simple enough: TR, two sets.
Set one is what you want replaced and set two is what you want to replace it
with, respectively.
So if you wanted to replace the digits 1, 2, 3, 4, 5 with the letters a, b, c, d, e,
you would use tr 12345 abcde: set one would be 12345 and set two would be abcde.
Then every 1 would be replaced by a, 2 by b, 3 by c, 4 by d, and 5 by e.
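Python's built-in str.maketrans and str.translate perform the same kind of position-by-position character mapping, so they offer a quick way to check the two examples above. This is a Python analogue for illustration, not TR itself, and the helper name `translate` is hypothetical.

```python
def translate(text: str, set1: str, set2: str) -> str:
    """Map each character of set1 to the character at the same position in set2.

    A Python analogue of `tr set1 set2` for equal-length sets; str.maketrans
    raises ValueError if the two strings differ in length.
    """
    return text.translate(str.maketrans(set1, set2))


print(translate("apple", "p", "1"))              # a11le      (like: echo apple | tr p 1)
print(translate("5-4-3-2-1", "12345", "abcde"))  # e-d-c-b-a  (like: tr 12345 abcde)
```

Characters that are not in set one, like the dashes in the second call, pass through untouched, just as they do with TR.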
You might be asking yourself: what if set one is 12345, but instead of abcde I specified
a shorter set two, like abc? Well, that depends on the version of TR that you're using.
By version I mean: are you using the BSD version or the System V version?
The GNU version of TR operates like the BSD version.
What it does is pad set two out to the length of set one by repeating the last character
of set two.
So in the case where set one is 12345 and set two is abc, it repeats the last character
to match each extra character in set one.
So 1 would be a, 2 would be b, 3 would be c, 4 would be c, and 5 would be c.
That's what it does: it repeats that last character.
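The GNU/BSD padding rule just described can be sketched in Python by extending set two with copies of its last character before building the mapping. The helper names `pad_set2` and `translate_padded` are hypothetical; this models the behavior described in the episode rather than reproducing TR's source.

```python
def pad_set2(set1: str, set2: str) -> str:
    """Pad set2 to len(set1) by repeating its last character (GNU/BSD-style)."""
    if not set2:
        raise ValueError("set2 must not be empty when translating")
    # A negative repeat count yields "", so a longer set2 is left as-is here.
    return set2 + set2[-1] * (len(set1) - len(set2))


def translate_padded(text: str, set1: str, set2: str) -> str:
    """Translate like `tr set1 set2` when set2 may be shorter than set1."""
    # Extra characters in an over-long set2 are simply ignored.
    table = str.maketrans(set1, pad_set2(set1, set2)[: len(set1)])
    return text.translate(table)


print(pad_set2("12345", "abc"))                   # abccc
print(translate_padded("54321", "12345", "abc"))  # cccba
```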
The System V version instead truncates set one to match the length of set two.
So 1 would be a, 2 would be b, 3 would be c, and 4 and 5 would be ignored, passing through
unchanged.
So be aware of that.
The GNU version of TR has a -t option (long form --truncate-set1), which truncates set one.
That emulates the way System V TR handles it: set one is truncated to the length of set two.
So in this case, with the -t option, 1 maps to a, 2 to b, 3 to c, and 4 and 5 are
ignored, truncated out of set one.
So be aware of that.
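For comparison, the System V behavior (and GNU's -t option) can be modeled by cutting set one down to the length of set two before mapping. `translate_truncated` is a hypothetical helper written for illustration.

```python
def translate_truncated(text: str, set1: str, set2: str) -> str:
    """Truncate set1 to len(set2) first, like System V tr or GNU `tr -t`.

    Characters cut off the end of set1 are not translated, so they pass
    through unchanged.
    """
    n = min(len(set1), len(set2))
    return text.translate(str.maketrans(set1[:n], set2[:n]))


print(translate_truncated("12345", "12345", "abc"))  # abc45
```

Compared with the padding sketch, only the last two digits differ: they stay as 4 and 5 instead of becoming c.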
If you're not using BSD or Linux or one of their derivatives, your TR results
might not be what you expected.
Keep that in the back of your mind, but chances are you'll never run into it.
Length does matter.
One thing you can do, if you're not sure of the length of set one or just want to be sure
a character gets repeated, is use the asterisk: put the character you want repeated between
brackets, followed by an asterisk, as in [c*]. In set two, that repeats the character as many
times as needed to make set two equal in length to set one.
Now say you again have the digits 1, 2, 3, 4, 5 in set one and the letters a, b and c in set two.
The standard behavior works as if you had written set two as ab[c*]: that is a symbolic
representation of how TR expands the last character out to equal set one.
You can also put the repeat somewhere else. Let's say set one is 1 through 6, and you want 5 and 6
to become b and c and every other character to become just a. Then set two would be [a*]bc.
So 1 would be a, 2 would be a, 3 would be a, 4 would be a, 5 would be b, and 6 would be c.
TR is smart enough to figure that out from that wildcard.
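The [c*] expansion can be sketched as "fill the gap": count the fixed characters around the repeat, then repeat the character enough times to reach the length of set one. `expand_repeat` is a hypothetical helper, and it assumes a single [c*] with no explicit count (TR also accepts forms like [c*3], which this sketch ignores).

```python
import re

# Matches one "[c*]" construct: a single character followed by "*" in brackets.
# DOTALL lets "." match a newline too, e.g. a literal newline before the "*".
REPEAT = re.compile(r"\[(.)\*\]", re.DOTALL)


def expand_repeat(set2: str, target_len: int) -> str:
    """Expand one "[c*]" in set2 so the result is target_len characters long."""
    m = REPEAT.search(set2)
    if m is None:
        return set2  # no repeat construct: nothing to expand
    before, after = set2[: m.start()], set2[m.end():]
    fill = max(target_len - len(before) - len(after), 0)
    return before + m.group(1) * fill + after


print(expand_repeat("ab[c*]", 5))  # abccc
print(expand_repeat("[a*]bc", 6))  # aaaabc
print("123456".translate(str.maketrans("123456", expand_repeat("[a*]bc", 6))))  # aaaabc
```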
Another thing you can do is specify ranges.
Instead of writing set one as 123456 and set two as abcdef, you could specify 1-6, which
means the range 1, 2, 3, 4, 5, 6, and then the range a-f, which means a, b, c, d, e, f.
There would still be a one-to-one match.
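Range expansion amounts to walking the set and turning every x-y triple into the characters from x to y inclusive. The sketch below (hypothetical helper `expand_ranges`) also keeps a dash at the very end as a literal dash, matching the advice that follows in the episode; unlike real TR, it does no option parsing, so it will not complain about a leading dash.

```python
def expand_ranges(spec: str) -> str:
    """Expand tr-style ranges such as "1-6" or "a-f" into explicit characters.

    A dash with nothing after it is kept as a literal "-".
    """
    out = []
    i = 0
    while i < len(spec):
        # "x-y" with a character after the dash is an inclusive range x..y.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            if start > end:
                raise ValueError(f"range {spec[i:i + 3]!r} is in reverse order")
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)


print(expand_ranges("1-6"))   # 123456
print(expand_ranges("a-f"))   # abcdef
print(expand_ranges("a-c-"))  # abc-  (trailing dash stays literal)
```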
So you don't have to specify individual characters; you can specify ranges, which raises
the question: what if you want to substitute the dash itself?
Well, if you put the dash first, TR treats the set as an option flag and will probably
throw an error. If you put the dash in the middle of your character list, TR treats
it as a range. So if you do want to replace the dash, put it at the end of the character
set, as the last character. That way it won't be treated as an option and it won't be
treated as a range. As opposed to specifying individual characters or a range of characters
in either set, you can specify character classes. Those define a specific group of characters,
like all digits, all letters, all letters and digits, punctuation, printable characters, control
characters, and so on. To get the full list, I encourage you to seek out the man page or the info
page, which lists them all. A character class is written as an open bracket and a colon, the name
of the class, then a colon and a closing bracket, as in [:name:]. So the name of the class sits
between the colons inside the brackets.
So for instance, all alphabetic characters is [:alpha:]. Classes are handy ways to define a set
of characters instead of having to specify a range or every character in the set. The alpha class,
all alphabetic characters, is pretty much the same as the ranges a-z and A-Z in the English
language. The digit class, [:digit:], is the same as the range 0-9. The lower class, [:lower:],
is the same as a-z, just the lowercase letters, and the upper class, [:upper:], is the same as
A-Z, all the uppercase letters.
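Python's `string` module has ASCII constants that line up with these classes, which makes it easy to sketch class expansion and the uppercase-to-lowercase conversion. The mapping table and `expand_classes` are hypothetical illustrations that assume the C/English locale; real TR follows the current locale.

```python
import string

# ASCII stand-ins for a few POSIX character classes (assumes the C locale).
CLASSES = {
    "[:alpha:]": string.ascii_letters,    # a-z and A-Z
    "[:digit:]": string.digits,           # 0-9
    "[:lower:]": string.ascii_lowercase,  # a-z
    "[:upper:]": string.ascii_uppercase,  # A-Z
}


def expand_classes(spec: str) -> str:
    """Replace each "[:name:]" in spec with the characters it stands for."""
    for name, chars in CLASSES.items():
        spec = spec.replace(name, chars)
    return spec


upper, lower = expand_classes("[:upper:]"), expand_classes("[:lower:]")
# Roughly what tr '[:upper:]' '[:lower:]' does to ASCII text.
print("Hello, World".translate(str.maketrans(upper, lower)))  # hello, world
```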
A lot of times it might be easier, and more efficient, to use the character classes. As an
example, if I had a file and wanted to convert all the uppercase letters to lowercase letters,
I would give TR the upper class as set one and the lower class as set two, and redirect standard
input from that file: tr '[:upper:]' '[:lower:]' < file. The output would be the file's text with
every uppercase letter converted to its lowercase equivalent. The TR command takes a couple of different flags
which alter its behavior. One of those flags is -d, the delete option. Instead of replacing
one set of characters with another, it deletes every character that occurs in set one.
The delete option does not take a second set of characters; if you try to put a second set
in there, it will throw errors at you. So for instance, let's say we had a file and wanted to
get rid of all the digits in it: we could run tr -d '[:digit:]' and redirect that file into
TR, and the output would be the text of that file with all the digits removed. So that's the
-d option: instead of replacing characters, it removes them.
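Deletion maps directly onto the third argument of str.maketrans, which lists characters to drop. `delete_chars` is a hypothetical helper that mimics `tr -d set1` for an explicit character list.

```python
def delete_chars(text: str, set1: str) -> str:
    """Remove every character that appears in set1, like `tr -d set1`."""
    # The third argument to str.maketrans lists characters mapped to None (deleted).
    return text.translate(str.maketrans("", "", set1))


print(repr(delete_chars("Room 101, floor 3", "0123456789")))  # 'Room , floor '
```

As with TR, there is no second set here: deletion only needs to know which characters to remove.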
There's also the -s option, for squeeze or squeeze repeats, which replaces each run of repeated
instances of a character in the set with a single occurrence of that character. If we piped the
word apple to tr -s with p as set one, it would replace any repeated occurrences of the letter p
with a single occurrence of that letter, so apple would become aple. That's a way to get rid of
repeated characters in a file. The last flag that I want to talk about
is -c, the complement. If you remember your mathematics growing up, the complement of a set
contains everything that isn't in the original set. So for instance, the complement of the set
of all letters is every character that isn't a letter: digits, punctuation, whitespace, and so on.
The way TR makes use of the complement is that it replaces, or translates, all the characters
that are not in set one. So say we pipe a file through TR with -c, the digit class as set one,
and the letter f as set two: tr -c '[:digit:]' f. That replaces every non-digit character in
the input with the letter f. For example, say we pass Apple12345 to that command. What you would
get is five f's followed by the digits: fffff12345. Every instance of the complement of the digit
class, meaning every character in that stream that isn't a digit 0 through 9, has been replaced
with the letter f. If you feed it with echo, the trailing newline isn't a digit either, so it
becomes one more f.
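Complement translation is easy to sketch as "keep characters in the set, replace everything else". `complement_translate` is a hypothetical helper that assumes a single replacement character, the case discussed above, so no padding logic is needed.

```python
def complement_translate(text: str, keep: str, replacement: str) -> str:
    """Replace every character NOT in `keep` with `replacement`, like `tr -c keep r`.

    Assumes `replacement` is a single character, as in the episode's example.
    """
    keep_set = set(keep)
    return "".join(ch if ch in keep_set else replacement for ch in text)


digits = "0123456789"
print(complement_translate("Apple12345", digits, "f"))    # fffff12345
print(complement_translate("Apple12345\n", digits, "f"))  # fffff12345f (the newline becomes f too)
```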
Now you can combine some of these switches. One example that I've seen combines -c and -s,
the complement and the squeeze, to get a list of the words in a file; strictly speaking, it
gives you a list of the runs of letters in the file. That would be tr -cs. Squeeze usually
takes only one set, but when squeeze is used along with another option and you give two sets,
it works on the second set: the complement and translation are performed first and the squeeze
second. For instance, to get a list of the words in a file, one per line, the first set would
be the alpha class and the second set would be the newline character.
Because set one is the list of all alphabetic characters and set two is just one character,
set two has to pad out, repeating itself to the size of set one, so you would use the asterisk
in this case. The newline character is written as a backslash followed by n, so set two is an
open bracket, backslash n, an asterisk, and a closing bracket: '[\n*]'. With the -cs options,
TR takes the complement of the alpha class, which is anything that's not a letter, replaces it
with a newline character, and then squeezes out any duplicate newline characters. Now, why would
you need to do this? Well, let's say for instance there was the sentence "I am 13. How old are
you?" in the file. Each one of the spaces between the words would be converted to a newline
character. But when you got to "13.", it would convert the 1, the 3, and the period into newline
characters too. So in the sentence "I am 13. How old are you?", after the word "am" you would
have five newline characters: the space, the 1, the 3, the period, and the next space. If you
used two spaces after a period, it would be six newline characters. What the squeeze does is,
instead of leaving a bunch of newlines after those words, squeeze out any repeats of the newline
character so there is only one. Then you have your list of words in that file. Pretty handy.
That is the TR command in a nutshell: a quick and dirty way
to replace characters, delete characters, and get rid of repeated characters in a stream. You
can use the TR command with redirection: either pipe the output of a command to TR, or redirect
standard input from a file using the less-than sign, the one pointing to the left and opening to
the right. And remember about redirecting output, too: if you're using the TR command on a file
and want to save the output, you have to redirect it afterwards using the greater-than sign,
sending TR's output to a new file so that you can save it.
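The squeeze option and the complement-plus-squeeze word list can be sketched together in Python. `squeeze` and `word_list` are hypothetical helpers; `word_list` uses ASCII letters as a stand-in for the alpha class and squeezes only the newline, the character from the second set, as described above.

```python
import string


def squeeze(text: str, chars: str) -> str:
    """Collapse runs of any repeated character in `chars`, like `tr -s chars`."""
    out = []
    for ch in text:
        # Skip ch if it is in the squeeze set and repeats the last kept character.
        if out and ch == out[-1] and ch in chars:
            continue
        out.append(ch)
    return "".join(out)


def word_list(text: str) -> str:
    """Mimic `tr -cs '[:alpha:]' '[\\n*]'`: non-letters become newlines, then squeeze."""
    letters = set(string.ascii_letters)  # ASCII stand-in for [:alpha:]
    translated = "".join(ch if ch in letters else "\n" for ch in text)
    return squeeze(translated, "\n")


print(squeeze("apple", "p"))  # aple
print(word_list("I am 13.  How old are you?"), end="")
```

The second call prints I, am, How, old, are and you on separate lines: the run of newlines produced by the space, 1, 3, period and two spaces after "am" is squeezed down to one.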
Thank you very much for listening. I thank Hacker Public Radio for supporting this show.
I encourage you to go over to the website if you haven't done so,
to read the full write up on the TR command and also watch the video of some of the examples
that I talk about in action. We'll see you in two weeks. Have a great one.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
network that releases shows every weekday, Monday through Friday. Today's show, like all our shows,
was contributed by an HPR listener like yourself. If you ever considered recording a podcast,
then visit our website to find out how easy it really is. Hacker Public Radio was founded by the
Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution
at binrev.com. All BinRev projects are proudly sponsored by Lunarpages.
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting needs.
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike
3.0 License.