Episode: 944
Title: HPR0944: LITS 002: tr
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0944/hpr0944.mp3
Transcribed: 2025-10-08 05:19:17
---
Welcome to Linux in the Shell, episode 002. My name is Dann Washko and I will be your host. Today we are going to talk about the transliterate, or translate, command, better known as tr. If you have not read the entry on the website, I strongly encourage you to do so, either before or after listening to this, to solidify the command in your mind.

The tr command is a quick and dirty search-and-replace command. It reads standard input, replaces or deletes characters in it, and writes the result to standard output. For instance, you can use the echo command to echo the word apple and pipe it to tr: "echo apple | tr". Used on its own, without any flags, tr takes two sets of characters and replaces each character in set one with the corresponding character in set two.

So if we wanted to change all the p's in the word apple to ones, we would run "echo apple | tr p 1": set one is p and set two is 1. When apple gets echoed to tr, every p is replaced with the character 1, so the output is "a11le". Simple enough: tr takes two sets. Set one is what you want replaced, and set two is what you want to replace it with, position by position. If you wanted to replace the numbers 1, 2, 3, 4, 5 with the letters a, b, c, d, e, set one would be 12345 and set two would be abcde. Then every 1 would be replaced by a, 2 by b, 3 by c, 4 by d, and 5 by e.
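The two translations just described can be tried directly at a prompt. This is a minimal sketch assuming a POSIX shell and a standard tr; the inputs are the ones used in the episode.

```shell
#!/bin/sh
# Set one is "p", set two is "1": every p becomes 1.
echo apple | tr 'p' '1'            # prints a11le

# Characters are mapped position by position: 1->a, 2->b, ... 5->e.
echo 12345 | tr '12345' 'abcde'    # prints abcde
```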
You might be asking yourself: what if set one is 1, 2, 3, 4, 5, but instead of a, b, c, d, e, set two is a smaller set like a, b, c? That depends on the version of tr you're using, meaning whether you have the BSD version or the System V version. The GNU version of tr behaves like the BSD version: it pads set two by repeating its last character until set two is as long as set one. So with set one 12345 and set two abc, 1 becomes a, 2 becomes b, 3 becomes c, 4 becomes c, and 5 becomes c. It repeats that last character. The System V version instead truncates set one to the length of set two, so 1 becomes a, 2 becomes b, 3 becomes c, and 4 and 5 are left untouched. The GNU version of tr has a -t option (--truncate-set1) that emulates the System V behavior by truncating set one to the length of set two; with -t, 1 maps to a, 2 to b, 3 to c, and 4 and 5 pass through unchanged. So be aware of that: if you're not using BSD, Linux, or one of their derivatives, your results from tr might not be what you expected. Keep that in the back of your mind, though chances are you'll never run into it. Length does matter.
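A sketch of the set-length behavior, assuming GNU tr from coreutils. The -t flag is a GNU option, so this block may fail with other tr implementations.

```shell
#!/bin/sh
# GNU tr pads a short set two with its last character:
# 1->a, 2->b, 3->c, 4->c, 5->c.
echo 12345 | tr '12345' 'abc'      # prints abccc

# GNU-only -t (--truncate-set1): set one is cut down to "123",
# so 4 and 5 pass through unchanged, the System V behavior.
echo 12345 | tr -t '12345' 'abc'   # prints abc45
```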
If you're not sure of the length of set one, or you just want to make sure a character gets repeated, you can use the asterisk: put the character you want repeated inside brackets followed by an asterisk, as in [c*], and tr repeats that character as many times as needed for set two to match the length of set one. With set one 12345 and set two abc, the default padding works as if you had written ab[c*]: the last character is expanded out to fill set one. The repeat doesn't have to come last, though. Let's say set one is 123456 and you want 5 and 6 to become b and c, and every other character to become a. You would write set two as [a*]bc. Then 1, 2, 3, and 4 become a, 5 becomes b, and 6 becomes c. tr is smart enough to work out how many copies are needed.

Another thing you can do is specify ranges. Instead of writing set one as 123456 and set two as abcdef, you can write 1-6, which specifies the range 1 through 6, and a-f, which specifies a, b, c, d, e, f, giving a one-to-one match. Which raises the question: what if you want to substitute the dash itself? If you put the dash first, tr treats the set like an option flag and will probably throw an error. If you put the dash in the middle of your character list, it is treated as a range. So if you do want to replace the dash, put it at the end of the character set, as the last character.
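The repeat, range, and dash rules can be checked with short strings. This sketch assumes GNU tr; the sample inputs are illustrative, not from the episode's write-up.

```shell
#!/bin/sh
# Ranges: 1-6 expands to 123456 and a-f to abcdef.
echo 123456 | tr '1-6' 'a-f'        # prints abcdef

# [a*] repeats "a" as often as needed so set two is as long as
# set one; here 1-4 map to a, 5 to b, and 6 to c.
echo 123456 | tr '123456' '[a*]bc'  # prints aaaabc

# A dash placed last in a set is taken literally, not as a range:
# b->B and -->_.
echo 'a-b-c' | tr 'b-' 'B_'         # prints a_B_c
```

In the last command the dash sits at the end of set one.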
With the dash in the last position, it is treated neither as an option nor as a range.

Instead of listing individual characters or a range of characters in either set, you can specify character classes. These define specific groups of characters: digits, letters, letters and digits together, punctuation, printable characters, control characters, and so on. For the full list, check the man page or the info page. A character class is written as an open bracket and a colon, the name of the class, then a colon and a closing bracket; all alphabetic characters, for example, is [:alpha:]. Classes are a handy way to define a set without having to spell out a range or every character in it. In the English language, the alpha class is pretty much the same as the ranges a-z and A-Z together. The digit class is the same as the range 0-9. The lower class is the same as a-z, and the upper class is the same as A-Z. A lot of the time it is simply easier to use the character classes. For example, if I had a file and wanted to convert all its uppercase letters to lowercase, I would make set one the upper class and set two the lower class and redirect standard input from that file: "tr '[:upper:]' '[:lower:]' < file". The output would have every uppercase letter in the file converted to its lowercase equivalent.

The tr command also takes a few flags that alter its behavior. One of those flags is -d, the delete option.
Instead of replacing one set of characters with another, -d deletes every character that occurs in set one. The delete option does not take a second set; if you give it one, it will throw an error at you. For instance, to get rid of all the numbers in a file, we could run tr -d with the digit class and redirect the file into it, "tr -d '[:digit:]' < file", and the output would be the text of that file with all the numbers removed. So -d removes characters instead of replacing them.

There's also -s, for squeeze or squeeze repeats, which replaces each run of a repeated character from the set with a single occurrence of that character. If we piped the word apple to "tr -s p", every run of repeated p's would be reduced to a single p, so apple becomes "aple". That's a way to get rid of repeated characters in a file.

The last flag I want to talk about is -c, the complement. If you remember your mathematics, the complement of a set contains everything that isn't in the original set. So the complement of the set of all letters contains the digits, along with punctuation, whitespace, and every other non-letter. tr uses the complement by translating every character that is not in set one. So if we piped a file through "tr -c '[:digit:]' f", with the digit class as set one and the letter f as set two, tr would replace every non-digit character in that file with the letter f.
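The class conversion and the three flags covered so far can each be tried on a one-line string instead of a file. A minimal sketch assuming GNU tr; the sample strings are made up for illustration, and with a real file you would redirect it into tr with < as described in the episode.

```shell
#!/bin/sh
# Character classes: convert uppercase letters to lowercase.
echo 'Hello World' | tr '[:upper:]' '[:lower:]'  # prints hello world

# -d: delete every character in set one (here, all digits).
echo 'abc123def' | tr -d '[:digit:]'             # prints abcdef

# -s: squeeze each run of a repeated set-one character into one.
echo apple | tr -s 'p'                           # prints aple

# -c: translate the complement of set one, i.e. every non-digit.
# printf is used instead of echo so there is no trailing newline;
# a newline is also a non-digit and would become another "f".
printf 'apple12345' | tr -c '[:digit:]' 'f'      # prints fffff12345
```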
For example, say we pass the string apple12345 through "tr -c '[:digit:]' f". The output is fffff12345, because every character in the complement of the digit class, that is, every character that isn't a digit from 0 to 9, has been replaced with the letter f. One thing to watch: echo adds a newline at the end, and since a newline isn't a digit either, it becomes an f too.

You can also combine some of these switches. One example I've seen combines -c and -s, complement and squeeze, to get a list of the words in a file (strictly speaking, runs of letters): "tr -cs". Squeeze normally takes only one set, but when it is combined with a translation and you have two sets, the translation is performed first and the squeeze is applied to the characters in set two. To list the words in a file, set one is the alpha class and set two is the newline character. Set two is a single character that needs to be repeated out to the length of set one, so you can write it with the asterisk: an open bracket, the newline character written as backslash n, an asterisk, and a closing bracket, [\n*]. (GNU tr would pad a lone \n on its own, but the bracket form makes the repetition explicit.) With "tr -cs '[:alpha:]' '[\n*]'", tr takes the complement of the alpha class, which is anything that isn't a letter, replaces each of those characters with a newline, and then squeezes out any duplicate newlines. Now, why would you need to do this?
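A minimal sketch of the -cs pipeline, assuming GNU tr, run on the sample sentence discussed in the episode, shows what the squeeze buys you.

```shell
#!/bin/sh
# -c selects every character NOT in [:alpha:], the translation
# turns each of them into a newline ([\n*] repeats \n to fill
# set two), and -s squeezes each run of newlines into one.
echo 'I am 13. How old are you?' | tr -cs '[:alpha:]' '[\n*]'
# prints one word per line: I, am, How, old, are, you
```

Without -s, the " 13. " between "am" and "How" would produce five consecutive newlines. Piping the result into wc -l would count the words.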
Well, let's say there's a sentence in the file like "I am 13. How old are you?" Each space between the words would be converted to a newline character. But when you get to "13.", the 1, the 3, and the period are converted to newline characters too. So in the sentence "I am 13. How old are you?", after the word "am" you would have five newline characters: the space, the 1, the 3, the period, and the following space. If you use two spaces after a period, it would be six. Instead of leaving a bunch of empty lines after those words, the squeeze removes the repeats of the newline character so there is only one, and you're left with a list of the words in that file. Pretty handy.

That is the tr command in a nutshell: a quick and dirty way to replace characters, delete characters, and get rid of repeated characters in a stream. You can use tr with redirection: either pipe the output of a command to tr, or redirect standard input from a file using the less-than operator, the one that points to the left and opens to the right. And remember to redirect the output as well: if you're using tr on a file and want to save the result, redirect tr's output to a new file with the greater-than operator afterwards.

Thank you very much for listening, and thanks to Hacker Public Radio for supporting this show. I encourage you to go over to the website, if you haven't done so, to read the full write-up on the tr command and to watch the video of some of the examples I talked about in action. We'll see you in two weeks. Have a great one.

You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever consider recording a podcast, visit our website to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club. HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored by Lunarpages. From shared hosting to custom private clouds, go to lunarpages.com for all your hosting needs. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike license.