Episode: 1172
Title: HPR1172: LiTS 022: Sort
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1172/hpr1172.mp3
Transcribed: 2025-10-17 20:56:41
---
Welcome to Linux in the Shell episode 22. My name is Dann Washko, I'll be your host
today, and I would like to thank Hacker Public Radio for hosting the website and these
audio files. Please consider contributing to Hacker Public Radio by doing your own
episode, or at the very least listening to the fantastic shows that are offered every
weekday. Episode 22 of Linux in the Shell is going to talk about the sort command. If you
have not gone over to the website, linuxintheshell.org, to read up on this entry, I suggest
you do so to get a full understanding of the sort command, either before or soon after you
listen to the audio component. The sort command is an extremely handy utility, and it does
basically what it says: it sorts. It sorts input from standard input or from a file that you
provide, and it can sort it in many different ways. I have used sort many times in the past
on many different projects. One that comes to mind is when I used to manage a lot of user
accounts. I would get lists of user names, often in spreadsheets, that I had to put into a
format I could easily load into a database or whatever. Pulling this information out and
sorting it, in conjunction with uniq and cut, really saved time on manipulating that data
into the format I needed. Sort by itself does a standard alphanumeric sort of whatever data
you throw at it, line by line. So if you had a list, like a shopping list for instance, and
you ran sort on it, it would put the lines in alphanumeric order. By default, in the C
locale, symbols come first in the hierarchy. So anything that starts with a symbol, like a
pound sign, a plus or a minus, takes precedence, followed by numbers and then letters, with
uppercase letters taking precedence over lowercase letters. Strictly speaking, this is byte
(ASCII) order, so a few symbols such as @, _ and ~ actually fall between or after the
letters, and other locales can order lines differently. So if you had a list with "#fish",
"2 lb sinker", "hooks" and "bobbers" and you sorted it, "#fish" would come first, then
"2 lb sinker", and then the order of the last two would change, because "bobbers" comes
before "hooks". It's as simple as that.
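A minimal Python model of that default ordering, assuming the C locale (for example, running sort with LC_ALL=C). In that locale sort compares lines byte by byte, which for plain ASCII text matches Python's default string ordering. The function name c_locale_sort is hypothetical; this sketches the behavior described above rather than sort's actual implementation.

```python
def c_locale_sort(lines):
    """Order lines the way `LC_ALL=C sort` does for ASCII text.

    In the C locale, sort compares lines byte by byte; for ASCII strings,
    Python's default comparison (by code point) gives the same order.
    """
    return sorted(lines)


shopping_list = ["hooks", "bobbers", "#fish", "2 lb sinker"]
print(c_locale_sort(shopping_list))
# ['#fish', '2 lb sinker', 'bobbers', 'hooks']

# One line from each class: space, symbol, digit, uppercase, lowercase.
classes = ["apple", "Apple", "1 apple", "+apple", " apple"]
print(c_locale_sort(classes))
# [' apple', '+apple', '1 apple', 'Apple', 'apple']
```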
Based on that, you can then apply different options to sort. And just so you know, a space
sorts before all of the other symbols. So if you had a space at the beginning of any of
those lines, those lines would come before the ones starting with symbols. So it's space,
then symbols, then numbers, then uppercase letters, then lowercase letters. The comparison
works character by character: sort looks at the first character of each line and orders by
that, and if two lines have the same first character, it proceeds to the second character,
and so on. That's how sorting works. Now, you can ignore leading blanks with the -b or
--ignore-leading-blanks option. For any line that starts with blanks or spaces, sort skips
those blanks and compares from the first non-blank character.
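The -b behavior can be modeled the same way. This sketch assumes the C locale and treats only spaces and tabs as blanks. When two lines are equal after stripping, it falls back to comparing the whole original lines, roughly like sort's last-resort comparison. The helper name sort_ignore_leading_blanks is hypothetical.

```python
def sort_ignore_leading_blanks(lines):
    """Model of `LC_ALL=C sort -b` on whole lines."""
    # Primary key: the line with leading spaces and tabs removed.
    # Secondary key: the original line, used only to break ties.
    return sorted(lines, key=lambda line: (line.lstrip(" \t"), line))


lines = ["  zebra", "apple", "mango"]
print(sorted(lines))                      # ['  zebra', 'apple', 'mango']
print(sort_ignore_leading_blanks(lines))  # ['apple', 'mango', '  zebra']
```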
You can also ignore case with the -f or --ignore-case option. Then it does the alphanumeric
sort you would expect, but it doesn't look at upper or lowercase.
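Here is a sketch of -f, again assuming the C locale and ASCII text. GNU's documentation describes -f as folding lowercase letters to uppercase. The direction of that fold shows up with characters such as the underscore, which sits between the uppercase and lowercase letters in ASCII. The function name sort_ignore_case is hypothetical.

```python
def sort_ignore_case(lines):
    """Model of `LC_ALL=C sort -f`: compare with lowercase folded to uppercase."""
    # Ties such as "Bob" vs "bob" fall back to comparing the original lines.
    return sorted(lines, key=lambda line: (line.upper(), line))


names = ["bob", "Alice", "_temp", "alice", "Bob"]
print(sorted(names))            # ['Alice', 'Bob', '_temp', 'alice', 'bob']
print(sort_ignore_case(names))  # ['Alice', 'alice', 'Bob', 'bob', '_temp']
# Folding to lowercase instead would have put "_temp" first,
# because "_" (0x5F) sorts before "a" (0x61) but after "A" (0x41).
```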
It doesn't differentiate between them: GNU sort documents -f as folding lowercase letters to
uppercase, so it treats everything as if it were uppercase and sorts on that. Now, sort has a
couple of other functions that go beyond looking at the characters and ordering them
alphanumerically. For example, if you had a list of dates, you can do a month sort with -M
or --month-sort. It looks at month names, whether they're full names like April, December
or October, or abbreviated names like JUN for June, MAR for March or AUG for August, and it
sorts them into the proper order, ignoring case.
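A rough Python model of -M, assuming English month names (sort actually takes them from the locale) and sorting on whole lines. Unrecognized lines get 0, so they sort before January, matching GNU's documentation that invalid names compare low. The names MONTHS, month_key and month_sort are hypothetical.

```python
MONTH_ABBREVIATIONS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                       "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTHS = {abbr: number for number, abbr in enumerate(MONTH_ABBREVIATIONS, start=1)}


def month_key(line):
    """Return 1-12 for a recognized month name, 0 for anything else.

    Leading blanks are skipped and case is ignored; only the first three
    letters are compared, so "April" and "APR" both map to 4.
    """
    return MONTHS.get(line.lstrip(" \t")[:3].upper(), 0)


def month_sort(lines):
    """Model of `sort -M` on whole lines, with a whole-line tie-break."""
    return sorted(lines, key=lambda line: (month_key(line), line))


dates = ["October", "jun", "MAR", "April", "December", "Aug", "fish"]
print(month_sort(dates))
# ['fish', 'MAR', 'April', 'jun', 'Aug', 'October', 'December']
```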
It also doesn't matter whether a name is abbreviated or not, because it only compares the
three-letter abbreviation at the start of the line. So if you had a mixture of full month
names and abbreviations, it would sort them in proper calendar order, with January first,
followed by February, March, April and so on through December. Anything that isn't a month
name sorts before January. So that's pretty handy. Now, there are a couple of options more
specific to numeric values. There's the general numeric sort, -g, which is
--general-numeric-sort. It sorts by standard numeric value, as you would expect. Lines that
don't start with a number are all considered equal to each other and are placed first,
ordered among themselves by a regular sort. So if you had a list of numbers, whether
integers or decimals, it would sort them as you would expect, from negative numbers up
through positive numbers. If you had 0, 5.88, +12, -32 and 15, it would sort those as -32,
0, 5.88, +12, 15. So it understands the numbers along with their signs. And if there were
any words in there, like fish and corn, it would sort those first, alphabetically, before
the numeric list, and then it would do the numeric sort. Now, that's the general numeric
sort, whereas the numeric sort, -n or --numeric-sort, follows slightly different rules. It
only recognizes an optional leading minus sign, digits and a decimal point; a leading plus
sign is not recognized, and any line that doesn't start with a number is treated as 0. Lines
with equal values then fall back to a regular sort. So in my previous example, -32 would
still come first, but +12 would be treated like 0, and so would fish and corn. In the C
locale, +12 comes first among those, followed by 0, then corn and fish, because the plus
sign sorts before digits and digits sort before letters.
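A sketch of how -n reads the start of each line, assuming the C locale (so no thousands separators). For simplicity it converts the number to a float, whereas sort compares the digit strings exactly; the names NUMBER_PREFIX, numeric_value and numeric_sort are hypothetical.

```python
import re

# Optional blanks, an optional minus sign, digits, then an optional
# decimal point and more digits. A leading "+" is deliberately not matched.
NUMBER_PREFIX = re.compile(r"[ \t]*(-?)([0-9]*)(?:\.([0-9]*))?")


def numeric_value(line):
    """Value that `sort -n` would see at the start of `line` (0 if none)."""
    match = NUMBER_PREFIX.match(line)  # always matches, possibly empty
    sign, whole, fraction = match.group(1), match.group(2), match.group(3) or ""
    if not whole and not fraction:
        return 0.0  # no digits at all: treated as zero
    value = float(f"{whole or '0'}.{fraction or '0'}")
    return -value if sign else value


def numeric_sort(lines):
    """Model of `LC_ALL=C sort -n`; equal values fall back to a byte-wise comparison."""
    return sorted(lines, key=lambda line: (numeric_value(line), line))


items = ["0", "5.88", "+12", "-32", "15", "fish", "corn"]
print(numeric_sort(items))
# ['-32', '+12', '0', 'corn', 'fish', '5.88', '15']
```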
After those come the remaining numbers in order, 5.88 and then 15. So a numeric sort with
-n may not give you the output you expected; test it first, because the way it behaves can
be a little jarring. The regular numeric sort, -n, has different rules than the general
numeric sort, -g.
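For comparison, a sketch of -g on the same list, assuming the C locale. The regular expression only approximates the floating-point prefixes sort accepts (it skips infinities, NaNs and hexadecimal floats), and the names FLOAT_PREFIX, general_numeric_key and general_numeric_sort are hypothetical.

```python
import re

# A floating-point prefix: optional sign, digits with an optional
# decimal point, then an optional exponent such as e-3.
FLOAT_PREFIX = re.compile(r"[ \t]*([+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)")


def general_numeric_key(line):
    """Sort key modeling `LC_ALL=C sort -g` for ordinary numbers."""
    match = FLOAT_PREFIX.match(line)
    if match is None:
        # Not a number: all such lines compare equal and sort first;
        # the whole-line tie-break then orders them alphabetically.
        return (0, 0.0, line)
    return (1, float(match.group(1)), line)


def general_numeric_sort(lines):
    return sorted(lines, key=general_numeric_key)


items = ["0", "5.88", "+12", "-32", "15", "fish", "corn"]
print(general_numeric_sort(items))
# ['corn', 'fish', '-32', '0', '5.88', '+12', '15']
```

Unlike -n, the plus sign is understood here, so +12 lands between 5.88 and 15 instead of among the zeros.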
So be aware of that. Chances are, if you're really looking to sort on a numerical basis, you
want the general numeric sort with -g as opposed to the regular numeric sort with -n.
There's also -h, the human numeric sort, or --human-numeric-sort. It first looks at the sign
of the number: negative, zero or positive. Then it looks at whether there's a suffix.
The suffix can be a lowercase k or a capital K for kilobytes, or one of the other options,
M, G, T, P, E, Z and Y, which we're all familiar with: megabyte, gigabyte, terabyte,
petabyte, exabyte, zettabyte and yottabyte. Those all have to be capital letters; if you
don't capitalize them, you won't get a proper sort with the human numeric sort. So it looks
at the sign first, then the suffix, then the number, and orders on all of those. For
example, 1M and 1G would sort in that order. But if you had 1042M and 1G, it would not put
1G first and 1042M second, even though 1042M is greater than 1G, since one gigabyte is only
1024 megabytes (or 1000, depending on the convention). It's not smart enough to do that, so
just be aware of that. There are some limitations there.
It's primarily ordering by the suffix and then by the number, not by the combined value of
the number together with its suffix. Just be aware of that.
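A sketch of -h limited to non-negative numbers, assuming the C locale. The suffix ranks follow the list above (k or K, then M, G, T, P, E, Z, Y), and the names SUFFIX_RANK, HUMAN_NUMBER, human_numeric_key and human_numeric_sort are hypothetical. It shows why 1042M lands before 1G, and why a lowercase m is not treated as a suffix at all.

```python
import re

# Rank of each recognized suffix; anything else (e.g. lowercase "m") is no suffix.
SUFFIX_RANK = {"": 0, "k": 1, "K": 1, "M": 2, "G": 3, "T": 4,
               "P": 5, "E": 6, "Z": 7, "Y": 8}
# The suffix must immediately follow the number.
HUMAN_NUMBER = re.compile(r"[ \t]*([0-9]*(?:\.[0-9]*)?)([kKMGTPEZY]?)")


def human_numeric_key(line):
    """Sort key modeling `sort -h` for non-negative values only.

    Orders by sign (zero before positive), then by suffix rank, and only
    then by the number itself; number and suffix are never combined.
    """
    match = HUMAN_NUMBER.match(line)  # always matches, possibly empty
    digits, suffix = match.group(1), match.group(2)
    if not any(ch.isdigit() for ch in digits):
        return (0, 0, 0.0, line)  # no number at all: treated as zero
    value = float(digits)
    sign = 1 if value > 0 else 0
    return (sign, SUFFIX_RANK[suffix], value, line)


def human_numeric_sort(lines):
    return sorted(lines, key=human_numeric_key)


sizes = ["1042M", "1G", "512K", "2k", "10", "3m"]
print(human_numeric_sort(sizes))
# ['3m', '10', '2k', '512K', '1042M', '1G']
```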
Sort also has an option to randomize the order. If you really need to take a list and
shuffle it, you can use -R or --random-sort. It sorts by a randomly chosen hash of each
line, so identical lines end up grouped together. If you ran it three or four times, you'd
get a different order each time, so it does a pretty good job of randomizing. You can also
supply the randomness from a file with --random-source=FILE. Sort takes its random bytes
from that file, so if you do a random sort of the same list with the same source file over
and over again, you'll get the same order each time. The last sorting option I want to talk
about is the version sort, -V or --version-sort, which is smarter about the prefixes and
suffixes in version strings, such as the file names of source code releases. The way it
operates is that it tries to break each name logically into a prefix and a suffix, the
suffix being the version number, so to speak, and it orders them according to rules,
including a regular expression, outlined in the info documentation. When it encounters
leading zeros in the version numbers, it ignores them. So it does a good job of sorting
something like 012, 012B, 013 and 0013B into the proper order.
A normal sort would put 0013B first, when you probably want it to be last.
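A simplified Python "natural sort" that reproduces this example. It is not the exact comparison GNU sort uses for -V, which also gives file-name suffixes and the tilde special treatment; the function name version_key is hypothetical.

```python
import re


def version_key(name):
    """Simplified natural-sort key: digit runs compare by numeric value.

    re.split with a capturing group alternates text and digit runs, always
    starting with a (possibly empty) text run, so text parts land at even
    indexes and digit parts at odd indexes. Converting the digit parts to
    int makes leading zeros irrelevant.
    """
    parts = re.split(r"([0-9]+)", name)
    natural = [int(part) if index % 2 else part for index, part in enumerate(parts)]
    return (natural, name)  # the whole name breaks ties such as "012" vs "12"


versions = ["0013B", "013", "012B", "012"]
print(sorted(versions))                   # ['0013B', '012', '012B', '013']
print(sorted(versions, key=version_key))  # ['012', '012B', '013', '0013B']
```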
Version sort orders the numeric parts by their value, ignoring the leading zeros, to give you
a proper version list. That can be handy if you're sorting through a list of versioned
software. Finally, for all of the sorting options I've talked about, instead of the short
option you can pass --sort=WORD, where WORD is one of the values I talked about before:
general-numeric, human-numeric, month, numeric, random or version. For example, you can do
--sort=numeric, spelling it out, and it'll do a numeric sort on the list. That is the basics
of sort in a nutshell. There are other options that I may cover in a future show. They're
pretty specialized options that, in 90% of the cases people use sort for, you'll probably
never need, but they are there.
Head over to the website for the full write-up and to watch the video of using the sort command.
Again, I want to thank Hacker Public Radio for hosting the files, and thank you for listening. Have a great day.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever considered recording a podcast, then visit our website to find out how easy it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club.
HPR is funded by the Binary Revolution at binrev.com. All binrev projects are crowd-sponsored
by Lunarpages. From shared hosting to custom private clouds, go to lunarpages.com for all your
hosting needs. Unless otherwise stated, today's show is released under a Creative Commons
Attribution-ShareAlike 3.0 license.