- MCP server with stdio transport for local use - Search episodes, transcripts, hosts, and series - 4,511 episodes with metadata and transcripts - Data loader with in-memory JSON storage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
161 lines
9.1 KiB
Plaintext
161 lines
9.1 KiB
Plaintext
Episode: 1635
|
|
Title: HPR1635: 41 - LibreOffice Calc - Data Manipulation 1: Sorting and AutoFilter
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1635/hpr1635.mp3
|
|
Transcribed: 2025-10-18 06:08:48
|
|
|
|
---
|
|
|
|
This episode of HBR is brought to you by AnanasThost.com.
|
|
Get 15% discount on all shared hosting with the offer code HBR15.
|
|
That's HBR15.
|
|
Better web hosting that's honest and fair at AnanasThost.com.
|
|
Hello, this is Ahuka, welcoming you to Hacker Public Radio.
|
|
Another exciting episode in our ongoing series on Libra Office Calc.
|
|
What I want to turn to now is the idea of how to do data manipulation in Calc.
|
|
I'm going to start with some of the simplest things that you can do sorting and auto filter.
|
|
Now Calc is not a database.
|
|
It can be used for some data analysis and manipulation.
|
|
When I worked for the finance department of a hospital, it was very common for the financial
|
|
analysts to get a data dump from a centralized system as a CSV file loaded up in a spreadsheet
|
|
and then slice and dice the data to get the answers they wanted.
|
|
Now, it is not anywhere near what you can do with a good relational database and a structured
|
|
query, but you can do some quick and dirty analysis here.
|
|
To illustrate this, I went looking for a data set and I found one from Vanderbilt University
|
|
Department of Biostatistics.
|
|
Now, in case you were not aware, there's a lot of data sets you can download on the internet,
|
|
including in many cases, government data, and it's all freely available and you are
|
|
welcome to download it and use it as always, you know, do check to see if there's any particular
|
|
license involved.
|
|
But in this case, I thought it was fun to get some biostatistics, so I grabbed a meningitis
|
|
data set, partly just to get something in the medical area into our discussion.
|
|
Now as it happens, it was larger than I needed, so I discarded a lot of the data to get
|
|
something a little more manageable.
|
|
I hid most of the variables and kept these.
|
|
Case number.
|
|
It's just an index.
|
|
It starts at one and it goes up.
|
|
Year.
|
|
Looks like a two-digit year in which the patient was observed and some of them are blank.
|
|
Month.
|
|
Looks like the month within the year when the patient was observed and some of them are
|
|
blank.
|
|
Age.
|
|
The age of the patient when they were observed and some are blank.
|
|
Case of patient.
|
|
Black versus white.
|
|
Some blank.
|
|
Sex of patient.
|
|
Male versus female.
|
|
Some are blank.
|
|
There were 581 records in this data set, so I kept those and just those six variables.
|
|
So what are the things that I want to do with this little bit of a data set?
|
|
Well, sorting is one of them.
|
|
You can put them in some type of order.
|
|
That's the simplest thing you can do with any data set.
|
|
You can sort the data and put it in order by a column.
|
|
So let's say you wanted to sort by the year when the patient was observed.
|
|
So you click on the B column, which has that variable, and go to the data menu, sort,
|
|
and you get a window that opens up.
|
|
And this is interesting because it gives you a couple of different options.
|
|
So the window that comes up says sort range.
|
|
The cells next to the current selection also contain data.
|
|
Do you want to extend the sort range?
|
|
And it says to range of cell A1 to F582, in other words, the entire database.
|
|
Or sort the currently selected range, B1 to B, and then it goes to 1,048,000, so basically
|
|
I've selected the entire column.
|
|
Then there's three buttons.
|
|
Extend selection is the first button, and that's selected by default.
|
|
Extend selection and cancel.
|
|
So what is this all about?
|
|
If you had clicked current selection, column B would be sorted all by itself, but the other
|
|
columns would not be sorted.
|
|
This used to be the default behavior of spreadsheets.
|
|
But we've learned that people do attempt these kinds of manipulations on data sets.
|
|
So now it is smart enough to check.
|
|
If you select extend selection, the first button, the one that's already highlighted by default,
|
|
it will sort based on what is in column B, but will sort the data set from cell A1 to
|
|
cell F582.
|
|
Now, in this case, that's what we want because each row is an entire record here.
|
|
So we want to keep each record intact.
|
|
So extend selection is what we want.
|
|
That brings up another window with sort criteria, and you can have several sort keys.
|
|
So I just had year as being sort key one, and I've got a little radio button there that
|
|
says I've selected ascending, so it'll start with the earliest year and go up.
|
|
So, if I had chosen month for the second sort key, the data would first be sorted by year,
|
|
then for all the data points with the same year, they would be sorted by month.
|
|
That's not bad.
|
|
It's useful to know that you can do this multiple sort kind of thing, and it handles it in a sensible
|
|
way.
|
|
Next thing I want to talk about here is something called auto-filter.
|
|
Now filters are an important tool for working with large data sets.
|
|
While they don't replace a relational database with structured queries, they do give you
|
|
some useful features.
|
|
There are three kinds of filters, all accessed through the data menu in Calc.
|
|
We will discuss auto-filter in this tutorial, then move on to the others in subsequent tutorials.
|
|
Go to the data menu, to filters, and then to auto-filter, and click to turn on.
|
|
You will see a drop-down appear on every column in the open sheet.
|
|
These drop-downs will give you an entry for each unique value in the column.
|
|
For a column like case number, this feature is useless since each row has a unique case
|
|
number.
|
|
But if we wanted to only look at cases from 1978, we could go to the year column, click
|
|
the drop-down, remove the check mark from all in the bottom of the drop-down, which deselects
|
|
everything, then put a check mark in 78 to select only that, and the result is to display
|
|
only those rows which have 78 in the year column.
|
|
If you look at the row numbers on the left, every time one or more rows was skipped, there
|
|
is a heavier line separating the row numbers.
|
|
Also for each column that has a filter applied, there will be a small black square in the lower
|
|
right corner of the drop-down arrow box.
|
|
This is helpful if you need to quickly see where the filters are applied, such as when
|
|
you need to remove one.
|
|
Now note, you do need to be careful in working with a filtered dataset.
|
|
If you forget the filtering you have applied, you may get a wrong idea about your data.
|
|
Be sure to always check your filters first when working with a dataset to make sure you
|
|
have not filtered out some of the cases you want to see.
|
|
Now, you can combine filters as well.
|
|
Leaving the filter on the year column, I can next go to the sex column and select female
|
|
in a similar manner.
|
|
This reduces the dataset even further to only those cases which occurred in 1978 and involved
|
|
a female.
|
|
If I wanted to go even further, I could add a race qualification and filter for black.
|
|
This would give me all cases in 1978 which involved a black female.
|
|
Now, you can remove an auto filter easily, just click the drop-down and put a checkmark
|
|
in all at the bottom that will remove that particular filter.
|
|
Other filters you might want to use are shown at the top of the drop-down window.
|
|
The first one is top 10.
|
|
This returns the rows with the 10 highest values in the dataset.
|
|
If there are no other filters and you have at least 10 rows, you should get 10 rows based
|
|
on this column.
|
|
But if you have other filters, you may get less.
|
|
For example, when I first filtered on race to select white and then selected top 10 for
|
|
age, I got back six records.
|
|
That this means that of the 10 highest ages in my dataset, only six were of white patients.
|
|
Second option you might see there is empty.
|
|
This selects only those rows that are empty for the column you are filtering and then
|
|
not empty as you might guess.
|
|
Selects only those rows that are not empty for the column you are filtering.
|
|
Now suppose you wanted to select cases in 1978 involving females who are over the age
|
|
of 40.
|
|
You could hit the drop-down for the age column, remove the checkmark in all, then put checkmarks
|
|
in for every value greater than 40.
|
|
This would work.
|
|
Then the result is six cases.
|
|
But there is a better way and that involves using what are called standard filters, which
|
|
is what we are going to get to next.
|
|
Now the sample spreadsheet that I created with the data and the simple filtering things
|
|
I have done is available.
|
|
Link is in the show notes and you can get it from my website.
|
|
So please feel free to download and check that out.
|
|
And that means that this is Ahuka signing off for Hacker Public Radio and reminding everyone
|
|
to support FreeSoftware.
|
|
Bye-bye.
|
|
You've been listening to Hacker Public Radio at Hacker Public Radio.
|
|
We are a community podcast network that releases shows every weekday, Monday through Friday.
|
|
Today's show, like all our shows, was contributed by an HPR listener like yourself.
|
|
If you ever thought of recording a podcast and click on our contributing to find out
|
|
how easy it really is.
|
|
Hacker Public Radio was founded by the digital dog pound and the Infonomicon Computer Club
|
|
and is part of the binary revolution at binwreff.com.
|
|
If you have comments on today's show, please email the host
|
|
directly, leave a comment on the website or record a follow-up episode yourself.
|
|
Unless otherwise stated, today's show is released on the Creative Commons'
|