Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use - Search episodes, transcripts, hosts, and series - 4,511 episodes with metadata and transcripts - Data loader with in-memory JSON storage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
160
hpr_transcripts/hpr1635.txt
Normal file
160
hpr_transcripts/hpr1635.txt
Normal file
@@ -0,0 +1,160 @@
|
||||
Episode: 1635
|
||||
Title: HPR1635: 41 - LibreOffice Calc - Data Manipulation 1: Sorting and AutoFilter
|
||||
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1635/hpr1635.mp3
|
||||
Transcribed: 2025-10-18 06:08:48
|
||||
|
||||
---
|
||||
|
||||
This episode of HBR is brought to you by AnanasThost.com.
|
||||
Get 15% discount on all shared hosting with the offer code HBR15.
|
||||
That's HBR15.
|
||||
Better web hosting that's honest and fair at AnanasThost.com.
|
||||
Hello, this is Ahuka, welcoming you to Hacker Public Radio.
|
||||
Another exciting episode in our ongoing series on Libra Office Calc.
|
||||
What I want to turn to now is the idea of how to do data manipulation in Calc.
|
||||
I'm going to start with some of the simplest things that you can do sorting and auto filter.
|
||||
Now Calc is not a database.
|
||||
It can be used for some data analysis and manipulation.
|
||||
When I worked for the finance department of a hospital, it was very common for the financial
|
||||
analysts to get a data dump from a centralized system as a CSV file loaded up in a spreadsheet
|
||||
and then slice and dice the data to get the answers they wanted.
|
||||
Now, it is not anywhere near what you can do with a good relational database and a structured
|
||||
query, but you can do some quick and dirty analysis here.
|
||||
To illustrate this, I went looking for a data set and I found one from Vanderbilt University
|
||||
Department of Biostatistics.
|
||||
Now, in case you were not aware, there's a lot of data sets you can download on the internet,
|
||||
including in many cases, government data, and it's all freely available and you are
|
||||
welcome to download it and use it as always, you know, do check to see if there's any particular
|
||||
license involved.
|
||||
But in this case, I thought it was fun to get some biostatistics, so I grabbed a meningitis
|
||||
data set, partly just to get something in the medical area into our discussion.
|
||||
Now as it happens, it was larger than I needed, so I discarded a lot of the data to get
|
||||
something a little more manageable.
|
||||
I hid most of the variables and kept these.
|
||||
Case number.
|
||||
It's just an index.
|
||||
It starts at one and it goes up.
|
||||
Year.
|
||||
Looks like a two-digit year in which the patient was observed and some of them are blank.
|
||||
Month.
|
||||
Looks like the month within the year when the patient was observed and some of them are
|
||||
blank.
|
||||
Age.
|
||||
The age of the patient when they were observed and some are blank.
|
||||
Case of patient.
|
||||
Black versus white.
|
||||
Some blank.
|
||||
Sex of patient.
|
||||
Male versus female.
|
||||
Some are blank.
|
||||
There were 581 records in this data set, so I kept those and just those six variables.
|
||||
So what are the things that I want to do with this little bit of a data set?
|
||||
Well, sorting is one of them.
|
||||
You can put them in some type of order.
|
||||
That's the simplest thing you can do with any data set.
|
||||
You can sort the data and put it in order by a column.
|
||||
So let's say you wanted to sort by the year when the patient was observed.
|
||||
So you click on the B column, which has that variable, and go to the data menu, sort,
|
||||
and you get a window that opens up.
|
||||
And this is interesting because it gives you a couple of different options.
|
||||
So the window that comes up says sort range.
|
||||
The cells next to the current selection also contain data.
|
||||
Do you want to extend the sort range?
|
||||
And it says to range of cell A1 to F582, in other words, the entire database.
|
||||
Or sort the currently selected range, B1 to B, and then it goes to 1,048,000, so basically
|
||||
I've selected the entire column.
|
||||
Then there's three buttons.
|
||||
Extend selection is the first button, and that's selected by default.
|
||||
Extend selection and cancel.
|
||||
So what is this all about?
|
||||
If you had clicked current selection, column B would be sorted all by itself, but the other
|
||||
columns would not be sorted.
|
||||
This used to be the default behavior of spreadsheets.
|
||||
But we've learned that people do attempt these kinds of manipulations on data sets.
|
||||
So now it is smart enough to check.
|
||||
If you select extend selection, the first button, the one that's already highlighted by default,
|
||||
it will sort based on what is in column B, but will sort the data set from cell A1 to
|
||||
cell F582.
|
||||
Now, in this case, that's what we want because each row is an entire record here.
|
||||
So we want to keep each record intact.
|
||||
So extend selection is what we want.
|
||||
That brings up another window with sort criteria, and you can have several sort keys.
|
||||
So I just had year as being sort key one, and I've got a little radio button there that
|
||||
says I've selected ascending, so it'll start with the earliest year and go up.
|
||||
So, if I had chosen month for the second sort key, the data would first be sorted by year,
|
||||
then for all the data points with the same year, they would be sorted by month.
|
||||
That's not bad.
|
||||
It's useful to know that you can do this multiple sort kind of thing, and it handles it in a sensible
|
||||
way.
|
||||
Next thing I want to talk about here is something called auto-filter.
|
||||
Now filters are an important tool for working with large data sets.
|
||||
While they don't replace a relational database with structured queries, they do give you
|
||||
some useful features.
|
||||
There are three kinds of filters, all accessed through the data menu in Calc.
|
||||
We will discuss auto-filter in this tutorial, then move on to the others in subsequent tutorials.
|
||||
Go to the data menu, to filters, and then to auto-filter, and click to turn on.
|
||||
You will see a drop-down appear on every column in the open sheet.
|
||||
These drop-downs will give you an entry for each unique value in the column.
|
||||
For a column like case number, this feature is useless since each row has a unique case
|
||||
number.
|
||||
But if we wanted to only look at cases from 1978, we could go to the year column, click
|
||||
the drop-down, remove the check mark from all in the bottom of the drop-down, which deselects
|
||||
everything, then put a check mark in 78 to select only that, and the result is to display
|
||||
only those rows which have 78 in the year column.
|
||||
If you look at the row numbers on the left, every time one or more rows was skipped, there
|
||||
is a heavier line separating the row numbers.
|
||||
Also for each column that has a filter applied, there will be a small black square in the lower
|
||||
right corner of the drop-down arrow box.
|
||||
This is helpful if you need to quickly see where the filters are applied, such as when
|
||||
you need to remove one.
|
||||
Now note, you do need to be careful in working with a filtered dataset.
|
||||
If you forget the filtering you have applied, you may get a wrong idea about your data.
|
||||
Be sure to always check your filters first when working with a dataset to make sure you
|
||||
have not filtered out some of the cases you want to see.
|
||||
Now, you can combine filters as well.
|
||||
Leaving the filter on the year column, I can next go to the sex column and select female
|
||||
in a similar manner.
|
||||
This reduces the dataset even further to only those cases which occurred in 1978 and involved
|
||||
a female.
|
||||
If I wanted to go even further, I could add a race qualification and filter for black.
|
||||
This would give me all cases in 1978 which involved a black female.
|
||||
Now, you can remove an auto filter easily, just click the drop-down and put a checkmark
|
||||
in all at the bottom that will remove that particular filter.
|
||||
Other filters you might want to use are shown at the top of the drop-down window.
|
||||
The first one is top 10.
|
||||
This returns the rows with the 10 highest values in the dataset.
|
||||
If there are no other filters and you have at least 10 rows, you should get 10 rows based
|
||||
on this column.
|
||||
But if you have other filters, you may get less.
|
||||
For example, when I first filtered on race to select white and then selected top 10 for
|
||||
age, I got back six records.
|
||||
That this means that of the 10 highest ages in my dataset, only six were of white patients.
|
||||
Second option you might see there is empty.
|
||||
This selects only those rows that are empty for the column you are filtering and then
|
||||
not empty as you might guess.
|
||||
Selects only those rows that are not empty for the column you are filtering.
|
||||
Now suppose you wanted to select cases in 1978 involving females who are over the age
|
||||
of 40.
|
||||
You could hit the drop-down for the age column, remove the checkmark in all, then put checkmarks
|
||||
in for every value greater than 40.
|
||||
This would work.
|
||||
Then the result is six cases.
|
||||
But there is a better way and that involves using what are called standard filters, which
|
||||
is what we are going to get to next.
|
||||
Now the sample spreadsheet that I created with the data and the simple filtering things
|
||||
I have done is available.
|
||||
Link is in the show notes and you can get it from my website.
|
||||
So please feel free to download and check that out.
|
||||
And that means that this is Ahuka signing off for Hacker Public Radio and reminding everyone
|
||||
to support FreeSoftware.
|
||||
Bye-bye.
|
||||
You've been listening to Hacker Public Radio at Hacker Public Radio.
|
||||
We are a community podcast network that releases shows every weekday, Monday through Friday.
|
||||
Today's show, like all our shows, was contributed by an HPR listener like yourself.
|
||||
If you ever thought of recording a podcast and click on our contributing to find out
|
||||
how easy it really is.
|
||||
Hacker Public Radio was founded by the digital dog pound and the Infonomicon Computer Club
|
||||
and is part of the binary revolution at binwreff.com.
|
||||
If you have comments on today's show, please email the host
|
||||
directly, leave a comment on the website or record a follow-up episode yourself.
|
||||
Unless otherwise stated, today's show is released on the Creative Commons'
|
||||
Reference in New Issue
Block a user