Initial commit: HPR Knowledge Base MCP Server

- MCP server with stdio transport for local use - Search episodes, transcripts, hosts, and series - 4,511 episodes with metadata and transcripts - Data loader with in-memory JSON storage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 10:54:13 +00:00
commit 7c8efd2228
4494 changed files with 1705541 additions and 0 deletions
--- a/hpr_transcripts/hpr1635.txt
+++ b/hpr_transcripts/hpr1635.txt
@@ -0,0 +1,160 @@
+Episode: 1635
+Title: HPR1635: 41 - LibreOffice Calc - Data Manipulation 1: Sorting and AutoFilter
+Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1635/hpr1635.mp3
+Transcribed: 2025-10-18 06:08:48
+
+---
+
+This episode of HBR is brought to you by AnanasThost.com.
+Get 15% discount on all shared hosting with the offer code HBR15.
+That's HBR15.
+Better web hosting that's honest and fair at AnanasThost.com.
+Hello, this is Ahuka, welcoming you to Hacker Public Radio.
+Another exciting episode in our ongoing series on Libra Office Calc.
+What I want to turn to now is the idea of how to do data manipulation in Calc.
+I'm going to start with some of the simplest things that you can do sorting and auto filter.
+Now Calc is not a database.
+It can be used for some data analysis and manipulation.
+When I worked for the finance department of a hospital, it was very common for the financial
+analysts to get a data dump from a centralized system as a CSV file loaded up in a spreadsheet
+and then slice and dice the data to get the answers they wanted.
+Now, it is not anywhere near what you can do with a good relational database and a structured
+query, but you can do some quick and dirty analysis here.
+To illustrate this, I went looking for a data set and I found one from Vanderbilt University
+Department of Biostatistics.
+Now, in case you were not aware, there's a lot of data sets you can download on the internet,
+including in many cases, government data, and it's all freely available and you are
+welcome to download it and use it as always, you know, do check to see if there's any particular
+license involved.
+But in this case, I thought it was fun to get some biostatistics, so I grabbed a meningitis
+data set, partly just to get something in the medical area into our discussion.
+Now as it happens, it was larger than I needed, so I discarded a lot of the data to get
+something a little more manageable.
+I hid most of the variables and kept these.
+Case number.
+It's just an index.
+It starts at one and it goes up.
+Year.
+Looks like a two-digit year in which the patient was observed and some of them are blank.
+Month.
+Looks like the month within the year when the patient was observed and some of them are
+blank.
+Age.
+The age of the patient when they were observed and some are blank.
+Case of patient.
+Black versus white.
+Some blank.
+Sex of patient.
+Male versus female.
+Some are blank.
+There were 581 records in this data set, so I kept those and just those six variables.
+So what are the things that I want to do with this little bit of a data set?
+Well, sorting is one of them.
+You can put them in some type of order.
+That's the simplest thing you can do with any data set.
+You can sort the data and put it in order by a column.
+So let's say you wanted to sort by the year when the patient was observed.
+So you click on the B column, which has that variable, and go to the data menu, sort,
+and you get a window that opens up.
+And this is interesting because it gives you a couple of different options.
+So the window that comes up says sort range.
+The cells next to the current selection also contain data.
+Do you want to extend the sort range?
+And it says to range of cell A1 to F582, in other words, the entire database.
+Or sort the currently selected range, B1 to B, and then it goes to 1,048,000, so basically
+I've selected the entire column.
+Then there's three buttons.
+Extend selection is the first button, and that's selected by default.
+Extend selection and cancel.
+So what is this all about?
+If you had clicked current selection, column B would be sorted all by itself, but the other
+columns would not be sorted.
+This used to be the default behavior of spreadsheets.
+But we've learned that people do attempt these kinds of manipulations on data sets.
+So now it is smart enough to check.
+If you select extend selection, the first button, the one that's already highlighted by default,
+it will sort based on what is in column B, but will sort the data set from cell A1 to
+cell F582.
+Now, in this case, that's what we want because each row is an entire record here.
+So we want to keep each record intact.
+So extend selection is what we want.
+That brings up another window with sort criteria, and you can have several sort keys.
+So I just had year as being sort key one, and I've got a little radio button there that
+says I've selected ascending, so it'll start with the earliest year and go up.
+So, if I had chosen month for the second sort key, the data would first be sorted by year,
+then for all the data points with the same year, they would be sorted by month.
+That's not bad.
+It's useful to know that you can do this multiple sort kind of thing, and it handles it in a sensible
+way.
+Next thing I want to talk about here is something called auto-filter.
+Now filters are an important tool for working with large data sets.
+While they don't replace a relational database with structured queries, they do give you
+some useful features.
+There are three kinds of filters, all accessed through the data menu in Calc.
+We will discuss auto-filter in this tutorial, then move on to the others in subsequent tutorials.
+Go to the data menu, to filters, and then to auto-filter, and click to turn on.
+You will see a drop-down appear on every column in the open sheet.
+These drop-downs will give you an entry for each unique value in the column.
+For a column like case number, this feature is useless since each row has a unique case
+number.
+But if we wanted to only look at cases from 1978, we could go to the year column, click
+the drop-down, remove the check mark from all in the bottom of the drop-down, which deselects
+everything, then put a check mark in 78 to select only that, and the result is to display
+only those rows which have 78 in the year column.
+If you look at the row numbers on the left, every time one or more rows was skipped, there
+is a heavier line separating the row numbers.
+Also for each column that has a filter applied, there will be a small black square in the lower
+right corner of the drop-down arrow box.
+This is helpful if you need to quickly see where the filters are applied, such as when
+you need to remove one.
+Now note, you do need to be careful in working with a filtered dataset.
+If you forget the filtering you have applied, you may get a wrong idea about your data.
+Be sure to always check your filters first when working with a dataset to make sure you
+have not filtered out some of the cases you want to see.
+Now, you can combine filters as well.
+Leaving the filter on the year column, I can next go to the sex column and select female
+in a similar manner.
+This reduces the dataset even further to only those cases which occurred in 1978 and involved
+a female.
+If I wanted to go even further, I could add a race qualification and filter for black.
+This would give me all cases in 1978 which involved a black female.
+Now, you can remove an auto filter easily, just click the drop-down and put a checkmark
+in all at the bottom that will remove that particular filter.
+Other filters you might want to use are shown at the top of the drop-down window.
+The first one is top 10.
+This returns the rows with the 10 highest values in the dataset.
+If there are no other filters and you have at least 10 rows, you should get 10 rows based
+on this column.
+But if you have other filters, you may get less.
+For example, when I first filtered on race to select white and then selected top 10 for
+age, I got back six records.
+That this means that of the 10 highest ages in my dataset, only six were of white patients.
+Second option you might see there is empty.
+This selects only those rows that are empty for the column you are filtering and then
+not empty as you might guess.
+Selects only those rows that are not empty for the column you are filtering.
+Now suppose you wanted to select cases in 1978 involving females who are over the age
+of 40.
+You could hit the drop-down for the age column, remove the checkmark in all, then put checkmarks
+in for every value greater than 40.
+This would work.
+Then the result is six cases.
+But there is a better way and that involves using what are called standard filters, which
+is what we are going to get to next.
+Now the sample spreadsheet that I created with the data and the simple filtering things
+I have done is available.
+Link is in the show notes and you can get it from my website.
+So please feel free to download and check that out.
+And that means that this is Ahuka signing off for Hacker Public Radio and reminding everyone
+to support FreeSoftware.
+Bye-bye.
+You've been listening to Hacker Public Radio at Hacker Public Radio.
+We are a community podcast network that releases shows every weekday, Monday through Friday.
+Today's show, like all our shows, was contributed by an HPR listener like yourself.
+If you ever thought of recording a podcast and click on our contributing to find out
+how easy it really is.
+Hacker Public Radio was founded by the digital dog pound and the Infonomicon Computer Club
+and is part of the binary revolution at binwreff.com.
+If you have comments on today's show, please email the host
+directly, leave a comment on the website or record a follow-up episode yourself.
+Unless otherwise stated, today's show is released on the Creative Commons'