Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use - Search episodes, transcripts, hosts, and series - 4,511 episodes with metadata and transcripts - Data loader with in-memory JSON storage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
173
hpr_transcripts/hpr1605.txt
Normal file
173
hpr_transcripts/hpr1605.txt
Normal file
@@ -0,0 +1,173 @@
|
||||
Episode: 1605
|
||||
Title: HPR1605: 38 - LibreOffice Calc - simple Descriptive Statistics
|
||||
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1605/hpr1605.mp3
|
||||
Transcribed: 2025-10-18 05:43:56
|
||||
|
||||
---
|
||||
|
||||
This episode of HBR is brought to you by AnanasThost.com.
|
||||
Get 15% discount on all shared hosting with the offer code HBR15.
|
||||
That's HBR15.
|
||||
Better web hosting that's Aniston Fair at AnanasThost.com.
|
||||
Hello, this is Ahuka, welcoming you to Hacker Public Radio.
|
||||
And another in our series of tutorials, I guess you would call it, on Libra Office Calc.
|
||||
And what I want to talk about today is the functions that are available to deal with simple descriptive statistics.
|
||||
We started our look at functions by looking at all of the financial functions, very important area,
|
||||
but I think there's a lot of good stuff here as well.
|
||||
So, what are we talking about?
|
||||
In statistics, there are generally speaking two types of analysis,
|
||||
broken down between descriptive and inferential statistics.
|
||||
Difference has to do with what claims you are making about the data.
|
||||
If you are simply stating something about the data, for instance,
|
||||
there were more men than women in the sample, that is descriptive.
|
||||
If you are taking a look at an entire population and measuring things like the standard deviation,
|
||||
that would be descriptive.
|
||||
But if you're making a claim that something is not likely to occur by chance, for instance,
|
||||
or that something is statistically significant, both of those statements are essentially the same thing.
|
||||
Then you are in the realm of inferential statistics.
|
||||
If you take a sample and do a measurement in the sample and then say, based on that,
|
||||
this is what I think the population is like that is inferential.
|
||||
So, Calc has functions to do both kinds of analysis, and this tutorial is going to focus on some
|
||||
of the common descriptive statistics in Calc and how they are used.
|
||||
Calc offers many statistical functions, of course, that you might want to make use of.
|
||||
These let you get some analytics on data that you have.
|
||||
But you need to have some data to start with.
|
||||
And to do that, I'm going to make use of the random function to make up some numbers.
|
||||
In cell A1 and B1, I set up a header by clicking merge and center,
|
||||
making the font aerial 12, bold, and giving a colored background through the formatting cells option.
|
||||
Then I select cells C1 and D1, merge and center to the same formatting.
|
||||
So, for column A and B, I give the column named data, and for the combined C1, D1, I enter statistics.
|
||||
In cell A2, I use the RAND function, found in the mathematics category, which gives me a random
|
||||
number between 0 and 1. I then multiply by 100 to get numbers a little bigger, and I click and drag
|
||||
through the column to get 30 numbers. When you start working, you will encounter an interesting
|
||||
problem, which is that the random numbers keep changing when you change other cells.
|
||||
What is happening here is that the formulas are recalculating every time the sheet recalculates.
|
||||
But we can use a trick to get around this. Highlight all of the numbers,
|
||||
all of the cells, and collect copy. Then with the same area highlighted,
|
||||
paste on to itself by right-clicking, and selecting paste only, and then selecting number.
|
||||
This takes the result of the function, and turns it into a RAND number, which replaces the
|
||||
function at each cell. Now you have your data locked in, and you're ready to do some statistics.
|
||||
First thing we'll look at, measures of central tendency.
|
||||
In statistics, we distinguish between several different measures of central tendency.
|
||||
Essentially, this is an attempt to answer the question, what does the most representative member
|
||||
of this group look like? There is more than one answer depending on the data.
|
||||
First is the question, does the data represent a qualitative or quantitative variable?
|
||||
Yes, that question does keep coming up.
|
||||
For quantitative data, there are several likely answers. One is the average,
|
||||
also referred to in statistics as the mean, and another is the median, which one do you use?
|
||||
Basically, it comes down to how symmetric the distribution is. When it's symmetric,
|
||||
the two measures will be very close. When it is skewed, they will diverge a lot.
|
||||
For instance, if you have a group of 10 people in a room and ask what the typical person's wealth is,
|
||||
well, if all 10 people are reasonably normal, average people, you could just use the average and
|
||||
get a good answer. But if one of those people does bill gates, you're going to get an extremely high
|
||||
and unrepresented number. In that case, you should use the median, which divides the sample into
|
||||
two groups and asks, where is the boundary between the top 50 percent and the bottom 50 percent?
|
||||
So for average, go to the function wizard, select statistical as your category,
|
||||
and average as the function. Click next. The window for putting in arguments opens with a space
|
||||
for each number. You could enter each number one at a time, one per field, but that is not optimal.
|
||||
Instead, click on the field for the first number, then click on cell A2. You will see that the field
|
||||
now has that cell address. But now, hold down the shift key and click on cell A31, which selects
|
||||
the whole column of numbers. Now the field will read equals average open parent A2 colon A31
|
||||
close parent. That gives you the whole range of numbers, and when you click OK, you will get the
|
||||
average of this group of numbers. In this case, we generated data using the Rand function,
|
||||
multiplied by 100, which should mean random number between 0 and 100. So you should not be
|
||||
surprised if your answer comes reasonably close to 50. Mine came out to 49, but your number could
|
||||
be slightly different. Now the average, also called the arithmetic mean, is calculated by adding
|
||||
up the measurements and dividing by the number of measurements. The geometric mean is calculated
|
||||
by multiplying the numbers together, and then taking the n-th root, where n is the number of
|
||||
measurements. The function is the Geo mean function, and you use it just like the average function.
|
||||
In other words, select the data, paste it in, yada yada. Harmonic mean, fairly complicated to
|
||||
describe, but it's used in scientific applications. The three types of mean arithmetic geometric and
|
||||
harmonic are called the Pythagorean means. The harmonic mean is used, for example, in evaluating
|
||||
computer algorithms. It is called the Har mean function, and again usage is exactly the same.
|
||||
The rule of thumb with this is that of the three measures, the arithmetic mean is always the
|
||||
largest, the geometric mean is in the middle, and the harmonic mean is the lowest.
|
||||
Now, median. As above, go to the statistical functions, but this time select median.
|
||||
As above, select the range for the first field. Now this may be a little farther from the middle,
|
||||
depending on your numbers. Mine was 42, again random numbers, but the median is what divides
|
||||
the group into two equals, you know, an upper and a lower 50%. Now, mode. This is what we get with
|
||||
qualitative data, and mode is the most common. All right, so for example, let's say you had a sample
|
||||
of people where you recorded their hair color as black, blonde, or red-headed, and wanted to know
|
||||
what was the most representative. Now, in a case like this, it makes no sense to use something
|
||||
like average. There's no way of doing that. Mode looks for the most common, so, you know,
|
||||
which of those groups had the most people. Now, what you need to watch out for here is that
|
||||
Calc works with numerical data, so what happens is you need to encode the data
|
||||
with something like a number one means black, a number two means blonde, and a number three means red-head.
|
||||
Do that, and you can put a range of data in the mode function and get your answer. Now,
|
||||
because you've coded it, you could take this and put it into an arithmetic mean function,
|
||||
and it would calculate something, but the answer would be totally meaningless. Do not do this.
|
||||
All right, so yes, you need to code the data with numbers, but understand and use the right one,
|
||||
and if you understand that all qualitative data needs something like mode, and you can't use
|
||||
mean, that'll keep you on the path of virtue and righteousness. Now, the other thing that we
|
||||
typically use in descriptive statistics are what we call measures of dispersion, and that tells you
|
||||
how much variation there is in a group of numbers. Two different groups of numbers could have a
|
||||
very similar average or median, and yet be very different when you look at the degree of variation.
|
||||
For example, the number four and the number of six, if you average it together, you get five.
|
||||
Well, you could also average the number zero and the number ten and get five,
|
||||
and it's very clear that those are two different groups of numbers. So to address this,
|
||||
we need to look at a few related measures of dispersion. But to use these functions, we need to first
|
||||
discuss the difference between a population and a sample. A population means you have the entire
|
||||
group measured. Well, a sample means you have some fraction of the group measured, and you want to
|
||||
use that to make a claim about the population. For instance, in my past life, I worked for a company
|
||||
that did political polling, and so I could say I expect, you know, 100 million voters in the United
|
||||
States to vote in a presidential election, but when I do a poll, I might talk to one or two thousand
|
||||
of them. All right, so the sample was one thousand, two thousand, something on that order.
|
||||
The population was the one hundred million. Now this matters. I'm not going to go into it in
|
||||
huge detail. There are reasons why these statistical measures are slightly different.
|
||||
So just know which one you're dealing with, okay, so that you pick the right thing.
|
||||
So, variance. If you want a short description of variance, it is the average of the squared
|
||||
deviations from the mean. You have four possible functions here. Two of them are somewhat
|
||||
specialized. I'll skip over them for now. So the ones I will talk about are VAR and VARP.
|
||||
Now, VAR function. This assumes that you have a sample and will produce a slightly larger number
|
||||
on that account because of the assumed sampling error. Go to the function wizard to statistical,
|
||||
select the VAR function. Click next, then in the first of the number fields. Click the place
|
||||
you're insertion mark, then click on cell A2. Hold down the shift key and click on cell A31.
|
||||
This will put the range A2, colon A31 into your function. Click okay and get your number.
|
||||
Now, the other function, VARP and the P is for population. The procedure is absolutely identical.
|
||||
It'll just produce a slightly smaller number because with the population, obviously you don't
|
||||
have any kind of sampling error. Then standard deviation. This is strongly related to variance
|
||||
since it's the square root of variance. Again, you have four possible functions.
|
||||
I will recover the two common ones. There's one for sample, one for a population.
|
||||
So, the standard deviation can be considered in some sense a measurement of the average deviation
|
||||
of each measurement from the mean. So, STDEV is standard deviation and this is the function to use
|
||||
if you're measuring the standard deviation of a sample. Then there's STDEVP, which is the function
|
||||
to measure the standard deviation for a population. The procedure for all of these is absolutely
|
||||
identical. You go to the function wizard, you find the function, you click on the cells to
|
||||
select them and then that gets inserted into the function box and then you click okay and you get
|
||||
out again. Okay, other measures. A few other measures that you might want to look at for descriptive
|
||||
statistics. I just want to bring in a couple here. Min and max, minimum, maximum. So, exactly the same
|
||||
usage that we talked about, how to get them. Go to the function wizard, find the function and
|
||||
click on the appropriate cells and insert the cell information and click okay and you get your number.
|
||||
Now, what are the lessons learned from this? There's a few of them. First, a very useful trick is to
|
||||
use the paste-only number trick to convert the contents of a cell or range of cells into the
|
||||
resulting numbers. Now, recall that earlier in our series, we emphasize the difference between
|
||||
the contents of a cell, frequently a formula or cell address if you are a skilled builder of
|
||||
spreadsheets and the visible results, which are generally a number. If you ever want to get
|
||||
just the number and lose the underlying formulas, this is how it's done.
|
||||
Second, lesson learned. Use some of these functions. It helps to have a little background in the
|
||||
theory, such as why samples and populations are treated differently or why there are three
|
||||
different ways of measuring the mean. Now, going any further here is beyond the scope of these
|
||||
tutorials. It is left as an exercise for you, the listener. But, you know, if you really want
|
||||
no more about this, take a course in statistics. I'm sure there are lots of colleges around that
|
||||
would be happy to give you one. In fact, actually, these days with these, multiple massively online
|
||||
open courses, you can probably get one for free on the web. Descriptive statistics simply describe
|
||||
what we see in a group of numbers. There is a branch of statistics that goes further and it is
|
||||
called inferential statistics. In the next tutorial, we'll look at a few of the more common
|
||||
inferential statistics functions. Now, one big takeaway from this lesson is that all of these
|
||||
functions are used in similar ways. The procedures for using a function are very standardized,
|
||||
so the key is not figuring out the mechanics. It is understanding which function to use and why
|
||||
that is the correct function. The single most common error I see is people using the wrong function
|
||||
because they don't understand why they should be using one function rather than another.
|
||||
And in closing, let me just say that I have built a spreadsheet that has
|
||||
examples of the things that we've been talking about. You are welcome to download it and take a
|
||||
look at it. You can take a look at the functions and see exactly how they were entered and what
|
||||
they look like. All of that is in the show notes. This is Ahuka, as always, signing off by reminding
|
||||
you to support free software. Bye-bye.
|
||||
You've been listening to Heccupublic Radio at HeccupublicRadio.org. We are a community podcast
|
||||
network that releases shows every weekday Monday through Friday. Today's show, like all our shows,
|
||||
was contributed by an HBR listener like yourself. If you ever thought of recording a podcast
|
||||
and click on our contributing to find out how easy it really is. Heccupublic Radio was found
|
||||
by the digital dog pound and the infonomican computer club and is part of the binary revolution
|
||||
at binwreff.com. If you have comments on today's show, please email the host directly, leave a comment
|
||||
on the website or record a follow-up episode yourself. Unless otherwise status, today's show is
|
||||
released on the creative comments, attribution, share a light, 3.0 license.
|
||||
Reference in New Issue
Block a user