Episode: 1545
Title: HPR1545: 32 - LibreOffice Calc - Introduction to Charts and Graphs
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1545/hpr1545.mp3
Transcribed: 2025-10-18 04:51:17
---
Music
Hello, this is Ahuka, welcoming you to Hacker Public Radio in another exciting episode
in our ongoing series on LibreOffice Calc.
Today I want to start introducing the whole idea of charts and graphs.
One of the nice features of spreadsheets is that they come with a pretty decent capability
for presenting your data in a good way with charts and graphs.
It is said a picture is worth a thousand words, and a well-done graph can communicate
a lot of information in a very concise form.
I really appreciate good graphing, and I abhor misleading graphs, which are very easy
to create.
For anyone who wants to become an expert in this area, you cannot do better than to study
the works of Edward Tufte, particularly his book The Visual Display of Quantitative Information,
and links for all of the things that I mention are going to be in the show notes.
For pure pleasure, if you have a few minutes, there's a video on YouTube from a channel
called Numberphile that discusses what they call perhaps the greatest infographic ever created.
You would probably enjoy taking a look at that. There is also Hans Rosling's TED Talk, The Best Stats
You've Ever Seen, which is simply mind-blowing if you have any feel at all for visually
displaying information.
And finally, I want to mention the David McCandless TED Talk, The Beauty of Data Visualization.
If you take a little time to look at these, you'll start to see what good presentation
of this kind of information looks like.
Now, returning to the Tufte book for a moment, notice the key word in the title: Quantitative.
We discussed the distinction between quantitative and qualitative data in an earlier lesson,
but it never hurts to be clear about this, because the choices you make about the right
chart or graph depend crucially on knowing what is appropriate for the data you have.
Quantitative data is data that is measured in terms of numbers, where the measurements make
sense as numbers.
If I ask you how many apples you have, and you say I have three apples, I could record
that data, and it would be quantitative.
But if I asked you what apartment number you live in, and you said three, I could record
that data, but it is in no sense quantitative.
Three is just a label in this instance, and the data is actually qualitative, which means
it can be used to distinguish your apartment from the one next door that is apartment 2,
but you would never claim that apartment 3 is 50% more apartment-y than apartment 2.
The number in this case has no meaning as a number, and the software won't prevent
you from making a mistake here.
You can create a graph using this data, but it may be completely useless if you make the
wrong choice.
So what is qualitative data, and how do you make charts out of it?
Qualitative data records a particular quality of each object, hence the name.
These variables are very common in social science research.
People can be divided by sex, male versus female, by religion, by race, by nationality,
by province, etc.
And I like to think of these qualitative variables as being buckets into which the data
is sorted.
If I am sorting by sex, I have buckets for male and female, and each person gets placed
in the appropriate bucket.
And when I have finished sorting them into buckets, I can do one meaningful mathematical
measurement, and that is to count the number in each bucket.
With these counts, I do have numbers that can be placed into a chart.
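The sort-into-buckets-then-count step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Calc works internally; the helper name `count_buckets` and the list of responses are made up for the example.

```python
from collections import Counter

def count_buckets(observations):
    """Sort qualitative observations into buckets and count each bucket.

    Counting is the one meaningful arithmetic operation on qualitative
    labels; the resulting counts are numbers that can be charted.
    """
    return Counter(observations)

# Made-up survey responses, used only for illustration.
responses = ["male", "female", "male", "male", "female"]
counts = count_buckets(responses)
print(counts["male"], counts["female"])  # prints: 3 2
```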
So I have several options here.
First is the column chart.
A column chart displays one column per bucket, with a height proportional to the number in that
bucket.
If your data had 20 men and 10 women, the column for men should be twice as high as the column
for women.
If this is not the case, you may very well be lying with data, whether deliberately
or inadvertently.
This came up recently when a major television network in the United States had to apologize
and correct a column graph that made the difference between 6 million and 7 million
look like the difference between 1 and 7.
Not good.
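The arithmetic behind both the honest and the misleading column charts can be sketched as follows. The drawn height of a column is its value minus the value where the axis starts, so the visual ratio between two columns depends on that baseline. `apparent_ratio` is a hypothetical helper, and the 5.5 million baseline is an arbitrary choice for illustration, not the one used in the broadcast graph mentioned above.

```python
def apparent_ratio(small, large, baseline=0.0):
    """How many times taller the larger column looks than the smaller one.

    Each column is drawn from the axis baseline up to its value, so with a
    baseline of 0 the heights are proportional to the data; raising the
    baseline (a truncated axis) exaggerates the difference.
    """
    return (large - baseline) / (small - baseline)

print(apparent_ratio(10, 20))           # 2.0: 20 men vs 10 women, honest axis
print(apparent_ratio(6e6, 7e6))         # about 1.17 with a zero baseline
print(apparent_ratio(6e6, 7e6, 5.5e6))  # 3.0 once the axis starts at 5.5 million
```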
Next is the bar graph, or bar chart.
This is just a column chart turned on its side.
Instead of columns going up, you have bars going from left to right.
There really is no other difference.
But there are reasons why you might choose one over the other.
If your chart has both positive and negative numbers, that will be much clearer using columns.
And if you have a lot of bars and they have long names, a horizontal bar graph is probably
clearer.
None of these considerations changes what the chart displays, only how easy it is to read,
but that is a good consideration.
And then there is the pie chart.
This is the chart to use when you want to show the relative percentage of the total in each bucket.
The entire pie is 100% and each bucket gets a slice of the pie proportional to its percentage
within the total.
The use case here is for qualitative data where the number of categories is fairly small.
I find that if there are more than about half a dozen slices of the pie, it becomes progressively
harder to read and understand the chart.
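A minimal sketch of turning bucket counts into pie slices, assuming a hypothetical `pie_slices` helper. The warning threshold encodes the rule of thumb of about half a dozen slices; it is not a limit enforced by Calc.

```python
import warnings

MAX_READABLE_SLICES = 6  # rule of thumb: about half a dozen slices

def pie_slices(counts):
    """Convert bucket counts into (percentage, angle in degrees) per slice.

    The whole pie is 100% (360 degrees); each slice is proportional to
    its bucket's share of the total.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    if len(counts) > MAX_READABLE_SLICES:
        warnings.warn("more than about half a dozen slices is hard to read")
    return {name: (100 * n / total, 360 * n / total) for name, n in counts.items()}

slices = pie_slices({"A": 1, "B": 1, "C": 2})
print(slices["C"])  # (50.0, 180.0)
```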
That raises a general point about all qualitative charts.
You really don't want a huge number of buckets or categories in these charts.
Even with a bar graph, which can in theory accommodate more categories, you can end up with a chart
that is hard to make sense of.
A good way to resolve this is to broaden your categories.
For example, suppose you are doing an analysis of the proportion of evangelical Protestants
in different parts of the United States.
You might start with data broken down by state, but there are 50 states in
the United States, so you would have 50 buckets.
None of these charts would work well in this case.
But if you group the states into regions such as East Coast, Midwest, South, West Coast,
something like that, you can get it down to a manageable number and produce a chart that
makes sense and is easy to understand.
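Regrouping fine-grained buckets into broader ones is just a mapping followed by a sum. In this sketch, `regroup` is a hypothetical helper, the state-to-region assignments are only one possible choice, and the counts are made up. Unmapped states go into an "Other" bucket so no observations are lost.

```python
from collections import defaultdict

def regroup(counts_by_state, region_of):
    """Collapse fine-grained buckets (states) into broader ones (regions).

    States missing from the mapping are collected under "Other" so that
    no observations are silently dropped.
    """
    totals = defaultdict(int)
    for state, count in counts_by_state.items():
        totals[region_of.get(state, "Other")] += count
    return dict(totals)

# Hypothetical region mapping and made-up counts, for illustration only.
region_of = {
    "Massachusetts": "East Coast",
    "New York": "East Coast",
    "Ohio": "Midwest",
    "Texas": "South",
    "California": "West Coast",
}
counts_by_state = {"Massachusetts": 3, "New York": 4, "Ohio": 5, "Texas": 9, "Alaska": 1}
print(regroup(counts_by_state, region_of))
# {'East Coast': 7, 'Midwest': 5, 'South': 9, 'Other': 1}
```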
Now on to quantitative data.
This is where we get to more complicated mathematical analysis and the nature of the charts and
graphs available changes as a result.
The most interesting cases are ones where you have several quantitative variables interacting.
For example, a typical economics question might be how the unemployment rate has varied
over time.
The unemployment rate is one quantitative variable and time itself is another.
But you can come up with examples in many other fields as well.
In chemistry, you might measure the rate of reaction as the concentration varies, for
instance.
In these types of analysis, each variable needs to be graphed on an axis that has numbers
arranged in order.
Given the limitations of the human brain, that generally means no more than three axes
if you're trying to do a graph, although there have been some clever ways to get around
this limitation, and the resources I have given at the beginning of this tutorial will give
you some wonderful examples of multiple variable graphing.
The question you need to consider is what kind of relationship you think exists in
this data.
In scientific analysis, there are generally two kinds of variables.
They can be called independent and dependent.
Sometimes the independent variable is instead referred to as the explanatory variable.
Whatever you call them, the basic idea is that one variable explains the other.
To take an example from medicine, you might want to examine the idea that there is a relationship
between age at death and body weight, and collect data from a group of individuals to examine
this idea and see what relationship exists.
I hope you will agree that it makes sense to think of body weight as something that helps to determine
the age at death, but it makes no sense to think of the age at death determining body weight.
So if you're graphing this particular data set, you would always put body weight on the
horizontal axis and age at death on the vertical axis.
This is a convention, perhaps not a scientific necessity, but conventions are important,
since they govern how people will read the graph.
You should never violate conventions without a very compelling reason, since in most cases
it will cause people to misinterpret your graph.
Okay, so what kind of options do we have?
First, the line graph. This is the most basic type of quantitative chart.
It places one variable on the horizontal axis, conventionally called the x-axis, and the
other on the vertical axis, usually called the y-axis.
Each point on the graph represents a particular data point, or observation as scientists call
it, which is plotted by locating the correct value on each axis.
The last step is to draw a line that connects each of these data points.
Line graphs carry certain implications, though.
First of all, for each value on the x-axis, there should be one and only one value on the
y-axis.
A variable that changes over time is an excellent case in point.
If you have a graph of gross domestic product, also known as GDP, for each year, a line graph
would be perfectly appropriate since you can have only one value of GDP for any given year.
In this type of analysis, called time series analysis in statistics, the convention is to always
place the time variable on the x-axis and the corresponding measurement on the y-axis.
This type of graph therefore generally presumes an orderly progression across the x-axis.
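The two assumptions of a line graph, one y value per x value and an orderly progression along the x-axis, can be checked before plotting. A minimal sketch with a hypothetical `line_series` helper and made-up yearly values: it rejects duplicate x values (a sign that an XY scatter chart is the better fit) and sorts the points by x.

```python
def line_series(points):
    """Prepare (x, y) points for a line graph.

    A line graph assumes exactly one y value for each x value and an
    orderly progression along the x-axis, so duplicate x values are
    rejected and the points are sorted by x before being connected.
    """
    seen = set()
    for x, _ in points:
        if x in seen:
            raise ValueError(f"x = {x!r} has more than one y value; "
                             "consider a scatter (XY) chart instead")
        seen.add(x)
    return sorted(points, key=lambda p: p[0])

# Made-up yearly values, for illustration only.
print(line_series([(2002, 12.1), (2000, 10.0), (2001, 11.3)]))
# [(2000, 10.0), (2001, 11.3), (2002, 12.1)]
```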
Now, sometimes we don't have those presumptions, and that's where something like the x-y or
scatter diagram becomes useful.
This is the preferred graph to use when there can be more than one y-axis value
for any given x-axis value, or where there is not yet a presumption of orderly progression
on the x-axis.
As an example, consider a graph that relates body weight to height for a group of individuals.
Even if you presume some general relationship here, it is clear that for any given height,
you could have multiple weights, and for any given weight, there can be multiple heights.
So the scatter graph is often used to see whether a relationship might exist between these
variables.
Area graph.
This is a way to combine multiple series of data that could just as well be displayed in
a line graph.
The idea here is to fill in the area under the graph.
Bubble.
This chart is designed to show the relationship between three variables.
One is on the horizontal axis, another on the vertical axis, and the third is shown by
the size of the bubble that's drawn.
This could be drawn for three quantitative variables, or it could be a hybrid graph where
one variable is qualitative.
This is a clever alternative to drawing a perspective rendering of a 3D graph.
The limitation is that you cannot display zero or negative numbers as bubble sizes.
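The episode doesn't say exactly how Calc scales bubbles. A common convention, assumed here, is to make each bubble's area, rather than its radius, proportional to the value. This sketch (hypothetical `bubble_radii` helper) also shows why zero and negative values cannot be drawn.

```python
import math

def bubble_radii(values, max_radius=20.0):
    """Scale bubble radii so that each bubble's AREA is proportional to its value.

    Area grows with the square of the radius, so the radius scales with the
    square root of the value. A bubble cannot have zero or negative area,
    which is why such values are rejected.
    """
    if any(v <= 0 for v in values):
        raise ValueError("bubble sizes must be positive")
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]

print(bubble_radii([1, 4], max_radius=20.0))  # [10.0, 20.0]
```

Note that a value four times larger gets a radius only twice as large; scaling the radius directly would make it look sixteen times larger by area.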
The next one is what LibreOffice calls a net graph, or net diagram.
It's a very odd choice of name, since as far as I can tell, no one other than LibreOffice uses this
terminology.
Now, if you go looking for this on the web, you will find it referred to most commonly as
a radar chart or a spider chart.
For the radar chart and the bubble chart, by the way, there are links in the show notes to Wikipedia
so you can read up more about them.
The radar chart has spokes that represent different variables which radiate out from a common
center.
Along each spoke, the distance from the center represents a measurement.
Then you connect the dots going around the spokes to form a very irregular shape.
By repeating this process for a number of selected objects, you can do a comparison.
If you go to the Wikipedia article that I linked, they give an example of several different
cars, each measured on variables such as price, mileage, headroom, etc.
By comparing the shapes you get for each automobile, you can do a quick comparison
among them.
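A sketch of how the irregular shape of a radar (net) chart is built, assuming evenly spaced spokes starting at the top and each spoke scaled to its own maximum so that variables in different units can share one diagram. The helper `radar_vertices`, the per-spoke normalization, and the car figures in the usage line are illustrative assumptions, not LibreOffice's documented behavior.

```python
import math

def radar_vertices(values, spoke_max):
    """Return the (x, y) corners of a radar-chart polygon.

    Spokes are spaced evenly around a common center, starting at the top
    and going clockwise. The distance along each spoke is the value
    divided by that spoke's maximum, so every spoke runs from 0 to 1.
    """
    n = len(values)
    if n != len(spoke_max) or n < 3:
        raise ValueError("need at least three spokes and one maximum per spoke")
    vertices = []
    for k, (value, top) in enumerate(zip(values, spoke_max)):
        r = value / top
        angle = math.pi / 2 - 2 * math.pi * k / n  # top first, then clockwise
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices

# Made-up figures for one car: price, mileage, headroom, cargo space.
shape = radar_vertices([20000, 30, 38, 15], spoke_max=[40000, 40, 40, 20])
print(len(shape))  # 4
```

Connecting the points in order, and closing back to the first, draws one car's shape; using the same `spoke_max` for every car keeps the shapes comparable.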
Then there are the hybrid charts and graphs.
Sometimes you need to combine both kinds of variables in a single analysis, and for that
it helps to have a hybrid graph that combines both types of data in a good way.
Now, technically you can view bar charts, column charts, and pie charts as hybrids, in that
the count of members in each bucket is actually a quantitative measure, but that is not how
most people think of them.
The stock chart is a specialized type of column graph that essentially combines
three different numerical measures on one diagram.
The height of each column generally represents the closing price, but it adds both the high
and the low for the day.
Now, this can be used for more than stock prices.
You could use it to display the average, minimum, and maximum for a group of measurements,
or in statistics, maybe the mean of the measurements combined with the standard deviation or estimated
error, and so on.
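When a stock chart is used for ordinary measurements, the values it needs are simple summary statistics. A minimal sketch using Python's standard `statistics` module; the helper name `high_low_summary` and the sample values are made up, and the sample standard deviation is used here as one reasonable choice of error measure.

```python
import statistics

def high_low_summary(measurements):
    """Summarize a group of measurements the way a stock chart shows a day.

    Returns the minimum and maximum (the "low" and "high" marks) along
    with the mean and the sample standard deviation, which could instead
    be drawn as a central value with an error band.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two measurements for a standard deviation")
    return {
        "low": min(measurements),
        "high": max(measurements),
        "mean": statistics.mean(measurements),
        "stdev": statistics.stdev(measurements),
    }

print(high_low_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```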
There's a chart called column and line, which combines two types of data in a single chart,
presumably because they are related.
For example, suppose you have data on how many cars of each model were sold at a dealership.
You might display this as a column or a bar chart if you were just displaying this by
itself, but suppose you wanted to add in a related quantitative variable such as the
amount of display space each model was given or the price of each model.
You could do this by putting the cars sold into a column graph and then adding a line graph
on top of it to show the amount of display space each model received.
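A convenient layout for a combined chart's data is one row per category with one column per series. A minimal sketch, assuming hypothetical model names and made-up figures, that assembles two separately collected series into such rows and refuses to proceed if their categories don't match, since the line would then not line up with the columns. `combined_rows` is a hypothetical helper.

```python
def combined_rows(column_values, line_values):
    """Lay out two related series as rows for a column-and-line chart.

    Each row is (category, value drawn as a column, value drawn as a line).
    Both series must cover the same categories, otherwise the line would
    not line up with the columns.
    """
    if set(column_values) != set(line_values):
        raise ValueError("both series must have the same categories")
    return [(cat, column_values[cat], line_values[cat]) for cat in column_values]

# Hypothetical model names and made-up figures, for illustration only.
cars_sold = {"Model A": 12, "Model B": 30, "Model C": 7}
display_space_m2 = {"Model A": 20, "Model B": 45, "Model C": 10}
for row in combined_rows(cars_sold, display_space_m2):
    print(row)
```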
So the point of this analysis is to understand that choosing a graph should not be random.
You should have a reason for your choice, and the graph you choose should be a good fit
for the point you are trying to communicate.
I regularly see examples in the media of graphs that are not appropriate, that are done incorrectly,
or that violate the conventions.
At best these are just stupid mistakes, but at worst they are examples of what to me
is just plain lying.
For instance, just in the last few days I saw a graph where the vertical
axis was reversed, so that the lower numbers were on top and the numbers increased as you
went down.
This is of course the exact opposite of what convention tells all of us to expect, and
I believe it was done deliberately to mislead people.
I was not totally surprised that this involved a contentious political issue.
You will hear defenders say, well, if you are too stupid to read a graph..., but I am firmly in the
school that says clear and honest communication is the point, and using graphs properly is essential
to that communication.
To see some examples of creative ways of lying with graphs, and therefore things you should
avoid, there is an article from simply speaking.
It goes through some of these, and that is also in the show notes.
So, wrapping up now, this is Ahuka, signing off, and reminding you as always, don't forget
to support free software.
Bye-bye.
You have been listening to Hacker Public Radio at HackerPublicRadio.org.
We are a community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy
it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer
Club.
HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored
by Lunarpages.
From shared hosting to custom private clouds, go to Lunarpages.com for all your hosting
needs.
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike
3.0 license.