Episode: 2955
Title: HPR2955: Machine Learning / Data Analysis Basics
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2955/hpr2955.mp3
Transcribed: 2025-10-24 13:49:38

---
This is HPR episode 2955 for Friday the 29th of November 2019. Today's show is entitled "Machine Learning / Data Analysis Basics", and it's the first show by our new host, Daniel Persson. It's about 22 minutes long and carries a clean flag. The summary is: we talk about different machine learning techniques.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

[Music]

Hello hackers, and welcome to another podcast. Today I'm going to talk about machine learning basics, and the specific topic I'm going to cover today is some classification basics. There are many types of machine learning, but one of the problems you can solve with machine learning is classification. Classification is where you take some input features and determine a specific class on the other end. So, for instance, you can have a lot of weather data, and on the other end you can determine whether it's going to be sunny or not.

When it comes to classification at the absolute basics, we have a lot of different features and we want some specific outcome. So what is the simplest, most basic thing we can do in order to figure out a specific target? This might seem a little bit strange, but the simplest thing you can do is to just pick one. For instance, if I have a lot of features, weather data, and I want to decide whether it's sunny or cloudy, I can just say it's always sunny. What we are trying to establish here is actually some kind of baseline. This technique is called 0R (ZeroR): you use zero rules on the inputs and you still come out with one output. In this case 0R might, for instance, always answer that it's sunny, and that's true, let's say, 26% of the time. If you have any other machine learning technique that does worse than 26%, then you are worse than the baseline, and you probably need to configure that technique better, because you are actually doing worse than just picking one answer every time.

The next step up, if you want to be a little bit more advanced, is to have a rule that applies to only one of the features. This is called 1R (OneR). For instance, we can say: if the humidity is above 80%, then it's probably not sunny, and if it's below 80%, then it's sunny. That is our simple rule; we only look at one of the features and try to determine from it whether it's sunny or not. So that's 1R, and it should give you a better result than 0R, but there are much more sophisticated techniques that can give you an even better result.

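A minimal Python sketch of the 0R and 1R ideas described above, not something from the episode itself; the toy weather data and the 80% threshold are invented for illustration:

    from collections import Counter

    # Toy weather data: (humidity %, label). The values are invented for illustration.
    examples = [(85, "cloudy"), (90, "cloudy"), (70, "sunny"), (60, "sunny"),
                (75, "sunny"), (95, "cloudy"), (65, "sunny"), (82, "sunny")]

    labels = [label for _, label in examples]

    # 0R (ZeroR): ignore the features and always predict the most common class.
    zero_r_prediction = Counter(labels).most_common(1)[0][0]

    # 1R (OneR): one simple rule on one feature, here a humidity threshold.
    def one_r_predict(humidity, threshold=80):
        return "cloudy" if humidity > threshold else "sunny"

    # Compare both baselines on the same toy data.
    zero_r_acc = sum(label == zero_r_prediction for _, label in examples) / len(examples)
    one_r_acc = sum(label == one_r_predict(h) for h, label in examples) / len(examples)
    print(f"0R accuracy: {zero_r_acc:.0%}, 1R accuracy: {one_r_acc:.0%}")
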
The next thing we can do after that is called naive Bayes. That is also a quite naive solution, but it looks for correlations between all the features. It takes all your features, and then you say that the humidity data is, say, 60% informative for whether it's sunny or not, the temperature is informative to 20%, and the amount of light is informative to 20%. It's a very weird example, but in this case you give the different features different probabilities, and depending on the features you calculate what the probability of each outcome is. It looks at many different features and uses very simple math to figure out which result it should evaluate to. This is quite often a pretty good algorithm; it can actually give results up to 80% correct, which is pretty good for a machine learning technique.

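A minimal naive Bayes sketch using scikit-learn's GaussianNB, again not from the episode; the feature columns (humidity, temperature, light) and labels are invented, but it shows how several features go in and a class plus per-class probabilities come out:

    from sklearn.naive_bayes import GaussianNB

    # Columns: humidity %, temperature C, light level (0-10). Invented values.
    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    model = GaussianNB()
    model.fit(X, y)

    # Predict for a new day and show the per-class probabilities.
    print(model.predict([[70, 20, 7]]))
    print(dict(zip(model.classes_, model.predict_proba([[70, 20, 7]])[0])))
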
The next one I want to talk about, and now we are getting into some very complicated ones, is the nearest neighbour algorithm. In this case we have a bunch of data with a lot of different features, and we have a lot of examples where features have been tied to specific outcomes. For instance, we can have a bunch of data points that point to sunny, a bunch that point to cloudy, and a bunch pointing to rainy. The nearest neighbour algorithm looks at the example we are classifying at the moment and finds which of the stored examples are closest to it. By looking at the distance between this example's features and the features of the stored examples, we can find which outcome is probably the best one. This can actually give you a really good result, 85 or 86 percent or even higher. So it's a little bit more complicated, and the math behind it is also a bit unusual, but it's very good at figuring out what result the features should actually give.

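A minimal k-nearest-neighbour sketch with scikit-learn, reusing the same invented weather features; the new example is classified by the majority label among its three closest stored examples:

    from sklearn.neighbors import KNeighborsClassifier

    # Same invented columns as before: humidity %, temperature C, light level.
    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # Classify a new day by the majority label among its 3 closest examples.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)
    print(knn.predict([[70, 20, 7]]))
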
Next up, I want to talk about decision trees. Decision trees are, as they sound, something that goes from a root down to leaf nodes, making specific decisions along the way. A simple example: we decide that one feature, for instance the humidity, is very interesting if you want to see whether it's raining or sunny, so we look at that as the first feature. If it's above a specific value, we go down one path of the tree, and if it's below that value, we go down the other path. Then we look at a different feature: at the next node we could look at the temperature, for instance, and branch on whether it's above or below a specific number of degrees, and that in turn branches out towards the next result. These decision trees are usually built with an algorithm called C4.5, and there is a Java implementation of it called J48. They are actually quite good and fast. They take a while to train, to figure out which are the best features to look at and how the tree should be arranged, but once you have done that they are quite efficient at predicting results.

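A minimal decision tree sketch with scikit-learn; note that scikit-learn uses the CART algorithm rather than C4.5/J48 as mentioned above, but the idea of branching on feature thresholds from root to leaf is the same, and the data is invented as before:

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    tree = DecisionTreeClassifier(max_depth=2)
    tree.fit(X, y)

    # Print the learned thresholds so you can see the root-to-leaf decisions.
    print(export_text(tree, feature_names=["humidity", "temperature", "light"]))
    print(tree.predict([[70, 20, 7]]))
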
If you want to be even more advanced, you can use a random forest. This is a larger training setup where you train a lot of different trees; it can be hundreds or thousands of trees, and you give them slightly different structures. For instance, one tree might decide the best thing is to look at the humidity first, but another might start by looking at the temperature, and they can have different values all the way down to the leaf nodes. When you have created your random forest over all your features, you can ask this forest, as a complete entity, what the result should be. The different trees will come up with different predictions and then they vote: each tree gets one vote for a specific predicted outcome, and the outcome with the most votes becomes your prediction. So it's actually a kind of process of spreading the knowledge out over a larger set of feature detectors.

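A minimal random forest sketch with scikit-learn; the choice of 200 trees is arbitrary, and the data is the same invented weather set:

    from sklearn.ensemble import RandomForestClassifier

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # Each of the 200 trees sees a slightly different view of the data,
    # and the forest's prediction is the majority vote of the trees.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X, y)
    print(forest.predict([[70, 20, 7]]))
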
The next thing I will talk about is support vector machines. These can be one-dimensional, two-dimensional, three-dimensional, as many dimensions as you want; the number of dimensions depends on how many features you have. You then try to split the set of examples into different regions of that space. You could think of it as a volume, but I think it's easier to talk about an area. Let's say we have some examples in one part of the area and some examples in another part of the area, and you draw a line between them so that the distance between the line and the nearest examples on either side is as large as possible. That's the way you actually separate the different predictions. These lines can also be curves, so it's really mathematical structures, or mathematical expressions, that separate your predicted outcomes into different regions.

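A minimal support vector machine sketch with scikit-learn; the linear kernel draws a straight separating boundary, while switching to kernel="rbf" would allow the curved boundaries mentioned above. Data invented as before:

    from sklearn.svm import SVC

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # A linear kernel separates the classes with a straight line/plane;
    # an "rbf" kernel would allow a curved boundary instead.
    svm = SVC(kernel="linear")
    svm.fit(X, y)
    print(svm.predict([[70, 20, 7]]))
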
Next up we come to something that I thought was really interesting: the multi-layer perceptron. I actually thought it was so interesting that I went out and implemented one myself, just to try to figure out how it actually works. A multi-layer perceptron is something where you have an input layer with all your features, and these features will in some way activate a second layer depending on what the features say. So, for instance, if a feature comes in saying that the humidity is 68%, that will activate some of the nodes in the hidden layer, if we call it hidden layer one, so we get some activations in that hidden layer. Then the activations of all the nodes in hidden layer one will activate some nodes in hidden layer two, and that in turn will activate some in hidden layer three, and so on, all the way to your output layer.

So depending on your inputs you will have some activations, and each activation is a floating-point value between 0 and 1, so it is not just on or off, it can be any value in that range. Depending on how the signal propagates through these hidden layers, it will activate the output layer with a specific prediction. If you put a specific set of features in at one end, you get predicted scores out at the other end, and all of those sum up to one. For instance, if we put in some humidity, some temperature and so on, we might get out 68% sunny and 32% cloudy, or something like that; you get a prediction that is spread out over all of your possible output classes. This multi-layer perceptron is something that you need to train. You set it up and then you train it backwards: when you run examples through and get a specific output, you compare it with the output you wanted and adjust all the nodes so that they trigger differently, moving towards the outputs you want the network to produce for the inputs you put in. You run this multiple times, over multiple epochs; an epoch is what you call it when you have sent all your examples through once. So you run it through multiple epochs, and you end up with hidden layers that are able to produce a specific prediction for specific inputs.

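A minimal NumPy sketch of one forward pass through a small multi-layer perceptron, not the implementation mentioned in the episode: sigmoid activations keep the hidden values between 0 and 1, and a softmax output produces class scores that sum to one. The weights are random here; training would adjust them by running examples through for many epochs and propagating the error backwards:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # 3 input features -> two hidden layers of 5 nodes -> 2 output classes.
    w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
    w2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
    w3, b3 = rng.normal(size=(5, 2)), np.zeros(2)

    features = np.array([0.68, 0.20, 0.12])   # e.g. scaled humidity, temperature, light

    h1 = sigmoid(features @ w1 + b1)          # activations in hidden layer one (0..1)
    h2 = sigmoid(h1 @ w2 + b2)                # activations in hidden layer two (0..1)
    output = softmax(h2 @ w3 + b3)            # class scores that sum to one

    print(dict(zip(["sunny", "cloudy"], output.round(3))))
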
The last thing I want to talk about is convolutional neural networks. These are some of the things we are using today for classification, and they are built with frameworks such as PyTorch or TensorFlow. They are very sophisticated networks that can make predictions all the way from the kind of specific input data we have talked about so far, temperature and so on, up to actual images or sounds. They can also be run in the other direction, so you can actually create data from them as well and generate new sounds, new music or new images. But I want to talk about the classification part now. A convolutional network is something where you go through a specific number of layers that you have set up.

The first thing you do is take your inputs and put them through a convolution, which creates multiple feature maps; this is pretty much similar to a multi-layer perceptron, but you get a lot of output maps with results in them. Then you do a sub-sampling, where you shrink the result down to a smaller layer: you get hits at specific points in your feature map and then you shrink it down. Let's say you shrink it down two by two; that means that for each square of four values you only keep one result. So you shrink it down to a smaller feature map, after that you run another convolution and get a lot more feature maps, but they are smaller, and then you sub-sample again. So you get smaller feature maps, but a lot more of them. Then you can do convolution again, sub-sampling again, and depending on how many of these iterations, or layers as they are called, you do, you have a deeper network. In the end, after you have done all this convolution and sub-sampling, you have a fully connected layer. That is very similar to the multi-layer perceptron, because that is where all the inputs come in and all the outputs come out, and that is what actually gives you the specific prediction.

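A minimal PyTorch sketch of the convolution / sub-sampling / fully-connected pattern described above, sized for small 28x28 grayscale images; the layer sizes are arbitrary choices, not anything from the episode:

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution -> 8 feature maps
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 2x2 sub-sampling: 28x28 -> 14x14
                nn.Conv2d(8, 16, kernel_size=3, padding=1),   # more, smaller feature maps
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 14x14 -> 7x7
            )
            self.classifier = nn.Linear(16 * 7 * 7, num_classes)  # fully connected layer

        def forward(self, x):
            x = self.features(x)
            x = torch.flatten(x, 1)
            return self.classifier(x)

    # One fake batch of four 28x28 single-channel images, just to show the shapes.
    model = SmallCNN()
    logits = model(torch.randn(4, 1, 28, 28))
    print(logits.shape)  # torch.Size([4, 10])
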
I'm just going to look at some statistics so you can see how much better these results have become over the years. This data is a little bit old, it's from 2010 to 2015, and it's for ImageNet classification. That is a set of images where you need to predict what is in each image; it could be a bicycle or a car or something like that, and you need to classify what you can see in the image. For the first year, 2010, they had an error rate of 28%. So that's a lot: roughly every fourth image was classified wrong, and those networks were quite shallow. 2011 was not much better; they were at 25%, also with shallow networks. Then people actually started to use convolutional neural networks that were a little bit deeper: they went up to eight layers and came down to 16%, so roughly one sample in six was still wrong, but that's much better. Then in 2013 they still had eight layers but went down to an 11% error rate, so roughly every tenth sample was wrong. In 2014 there was a network with 19 layers that was down to a 7.3% error rate, and another with 22 layers at a 7.6% error rate. At the end of this diagram, in 2015, ResNet had 152 layers and an error rate of 3.57%.

And networks have become much more sophisticated after that; the error rate has improved a great deal and the number of layers has grown enormously. I think Google has a setup where they do multiple convolutions and sub-samplings inside one structured entity, and then they stack those entities into layers, which means that their networks are really deep and can do a lot of computation and get very good predictions out of it. So this is a growing field, and there is a lot of work going into improving this kind of tooling in order to get good classification predictions from different data. So this was what I wanted to cover today. I hope you found it interesting and that you learned something. I usually do YouTube videos, so you can find me there if you search for my name. So until next time, have a good one.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.