Episode: 2955
Title: HPR2955: Machine Learning / Data Analysis Basics
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2955/hpr2955.mp3
Transcribed: 2025-10-24 13:49:38

---
This is HPR episode 2955 for Friday the 29th of November 2019. Today's show is entitled "Machine Learning / Data Analysis Basics", and it's the first show by our new host, Daniel Persson. It's about 22 minutes long and carries a clean flag. The summary is: we talk about different machine learning techniques.

This episode of HPR is brought to you by AnHonestHost.com. Get a 15% discount on all shared hosting with the offer code HPR15. That's HPR15. Better web hosting that's honest and fair at AnHonestHost.com.

[Music]

Hello hackers, and welcome to another podcast. Today I'm going to talk about machine learning basics, and the specific topic I'm going to cover today is some classification basics. There are many types of machine learning, but one of the problems you can solve with machine learning is classification. Classification is where you take some input features and determine a specific class on the other end. So, for instance, you can have a lot of weather data, and on the other end you can determine whether it's going to be sunny or not.

When it comes to classification at the absolute basics, we have a lot of different features and we want some specific outcome. So what is the simplest, most basic thing we can do in order to figure out a specific target? This might seem a little bit strange, but the simplest thing you can do is to just pick one. For instance, if I have a lot of features, weather data, and I want to decide whether it's sunny or cloudy, I can just say it's always sunny. What we are trying to establish here is actually some kind of baseline. This technique is called 0R (ZeroR): you use zero rules on the inputs and you still come out with one output. In this case 0R might, for instance, always answer that it's sunny, and that's true, let's say, 26% of the time. If you have any other machine learning technique that does worse than 26%, then you are worse than the baseline, and you probably need to configure that technique better, because you are actually doing worse than just picking one answer every time.

The next step up, if you want to be a little bit more advanced, is to have a rule that applies to only one of the features. This is called 1R (OneR). For instance, we can say: if the humidity is above 80%, then it's probably not sunny, and if it's below 80%, then it's sunny. That is our simple rule; we only look at one of the features and try to determine from it whether it's sunny or not. So that's 1R, and it should give you a better result than 0R, but there are much more sophisticated techniques that can give you an even better result.

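A minimal Python sketch of the 0R and 1R ideas described above, not something from the episode itself; the toy weather data and the 80% threshold are invented for illustration:

    from collections import Counter

    # Toy weather data: (humidity %, label). The values are invented for illustration.
    examples = [(85, "cloudy"), (90, "cloudy"), (70, "sunny"), (60, "sunny"),
                (75, "sunny"), (95, "cloudy"), (65, "sunny"), (82, "sunny")]

    labels = [label for _, label in examples]

    # 0R (ZeroR): ignore the features and always predict the most common class.
    zero_r_prediction = Counter(labels).most_common(1)[0][0]

    # 1R (OneR): one simple rule on one feature, here a humidity threshold.
    def one_r_predict(humidity, threshold=80):
        return "cloudy" if humidity > threshold else "sunny"

    # Compare both baselines on the same toy data.
    zero_r_acc = sum(label == zero_r_prediction for _, label in examples) / len(examples)
    one_r_acc = sum(label == one_r_predict(h) for h, label in examples) / len(examples)
    print(f"0R accuracy: {zero_r_acc:.0%}, 1R accuracy: {one_r_acc:.0%}")
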
The next thing we can do after that is called naive Bayes. That is also a quite naive solution, but it looks for correlations between all the features. It takes all your features, and then you say that the humidity data is, say, 60% informative for whether it's sunny or not, the temperature is informative to 20%, and the amount of light is informative to 20%. It's a very weird example, but in this case you give the different features different probabilities, and depending on the features you calculate what the probability of each outcome is. It looks at many different features and uses very simple math to figure out which result it should evaluate to. This is quite often a pretty good algorithm; it can actually give results up to 80% correct, which is pretty good for a machine learning technique.

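A minimal naive Bayes sketch using scikit-learn's GaussianNB, again not from the episode; the feature columns (humidity, temperature, light) and labels are invented, but it shows how several features go in and a class plus per-class probabilities come out:

    from sklearn.naive_bayes import GaussianNB

    # Columns: humidity %, temperature C, light level (0-10). Invented values.
    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    model = GaussianNB()
    model.fit(X, y)

    # Predict for a new day and show the per-class probabilities.
    print(model.predict([[70, 20, 7]]))
    print(dict(zip(model.classes_, model.predict_proba([[70, 20, 7]])[0])))
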
The next one I want to talk about, and now we are getting into some very complicated ones, is the nearest neighbour algorithm. In this case we have a bunch of data with a lot of different features, and we have a lot of examples where features have been tied to specific outcomes. For instance, we can have a bunch of data points that point to sunny, a bunch that point to cloudy, and a bunch pointing to rainy. The nearest neighbour algorithm looks at the example we are classifying at the moment and finds which of the stored examples are closest to it. By looking at the distance between this example's features and the features of the stored examples, we can find which outcome is probably the best one. This can actually give you a really good result, 85 or 86 percent or even higher. So it's a little bit more complicated, and the math behind it is also a bit unusual, but it's very good at figuring out what result the features should actually give.

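A minimal k-nearest-neighbour sketch with scikit-learn, reusing the same invented weather features; the new example is classified by the majority label among its three closest stored examples:

    from sklearn.neighbors import KNeighborsClassifier

    # Same invented columns as before: humidity %, temperature C, light level.
    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # Classify a new day by the majority label among its 3 closest examples.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)
    print(knn.predict([[70, 20, 7]]))
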
Next up, I want to talk about decision trees. Decision trees are, as they sound, something that goes from a root down to leaf nodes, making specific decisions along the way. A simple example: we decide that one feature, for instance the humidity, is very interesting if you want to see whether it's raining or sunny, so we look at that as the first feature. If it's above a specific value, we go down one path of the tree, and if it's below that value, we go down the other path. Then we look at a different feature: at the next node we could look at the temperature, for instance, and branch on whether it's above or below a specific number of degrees, and that in turn branches out towards the next result. These decision trees are usually built with an algorithm called C4.5, and there is a Java implementation of it called J48. They are actually quite good and fast. They take a while to train, to figure out which are the best features to look at and how the tree should be arranged, but once you have done that they are quite efficient at predicting results.

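A minimal decision tree sketch with scikit-learn; note that scikit-learn uses the CART algorithm rather than C4.5/J48 as mentioned above, but the idea of branching on feature thresholds from root to leaf is the same, and the data is invented as before:

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    tree = DecisionTreeClassifier(max_depth=2)
    tree.fit(X, y)

    # Print the learned thresholds so you can see the root-to-leaf decisions.
    print(export_text(tree, feature_names=["humidity", "temperature", "light"]))
    print(tree.predict([[70, 20, 7]]))
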
If you want to be even more advanced, you can use a random forest. This is a larger training setup where you train a lot of different trees; it can be hundreds or thousands of trees, and you give them slightly different structures. For instance, one tree might decide the best thing is to look at the humidity first, but another might start by looking at the temperature, and they can have different values all the way down to the leaf nodes. When you have created your random forest over all your features, you can ask this forest, as a complete entity, what the result should be. The different trees will come up with different predictions and then they vote: each tree gets one vote for a specific predicted outcome, and the outcome with the most votes becomes your prediction. So it's actually a kind of process of spreading the knowledge out over a larger set of feature detectors.

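A minimal random forest sketch with scikit-learn; the choice of 200 trees is arbitrary, and the data is the same invented weather set:

    from sklearn.ensemble import RandomForestClassifier

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # Each of the 200 trees sees a slightly different view of the data,
    # and the forest's prediction is the majority vote of the trees.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X, y)
    print(forest.predict([[70, 20, 7]]))
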
The next thing I will talk about is support vector machines. These can be one-dimensional, two-dimensional, three-dimensional, as many dimensions as you want; the number of dimensions depends on how many features you have. You then try to split the set of examples into different regions of that space. You could think of it as a volume, but I think it's easier to talk about an area. Let's say we have some examples in one part of the area and some examples in another part of the area, and you draw a line between them so that the distance between the line and the nearest examples on either side is as large as possible. That's the way you actually separate the different predictions. These lines can also be curves, so it's really mathematical structures, or mathematical expressions, that separate your predicted outcomes into different regions.

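A minimal support vector machine sketch with scikit-learn; the linear kernel draws a straight separating boundary, while switching to kernel="rbf" would allow the curved boundaries mentioned above. Data invented as before:

    from sklearn.svm import SVC

    X = [[85, 12, 3], [90, 10, 2], [60, 24, 9], [65, 22, 8],
         [75, 18, 6], [95, 11, 2], [55, 26, 9], [80, 15, 4]]
    y = ["cloudy", "cloudy", "sunny", "sunny", "sunny", "cloudy", "sunny", "cloudy"]

    # A linear kernel separates the classes with a straight line/plane;
    # an "rbf" kernel would allow a curved boundary instead.
    svm = SVC(kernel="linear")
    svm.fit(X, y)
    print(svm.predict([[70, 20, 7]]))
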
Next up we come to something that I thought was really interesting: the multi-layer perceptron. I actually thought it was so interesting that I went out and implemented one myself, just to try to figure out how it actually works. A multi-layer perceptron is something where you have an input layer with all your features, and these features will in some way activate a second layer depending on what the features say. So, for instance, if a feature comes in saying that the humidity is 68%, that will activate some of the nodes in the hidden layer, if we call it hidden layer one, so we get some activations in that hidden layer. Then the activations of all the nodes in hidden layer one will activate some nodes in hidden layer two, and that in turn will activate some in hidden layer three, and so on, all the way to your output layer.

So depending on your inputs you will have some activations, and each activation is a floating-point value between 0 and 1, so it is not just on or off, it can be any value in that range. Depending on how the signal propagates through these hidden layers, it will activate the output layer with a specific prediction. If you put a specific set of features in at one end, you get predicted scores out at the other end, and all of those sum up to one. For instance, if we put in some humidity, some temperature and so on, we might get out 68% sunny and 32% cloudy, or something like that; you get a prediction that is spread out over all of your possible output classes. This multi-layer perceptron is something that you need to train. You set it up and then you train it backwards: when you run examples through and get a specific output, you compare it with the output you wanted and adjust all the nodes so that they trigger differently, moving towards the outputs you want the network to produce for the inputs you put in. You run this multiple times, over multiple epochs; an epoch is what you call it when you have sent all your examples through once. So you run it through multiple epochs, and you end up with hidden layers that are able to produce a specific prediction for specific inputs.

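A minimal NumPy sketch of one forward pass through a small multi-layer perceptron, not the implementation mentioned in the episode: sigmoid activations keep the hidden values between 0 and 1, and a softmax output produces class scores that sum to one. The weights are random here; training would adjust them by running examples through for many epochs and propagating the error backwards:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # 3 input features -> two hidden layers of 5 nodes -> 2 output classes.
    w1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
    w2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
    w3, b3 = rng.normal(size=(5, 2)), np.zeros(2)

    features = np.array([0.68, 0.20, 0.12])   # e.g. scaled humidity, temperature, light

    h1 = sigmoid(features @ w1 + b1)          # activations in hidden layer one (0..1)
    h2 = sigmoid(h1 @ w2 + b2)                # activations in hidden layer two (0..1)
    output = softmax(h2 @ w3 + b3)            # class scores that sum to one

    print(dict(zip(["sunny", "cloudy"], output.round(3))))
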
The last thing I want to talk about is convolutional neural networks. These are some of the things we are using today for classification, and they are built with frameworks such as PyTorch or TensorFlow. They are very sophisticated networks that can make predictions all the way from the kind of specific input data we have talked about so far, temperature and so on, up to actual images or sounds. They can also be run in the other direction, so you can actually create data from them as well and generate new sounds, new music or new images. But I want to talk about the classification part now. A convolutional network is something where you go through a specific number of layers that you have set up.

The first thing you do is take your inputs and put them through a convolution, which creates multiple feature maps; this is pretty much similar to a multi-layer perceptron, but you get a lot of output maps with results in them. Then you do a sub-sampling, where you shrink the result down to a smaller layer: you get hits at specific points in your feature map and then you shrink it down. Let's say you shrink it down two by two; that means that for each square of four values you only keep one result. So you shrink it down to a smaller feature map, after that you run another convolution and get a lot more feature maps, but they are smaller, and then you sub-sample again. So you get smaller feature maps, but a lot more of them. Then you can do convolution again, sub-sampling again, and depending on how many of these iterations, or layers as they are called, you do, you have a deeper network. In the end, after you have done all this convolution and sub-sampling, you have a fully connected layer. That is very similar to the multi-layer perceptron, because that is where all the inputs come in and all the outputs come out, and that is what actually gives you the specific prediction.

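A minimal PyTorch sketch of the convolution / sub-sampling / fully-connected pattern described above, sized for small 28x28 grayscale images; the layer sizes are arbitrary choices, not anything from the episode:

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution -> 8 feature maps
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 2x2 sub-sampling: 28x28 -> 14x14
                nn.Conv2d(8, 16, kernel_size=3, padding=1),   # more, smaller feature maps
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 14x14 -> 7x7
            )
            self.classifier = nn.Linear(16 * 7 * 7, num_classes)  # fully connected layer

        def forward(self, x):
            x = self.features(x)
            x = torch.flatten(x, 1)
            return self.classifier(x)

    # One fake batch of four 28x28 single-channel images, just to show the shapes.
    model = SmallCNN()
    logits = model(torch.randn(4, 1, 28, 28))
    print(logits.shape)  # torch.Size([4, 10])
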
I'm just going to look at some statistics so you can see how much better these results have become over the years. This data is a little bit old, it's from 2010 to 2015, and it's for ImageNet classification. That is a set of images where you need to predict what is in each image; it could be a bicycle or a car or something like that, and you need to classify what you can see in the image. For the first year, 2010, they had an error rate of 28%. So that's a lot: roughly every fourth image was classified wrong, and those networks were quite shallow. 2011 was not much better; they were at 25%, also with shallow networks. Then people actually started to use convolutional neural networks that were a little bit deeper: they went up to eight layers and came down to 16%, so roughly one sample in six was still wrong, but that's much better. Then in 2013 they still had eight layers but went down to an 11% error rate, so roughly every tenth sample was wrong. In 2014 there was a network with 19 layers that was down to a 7.3% error rate, and another with 22 layers at a 7.6% error rate. At the end of this diagram, in 2015, ResNet had 152 layers and an error rate of 3.57%.

And networks have become much more sophisticated after that; the error rate has improved a great deal and the number of layers has grown enormously. I think Google has a setup where they do multiple convolutions and sub-samplings inside one structured entity, and then they stack those entities into layers, which means that their networks are really deep and can do a lot of computation and get very good predictions out of it. So this is a growing field, and there is a lot of work going into improving this kind of tooling in order to get good classification predictions from different data. So this was what I wanted to cover today. I hope you found it interesting and that you learned something. I usually do YouTube videos, so you can find me there if you search for my name. So until next time, have a good one.

You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast network that releases shows every weekday, Monday through Friday. Today's show, like all our shows, was contributed by an HPR listener like yourself. If you ever thought of recording a podcast, then click on our contribute link to find out how easy it really is. Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer Club and is part of the binary revolution at binrev.com. If you have comments on today's show, please email the host directly, leave a comment on the website, or record a follow-up episode yourself. Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike 3.0 license.