Episode: 29
Title: HPR0029: Codecs Part 2
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0029/hpr0029.mp3
Transcribed: 2025-10-07 10:28:11
---
music
This is Hacker Public Radio, my name is Klaatu.
This is the second episode in an in-depth series on video codecs.
In the last episode we spoke about why codecs exist, which is essentially to translate
analog video or rather analog visuals into a digital stream of information.
We also spoke about why compression exists, which is so that our computers can handle
all of that digital information.
And we also spoke about compression for delivery of video, meaning sending the video that
we've taken and edited out to different devices like a Nokia N800 or a DVD, or posting
it online for download.
The reason that compression exists for that is so that people can actually get the
video in a reasonable download time, or so we can fit it onto a physical DVD,
or onto our Nokia N800 or whatever.
Now we're going to talk about the technique of compression.
As I said in the last episode, in Linux, the way I compress is simply on the command line,
but there are probably GUI tools to do the compressing.
For instance, when you're using Thoggen DVD Ripper or dvd::rip or any of those kinds of programs,
you are compressing. Well, you're transcoding: you're taking a video source that has
already been compressed into, for instance, MPEG-2, and you are translating
it into another compression format with another codec.
So if you've done that, you're already doing what we're talking about right now.
What happens during compression?
Well, there's something called spatial compression and temporal compression.
Spatial compression basically takes the frame that you see in front of you. For instance,
if you're watching a video and you have it on pause, it would take that current
frame and compress it just as though it were compressing it into a JPEG or a PNG,
so it really is compressing it like a still frame.
I know I said last time that you shouldn't look at video as a series of still images
but as a collection of pixels. That's still true, but in this case it's useful
to think of it that way: when it's compressing the current frame, it takes
that whole set of pixels and commits it to an image with some amount
of compression.
So obviously the less compression, the more chroma and luma levels it will see
and recognize.
With more compression the pixels are still the same, but instead of seeing the frame
pixel for pixel, as it would uncompressed, it's looking at big groups of pixels and making a
judgment, sort of like: well, that group of eight pixels is predominantly this shade
of blue, so I'm going to cut out all that variation and just make it this shade.
You do that across the entire frame and you get a pretty good representation of what
you started out with, but obviously you're losing a lot of shades of color and sharpness
and things like that.
That's spatial compression.
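The "group of eight pixels" judgment described above can be sketched in a few lines of Python. This is only an illustration: plain block averaging is an assumption made to keep it small (real codecs use more elaborate transforms), and the function name and sample values are made up for the example.

```python
# Sketch of the "group of eight pixels" idea: replace each block of pixels
# with its average shade. Plain averaging is an assumption for illustration;
# it is not how any particular codec actually works.

def flatten_blocks(row, block_size=8):
    """Return a copy of row where every block is replaced by its mean value."""
    out = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        mean = round(sum(block) / len(block))
        out.extend([mean] * len(block))
    return out

# One row of 16 brightness values (0-255): a bright run, then a darker run.
row = [200, 202, 198, 201, 199, 203, 200, 197,
       90, 92, 88, 91, 89, 93, 90, 87]

flat = flatten_blocks(row)
distinct_before = len(set(row))   # how many different shades we started with
distinct_after = len(set(flat))   # how many are left after flattening

print(flat)
print(distinct_before, "shades before,", distinct_after, "after")
```

The row still has sixteen pixels afterward, but only two shades are left to store, which is where both the savings and the loss of detail come from.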
There's also temporal compression, which is a really, really fancy way of compressing
video and it's responsible for really cutting down on file sizes.
The way it does it is it looks at the pixels in the current frame, and
then it looks at the pixels in the next frame, and it compares the two.
It says, well, this pixel here was blue on the current frame and it's still blue
on the next frame,
so I'm just going to encode that once. Then it looks at the next pixel:
well, this one really got a lot darker on the next frame,
so I'm going to have to encode that twice. It goes through every pixel and figures
out what it needs to do.
The less compression you apply, the more precise,
the more strict it will be.
So if a pixel went from blue to blue but changed shade, it's still going to encode
it twice.
If you compress it more, it's going to look at blue to blue, and if it's approximately
the same shade, more or less, it'll just encode it once.
That's what's going on during compression in terms of the actual image that you're seeing,
like the pixel for pixel encoding.
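The pixel-by-pixel comparison between two frames can be sketched roughly like this in Python. The `tolerance` parameter is a hypothetical stand-in for "how much compression you apply", and the frames are tiny made-up lists of brightness values; real codecs compare blocks and motion, not single pixels.

```python
# Sketch of temporal compression: store only the pixels that changed
# between two frames. "tolerance" is a hypothetical knob standing in for
# the amount of compression; 0 means strict, larger means more forgiving.

def encode_delta(prev_frame, cur_frame, tolerance=0):
    """Return (index, new_value) pairs for pixels that changed by more than tolerance."""
    return [(i, cur)
            for i, (prev, cur) in enumerate(zip(prev_frame, cur_frame))
            if abs(cur - prev) > tolerance]

def decode_delta(prev_frame, changes):
    """Rebuild a frame by reusing the previous frame and applying the changes."""
    frame = list(prev_frame)
    for i, value in changes:
        frame[i] = value
    return frame

prev_frame = [120, 120, 120, 60, 60, 200]
cur_frame = [120, 121, 118, 20, 60, 200]

strict = encode_delta(prev_frame, cur_frame, tolerance=0)  # every shade change is stored
loose = encode_delta(prev_frame, cur_frame, tolerance=4)   # small shade changes are ignored

print(len(strict), "pixels stored when strict")
print(len(loose), "pixels stored when loose")
print(decode_delta(prev_frame, loose))
```

With the strict setting the rebuilt frame matches the original exactly; with the loose setting only the pixel that really got darker is stored, and the slightly shifted shades are reused from the previous frame.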
There are different kinds of compressed frames called I-frames, and there are also P-frames and
B-frames.
So the I-frame would be what we were talking about in the spatial compression, where it
looks at the current frame and compresses it more or less as a JPEG or a PNG.
That's just one set of pixels and that's how it is.
It's a freeze, you know, it's that image. That's the I-frame, that's
the intra frame.
They're spatially encoded; it's the highest quality image that is going to appear in your
compressed video.
Now, if it's the highest quality, obviously it's also going to contain the most bits.
If we look at it on a frame-by-frame basis, it would have the most bits of any of
the frames.
Now there isn't just one I-frame in your compressed video; there's going to be a certain number
of I-frames every second.
Each of those I-frames is going to be a really, really high quality picture
that will kind of stimulate our eye.
And if we get enough of those I-frames, our eyes will probably fool us into thinking that
it's a pretty sharp looking video.
You can't have too many of these, because if you have that many I-frames, you're not really
compressing it that much.
So deciding how many I-frames you want per second is important.
It's also something that we'll get into in a little while.
Next, there are the P-frames.
These are called predictive frames,
and these are not spatially compressed.
They're not I-frames.
A P-frame is basically a mathematical calculation of what that frame is going to look like using
the current frame plus the frame immediately preceding it.
So it's kind of like P-frame equals current frame minus previous frame, you know, because it's going
to take the information from the current frame, compare it to the frame
before it,
and calculate what needs to happen.
That's where it gets involved with the temporal compression.
So if a pixel from the preceding frame looked really blue and it's still pretty
much blue now in the current frame, it's not going to re-encode that pixel.
It's just going to borrow that pixel from the previous frame and use it again, which cuts down
on your bit rate, you know, how much information you've got in your
end result.
If you've got, for instance, a black frame right now and the frame before it was black,
you can see how, even though I've just talked about two frames, we're really only
talking about one, because the P-frame is just going to use
the information from the previous frame.
So it can cut down a lot on your end file size.
Okay, so the last way to encode is a B-frame.
B-frames are yet another mathematical calculation, but this time they use the current frame, the
previous frame and the next frame.
So it's looking both ways.
B-frame, I believe, stands for bi-directional or something like that.
It's looking both ways to calculate what the current frame is going to look like.
B-frames are really the worst quality.
They take the fewest bits in your overall scheme of things, but they tend to
be used fairly often because there's just so much movement going on, and the human eye
is fairly forgiving about that. Putting in a lot of B-frames often cuts down
so heavily on the file size that people do like to use them.
You will see these structures mapped out when you're compressing. People
who do this all the time will actually map out the structure of the I- and the P- and
the B-frames.
So you might have something where you want to use an I-frame and then a P-frame and then
two B-frames and then another P-frame and then another I-frame, and that'll just
loop over and over again, and that would be the structure for
your compression.
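The looping structure just described (I, P, two Bs, P, then back to I) can be written out as a repeating pattern. The sketch below, in Python, only generates the sequence of frame types; the pattern string and function name are made up for the example, and it says nothing about how an encoder would actually order or reference the frames internally.

```python
from itertools import cycle, islice

# Sketch: repeat a frame-type pattern over a number of frames.
# "IPBBP" is the example structure from the episode; the next I-frame
# starts the loop again.

def frame_types(pattern, total_frames):
    """Return a string of frame types, looping the pattern as needed."""
    return "".join(islice(cycle(pattern), total_frames))

sequence = frame_types("IPBBP", 15)
print(sequence)
print("I-frames:", sequence.count("I"),
      "P-frames:", sequence.count("P"),
      "B-frames:", sequence.count("B"))
```

Counting the letters shows how few of the frames in such a structure are full, expensive I-frames compared with the cheaper P- and B-frames.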
All this has to do with, basically, key frames.
If you hear a compressor talking about key frames, or a compression program asking about
key frames, it's not really anything more than how many I-frames you want per
second.
A lower key frame number means a higher number of key frames per second,
because if you have, for instance, a key frame every 15 frames, then you're basically
going to have an I-frame every 15 frames, and that will probably be an okay looking picture.
Like I say, it always depends on what you are encoding.
If you bump that number up to a key frame every 60 frames, it's going to look a lot different,
because you're not going to have a really good looking image except once
every 60 frames, which is like two seconds really.
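The arithmetic behind "every 60 frames is like two seconds" is just the key frame interval divided by the frame rate. A small Python sketch, assuming the 29.97 frames per second figure used elsewhere in this episode (the function names are made up for the example):

```python
# Sketch: convert a key frame interval (in frames) into time.
# The default of 29.97 fps is an assumption taken from the capture
# rate mentioned in the episode.

def seconds_between_key_frames(interval_frames, fps=29.97):
    """How long the viewer waits between one I-frame and the next."""
    return interval_frames / fps

def key_frames_per_second(interval_frames, fps=29.97):
    """How many I-frames land in each second of video."""
    return fps / interval_frames

for interval in (15, 60):
    print(interval, "frames:",
          round(seconds_between_key_frames(interval), 2), "s apart,",
          round(key_frames_per_second(interval), 2), "per second")
```

So a key frame every 15 frames gives roughly two I-frames per second, while a key frame every 60 frames gives one roughly every two seconds.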
So that's I-, P- and B-frames, and the concept of key frames. Spatial compression, those are the
good ones; temporal compression, that's the mathematical stuff.
Another variable here is frame rate.
This is not the frame rate at which the video is captured.
Most cameras, consumer cameras, capture at a frame rate of 29.97 frames per second.
Again, the weirdest thing about video is that every time anyone's talking about frames,
there's no such thing anyway.
It's all esoteric; the computer can assign any number of frames to video that it
pleases.
The only reason we talk about frames in video is because we all cut our teeth
on film, and so we think of it in frames per second.
But in video, there isn't ever really a still image.
It's bits that some kind of program is decoding so that we can see it as, quote, a frame.
A frame rate in the case of compression has to do with really our perception of frames
per second.
And so a higher frame rate would be smoother motion to our eye, whereas cutting down the
frame rate is going to look not quite as smooth.
So if you see video online and it just doesn't, it looks like online video, right?
You see it and it just looks like video that you find online.
Hard to express what that is, but I think you probably know what I'm talking about.
It's just not as smooth and nice looking as something we would see, you
know, if we rented a DVD or something.
Frame rate for compression is perceptual only. A higher number is going to be smoother,
and obviously any time it's smoother and nicer, it's going to be a bigger file size.
A lower number, a little bit choppier, will require less of a bit rate in the end.
It's going to be a smaller file size.
It just depends on the video, of course, as usual, as to what you're going to have to
do for your frame rate.
Usually you can get away with 15 frames per second and be pretty happy with it.
But if it's one of those situations where you're not that worried about your file size,
go with the native frame rate. You know, if you captured it at 29.97, go with 29.97; if you captured
it at 24, go with 24, whatever.
There's not really a need to ever go higher than the native frame rate.
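One simple way to picture a lower output frame rate is as dropping frames from the source. The Python sketch below picks which source frames survive; it is an assumption for illustration only (actual encoders may blend or resample rather than just drop), and the function name is made up.

```python
# Sketch: choose which source frames to keep when encoding at a lower
# frame rate. Going above the native rate is refused, since there is
# nothing to gain from it.

def frames_to_keep(source_fps, target_fps, source_frame_count):
    """Return the indices of the source frames used for the output."""
    if target_fps > source_fps:
        raise ValueError("target frame rate is higher than the native frame rate")
    duration = source_frame_count / source_fps
    out_count = int(duration * target_fps)
    return [int(n * source_fps / target_fps) for n in range(out_count)]

kept = frames_to_keep(30, 15, 30)   # one second of 30 fps video, encoded at 15 fps
print(kept)
print(len(kept), "of 30 frames kept")
```

Halving the frame rate keeps every other frame, so there is roughly half as much picture information to encode, at the cost of choppier motion.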
The last variable, really, of any importance is bit rate.
So bit rate is directly tied to the file size.
Say you've got a file size of, I don't know, 50 megabytes and you're going to send it from
one computer to another computer, from the internet to your computer. It's
a simple calculation. If the file is 50 megabytes and you've got a
delivery bandwidth of, you know, one megabyte per second, you're not going to have
a hard time getting all that information to the end user pretty quickly.
If you've got a much larger file size and a smaller bandwidth, then
suddenly it's going to be a problem, and people are going to have to be
waiting around for it to download and kind of cache onto their computer.
So bit rate basically determines the amount of time it takes the video information to get
from whatever source it's coming from onto whatever delivery it is going to.
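The "simple calculation" can be written down in Python. The 50 megabyte file and the one megabyte per second bandwidth come from the example above; the five-minute running time and the slower 0.1 megabyte per second connection are assumptions added purely for illustration, as are the function names.

```python
# Sketch: relate file size, bandwidth and bit rate.
# Sizes are in megabytes, bandwidth in megabytes per second,
# bit rates in megabits per second (1 byte = 8 bits).

def download_seconds(file_size_mb, bandwidth_mb_per_s):
    """Time to transfer the whole file."""
    return file_size_mb / bandwidth_mb_per_s

def average_bit_rate_mbps(file_size_mb, duration_s):
    """Average bit rate of a file with the given running time."""
    return file_size_mb * 8 / duration_s

def plays_without_waiting(file_size_mb, duration_s, bandwidth_mb_per_s):
    """True if the connection delivers bits at least as fast as playback uses them."""
    return average_bit_rate_mbps(file_size_mb, duration_s) <= bandwidth_mb_per_s * 8

wait = download_seconds(50, 1)             # whole file at 1 MB/s
rate = average_bit_rate_mbps(50, 300)      # assumed 5-minute video
smooth = plays_without_waiting(50, 300, 1)
slow = plays_without_waiting(50, 300, 0.1)  # assumed much slower connection

print(wait, "seconds to download")
print(round(rate, 2), "megabits per second on average")
print(smooth, slow)
```

The same file that plays immediately on the faster connection would have to buffer on the slower one, because the video needs more bits per second than the connection can supply.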
This applies not just to the internet.
You kind of think of it as an internet issue because you go to archive.org or something
and you watch a video, or YouTube, you watch a video, and the bit rate determines how quickly
it gets from the server to your computer and whether you have to sit down and wait for
it to load first or whether you just press play and bang, it starts playing smoothly
without any kind of interruption.
This also applies to, like, a DVD that you pop into your DVD player and watch on your
TV.
The information has to get off of that disc and out to your TV.
It has to be decoded so that it can be seen as, you know, a representation of something
visual, and that takes time, and the DVD player and your TV can only handle so much information
at one time, which is why it's actually literally possible to give something too much
of a bit rate when you're compressing it.
It's quite common with people who maybe don't know what they're doing when they're encoding
their video to DVD.
They think, well, the higher the bit rate, the better, right, the better the quality.
They put it into the DVD player and it won't even play, because it's 25 megabits per second
when typical consumer standard definition DVD players can handle a maximum of 8 megabits
per second.
Bit rate is important.
You have to know your video and you have to know what you can squeeze out of your delivery
method.
So as long as you've been reasonable with your number of key frames and you think you're
going to have a nice clear image, and you've been reasonable with your frame rate and you
think you're going to have a nice smooth image, your bit rate can theoretically be really
anything.
You have to kind of know what file size you're shooting
for and what delivery method you're going out to, and you must also remember that audio
and video are both being figured in here: they both contribute
to your bit rate.
Obviously frame rate and key frames have nothing to do with your audio.
Your audio signal is apart from that, but it's still part of your end result bit rate.
If you've got really, really great sound, you're stealing bit rate from your video.
If you've got really, really great video and you're ignoring your audio, you're stealing bit
rate from your audio.
Something's going to have to get compressed.
How long is the video?
If it's an hour going to DVD, you can give it a lot of bit rate: six, seven,
seven point two megabits per second. With three hours of home video that you're just
trying to get onto a DVD, you're going to be cutting it way down, like to two megabits
per second, just so it fits on the physical disc.
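A rough budget for fitting a video onto a disc can be sketched in Python. The assumptions here are mine: a single-layer disc holding 4.7 billion bytes, an audio track of 0.192 megabits per second, and the 8 megabits per second player ceiling quoted earlier in this episode. It ignores menus and container overhead, so real targets would sit somewhat below these numbers.

```python
# Sketch: how much bit rate a given running time can afford on a DVD.
# Capacity, audio rate and the player limit are assumptions for
# illustration, not values checked against any specification.

DVD_CAPACITY_BYTES = 4.7e9  # assumed single-layer disc, decimal gigabytes

def max_total_mbps(duration_hours, capacity_bytes=DVD_CAPACITY_BYTES):
    """Highest combined audio+video bit rate that still fits on the disc."""
    seconds = duration_hours * 3600
    return capacity_bytes * 8 / seconds / 1e6

def video_mbps(duration_hours, audio_mbps=0.192, player_limit_mbps=8.0):
    """Video bit rate left after the player limit and the audio track are accounted for."""
    budget = min(max_total_mbps(duration_hours), player_limit_mbps)
    return budget - audio_mbps

for hours in (1, 3):
    print(hours, "hour(s):",
          round(max_total_mbps(hours), 2), "Mbit/s fits on disc,",
          round(video_mbps(hours), 2), "Mbit/s left for video")
```

For one hour the disc is not the constraint, the player is; for three hours the disc size becomes the limit and the bit rate has to come way down, which matches the figures mentioned above.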
Knowing what you know, if you're going out to the internet, think about what your
bandwidth is, what your download speeds typically are, and shoot for something that you think
is reasonable.
There's not really a recipe or a cheat sheet, you just have to look at the video and kind
of get a feel for what that video is going to require.
The key to good compression is practice.
If you look at the video and then you go and encode it and then you look at the end result,
and you've taken careful notes, you're going to see what happens under certain
conditions to certain video.
If your computer isn't doing anything at night while you sleep, and
you're trying to get into video compression, there's no reason that you shouldn't have
that computer compressing video so that you can wake up and see what you've done.
Anything else is essentially, in traditional terms, wasted CPU cycles, right?
The computer's there, you might as well use it.
So practicing and kind of comparing your source versus your end result, looking at the
file size, seeing what variables did what to your video, that's the key to good compression.
Now you know what the variables are, good luck.
It's not as hard as you think.
The frame rate, like I say, is perceptual: the smoothness or lack of smoothness.
Key frames: clarity of image. Bit rate: that's the kicker, that's the one that's tough.
The I and the B and the P frames, I wouldn't worry so much about that.
A lot of the codecs that you're going to be using kind of have that all preset for you anyway.
And if nothing else, maybe they'll ask you about the key frame rate, so you know right
there that that's talking about how many I-frames, how clear it's going to be.
There you go.
Thanks for listening.
One more episode probably, maybe two. We'll delve into the specific codecs that you and
I encounter on an everyday basis.