Episode: 29 Title: HPR0029: Codecs Part 2 Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0029/hpr0029.mp3 Transcribed: 2025-10-07 10:28:11 --- music This is Hacker Public Radio, my name is Klaatu. This is the second episode in an in-depth series on video codecs. In the last episode we spoke about why codecs exist, which is essentially to translate analog video, or rather analog visuals, into a digital stream of information. We also spoke about why compression exists, which is so that our computers can handle all of that digital information. And we spoke about compression for delivery: sending the video that we've shot and edited out to different devices like a Nokia N800, onto a DVD, or online for download. Compression exists there so that people can actually get the video in a reasonable download time, or so we can fit it onto a physical DVD disc, or onto our Nokia N800 or whatever. Now we're going to talk about the technique of compression. As I said in the last episode, in Linux the way I compress is simply on the command line, but there are probably GUI tools to do the compressing. For instance, when you're using Thoggen DVD Ripper or dvd::rip or any of those kinds of programs, you are compressing; well, really you're transcoding. You're taking a video source that has already been compressed into, for instance, MPEG-2, and translating it into another compression format with another codec. So if you've done that, you're already doing what we're talking about right now. What happens during compression? Well, there's something called spatial compression and something called temporal compression.
Spatial compression basically takes the frame that you see in front of you. For instance, if you're watching a video and you have it on pause, it takes that frame, the current frame, and compresses it just as though it were compressing it into a JPEG or a PNG; it really is compressing it like a still frame. I know I said last time that you shouldn't look at video as a series of still images but as a collection of pixels, and that's still true, but sometimes it's useful to think of it that way. In this case, when it's compressing the current frame, it takes that whole set of pixels and commits it to an image with some amount of compression. Obviously, the less compression, the more chroma and luma levels it will see and recognize. With more compression, the pixels are still the same, but it's looking at groups of pixels. Uncompressed, it would see every pixel, pixel for pixel. If you compress it, it's looking at big groups of pixels and making a judgment, sort of like: well, that group of eight pixels is predominantly this shade of blue, so I'm going to cut out all that variation and just make it this shade. You do that across the entire frame and you get a pretty good representation of what you started out with, but obviously you're losing a lot of shades of color, sharpness, and things like that. That's spatial compression. There's also temporal compression, which is a really fancy way of compressing video, and it's responsible for really cutting down on file sizes. The way it works is that it looks at the pixels in one frame, then looks at the pixels in the next frame, and compares the two. It says, well, this pixel here was blue on the current frame and it's still blue on the next frame, so I'm just going to encode it once. Then it looks at the next pixel: well, this one got a lot darker on the next frame.
So I'm going to have to encode that one twice. It goes through every pixel and figures out what it needs to do. The less compression you apply, the more precise, the more strict it will be: if a pixel went from blue to blue but changed shade even slightly, it's still going to encode it twice. If you compress it more, it's going to look at blue to blue and, if it's approximately the same shade, it'll just encode it once. That's what's going on during compression in terms of the actual image you're seeing, the pixel-for-pixel encoding. There are also different kinds of frames that compression produces: I-frames, P-frames, and B-frames. The I-frame is what we were talking about with spatial compression, where it looks at the current frame and compresses it more or less as a JPEG or a PNG. That's just one set of pixels, and that's how it is. It's a freeze; it's that image. That's the I-frame, the intra frame. I-frames are spatially encoded, and they're the highest-quality images that are going to appear in your compressed video. Now, if it's the highest quality, obviously it's also going to contain the most bits: looked at frame by frame, it has the most bits of any of the frames. And there isn't just one I-frame in your compressed video; there's going to be a certain number of I-frames every second. Each of those I-frames is going to be a really high-quality picture that will kind of stimulate our eye, and if we get enough of them, our eyes will probably fool us into thinking that it's a pretty sharp-looking video. You can't have too many of these, because if you have too many I-frames, you're not really compressing it that much. So deciding how many I-frames you want per second is important, and it's something we'll get into in a little while. So that's the I-frame.
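To make the pixel-comparison idea concrete, here is a toy sketch in Python. It is not how any real codec works (real ones use transforms such as the DCT plus motion compensation); it simply assumes a frame is a list of rows of brightness values from 0 to 255. The hypothetical spatial_compress averages each run of eight pixels in a row, like the "predominantly this shade of blue" judgment, and the hypothetical temporal_compress keeps only the pixels that changed from the previous frame by more than a tolerance, so a bigger tolerance means more compression.

```python
# Toy model only: a "frame" is a list of rows of brightness values (0-255).
# All function names here are hypothetical, for illustration.


def spatial_compress(frame, block=8):
    """Replace each run of `block` pixels in a row with their average shade."""
    out = []
    for row in frame:
        new_row = []
        for start in range(0, len(row), block):
            group = row[start:start + block]
            avg = round(sum(group) / len(group))
            new_row.extend([avg] * len(group))
        out.append(new_row)
    return out


def temporal_compress(prev, curr, tolerance=0):
    """Keep only pixels of `curr` that differ from `prev` by more than
    `tolerance`, as (row, col, value) tuples; everything else is reused."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, v) in enumerate(zip(prev_row, curr_row)):
            if abs(v - p) > tolerance:
                changes.append((r, c, v))
    return changes


def rebuild(prev, changes):
    """Decoder side: start from the previous frame and apply the changes."""
    frame = [row[:] for row in prev]
    for r, c, v in changes:
        frame[r][c] = v
    return frame


if __name__ == "__main__":
    prev = [[100, 100, 100, 100]]
    curr = [[100, 103, 40, 100]]
    print(temporal_compress(prev, curr, tolerance=0))
    print(temporal_compress(prev, curr, tolerance=5))
    print(spatial_compress([[10, 20, 30, 40]], block=2))
```

With tolerance=0 both changed pixels are stored; with tolerance=5 the shift from 100 to 103 counts as "approximately the same shade" and is dropped, so rebuild shows 100 there. Less data, slightly less accurate picture, which is the trade-off described above.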
Next, there are P-frames. These are called predictive frames, and they're not spatially compressed; they're not I-frames. A P-frame is basically a mathematical calculation of what the current frame looks like, based on the frame preceding it. So it's kind of like: P-frame equals the previous frame plus whatever changed, because it takes the information from the current frame, compares it to the frame before it, and calculates what needs to happen. That's where temporal compression comes in. If a pixel in the preceding frame looked really blue and it's still pretty much blue in the current frame, it's not going to re-encode that pixel. It's just going to borrow that pixel from the previous frame and use it again, which cuts down on your bit rate, or how much information you've got in your end result. If you've got, for instance, a black frame right now and the frame before it was black, you can see how, even though I've just talked about two frames, we're really only talking about one, because the P-frame is just going to reuse the information from the previous frame. So it can cut down a lot on your end file size. The last way to encode is a B-frame. B-frames are yet another mathematical calculation, but this time they use the current frame, the previous frame, and the next frame, so they're looking both ways. The B stands for bi-directional. It's looking both ways to calculate what the current frame is going to look like. B-frames are really the worst quality.
They take the least amount of bits in the overall scheme of things, but they tend to be used fairly often, because there's so much movement going on and the human eye is fairly forgiving about it; putting in a lot of B-frames often cuts down so heavily on the file size that people like to use them. You will see these structures mapped out when you're compressing: people who do this all the time will actually map out the structure of the I-, P-, and B-frames. So you might want an I-frame, then a P-frame, then two B-frames, then another P-frame, and then another I-frame, and that just loops over and over again; that would be the structure for your compression. All of this has to do with key frames. If you hear a compressionist talking about key frames, or a compression program asking about key frames, it's really nothing more than how often you want an I-frame. The lower your key frame number, the more key frames per second, because if you have, for instance, a key frame every 15 frames, you're going to have an I-frame every 15 frames, and that will probably be an okay-looking picture, though like I say, it always depends on what you're encoding. If you bump that number up to a key frame every 60 frames, it's going to look a lot different, because you're only going to get a really good-looking image once every 60 frames, which at 29.97 frames per second is about every two seconds. So that's I-, P-, and B-frames and the concept of key frames: spatial compression, those are the good-looking frames; temporal compression, that's the mathematical stuff. Another variable here is frame rate. This is not the frame rate at which the video is captured; most consumer cameras capture at a frame rate of 29.97 frames per second.
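Here's a small sketch of the key-frame arithmetic above, assuming NTSC's 29.97 frames per second (exactly 30000/1001). The helper names are made up for illustration, and the frame-type string is only a sequence of labels; it doesn't encode anything. The pattern "IPBBP" is the example structure from the episode, repeated as a loop.

```python
# Hypothetical helpers illustrating key-frame (I-frame) spacing.
NTSC_FPS = 30000 / 1001  # the "29.97" frames per second of consumer NTSC video


def keyframe_spacing_seconds(interval_frames, fps=NTSC_FPS):
    """Seconds between I-frames when a key frame comes every `interval_frames`."""
    return interval_frames / fps


def keyframes_per_second(interval_frames, fps=NTSC_FPS):
    """A lower key-frame interval means more I-frames every second."""
    return fps / interval_frames


def frame_types(pattern, total_frames):
    """Loop a frame-type pattern such as 'IPBBP' over `total_frames` frames."""
    return "".join(pattern[i % len(pattern)] for i in range(total_frames))


if __name__ == "__main__":
    for interval in (15, 60):
        print(f"key frame every {interval} frames: "
              f"{keyframe_spacing_seconds(interval):.2f} s apart, "
              f"{keyframes_per_second(interval):.2f} per second")
    print(frame_types("IPBBP", 15))
```

A key frame every 60 frames works out to just over two seconds between I-frames, matching the figure above, while a key frame every 15 frames gives roughly two I-frames per second.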
Again, the weirdest thing about video is that whenever anyone talks about frames, there's no such thing anyway. It's all abstract: the computer can assign any number of frames to video that it pleases. The only reason we talk about frames in video is that we all cut our teeth on film, so we think of it in frames per second. But in video there isn't ever really a still image; it's bits that some program is decoding so that we can see it as, quote, a frame. Frame rate in the case of compression really has to do with our perception of frames per second. A higher frame rate gives smoother motion to our eye, whereas cutting down the frame rate is going to look not quite as smooth. So you see video online and it just looks like online video, right? It's hard to express what that is, but I think you probably know what I'm talking about: it's not as smooth and nice-looking as something we'd see if we rented a DVD. Frame rate for compression is perceptual only. A higher number is going to be smoother, and obviously anytime it's smoother and nicer, it's going to be a bigger file. A lower number is a little bit choppier and will require less bit rate in the end, so it's going to be a smaller file. As usual, it depends on the video what you're going to have to do for your frame rate. Usually you can get away with 15 frames per second and be pretty happy with it. But if you're not that worried about your file size, go with the native frame rate: if you captured at 29.97, go with 29.97; if you captured at 24, go with 24, whatever. There's never really a need to go higher than the native frame rate. The last variable of any real importance is bit rate. Bit rate directly determines the file size: the higher the bit rate, the bigger the file for the same running time.
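The relationship between bit rate, running time, file size, and download time is plain arithmetic, sketched below in Python with made-up function names. The one trap is units: bit rates are usually quoted in megabits per second, file sizes in megabytes, and there are 8 bits in a byte. The 50-megabyte file over a one-megabyte-per-second connection from the episode is used as the example.

```python
# Hypothetical helpers: bit rate vs. file size vs. delivery time.
BITS_PER_BYTE = 8


def file_size_megabytes(bitrate_mbps, duration_seconds):
    """File size (megabytes) = bit rate (megabits/s) x running time / 8."""
    return bitrate_mbps * duration_seconds / BITS_PER_BYTE


def download_seconds(size_megabytes, bandwidth_megabytes_per_s):
    """How long a complete download takes at a steady bandwidth."""
    return size_megabytes / bandwidth_megabytes_per_s


def streams_smoothly(bitrate_mbps, bandwidth_mbps):
    """Playback can start right away only if the link delivers bits at least as
    fast as the video uses them (ignoring buffering and protocol overhead)."""
    return bandwidth_mbps >= bitrate_mbps


if __name__ == "__main__":
    size = file_size_megabytes(1.0, 400)  # 1 Mbit/s for 400 seconds
    print(f"{size:.0f} MB, {download_seconds(size, 1.0):.0f} s to download at 1 MB/s")
    print(streams_smoothly(bitrate_mbps=1.0, bandwidth_mbps=8.0))
```

A 400-second clip at 1 megabit per second comes out to 50 megabytes, which takes 50 seconds at one megabyte (8 megabits) per second. Double the bit rate and both the file size and the download time double.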
So say you've got a file of, I don't know, 50 megabytes, and you're going to send it from one computer to another, from the internet to your computer. It's a simple calculation: if the file is 50 megabytes and you've got a bandwidth of one megabyte per second, you're not going to have a hard time getting all that information to the end user pretty quickly. If you've got a much larger file and a smaller bandwidth, then suddenly it's going to be a problem, and people are going to have to wait around for it to download and cache on their computer. So bit rate is basically the amount of video information per second that has to get from whatever source it's coming from to wherever it's being delivered. This applies not just to the internet. You tend to think of it as an internet issue because you go to archive.org or YouTube and watch a video, and the bit rate determines whether it gets from the server to your computer fast enough: whether you have to sit and wait for it to load first, or whether you just press play and bang, it starts playing smoothly without any interruption. It also applies to a DVD that you pop into your DVD player and watch on your TV. The information has to get off that disc and out to your TV. It has to be decoded so that it can be seen as a representation of something visual, that takes time, and the DVD player and your TV can only handle so much information at once, which is why it's literally possible to give something too much bit rate when you're compressing it. It's quite common: people who don't know what they're doing when they encode their video to DVD think, well, the higher the bit rate, the better the quality, right?
They put it into the DVD player and it won't even play, because it's 25 megabits per second when a typical consumer standard-definition DVD player can only reliably handle around 8 megabits per second (the DVD-Video specification tops out at about 9.8 for the video stream). Bit rate is important. You have to know your video, and you have to know what you can squeeze out of your delivery method. As long as you've been reasonable with your number of key frames, so you'll have a nice clear image, and reasonable with your frame rate, so you'll have a nice smooth image, your bit rate can theoretically be anything. You have to know what file size you're shooting for and what delivery method you're going out to, and you must also remember that audio and video both contribute to your bit rate. Obviously frame rate and key frames have nothing to do with your audio; your audio is separate from that, but it still counts toward your end-result bit rate. If you've got really great sound, you're stealing bit rate from your video. If you've got really great video and you're ignoring your audio, you're stealing bit rate from your audio. Something's going to have to get compressed. How long is the video? If it's an hour going to DVD, you can give it a lot of bit rate: six, seven, seven point two megabits per second. If it's three hours of home video that you're trying to get onto one DVD, you're going to be cutting it way down, to something like two or three megabits per second, just so it fits on the physical disc. If you're going out to the internet, think about what the bandwidth is, what download speeds typically are, and shoot for something you think is reasonable. There's not really a recipe or a cheat sheet; you just have to look at the video and get a feel for what it's going to require. The key to good compression is practice.
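For the DVD cases above, a rough starting bit rate can be worked out from the disc size and running time, then split between audio and video. This is a minimal sketch under stated assumptions, not a recipe: it assumes a single-layer disc of 4.7 billion bytes, reserves a guessed 5% for menus and overhead, caps the total at the roughly 8 megabits per second given for typical players, and uses 0.192 Mbit/s for audio purely as an example value. All names are hypothetical.

```python
# Hypothetical DVD bit-rate budget; every constant here is an assumption.
DVD5_BYTES = 4.7e9       # single-layer DVD as marketed (4.7 billion bytes)
OVERHEAD = 0.05          # guessed share kept free for menus/container overhead
PLAYER_CAP_MBPS = 8.0    # rough safe ceiling for consumer DVD players


def max_total_bitrate_mbps(duration_seconds, capacity_bytes=DVD5_BYTES):
    """Average megabits per second that fit on the disc for this running time."""
    usable_bits = capacity_bytes * 8 * (1 - OVERHEAD)
    return usable_bits / duration_seconds / 1e6


def video_bitrate_mbps(duration_seconds, audio_mbps):
    """Video gets what's left after audio, never exceeding the player cap."""
    total = min(max_total_bitrate_mbps(duration_seconds), PLAYER_CAP_MBPS)
    return total - audio_mbps


if __name__ == "__main__":
    for hours in (1, 3):
        v = video_bitrate_mbps(hours * 3600, audio_mbps=0.192)
        print(f"{hours} h on one DVD: about {v:.1f} Mbit/s left for video")
```

For one hour the disc has room for more than players handle, so the cap decides and video gets roughly 7.8 Mbit/s; for three hours the disc size decides and video drops to roughly 3.1 Mbit/s. Raising audio_mbps lowers the video figure one-for-one, which is the audio-versus-video trade-off described above.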
Look at the video, encode it, then look at the end result; if you've taken careful notes, you're going to see what happens to certain video under certain conditions. If your computer isn't doing anything at night while you sleep and you're trying to get into video compression, there's no reason you shouldn't have it compressing video so you can wake up and see what you've done. Anything else would, in traditional terms, be wasted CPU cycles, right? The computer's there, you might as well use it. So practicing, comparing your source against your end result, looking at the file size, and seeing what each variable did to your video: that's the key to good compression. Now you know what the variables are. Good luck; it's not as hard as you think. Frame rate, like I say, is perceptual: smooth or not smooth. Key frames: clarity of image. Bit rate: that's the kicker, that's the tough one. The I-, B-, and P-frames I wouldn't worry so much about; a lot of the codecs you're going to be using have that all preset for you anyway. If nothing else, they might ask you about the key frame rate, and you know right there that it's talking about how many I-frames, how clear it's going to be. There you go. Thanks for listening. In one more episode, probably, maybe two, we'll delve into the specific codecs that you and I encounter on an everyday basis.