Episode: 614
Title: HPR0614: Intro To Audio and Pod/Oggcasting
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr0614/hpr0614.mp3
Transcribed: 2025-10-07 23:53:50
---
[music]
Hello and welcome to the first installment of my Hacker Public Radio series, an introduction to audio and podcasting and oggcasting.
In this first episode we're going to be talking about what audio is and some of the things we can do with it.
So first of all, we should define what audio is.
Audio is a mechanical wave sent through the air in the form of vibration.
But typically in audio, especially when you're dealing with electronics, we represent audio as a sine wave.
And if you don't remember from high school science what a sine wave is, you can think of it kind of like a piece of spaghetti.
If you stretched it out in front of you, cooked spaghetti of course, and bent it into continuous S shapes,
that would be the sine wave that typically represents audio.
Now there are two different components to audio; the first is the frequency.
If you measured the wave horizontally from each peak or valley to the next, that distance is one cycle, and the frequency is how many of those cycles happen per second.
And then the height vertically, from the highest point to the lowest point, represents the amplitude.
So the closer together the humps are, the higher the frequency, and the further apart they are, the lower the frequency.
And the taller the wave, the greater the amplitude and the louder it is; the shorter, the quieter.
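Here's a rough Python sketch of that idea, purely as an illustration (the 440 Hz tone and the sine_value name are chosen just for the example):

import math

def sine_value(t, freq_hz, amplitude):
    # One point on the sine wave at time t (seconds):
    # freq_hz controls how close together the humps are,
    # amplitude controls how tall (and therefore how loud) the wave is.
    return amplitude * math.sin(2 * math.pi * freq_hz * t)

# A 440 Hz tone at full amplitude, evaluated a quarter of the way
# through one cycle, which is its peak:
print(sine_value(1 / (4 * 440), 440, 1.0))  # ~1.0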
Typically in audio we refer to gain as an increase in amplitude.
And we do that through an amplifier.
But an amplifier can also do what's called attenuation, which is the opposite of gain: it makes things quieter.
So typically when we're dealing with audio, we're using an amplifier to either increase
the audio level or decrease it.
And that's most of what we do, except for filtering, which is equalization.
And we'll talk more about that later in the series.
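As a rough sketch of how gain and attenuation boil down to simple multiplication (the apply_gain name and the decibel figures here are just illustrative):

def apply_gain(sample, gain_db):
    # Gain in decibels maps to a multiplier on the amplitude:
    # positive dB boosts it, negative dB attenuates it.
    # +6 dB roughly doubles the amplitude, -6 dB roughly halves it.
    factor = 10 ** (gain_db / 20)
    return sample * factor

print(apply_gain(0.5, 6))   # ~1.0  (louder)
print(apply_gain(0.5, -6))  # ~0.25 (quieter)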
So with analog audio, we represent it in a sine wave.
But we can't represent a sine wave perfectly in digital, because we can't store every single value at every point along the wave.
It's continuous, right?
So what we have to do is break it up for digital.
And if you can imagine how we do that, we take slices of the sine wave.
So if you took a knife, held it vertically alongside that spaghetti noodle we're talking about, and started chopping it up,
how many segments we get per second is the sample rate.
And the higher the sample rate we use, the more accurately we represent the original analog sine wave,
and the better we're able to reproduce that analog signal later when we do the conversion back.
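And a rough, self-contained sketch of that slicing: we evaluate the wave sample_rate times per second (the names and numbers are just for illustration).

import math

def sample_wave(freq_hz, amplitude, sample_rate, seconds):
    # One "slice" every 1/sample_rate seconds; more slices per second
    # gives a more accurate picture of the original analog wave.
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(int(sample_rate * seconds))]

slices = sample_wave(440, 1.0, 48000, 1)  # one second sliced 48,000 times
print(len(slices))  # 48000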
Oh, I should say that the conversion from analog to digital is called analog-to-digital conversion, or ADC.
And the converse is digital-to-analog conversion, or DAC.
So sometimes you'll hear people refer to a DAC, which is a digital-to-analog converter, or an ADC, an analog-to-digital converter; you'll see that kind of terminology.
They're just referring to the conversion between analog and digital, and digital and analog.
So the sample rate we end up with is what you typically hear quoted in hertz.
You'll hear 44.1 kilohertz, 48 kilohertz, 96 kilohertz, 192 kilohertz.
And that's how many slices we've taken of that audio per second.
Now, the human range of hearing for frequency is between about 20 hertz and 20,000 hertz.
And there's something called the Nyquist theorem, a mathematical result that says to overcome the limitations of conversion to digital, you need to sample at at least twice the highest frequency you want to capture.
That works out to a little over 40 kilohertz, and in practice that's 48 kilohertz.
And that's what you see in pro audio.
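Worked out as quick back-of-the-envelope arithmetic in Python:

highest_audible_hz = 20000           # top of the human hearing range
nyquist_minimum = 2 * highest_audible_hz
print(nyquist_minimum)               # 40000 -- the theoretical floor
# Both common sample rates clear that floor with some headroom:
print(44100 >= nyquist_minimum)      # True (consumer / CD rate)
print(48000 >= nyquist_minimum)      # True (pro audio rate)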
However, when they were first trying to convert audio to digital, it was expensive.
It takes a lot of disk space, and there wasn't a lot of it; disk space was extremely expensive.
So what they decided to do was to start storing the stuff on magnetic tape.
And given what the magnetic tapes of the time could typically hold, a sample rate of 44.1 kilohertz is what fit nicely on there.
That's what is sometimes referred to as a DAT tape, or digital audio tape.
It was developed as a way of storing digital audio on inexpensive media.
And that's how we ended up with 44.1.
Now, a lot of consumer-type gear uses 44.1 because when the CD was introduced, they chose the DAT format: 16-bit audio at 44.1 kilohertz.
And that's what you get off a CD.
But professionals stuck with 48 kilohertz, because it makes the math easier for doing different kinds of things with your samples.
And that's why sound cards today typically support either 44.1 or 48 kilohertz.
So the sample rate represents how finely we've chopped up the audio.
Now, how much room we give each slice is what we call bit depth.
That's how many bits each slice is allowed to take up.
Now, we talked just a second ago that CD audio is 16 bit.
But typically, most good equipment today will use 24 bit, which is really the maximum you're going to need for analog audio.
We don't really need much above 24 bits.
However, certain processing benefits from having a higher bit depth in the processing stage.
So the program we're going to be talking about editing in later, Ardour, uses an internal bit depth of 32-bit floating point as opposed to 24-bit.
But even on the professional end of audio, they typically stick with 24 bits.
Some of the lower-end cards, especially USB cards, only support a 16-bit depth, which for the purposes we're talking about here, podcasting, is going to be plenty adequate.
We don't want to use anything below that.
The standard for telephony audio, say for a cell phone, is eight bits.
We don't really want that low end of audio; we want something that sounds at least reasonably clear.
That does make it a little more difficult to host call-ins or interviews, but we'll talk about some solutions to that later.
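As a rough illustration of what bit depth buys you (the 6 dB-per-bit figure is the usual rule of thumb, and the function name is just for the example):

def bit_depth_summary(bits):
    levels = 2 ** bits               # distinct values each slice can take
    dynamic_range_db = 6.02 * bits   # rule-of-thumb dynamic range
    return levels, dynamic_range_db

for bits in (8, 16, 24):
    levels, dr = bit_depth_summary(bits)
    print(bits, levels, round(dr))
# 8-bit:  256 levels,        ~48 dB  (telephony quality)
# 16-bit: 65,536 levels,     ~96 dB  (CD quality)
# 24-bit: ~16.7M levels,     ~144 dB (pro audio)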
So now we've sliced it up.
We have a sample rate and a bit depth, and we have all this information, right?
And if we were to directly encode that onto disk, we'd get what's called the WAV container.
And that's just the raw samples taken from the conversion, whatever we're converting with.
And different converters support different sample rates and bit depths.
Say, for example, my card supports 32 kilohertz, 44.1 kilohertz, 48 kilohertz, and 96 kilohertz.
96 is overkill.
Some professional audio guys like using it, but it takes up a lot of space and there isn't a ton of advantage in it unless you have really good analog-to-digital converters.
But what gets written to disk is the raw samples in a WAV container.
And that WAV container just wraps up whatever we've recorded.
It doesn't do anything to the audio, it's just the raw samples.
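As an illustrative sketch of "just the raw samples in a WAV container", using Python's standard wave module (the filename and the 440 Hz tone are placeholders; 16-bit mono at 48 kilohertz):

import math, struct, wave

sample_rate = 48000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]          # one second of a 440 Hz tone

with wave.open("tone.wav", "wb") as wav_file:
    wav_file.setnchannels(1)                  # mono
    wav_file.setsampwidth(2)                  # 16-bit samples
    wav_file.setframerate(sample_rate)        # 48 kilohertz
    # The container just wraps the raw samples -- no compression at all.
    wav_file.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in tone))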
We also have something called broadcast wave, which can be handy when you're dealing with audio files,
because a broadcast wave not only has the raw samples, but also the time data for when it was recorded.
So typically, if you're recording in one audio editor, and you want to use the audio in a different editor, and you need to export it but keep things lined up in time,
that's when we use broadcast wave, because it makes sure everything's aligned time-wise.
For example, say you've done a live show and inserted a segment of audio in the middle, and you want it all to line up correctly again;
you would want to use something like broadcast wave, because otherwise, when you import, it won't line your audio samples up.
This is really more important for a professional broadcast or people doing professional audio recording or voiceover type of work.
It's not super pertinent to dealing with podcast and oggcast audio.
However, if you're doing a double-ender, it can be handy to export in broadcast wave so that,
even if you didn't hit record at exactly the same time, the tracks will be roughly lined up, and you can do the minor tweaking without having to figure out where everything goes.
Okay, now we've got the audio in a WAV format, but because of the amount of disk space that raw audio at a 24-bit depth and 48 or 44.1 kilohertz takes up,
it's too much for people to download in a podcast format.
So what we use is codecs, and I wanted to touch a little bit on what a codec is.
A codec is an algorithm that mathematically reduces the amount of space on disk that audio takes up.
And depending on how aggressive that codec is, that can determine how small you can get it down, but there's always a trade-off: you're no longer dealing with the raw audio itself.
We have two different kinds of codecs.
The first is what we call a lossless codec.
A lossless codec doesn't do anything that affects the overall sound.
It takes out the cruft, say the silence and the data below a certain threshold that isn't important to the audio overall.
That's just wasted space, and a lossless codec, say like FLAC, just removes the cruft.
But it doesn't actually do anything to the audio itself; it only makes it smaller on disk.
So these are still pretty big files, and nobody releases a podcast or oggcast that I know of in the FLAC format.
Because even then, for an hour-long show in stereo, we'd be looking at probably close to 800 megabytes on disk.
If we use the FLAC container, we can get that down to about 200 megabytes, but you can see that's still way too much for the audio.
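Those sizes are easy to sanity-check with a little arithmetic (rough numbers; the exact figure depends on the sample rate and bit depth you record at):

def raw_size_mb(sample_rate, bit_depth, channels, seconds):
    # bytes per second = samples/sec * bytes per sample * channels
    return sample_rate * (bit_depth // 8) * channels * seconds / 1000000

print(raw_size_mb(44100, 16, 2, 3600))  # ~635 MB for an hour of CD-quality stereo
print(raw_size_mb(48000, 24, 2, 3600))  # ~1037 MB at pro-audio settings
# A lossless codec like FLAC shrinks that, but the result is still far too big to podcast.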
So then we get into the lossy type of codec.
And what a lossy codec does is use an algorithm that actually starts chopping and compressing and doing things with each sample to make it fit into a smaller space on disk.
Most of these codecs support either a fixed bitrate or a variable bitrate.
And we'll talk a little bit about the differences between the two.
These bitrates aren't the same as the sample rate we used for taking samples; this bitrate is the amount of space the audio takes up per segment of time.
With a fixed bitrate, you can guarantee a certain size for your audio per minute.
But the more and more you compress it, the more and more you lose of that audio.
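To put the fixed-bitrate idea into numbers, again as a rough sketch:

def encoded_size_mb(bitrate_kbps, seconds):
    # kilobits per second -> bytes, then megabytes
    return bitrate_kbps * 1000 * seconds / 8 / 1000000

print(encoded_size_mb(64, 3600))   # ~28.8 MB for an hour at a fixed 64 kbps
print(encoded_size_mb(128, 3600))  # ~57.6 MB at 128 kbps
# With a variable bitrate the encoder spends more bits on busy passages,
# so the final size is only approximate.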
So we have basically three different kinds of lossy codecs that we deal with in podcasting and oggcasting.
We deal with the MP3 file format, the Ogg file format, and the Ogg file format with the Speex codec.
Ogg can actually be a container around a lot of different things.
We usually use the Ogg Vorbis format for audio, or Ogg with the Speex codec.
And then MP3, which everyone is pretty much familiar with.
There are advantages and disadvantages to going with either.
Currently, MP3 encoding is considered patented.
Those patents really only kick in if you're generating more than $100,000 per year in income directly related to that format.
So most podcasters aren't really affected by the MP3 patents.
And depending on how you look at the patents that apply to it, they will expire anywhere between 2012 and 2016.
The Ogg codec was purposely developed to be patent-unencumbered and open.
There's some dispute over whether that's actually true, whether some people might hold patents that apply to Ogg.
However, it's generally agreed that you're OK using Ogg, the Vorbis format.
Speex was also developed specifically for spoken-word audio and has a smaller disk-space footprint.
The disadvantage, say, of one over the other would be that MP3 allows you to get into the iTunes store.
Now, I know we're talking on hacker public radio about what we would be considering open audio or free culture type of audio.
And some people don't like iTunes because of the history of DRM, and because it's Apple and how Apple ties down its hardware.
However, a lot of shows can actually benefit from being on iTunes.
And it's a good way of getting content in front of people that aren't familiar with free as in freedom and allowing them to access your content.
So the only formats that iTunes supports are AAC audio, which we didn't talk about and I don't really want to get into (you shouldn't be encoding in AAC), and MP3.
So sticking with an MP3 stream has the advantage of its availability to iTunes.
For my personal podcast, about 15 to 30% of the listeners to each episode come through iTunes.
So it's not something you necessarily should rule out unless you're really sticking to supporting the Og format, which is fine.
So with the Ogg format, comparing the algorithms between MP3 and Ogg:
Ogg is the superior codec, meaning that you can get audio into about the same space as an MP3 with a more advanced algorithm that causes less degradation to the audio.
It's not enough, though, that in a podcast I would really make an argument for one over the other on quality.
The Speex codec in an Ogg container, however, really does have a smaller footprint, but the audio quality is lower.
And you'll notice that a little bit on spoken word, but a lot if you have any music in your podcast.
Depending on what you want to do, some people offer both an MP3 and an Ogg feed.
Some people are MP3 only, some people are Ogg only.
Really, the choice is up to you.
We'll talk about the hosting solutions and the complications involved in all of those later when we deal with some of the other issues.
For right now, for audio, it doesn't matter.
For a podcast, when we're talking specifically about sound quality and audio, pick one or pick them all; it doesn't really matter.
So I think that pretty much covers audio on disk.
We dealt with some of the terms you're going to be hearing around audio, what some of them mean, and how we encode.
And generally, one last point I want to make: you don't want to record directly to MP3, because if you then want to re-encode it as Ogg or as Speex,
there's a further reduction in quality, because you've applied two lossy algorithms to that audio.
So typically we want to stick with WAV, record in WAV, and do the encoding later.
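As a sketch of that workflow, assuming you have ffmpeg with the libmp3lame and libvorbis encoders available (the filenames and bitrate settings are just placeholders):

import subprocess

master = "episode-master.wav"  # record and edit against the WAV master

# Encode the finished master to MP3 and Ogg Vorbis only at the very end,
# so each lossy algorithm is applied to the audio exactly once.
subprocess.run(["ffmpeg", "-i", master, "-c:a", "libmp3lame", "-b:a", "128k", "episode.mp3"], check=True)
subprocess.run(["ffmpeg", "-i", master, "-c:a", "libvorbis", "-q:a", "3", "episode.ogg"], check=True)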
If you have limitations in hardware, I understand; none of this is a hard and fast rule for anybody.
If you absolutely don't have enough room on disk to record, certain software will let you encode to disk in an MP3 or Ogg container.
You're basically encoding on the fly, and then on disk you're dealing with a smaller file.
For the most part, storage is relatively cheap and you won't need to do this, but your mileage may vary.
So I think that concludes this episode on our intro to audio and podcasting.
If you have any questions, feel free to contact me at dworth at open source musician dot com.
You can also visit the webpage, open source musician dot com, and check out more information there, including the show notes for this show.
And you can help me edit future episodes and get show notes developed for stuff you'd like to know more about.
And I would love to hear feedback and I think that wraps it up. So until next time, podcast out!