
Episode: 1393
Title: HPR1393: Audio Metadata in Ogg, MP3, and others
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1393/hpr1393.mp3
Transcribed: 2025-10-18 00:46:10
---
In today's episode of Hacker Public Radio, the pseudonymous Epicanis spends a few minutes
talking about audio metadata while trying to get some household chores done.
It's the autumn of 2013, and, in accordance with the prophecy, Hacker Public Radio is running
low on shows.
At the same time, we've been in the busy season here at the asylum for the sufficiently
nerdy, so I haven't had time to properly sit down and finish assembling and recording
a real full-length episode like the one on the Opus Audio Codec that I've been trying
to get done.
I don't want to leave HPR hanging in the meantime, though, so I thought maybe I could
try recording a few small, somewhat sloppy episodes on short subjects while I'm running
up and down the stairs dealing with household chores: doing laundry, scrubbing
toilets, exorcising demons, hauling out garbage, and so on.
I've wasted some mental time trying to figure out what to call episodes like this: maybe
laundry lectures, or toilet-scrubber tutorials, or chore chat, or something.
Anyway, even without a cool name, hopefully people will be able to get some entertainment
and useful information out of them, despite less than ideal sound recording conditions,
and probably not so great organization of the topics.
While I'm cleaning this room today, I'm going to talk a bit about metadata in audio files.
I heard that somebody out there just asked, "What's metadata?"
The correct answer to that question appears to be "Why, it's data about data," said with
an irritatingly smug facial expression, as though you've just created a piece of profound
wisdom while also properly answering the question, even though you've done neither.
In audio files, nearly all of the data is actual encoded sound, but there's a small bit
of extra data, most of which is optional, that is used to tell users of the file about
the audio or the file itself.
The mandatory parts of the metadata, like the sample size of the file, the audio codec being
used, and so on, are handled automatically, so you don't have to worry about them much.
That's mostly just useful for the playback software, which needs it so that it can figure
out how to play it back.
Audio files virtually always have room for additional bonus information about the file,
though.
That's where you find the title of the piece of audio, the name of the performer, the
name of the album or collection the audio comes from, the little picture of the front
cover of the record album, the geolocation information, and so on.
That's the metadata that I'm talking about today.
There seem to be lots of different ways that various people encode metadata for sound
files, but really, there are only about two that most people need, and only three or
four others that you might encounter once in a while.
The two important ones are ID3 and Vorbis comments.
ID3 is the one you're most likely to have heard about already.
That's what they use for MP3 files, specifically what you may be familiar with is probably
ID3 version 2.3, which seems to be what nearly everyone uses these days.
You might also be acquainted with that abomination that is the original ID3 version 1, which is
actually an unrelated older format with some serious limitations.
In ID3 v1, the metadata was all crammed into a single 128-byte special data structure
at the end of an MP3 file, with... hang on, I've got the list here.
It had room for 30 bytes each of title, artist, album, and comment, four bytes to type in
a year, and one single byte for a number representing genre.
The idea of sticking the ID3 v1 at the end of the file was that if your crappy player
software didn't know what it was, it would probably just try to interpret the last 128
bytes as more sound and play a tiny blip of noise at the end of the file, or at worst,
if it choked and died, at least it would do so after you got to listen to the file.
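To make the 128-byte layout concrete, here is a minimal Python sketch of reading an ID3 v1 tag. It assumes the standard layout, which also starts with a 3-byte "TAG" marker (not mentioned above) so a reader can tell whether the last 128 bytes are a tag at all. The function name `parse_id3v1` is just illustrative, and the later ID3 v1.1 variant, which borrows two comment bytes for a track number, is ignored.

```python
import struct

# "TAG" marker (3) + title (30) + artist (30) + album (30) + year (4) + comment (30) + genre (1)
ID3V1_LAYOUT = ">3s30s30s30s4s30sB"

def parse_id3v1(data: bytes):
    """Return a dict of ID3 v1 fields, or None if the file doesn't end with a tag."""
    if len(data) < 128:
        return None
    marker, title, artist, album, year, comment, genre = struct.unpack(ID3V1_LAYOUT, data[-128:])
    if marker != b"TAG":
        return None  # those last 128 bytes are just more audio

    def text(field: bytes) -> str:
        # Fixed-width fields, padded with NUL bytes (sometimes spaces)
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre": genre}
```

The single genre byte is returned as a raw number, since its meaning depends on the ID3 v1 genre list.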
ID3 v2 is a completely different kind of thing.
Instead of one tiny data structure at the end of an MP3 file, it's a whole bunch of
different special data structures that go at the beginning of the file.
Hang on, I have to open up the lid here and scrub the toilet.
Excuse the sound quality.
Anyway, anyway, I looked up the ID3 v2.3 specification, and, ugh, gross, nasty.
Somebody must have been really sick to make a mess like this.
Well, at least this toilet's pretty clean, so this won't take too long.
But anyway, ID3 v2.3 looks like a complicated mess to me.
The specification lists about 75 different special little fields, each with its own
special little data structure and a special four-character code to identify it, like TCON
for genre and TIT2 for title.
Actually, I'm exaggerating a little, though: those 75 fields only cover about 5 or 6 different
special little data structures.
All of the text field types share the same structure, for example.
Well, except for comments, which have their own separate structure.
Oh, and the Involved Persons list, which is a catch-all text field for cramming everyone's
name and role into a single messy metadata entry, for anyone whose role wasn't defined in
one of the other special little fields.
Ugh, see what I mean?
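For a feel of what those four-character frames look like on disk, here is a hedged Python sketch that builds and re-reads a tiny ID3 v2.3 tag containing only text frames. It relies on two details of the v2.3 spec not covered above: the 10-byte tag header, whose size field is "synchsafe" (only 7 bits used per byte), and the 10-byte frame header with a plain 32-bit size. It assumes no extended header, no unsynchronisation, and ISO-8859-1 text; all helper names are hypothetical.

```python
import struct

def synchsafe_encode(n: int) -> bytes:
    # Tag-header sizes use only the low 7 bits of each of the 4 bytes
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])

def synchsafe_decode(b: bytes) -> int:
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def text_frame(frame_id: str, text: str) -> bytes:
    # Text frame body: encoding byte 0x00 (ISO-8859-1), then the text itself
    body = b"\x00" + text.encode("latin-1")
    # v2.3 frame header: 4-char ID, plain 32-bit big-endian size, 2 flag bytes
    return frame_id.encode("ascii") + struct.pack(">I", len(body)) + b"\x00\x00" + body

def build_tag(frames: dict) -> bytes:
    payload = b"".join(text_frame(fid, text) for fid, text in frames.items())
    # Tag header: "ID3", version 3.0, flags 0, synchsafe size of everything after the header
    return b"ID3" + bytes([3, 0, 0]) + synchsafe_encode(len(payload)) + payload

def parse_text_frames(tag: bytes) -> dict:
    if tag[:3] != b"ID3" or tag[3] != 3:
        raise ValueError("expected an ID3 v2.3 tag")
    end = 10 + synchsafe_decode(tag[6:10])
    pos, result = 10, {}
    while pos + 10 <= end:
        frame_id = tag[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # reached the zero padding after the last frame
        (size,) = struct.unpack(">I", tag[pos + 4:pos + 8])
        body = tag[pos + 10:pos + 10 + size]
        fid = frame_id.decode("ascii")
        # Only simple T*** text frames in ISO-8859-1; TXXX has an extra description field
        if fid.startswith("T") and fid != "TXXX" and body[:1] == b"\x00":
            result[fid] = body[1:].rstrip(b"\x00").decode("latin-1")
        pos += 10 + size
    return result
```

Real tags usually carry zero-byte padding after the last frame so they can be edited in place, which is why the parser stops at an all-zero frame ID.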
Most of these fields you can usually ignore unless you really need them, though.
Most of what I usually see people use are a few of the text type fields that cover artist,
album name, track number, and content type, which is more colloquially known as genre.
That hideous field is now text instead of a number like in ID3 v1, but the specification
still suggests continuing to put a number in there, taken from the oddly specific ID3 v1
list of 141 or so special genre names, none of which, incidentally, is "Podcast", which
is what Hacker Public Radio and various other shows seem to use.
This doesn't actually break the specification, fortunately; it just goes against the recommendation.
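In practice, that number-in-a-text-field convention looks like "(17)", optionally followed by refining text as in "(4)Eurodisco". Here is a small Python sketch for turning such a TCON value back into readable text. The lookup table is deliberately partial (only the first 18 ID3 v1 genres), the function name is hypothetical, and special references such as "(RX)" or "(CR)" are simply passed through unchanged.

```python
import re

# Only the first 18 entries of the ID3 v1 genre list, for illustration
ID3V1_GENRES = ["Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
                "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies",
                "Other", "Pop", "R&B", "Rap", "Reggae", "Rock"]

def describe_tcon(value: str) -> str:
    """Readable genre text for an ID3 v2.3 TCON value such as "(17)" or "(17)Rock"."""
    match = re.fullmatch(r"\((\d+)\)(.*)", value, flags=re.DOTALL)
    if not match:
        return value                      # plain free text, e.g. "Podcast", or "(RX)"
    index, refinement = int(match.group(1)), match.group(2)
    if refinement:
        return refinement                 # text after the number refines the v1 genre
    if index < len(ID3V1_GENRES):
        return ID3V1_GENRES[index]
    return f"ID3 v1 genre #{index}"       # outside this partial table
```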
Aside from a few of the 39 specific text-type ID3 frames, the only other ID3 v2.3 frame I've ever
personally seen anyone use is the attached picture frame, which is the so-called cover art.
One thing about this that most people don't realize is that you can have more than one
attached picture frame in an ID3 v2 header.
The data structure isn't just a copy of the JPEG file or whatever, but actually specifies
the MIME type of the picture data, a freeform text description of the picture data, and a
number that indicates specifically what the picture is supposed to be, like the front
cover of the album, the back cover, a picture of the CD that the MP3 was ripped from, a picture
of the band, the logo of the recording studio, a brightly colored fish.
No, seriously, I'm not joking, that's in the specification: it's picture type number
17.
Except for the two file icon attached picture types, the specification explicitly permits
as many of each kind of attached picture as you want to embed.
An MP3 file with six different front cover pictures embedded in it is perfectly valid.
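Here is a Python sketch of how an ID3 v2.3 attached picture (APIC) frame is laid out: a text-encoding byte, a NUL-terminated MIME type, the picture type number, a NUL-terminated description, then the raw image bytes. The helper name is hypothetical and the image bytes are placeholders. One catch not mentioned above: v2.3 expects every APIC frame in a tag to have a different description, so six front covers need six distinct descriptions.

```python
import struct

FRONT_COVER = 0x03   # "Cover (front)"
BRIGHT_FISH = 0x11   # 17: "A bright coloured fish"

def apic_frame(image: bytes, mime: str, picture_type: int, description: str) -> bytes:
    """Build an ID3 v2.3 APIC frame (header + body) using ISO-8859-1 text."""
    body = (b"\x00"                                    # text encoding: ISO-8859-1
            + mime.encode("latin-1") + b"\x00"         # MIME type, NUL-terminated
            + bytes([picture_type])                    # picture type byte
            + description.encode("latin-1") + b"\x00"  # description, NUL-terminated
            + image)                                   # raw image data to the end of the frame
    # v2.3 frame header: ID, 32-bit big-endian body size, two flag bytes
    return b"APIC" + struct.pack(">I", len(body)) + b"\x00\x00" + body

# Six front covers in one tag: fine, as long as each description differs.
placeholder = b"\xff\xd8 not really a JPEG"
covers = [apic_frame(placeholder, "image/jpeg", FRONT_COVER, f"front cover {n}")
          for n in range(1, 7)]
```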
There are lots of different audio file formats, but only MP3 uses ID3.
Well, except maybe MP4. MP4, if I'm not mistaken, is an object-oriented sort of file format,
kind of like a special version of QuickTime, in the same way that WebM is a special version of Matroska.
Yeah, I know, somewhere out there is a chorus of shocked people spitting out their lattes
on their MacBook Pros and complaining, "QuickTime isn't a file type, it's a framework!" Doesn't
matter, it's just an analogy.
Anyway, the MP4 specifications actually do include a special ID3 data object that you can
cram a whole ID3 header into, so you might run into .m4a files with them.
I'm not sure how common that is though, since as far as I know, most people getting .m4a
files are getting them from iTunes, and from what I've read, iTunes uses its own special
undocumented format for metadata instead.
That special undocumented format is one of the, quote, three or four others you might encounter
once in a while, unquote.
One last point worth mentioning, I've been talking about ID3 version 2.3 all this time,
yet there is a version 2.4.
Thing is, there seems to have been very little interest in this revision, which looks like
it was mostly a few incompatible renamings of tags and a few relatively obscure
new tags.
Oh, and when you cram multiple entries into a text field, you separate them with nulls
in 2.4, where you used forward slashes in 2.3.
I wouldn't bother with 2.4 personally, but if you find yourself trying to get Windows
to read your MP3 file's metadata and it won't do it, maybe someone stuck 2.4 tags in it
instead of 2.3.
There, that covers ID3, the special format used by one or maybe two out of all the kinds
of audio files you might run into on the internet.
That's enough of that mess.
Okay, at this point, it's actually taken me long enough to get this done that
the busy season is over, so from here I can just make the rest of the episode like the
more typical ones I've been doing.
Well, typical for me anyway.
Incidentally, Hacker Public Radio could really still use some more shows.
Please record something.
While you're doing that though, let me get back to this.
Now then, what about everybody else besides MP3?
It seems to be pretty common to assume that ID3 is the metadata format for all audio
everywhere, so don't feel bad if you were under that impression.
You wouldn't be the first person to try to cram an ID3 frame into an Ogg file.
Heck, I did that myself once or twice before I knew better.
In reality, besides MP3 and maybe some MP4 audio files, everybody else uses Vorbis comments.
Okay, not literally everybody, but pretty much any other kind of digital audio file that
you're likely to actually run into often online, including Opus, FLAC, Ogg Vorbis,
and Speex.
Unlike ID3, with only one specific exception that I know of, Vorbis comments are simple,
consistent, flexible, and even human readable.
According to the specification, all Vorbis comments are made of printable text characters,
so there are no strange binary codes to deal with.
Heck, you can use grep to find files with Vorbis comment metadata; how cool is that?
The tag names are case-insensitive, too, so you don't have to worry about that either.
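To show just how plain the format is, here is a Python sketch that packs and unpacks a bare Vorbis comment block: a little-endian 32-bit length before the vendor string, a count of comments, then each NAME=value string with its own length. In a real Vorbis stream this block sits after a packet-type byte of 3 plus the string "vorbis" and ends with a framing bit, and in Opus it follows the magic string "OpusTags"; those wrappers are left out here, and the function names are hypothetical.

```python
import struct

def pack_comments(vendor: str, comments: list) -> bytes:
    """Build a bare Vorbis comment block from (name, value) pairs."""
    out = bytearray()
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes  # vendor string
    out += struct.pack("<I", len(comments))                     # number of comments
    for name, value in comments:
        entry = f"{name}={value}".encode("utf-8")               # plain "NAME=value" text
        out += struct.pack("<I", len(entry)) + entry
    return bytes(out)

def unpack_comments(data: bytes):
    """Return (vendor, [(NAME, value), ...]) with names upper-cased."""
    pos = 0

    def read_u32() -> int:
        nonlocal pos
        (n,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return n

    vendor_len = read_u32()
    vendor = data[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    comments = []
    for _ in range(read_u32()):
        length = read_u32()
        name, _sep, value = data[pos:pos + length].decode("utf-8").partition("=")
        pos += length
        comments.append((name.upper(), value))  # field names are case-insensitive
    return vendor, comments
```

Because each entry is stored as literal text, searching the raw bytes for something like `ARTIST=Epicanis` finds it directly, which is what makes the grep trick work.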
Of course, there's a couple of issues with this arrangement.
For one thing, the vast flexibility means that you can name bits of metadata whatever
the heck you want.
You can imagine the mess if one site published its audio with the title contained in a field
called NAME, another in a field called TITLE, another in a field called SONG,
and so on.
Oh, here's a brief pointless digression, speaking of Song.
Am I the only one who gets irrationally annoyed when applications insist on referring
to all audio files as Songs?
You're listening to an audio file right now.
Does it sound like I'm singing?
Do you want me to sing?
Okay then.
Stop that, programmers.
Sorry, where was I?
Oh yeah, picking inconsistent names for metadata tags.
You've got the freedom to do this with Vorbis comments, of course, but there actually is
a published standard with officially recommended names for the most useful metadata, which
you should probably stick to for those fields so that software can more easily use it.
Hang on, I have a list here.
The published official Vorbis comment recommendations list includes:
TITLE, for the name of the track, same as the title in ID3.
VERSION, for when you have more than one track with the same title: you might have two
different versions of Schubert's Ave Maria, both titled Ave Maria, but
maybe one is also tagged with a VERSION of Metalcore Remix.
ALBUM, for the name of the collection that the track came from, just like with ID3.
TRACKNUMBER, all one word, for the track number on the album, or the episode in a podcast
series, or whatever.
ARTIST, again, just like ID3.
This is usually the name of the musician or band for music, though for classical music
it should probably be the name of the composer, and for an audio book it would be the author
of the book.
PERFORMER, the field you use when the artist isn't necessarily who is speaking or
singing or whatever in the recording.
So the artist might be Franz Schubert, but the performer is, say, Justin Bieber.
Or you might have an audio book with the artist as Stephenie Meyer, the title as Twilight,
and the performer as Gilbert Gottfried.
COPYRIGHT, for the typical copyright notice, like "Copyright 2013 Richard Solomon" or
something similar.
LICENSE, where you might put a link to a Creative Commons license that you're using,
or a phrase like "all rights reserved" if you're a fascist freedom hater.
ORGANIZATION, where you put the record label, or perhaps LibriVox for an audio book,
or indeed Hacker Public Radio for what you're listening to right now.
GENRE, like the field in ID3, except it's supposed to be an actual short, human-readable text
description of whatever genre the audio is supposed to fit into, rather than some
relatively meaningless genre number.
DATE, for the recording date of the audio track, in a nice, rational, standard ISO 8601
year-month-day format.
LOCATION, for where the track was recorded, like the name of the recording
studio, or OggCamp 27, or my mom's basement.
CONTACT, for a URL, email address, or whatever contact information for the audio distributor.
In the case of the file you're listening to right now, it should probably be
http://hackerpublicradio.org, for example.
DESCRIPTION, which, like it says, is a place for a description of the audio; among other
things, I think this is the appropriate place to put text copies of show notes for podcasts.
Note that COMMENT and COMMENTS are not in the recommended field name list.
So I think usually you'll want to put your comments in the DESCRIPTION field, or maybe
not.
There's nothing wrong or invalid about using a field named COMMENT or COMMENTS, and in
fact a lot of people seem to use COMMENT.
It's just that any playback software that is strict about sticking to the official recommendations
list will probably ignore them and may not display them.
Or use both.
It's not like a few extra bytes of text is going to kill your download.
And finally, there's even an ISRC tag for an International Standard Recording Code,
which appears to be a special tracking number that can be issued, for a fee naturally,
by a central authority, and which seems to work for audio tracks kind of like an ISBN does
for books.
I've never seen this one used anywhere, but it's in the official Vorbis comment documentation,
and I suppose it might be used by old-school proprietary pay-to-listen sorts of businesses.
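One way to put the recommended list to work is a small linter. This Python sketch (a hypothetical helper, with names taken from the list read out above) flags field names that aren't on the list and DATE values that don't look like ISO 8601 calendar dates. Its output is only a set of hints: a COMMENT field is still perfectly valid, and extension fields such as the ones described next would be flagged too.

```python
import re

# The recommended field names read out above
RECOMMENDED = {"TITLE", "VERSION", "ALBUM", "TRACKNUMBER", "ARTIST", "PERFORMER",
               "COPYRIGHT", "LICENSE", "ORGANIZATION", "DESCRIPTION", "GENRE",
               "DATE", "LOCATION", "CONTACT", "ISRC"}

# ISO 8601 calendar dates at year, month, or day precision
ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

def lint_comments(comments) -> list:
    """Return warnings (not errors) for (name, value) pairs that stray from the list."""
    warnings = []
    for name, value in comments:
        key = name.upper()  # field names compare case-insensitively
        if key not in RECOMMENDED:
            warnings.append(f"{key}: not a recommended field name")
        elif key == "DATE" and not ISO_DATE.fullmatch(value):
            warnings.append(f"DATE: {value!r} is not an ISO 8601 date")
    return warnings
```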
Also documented are a few additional useful fields.
There's an easy specification for chapter marks that supports up to 1,000 chapters per
file, with tags named CHAPTERxxx and CHAPTERxxxNAME, where xxx is a three-digit number.
So, for example, the beginning of the file might be the start of the first chapter, so
you might have CHAPTER000=00:00:00.000 and CHAPTER000NAME=Introduction.
There's also a CHAPTERxxxURL tag for links to chapter information
stored online.
This set of tags seems to be virtually identical to the human readable text that you feed
to Matroska tools to cram the special binary Matroska chapter metadata structures into
Matroska files, which I'll talk just a little bit about at the end here.
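Generating those tags from a list of start times is mechanical. This Python sketch (a hypothetical helper) numbers chapters from 000, as in the example above, and writes start times as HH:MM:SS.mmm; that exact timestamp layout is an assumption based on the example, so check the chapter extension's own documentation before relying on it.

```python
def chapter_tags(chapters) -> list:
    """Turn [(start_seconds, name), ...] into CHAPTERxxx / CHAPTERxxxNAME pairs."""
    if len(chapters) > 1000:
        raise ValueError("three-digit numbering allows at most 1,000 chapters (000-999)")
    tags = []
    for index, (start, name) in enumerate(chapters):
        total_ms = round(start * 1000)
        hours, rest = divmod(total_ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        number = f"{index:03d}"
        tags.append((f"CHAPTER{number}", f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"))
        tags.append((f"CHAPTER{number}NAME", name))
    return tags
```

For example, `chapter_tags([(0, "Introduction"), (125.5, "ID3")])` yields `CHAPTER000=00:00:00.000`, `CHAPTER000NAME=Introduction`, `CHAPTER001=00:02:05.500`, and `CHAPTER001NAME=ID3`.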
On this specific subject, forgive me for harshing the Vorbis comments' mellow by retrogressing
back to ID3 for a moment, but there actually is apparently a, quote, addendum, unquote, for
chapter support in ID3 v2.3 and 2.4.
The specification seems to involve smushing a set of nested table-of-contents and chapter
structures into the ID3 header, each containing its own set of embedded ID3 tags.
Trying to read the documentation for this and determine why they did it that way may
give you the mental equivalent of irritable bowel syndrome.
The good news is that I have yet to find any tag editors that support this monstrosity.
Well, except for a special ID3 V2 chapter tool written in Java, quote maintained unquote
by the BBC, and not updated since 2006.
As far as I can tell, very little if any playback software supports using it anyway, so you
shouldn't have to worry about it.
For reference, this specification was published in 2005, half a decade after ID3 V2.4, and
most player software still doesn't even support ID3 V2.4 yet, or probably ever, I suspect.
My opinion is to stick with Vorbis-comment-using formats for support of chapter features,
or WebM if you must, or maybe MP4 using magic iTunes tags if Tim Cook is looking over
your shoulder or paying you.
Back to the happy land of Vorbis comments: there's a specification for ReplayGain, for adjusting
track volumes, using fields named REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,
REPLAYGAIN_ALBUM_GAIN, and REPLAYGAIN_ALBUM_PEAK, in a machine-parsable format that playback
software can use if it wants to.
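As a rough illustration of how a player might use these tags, here is a Python sketch that turns a gain in decibels into a linear volume multiplier (10 to the power of gain/20) and, as one common clipping-prevention policy, caps it so the tagged peak never exceeds full scale. The value formats ("-7.03 dB" for gain, a plain number like "0.988" for peak) reflect common practice; the function name and the `preamp_db` parameter are assumptions, not part of the specification mentioned above.

```python
def replaygain_scale(tags: dict, mode: str = "TRACK", preamp_db: float = 0.0) -> float:
    """Linear gain factor from REPLAYGAIN_<mode>_GAIN / _PEAK values (upper-case keys)."""
    gain_text = tags.get(f"REPLAYGAIN_{mode}_GAIN")
    if gain_text is None:
        return 1.0                                # untagged: leave the volume alone
    gain_db = float(gain_text.split()[0])         # "-7.03 dB" -> -7.03
    scale = 10 ** ((gain_db + preamp_db) / 20)    # decibels -> amplitude ratio
    peak = float(tags.get(f"REPLAYGAIN_{mode}_PEAK", "0"))
    if peak > 0 and scale * peak > 1.0:
        scale = 1.0 / peak                        # clipping prevention (one common policy)
    return scale
```

Passing `mode="ALBUM"` uses the album values instead, which keeps the relative loudness of tracks within an album intact.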
And while I've still not yet gotten around to doing the geotagging episode, I'll tease
it here a bit, because Vorbis comments seem to have the only documented standard for
geotagging of media files besides JPEG and TIFF images.
The field is called GEO_LOCATION, and the contents take the form: decimal latitude,
semicolon, decimal longitude, and optionally another semicolon and the elevation in meters.
This format has the benefit of being both easily machine parsable and human readable.
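Since the format is just numbers and semicolons, reading and writing it takes only a few lines. Here is a Python sketch with hypothetical helper names; the latitude and longitude range check is an added sanity check, not something the format description above requires.

```python
def parse_geo_location(value: str):
    """'lat;lon' or 'lat;lon;elevation_m' -> (lat, lon, elevation or None)."""
    parts = [p.strip() for p in value.split(";")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 ';'-separated numbers, got {value!r}")
    lat, lon = float(parts[0]), float(parts[1])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    elevation = float(parts[2]) if len(parts) == 3 else None
    return lat, lon, elevation

def format_geo_location(lat: float, lon: float, elevation=None) -> str:
    text = f"{lat};{lon}"
    return text if elevation is None else f"{text};{elevation}"
```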
One other nice feature of the Vorbis comments specification that you should know about:
You can and should use each field name as many times as is appropriate for each file.
For example, each recording artist in an audio track should have their own artist tag
in the file.
If you have a recording of a collaboration between Slim Whitman, Celine Dion, Mel Tormé,
and Brian Johnson on a hip-hop album, you don't cram a single messy "ARTIST=Slim
Whitman and Celine Dion and Mel Tormé and Brian Johnson" field in there.
You put in four separate ARTIST entries, each one with one of those names.
That way, if you freaking love Mel Tormé, you can easily find all of your recordings with
Mel Tormé in them just by looking for ARTIST=Mel Tormé and/or PERFORMER=Mel Tormé,
without having to look for the name buried among a bunch of other names in
a single field.
ID3, on the other hand, mandates that only one of each kind of text field can exist.
And if you have multiple artists, you cram them all into the same text string, separated
by forward slashes, or by nulls if you're using v2.4.
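The difference shows up as soon as you search. In this Python sketch (hypothetical helpers), Vorbis comments keep each artist as its own NAME=value pair, while the ID3 side has to join and split a single string. Note that the slash-separated round trip breaks on a name that itself contains a slash, such as Brian Johnson's band AC/DC, a problem the v2.4 null separator avoids.

```python
def vorbis_values(comments, field: str) -> list:
    """All values of a (case-insensitive) field from a list of (name, value) pairs."""
    return [value for name, value in comments if name.upper() == field.upper()]

def id3_join(values, version: int = 3) -> str:
    # ID3 v2.3 convention: one text frame, values separated by "/"; v2.4 uses NUL
    return ("/" if version == 3 else "\x00").join(values)

def id3_split(text: str, version: int = 3) -> list:
    return text.split("/" if version == 3 else "\x00")
```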
Similarly, if your file is, say, an audio tour guide recording or a travelogue, you should
put multiple GEO_LOCATION tags in the metadata, one for each location mentioned
in the audio.
Then, if you wanted to automate a search through your audio files, you could find anything
that refers to Nowhere, Oklahoma, for example, just by looking for GEO_LOCATION
tags near latitude 35.1592 and longitude -98.4422.
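One way to do that "near" search is with the haversine great-circle distance. This Python sketch assumes a hypothetical in-memory index mapping file names to their GEO_LOCATION values (building that index from real files would need a tag-reading library) and uses a mean Earth radius of 6371 km, which is accurate enough for this kind of lookup.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def files_near(index: dict, lat: float, lon: float, radius_km: float) -> list:
    """index maps file name -> list of GEO_LOCATION strings (hypothetical structure)."""
    hits = []
    for filename, locations in index.items():
        for value in locations:
            here_lat, here_lon = (float(x) for x in value.split(";")[:2])
            if haversine_km(lat, lon, here_lat, here_lon) <= radius_km:
                hits.append(filename)
                break  # one matching location is enough for this file
    return hits
```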
There, see?
Vorbis comments: all simple, all human readable, all pretty intuitive.
Well, like I warned you, except for one thing: attached pictures, more commonly called
album art, are actually kind of a pain.
It's not actually any worse than ID3, but compared to the simplicity of the rest of
Vorbis comments, it's a bit of a nuisance.
There are two reasonable excuses for this.
One is just that since a digital picture is obviously not text, unless maybe you convert
it to ASCII art first, there just plain is no way to store it as a piece of simple human
readable metadata.
The second reason is that if you were doing things as properly as possible, it really
shouldn't be in the metadata anyway.
See, if you think about it, a picture of an album cover or any other attached picture
isn't really mere metadata any more than an audio track is mere metadata for a movie's
video track.
Attached still images are really their own independent pieces of data that just happened
to be associated with the audio track.
The most properly correct way to implement this would seem to be a separate stream in
the file with the attached pictures multiplexed in with the audio, just as the audio and subtitle
text should be their own separate streams multiplexed with a video stream.
The problem is, there is no specification that I can find for streams of, quote, series
of independent still JPEG and PNG images, unquote, in Ogg files, or MP3 for that matter.
In any case, MP3 has been doing attached pictures as metadata for so long that it's kind
of stuck as the way it's done.
So the specification for attaching pictures to Ogg Vorbis, Speex, and Opus files involves
encoding the binary image data to printable text characters so that it can be included
in Vorbis comments, just like email programs have to do with email attachments.
Something like five or ten years ago, a few people were doing this with an obsolete field
called COVERART, with the contents of the field just being the contents of a base64-encoded
JPEG or PNG file.
Don't do this, at least if you expect people to ever see the cover art.
From what I can tell, pretty much nobody ever implemented using that field, and it's
been long since replaced by an officially documented somewhat more informative structure.
Here's where it gets a little obnoxious.
The field name for the attached pictures has the unintuitive name
METADATA_BLOCK_PICTURE.
And the contents of those fields are actually a complete base64-encoded data structure
that includes the image width and height, the MIME type, an optional description of the image,
the same picture type designations that ID3's attached picture frames use, and the
actual image data.
You can either thank or blame FLAC for this one, depending on how you like FLAC.
I mentioned that FLAC uses Vorbis comments for its metadata.
For all of the audio metadata I've talked about up to this point, that's true, but
not for attached pictures.
Unlike Ogg Vorbis, Speex, and Opus, FLAC files aren't actually in Ogg containers,
but are their own special file format.
That format actually includes a specific metadata block, structured to be very similar to the
attached picture frames in MP3 files, and it just happens to be called METADATA_BLOCK_PICTURE.
For Opus, Ogg Vorbis, and Speex, which don't have a special metadata block just for
attached pictures, what happens is they build this same FLAC data structure, then base64-encode
it to turn it into text that can be shoved in as a valid Vorbis comment.
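Here is a Python sketch of building that field: pack the FLAC picture block (every integer a 32-bit big-endian value: picture type, MIME length and string, description length and string, width, height, color depth, number of indexed colors, data length, then the image data itself), and base64-encode the result. The helper names are hypothetical, and width, height, and depth default to 0 only to keep the sketch short; a real tool would fill them in from the actual image.

```python
import base64
import struct

def flac_picture_block(image: bytes, mime: str, picture_type: int = 3,
                       description: str = "", width: int = 0, height: int = 0,
                       depth: int = 0, colors: int = 0) -> bytes:
    """Body of a FLAC PICTURE metadata block; all integers are 32-bit big-endian."""
    mime_bytes = mime.encode("ascii")
    desc_bytes = description.encode("utf-8")
    return (struct.pack(">II", picture_type, len(mime_bytes)) + mime_bytes
            + struct.pack(">I", len(desc_bytes)) + desc_bytes
            + struct.pack(">IIIII", width, height, depth, colors, len(image))
            + image)

def metadata_block_picture_comment(image: bytes, mime: str, **kwargs) -> str:
    """The same block, base64-encoded into a NAME=value Vorbis comment string."""
    block = flac_picture_block(image, mime, **kwargs)
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")
```

Picture type 3 is the same "front cover" number that ID3's attached picture frames use.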
The data structure involved is pretty well documented in the FLAC documentation, and
these days most people don't need to worry about it unless their encoder doesn't have
a built-in option to generate it, or they're adding it to the metadata by hand, or from
a simple command line script.
I actually wrote an implementation of this in PHP of all things, which I can share with
anyone who wants it.
I've also seen an implementation done in Perl.
Anyway, this gives the Vorbis comment field for attached pictures a funny name, and puts it
in kind of a hard-to-mess-with format for people doing it by hand.
The good news is that if someone writes a media player that understands album art in
FLAC files, adding support for album art in Opus, Ogg Vorbis, Speex, or even Ogg Theora
video files, for that matter, should hypothetically be pretty simple: other than having to
pass the data through base64 decoding to turn it back into a binary structure, you then
pass that directly to the already existing FLAC album art code to get the pictures out.
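The reverse direction really is just base64 decoding followed by a FLAC picture parser. This Python sketch (a hypothetical function name) does both steps in one place and returns the fields as a dictionary; a real player would more likely hand the decoded bytes to its existing FLAC picture code, as described above.

```python
import base64
import struct

def parse_metadata_block_picture(value: str) -> dict:
    """Decode a METADATA_BLOCK_PICTURE value back into its FLAC picture fields."""
    raw = base64.b64decode(value)
    pos = 0

    def u32() -> int:
        nonlocal pos
        (n,) = struct.unpack_from(">I", raw, pos)  # big-endian 32-bit integer
        pos += 4
        return n

    def take(n: int) -> bytes:
        nonlocal pos
        data = raw[pos:pos + n]
        pos += n
        return data

    picture_type = u32()
    mime = take(u32()).decode("ascii")
    description = take(u32()).decode("utf-8")
    width, height, depth, colors = u32(), u32(), u32(), u32()
    image = take(u32())
    return {"type": picture_type, "mime": mime, "description": description,
            "width": width, "height": height, "depth": depth,
            "colors": colors, "data": image}
```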
More good news for those of us switching to the superior new Opus format:
the command line Opus encoder now has a --picture option that works virtually
identically to the one in the FLAC encoder, with the same argument structure, which at
least makes it pretty easy to attach pictures to Opus files at encoding time.
Ogg Vorbis users still need to deal with this either by pre-generating a METADATA_BLOCK_PICTURE
Vorbis comment text to include as a command line option for oggenc, or by attaching the
pictures after the fact using a GUI tag editor or a script based on something like TagLib
or Mutagen.
To wrap up, there are two more audio file formats you might run into somewhat regularly
online that you might want metadata for.
Wave files are still more or less the lowest common denominator for audio files, usually
being lossless PCM audio, and being widely supported, and I guess pretty simple in structure.
There are actually standards for metadata in Wave files, but I haven't managed to dig
up any clear documentation for this yet.
I know it's out there somewhere, I just haven't got it myself.
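For reference, one widely used convention for WAV metadata is a RIFF "LIST" chunk of type "INFO", holding small sub-chunks with four-character IDs such as INAM (title) and IART (artist), each containing a NUL-terminated string, with every chunk padded to an even length. This Python sketch, with hypothetical helper names and Latin-1 text assumed, appends such a chunk and reads it back. It illustrates the RIFF layout only; it is not a description of exactly what any particular program writes.

```python
import struct

def add_wav_info(wav: bytes, fields: dict) -> bytes:
    """Append a LIST/INFO chunk (e.g. {"INAM": title, "IART": artist}) and fix the RIFF size.

    Assumes the input already ends on an even byte boundary, as valid RIFF data does.
    """
    body = b"INFO"
    for key, value in fields.items():
        text = value.encode("latin-1") + b"\x00"          # NUL-terminated string
        body += key.encode("ascii") + struct.pack("<I", len(text)) + text
        if len(text) % 2:
            body += b"\x00"                                # pad sub-chunk to even length
    out = wav + b"LIST" + struct.pack("<I", len(body)) + body
    # RIFF size field = total length minus the 8-byte "RIFF" + size prefix
    return out[:4] + struct.pack("<I", len(out) - 8) + out[8:]

def read_wav_info(wav: bytes) -> dict:
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info, pos = {}, 12
    while pos + 8 <= len(wav):
        chunk_id, size = struct.unpack_from("<4sI", wav, pos)
        body = wav[pos + 8:pos + 8 + size]
        if chunk_id == b"LIST" and body[:4] == b"INFO":
            sub = 4
            while sub + 8 <= len(body):
                sub_id, sub_size = struct.unpack_from("<4sI", body, sub)
                text = body[sub + 8:sub + 8 + sub_size].split(b"\x00", 1)[0]
                info[sub_id.decode("ascii")] = text.decode("latin-1")
                sub += 8 + sub_size + (sub_size % 2)
        pos += 8 + size + (size % 2)                       # chunks are word-aligned
    return info
```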
Apparently, Audacity actually embeds the limited set of metadata that it supports
both as a standard INFO chunk, whatever that is, documented for Wave files, and as an
ID3 tag in some way, when it saves Wave files.
The other format you might some day run into for audio files is WebM.
WebM is a specific implementation of the Matroska file format.
To me, Matroska metadata looks even worse than ID3.
Like ID3, it seems to be made up of about 100 rigidly defined tag names, of which WebM
looks to support about 70.
The metadata is heavily video-centric, and seems to assume that Matroska files will
contain movies.
Among the metadata tags for WebM and Matroska are things like special little fields designated
for choreographer, costume designer, director of photography, screenplay writer, assistant
director, and so on.
There's even a character tag that isn't actually for the file as a whole, but is supposed
to be buried inside an actor tag, which I guess makes the character tag a sort of meta-metadata.
I imagine Peter Sellers movies in WebM form must have some pretty messy metadata.
The whole thing seems to be object-oriented, so there are several other cases where tags
are supposed to be buried inside other tags data structures as well.
Anyone who isn't in one of the special collection of video production roles that the Matroska
standard decided to include has to settle for getting crammed into a generic THANKS_TO
tag, kind of like ID3 and its Involved Persons structure.
Besides dooming the dolly grip, clapper loader, best boy, and gaffers to be second-class citizens
of Matroska files, the standards also say all this stuff should be tacked onto the
end of the file, like ID3 v1.
Apparently the idea is that you can then rewrite the metadata without having to rewrite
the whole file.
On the other hand, that makes it not so great for streaming media,
since the player won't get the title, album, artist, executive producer, genre, and so
on until after the stream is finished and it's too late to display that information anyway,
unless it's buffering the whole file before it starts playing.
Lastly, as far as I can tell, WebM doesn't actually support attached pictures at all, though
the broader Matroska standard does in a limited way.
The standard has room for large and small versions of a sort of banner graphic and large
and small versions of a more typical album art graphic for a total of four images.
For audio, you probably won't have to deal with this really.
The only place I've ever seen WebM audio files, aside from ones I've made myself for
testing, is in GNU MediaGoblin, which as far as I can tell only uses that format because
they originally implemented audio only as a kind of afterthought to video, so their audio
for the project is just a video file without the video.
I assume that once they've implemented multi-format support, the default for audio will end
up being Opus or Ogg Vorbis, and then nobody will really be using WebM for anything but
Internet TV.
I'm kind of waiting for Opus output from MediaGoblin before I start trying to use it
seriously, at which point it will probably deserve its own HPR episode.
To finish off this part, should I mention Microsoft's special Windows Media formats?
Hmm, no, nobody should mention Windows Media.
Oh, alright, just quickly.
If you're unlucky, you might run into .asf or .wma audio files.
The situation with ASF and WMA and WMV is kind of like the situation with MP4 and M4A
and M4V files.
Several of these Windows Media files are really just ASF format.
The metadata for these seems pretty limited.
There are five different metadata, quote, objects, unquote, which can contain different
kinds of metadata.
The so-called content description object is for the very small set of predefined metadata
fields that the ASF format defines.
These are title, author, copyright, description, and rating, with up to 64 kilobytes of text
for each field.
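The 64-kilobyte limit falls out of the object's layout, at least as I read the ASF format: after its 16-byte GUID and 8-byte size, the content description object stores five 16-bit byte lengths (title, author, copyright, description, rating) followed by the five strings in UTF-16LE. Because those lengths count bytes, each field tops out at 65,535 bytes, or roughly 32,000 characters. This Python sketch packs and unpacks just that body; the function names are hypothetical, and NUL-terminating each string is an assumption.

```python
import struct

FIELDS = ("title", "author", "copyright", "description", "rating")

def pack_content_description(values: dict) -> bytes:
    """Body of a Content Description Object (the part after its GUID and 64-bit size)."""
    strings = []
    for field in FIELDS:
        text = values.get(field, "")
        raw = (text + "\x00").encode("utf-16-le") if text else b""  # NUL-terminated UTF-16LE
        if len(raw) > 0xFFFF:
            raise ValueError(f"{field} is longer than a 16-bit length allows (65,535 bytes)")
        strings.append(raw)
    lengths = struct.pack("<5H", *(len(raw) for raw in strings))  # five 16-bit byte counts
    return lengths + b"".join(strings)

def unpack_content_description(body: bytes) -> dict:
    lengths = struct.unpack_from("<5H", body, 0)
    pos, result = 10, {}
    for field, length in zip(FIELDS, lengths):
        result[field] = body[pos:pos + length].decode("utf-16-le").rstrip("\x00")
        pos += length
    return result
```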
The album art and URLs for copyright warnings stored online go in the so-called content
branding object, which seems to be limited to a single banner image, if I'm interpreting
the specification correctly.
The other three objects are the extended content description object, which seems to be where
you put any random other metadata that you want that isn't in the approved metadata
field list for the content description object; the metadata object, which seems to be
just an extended content description object that can refer to a specific stream in an ASF
file and not just the whole file; and finally, the metadata library object, whose description
makes my head hurt, but which as far as I can tell is for cramming in anything else that doesn't
belong in any of the other objects somehow.
I get the impression that all of these end up looking like Windows registry entries
in the end.
The good news is that in my experience, the only people who make much use of .wma files
are a few proprietary music, quote, selling, unquote, businesses, who offer it as one
option along with MP3 and other formats, or people who apparently got
a seemingly sweet deal from Microsoft back in the early to mid-2000s to use Windows
Media systems for streaming audio and who haven't wanted to spend any money upgrading
to something modern for nearly a decade; if they offer anything else, there's
a fair chance it's RealPlayer files.
Remember RealPlayer?
You do?
Oh, I'm sorry.
Dang, you're old.
So you probably won't see too much of this online either and won't need to deal
with it often, at least not for audio, and even when you do, you probably won't actually
have much cause to mess with the metadata.
And if you do, it's probably because you're a bad person and this is your punishment.
Repent, sinner!
Yeah, okay, that's probably enough of an introduction to the subject.
How about I round off this episode with some suggestions and wrap it up with some tips
on using and editing metadata?
My first and probably most important suggestion would be to actually use the freaking metadata.
Yeah, I'm looking at you, Linux Voice podcast and the Opus feed, among others.
When I am elected supreme emperor of internet audio, it will be mandatory to use at the very
least the basic fields that most audio players will display, like the title, artist and
quote album, unquote.
I suggest to you that putting audio online with no metadata is basically a form of trolling.
It's like when someone posts a really awesome picture somewhere online saying, wow, check
out this awesome place.
But then all the metadata has been stripped out by the dorks at the image hosting service
so you can't even tell when the picture was taken, let alone where this awesome place
actually is.
And you're basically being asked to beg the poster to actually tell you where the place
is.
It's like people that go on some social media network and post something vague like, wow,
that was amazing.
My life has now changed forever.
And then you have to digitally prostrate yourself before them and beg them to tell you what
it was that was actually amazing.
And then after some irritating coyness, you find out they were just raving about the new
brand of instant ramen they just ate for lunch.
And you have to spend all day hunting them down so you can beat them repeatedly with
a sweaty gym sock stuffed with used cat litter for wasting your time.
Well, come on, I know I'm not the only one who fantasizes about that now and then.
Anyway, ideally you should include as much relevant metadata as possible.
That includes, I beg of you, any relevant geolocation data.
Where exactly was OggCamp 13?
If the interviews had GEO_LOCATION tags, I could look it up on OpenStreetMap.
Same goes for discussions of hacker spaces, particularly good stores or restaurants you
might mention, the locations of dead drops or geocaches and so forth.
One could even, for example, have a promo for a LinuxFest like, say, Northeast LinuxFest
2014 added to one's podcast and then include a GEO_LOCATION tag with the
location of the venue for that.
Once we find out what that venue will be, hint hint.
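As far as I know, GEO_LOCATION is a convention rather than an official Vorbis comment field, so treat the value layout here as an assumption: `latitude;longitude;altitude`, the same shape used in the TXXX example later in this episode. The minimal Python sketch below formats and parses such a value and turns it into an openstreetmap.org link with a marker; the seven-decimal precision and zoom level 15 are arbitrary choices.

```python
from urllib.parse import urlencode


def format_geo_location(lat, lon, alt=None):
    """Build a GEO_LOCATION value as 'lat;lon' or 'lat;lon;alt' (assumed layout)."""
    parts = [f"{lat:.7f}", f"{lon:.7f}"]
    if alt is not None:
        parts.append(f"{alt:g}")
    return ";".join(parts)


def parse_geo_location(value):
    """Split 'lat;lon[;alt]' into floats; altitude is None when it is missing."""
    fields = value.strip().split(";")
    lat, lon = float(fields[0]), float(fields[1])
    alt = float(fields[2]) if len(fields) > 2 and fields[2] else None
    return lat, lon, alt


def osm_url(lat, lon, zoom=15):
    """Return an openstreetmap.org URL that drops a marker at (lat, lon)."""
    query = urlencode({"mlat": lat, "mlon": lon})
    return f"https://www.openstreetmap.org/?{query}#map={zoom}/{lat}/{lon}"


# Example: format_geo_location(42.4347571, -83.9849477, 270)
# returns '42.4347571;-83.9849477;270'
```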
As far as cover art goes, my thinking on this has completely changed over the last couple
of years.
Since the mid-1990s, when it started showing up in MP3, I always thought album art was
silly, frivolous, space-wasting fluff.
I mean, think about it, do you insist on staring at the CD case while you're listening
to a CD?
For those of you young people who may be confused, CDs were a DVD-like physical medium that we
old people used to extract data from to make MP3s, instead of just downloading
them.
Anyway, I never really saw the point of it, but in the last couple of years I've found
I actually do prefer to have it.
Even in its ordinary, expected use of actually having a picture of the physical medium's
packaging, it's kind of nice as a quick visual reminder of which collection the audio I'm
listening to came from.
Of course, even more interesting might be the extraordinary, unexpected uses.
If you're recording a podcast describing how to make something, some bonus illustrations
of the process included as attached pictures would be a nice extra for listeners interested
enough to look for them.
If you have audio from a specific location, or about a specific location, you could benefit
everyone by including an image of a map as an attached picture, or a geotagged picture
of the location.
If you're doing a podcast for aquarium owners, you might even have a legitimate cause
to use that "bright coloured fish" attached picture type.
If you want to mess with the NSA, you could even record a brief audio message, then encode
that as a low bit rate Opus or Codec2 file, then steganographically embed that file into
an image and include that image as an attached picture.
So in short, the feature is too much fun to ignore, and the more people use it, the more
playback and tagging software will start supporting it correctly.
Except for attached pictures, the amount of data an additional tag of metadata adds
to the file is negligible, and worrying about wasting space with most metadata is like
worrying about wasting film when using a digital camera.
A well-designed set of attached pictures won't bloat the file too much either if you're
reasonably careful, and should definitely be included wherever they may add some usefulness
to the file.
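To put "negligible" into numbers: in the core Vorbis comment structure, each comment is a 4-byte little-endian length followed by `NAME=value` in UTF-8. The sketch below serializes just that shared core (vendor string plus comment list). It deliberately leaves out the container-specific wrapping, such as the Ogg Vorbis packet header and framing bit, the `OpusTags` prefix, or the FLAC metadata block header, so it is an illustration rather than a complete writer.

```python
import struct


def vorbis_comment_block(vendor, comments):
    """Serialize the core Vorbis comment structure (no container-specific header)."""
    vendor_bytes = vendor.encode("utf-8")
    out = bytearray(struct.pack("<I", len(vendor_bytes)) + vendor_bytes)
    out += struct.pack("<I", len(comments))  # number of user comments
    for name, value in comments:
        entry = f"{name}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry  # length prefix, then NAME=value
    return bytes(out)


def comment_cost(name, value):
    """Bytes one extra comment adds: a 4-byte length plus 'NAME=value' in UTF-8."""
    return 4 + len(f"{name}={value}".encode("utf-8"))


# comment_cost("TITLE", "Audio Metadata") is 24 bytes, against an audio
# payload that is typically megabytes.
```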
Anything you think someone might be interested in knowing about the recording later, please
include it.
I know at least one person who will happily examine audio metadata for interesting information
that the audio player doesn't necessarily shove in my face, and I imagine I can't be
the only one.
Also, if you can reasonably identify parts of your audio that would make good, obvious
points where the subject changes or something important happens, consider including some
chapter markers, as a reward for player software that uses them and to encourage the ones
that don't to start.
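For Vorbis comments, one chapter convention I'm aware of is the chapter extension described on the Xiph.Org wiki: numbered `CHAPTERxxx=HH:MM:SS.sss` start times paired with `CHAPTERxxxNAME=...` titles. Assuming that convention, this minimal sketch turns a list of start times in seconds into those comments; player support varies, so check how your player reads them.

```python
def chapter_comments(chapters):
    """Turn (start_seconds, title) pairs into CHAPTERxxx / CHAPTERxxxNAME comments."""
    comments = []
    for index, (start, title) in enumerate(chapters, start=1):
        total_ms = round(start * 1000)
        hours, rest = divmod(total_ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        tag = f"CHAPTER{index:03d}"  # three-digit, 1-based chapter number
        comments.append(f"{tag}={hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}")
        comments.append(f"{tag}NAME={title}")
    return comments


# chapter_comments([(0, "Intro"), (95.5, "Cover art")]) gives
# CHAPTER001=00:00:00.000, CHAPTER001NAME=Intro,
# CHAPTER002=00:01:35.500, CHAPTER002NAME=Cover art
```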
As for the attached picture itself, if you care what Apple thinks, iTunes apparently
uses 600x600 as the standard size for cover art images, though from what I've read it sounds
like you can use other sizes as well.
Personally, unless you have a good reason, I'd recommend sticking to around that size
or smaller, just so you have some idea how the images will look on lower-resolution screens,
but I wouldn't worry too much about keeping them square.
Save them as JPEG or PNG and they'll fit into ID3 or Vorbis comment album art just fine.
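Here is a small sketch of that sizing advice: read a PNG's width and height from its IHDR chunk, which the PNG format requires to be the first chunk, then scale the picture down so neither side exceeds 600 pixels while keeping the aspect ratio, so it doesn't have to be square. The 600-pixel default only reflects the iTunes figure mentioned above, and actually resampling the image is left to whatever image tool you use.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk right after the PNG signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    return struct.unpack(">II", data[16:24])  # big-endian width, height


def fit_within(width, height, max_side=600):
    """Scale down (never up) so both sides fit in max_side, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# fit_within(1200, 800) -> (600, 400); fit_within(300, 200) -> (300, 200)
```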
One warning: so far, many of the tag editors I run into that support cover art at all only support
a single cover art image, which is usually set by default to picture type 3, that is,
front cover.
If you want to include multiple images, you might find it easier to do it at encoding
time.
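For reference, picture type 3 is one of 21 codes (0 through 20) that ID3v2 APIC frames and FLAC PICTURE blocks share, including the famous code 17. The table below lists them as I understand the ID3v2.3 specification's wording; the reverse-lookup helper is only an illustration.

```python
# Attached picture types shared by ID3v2 APIC frames and FLAC PICTURE blocks.
PICTURE_TYPES = {
    0: "Other",
    1: "32x32 pixels file icon (PNG only)",
    2: "Other file icon",
    3: "Cover (front)",
    4: "Cover (back)",
    5: "Leaflet page",
    6: "Media (e.g. label side of CD)",
    7: "Lead artist/lead performer/soloist",
    8: "Artist/performer",
    9: "Conductor",
    10: "Band/Orchestra",
    11: "Composer",
    12: "Lyricist/text writer",
    13: "Recording Location",
    14: "During recording",
    15: "During performance",
    16: "Movie/video screen capture",
    17: "A bright coloured fish",
    18: "Illustration",
    19: "Band/artist logotype",
    20: "Publisher/Studio logotype",
}


def picture_type_code(label):
    """Case-insensitive reverse lookup, e.g. 'cover (front)' -> 3."""
    for code, name in PICTURE_TYPES.items():
        if name.lower() == label.lower():
            return code
    raise KeyError(label)
```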
The command line encoders for FLAC and Opus allow you to include as many attached pictures
as you want, using command line switches.
The Ogg Vorbis encoder doesn't, but like the Opus encoder, the current Ogg Vorbis encoder
accepts FLAC files directly as input, and it will transfer the FLAC metadata over
to the Ogg Vorbis file it creates, including the attached pictures, from what I can tell.
Therefore, if you either get or make FLAC files to work from as your originals and put
all of the metadata in there, you can use those FLAC files as input to generate Opus and
Ogg Vorbis files without worrying about the metadata any further.
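A minimal sketch of that FLAC-first workflow, assuming the `flac`, `opusenc`, and `oggenc` command line tools are installed. It only builds argument lists. The `--tag`, `--picture`, and `-o` options and the `TYPE|MIME|DESCRIPTION|WIDTHxHEIGHTxDEPTH|FILE` picture spec, with blank fields left for the encoder to detect, follow the flac documentation as I understand it, so check `flac --help` and `opusenc --help` on your own system before relying on them. All file names are hypothetical.

```python
def picture_spec(path, picture_type=3, description=""):
    """TYPE|MIME|DESCRIPTION|WxHxD|FILE, leaving MIME and dimensions blank to auto-detect."""
    return f"{picture_type}||{description}||{path}"


def flac_command(wav, out, tags, pictures):
    """Encode the FLAC 'original', carrying all tags and any number of pictures."""
    cmd = ["flac"]
    for name, value in tags:
        cmd.append(f"--tag={name}={value}")
    for path, picture_type, description in pictures:
        cmd.append(f"--picture={picture_spec(path, picture_type, description)}")
    cmd += ["-o", out, wav]
    return cmd


def opusenc_command(flac_in, out):
    """opusenc takes input then output; tags come across from the FLAC input."""
    return ["opusenc", flac_in, out]


def oggenc_command(flac_in, out):
    """oggenc also reads FLAC directly and copies its metadata, per the text above."""
    return ["oggenc", "-o", out, flac_in]
```

To actually run one, pass the list to `subprocess.run(cmd, check=True)`. Building lists instead of shell strings avoids quoting trouble with titles that contain spaces or the `|` character used in picture specs.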
For MP3, the only encoder I am familiar with at all is the LAME encoder, which seems
to produce pretty good quality sound by MP3 standards, but appears to be limited to a single
attached picture on the command line.
Speaking of MP3 limitations, most of the common information that people put in Vorbis comments
should have a reasonably obvious equivalent in MP3, so you shouldn't have any trouble figuring
out which special little ID3 field to put the title and artist and album and so on in, if you
have to deal with MP3 files.
A table of mappings between ID3 and Vorbis comments would probably be really handy, but even
if I had such a thing ready, this episode would get really, really tedious, like even more
than it already is, if I tried to read it out to you.
So for now, just look it up online if you need to, and I'll try to put up a post at dogphilosophy.net
with a table later.
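Until that table exists, here is a partial mapping of some commonly used Vorbis comment names to ID3v2.3 text frames, as a hedged starting point rather than an authoritative list. ALBUMARTIST to TPE2 is a widespread de facto convention, not something the ID3 spec spells out, and v2.3's TYER holds only a four-digit year. The helper sends anything unmapped, or a second value for a frame that already has one, to a leftovers list for the TXXX approach described next.

```python
# Partial, commonly used mapping; not exhaustive and not an official standard.
VORBIS_TO_ID3V23 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "ALBUMARTIST": "TPE2",   # de facto convention
    "TRACKNUMBER": "TRCK",
    "DISCNUMBER": "TPOS",
    "GENRE": "TCON",
    "DATE": "TYER",          # ID3v2.3 TYER is a four-digit year only
    "COMPOSER": "TCOM",
    "COPYRIGHT": "TCOP",
    "ORGANIZATION": "TPUB",
}


def split_for_id3(comments):
    """Sort (name, value) Vorbis comments into mapped ID3 frames and leftovers."""
    mapped, leftovers = {}, []
    for name, value in comments:
        frame = VORBIS_TO_ID3V23.get(name.upper())  # Vorbis names are case-insensitive
        if frame is None or frame in mapped:
            # Unmapped names, or extra values for a one-per-tag frame, go to TXXX later.
            leftovers.append((name, value))
            continue
        mapped[frame] = value[:4] if frame == "TYER" else value
    return mapped, leftovers
```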
That only takes care of the most common tags, though, so what about other potentially useful
metadata for MP3, like geolocation?
It turns out I was slightly lying when I said that text fields in ID3 were limited to
one each.
There's actually a special user-defined text field in ID3 designated TXXX.
No, that's not where the audio codec-themed erotic fanfiction goes, but wait, come to think
of it, if you had such a thing and you wanted to embed it in MP3, that actually is where
it would go.
What I mean is, that's not why the XXX is in there.
Anyway, the data structure for the TXXX field has two parts: a string for the name or description
of the text that you're putting in it, and the text string itself.
The specification does not allow multiple TXXX tags with the same description, but you
can include as many separate TXXX tags with different descriptions as you want.
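To make that two-part structure concrete, this sketch builds the raw bytes of a single ID3v2.3 TXXX frame: a 10-byte frame header (ID, a plain big-endian size that v2.3 does not syncsafe-encode, two flag bytes), then a text-encoding byte, the NUL-terminated description, and the value. It uses ISO-8859-1 (encoding 0) when possible and falls back to UTF-16 with a byte order mark (encoding 1), since v2.3 has no UTF-8 option. It only produces one frame; a complete tag also needs the 10-byte "ID3" header in front, which a real library would handle for you.

```python
import struct


def txxx_frame_v23(description, text):
    """Build one ID3v2.3 TXXX frame: header + encoding byte + description + NUL + value."""
    try:
        # Encoding 0x00: ISO-8859-1, the description ends with a single NUL byte.
        body = (b"\x00" + description.encode("latin-1") + b"\x00"
                + text.encode("latin-1"))
    except UnicodeEncodeError:
        # Encoding 0x01: UTF-16 with a BOM per string; the description ends with two NULs.
        bom = b"\xff\xfe"  # little-endian byte order mark
        body = (b"\x01" + bom + description.encode("utf-16-le") + b"\x00\x00"
                + bom + text.encode("utf-16-le"))
    # Frame header: 4-byte ID, 4-byte big-endian body size (not syncsafe in v2.3), 2 flag bytes.
    return b"TXXX" + struct.pack(">I", len(body)) + b"\x00\x00" + body
```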
This makes it an obvious place to include Vorbis comments that can't readily be pigeonholed
into the pre-existing ID3 fields.
I propose that for this purpose, the description part should be used for a Vorbis comment tag
name, while the text part should include every relevant Vorbis comment with that tag name.
For a useful example, ID3 doesn't support geotagging, so instead, put in a TXXX frame with
the description GEO_LOCATION, and the text contents of the tag would be
GEO_LOCATION=42.4347571;-83.9849477;270, or whatever
coordinates are relevant.
If there is more than one, just stick a carriage return between them so that each GEO_LOCATION=whatever
entry has its own line in the same text string; at least, that's how
I'd do it.
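That proposal can be sketched in a few lines: group Vorbis comments by name, then emit one (description, text) pair per name, where the text holds every full `NAME=value` comment with that name, one per line. This follows the episode's suggestion rather than any published standard; the separator is a plain newline by default, and the LOCATION value in the usage note is made-up sample data.

```python
def vorbis_to_txxx(comments, separator="\n"):
    """Group Vorbis comments into (description, text) pairs for TXXX frames.

    The description is the tag name; the text holds each full NAME=value
    comment with that name, joined by the separator (one per line by default).
    """
    grouped = {}
    for name, value in comments:
        key = name.upper()  # Vorbis comment names are case-insensitive
        grouped.setdefault(key, []).append(f"{key}={value}")
    return [(key, separator.join(lines)) for key, lines in grouped.items()]
```

For example, two GEO_LOCATION comments and one LOCATION comment become two TXXX pairs, the first holding both geolocation lines. Each pair could then be written with a TXXX frame builder like the one sketched above.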
For editing metadata (say that three times fast) after the encoding is done, I usually
use Kid3, which as of the current 3.0 version supports Opus, as well as Ogg Vorbis,
FLAC, MP3, and several other formats, in addition to now including a command line version
that can be used from scripts.
I don't use Windows or Mac systems, but Kid3 is available for them as well, so I'd recommend
giving it a try.
So far, it seems to support pretty much every feature of ID3v2.3 and Vorbis comments
that you might want, with the sole exception of multiple attached pictures.
If you're on Linux, it'll almost certainly be in your distribution's repository.
If not, you can get it from kid3.sourceforge.net.
On Linux officially, and apparently unofficially on at least Mac OS and possibly Windows, I
can also recommend puddletag, which does appear to fully and properly support multiple
attached pictures, and also has up-to-date file format support.
puddletag is a little more awkward to use for individual files, but it has a nice interface
for editing whole directories of files at a time.
GNOME users on Linux may be familiar with a program called EasyTAG, but at least as
of late 2013, I can't really recommend it unless you don't edit anything but MP3.
When I looked, it seemed like their Ogg support hadn't been updated in a decade.
It's still trying to use a non-standard set of cover art tags for attached pictures.
They still don't support Opus, and glancing at the source code and a quick test made
it look like they might only support a small, specific set of basic Vorbis comment tags.
It also appears to no longer be cross-platform, though there was apparently a Windows version
many years ago.
Try puddletag instead; its interface looks similar to EasyTAG's.
To finish up, here's a collection of command line tools I've found that may be of use to
you when dealing with audio metadata.
I already mentioned the existence of kid3-cli.
For MP3 files, I'll mention mpg123-id3dump, which comes with the mpg123 command line audio
player, and, like the name implies, is used to extract ID3 metadata, including attached
pictures.
Also potentially handy is id3ted, which seems to be able to extract virtually any
ID3 tag, and can add or edit most of the useful ones, including adding attached pictures,
though it's hard-coded to tag them all as front cover.
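The first thing any of these tools has to do with an MP3 is find and size the ID3v2 tag. This sketch parses the 10-byte ID3v2 header at the start of a file: the "ID3" marker, major and revision version bytes, a flags byte, and a tag size stored as a "syncsafe" integer (four bytes with 7 usable bits each, so the high bit of every byte is 0). The size excludes the 10-byte header itself. It is only the first step, not a full frame parser.

```python
def id3v2_header(data):
    """Parse a 10-byte ID3v2 header; return None if the data has no ID3v2 tag."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("tag size is not a valid syncsafe integer")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 bits per byte, most significant byte first
    return {"version": f"2.{major}.{revision}", "flags": flags, "tag_size": size}
```

Usage might look like `with open("episode.mp3", "rb") as f: print(id3v2_header(f.read(10)))`, where the file name is hypothetical.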
The vorbiscomment utility from the vorbis-tools package can be used to add or edit tags
in Ogg Vorbis files, though you'll have to generate the METADATA_BLOCK_PICTURE tag text
yourself, since it doesn't handle them.
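Generating that METADATA_BLOCK_PICTURE text is mostly byte packing: the value is a base64-encoded FLAC PICTURE block, made of big-endian 32-bit fields for picture type, MIME type length and string, description length and UTF-8 string, width, height, color depth, number of indexed colors (0 for non-indexed images like most JPEGs and PNGs), and the image data length followed by the data. The sketch below builds the whole `NAME=value` comment; you supply the real image dimensions yourself, for example from a PNG header.

```python
import base64
import struct


def metadata_block_picture(image, mime, *, width, height, depth,
                           picture_type=3, description="", colors=0):
    """Return 'METADATA_BLOCK_PICTURE=<base64 FLAC PICTURE block>' for a Vorbis comment."""
    mime_bytes = mime.encode("ascii")
    desc_bytes = description.encode("utf-8")
    block = b"".join([
        struct.pack(">II", picture_type, len(mime_bytes)), mime_bytes,
        struct.pack(">I", len(desc_bytes)), desc_bytes,
        struct.pack(">IIIII", width, height, depth, colors, len(image)),
        image,
    ])
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")
```

The result could then be appended with something like `vorbiscomment -a -t "<that string>" file.ogg`, though very large images may run into command line length limits.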
The same package includes the ogginfo utility, which displays Ogg Vorbis metadata.
The opus-tools package includes an encoder and decoder, as well as the opusinfo utility,
which, like ogginfo, displays audio metadata for Opus files.
This one will verify attached pictures, but doesn't currently dump them.
Honorable mention goes to the ExifTool utility, which is mostly used for digital photograph
metadata, but is also able to display metadata from pretty much every audio format I've mentioned
except for Opus.
Okay, one last thing.
If you'll forgive me jumping mental tracks one last time, as far as I can tell, none of
the web browsers have any provision for handling or displaying audio metadata,
no matter how well they support the HTML5 audio tag otherwise. No, not even Mozilla
Firefox.
That means that for playing within a web browser, if you want to have the audio metadata
shown, you have to insert a copy of the decoded metadata somewhere else in the web page,
which kind of defeats the purpose of having the metadata attached to the audio file the
way it's supposed to be in the first place.
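The workaround described here, copying the decoded metadata into the page next to the player, can be sketched as a small HTML generator. The `<figure>`/`<dl>` layout is just one reasonable choice, and the tag pairs are assumed to have been read out of the file already with one of the tools above. Every value is HTML-escaped, since tag text is untrusted input.

```python
import html


def audio_with_metadata(src, tags):
    """Render an <audio> element followed by a visible copy of its metadata."""
    rows = "\n".join(
        f"    <dt>{html.escape(name)}</dt><dd>{html.escape(value)}</dd>"
        for name, value in tags
    )
    return (
        "<figure>\n"
        f'  <audio controls src="{html.escape(src, quote=True)}"></audio>\n'
        "  <dl>\n"
        f"{rows}\n"
        "  </dl>\n"
        "</figure>"
    )
```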
The same goes for video, incidentally, but whatever, we're talking about audio today.
If anybody out there has any contacts at Mozilla, is there any chance you could get this going?
I specify Mozilla because they're probably the only organization that cares enough to
bother.
Google can't even get Opus support live by default after a year and a half, and probably
wouldn't bother with this unless they could somehow make you go through Google Plus to
get to it.
Microsoft seems like it can't innovate at all without having a battle to the death between
at least two departments, and then their legal department determining that the alleged
innovation by the survivors would be useful for suing people.
And Apple firmly denies the existence of the world beyond iTunes, and if iTunes doesn't
display it, then you don't need to know it, so sit down and shut up and look at the
pretty colors.
Help me, Mozilla Firefox, you're my only hope.
Okay, for those of you just tuning in, I've been talking for however long about metadata
for audio files you're likely to find on the internet, and you just missed it.
So here it is again.
MP3 files usually use a metadata format called ID3 version 2.3, which is an awful, fussy
micromanaged sort of format, but very common so you'll probably run into it a lot.
FLAC, Opus, Ogg Vorbis, and Speex all use Vorbis comments, which are simple and awesome
and all the cool people use it, and you should too, unless you want to be uncool, and probably
even then.
You should tag all of your audio files with as much relevant metadata as you can for
the betterment of all humanity, or at least the good parts of humanity, including attached
pictures and geolocation data, and you should either put all the metadata in at encoding
time, or you can use Kid3, puddletag, or various other tag editors to add or change tags
later, and there are some handy command line utilities out there for reading and updating
various forms of audio metadata as well.
Also, why the foop can't we see the metadata in the audio that web browsers play?
Thanks for listening.
We hope this edition of Hacker Public Radio has provided both entertainment and education
in exchange for your valuable listening time, but that's not all.
After all of this information, you'd probably like some examples, right?
Well, has Hacker Public Radio got a deal for you?
That's a rhetorical question.
Yes, Hacker Public Radio has a deal for you.
This very file that you're listening to right now has been stuffed full of top quality,
all natural, organically grown, artisanal metadata, handpicked by Hacker Public Radio specialists.
You can use the tools mentioned in this episode, or any other decent metadata handling program,
to examine this file for ideas on how you might use or abuse the technology for your own
amusement.
If you're interested in still more stuff in later episodes, I've actually started keeping
a running list of random, potentially upcoming topics I'm thinking of doing future episodes
on, plus a few that I'm already working on, at hpr.dogphilosophy.net, so you're welcome
to stop by and comment on topics that might interest you.
The End.
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a
community podcast network that releases shows every weekday, Monday through Friday.
Today's show, like all our shows, was contributed by an HPR listener like yourself.
If you ever consider recording a podcast, then visit our website to find out how easy
it really is.
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer
Club.
HPR is funded by the Binary Revolution at binrev.com; all BinRev projects are crowd-sponsored
by Lunar Pages.
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting
needs.
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike
3.0 license.
Look, it's not immoral; both MP3 and Ogg Vorbis are about 20 years old, easily past the age
of consent.
Stop looking at me like that.