Episode: 1393
|
|
Title: HPR1393: Audio Metadata in Ogg, MP3, and others
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1393/hpr1393.mp3
|
|
Transcribed: 2025-10-18 00:46:10
|
|
|
|
---
|
|
|
|
In today's episode of Hacker Public Radio, the pseudonymous Epicanis spends a few minutes
|
|
talking about audio metadata while trying to get some household chores done.
|
|
It's the autumn of 2013, and, in accordance with the prophecy, Hacker Public Radio is running
|
|
low on shows.
|
|
At the same time, we've been in the busy season here at the asylum for the sufficiently
|
|
nerdy, so I haven't had time to properly sit down and finish assembling and recording
|
|
a real full-length episode like the one on the Opus Audio Codec that I've been trying
|
|
to get done.
|
|
I don't want to leave HPR hanging in the meantime, though, so I thought maybe I could
|
|
try recording a few small, somewhat sloppy episodes on short subjects, while I'm dealing
|
|
with running up and down the stairs dealing with household chores, doing laundry, scrubbing
|
|
toilets, exorcising demons, hauling out garbage, and so on.
|
|
I've wasted some mental time trying to figure out what to call episodes like this, maybe
|
|
laundry lectures, or toilet-scrubber tutorials, or chore chat, or something.
|
|
Anyway, even without a cool name, hopefully people will be able to get some entertainment
|
|
and useful information out of them, despite less than ideal sound recording conditions,
|
|
and probably not so great organization of the topics.
|
|
While I'm cleaning this room today, I'm going to talk a bit about metadata in audio files.
|
|
I heard that somebody out there just asked, what's metadata?
|
|
The correct answer to that question appears to be, why it's data about data, said with
|
|
an irritatingly smug facial expression as though you've just created a piece of profound
|
|
wisdom while also properly answering the question, even though you've done neither.
|
|
In audio files, nearly all of the data is actual encoded sound, but there's a small bit
|
|
of extra data, most of which is optional, that is used to tell users of the file about
|
|
the audio or the file itself.
|
|
The mandatory parts of metadata are automatically handled, so you don't have to worry about
|
|
them much, like the sample size of the file, the audio codec being used, and so on.
|
|
That's mostly just useful for the playback software, which needs it so that it can figure
|
|
out how to play it back.
|
|
Audio files virtually always have room for additional bonus information about the file,
|
|
though.
|
|
That's where you find the title of the piece of audio, the name of the performer, the
|
|
name of the album, or collection the audio comes from, the little picture of the front
|
|
cover of the record album, the geolocation information, and so on.
|
|
That's the metadata that I'm talking about today.
|
|
There seem to be lots of different ways that various people encode metadata for sound
|
|
files, but really, there are only about two that most people need, and only three or
|
|
four others that you might encounter once in a while.
|
|
The two important ones are ID3 and Vorbis comments.
|
|
ID3 is the one you're most likely to have heard about already.
|
|
That's what they use for MP3 files, specifically what you may be familiar with is probably
|
|
ID3 version 2.3, which seems to be what nearly everyone uses these days.
|
|
You might also be acquainted with that abomination that is the original ID3 version 1, which is
|
|
actually an unrelated older format with some serious limitations.
|
|
In ID3 v1, the metadata was all crammed into a single 128-byte special data structure
|
|
at the end of an MP3 file, with... hang on, I've got the list here.
|
|
It had room for 30 bytes each of title, artist, album, and comment, four bytes to type in
|
|
a year, and one single byte for a number representing genre.
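A minimal sketch of reading that 128-byte block in Python, following the field sizes just listed. The leading three-byte "TAG" marker accounts for the remaining bytes; the function name and the Latin-1 decoding are assumptions made for illustration.

```python
import struct

# "TAG" marker, then 30-byte title, artist, and album, a 4-byte year,
# a 30-byte comment, and a 1-byte genre number: 128 bytes in total.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")

def parse_id3v1(data: bytes):
    """Return the ID3 v1 fields found in the last 128 bytes, or None."""
    if len(data) < ID3V1.size:
        return None
    marker, title, artist, album, year, comment, genre = ID3V1.unpack(
        data[-ID3V1.size:]
    )
    if marker != b"TAG":
        return None  # no ID3 v1 block at the end of this file

    def text(raw: bytes) -> str:
        # Fields are padded with NULs or spaces; Latin-1 is an assumption.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,
    }
```

Passing the whole file contents (or just its tail) to parse_id3v1 is enough, since only the final 128 bytes are examined.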
|
|
The idea of sticking the ID3 v1 at the end of the file was that if your crappy player
|
|
software didn't know what it was, it would probably just try to interpret the last 128
|
|
bytes as more sound and play a tiny blip of noise at the end of the file, or at worst,
|
|
if it choked and died, at least it would do so after you got to listen to the file.
|
|
ID3 v2 is a completely different kind of thing.
|
|
Instead of one tiny data structure at the end of an MP3 file, it's a whole bunch of
|
|
different special data structures that go at the beginning of the file.
|
|
Hang on, I have to open up the lid here and scrub the toilet.
|
|
Excuse the sound quality.
|
|
Anyway, anyway, I looked up the ID3 v2.3 specification, and, ugh, gross, nasty.
|
|
Somebody must have been really sick to make a mess like this.
|
|
Well, at least this toilet's pretty clean, so this won't take too long.
|
|
But anyway, ID3 v2.3 looks like a complicated mess to me.
|
|
The specification lists about 75 different special little fields, each with their own
|
|
special little data structure, and a special four-character code to identify them, like TCON
|
|
for genre, and TIT2 for title.
|
|
Actually, I'm exaggerating a little; those 75 fields only cover about 5 or 6 different
|
|
special little data structures.
|
|
All of the text field types are the same structure, for example.
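As a rough illustration of that shared text structure, here is a sketch that builds and reads a single ID3 v2.3 text frame: a four-character ID, a 32-bit big-endian size, two flag bytes, then an encoding byte followed by the text. The function names are hypothetical, and a real tag also has its own header in front of the frames, which this sketch ignores.

```python
import struct

FRAME_HEADER = struct.Struct(">4sIH")  # frame ID, body size, flags (10 bytes)

def build_text_frame(frame_id: str, text: str) -> bytes:
    """Build one v2.3 text frame, e.g. TIT2 for the title."""
    body = b"\x00" + text.encode("latin-1")  # 0x00 marks ISO-8859-1 text
    return FRAME_HEADER.pack(frame_id.encode("ascii"), len(body), 0) + body

def parse_text_frame(frame: bytes):
    """Return (frame_id, text) for a frame produced as above."""
    frame_id, size, _flags = FRAME_HEADER.unpack(frame[:FRAME_HEADER.size])
    body = frame[FRAME_HEADER.size:FRAME_HEADER.size + size]
    encoding, raw = body[0], body[1:]
    # 0x00 is ISO-8859-1; 0x01 is UTF-16 with a byte order mark.
    codec = "latin-1" if encoding == 0 else "utf-16"
    return frame_id.decode("ascii"), raw.decode(codec)
```

The same two functions work for any of the text frames, which is the point: only the four-character code changes.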
|
|
Well, except for comments, which is its own separate field.
|
|
Oh, and the Involved Persons list, which is a catch-all text field for cramming into
|
|
a single messy metadata entry
|
|
everyone's name and role, for everyone whose role wasn't defined in one of the other
|
|
special little fields.
|
|
Ugh, see what I mean?
|
|
Most of these fields you can usually ignore unless you really need them, though.
|
|
Most of what I usually see people use are a few of the text type fields that cover artist,
|
|
album name, track number, and content type, which is more colloquially known as genre.
|
|
That hideous field is now text instead of a number like in ID3 v1, but the specification
|
|
still suggests continuing to put a number in there, taken from the oddly specific ID3 v1
|
|
list of 141 or so special genre names, none of which, incidentally, are podcast, which
|
|
is what Hacker Public Radio and various other shows seem to use.
|
|
This doesn't actually break the specification, fortunately, it just goes against the recommendation.
|
|
Aside from a few of the 39 specific text-type ID3 frames, the only other ID3 v2.3 frame I've ever
|
|
personally seen anyone use is the attached picture frame, which is the so-called cover art.
|
|
One thing about this that most people don't realize is that you can have more than one
|
|
attached picture frame in an ID3 v2 header.
|
|
The data structure isn't just a copy of the JPEG file or whatever, but actually specifies
|
|
the MIME type of the picture data, a freeform text description of the picture data, and a
|
|
number that indicates specifically what the picture is supposed to be, like the front
|
|
cover of the album, the back cover, a picture of the CD that the MP3 was ripped from, a picture
|
|
of the band, the logo of the recording studio, a brightly colored fish.
|
|
No, seriously, I'm not joking, that's in the specification; it's picture type number
|
|
17.
|
|
Except for the two file icon attached picture types, the specification explicitly permits
|
|
as many of each kind of attached picture as you want to embed.
|
|
An MP3 file with six different front cover pictures embedded in it is perfectly valid.
|
|
There are lots of different audio file formats, but only MP3 uses ID3.
|
|
Except, MP4, if I'm not mistaken, is an object-oriented sort of file format, kind of like a special
|
|
version of QuickTime, in the same way that WebM is a special version of Matroska.
|
|
Yeah, I know, somewhere out there is a chorus of shocked people spitting out their lattes
|
|
on their MacBook Pros and complaining, "QuickTime isn't a file type, it's a framework!" Doesn't
|
|
matter, it's just an analogy.
|
|
Anyway, the MP4 specifications actually do include a special ID3 data object that you can
|
|
cram a whole ID3 header into, so you might run into .m4a files with them.
|
|
I'm not sure how common that is though, since as far as I know, most people getting .m4a
|
|
files are getting them from iTunes, and from what I've read, iTunes uses its own special
|
|
undocumented format for metadata instead.
|
|
That special undocumented format is one of the, quote, three or four others you might encounter
|
|
once in a while, unquote.
|
|
One last point worth mentioning, I've been talking about ID3 version 2.3 all this time,
|
|
yet there is a version 2.4.
|
|
Thing is, there seems to have been very little interest in this revision, which looks like
|
|
it was mostly a few incompatible renamings of a few tags, and a few relatively obscure
|
|
new tags.
|
|
Oh, and when you cram multiple entries into a text field, you separate them with nulls
|
|
in 2.4 where you use forward slashes in 2.3.
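A small sketch of that separator difference, assuming the text has already been decoded out of the frame. The function name is made up for illustration, and note that the slash convention in 2.3 is ambiguous for names that themselves contain a slash.

```python
def split_text_values(text: str, version: str):
    """Split a multi-entry ID3 v2 text field into its separate entries."""
    # Nulls separate entries in 2.4; forward slashes do in 2.3.
    separator = "\x00" if version == "2.4" else "/"
    return [part for part in text.split(separator) if part]
```

For example, the 2.3 string "Artist One/Artist Two" and the 2.4 string "Artist One", a null, "Artist Two" both split into the same two entries.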
|
|
I wouldn't bother with 2.4 personally, but if you find yourself trying to get Windows
|
|
to read your MP3 file's metadata and it won't do it, maybe someone stuck 2.4 tags in it
|
|
instead of 2.3.
|
|
There, that covers ID3, the special format used by one or maybe two out of all the kinds
|
|
of audio files you might run into on the internet.
|
|
That's enough of that mess.
|
|
Okay, at this point, it's actually taken me long enough now to get this done that
|
|
the busy season is over, so from here I can just make the rest of the episode like a
|
|
more typical one that I've been doing.
|
|
Well, typical for me anyway.
|
|
Incidentally, Hacker Public Radio could really still use some more shows.
|
|
Please record something.
|
|
While you're doing that though, let me get back to this.
|
|
Now then, what about everybody else besides MP3?
|
|
It seems to be pretty common to assume that ID3 is the metadata format for all audio
|
|
everywhere, so don't feel bad if you were under that impression.
|
|
You wouldn't be the first person to try to cram an ID3 frame into an Ogg file.
|
|
Heck, I did that myself once or twice before I knew better.
|
|
In reality, besides MP3 and maybe some MP4 audio files, everybody else uses Vorbis comments.
|
|
Okay, not literally everybody, but pretty much any other kind of digital audio file that
|
|
you're likely to actually run into often online, including Opus, FLAC, Ogg Vorbis,
|
|
and Speex.
|
|
Unlike ID3, with only one specific exception that I know of, Vorbis comments are simple,
|
|
consistent, flexible, and even human readable.
|
|
According to the specifications, all Vorbis comments are made of printable text characters,
|
|
so no strange binary codes to deal with.
|
|
Heck, you can use grep to find files with Vorbis comment metadata, how cool is that?
|
|
The tag names are case-insensitive, too, so you don't have to worry about that either.
|
|
Of course, there's a couple of issues with this arrangement.
|
|
For one thing, the vast flexibility means that you can name bits of metadata whatever
|
|
the heck you want.
|
|
You can imagine the mess if one site published their audio with a title contained in a field
|
|
called Name, and another in a field called Title, and another in a field called Song,
|
|
and so on.
|
|
Oh, here's a brief pointless digression, speaking of Song.
|
|
Am I the only one who gets irrationally annoyed when applications insist on referring
|
|
to all audio files as Songs?
|
|
You're listening to an audio file right now.
|
|
Does it sound like I'm singing?
|
|
Do you want me to sing?
|
|
Okay then.
|
|
Stop that, programmers.
|
|
Sorry, where was I?
|
|
Oh yeah, picking inconsistent names for metadata tags.
|
|
You've got the freedom to do this with Vorbis comments, of course, but there actually is
|
|
a published standard with officially recommended names for the most useful metadata, which
|
|
you should probably stick to for those fields so that software can more easily use it.
|
|
Hang on, I have a list here.
|
|
The published official Vorbis comment recommendations list includes:
|
|
Title for the name of the track, same as Title for ID3.
|
|
Version for when you have more than one track with the same title, like you might have two
|
|
different versions of Schubert's Ave Maria, and they both have the title, Ave Maria, but
|
|
maybe one is also tagged with a version of Metalcore Remix.
|
|
Album is for the name of the collection that the track came from, just like with ID3.
|
|
Tracknumber, all one word, is for the track number on the album, or the episode in a podcast
|
|
series, or whatever. Artist, again, just like ID3.
|
|
This is usually the name of the musician or band for music, though for classical music,
|
|
it should probably be the name of the composer, or for an audio book, it would be the author
|
|
of the book.
|
|
Performer is the field you use when the artist isn't necessarily who is speaking or
|
|
singing or whatever in the recording.
|
|
So the artist might be Franz Schubert, but the performer is, say, Justin Bieber.
|
|
Or you might have an audio book with artist as Stephenie Meyer, title as Twilight, and performer
|
|
as Gilbert Gottfried.
|
|
Copyright is for the typical copyright notice, like copyright 2013 Richard Solomon, or
|
|
something similar.
|
|
License is where you might put a link to a Creative Commons license that you're using,
|
|
or a phrase like all rights reserved if you're a fascist freedom hater.
|
|
Organization is where you put the record label, or perhaps LibriVox for an audio book,
|
|
or indeed, Hacker Public Radio for what you're listening to right now.
|
|
Genre, like the field in ID3, except it's supposed to be an actual short human-readable text
|
|
description of whatever genre the audio is supposed to fit into, rather than some
|
|
relatively meaningless genre number.
|
|
Date is for the recording date of the audio track, in a nice, rational, standard ISO 8601
|
|
year-month-day format. Location is for where the track was recorded, like the name of the recording
|
|
studio, or OggCamp 27, or my mom's basement. Contact is for a URL, email address, or whatever
|
|
for contact information for the audio distributor.
|
|
In the case of the file you're listening to right now, it should probably be
|
|
http://hackerpublicradio.org, for example. Description, which, like it says, is a place
|
|
for a description of the audio. Among other things, I think this is the appropriate place
|
|
to put text copies of show notes for podcasts.
|
|
Note that comment and comments are not in the recommended field name list.
|
|
So I think usually you'll want to put your comments in the description field, or maybe
|
|
not.
|
|
There's nothing wrong or invalid about using a field named comment or comments, and in
|
|
fact a lot of people seem to use comment.
|
|
It's just that any playback software that is strict about sticking to the official recommendations
|
|
list will probably ignore them and may not display them.
|
|
Or use both.
|
|
It's not like a few extra bytes of text is going to kill your download.
|
|
And finally, there's even an ISRC tag for an international standard recording code number,
|
|
which appears to be a special tracking number that can be issued for a fee naturally,
|
|
from a central authority which seems to work for audio tracks kind of like an ISBN does
|
|
for books.
|
|
I've never seen this one used anywhere, but it's in the official Vorbis comment documentation,
|
|
and I suppose it might be used by old school proprietary pay to listen sort of businesses.
|
|
Also documented are a few additional useful fields.
|
|
There's an easy specification for chapter marks that supports up to 1,000 chapters per
|
|
file, with tags named CHAPTERnnn (where nnn is a three-digit number) and CHAPTERnnnNAME.
|
|
So for example, the beginning of the file might be the start of the first chapter, so
|
|
you might have CHAPTER000=00:00:00.000 and
|
|
CHAPTER000NAME=Introduction.
|
|
There's also a CHAPTERnnnURL tag for links to chapter information
|
|
stored online.
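A hedged sketch of generating those chapter tags from a list of start times. The function name is hypothetical, numbering from 000 follows the example in this episode, and the hours:minutes:seconds.milliseconds timestamp layout is an assumption about the format.

```python
def chapter_tags(chapters):
    """Build chapter comment strings from (start_seconds, name) pairs."""
    if len(chapters) > 1000:
        raise ValueError("three-digit numbering allows at most 1,000 chapters")
    tags = []
    for index, (start, name) in enumerate(chapters):
        # Work in whole milliseconds to avoid floating-point drift.
        total_ms = round(start * 1000)
        hours, rest = divmod(total_ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        stamp = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
        tags.append(f"CHAPTER{index:03d}={stamp}")
        tags.append(f"CHAPTER{index:03d}NAME={name}")
    return tags
```

Each returned string is an ordinary comment that a tag editor could write into the file as-is.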
|
|
This set of tags seems to be virtually identical to the human readable text that you feed
|
|
to Matroska tools to cram the special binary Matroska chapter metadata structures into
|
|
Matroska files, which I'll talk just a little bit about at the end here.
|
|
On this specific subject, forgive me for harshing the Vorbis comments mellow by retrogressing
|
|
back to ID3 for a moment, but there actually is apparently a quote addendum unquote for
|
|
chapter support in ID3 v2.3 and 2.4.
|
|
The specification seems to involve smushing a set of nested table of contents and chapter
|
|
structures into the ID3 header, each containing their own set of embedded ID3 tags.
|
|
Trying to read the documentation for this and determine why they did it that way may
|
|
give you the mental equivalent of irritable bowel syndrome.
|
|
The good news is that I have yet to find any tag editors that support this monstrosity.
|
|
Well, except for a special ID3 V2 chapter tool written in Java, quote maintained unquote
|
|
by the BBC, and not updated since 2006.
|
|
As far as I can tell, very little if any playback software supports using it anyway, so you
|
|
shouldn't have to worry about it.
|
|
For reference, this specification was published in 2005, half a decade after ID3 v2.4, and
|
|
most player software still doesn't even support ID3 v2.4 yet, or probably ever, I suspect.
|
|
My opinion is to stick with Vorbis-comment-using formats for support of chapter features,
|
|
or WebM if you must, or maybe MP4 using magic iTunes tags if Tim Cook is looking over
|
|
your shoulder or paying you.
|
|
Back to the happy land of Vorbis comments, there's a specification for ReplayGain for adjusting
|
|
track volumes using fields named REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,
|
|
REPLAYGAIN_ALBUM_GAIN, and
|
|
REPLAYGAIN_ALBUM_PEAK, in a machine-parsable format that playback software
|
|
can use if it wants to.
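To show what a player might do with those fields, here is a sketch that turns a gain value into a linear volume factor. It assumes the gain is stored as a decimal number of decibels with a "dB" suffix (for example "-6.50 dB") and the peak as a plain decimal fraction of full scale; the function names are invented for the example.

```python
def gain_to_scale(value: str) -> float:
    """Convert a gain string such as '-6.50 dB' to an amplitude multiplier."""
    text = value.strip()
    if text.lower().endswith("db"):
        text = text[:-2]  # drop the unit, keep the signed number
    return 10 ** (float(text) / 20)

def playback_scale(comments: dict) -> float:
    """Pick a track volume factor, reduced if it would clip at the peak."""
    scale = gain_to_scale(comments.get("REPLAYGAIN_TRACK_GAIN", "0 dB"))
    peak = float(comments.get("REPLAYGAIN_TRACK_PEAK", "0"))
    if peak > 0:
        scale = min(scale, 1.0 / peak)  # keep the loudest sample at or below 1.0
    return scale
```

A file with no ReplayGain fields at all simply gets a factor of 1.0, meaning no change.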
|
|
And while I've still not yet gotten around to doing the geotagging episode, I'll tease
|
|
it here a bit, because Vorbis comments seem to have the only documented standard for
|
|
geotagging of media files besides JPEG and TIFF images.
|
|
The field is called GEO_LOCATION, and the contents take the form decimal latitude,
|
|
semicolon, decimal longitude, and optionally another semicolon and elevation in meters.
|
|
This format has the benefit of being both easily machine parsable and human readable.
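A minimal sketch of parsing that latitude;longitude;elevation form. The function name and the range checks are assumptions added for illustration.

```python
def parse_geo_location(value: str):
    """Parse 'latitude;longitude[;elevation]' into a tuple of numbers."""
    parts = [part.strip() for part in value.split(";")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields, got {len(parts)}")
    latitude, longitude = float(parts[0]), float(parts[1])
    # Elevation in meters is optional, so it may be missing.
    elevation = float(parts[2]) if len(parts) == 3 else None
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return latitude, longitude, elevation
```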
|
|
One other nice feature of the Vorbis comments specification that you should know about:
|
|
You can and should use each field name as many times as is appropriate for each file.
|
|
For example, each recording artist in an audio track should have their own artist tag
|
|
in the file.
|
|
If you have a recording of a collaboration between Slim Whitman, Celine Dion, Mel Tormé,
|
|
and Brian Johnson on a hip-hop album, you don't cram a single messy artist equals Slim
|
|
Whitman and Celine Dion and Mel Tormé and Brian Johnson field in there.
|
|
You put in four separate artist entries, each one with one of those names.
|
|
That way, if you freaking love Mel Tormé, you can easily find all of your recordings with
|
|
Mel Tormé in them just by looking for artist equals Mel Tormé and/or performer equals
|
|
Mel Tormé without having to look for the name buried among a bunch of other names in
|
|
a single field.
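A sketch of how repeated, case-insensitive field names could be handled, assuming the comments have already been read out of the file as plain "NAME=value" strings. Both function names are hypothetical.

```python
from collections import defaultdict

def parse_comments(lines):
    """Group 'NAME=value' strings into {NAME: [values, ...]}."""
    fields = defaultdict(list)
    for line in lines:
        name, separator, value = line.partition("=")
        if separator:
            # Field names are case-insensitive, so normalize them.
            fields[name.upper()].append(value)
    return dict(fields)

def features(fields, person: str) -> bool:
    """True if the person appears as an exact ARTIST or PERFORMER entry."""
    wanted = person.casefold()
    names = fields.get("ARTIST", []) + fields.get("PERFORMER", [])
    return any(name.casefold() == wanted for name in names)
```

Because every artist has a separate entry, the lookup is an exact comparison rather than a search inside one long string.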
|
|
ID3, on the other hand, mandates that only one of each kind of text field can exist.
|
|
And if you have multiple artists, you cram them all into the same text string, separated
|
|
by forward slashes or nulls if you're using V2.4.
|
|
Similarly, if your file is, say, an audio tour guide recording or a travelogue, you should
|
|
put multiple GEO_LOCATION tags in the metadata, one for each location mentioned
|
|
in the audio.
|
|
Then, if you wanted to automate a search through your audio files, you could find anything
|
|
that refers to Nowhere, Oklahoma, for example, just by looking for GEO_LOCATION
|
|
tags near latitude 35.1592 and longitude minus 98.4422.
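One way such a search could work is sketched here, using the haversine great-circle formula to decide what counts as "near". The 10 km default radius and the function names are arbitrary choices for the example.

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, by the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def mentions_place(geo_values, latitude, longitude, within_km=10.0):
    """True if any 'lat;lon[;elevation]' value lies within the radius."""
    for value in geo_values:
        parts = value.split(";")
        tagged_lat, tagged_lon = float(parts[0]), float(parts[1])
        if distance_km(tagged_lat, tagged_lon, latitude, longitude) <= within_km:
            return True
    return False
```

Calling mentions_place with the list of location values from one file and the coordinates 35.1592, -98.4422 would pick out recordings that mention that spot.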
|
|
There, see?
|
|
Vorbis comments, all simple, all human readable, all pretty intuitive.
|
|
Well, like I warned you, except for one thing, attached pictures, more commonly called
|
|
album art, are actually kind of a pain.
|
|
It's not actually any worse than ID3, but compared to the simplicity of the rest of
|
|
Vorbis comments, it's a bit of a nuisance.
|
|
There are two reasonable excuses for this.
|
|
One is just that since a digital picture is obviously not text, unless maybe you convert
|
|
it to ASCII art first, there just plain is no way to store it as a piece of simple human
|
|
readable metadata.
|
|
The second reason is that if you were doing things as properly as possible, it really
|
|
shouldn't be in the metadata anyway.
|
|
See, if you think about it, a picture of an album cover or any other attached picture
|
|
isn't really mere metadata any more than an audio track is mere metadata for a movie's
|
|
video track.
|
|
Attached still images are really their own independent pieces of data that just happened
|
|
to be associated with the audio track.
|
|
The most properly correct way to implement this would seem to be a separate stream in
|
|
the file with the attached pictures multiplexed in with the audio, just as the audio and subtitle
|
|
text should be their own separate streams multiplexed with a video stream.
|
|
The problem is, there is no specification that I can find for streams of, quote, series
|
|
of independent still JPEG and PNG images, unquote, in Ogg files, or MP3 for that matter.
|
|
In any case, MP3 has been doing attached pictures as metadata for so long that it's kind
|
|
of stuck as the way it's done.
|
|
So the specification for attaching pictures to Ogg Vorbis, Speex, and Opus files involves
|
|
encoding the binary image data to printable text characters so that it can be included
|
|
in Vorbis comments, just like email programs have to do with email attachments.
|
|
Something like five or ten years ago, a few people were doing this with an obsolete field
|
|
called COVERART, with the contents of the field just being the contents of a base64
|
|
encoded JPEG or PNG file.
|
|
Don't do this, at least if you expect people to ever see the cover art.
|
|
From what I can tell, pretty much nobody ever implemented using that field, and it's
|
|
been long since replaced by an officially documented somewhat more informative structure.
|
|
Here's where it gets a little obnoxious.
|
|
The field name for the attached pictures actually has the unintuitive name
|
|
METADATA_BLOCK_PICTURE.
|
|
And the contents of those fields are actually a complete base64-encoded data structure
|
|
that includes image width and height, MIME type, an optional description of the image,
|
|
the same picture type designations that ID3's attached picture frames use, along with the
|
|
actual image data.
|
|
You can either thank or blame FLAC for this one, depending on how you like FLAC.
|
|
I mentioned that FLAC uses Vorbis comments for its metadata.
|
|
For all of the audio metadata I've talked about up to this point, that's true, but
|
|
not attached pictures.
|
|
Unlike Ogg Vorbis, Speex, and Opus, FLAC files aren't actually in Ogg containers,
|
|
but are their own special file format.
|
|
That format actually includes a specific metadata block, structured to be very similar to the
|
|
attached picture frames in MP3 files, and it just happens to be called
|
|
METADATA_BLOCK_PICTURE.
|
|
For Opus, Ogg Vorbis, and Speex, which don't have a special metadata block just for
|
|
attached pictures, what happens is they build this same FLAC data structure, then base64
|
|
encode it to turn it into text that can be shoved in as a valid Vorbis comment.
|
|
The data structure involved is pretty well documented in the FLAC documentation, and
|
|
these days most people don't need to worry about it unless their encoder doesn't have
|
|
a built-in option to generate it, or they're adding it to the metadata by hand, or from
|
|
a simple command line script.
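For the by-hand case, a sketch of building and reading that comment in Python. The layout follows the FLAC picture block (big-endian 32-bit fields for picture type, MIME type, description, width, height, color depth, indexed-color count, and data length). The function names are hypothetical, and the caller is assumed to already know the image's width and height, since nothing here inspects the image itself.

```python
import base64
import struct

def build_picture_comment(image: bytes, mime: str, width: int, height: int,
                          picture_type: int = 3, description: str = "",
                          depth: int = 24, colors: int = 0) -> str:
    """Return a 'METADATA_BLOCK_PICTURE=...' comment (type 3 = front cover)."""
    mime_bytes = mime.encode("ascii")
    description_bytes = description.encode("utf-8")
    block = (
        struct.pack(">II", picture_type, len(mime_bytes)) + mime_bytes
        + struct.pack(">I", len(description_bytes)) + description_bytes
        + struct.pack(">IIII", width, height, depth, colors)
        + struct.pack(">I", len(image)) + image
    )
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")

def read_picture_comment(comment: str):
    """Decode a comment built as above back into its parts."""
    _name, _, encoded = comment.partition("=")
    block = base64.b64decode(encoded)
    picture_type, mime_length = struct.unpack_from(">II", block, 0)
    offset = 8
    mime = block[offset:offset + mime_length].decode("ascii")
    offset += mime_length
    (description_length,) = struct.unpack_from(">I", block, offset)
    offset += 4
    description = block[offset:offset + description_length].decode("utf-8")
    offset += description_length
    width, height, _depth, _colors, data_length = struct.unpack_from(
        ">IIIII", block, offset
    )
    offset += 20
    return picture_type, mime, description, width, height, block[offset:offset + data_length]
```

The string from build_picture_comment can be handed to whatever tool writes comments into the file; read_picture_comment is the reverse step a player would need before passing the bytes to an image decoder.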
|
|
I actually wrote an implementation of this in PHP of all things, which I can share with
|
|
anyone who wants it.
|
|
I've also seen an implementation done in Perl.
|
|
Anyway, this gives the Vorbis comment field for attached pictures a funny name, and is
|
|
in kind of a hard to mess with format for people doing it by hand.
|
|
The good news is that if someone writes a media player, that understands album art in
|
|
FLAC files, adding support for album art in Opus, Ogg Vorbis, Speex, or even Ogg Theora
|
|
video files for that matter, should hypothetically be pretty simple since other than having to
|
|
pass the data through base64 decoding to turn it back into a binary structure, you then
|
|
pass that directly to the already existing FLAC album art code to get the pictures out.
|
|
More good news for those of us switching to the superior new Opus format?
|
|
The command line Opus encoder now has a --picture option that works virtually
|
|
identically to the one in the FLAC encoder, with the same argument structure, which at
|
|
least makes it pretty easy to attach pictures to Opus files at encoding time.
|
|
Ogg Vorbis users still need to deal with this by either pre-generating a
|
|
METADATA_BLOCK_PICTURE Vorbis comment text to include as a command line option for
|
|
oggenc, or attaching the pictures after the fact using a GUI tag editor or a script based
|
|
on something like TagLib or Mutagen.
|
|
To wrap up, there are two more audio file formats you might run into somewhat regularly
|
|
online that you might want metadata for.
|
|
Wave files are still more or less the lowest common denominator for audio files, usually
|
|
being lossless PCM audio, and being widely supported, and I guess pretty simple in structure.
|
|
There are actually standards for metadata in Wave files, but I haven't managed to dig
|
|
up any clear documentation for this yet.
|
|
I know it's out there somewhere, I just haven't got it myself.
|
|
Apparently, Audacity actually embeds the limited set of metadata that it supports,
|
|
as both a standard INFO chunk, whatever that is, documented for Wave files, and as an
|
|
ID3 tag in some way when it saves Wave files. The other format you might some day run into
|
|
for audio files is WebM.
|
|
WebM is a specific implementation of the Matroska file format.
|
|
To me, Matroska metadata looks even worse than ID3.
|
|
Like ID3, it seems to be made up of about 100 rigidly defined tag names, of which WebM
|
|
looks to support about 70.
|
|
The metadata is heavily video-centric, and seems to assume that Matroska files will
|
|
contain movies.
|
|
Among the metadata tags for WebM and Matroska are things like special little fields designated
|
|
for choreographer, costume designer, director of photography, screenplay writer, assistant
|
|
director, and so on.
|
|
There's even a character tag that isn't actually for the file as a whole, but is supposed
|
|
to be buried inside an actor tag, which I guess makes the character tag a sort of meta-metadata.
|
|
I imagine Peter Sellers movies in WebM form must have some pretty messy metadata.
|
|
The whole thing seems to be object-oriented, so there are several other cases where tags
|
|
are supposed to be buried inside other tags' data structures as well.
|
|
Anyone who isn't in one of the special collection of video production roles that the Matroska
|
|
standard decided to include has to settle for getting crammed into a generic "thanks to"
|
|
tag, kind of like ID3 and its involved persons structure,
|
|
dooming the dolly grip, clapper loader, best boy, and gaffers to be second-class citizens
|
|
for Matroska files. The standards also say all this stuff should be tacked onto the
|
|
end of the file like ID3 v1.
|
|
Apparently the idea is that you can then rewrite the metadata without having to rewrite
|
|
the whole file.
|
|
On the other hand, that makes it not so great for streaming media,
|
|
since the player won't get the title, album, artist, executive producer, genre, and so
|
|
on, until after the stream is finished and it's too late to display that information anyway,
|
|
unless it's buffering the whole file before it starts playing.
|
|
Lastly, as far as I can tell, WebM doesn't actually support attached pictures at all, though
|
|
the broader Matroska standard does in a limited way.
|
|
The standard has room for large and small versions of a sort of banner graphic and large
|
|
and small versions of a more typical album art graphic for a total of four images.
|
|
For audio, you probably won't have to deal with this really.
|
|
The only place I've ever seen WebM audio files aside from ones I've made myself for
|
|
testing is in GNU MediaGoblin, which as far as I can tell only uses that format, because
|
|
they originally implemented audio only as a kind of afterthought to video, so their audio
|
|
for the project is just a video file without video.
|
|
I assume that once they've implemented multi-format support, the default for audio will end
|
|
up being Opus or Ogg Vorbis, and then nobody will really be using WebM for anything but
|
|
Internet TV.
|
|
I'm kind of waiting for Opus output from MediaGoblin before I start trying to use it
|
|
seriously, at which point it will probably deserve its own HPR episode.
|
|
To finish off this part, should I mention special Microsoft Windows Media?
|
|
Hmm, no, nobody should mention Windows Media.
|
|
Oh, alright, just quickly.
|
|
If you're unlucky, you might run into .asf or .wma audio files.
|
|
The situation with ASF and WMA and WMV is kind of like the situation with MP4 and M4A
|
|
and M4V files.
|
|
Several of these Windows Media files are really just ASF format.
|
|
The metadata for these seems pretty limited.
|
|
There are five different metadata, quote, objects, unquote, which can contain different
|
|
kinds of metadata.
|
|
The so-called content description object is for the very small set of predefined metadata
|
|
fields that the ASF format defines.
|
|
These are title, author, copyright, description, and rating, with up to 64 kilobytes of text
|
|
for each field.
|
|
The album art and URLs for copyright warnings stored online go in the so-called content
|
|
branding object, which seems to be limited to a single banner image, if I'm interpreting
|
|
the specification correctly.
|
|
The other three objects are extended content description object, which seems to be where
|
|
you put any random other metadata that you want that isn't in the approved metadata
|
|
field list for the content description object, and a metadata object, which seems to be
|
|
just an extended content metadata object that can refer to a specific stream in an ASF
|
|
file and not just the whole file, and finally, a metadata library object, whose description
|
|
makes my head hurt, but as far as I can tell is for cramming anything else that doesn't
|
|
belong in any of the other objects somehow.
|
|
I get the impression that all of these end up looking like Windows registry entries
|
|
in the end.
|
|
The good news is that in my experience, the only people who make much use of .wma files
|
|
are a few proprietary music, quote, selling, unquote, businesses, who offer it as one
|
|
option along with MP3 and other formats, or people who seem to have apparently gotten
|
|
a seemingly sweet deal from Microsoft back in the early to mid-2000s to use Windows
|
|
media systems for streaming audio and who haven't been wanting to spend any money upgrading
|
|
to something modern instead for nearly a decade, and if they offer anything else, there's
|
|
a fair chance it's RealPlayer files.
|
|
Remember RealPlayer?
|
|
You do?
|
|
Oh, I'm sorry.
|
|
Dang, you're old.
|
|
Therefore, you probably won't see too much of this online either and won't need to deal
|
|
with it often, at least not for audio, and even when you do, you probably won't actually
|
|
have much cause to mess with the metadata.
|
|
And if you do, it's probably because you're a bad person and this is your punishment.
|
|
Repent, sinner!
|
|
Yeah, okay, that's probably enough of an introduction to the subject.
|
|
How about I round off this episode with some suggestions and wrap it up with some tips
|
|
on using and editing metadata?
|
|
My first and probably most important suggestion would be to actually use the freaking metadata.
|
|
Yeah, I'm looking at you, Linux Voice podcast and the Opus feed, among others.
|
|
When I am elected supreme emperor of internet audio, it will be mandatory to use at the very
|
|
least the basic fields that most audio players will display, like the title, artist and
|
|
quote album, unquote.
|
|
I suggest to you that putting audio online with no metadata is basically a form of trolling.
|
|
It's like when someone posts a really awesome picture somewhere online saying, wow, check
|
|
out this awesome place.
|
|
But then all the metadata has been stripped out by the dorks at the image hosting service
|
|
so you can't even tell when the picture was taken, let alone where this awesome place
|
|
actually is.
|
|
And you're basically being asked to beg the poster to actually tell you where the place
|
|
is.
|
|
It's like people that go on some social media network and post something vague like, wow,
|
|
that was amazing.
|
|
My life has now changed forever.
|
|
And then you have to digitally prostrate yourself before them and beg them to tell you what
|
|
it was that was actually amazing.
|
|
And then after some irritating coyness, you find out they were just raving about the new
|
|
brand of instant ramen they just ate for lunch.
|
|
And you have to spend all day hunting them down so you can beat them repeatedly with
|
|
a sweaty gym sock stuffed with used cat litter for wasting your time.
|
|
Well, come on, I know I'm not the only one who fantasizes about that now and then.
|
|
Anyway, ideally you should include as much relevant metadata as possible.
|
|
That includes, I beg of you, any relevant geolocation data.
|
|
Where exactly was OggCamp 13?
|
|
If the interviews had GEO_LOCATION tags, I could look it up on OpenStreetMap.
|
|
Same goes for discussions of hacker spaces, particularly good stores or restaurants you
|
|
might mention, the locations of dead drops or geocaches and so forth.
|
|
One could even, for example, have a promo for Linux Fest like, say, Northeast Linux
|
|
Fest 2014 added to one's podcast and then include a GEO_LOCATION tag with the
|
|
location of the venue for that.
|
|
Once we find out what that venue will be, hint hint.
|
|
As far as cover art goes, my thinking on this has completely changed over the last couple
|
|
of years.
|
|
Since the mid-1990s, when it started showing up in MP3, I always thought album art was
|
|
silly, frivolous, space-wasting fluff.
|
|
I mean, think about it, do you insist on staring at the CD case while you're listening
|
|
to a CD?
|
|
For those of you young people who may be confused, CDs were a DVD-like physical medium that we
|
|
old people used to use to extract data to make MP3s from instead of just downloading
|
|
them.
|
|
Anyway, I never really saw the point of it, but in the last couple of years I've found
|
|
I actually do prefer to have it.
|
|
Even in its ordinary, expected use of actually having a picture of the physical medium's
|
|
packaging, it's kind of nice as a quick visual reminder of which collection the audio I'm
|
|
listening to came from.
|
|
Of course, even more interesting might be the extraordinary, unexpected uses.
|
|
If you're recording a podcast describing how to make something, some bonus illustrations
|
|
of the process included as attached pictures would be a nice bonus for listeners interested
|
|
enough to look for them.
|
|
If you have audio from a specific location, or about a specific location, you could benefit
|
|
everyone by including an image of a map as an attached picture, or a geotagged picture
|
|
of the location.
|
|
If you're doing a podcast for aquarium owners, you might even have a legitimate cause
|
|
to use that bright colored fish attached picture type.
|
|
If you want to mess with the NSA, you could even record a brief audio message, then encode
|
|
that as a low bit rate Opus or Codec2 file, then steganographically embed that file into
|
|
an image and include that image as an attached picture.
|
|
So in short, the feature is too much fun to ignore, and the more people use it, the more
|
|
playback and tagging software will start supporting it correctly.
|
|
Except for attached pictures, the amount of data that an additional tag of metadata adds
|
|
to the file is negligible, and worrying about wasting space with most metadata is like
|
|
worrying about wasting film when using a digital camera.
|
|
A well-designed set of attached pictures won't bloat the file too much either if you're
|
|
reasonably careful, and should definitely be included wherever they may add some usefulness
|
|
to the file.
|
|
Anything you think someone might be interested in knowing about the recording later, please
|
|
include it.
|
|
I know at least one person who will happily examine audio metadata for interesting information
|
|
that the audio player doesn't necessarily shove in my face, and I imagine I can't be
|
|
the only one.
|
|
Also, if you can reasonably identify parts of your audio that would make good obvious
|
|
times when the subject changes or something important happens, consider including some
|
|
chapter markings as a reward for player software that uses them and to encourage the ones
|
|
that don't to start.
|
|
As for the attached picture itself, if you care what Apple thinks, iTunes apparently
|
|
uses 600x600 as the standard size for cover art images, though from what I've read it sounds
|
|
like you can use other sizes as well.
|
|
Personally, unless you have a good reason, I'd recommend sticking to around that size
|
|
or smaller just so you can tell what the images might look like on screens with lower resolution,
|
|
but I wouldn't worry too much about keeping them square.
|
|
Save them as JPEG or PNG and they'll fit into ID3 or Vorbis comment album art just fine.
|
|
One warning, so far many tag editors I run into that support cover art at all only support
|
|
a single cover art image, which is usually set by default to picture type 3, that is,
|
|
front cover.
|
|
If you want to include multiple images, you might find it easier to do it at encoding
|
|
time.
|
|
The command line encoders for FLAC and Opus allow you to include as many attached pictures
|
|
as you want as switches.
|
|
The Ogg Vorbis encoder doesn't, but like the Opus encoder, the current Ogg Vorbis encoder
|
|
accepts FLAC files directly for input, and it will transfer the FLAC metadata over
|
|
to the Ogg Vorbis file it creates, including the attached pictures from what I can tell.
|
|
Therefore, if you either get or make FLAC files to work from as your originals and put
|
|
all of the metadata in there, you can use those FLAC files as input to generate Opus and
|
|
Ogg Vorbis files without worrying about the metadata any further.
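A rough sketch of that workflow in Python, for illustration only. It just builds the command lines and prints them rather than running anything. It assumes the reference `flac`, `opusenc`, and `oggenc` encoders are installed, and the file names, tag values, and the `build_encode_commands` helper are made up for the example.

```python
import shlex


def build_encode_commands(wav, basename, tags, pictures):
    """Return three argument lists: FLAC master, then Opus and Ogg Vorbis copies.

    Nothing is executed here. `tags` is a dict of Vorbis comment names to values,
    `pictures` is a list of (picture_type, image_path) pairs.
    """
    flac_cmd = ["flac", "--best", "-o", f"{basename}.flac"]
    for name, value in tags.items():
        flac_cmd.append(f"--tag={name}={value}")
    for picture_type, path in pictures:
        # flac picture spec: TYPE|MIME|DESCRIPTION|WIDTHxHEIGHTxDEPTH|FILE
        # Empty fields are left for flac to fill in from the image itself.
        flac_cmd.append(f"--picture={picture_type}||||{path}")
    flac_cmd.append(wav)

    # Both encoders read the FLAC file (and its metadata) directly.
    opus_cmd = ["opusenc", f"{basename}.flac", f"{basename}.opus"]
    ogg_cmd = ["oggenc", f"{basename}.flac", "-o", f"{basename}.ogg"]
    return [flac_cmd, opus_cmd, ogg_cmd]


if __name__ == "__main__":
    commands = build_encode_commands(
        "episode.wav",                       # hypothetical input file
        "episode",
        {"TITLE": "Example episode", "ARTIST": "Example host"},
        [(3, "cover.jpg"), (0, "map.png")],  # 3 = front cover, 0 = other
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually run the commands, each list could be handed to `subprocess.run(cmd, check=True)` in order.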
|
|
For MP3, the only encoder I am familiar with at all is the LAME encoder, which seems
|
|
to produce pretty good quality sound by MP3 standards, but appears to be limited to a single
|
|
attached picture on the command line. Speaking of MP3 limitations,
|
|
most of the common information that people put in Vorbis comments should have a reasonably
|
|
obvious equivalent for MP3, so you shouldn't have any trouble figuring out which special
|
|
little ID3 field to put the title and artist and album and so on in, if you have to deal
|
|
with MP3 files.
|
|
A table of mappings between ID3 and Vorbis comments would probably be really handy, but even
|
|
if I had such a thing ready, this episode would get really, really tedious, like even more
|
|
than it already is, if I tried to read it out to you.
|
|
So for now, just look it up online if you need to, and I'll try to put up a post at dogphilosophy.net
|
|
with a table later.
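As a small taste of what such a table might look like, here is a sketch in Python covering only a handful of the most common fields. The frame IDs are the standard ID3v2.3 ones, but the selection of tags and the `id3_frame_for` helper are just illustrative, not a complete or official mapping.

```python
# A few common Vorbis comment names and their usual ID3v2.3 frame IDs.
VORBIS_TO_ID3V23 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "TRACKNUMBER": "TRCK",
    "GENRE": "TCON",
}


def id3_frame_for(tag_name):
    """Return the ID3v2.3 frame ID for a Vorbis comment name.

    Vorbis comment names are case-insensitive, so normalise to upper case.
    Anything without an obvious equivalent falls back to the user-defined
    text frame, TXXX.
    """
    return VORBIS_TO_ID3V23.get(tag_name.upper(), "TXXX")


if __name__ == "__main__":
    for name in ("title", "ARTIST", "GEO_LOCATION"):
        print(name, "->", id3_frame_for(name))
```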
|
|
That'll only take care of the most common tags, though, so what about other potentially useful
|
|
metadata for MP3, like geolocation?
|
|
It turns out I was slightly lying when I said that text fields in ID3 were limited to
|
|
one each.
|
|
There's actually a special user-defined text field in ID3 designated TXXX.
|
|
No, that's not where the audio codec-themed erotic fanfiction goes, but wait, come to think
|
|
of it, if you had such a thing and you wanted to embed it in MP3, that actually is where
|
|
it would go.
|
|
What I mean is, that's not why the XXX is in there.
|
|
Anyway, the data structure for the TXXX field has two parts, a string for the name or description
|
|
of the text that you're putting in it, and the text string itself.
|
|
The specification does not allow multiple TXXX tags with the same description, but you
|
|
can include as many separate TXXX tags with different descriptions as you want.
|
|
This makes it an obvious place to include Vorbis comments that can't readily be pigeonholed
|
|
into the pre-existing ID3 fields.
|
|
I propose that for this purpose, the description part should be used for a Vorbis comment tag
|
|
name, while the text part should include every relevant Vorbis comment with that tag name.
|
|
For a useful example, ID3 doesn't support geotagging, so instead, put in a TXXX frame with
|
|
the description GEO_LOCATION, and the text contents of the tag would be
|
|
GEO_LOCATION=42.4347571;-83.9849477;270, or whatever
|
|
coordinates are relevant.
|
|
If there is more than one, just stick a carriage return between them so that each
|
|
GEO_LOCATION=whatever entry has its own line in the same text string, at least that's how
|
|
I'd do it.
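To make the two-part structure concrete, here is a minimal sketch in Python that builds the raw bytes of a single ID3v2.3 TXXX frame by hand. It assumes plain ISO-8859-1 text and no frame flags, the `build_txxx_frame` name is made up for this example, and the coordinates are only sample values. In practice a tag editor or tagging library would do this for you.

```python
import struct


def build_txxx_frame(description, text):
    """Build one ID3v2.3 user-defined text frame (TXXX) as raw bytes.

    Frame body: one encoding byte (0 = ISO-8859-1), the description,
    a zero terminator, then the text value itself.
    """
    body = (
        b"\x00"
        + description.encode("latin-1")
        + b"\x00"
        + text.encode("latin-1")
    )
    # Frame header: 4-byte frame ID, 4-byte big-endian body size, 2 flag bytes.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x00\x00"
    return header + body


if __name__ == "__main__":
    frame = build_txxx_frame(
        "GEO_LOCATION",
        "GEO_LOCATION=42.4347571;-83.9849477;270",  # sample coordinates
    )
    print(frame)
```

This only produces the frame itself. A real file also needs the ID3v2 tag header in front of the frames, which is one more reason to let an existing tool write the tag.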
|
|
For editing metadata, say that three times fast, after the encoding is done, I usually
|
|
use Kid3, which as of the current 3.0 version supports Opus, as well as Ogg Vorbis,
|
|
FLAC, MP3, and several other formats, in addition to now including a command line version
|
|
that could be used from scripts.
|
|
I don't use Windows or Mac systems, but Kid3 is available for them as well, so I'd recommend
|
|
giving it a try.
|
|
So far, it seems to support pretty much every feature of ID3v2.3 and Vorbis comments
|
|
that you might want, with the sole exception of multiple attached pictures.
|
|
If you're on Linux, it'll almost certainly be in your distribution's repository.
|
|
If not, you can get it from kid3.sourceforge.net.
|
|
On Linux officially and apparently unofficially on at least Mac OS and possibly Windows, I
|
|
can also recommend puddletag, which does appear to fully and properly support multiple
|
|
attached pictures, and also has up to date file format support.
|
|
puddletag is a little more awkward to use for individual files, but it has a nice interface
|
|
for editing whole directories of files at a time.
|
|
GNOME users on Linux may be familiar with a program called EasyTAG, but at least as
|
|
of late 2013, I can't really recommend it unless you don't edit anything but MP3.
|
|
When I looked, it seemed like their Ogg support hadn't been updated in a decade.
|
|
It's still trying to use a non-standard set of cover art tags for attached pictures.
|
|
They still don't support Opus, and glancing at the source code and a quick test made
|
|
it look like they might only support a small specific set of basic Vorbis comment tags.
|
|
It also appears to no longer be cross-platform, though there was apparently a Windows version
|
|
many years ago.
|
|
Try puddletag, it looks like it has a similar interface to what EasyTAG seems to do.
|
|
To finish up, here's a collection of command line tools I've found that may be of use to
|
|
you when dealing with audio metadata.
|
|
I already mentioned the existence of kid3-cli.
|
|
For MP3 files, I'll mention mpg123-id3dump, which comes with the mpg123 command line audio
|
|
player, and like the name implies, it's used to extract ID3 metadata, including attached
|
|
pictures.
|
|
Also potentially handy is id3ted, which seems to be able to extract virtually any
|
|
ID3 tag, and can add or edit most of the useful ones, including adding attached pictures,
|
|
though it's hard-coded to tag them all as front cover.
|
|
The vorbiscomment utility from the vorbis-tools package can be used to add or edit tags
|
|
in Ogg Vorbis files, though you'll have to generate the METADATA_BLOCK_PICTURE
|
|
tag text yourself since it doesn't handle them.
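For anyone who does need to generate that tag text by hand, here is a minimal sketch in Python. It packs an image into the FLAC picture block layout (picture type, MIME type, description, dimensions, then the image data, all lengths as 32-bit big-endian integers) and base64-encodes the result. The `metadata_block_picture` helper name is made up, and the width, height, and color depth are left at 0 as placeholders, since working them out means parsing the image itself.

```python
import base64
import struct


def metadata_block_picture(image_data, mime, picture_type=3, description="",
                           width=0, height=0, depth=0, colors=0):
    """Return the base64 text value for a METADATA_BLOCK_PICTURE comment.

    picture_type 3 is "front cover". The dimension fields default to 0
    here as placeholders rather than being read from the image.
    """
    mime_bytes = mime.encode("ascii")
    desc_bytes = description.encode("utf-8")
    block = (
        struct.pack(">I", picture_type)
        + struct.pack(">I", len(mime_bytes)) + mime_bytes
        + struct.pack(">I", len(desc_bytes)) + desc_bytes
        + struct.pack(">IIII", width, height, depth, colors)
        + struct.pack(">I", len(image_data)) + image_data
    )
    return base64.b64encode(block).decode("ascii")


if __name__ == "__main__":
    fake_image = b"\x89PNG not a real image"  # stand-in bytes for the example
    value = metadata_block_picture(fake_image, "image/png", description="cover")
    print("METADATA_BLOCK_PICTURE=" + value)
```

The printed line is the kind of name=value text that could then be handed to a tag writer such as vorbiscomment. Check the tool's own documentation for the exact option to use.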
|
|
The same package includes the ogginfo utility, which displays Ogg Vorbis metadata.
|
|
The opus-tools package includes an encoder and decoder, as well as the opusinfo utility,
|
|
which, like the ogginfo utility, displays audio metadata for Opus files.
|
|
This one will verify attached pictures, but doesn't currently dump them.
|
|
Honorable mention goes to the ExifTool utility, which is mostly used for digital photograph
|
|
metadata, but is also able to display metadata from pretty much every audio format I've mentioned
|
|
except for Opus.
|
|
Okay, one last thing.
|
|
If you'll forgive me jumping mental tracks one last time, as far as I can tell, none of
|
|
the web browsers have any provision for handling or displaying audio metadata.
|
|
No matter how well they support the HTML5 audio tag otherwise, no, not even Mozilla Firefox.
|
|
That means that for playing within a web browser, if you want to have the audio metadata
|
|
shown, you have to insert a copy of the decoded metadata somewhere else in the web page,
|
|
which kind of defeats the purpose of having the metadata attached to the audio file the
|
|
way it's supposed to be in the first place.
|
|
The same goes for video, incidentally, but whatever, we're talking about audio today.
|
|
If anybody out there has any contacts at Mozilla, is there any chance you could get this going?
|
|
I specify Mozilla because they're probably the only organization that cares enough to
|
|
bother.
|
|
Google can't even get Opus support live by default after a year and a half, and probably
|
|
wouldn't bother with this unless they could somehow make you go through Google Plus to
|
|
get to it.
|
|
Microsoft seems like it can't innovate at all without having a battle to the death between
|
|
at least two departments, and then their legal department determining that the alleged
|
|
innovation by the survivors would be useful for suing people.
|
|
And Apple firmly denies the existence of the world beyond iTunes, and if iTunes doesn't
|
|
display it then you don't need to know it, so sit down and shut up and look at the
|
|
pretty colors.
|
|
Help me Mozilla Firefox, you're my only hope.
|
|
Okay, for those of you just tuning in, I've just been talking for whatever, about metadata
|
|
for audio files you're likely to find on the internet, and you just missed it.
|
|
So here it is again.
|
|
MP3 files usually use a metadata format called ID3 version 2.3, which is an awful, fussy
|
|
micromanaged sort of format, but very common so you'll probably run into it a lot.
|
|
FLAC, Opus, Ogg Vorbis, and Speex all use Vorbis comments, which are simple and awesome
|
|
and all the cool people use it, and you should too, unless you want to be uncool, and probably
|
|
even then.
|
|
You should tag all of your audio files with as much relevant metadata as you can for
|
|
the betterment of all humanity, or at least the good parts of humanity, including attached
|
|
pictures and geolocation data, and you should either put all the metadata in at encoding
|
|
time, or you can use Kid3, puddletag, or various other tag editors to add or change tags
|
|
later, and there are some handy command line utilities out there for reading and updating
|
|
various forms of audio metadata as well.
|
|
Also, why the foop can't we see the metadata in the audio that web browsers play?
|
|
Thanks for listening.
|
|
We hope this edition of Hacker Public Radio has provided both entertainment and education
|
|
in exchange for your valuable listening time, but that's not all.
|
|
After all of this information, you'd probably like some examples, right?
|
|
Well, has Hacker Public Radio got a deal for you?
|
|
That's a rhetorical question.
|
|
Yes, Hacker Public Radio has a deal for you.
|
|
This very file that you're listening to right now has been stuffed full of top quality,
|
|
all natural, organically grown, artisanal metadata, handpicked by Hacker Public Radio specialists.
|
|
You can use the tools mentioned in this episode, or any other decent metadata handling program,
|
|
to examine this file for ideas on how you might use or abuse the technology for your own
|
|
amusement.
|
|
If you're interested in still more stuff in later episodes, I've actually started keeping
|
|
a running list of random, potentially upcoming topics I'm thinking of doing future episodes
|
|
on, plus a few that I'm already working on, at hpr.dogphilosophy.net, so you're welcome
|
|
to stop by and comment on topics that might interest you.
|
|
The End.
|
|
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a
|
|
community podcast network that releases shows every weekday Monday through Friday.
|
|
Today's show, like all our shows, was contributed by an HPR listener like yourself.
|
|
If you ever consider recording a podcast, then visit our website to find out how easy
|
|
it really is.
|
|
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer
|
|
Club.
|
|
HPR is funded by the binary revolution at binrev.com. All binrev projects are proudly sponsored
|
|
by Lunar Pages.
|
|
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting
|
|
needs.
|
|
Unless otherwise stated, today's show is released under a Creative Commons
|
|
Attribution-ShareAlike 3.0 license.
|
|
Look, it's not immoral, both MP3 and Ogg Vorbis are about 20 years old, easily past the age
|
|
of consent.
|
|
Stop looking at me like that.
|