Initial commit: HPR Knowledge Base MCP Server
- MCP server with stdio transport for local use
- Search episodes, transcripts, hosts, and series
- 4,511 episodes with metadata and transcripts
- Data loader with in-memory JSON storage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
694
hpr_transcripts/hpr1393.txt
Normal file
@@ -0,0 +1,694 @@
Episode: 1393
Title: HPR1393: Audio Metadata in Ogg, MP3, and others
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr1393/hpr1393.mp3
Transcribed: 2025-10-18 00:46:10

---

In today's episode of Hacker Public Radio, the pseudonymous Epicanis spends a few minutes
talking about audio metadata while trying to get some household chores done.
|
||||
It's the autumn of 2013, and, in accordance with the prophecy, Hacker Public Radio is running
|
||||
low on shows.
|
||||
At the same time, we've been in the busy season here at the asylum for the sufficiently
|
||||
nerdy, so I haven't had time to properly sit down and finish assembling and recording
|
||||
a real full-length episode like the one on the Opus Audio Codec that I've been trying
|
||||
to get done.
|
||||
I don't want to leave HPR hanging in the meantime, though, so I thought maybe I could
try recording a few small, somewhat sloppy episodes on short subjects, while I'm
running up and down the stairs dealing with household chores, doing laundry, scrubbing
toilets, exorcising demons, hauling out garbage, and so on.
I've wasted some mental time trying to figure out what to call episodes like this, maybe
laundry lectures, or toilet-scrubber tutorials, or chore chat, or something.
|
||||
Anyway, even without a cool name, hopefully people will be able to get some entertainment
|
||||
and useful information out of them, despite less than ideal sound recording conditions,
|
||||
and probably not so great organization of the topics.
|
||||
While I'm cleaning this room today, I'm going to talk a bit about metadata in audio files.
|
||||
I heard that somebody out there just asked, what's metadata?
|
||||
The correct answer to that question appears to be, "Why, it's data about data," said with
an irritatingly smug facial expression as though you've just created a piece of profound
wisdom while also properly answering the question, even though you've done neither.
|
||||
In audio files, nearly all of the data is actual encoded sound, but there's a small bit
|
||||
of extra data, most of which is optional, that is used to tell users of the file about
|
||||
the audio or the file itself.
|
||||
The mandatory parts of metadata are automatically handled, so you don't have to worry about
|
||||
them much, like the sample size of the file, the audio codec being used, and so on.
|
||||
That's mostly just useful for the playback software, which needs it so that it can figure
|
||||
out how to play it back.
|
||||
Audio files virtually always have room for additional bonus information about the file,
|
||||
though.
|
||||
That's where you find the title of the piece of audio, the name of the performer, the
name of the album or collection the audio comes from, the little picture of the front
cover of the record album, the geolocation information, and so on.
|
||||
That's the metadata that I'm talking about today.
|
||||
There seem to be lots of different ways that various people encode metadata for sound
|
||||
files, but really, there are only about two that most people need, and only three or
|
||||
four others that you might encounter once in a while.
|
||||
The two important ones are ID3 and Vorbis comments.
|
||||
ID3 is the one you're most likely to have heard about already.
|
||||
That's what they use for MP3 files, specifically what you may be familiar with is probably
|
||||
ID3 version 2.3, which seems to be what nearly everyone uses these days.
|
||||
You might also be acquainted with that abomination that is the original ID3 version 1, which is
|
||||
actually an unrelated older format with some serious limitations.
|
||||
In ID3v1, the metadata was all crammed into a single 128-byte special data structure
at the end of an MP3 file, with, hang on, I've got the list here.
It had room for 30 bytes each of title, artist, album, and comment, four bytes to type in
a year, and one single byte for a number representing genre.
|
||||
The idea of sticking the ID3 v1 at the end of the file was that if your crappy player
|
||||
software didn't know what it was, it would probably just try to interpret the last 128
|
||||
bytes as more sound and play a tiny blip of noise at the end of the file, or at worst,
|
||||
if it choked and died, at least it would do so after you got to listen to the file.
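As a rough illustration of the layout just described, here is a minimal Python sketch that unpacks a 128-byte ID3v1 block. It assumes the block begins with the three-byte marker TAG (which, added to the fields listed above, accounts for all 128 bytes) and that the text fields are Latin-1, padded with nulls or spaces. The name parse_id3v1 and the sample values are made up for this example.

```python
import struct

# Marker, title, artist, album, year, comment, genre byte.
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"  # 3 + 30 + 30 + 30 + 4 + 30 + 1 = 128 bytes


def parse_id3v1(tail: bytes):
    """Decode a 128-byte ID3v1 block, or return None if it is not one."""
    if len(tail) != 128 or not tail.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(ID3V1_LAYOUT, tail)

    def text(raw: bytes) -> str:
        # Assumption: Latin-1 text, cut at the first null, padding stripped.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,  # just a number; the name comes from a lookup list
    }


# Hypothetical tag built in memory; struct pads short strings with nulls.
example = struct.pack(ID3V1_LAYOUT, b"TAG", b"Audio Metadata", b"Some Host",
                      b"Hacker Public Radio", b"2013", b"", 255)
tag = parse_id3v1(example)
print(tag["title"], tag["year"], tag["genre"])
```

With a real file, the same function could be fed the last 128 bytes, read by seeking to offset -128 from the end of the file.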
|
||||
ID3 v2 is a completely different kind of thing.
|
||||
Instead of one tiny data structure at the end of an MP3 file, it's a whole bunch of
|
||||
different special data structures that go at the beginning of the file.
|
||||
Hang on, I have to open up the lid here and scrub the toilet.
|
||||
Excuse the sound quality.
|
||||
Anyway, anyway, I looked up the ID3 v2.3 specification, and, ugh, gross, nasty.
|
||||
Somebody must have been really sick to make a mess like this.
|
||||
Well, at least this toilet's pretty clean, so this won't take too long.
|
||||
But anyway, ID3 v2.3 looks like a complicated mess to me.
|
||||
The specification lists about 75 different special little fields, each with their own
|
||||
special little data structure, and a special four-character code to identify them, like TCON
for genre, and TIT2 for title.
Actually, I'm exaggerating a little. Those 75 fields only cover about 5 or 6 different
special little data structures.
All of the text field types are the same structure, for example.
Well, except for comments, which is its own separate field.
Oh, and the involved persons list, which is a catch-all text field for cramming into
a single messy metadata entry
everyone's name and role for everyone whose role wasn't defined in one of the other
special little fields.
|
||||
Ugh, see what I mean?
|
||||
Most of these fields you can usually ignore unless you really need them, though.
|
||||
Most of what I usually see people use are a few of the text type fields that cover artist,
|
||||
album name, track number, and content type, which is more colloquially known as genre.
|
||||
That hideous field is now text instead of a number like in ID3v1, but the specification
still suggests continuing to put a number in there, taken from the oddly specific ID3v1
list of 141 or so special genre names, none of which, incidentally, are podcast, which
is what Hacker Public Radio and various other shows seem to use.
This doesn't actually break the specification, fortunately, it just goes against the recommendation.
Aside from a few of the 39 specific text-type ID3 frames, the only other ID3v2.3 frame I've ever
personally seen anyone use is the attached picture frame, which is the so-called cover art.
|
||||
One thing about this that most people don't realize is that you can have more than one
|
||||
attached picture frame in an ID3v2 header.
The data structure isn't just a copy of the JPEG file or whatever, but actually specifies
the MIME type of the picture data, a freeform text description of the picture data, and a
number that indicates specifically what the picture is supposed to be, like the front
cover of the album, the back cover, a picture of the CD that the MP3 was ripped from, a picture
of the band, the logo of the recording studio, a brightly colored fish.
No, seriously, I'm not joking, that's in the specification. It's picture type number
17.
|
||||
Except for the two file icon attached picture types, the specification explicitly permits
|
||||
as many of each kind of attached picture as you want to embed.
|
||||
An MP3 file with six different front cover pictures embedded in it is perfectly valid.
|
||||
There are lots of different audio file formats, but only MP3 uses ID3.
|
||||
Except, MP4, if I'm not mistaken, is an object-oriented sort of file format, kind of like a special
|
||||
version of QuickTime, in the same way that WebM is a special version of Matroska.
|
||||
Yeah, I know, somewhere out there is a chorus of shocked people spitting out their lattes
on their MacBook Pros and complaining, "QuickTime isn't a file type, it's a framework." Doesn't
matter, it's just an analogy.
|
||||
Anyway, the MP4 specifications actually do include a special ID3 data object that you can
|
||||
cram a whole ID3 header into, so you might run into .m4a files with them.
|
||||
I'm not sure how common that is though, since as far as I know, most people getting .m4a
|
||||
files are getting them from iTunes, and from what I've read, iTunes uses its own special
|
||||
undocumented format for metadata instead.
|
||||
That special undocumented format is one of the, quote, three or four others you might encounter
|
||||
once in a while, unquote.
|
||||
One last point worth mentioning, I've been talking about ID3 version 2.3 all this time,
|
||||
yet there is a version 2.4.
|
||||
Thing is, there seems to have been very little interest in this revision, which looks like
|
||||
it was mostly a few incompatible renamings of a few tags, and a few relatively obscure
|
||||
new tags.
|
||||
Oh, and when you cram multiple entries into a text field, you separate them with nulls
|
||||
in 2.4 where you use forward slashes in 2.3.
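To make the separator difference concrete, here is a small Python sketch of splitting a multi-entry text field. It only covers the rule mentioned here (forward slashes in 2.3, nulls in 2.4) and assumes the field has already been decoded to a string; split_text_values is a hypothetical helper, not part of any tagging library.

```python
def split_text_values(text: str, minor_version: int) -> list[str]:
    """Split a multi-entry ID3v2 text field into its individual entries."""
    # 2.4 separates entries with nulls; 2.3 uses forward slashes.
    separator = "\x00" if minor_version >= 4 else "/"
    return [entry for entry in text.split(separator) if entry]


print(split_text_values("Slim Whitman/Celine Dion", 3))
print(split_text_values("Slim Whitman\x00Celine Dion", 4))
# A name that contains a slash gets cut in two under the 2.3 rule.
print(split_text_values("AC/DC", 3))
```

The last call shows one practical weakness of the slash convention: a reader cannot tell a separator from a slash that belongs to the name.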
|
||||
I wouldn't bother with 2.4 personally, but if you find yourself trying to get Windows
to read your MP3 file's metadata and it won't do it, maybe someone stuck 2.4 tags in it
instead of 2.3.
|
||||
There, that covers ID3, the special format used by one or maybe two out of all the kinds
|
||||
of audio files you might run into on the internet.
|
||||
That's enough of that mess.
|
||||
Okay, at this point, it's actually taken me long enough to get this done that
the busy season is over, so from here I can just make the rest of the episode like the
more typical ones that I've been doing.
Well, typical for me anyway.
Incidentally, Hacker Public Radio could really still use some more shows.
|
||||
Please record something.
|
||||
While you're doing that though, let me get back to this.
|
||||
Now then, what about everybody else besides MP3?
|
||||
It seems to be pretty common to assume that ID3 is the metadata format for all audio
|
||||
everywhere, so don't feel bad if you were under that impression.
|
||||
You wouldn't be the first person to try to cram an ID3 frame into an Ogg file.
|
||||
Heck, I did that myself once or twice before I knew better.
|
||||
In reality, besides MP3 and maybe some MP4 audio files, everybody else uses Vorbis comments.
Okay, not literally everybody, but pretty much any other kind of digital audio file that
you're likely to actually run into often online, including Opus, FLAC, Ogg Vorbis,
and Speex.
Unlike ID3, with only one specific exception that I know of, Vorbis comments are simple,
consistent, flexible, and even human readable.
According to the specifications, all Vorbis comments are made of printable text characters,
so no strange binary codes to deal with.
Heck, you can use grep to find files with Vorbis comment metadata, how cool is that?
The tag names are case-insensitive, too, so you don't have to worry about that either.
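Because the comments sit in the file as plain NAME=value text, a crude search over the raw bytes can find them, which is roughly what the grep trick does. The Python sketch below is an illustration under that assumption only: it is not a real parser, it can report false positives if the same bytes happen to occur in the audio data, and raw_contains_tag and the sample bytes are invented for the example.

```python
import re


def raw_contains_tag(data: bytes, name: str) -> bool:
    """Report whether 'NAME=' appears anywhere in the raw bytes, ignoring case."""
    pattern = re.compile(re.escape(name.encode("ascii")) + b"=", re.IGNORECASE)
    return pattern.search(data) is not None


# Made-up stand-in for the start of a file; a real check would read the file in binary mode.
sample = b"\x03vorbis\x00\x00\x11\x00\x00\x00Title=Example show"
print(raw_contains_tag(sample, "TITLE"))
print(raw_contains_tag(sample, "ARTIST"))
```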
|
||||
Of course, there's a couple of issues with this arrangement.
|
||||
For one thing, the vast flexibility means that you can name bits of metadata whatever
|
||||
the heck you want.
|
||||
You can imagine the mess if one site published their audio with a title contained in a field
called Name, and another in a field called Title, and another in a field called Song,
and so on.
|
||||
Oh, here's a brief pointless digression, speaking of Song.
|
||||
Am I the only one who gets irrationally annoyed when applications insist on referring
|
||||
to all audio files as Songs?
|
||||
You're listening to an audio file right now.
|
||||
Does it sound like I'm singing?
|
||||
Do you want me to sing?
|
||||
Okay then.
|
||||
Stop that, programmers.
|
||||
Sorry, where was I?
|
||||
Oh yeah, picking inconsistent names for metadata tags.
You've got the freedom to do this with Vorbis comments, of course, but there actually is
a published standard with officially recommended names for the most useful metadata, which
you should probably stick to for those fields so that software can more easily use it.
Hang on, I have a list here.
The published official Vorbis comment recommendations list includes:
|
||||
TITLE, for the name of the track, same as title for ID3.
VERSION, for when you have more than one track with the same title, like you might have two
different versions of Schubert's Ave Maria, and they both have the title Ave Maria, but
maybe one is also tagged with a version of Metalcore Remix.
ALBUM is for the name of the collection that the track came from, just like with ID3.
TRACKNUMBER, all one word, is for the track number on the album, or the episode in a podcast
series, or whatever.
ARTIST, again, just like ID3.
This is usually the name of the musician or band for music, though for classical music,
it should probably be the name of the composer, or for an audiobook, it would be the author
of the book.
PERFORMER is the field you use when the artist isn't necessarily who is speaking or
singing or whatever in the recording.
So the artist might be Franz Schubert, but the performer is, say, Justin Bieber.
Or you might have an audiobook with artist as Stephenie Meyer, title as Twilight, and performer
as Gilbert Gottfried.
COPYRIGHT is for the typical copyright notice, like Copyright 2013 Richard Solomon, or
something similar.
LICENSE is where you might put a link to a Creative Commons license that you're using,
or a phrase like "all rights reserved" if you're a fascist freedom hater.
ORGANIZATION is where you put the record label, or perhaps LibriVox for an audiobook,
or indeed, Hacker Public Radio for what you're listening to right now.
GENRE, like the field in ID3, except it's supposed to be an actual short human-readable text
description of whatever genre the audio is supposed to fit into, rather than some
relatively meaningless genre number.
DATE, for the recording date of the audio track, in a nice, rational, standard ISO 8601
year-month-day format.
LOCATION, for where the track was recorded, like the name of the recording
studio, or OggCamp 27, or my mom's basement.
CONTACT, for a URL, email address, or whatever
for contact information for the audio distributor.
In the case of the file you're listening to right now, it should probably be
http://hackerpublicradio.org, for example.
DESCRIPTION, which, like it says, is a place
for a description of the audio. Among other things, I think this is the appropriate place
to put text copies of show notes for podcasts.
|
||||
Note that comment and comments are not in the recommended field name list.
|
||||
So I think usually you'll want to put your comments in the description field, or maybe
|
||||
not.
|
||||
There's nothing wrong or invalid about using a field named comment or comments, and in
|
||||
fact a lot of people seem to use comment.
|
||||
It's just that any playback software that is strict about sticking to the official recommendations
|
||||
list will probably ignore them and may not display them.
|
||||
Or use both.
|
||||
It's not like a few extra bytes of text is going to kill your download.
|
||||
And finally, there's even an ISRC tag for an International Standard Recording Code number,
which appears to be a special tracking number that can be issued, for a fee, naturally,
from a central authority, and which seems to work for audio tracks kind of like an ISBN does
for books.
I've never seen this one used anywhere, but it's in the official Vorbis comment documentation,
and I suppose it might be used by old-school proprietary pay-to-listen sorts of businesses.
|
||||
Also documented are a few additional useful fields.
|
||||
There's an easy specification for chapter marks that supports up to 1,000 chapters per
file with tags named CHAPTER plus a three-digit number, and CHAPTER plus the same number plus NAME.
So for example, the beginning of the file might be the start of the first chapter, so
you might have CHAPTER000=00:00:00.000 and CHAPTER000NAME=Introduction.
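Here is a short Python sketch of generating that pair of tags for each chapter. It assumes the start time is written as hours, minutes, seconds, and milliseconds (HH:MM:SS.mmm) and that numbering starts at 000; chapter_tags and the second chapter in the sample are made up for illustration.

```python
def chapter_tags(chapters):
    """Turn (start_seconds, name) pairs into CHAPTERnnn / CHAPTERnnnNAME lines."""
    if len(chapters) > 1000:
        raise ValueError("three digits only leave room for 1,000 chapters")
    lines = []
    for index, (start, name) in enumerate(chapters):
        total_ms = round(start * 1000)
        hours, rest = divmod(total_ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        stamp = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
        lines.append(f"CHAPTER{index:03d}={stamp}")
        lines.append(f"CHAPTER{index:03d}NAME={name}")
    return lines


for line in chapter_tags([(0, "Introduction"), (754.5, "Vorbis comments")]):
    print(line)
```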
|
||||
There's also a CHAPTERxxxURL tag, with the same three-digit number, for links to chapter information
|
||||
stored online.
|
||||
This set of tags seems to be virtually identical to the human readable text that you feed
|
||||
to Matroska tools to cram the special binary Matroska chapter metadata structures into
|
||||
Matroska files, which I'll talk just a little bit about at the end here.
|
||||
On this specific subject, forgive me for harshing the Vorbis comments mellow by regressing
back to ID3 for a moment, but there actually is apparently a quote addendum unquote for
chapter support in ID3v2.3 and 2.4.
The specification seems to involve smushing a set of nested table of contents and chapter
structures into the ID3 header, each containing their own set of embedded ID3 tags.
|
||||
Trying to read the documentation for this and determine why they did it that way may
|
||||
give you the mental equivalent of irritable bowel syndrome.
|
||||
The good news is that I have yet to find any tag editors that support this monstrosity.
|
||||
Well, except for a special ID3 V2 chapter tool written in Java, quote maintained unquote
|
||||
by the BBC, and not updated since 2006.
|
||||
As far as I can tell, very little if any playback software supports using it anyway, so you
|
||||
shouldn't have to worry about it.
|
||||
For reference, this specification was published in 2005, half a decade after ID3 V2.4, and
|
||||
most player software still doesn't even support ID3 V2.4 yet, or probably ever, I suspect.
|
||||
My opinion is to stick with Vorbis-comment-using formats for support of chapter features,
or WebM if you must, or maybe MP4 using magic iTunes tags if Tim Cook is looking over
your shoulder or paying you.
Back to the happy land of Vorbis comments, there's a specification for ReplayGain for adjusting
track volumes using fields named REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,
REPLAYGAIN_ALBUM_GAIN, and REPLAYGAIN_ALBUM_PEAK, in a machine-parsable format that playback software
can use if it wants to.
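As a sketch of what a player might do with the track fields, the Python below turns them into a volume multiplier. It assumes the commonly seen value formats, a gain written in decibels such as "-6.50 dB" and a peak written as a plain decimal where 1.0 is full scale; neither format is spelled out in this episode, and playback_scale is a hypothetical name.

```python
def playback_scale(tags: dict, prevent_clipping: bool = True) -> float:
    """Return the factor to multiply samples by, based on the track gain tags."""
    raw_gain = tags["REPLAYGAIN_TRACK_GAIN"]
    # Assumption: the value looks like "-6.50 dB"; drop the unit before parsing.
    gain_db = float(raw_gain.lower().replace("db", "").strip())
    scale = 10 ** (gain_db / 20)  # decibels to a linear amplitude factor
    peak = float(tags.get("REPLAYGAIN_TRACK_PEAK", 0))
    if prevent_clipping and peak > 0:
        # Never boost so far that the loudest sample would exceed full scale.
        scale = min(scale, 1 / peak)
    return scale


print(playback_scale({"REPLAYGAIN_TRACK_GAIN": "-20.00 dB"}))
print(playback_scale({"REPLAYGAIN_TRACK_GAIN": "+12.00 dB",
                      "REPLAYGAIN_TRACK_PEAK": "0.5"}))
```

The album fields could be handled the same way when a player wants to keep the relative loudness of tracks within one album.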
|
||||
And while I've still not yet gotten around to doing the geotagging episode, I'll tease
it here a bit, because Vorbis comments seem to have the only documented standard for
geotagging of media files besides JPEG and TIFF images.
The field is called GEO_LOCATION, and the contents take the form decimal latitude,
semicolon, decimal longitude, and optionally another semicolon and elevation in meters.
This format has the benefit of being both easily machine parsable and human readable.
One other nice feature of the Vorbis comments specification that you should know about:
|
||||
You can and should use each field name as many times as is appropriate for each file.
|
||||
For example, each recording artist in an audio track should have their own artist tag
|
||||
in the file.
|
||||
If you have a recording of a collaboration between Slim Whitman, Celine Dion, Mel Tormé,
and Brian Johnson on a hip-hop album, you don't cram a single messy ARTIST=Slim
Whitman and Celine Dion and Mel Tormé and Brian Johnson field in there.
You put in four separate ARTIST entries, each one with one of those names.
That way, if you freaking love Mel Tormé, you can easily find all of your recordings with
Mel Tormé in them just by looking for ARTIST=Mel Tormé and/or PERFORMER=Mel Tormé
without having to look for the name buried among a bunch of other names in
a single field.
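A minimal Python sketch of why repeated fields make searching easy: it collects NAME=value lines into lists keyed by an upper-cased name, so case differences vanish and every repeat is kept. It assumes the comments have already been pulled out of the file as text lines; parse_comments and the sample title are invented for the example.

```python
from collections import defaultdict


def parse_comments(lines):
    """Group NAME=value lines into {NAME: [values...]}, ignoring name case."""
    tags = defaultdict(list)
    for line in lines:
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not a NAME=value line
        tags[name.upper()].append(value)
    return dict(tags)


comments = [
    "TITLE=Example Collaboration",
    "ARTIST=Slim Whitman",
    "artist=Celine Dion",
    "Artist=Mel Tormé",
    "ARTIST=Brian Johnson",
]
tags = parse_comments(comments)
print(tags["ARTIST"])
print("Mel Tormé" in tags["ARTIST"])
```

Finding every recording with a given name is then an exact match against a list, with no need to pick a name out of one long string.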
|
||||
ID3, on the other hand, mandates that only one of each kind of text field can exist.
|
||||
And if you have multiple artists, you cram them all into the same text string, separated
|
||||
by forward slashes or nulls if you're using V2.4.
|
||||
Similarly, if your file is, say, an audio tour guide recording or a travelogue, you should
put multiple GEO_LOCATION tags in the metadata, one for each location mentioned
in the audio.
Then, if you wanted to automate a search through your audio files, you could find anything
that refers to Nowhere, Oklahoma, for example, just by looking for GEO_LOCATION
tags near latitude 35.1592 and longitude -98.4422.
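That kind of search could be sketched in Python as below. The parsing follows the latitude;longitude;optional-elevation form described earlier. The "near" test is my own choice for the example: a great-circle (haversine) distance with an arbitrary 10 km radius. The function names are hypothetical.

```python
import math


def parse_geo_location(value: str):
    """Split 'lat;lon' or 'lat;lon;elevation' into floats (elevation may be None)."""
    parts = [float(part) for part in value.split(";")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected GEO_LOCATION value: {value!r}")
    elevation = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], elevation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def mentions_place(geo_values, lat, lon, radius_km=10.0):
    """True if any GEO_LOCATION value lies within radius_km of the given point."""
    for value in geo_values:
        tag_lat, tag_lon, _ = parse_geo_location(value)
        if distance_km(tag_lat, tag_lon, lat, lon) <= radius_km:
            return True
    return False


print(mentions_place(["51.5;-0.12", "35.16;-98.44;400"], 35.1592, -98.4422))
```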
|
||||
There, see?
|
||||
Vorbis comments, all simple, all human readable, all pretty intuitive.
Well, like I warned you, except for one thing. Attached pictures, more commonly called
album art, are actually kind of a pain.
It's not actually any worse than ID3, but compared to the simplicity of the rest of
Vorbis comments, it's a bit of a nuisance.
|
||||
There are two reasonable excuses for this.
|
||||
One is just that since a digital picture is obviously not text, unless maybe you convert
|
||||
it to ASCII art first, there just plain is no way to store it as a piece of simple human
|
||||
readable metadata.
|
||||
The second reason is that if you were doing things as properly as possible, it really
|
||||
shouldn't be in the metadata anyway.
|
||||
See, if you think about it, a picture of an album cover or any other attached picture
|
||||
isn't really mere metadata any more than an audio track is mere metadata for a movie's
|
||||
video track.
|
||||
Attached still images are really their own independent pieces of data that just happened
|
||||
to be associated with the audio track.
|
||||
The most properly correct way to implement this would seem to be a separate stream in
|
||||
the file with the attached pictures multiplexed in with the audio, just as the audio and subtitle
|
||||
text should be their own separate streams multiplexed with a video stream.
|
||||
The problem is, there is no specification that I can find for streams of, quote, series
of independent still JPEG and PNG images, unquote, in Ogg files, or MP3 for that matter.
In any case, MP3 has been doing attached pictures as metadata for so long that it's kind
of stuck as the way it's done.
So the specification for attaching pictures to Ogg Vorbis, Speex, and Opus files involves
encoding the binary image data to printable text characters so that it can be included
in Vorbis comments, just like email programs have to do with email attachments.
Something like five or ten years ago, a few people were doing this with an obsolete field
called COVERART, with the contents of the field just being the contents of a base64-encoded
JPEG or PNG file.
Don't do this, at least if you expect people to ever see the cover art.
From what I can tell, pretty much nobody ever implemented using that field, and it's
long since been replaced by an officially documented, somewhat more informative structure.
|
||||
Here's where it gets a little obnoxious.
|
||||
The field name for the attached pictures actually has the unintuitive name
METADATA_BLOCK_PICTURE.
And the contents of those fields are actually a complete base64-encoded data structure
that includes image width and height, MIME type, an optional description of the image,
the same picture type designations that ID3's attached picture frames use, along with the
actual image data.
You can either thank or blame FLAC for this one, depending on how you like FLAC.
I mentioned that FLAC uses Vorbis comments for its metadata.
For all of the audio metadata I've talked about up to this point, that's true, but
not attached pictures.
Unlike Ogg Vorbis, Speex, and Opus, FLAC files aren't actually in Ogg containers,
but are their own special file format.
That format actually includes a specific metadata block, structured to be very similar to the
attached picture frames in MP3 files, and it just happens to be called
METADATA_BLOCK_PICTURE.
For Opus, Ogg Vorbis, and Speex, which don't have a special metadata block just for
attached pictures, what happens is they build this same FLAC data structure, then base64
encode it to turn it into text that can be shoved in as a valid Vorbis comment.
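For anyone doing this by hand, here is a hedged Python sketch of building such a comment. It assumes the FLAC picture block layout as I understand it: 32-bit big-endian integers for the picture type, MIME string length, description length, width, height, color depth, number of indexed colors, and image data length, with the MIME string, description, and image bytes placed after their respective lengths. The function name, the default color depth, and the fake image bytes are made up for the example; it does not inspect the image, so the caller supplies width and height.

```python
import base64
import struct


def metadata_block_picture(image_data: bytes, mime: str, width: int, height: int,
                           depth: int = 24, picture_type: int = 3,
                           description: str = "") -> str:
    """Build a METADATA_BLOCK_PICTURE comment line from raw image bytes."""
    mime_bytes = mime.encode("ascii")
    description_bytes = description.encode("utf-8")
    block = struct.pack(">II", picture_type, len(mime_bytes)) + mime_bytes
    block += struct.pack(">I", len(description_bytes)) + description_bytes
    # Width, height, bits per pixel, and 0 indexed colors (not a palette image).
    block += struct.pack(">IIII", width, height, depth, 0)
    block += struct.pack(">I", len(image_data)) + image_data
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")


# Picture type 3 is the front cover; the bytes here stand in for a real JPEG.
comment = metadata_block_picture(b"not-a-real-jpeg", "image/jpeg", 500, 500,
                                 description="Front cover")
print(comment[:60])
```

Reading the picture back is the reverse: split off the field name at the first equals sign, base64-decode the rest, and unpack the same fields in the same order.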
|
||||
The data structure involved is pretty well documented in the FLAC documentation, and
these days most people don't need to worry about it unless their encoder doesn't have
a built-in option to generate it, or they're adding it to the metadata by hand, or from
a simple command line script.
I actually wrote an implementation of this in PHP of all things, which I can share with
anyone who wants it.
I've also seen an implementation done in Perl.
Anyway, this gives the Vorbis comment field for attached pictures a funny name, and is
in kind of a hard-to-mess-with format for people doing it by hand.
|
||||
The good news is that if someone writes a media player that understands album art in
FLAC files, adding support for album art in Opus, Ogg Vorbis, Speex, or even Ogg Theora
video files for that matter, should hypothetically be pretty simple, since other than having to
pass the data through base64 decoding to turn it back into a binary structure, you then
pass that directly to the already existing FLAC album art code to get the pictures out.
More good news for those of us switching to the superior new Opus format:
the command line Opus encoder now has a --picture option that works virtually
identically to the one in the FLAC encoder, with the same argument structure, which at
least makes it pretty easy to attach pictures to Opus files at encoding time.
Ogg Vorbis users still need to deal with this by either pre-generating a
METADATA_BLOCK_PICTURE Vorbis comment text to include as a command line option for
oggenc, or attaching the pictures after the fact using a GUI tag editor or a script based
on something like TagLib or Mutagen.
|
||||
To wrap up, there are two more audio file formats you might run into somewhat regularly
|
||||
online that you might want metadata for.
|
||||
WAV files are still more or less the lowest common denominator for audio files, usually
being lossless PCM audio, and being widely supported, and I guess pretty simple in structure.
There are actually standards for metadata in WAV files, but I haven't managed to dig
up any clear documentation for this yet.
I know it's out there somewhere, I just haven't got it myself.
Apparently, Audacity actually embeds the limited set of metadata that it supports
as both a standard INFO chunk, whatever that is, documented for WAV files, and as an
ID3 tag in some way when it saves WAV files.
The other format you might some day run into
for audio files is WebM.
|
||||
WebM is a specific implementation of the Matroska file format.
|
||||
To me, Matroska metadata looks even worse than ID3.
|
||||
Like ID3, it seems to be made up of about 100 rigidly defined tag names, of which WebM
|
||||
looks to support about 70.
|
||||
The metadata is heavily video-centric, and seems to assume that Matroska files will
|
||||
contain movies.
|
||||
Among the metadata tags for WebM and Matroska are things like special little fields designated
|
||||
for choreographer, costume designer, director of photography, screenplay writer, assistant
|
||||
director, and so on.
|
||||
There's even a character tag that isn't actually for the file as a whole, but is supposed
to be buried inside an actor tag, which I guess makes the character tag a sort of meta-metadata.
I imagine Peter Sellers movies in WebM form must have some pretty messy metadata.
The whole thing seems to be object-oriented, so there are several other cases where tags
are supposed to be buried inside other tags' data structures as well.
Anyone who isn't in one of the special collection of video production roles that the Matroska
standard decided to include has to settle for getting crammed into a generic "thanks to"
tag, kind of like ID3 and its involved persons structure,
dooming the dolly grip, clapper loader, best boy, and gaffers to be second-class citizens.
For Matroska files, the standards also say all this stuff should be tacked onto the
end of the file like ID3v1.
|
||||
Apparently the idea is that you can then rewrite the metadata without having to rewrite
|
||||
the whole file.
|
||||
On the other hand, that makes it not so great for streaming media,
since the player won't get the title, album, artist, executive producer, genre, and so
on, until after the stream is finished and it's too late to display that information anyway,
unless it's buffering the whole file before it starts playing.
|
||||
Lastly, as far as I can tell, WebM doesn't actually support attached pictures at all, though
|
||||
the broader Matroska standard does in a limited way.
|
||||
The standard has room for large and small versions of a sort of banner graphic and large
|
||||
and small versions of a more typical album art graphic for a total of four images.
|
||||
For audio, you probably won't have to deal with this really.
|
||||
The only place I've ever seen WebM audio files, aside from ones I've made myself for
testing, is in GNU MediaGoblin, which as far as I can tell only uses that format because
they originally implemented audio only as a kind of afterthought to video, so their audio
for the project is just a video file without video.
I assume that once they've implemented multi-format support, the default for audio will end
up being Opus or Ogg Vorbis, and then nobody will really be using WebM for anything but
Internet TV.
I'm kind of waiting for Opus output from MediaGoblin before I start trying to use it
seriously, at which point it will probably deserve its own HPR episode.
|
||||
To finish off this part, should I mention special Microsoft Windows Media?
|
||||
Hmm, no, nobody should mention Windows Media.
|
||||
Oh, alright, just quickly.
|
||||
If you're unlucky, you might run into .asf or .wma audio files.
|
||||
The situation with ASF and WMA and WMV is kind of like the situation with MP4 and M4A
|
||||
and M4V files.
|
||||
Several of these Windows Media files are really just ASF format.
|
||||
The metadata for these seems pretty limited.
|
||||
There are five different metadata, quote, objects, unquote, which can contain different
|
||||
kinds of metadata.
|
||||
The so-called content description object is for the very small set of predefined metadata
|
||||
fields that the ASF format defines.
|
||||
These are title, author, copyright, description, and rating, with up to 64 kilobytes of text
|
||||
for each field.
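
A minimal sketch of checking text against the per-field limit described above. The field names come from the list just given; the encoding (UTF-16LE with a two-byte terminator) and the exact ceiling of 65,535 bytes are assumptions based on a reading of the ASF specification, and the function names are made up for illustration.

```python
# Sketch only: the storage details below are assumptions, not taken
# from the episode. ASF is assumed to store each predefined field as
# null-terminated UTF-16LE text with a 16-bit byte length.
ASF_FIELDS = ("title", "author", "copyright", "description", "rating")
MAX_FIELD_BYTES = 0xFFFF  # largest value a 16-bit length can hold


def encoded_size(text):
    """Bytes needed for text as UTF-16LE plus a two-byte terminator."""
    return len(text.encode("utf-16-le")) + 2


def oversized_fields(metadata):
    """Return the names of predefined fields whose text would not fit."""
    return [
        name
        for name in ASF_FIELDS
        if encoded_size(metadata.get(name, "")) > MAX_FIELD_BYTES
    ]
```

Anything that is not one of these five fields would have to go into one of the other metadata objects instead.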
|
||||
The album art and URLs for copyright warnings stored online go in the so-called content
|
||||
branding object, which seems to be limited to a single banner image, if I'm interpreting
|
||||
the specification correctly.
|
||||
The other three objects are the extended content description object, which seems to be where
|
||||
you put any random other metadata that you want that isn't in the approved metadata
|
||||
field list for the content description object, and a metadata object, which seems to be
|
||||
just an extended content metadata object that can refer to a specific stream in an ASF
|
||||
file and not just the whole file, and finally, a metadata library object, whose description
|
||||
makes my head hurt, but as far as I can tell is for cramming anything else that doesn't
|
||||
belong in any of the other objects somehow.
|
||||
I get the impression that all of these end up looking like Windows registry entries
|
||||
in the end.
|
||||
The good news is that in my experience, the only people who make much use of .wma files
|
||||
are a few proprietary music, quote, selling, unquote, businesses, who offer it as one
|
||||
option along with MP3 and other formats, or people who seem to have apparently gotten
|
||||
a seemingly sweet deal from Microsoft back in the early to mid-2000s to use Windows
|
||||
media systems for streaming audio and who haven't been wanting to spend any money upgrading
|
||||
to something modern instead for nearly a decade, and if they offer anything else, there's
|
||||
a fair chance it's RealPlayer files.
|
||||
Remember RealPlayer?
|
||||
You do?
|
||||
Oh, I'm sorry.
|
||||
Dang, you're old.
|
||||
Therefore, you probably won't see too much of this online either and won't need to deal
|
||||
with it often, at least not for audio, and even when you do, you probably won't actually
|
||||
have much cause to mess with the metadata.
|
||||
And if you do, it's probably because you're a bad person and this is your punishment.
|
||||
Repent, sinner!
|
||||
Yeah, okay, that's probably enough of an introduction to the subject.
|
||||
How about I round off this episode with some suggestions and wrap it up with some tips
|
||||
on using and editing metadata?
|
||||
My first and probably most important suggestion would be to actually use the freaking metadata.
|
||||
Yeah, I'm looking at you, Linux Voice podcast and the Opus feed, among others.
|
||||
When I am elected supreme emperor of internet audio, it will be mandatory to use at the very
|
||||
least the basic fields that most audio players will display, like the title, artist and
|
||||
quote album, unquote.
|
||||
I suggest to you that putting audio online with no metadata is basically a form of trolling.
|
||||
It's like when someone posts a really awesome picture somewhere online saying, wow, check
|
||||
out this awesome place.
|
||||
But then all the metadata has been stripped out by the dorks at the image hosting service
|
||||
so you can't even tell when the picture was taken, let alone where this awesome place
|
||||
actually is.
|
||||
And you're basically being asked to beg the poster to actually tell you where the place
|
||||
is.
|
||||
It's like people that go on some social media network and post something vague like, wow,
|
||||
that was amazing.
|
||||
My life has now changed forever.
|
||||
And then you have to digitally prostrate yourself before them and beg them to tell you what
|
||||
it was that was actually amazing.
|
||||
And then after some irritating coyness, you find out they were just raving about the new
|
||||
brand of instant ramen they just ate for lunch.
|
||||
And you have to spend all day hunting them down so you can beat them repeatedly with
|
||||
a sweaty gym sock stuffed with used cat litter for wasting your time.
|
||||
Well, come on, I know I'm not the only one who fantasizes about that now and then.
|
||||
Anyway, ideally you should include as much relevant metadata as possible.
|
||||
That includes, I beg of you, any relevant geolocation data.
|
||||
Where exactly was OggCamp 13?
|
||||
If the interviews had geo underscore location tags, I could look it up on OpenStreetMap.
|
||||
Same goes for discussions of hacker spaces, particularly good stores or restaurants you
|
||||
might mention, the locations of dead drops or geocaches and so forth.
|
||||
One could even, for example, have a promo for a Linux fest like, say, Northeast Linux
|
||||
Fest 2014 added to one's podcast and then include a geo underscore location tag with the
|
||||
location of the venue for that.
|
||||
Once we find out what that venue will be, hint hint.
|
||||
As far as cover art goes, my thinking on this has completely changed over the last couple
|
||||
of years.
|
||||
Since the mid-1990s, when it started showing up in MP3, I always thought album art was
|
||||
silly, frivolous, space-wasting fluff.
|
||||
I mean, think about it, do you insist on staring at the CD case while you're listening
|
||||
to a CD?
|
||||
For those of you young people who may be confused, CDs were a DVD-like physical medium that we
|
||||
old people used to use to extract data to make MP3s from instead of just downloading
|
||||
them.
|
||||
Anyway, I never really saw the point of it, but in the last couple of years I've found
|
||||
I actually do prefer to have it.
|
||||
Even in its ordinary, expected use of actually having a picture of the physical medium's
|
||||
packaging, it's kind of nice as a quick visual reminder of which collection the audio I'm
|
||||
listening to came from.
|
||||
Of course, even more interesting might be the extraordinary, unexpected uses.
|
||||
If you're recording a podcast describing how to make something, some bonus illustrations
|
||||
of the process included as attached pictures would be a nice bonus for listeners interested
|
||||
enough to look for them.
|
||||
If you have audio from a specific location, or about a specific location, you could benefit
|
||||
everyone by including an image of a map as an attached picture, or a geotagged picture
|
||||
of the location.
|
||||
If you're doing a podcast for aquarium owners, you might even have a legitimate cause
|
||||
to use that bright colored fish attached picture type.
|
||||
If you want to mess with the NSA, you could even record a brief audio message, then encode
|
||||
that as a low-bitrate Opus or Codec2 file, then steganographically embed that file into
|
||||
an image and include that image as an attached picture.
|
||||
So in short, the feature is too much fun to ignore, and the more people use it, the more
|
||||
playback and tagging software will start supporting it correctly.
|
||||
Except for attached pictures, the amount of data an additional tag of metadata adds
|
||||
to the file is negligible, and worrying about wasting space with most metadata is like
|
||||
worrying about wasting film when using a digital camera.
|
||||
A well-designed set of attached pictures won't bloat the file too much either if you're
|
||||
reasonably careful, and should definitely be included wherever they may add some usefulness
|
||||
to the file.
|
||||
Anything you think someone might be interested in knowing about the recording later, please
|
||||
include it.
|
||||
I know at least one person who will happily examine audio metadata for interesting information
|
||||
that the audio player doesn't necessarily shove in my face, and I imagine I can't be
|
||||
the only one.
|
||||
Also, if you can reasonably identify parts of your audio that would make good obvious
|
||||
times when the subject changes or something important happens, consider including some
|
||||
chapter markings as a reward for player software that uses them and to encourage the ones
|
||||
that don't to start.
|
||||
With the attached picture itself, if you care what Apple thinks, iTunes apparently
|
||||
uses 600x600 as the standard size for cover art images, though from what I've read it sounds
|
||||
like you can use other sizes as well.
|
||||
Personally, unless you have a good reason, I'd recommend sticking to around that size
|
||||
or smaller just so you can tell what the images might look like on screens with lower resolution,
|
||||
but I wouldn't worry too much about keeping them square.
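
As a small worked example of the sizing advice above, this sketch computes dimensions that fit inside a 600-pixel box without forcing the image to be square. The function name and the choice to leave smaller images untouched are illustrative assumptions, not part of any tool mentioned in the episode.

```python
def fit_within(width, height, max_side=600):
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved, and images that already fit are
    returned unchanged rather than enlarged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Multiply before dividing so simple ratios come out exact.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

For instance, a 1200x800 photo would come out as 600x400, while a 300x200 thumbnail would be left alone.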
|
||||
Use them as JPEG or PNG and they'll fit into ID3 or Vorbis comment album art just fine.
|
||||
One warning, so far many tag editors I run into that support cover art at all only support
|
||||
a single cover art image, which is usually set by default to picture type 3, that is,
|
||||
front cover.
|
||||
If you want to include multiple images, you might find it easier to do it at encoding
|
||||
time.
|
||||
The command line encoders for FLAC and Opus allow you to include as many attached pictures
|
||||
as you want as switches.
|
||||
The Ogg Vorbis encoder doesn't, but like the Opus encoder, the current Ogg Vorbis encoder
|
||||
accepts FLAC files directly for input, and it will transfer the FLAC metadata over
|
||||
to the Ogg Vorbis file it creates, including the attached pictures from what I can tell.
|
||||
Therefore, if you either get or make FLAC files to work from as your originals and put
|
||||
all of the metadata in there, you can use those FLAC files as input to generate Opus and
|
||||
Ogg Vorbis files without worrying about the metadata any further.
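
The workflow just described could be sketched roughly as follows. This only builds the command lines as lists and does not run anything; the file names and helper names are hypothetical, and the options used (flac's -o, --tag and --picture, oggenc's -o, and opusenc's input and output arguments) should be checked against the versions installed on your system.

```python
def picture_spec(path, picture_type=3):
    """Build a flac picture specification for one image file.

    The format is TYPE|MIME|DESCRIPTION|WIDTHxHEIGHTxDEPTH|FILE; the
    middle parts are left blank here so flac works them out itself.
    """
    return f"{picture_type}||||{path}"


def flac_master_command(wav_path, flac_path, tags, pictures):
    """Command to encode a FLAC master carrying all of the metadata.

    tags is a list of (name, value) pairs and pictures is a list of
    (image_path, picture_type) pairs, one entry per attached picture.
    """
    command = ["flac", "-o", flac_path]
    for name, value in tags:
        command.append(f"--tag={name}={value}")
    for image_path, picture_type in pictures:
        command.append(f"--picture={picture_spec(image_path, picture_type)}")
    command.append(wav_path)
    return command


def derived_commands(flac_path, stem):
    """Commands that make Opus and Ogg Vorbis files from the master."""
    return [
        ["opusenc", flac_path, stem + ".opus"],
        ["oggenc", flac_path, "-o", stem + ".ogg"],
    ]
```

Each list could be handed to subprocess.run once you are happy with it, so the tags and pictures only ever need to be typed in once, for the FLAC master.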
|
||||
For MP3, the only encoder I am familiar with at all is the LAME encoder, which seems
|
||||
to produce pretty good quality sound by MP3 standards, but appears to be limited to a single
|
||||
attached picture on the command line, speaking of MP3 limitations.
|
||||
Most of the common information that people put in Vorbis comments should have a reasonably
|
||||
obvious equivalent for MP3, so you shouldn't have any trouble figuring out which special
|
||||
little ID3 field to put the title and artist and album and so on in, if you have to deal
|
||||
with MP3 files.
|
||||
A table of mappings between ID3 and Vorbis comments would probably be really handy, but even
|
||||
if I had such a thing ready, this episode would get really, really tedious, like even more
|
||||
than it already is, if I tried to read it out to you.
|
||||
So for now, just look it up online if you need to, and I'll try to put up a post at dogphilosophy.net
|
||||
with a table later.
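
In the meantime, here is a very small and deliberately incomplete sketch of what such a mapping might look like in code. The frame IDs are the usual ID3v2.3 ones for these common fields; this is not the table promised above, and the function name is made up for illustration.

```python
# Partial mapping from common Vorbis comment names to ID3v2.3 frame IDs.
# Only a handful of well-known fields are listed.
VORBIS_TO_ID3V23 = {
    "TITLE": "TIT2",
    "ARTIST": "TPE1",
    "ALBUM": "TALB",
    "TRACKNUMBER": "TRCK",
    "GENRE": "TCON",
    "DATE": "TYER",  # ID3v2.3 keeps only the year in this frame
    "COMPOSER": "TCOM",
    "COPYRIGHT": "TCOP",
}


def id3_frame_for(comment_name):
    """Return the ID3v2.3 frame ID for a Vorbis comment name, or None.

    Vorbis comment names are case-insensitive, so the lookup upper-cases
    the name first. None means there is no obvious predefined frame.
    """
    return VORBIS_TO_ID3V23.get(comment_name.upper())
```

Anything that comes back as None needs a home somewhere else in the ID3 tag.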
|
||||
That will only take care of the most common tags, though, so what about other potentially useful
|
||||
metadata for MP3, like geolocation?
|
||||
It turns out I was slightly lying when I said that text fields in ID3 were limited to
|
||||
one each.
|
||||
There's actually a special user-defined text field in ID3 designated TXXX.
|
||||
No, that's not where the audio codec-themed erotic fanfiction goes, but wait, come to think
|
||||
of it, if you had such a thing and you wanted to embed it in MP3, that actually is where
|
||||
it would go.
|
||||
What I mean is, that's not why the XXX is in there.
|
||||
Anyway, the data structure for the TXXX field has two parts, a string for the name or description
|
||||
of the text that you're putting in it, and the text string itself.
|
||||
The specification does not allow multiple TXXX tags with the same description, but you
|
||||
can include as many separate TXXX tags with different descriptions as you want.
|
||||
This makes it an obvious place to include Vorbis comments that can't readily be pigeonholed
|
||||
into the pre-existing ID3 fields.
|
||||
I propose that for this purpose, the description part should be used for a Vorbis comment tag
|
||||
name, while the text part should include every relevant Vorbis comment with that tag name.
|
||||
For a useful example, ID3 doesn't support geotagging, so instead, put in a TXXX frame with
|
||||
the description, GEO underscore location, and the text contents of the tag would be GEO
|
||||
underscore location equals 42.4347571, semicolon minus 83.9849477, semicolon 270, or whatever
|
||||
coordinates are relevant.
|
||||
If there is more than one, just stick a carriage return between them so that each geo-location
|
||||
equals whatever entry has its own line in the same text string, at least that's how
|
||||
I'd do it.
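
The proposal above might be sketched like this: group Vorbis-style comments by tag name, join entries that share a name into one text string, and wrap each group in a TXXX frame. The function names and the sample coordinates in the usage note are made up, the separator defaults to a newline as one reading of "carriage return", and the byte layout is the ID3v2.3 frame layout as I understand it (ISO-8859-1 text only), so treat it as an illustration rather than a finished tagger.

```python
import struct


def group_for_txxx(comments, separator="\n"):
    """Turn (name, value) comments into (description, text) pairs.

    Comments sharing a tag name end up in one pair, each on its own
    line as NAME=value, because descriptions have to be unique.
    """
    grouped = {}
    for name, value in comments:
        key = name.upper()
        grouped.setdefault(key, []).append(f"{key}={value}")
    return [(key, separator.join(lines)) for key, lines in grouped.items()]


def txxx_frame(description, text):
    """Build the raw bytes of one ID3v2.3 TXXX frame.

    Body: encoding byte 0 (ISO-8859-1), the description, a zero byte,
    then the text. Header: frame ID, 4-byte big-endian body size, and
    two flag bytes left at zero.
    """
    body = (
        b"\x00"
        + description.encode("latin-1")
        + b"\x00"
        + text.encode("latin-1")
    )
    return b"TXXX" + struct.pack(">IH", len(body), 0) + body
```

Calling group_for_txxx on two GEO_LOCATION comments, say with the placeholder values 12.5;-45.25;100 and 13.0;-46.0;90, gives a single pair whose text holds both entries, ready to pass to txxx_frame.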
|
||||
For editing metadata, say that three times fast, after the encoding is done, I usually
|
||||
use Kid3, which as of the current 3.0 version supports Opus, as well as Ogg Vorbis,
|
||||
FLAC, MP3, and several other formats, in addition to now including a command line version
|
||||
that could be used from scripts.
|
||||
I don't use Windows or Mac systems, but Kid3 is available for them as well, so I'd recommend
|
||||
giving it a try.
|
||||
So far, it seems to support pretty much every feature of ID3v2.3 and Vorbis comments
|
||||
that you might want, with the sole exception of multiple attached pictures.
|
||||
If you're on Linux, it'll almost certainly be in your distribution's repository.
|
||||
If not, you can get it from kid3.sourceforge.net.
|
||||
On Linux officially and apparently unofficially on at least Mac OS and possibly Windows, I
|
||||
can also recommend puddletag, which does appear to fully and properly support multiple
|
||||
attached pictures, and also has up to date file format support.
|
||||
puddletag is a little more awkward to use for individual files, but it has a nice interface
|
||||
for editing whole directories of files at a time.
|
||||
GNOME users on Linux may be familiar with a program called EasyTAG, but at least as
|
||||
of late 2013, I can't really recommend it unless you don't edit anything but MP3.
|
||||
When I looked, it seemed like their Ogg support hadn't been updated in a decade.
|
||||
It's still trying to use a non-standard set of cover art tags for attached pictures.
|
||||
They still don't support Opus, and glancing at the source code and a quick test made
|
||||
it look like they might only support a small specific set of basic Vorbis comment tags.
|
||||
It also appears to no longer be cross-platform, though there was apparently a Windows version
|
||||
many years ago.
|
||||
Try puddletag; it looks like it has a similar interface to what EasyTAG seems to do.
|
||||
To finish up, here's a collection of command line tools I've found that may be of use to
|
||||
you when dealing with audio metadata.
|
||||
I already mentioned the existence of kid3-cli.
|
||||
For MP3 files, I'll mention mpg123-id3dump, which comes with the mpg123 command line audio
|
||||
player, and like the name implies, it's used to extract id3 metadata, including attached
|
||||
pictures.
|
||||
Also potentially handy is id3ted, which seems to be able to extract virtually any
|
||||
id3 tag, and can add or edit most of the useful ones, including adding attached pictures,
|
||||
though it's hard-coded to tag them all as front cover.
|
||||
The vorbiscomment utility from the vorbis-tools package can be used to add or edit tags
|
||||
in Ogg Vorbis files, though you'll have to generate the metadata underscore block
|
||||
underscore picture tag text yourself since it doesn't handle them.
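
Generating that tag text by hand could look something like the sketch below. It packs a FLAC-style picture block (picture type, MIME type, description, dimensions, then the image bytes, all with big-endian 32-bit fields) and base64-encodes it. The function name and parameters are made up for illustration, and the caller is assumed to know the image's real width, height, and colour depth.

```python
import base64
import struct


def metadata_block_picture(image_data, mime_type, width, height, depth,
                           picture_type=3, description="", colors=0):
    """Return the base64 text for a METADATA_BLOCK_PICTURE comment.

    picture_type 3 is "front cover"; colors is the palette size for
    indexed images and 0 otherwise.
    """
    mime = mime_type.encode("ascii")
    desc = description.encode("utf-8")
    block = b"".join([
        struct.pack(">I", picture_type),
        struct.pack(">I", len(mime)), mime,
        struct.pack(">I", len(desc)), desc,
        struct.pack(">IIII", width, height, depth, colors),
        struct.pack(">I", len(image_data)), image_data,
    ])
    return base64.b64encode(block).decode("ascii")
```

The resulting string would then be added as the value of a METADATA_BLOCK_PICTURE tag, one tag per attached picture, using whichever tag editor you prefer.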
|
||||
The same package includes the ogginfo utility, which displays Ogg Vorbis metadata.
|
||||
The opus-tools package includes an encoder and decoder, as well as the opusinfo utility,
|
||||
which, like the ogginfo utility, displays audio metadata for Opus files.
|
||||
This one will verify attached pictures, but doesn't currently dump them.
|
||||
Honorable mention goes to the ExifTool utility, which is mostly used for digital photograph
|
||||
metadata, but is also able to display metadata from pretty much every audio format I've mentioned
|
||||
except for Opus.
|
||||
Okay, one last thing.
|
||||
If you'll forgive me jumping mental tracks one last time, as far as I can tell, none of
|
||||
the web browsers have any provision for handling or displaying audio metadata.
|
||||
No matter how well they support the HTML5 audio tag otherwise, no, not even Mozilla Firefox.
|
||||
That means that for playing within a web browser, if you want to have the audio metadata
|
||||
shown, you have to insert a copy of the decoded metadata somewhere else in the web page,
|
||||
which kind of defeats the purpose of having the metadata attached to the audio file the
|
||||
way it's supposed to be in the first place.
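
For what it's worth, the workaround described here, putting a copy of the decoded metadata into the page itself, might be sketched like this. It assumes the tags have already been read out of the file by some other tool, and the function name and the choice of a definition list are purely illustrative.

```python
from html import escape


def metadata_to_html(tags):
    """Render (name, value) tag pairs as an HTML definition list.

    Values are escaped so that tag text containing characters such as
    & or < cannot break the surrounding page.
    """
    rows = [
        f"  <dt>{escape(name)}</dt><dd>{escape(value)}</dd>"
        for name, value in tags
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"
```

The output would sit next to the audio element in the page, duplicating what is already inside the file.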
|
||||
The same goes for video, incidentally, but whatever, we're talking about audio today.
|
||||
If anybody out there has any contacts at Mozilla, is there any chance you could get this going?
|
||||
I specify Mozilla because they're probably the only organization that cares enough to
|
||||
bother.
|
||||
Google can't even get Opus support live by default after a year and a half, and probably
|
||||
wouldn't bother with this unless they could somehow make you go through Google Plus to
|
||||
get to it.
|
||||
Microsoft seems like it can't innovate at all without having a battle to the death between
|
||||
at least two departments, and then their legal department determining that the alleged
|
||||
innovation by the survivors would be useful for suing people.
|
||||
And Apple firmly denies the existence of the world beyond iTunes, and if iTunes doesn't
|
||||
display it then you don't need to know it, so sit down and shut up and look at the
|
||||
pretty colors.
|
||||
Help me, Mozilla Firefox, you're my only hope.
|
||||
Okay, for those of you just tuning in, I've just been talking for whatever, about metadata
|
||||
for audio files you're likely to find on the internet, and you just missed it.
|
||||
So here it is again.
|
||||
MP3 files usually use a metadata format called ID3 version 2.3, which is an awful, fussy
|
||||
micromanaged sort of format, but very common so you'll probably run into it a lot.
|
||||
FLAC, Opus, Ogg Vorbis, and Speex all use Vorbis comments, which are simple and awesome
|
||||
and all the cool people use it, and you should too, unless you want to be uncool, and probably
|
||||
even then.
|
||||
You should tag all of your audio files with as much relevant metadata as you can for
|
||||
the betterment of all humanity, or at least the good parts of humanity, including attached
|
||||
pictures and geolocation data, and you should either put all the metadata in at encoding
|
||||
time, or you can use Kid3, puddletag, or various other tag editors to add or change tags
|
||||
later, and there are some handy command line utilities out there for reading and updating
|
||||
various forms of audio metadata as well.
|
||||
Also, why the foop can't we see the metadata in the audio that web browsers play?
|
||||
Thanks for listening.
|
||||
We hope this edition of Hacker Public Radio has provided both entertainment and education
|
||||
in exchange for your valuable listening time, but that's not all.
|
||||
After all of this information, you'd probably like some examples, right?
|
||||
Well, has Hacker Public Radio got a deal for you?
|
||||
That's a rhetorical question.
|
||||
Yes, Hacker Public Radio has a deal for you.
|
||||
This very file that you're listening to right now has been stuffed full of top quality,
|
||||
all natural, organically grown, artisanal metadata, handpicked by Hacker Public Radio specialists.
|
||||
You can use the tools mentioned in this episode, or any other decent metadata handling program,
|
||||
to examine this file for ideas on how you might use or abuse the technology for your own
|
||||
amusement.
|
||||
If you're interested in still more stuff in later episodes, I've actually started keeping
|
||||
a running list of random, potentially upcoming topics I'm thinking of doing future episodes
|
||||
on, plus a few that I'm already working on, at hpr.dogphilosophy.net, so you're welcome
|
||||
to stop by and comment on topics that might interest you.
|
||||
The End.
|
||||
You have been listening to Hacker Public Radio at HackerPublicRadio.org. We are a
|
||||
community podcast network that releases shows every weekday Monday through Friday.
|
||||
Today's show, like all our shows, was contributed by an HPR listener like yourself.
|
||||
If you ever consider recording a podcast, then visit our website to find out how easy
|
||||
it really is.
|
||||
Hacker Public Radio was founded by the Digital Dog Pound and the Infonomicon Computer
|
||||
Club.
|
||||
HPR is funded by the Binary Revolution at binrev.com. All binrev projects are proudly sponsored
|
||||
by Lunar Pages.
|
||||
From shared hosting to custom private clouds, go to lunarpages.com for all your hosting
|
||||
needs.
|
||||
Unless otherwise stated, today's show is released under a Creative Commons Attribution-ShareAlike
|
||||
3.0 license.
|
||||
Look, it's not immoral; both MP3 and Ogg Vorbis are about 20 years old, easily past the age
|
||||
of consent.
|
||||
Stop looking at me like that.
|
||||