120 lines
9.1 KiB
Plaintext
120 lines
9.1 KiB
Plaintext
|
|
Episode: 2016
|
||
|
|
Title: HPR2016: Echoprint
|
||
|
|
Source: https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr2016/hpr2016.mp3
|
||
|
|
Transcribed: 2025-10-18 13:20:59
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
This is HPR episode 2016 entitled Echo Print.
|
||
|
|
It is hosted by India and is about 13 minutes long.
|
||
|
|
The summary is, I share what I learned about the Echo Print Music Identification System.
|
||
|
|
This episode of HPR is brought to you by an Honesthost.com.
|
||
|
|
Get 15% discount on all shared hosting with the offer code HPR15.
|
||
|
|
That's HPR15.
|
||
|
|
Better web hosting that's Honest and Fair at An Honesthost.com.
|
||
|
|
This is Leandere for Hacker Public Radio, recording Friday, April 25th, 2016.
|
||
|
|
I'm going to be talking today about the Echo Print Music Fingerprinting System.
|
||
|
|
Earlier this year, a message went out to the mailing list asking about possible ways of
|
||
|
|
identifying whether recordings uploaded to Hacker Public Radio had had the intro prepended
|
||
|
|
and outro appended or not, and if it was possible to automate this.
|
||
|
|
This got me thinking about how audio is identified and specifically music, and I'd
|
||
|
|
remembered reading some while back about the Echo Nest and Echo Print projects.
|
||
|
|
So I'd throw that out there as a suggestion of something to look into, and wouldn't you
|
||
|
|
know it?
|
||
|
|
You suggest doing something and pretty soon you end up doing it yourself.
|
||
|
|
Now I never was quite able to get good detection of the HPR intro and outro, but I did learn
|
||
|
|
a lot about the system and found it pretty interesting, so I thought I would record an
|
||
|
|
episode and share in case someone else is interested in it.
|
||
|
|
So first of all, the Echo Print project has a website, it's echoprint.me, and there's
|
||
|
|
a pretty high level overview there of how it works under this, how it works link, slash
|
||
|
|
how is the URL.
|
||
|
|
Now when I first started, I assumed this was going to be like most other fingerprinting
|
||
|
|
systems, just sort of a hash that I could just do direct comparisons to.
|
||
|
|
So I downloaded a few audio files, including the Hacker Public Radio, Intro and outro,
|
||
|
|
Black versions, then I re-encoded those to AUG with the idea that I could run the fingerprinting
|
||
|
|
over both and compare the two, and hopefully we'd get a match.
|
||
|
|
So I downloaded the code generating part of Echo Print, and they have the source code
|
||
|
|
for it on GitHub, it's at github.com slash econist slash echo print dash code gen.
|
||
|
|
The code is MIT licensed, and I was able to compile it with no problems on my Raspberry
|
||
|
|
Pi, and so I ran the fingerprinting on both audio, it uses FFMPEG to do the audio decoding,
|
||
|
|
so there were no problem handling both formats.
|
||
|
|
I learned pretty quickly that it's not just a matter of using the code generator to get
|
||
|
|
hashes and then comparing them.
|
||
|
|
First of all, the output of the code generator was JSON, and that included what looked like
|
||
|
|
a base64 chunk, which was the hash itself, and they didn't look the same at all, and
|
||
|
|
they weren't even the same length, so I needed to dig a little deeper.
|
||
|
|
So the first thing I did was try decoding the base64, which turned out it was actually
|
||
|
|
a URL safe version of base64, where the plus and slash are replaced by a hyphen and underscore,
|
||
|
|
and then I just got this big binary blob, and at that point I pretty much knew I was in over my
|
||
|
|
head, but I decided to keep going. So I did a little more searching around on their website,
|
||
|
|
found a mention of a white paper, and through a little googling was able to find that,
|
||
|
|
and that gave a little more information about how the algorithm worked, and
|
||
|
|
why I wasn't able to just direct compare these things. So I'll give a link to the white paper
|
||
|
|
in the show notes. It's at www.e.columbia.edu. Slash till the dpwe slash pubs slash lswp11-ecoprint.pdf.
|
||
|
|
And it's co-authored by Daniel PW Ellis, who is with the laboratory for recognition and
|
||
|
|
organization of speech and audio at Columbia University, Brian Whitman from Econest,
|
||
|
|
and Aleister Porter, who is from the Center for Interdisciplinary Research in Music Media
|
||
|
|
and Technology at McGill University. So I'll try to summarize for you what I learned about the
|
||
|
|
fingerprinting algorithm from the paper. You start with the audio, they do a little bit of
|
||
|
|
pre-processing to get everything in the same format, and then they chop the signal up into eight
|
||
|
|
frequency bands from zero hertz up to about five and a half kilohertz. I assume that something
|
||
|
|
just sort of like a Fourier transform just to get the signal into those eight buckets.
|
||
|
|
Next, they in each bucket go through and they detect what they call onset events.
|
||
|
|
From what I could figure out, that's basically just places where the volume increases,
|
||
|
|
trying to find the beginnings of notes or beats.
|
||
|
|
They take those onset events in groups of four, and then for each pair from that group,
|
||
|
|
they generate a hash. So all possible pairs for a group before,
|
||
|
|
keeping the pairs in order gives six pairs, so six hashes.
|
||
|
|
And then those hashes are stored along with the time they occur.
|
||
|
|
So at this point, I at least knew what the fingerprint represented, so now I just needed to deal
|
||
|
|
with the format. What was very helpful for that was actually looking at the server code for the
|
||
|
|
echo print project, and that's at github.com slash echo nest slash echo print dash server,
|
||
|
|
and that is licensed under the Apache 2 license.
|
||
|
|
So looking through that code, I found that the
|
||
|
|
JSON payload that gets sent to the server was base 64 encoded, and the binary blob that I was
|
||
|
|
getting after decoding that was actually gzip compressed, but with no gzip header.
|
||
|
|
So I took my samples that I generated, pre-pended a gzip header, and decompressed it.
|
||
|
|
And what I found was essentially just a long string of hex digits.
|
||
|
|
And the early ones, it was pretty obvious that values were getting repeated, and they were
|
||
|
|
five digit hexadecimal numbers. And each five digit number was repeated about six times
|
||
|
|
throughout the first part of the file. And then in the second half of the file, there
|
||
|
|
didn't seem to be any repetition at all. The numbers seemed much more random.
|
||
|
|
And the reason I found that out is, or the reason for that, is that all of the lengths
|
||
|
|
are listed first, and then all of the hashes are listed at the end of the file. And the advantage
|
||
|
|
that that gives is you get better compression, because each length is repeated six times,
|
||
|
|
once for each of the six pairs for that group. And since those are all together,
|
||
|
|
gzip is able to find those repetitions and compress them very well.
|
||
|
|
So I needed to take the file, split it into half, and split each half into a five digit length
|
||
|
|
for the first half, and a five digit hash for the second half, and then pair those up.
|
||
|
|
The next thing that the server does to compare two fingerprints is it compares all of the hashes
|
||
|
|
from sample A, and matches them up against hashes in sample B wherever they're equal.
|
||
|
|
Then it calculates the minimum time offset between each pair of matching hashes,
|
||
|
|
and does a histogram. So it counts how many times a distance between two hashes occurs.
|
||
|
|
And then it takes the most common times, and uses that as a score. The count of the
|
||
|
|
number of occurrences of that time distance. And that essentially allows finding two matching
|
||
|
|
pieces of music that are just shifted by a time offset. So rather than looking for occurrences
|
||
|
|
at exact times, they just check to make sure that the time difference between
|
||
|
|
two hashes is pretty constant throughout the sample. So I wrote some conversion to go from the JSON
|
||
|
|
through the base 64, pre-pen the gzip header, decompress, split the text up into pairs of five
|
||
|
|
digit hex numbers, and then wrote an ox script to try to score the difference between the two files,
|
||
|
|
essentially trying to duplicate what the server was doing. I never did quite get it worked out well
|
||
|
|
enough to be able to consistently identify the HPR intro and outro. And I think in the process,
|
||
|
|
I learned that a major reason for that is with this kind of fuzzy matching,
|
||
|
|
it works pretty well for audio identification. So let's say you have a large
|
||
|
|
database of pre-computed fingerprints. Like the Ekonass project does, they use I think it's the
|
||
|
|
million songs database. It's pretty easy to say which of these songs does this new sample match
|
||
|
|
most closely, which gives very good results, but it's harder to say given any two samples,
|
||
|
|
are they the same song? It's pretty easy to say no, false negatives are pretty rare,
|
||
|
|
but it's rather hard to say yes, you know, how close a match is close enough.
|
||
|
|
So I didn't really get what I was after, but did learn quite a bit, and hopefully it was in
|
||
|
|
interest to you as well. Tune in tomorrow for another exciting episode of Hacker Public Radio.
|
||
|
|
You've been listening to Hacker Public Radio at HackerPublicRadio.org. We are a community podcast
|
||
|
|
network that releases shows every weekday Monday through Friday. Today's show, like all our shows,
|
||
|
|
was contributed by an HPR listener like yourself. If you ever thought of recording a podcast,
|
||
|
|
then click on our contribute link to find out how easy it really is. Hacker Public Radio was
|
||
|
|
founded by the digital dog pound and the infonomican computer club, and it's part of the binary
|
||
|
|
revolution at binrev.com. If you have comments on today's show, please email the host directly,
|
||
|
|
leave a comment on the website or record a follow-up episode yourself. Unless otherwise status,
|
||
|
|
today's show is released on the creative comments, attribution, share a life, 3.0 license.
|
||
|
|
You've been listening to Hacker Public Radio at HackerPublic Radio at HackerPublicRadio.org.
|