Decoding MP3s, Volume 1: I Guess I'm Decoding MP3s Now

In which I find out that I don’t know how MP3 decoding works

Coming into the Recurse Center, I thought I’d found a solid project: move my music and videos into cloud storage and create a media player that can move stuff to and from local disk without the user knowing or caring. This is similar to Spotify's approach of letting users keep certain media on their device while streaming everything else. I felt this had the advantage of being a doable project while opening enough rabbit holes to keep me occupied for three months. Two notable areas of interest were:
- How does streaming work? What’s RTTP? What’s HLS?
- How much can I learn about Music Information Retrieval in three months?

Physically, I am near-sighted, but mentally… probably neither near- nor far-sighted. After 26 years, I still don’t recognize this in myself. Despite dreaming up these lofty goals, I was still surprised when I started writing my program and found myself following roughly this sequence of events:
1. open a file
2. …?

I didn’t even know what happened after I called open()! For all my planning, I was foiled by the simple need to play MP3s. I’d used Python to analyze WAV files, but all I’ve got to listen to are MP3s! Googling “Python play MP3s” and “Python decode MP3s” and “why am I an idiot?” yielded a few useful results, but while looking at the various libraries the internet threw my way, I ultimately came across a common operation:

```python
import subprocess

subprocess.Popen('ffmpeg -i your_mp3.mp3 your_new_wav.wav', ...)
```

For those unfamiliar with the above command, it’s Python’s way of running a new process from within a Python program. The first argument is essentially the same as what you’d type on the command line, and ffmpeg is a program which, among many, many other things, converts audio files from one format to another. Those scoundrels were all just converting MP3s to WAVs!
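For reference, here is a minimal sketch of that same workaround using `subprocess.run` with an argument list (so no shell is involved), followed by reading the resulting WAV with the standard-library `wave` module. It assumes an `ffmpeg` binary is on your `PATH` and that the output WAV is 16-bit PCM (ffmpeg's usual default for `.wav`); `mp3_to_wav` and `read_pcm16` are hypothetical helper names, not part of any library.

```python
import struct
import subprocess
import wave


def mp3_to_wav(mp3_path: str, wav_path: str) -> None:
    """Shell out to ffmpeg, the same trick the libraries were using.

    Assumes `ffmpeg` is on PATH; `-y` overwrites wav_path if it already exists.
    check=True raises CalledProcessError if ffmpeg exits with an error.
    """
    subprocess.run(["ffmpeg", "-y", "-i", mp3_path, wav_path], check=True)


def read_pcm16(wav_path: str) -> tuple[int, int, list[int]]:
    """Return (sample_rate, channels, interleaved samples) from a 16-bit PCM WAV."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    # '<h' = little-endian signed 16-bit, the byte order WAV uses for PCM.
    count = len(raw) // 2
    samples = list(struct.unpack(f"<{count}h", raw))
    return rate, channels, samples
```

Usage would look like `mp3_to_wav("your_mp3.mp3", "your_new_wav.wav")` followed by `read_pcm16("your_new_wav.wav")`. Note that every bit of actual MP3 decoding happens inside ffmpeg; Python only ever sees the already-decoded PCM.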

And so my first real project is to decode MP3s. Volumes 2-? of this blog series are forthcoming, but in order to write them I need to know what I’m doing. I (thought I) knew the basics: frames, FFT, cutting out frequencies humans can’t hear (sorry dogs), Huffman encoding. What I soon realized was that I didn’t know how those things fit together, and also that those are far from the only components of MP3 decoding.
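Of those basics, frames are the most concrete place to start. As a sketch, assuming only the common MPEG-1 Layer III case (no MPEG-2/2.5, no free-format bitrates), each frame begins with a 4-byte header whose fields can be pulled out with shifts and masks, and that header alone tells you how many bytes until the next frame. `parse_header` and its dict keys are hypothetical names for illustration.

```python
# MPEG-1 Layer III lookup tables (bitrate index 0 = "free format", 15 = invalid;
# sample-rate index 3 = reserved).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]


def parse_header(b: bytes) -> dict:
    """Decode the 32-bit frame header at the start of `b` (MPEG-1 Layer III only)."""
    if len(b) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(b[:4], "big")
    if (h >> 21) & 0x7FF != 0x7FF:  # 11-bit frame sync, all ones
        raise ValueError("no frame sync")
    version = (h >> 19) & 0b11  # 0b11 = MPEG-1
    layer = (h >> 17) & 0b11  # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved value")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "padding": padding,
        "mono": ((h >> 6) & 0b11) == 0b11,  # channel mode 0b11 = single channel
        # 1152 samples per frame / 8 bits per byte = 144.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For example, the header bytes `FF FB 90 64` describe a 128 kbps, 44.1 kHz joint-stereo frame that is 417 bytes long (418 when the padding bit is set), which is why frame sizes in a constant-bitrate file alternate between those two values.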

I’ve started work on MP-3PO, an MP3 decoder in pure Python. Luckily, a fellow Recurser also happens to be the patron saint of all things audio/visual codecs, and they introduced me to tools that made life in the weeds of examining bit streams much, much more enjoyable. The latest revelation was that I could use MediaConch to see the parsed values of every frame header and all the side information in an MP3. I almost cried.
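The side information sits right after the 4-byte header (and after a 16-bit CRC, when the header's protection bit says one is present). As a sketch, again assuming MPEG-1 Layer III only, here is how the first few side-info fields could be read, which makes a handy cross-check against what MediaConch reports. `side_info_fields` is a hypothetical helper, and it deliberately stops before the per-granule fields, which are far more numerous.

```python
def side_info_fields(frame: bytes, crc_protected: bool, mono: bool) -> dict:
    """Read the leading MPEG-1 Layer III side-information fields of one frame."""
    start = 4 + (2 if crc_protected else 0)  # skip header and optional 16-bit CRC
    size = 17 if mono else 32  # MPEG-1 side info length in bytes
    if len(frame) < start + size:
        raise ValueError("frame too short for its side information")
    bits = int.from_bytes(frame[start:start + size], "big")
    total = size * 8

    def take(offset: int, width: int) -> int:
        # Read `width` bits starting `offset` bits into the side info (MSB first).
        return (bits >> (total - offset - width)) & ((1 << width) - 1)

    private_width = 5 if mono else 3
    channels = 1 if mono else 2
    scfsi_start = 9 + private_width
    return {
        "side_info_bytes": size,
        # Backward pointer (in bytes) into earlier frames: the "bit reservoir".
        "main_data_begin": take(0, 9),
        "private_bits": take(9, private_width),
        # Scale-factor selection info: 4 bits per channel.
        "scfsi": [take(scfsi_start + 4 * ch, 4) for ch in range(channels)],
    }
```

The `main_data_begin` field is the surprising one: a Layer III frame's audio data may start inside a previous frame, so frames can't always be decoded in isolation.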

I now know a little more about how MP3s are made. Hopefully later this week I’ll have enjoyable sound coming out of my speakers.