While developing one of the original DailySplice features, we spent a lot of time researching how to join MP3 files together so they would play back properly in any MP3 player. If an MP3 doesn’t have a consistent encoding scheme, some players will get confused and stop playing. Since we wanted to package a bunch of audio files into one, we had to reencode each file into a standard format. The source files used a plethora of different encoding schemes: different bitrates, sample rates, stereo or mono, and so on. The goal was to find an encoding scheme that yielded good sound quality while minimizing bandwidth. It took a couple of attempts to find something that worked.
We decided to use the LAME library as the API for reencoding the audio. That was a relatively easy decision: it’s open source, licensed under the LGPL, and easy to use. LAME is also renowned for its sound quality; few encoding tools, perhaps none, surpass it.
The difficult part was finding a combination of MP3 properties that yielded good sound quality and low bandwidth consumption. MP3 audio has a number of factors that affect sound quality:
- bitrate: The number of bits used to encode one second of audio. In general, the higher the bitrate, the better the sound quality. For music, the bitrate is usually about 192 kilobits per second (kbps), or 192,000 bits per second. For speech, it’s usually much lower: 64 kbps or less.
- sample rate: The number of samples per second (in Hz) taken from the analog signal (a microphone, for example) to construct the digital signal. Once again, a higher sample rate generally implies better sound quality.
- encoding mode: How to distribute the bits within the audio file. The most common encoding mode is Constant Bit Rate (CBR), which means that every second of audio takes exactly the same number of bits. Another one is Variable Bit Rate (VBR), where some parts of a file can use more or fewer bits than others. For example, when people are talking, more data is needed to encode the conversation, but when it’s quiet, less data is needed.
- channels: Is it mono or stereo?
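To make the bandwidth side of that trade-off concrete, here is a rough sketch of how bitrate translates into file size for CBR audio. The function name is made up for illustration, and the calculation ignores tags and frame padding.

```python
def cbr_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size of CBR audio data (ignores tags and padding)."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return int(bits // 8)


ONE_HOUR = 60 * 60

# One hour of audio at a typical music bitrate versus a typical speech bitrate.
music_bytes = cbr_size_bytes(192, ONE_HOUR)
speech_bytes = cbr_size_bytes(64, ONE_HOUR)

print(f"192 kbps: {music_bytes / 1_000_000:.1f} MB per hour")
print(f" 64 kbps: {speech_bytes / 1_000_000:.1f} MB per hour")
```

Dropping from 192 kbps to 64 kbps cuts the size to a third, which is why the choice of bitrate dominates bandwidth consumption.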
The first attempt used CBR to encode all the files into exactly the same format: a bitrate of 64 kbps, a sample rate of 32,000 Hz, and, since we were working with a lot of music, stereo.
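As an illustration only, settings like these could be expressed with the `lame` command-line tool driven from Python. This is an assumption made for the sake of a short example: the post describes using the LAME library as an API, not the command-line front end, and the helper names below are hypothetical.

```python
import subprocess


def build_lame_command(source, target, bitrate_kbps=64, sample_rate_khz=32):
    """Build a lame command line for CBR stereo output (illustrative helper)."""
    return [
        "lame",
        "--mp3input",                        # the input is already an MP3
        "--cbr",                             # constant bit rate
        "-b", str(bitrate_kbps),             # bitrate in kbps
        "--resample", str(sample_rate_khz),  # output sample rate in kHz
        "-m", "s",                           # plain stereo
        source,
        target,
    ]


def transcode(source, target):
    """Run lame on one file; requires the lame binary to be on PATH."""
    subprocess.run(build_lame_command(source, target), check=True)


# Show the command without running it.
print(" ".join(build_lame_command("episode.mp3", "episode-64k.mp3")))
```

Running every source file through the same command is what puts all the pieces into one format before they are joined.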
After transcoding a set of audio files, we joined them together. MP3 files are easy to join if all the pieces are in the same format; you can simply concatenate the files and the result should play just fine.
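A minimal sketch of that concatenation step, assuming the pieces have already been transcoded to the same format. The function name is hypothetical, and the sketch does nothing about ID3 tags, which would end up in the middle of the joined stream if the pieces carry them.

```python
import shutil


def join_mp3(parts, output):
    """Append each file in `parts`, in order, to a single output file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy the raw bytes; no decoding or re-encoding happens here.
                shutil.copyfileobj(src, out)
```

A call such as `join_mp3(["intro.mp3", "story.mp3"], "show.mp3")` would write the two pieces back to back into `show.mp3`.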
Unfortunately, the first encoding scheme seemed to cause a lot of players to hiccup. We never figured out precisely what was going wrong, and fiddling with different bitrates and sample rates didn’t get us much further.
We changed the encoding mode to VBR and players seemed to play the audio reliably, although most players reported an incorrect duration. It could be upwards of 15 minutes off for a 1-hour-long MP3.
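One plausible explanation for the wrong durations, offered here as an assumption and not something confirmed for these files: when a player can't find a VBR header, it may estimate the duration from the file size and the bitrate of the first frame. That works for CBR, but in a VBR file the first frame's bitrate need not match the average. The sketch below shows the arithmetic with made-up numbers.

```python
def estimated_duration_seconds(file_size_bytes, first_frame_kbps):
    """Duration a player would guess by assuming the whole file is CBR."""
    return file_size_bytes * 8 / (first_frame_kbps * 1000)


# Hypothetical one-hour file whose bitrate averages 64 kbps overall.
file_size = 64 * 1000 * 3600 // 8  # 28,800,000 bytes

# If the first frame happens to be encoded at 48 kbps, the guess is too long.
guess = estimated_duration_seconds(file_size, 48)
print(f"actual: 60 min, estimated: {guess / 60:.0f} min")
```

With these numbers the estimate comes out 20 minutes too long, an error of the same order as the one described above.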