The wave file format is one of the oldest formats used to store plain, uncompressed sound samples. Most computer users have probably seen wav files lingering around on their Windows computer, ready to be used as system sounds and the like. I came across the idea of calculating how many of the possible values a 16-bit recording offers per second are actually used, basically assuming that we use only 4 of the bits to express high and low peaks and that there's mostly nothing in between, as today's music is compressed to death. While writing my own parser, I realized that there's no solid code snippet out there explaining how to parse a wave file. So here it is.
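The counting idea itself boils down to building a set per one-second window. A minimal sketch, assuming the "data" payload has already been pulled out of the file and holds signed 16-bit little-endian PCM; the function name distinct_values_per_second is hypothetical, and pooling all channels of a second into one window is a simplifying assumption:

```python
import struct


def distinct_values_per_second(pcm: bytes, sample_rate: int, num_channels: int) -> list[int]:
    """Count the distinct sample values in each one-second window of 16-bit PCM.

    Assumes signed 16-bit little-endian samples with channels interleaved;
    all channels of a second are pooled into one window.
    """
    usable = len(pcm) - len(pcm) % 2  # drop a trailing odd byte, if any
    samples = [s for (s,) in struct.iter_unpack("<h", pcm[:usable])]
    window = sample_rate * num_channels  # number of samples that make up one second
    return [len(set(samples[i:i + window])) for i in range(0, len(samples), window)]


# Two "seconds" at a toy rate of 4 Hz, mono: values {0, 1, -1}, then just {5}
pcm = struct.pack("<8h", 0, 1, 1, -1, 5, 5, 5, 5)
print(distinct_values_per_second(pcm, sample_rate=4, num_channels=1))  # [3, 1]
```

Each count can then be compared against the 65536 values a 16-bit sample could take.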

Introduction, or why all the other explanations suck

It's one thing to write a parser that works its way through a wave file and works 90% of the time. But what if it doesn't? Then you're fucked and you have to get your ass up and understand .wav files anyway. I learned this the hard way: this is my third attempt at writing a parser. Previous attempts worked well as long as all their assumptions were met, but as soon as that wasn't the case... you know.

Yet Another Container Format

So let's start drilling down on the wavefile format.
Unfortunately, RIFF/WAVE is often perceived as just another binary format. But it's more than that. It's just another binary container format. That makes an important difference, as it not only explains some redundancies and seemingly useless header parts, but also gives you a clue on how to implement a parsing routine.

The basic format is RIFF, and that's a pretty easy format to begin with: its header starts with the 4 bytes "RIFF", followed by a 32-bit little-endian integer expressing the chunk size, i.e. the number of bytes that follow the size field (the file size minus 8). Though not usual for sound files, it should be possible to store more than one chunk in a file. Anyway, the chunk header also contains a 4-byte description of the data to follow; in the case of a wave sound file it's "WAVE".
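Reading that 12-byte header takes little more than one call to Python's struct module. A minimal sketch, assuming the header bytes are already in memory; read_riff_header is a hypothetical name:

```python
import struct


def read_riff_header(header: bytes) -> tuple[int, str]:
    """Parse the 12-byte RIFF header: "RIFF", a 32-bit size and the form type.

    Returns (riff_size, form_type). riff_size counts every byte after the
    size field itself, so the whole file should be riff_size + 8 bytes long.
    """
    if len(header) < 12:
        raise ValueError("need at least 12 bytes for a RIFF header")
    # "<" = little-endian, no alignment padding: 4 bytes, uint32, 4 bytes
    chunk_id, riff_size, form_type = struct.unpack("<4sI4s", header[:12])
    if chunk_id != b"RIFF":
        raise ValueError(f"not a RIFF file (got {chunk_id!r})")
    return riff_size, form_type.decode("ascii")


hdr = b"RIFF" + struct.pack("<I", 9474732) + b"WAVE"
print(read_riff_header(hdr))  # (9474732, 'WAVE')
```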
Now that the chunk is established, we have subchunks, which again identify themselves with a header containing a 4-byte identifier and the size information. For parsing a wave file, two subchunks are of primary interest: the "fmt " subchunk (note the trailing space character, this is not a typo) and the "data" subchunk. There may be other subchunks, depending on who created that specific file, and this is where other implementation approaches start fucking up badly: they assume a fixed sequence of chunk header -- subchunk "fmt " -- subchunk "data". Which is fucking stupid (but easy to run into), as the format explicitly does not require this order, nor is there an informal agreement to stick to it.
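This is exactly where the container nature pays off: read an ID and a size, use the payload if it's interesting, skip it otherwise. A minimal sketch of such a loop, assuming the whole file fits in memory; iter_chunks and the demo helper chunk are hypothetical names, and skipping one pad byte after odd-sized payloads follows the general RIFF rule rather than anything stated above:

```python
import struct
from collections.abc import Iterator


def iter_chunks(wav: bytes) -> Iterator[tuple[str, int, int]]:
    """Yield (chunk_id, payload_offset, payload_size) for each subchunk, in file order.

    No order is assumed: the caller picks "fmt " and "data" by ID and ignores
    everything else ("smpl", "inst", "PAD ", ...).
    """
    riff_size = struct.unpack_from("<I", wav, 4)[0]
    end = min(8 + riff_size, len(wav))  # don't read past the file even if the size field lies
    pos = 12  # skip "RIFF", the size field and the "WAVE" form type
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", wav, pos)
        yield chunk_id.decode("ascii", errors="replace"), pos + 8, size
        # Odd-sized payloads are followed by one pad byte that `size` does not count.
        pos += 8 + size + (size & 1)


def chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one subchunk (header, payload, pad byte if needed) for the demo below."""
    return chunk_id + struct.pack("<I", len(payload)) + payload + b"\0" * (len(payload) & 1)


body = b"WAVE" + chunk(b"fmt ", bytes(16)) + chunk(b"PAD ", b"abc") + chunk(b"data", bytes(4))
wav = b"RIFF" + struct.pack("<I", len(body)) + body
for cid, offset, size in iter_chunks(wav):
    print(repr(cid), offset, size)
# 'fmt ' 20 16
# 'PAD ' 44 3
# 'data' 56 4
```

The generator does not check that a payload actually fits into the file; a caller handling truncated files would compare offset + size against len(wav) before slicing.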

Which leads to my recommendations when it comes to implementing a parser:

That is pretty straightforward, once figured out. Might I add, I needed three failed attempts before I came to this point and wrote my final parser, mainly because I encountered a file with the following subchunks:
[ad001@machie ~]$ ./wavanalyzer Mrs\ Marple\ Theme.wav
Processing file 1 of 1: Mrs Marple Theme.wav
Read RIFF header.
File lenght announced with 9474732
Read chunk header
        ChunkID:        fmt
        ChunkSize:      16
Format information:
        AudioFormat:           1
        NumChannels:           1
        SampleRate:            44100
        ByteRate:              88200
        BlockAlign:            2
        BitsPerSample:         16
Read chunk header
        ChunkID:        smpl
        ChunkSize:      36
Read chunk header
        ChunkID:        inst
        ChunkSize:      8
Read chunk header
        ChunkID:        PAD
        ChunkSize:      3984
Read chunk header
        ChunkID:        data
        ChunkSize:      9470644
Read data portion.
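Since the format carries some redundancy, the numbers in that output can be cross-checked. A small sketch with hypothetical helper names (parse_fmt, fmt_is_consistent) that decodes the 16-byte PCM "fmt " payload and redoes the arithmetic with the values printed above:

```python
import struct

FMT_FIELDS = ("AudioFormat", "NumChannels", "SampleRate", "ByteRate", "BlockAlign", "BitsPerSample")


def parse_fmt(payload: bytes) -> dict[str, int]:
    """Decode the first 16 bytes of a "fmt " payload (the plain PCM layout).

    Longer fmt chunks carry extra fields after these; this sketch ignores them.
    """
    return dict(zip(FMT_FIELDS, struct.unpack_from("<HHIIHH", payload)))


def fmt_is_consistent(fmt: dict[str, int]) -> bool:
    """BlockAlign and ByteRate are redundant; for PCM they must match the other fields.

    Assumes BitsPerSample is a multiple of 8.
    """
    block_align = fmt["NumChannels"] * fmt["BitsPerSample"] // 8
    return fmt["BlockAlign"] == block_align and fmt["ByteRate"] == fmt["SampleRate"] * block_align


# Values from the run above
fmt = parse_fmt(struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16))
print(fmt_is_consistent(fmt))  # True: 44100 * 1 * 16 / 8 = 88200 bytes per second

# RIFF size = "WAVE" (4 bytes) + per subchunk an 8-byte header plus its payload
# (all payloads here are even-sized, so no pad bytes are involved)
sizes = [16, 36, 8, 3984, 9470644]  # fmt, smpl, inst, PAD, data
print(4 + sum(8 + s for s in sizes))  # 9474732, the announced length

# Playing time in seconds = data bytes / ByteRate
print(round(sizes[-1] / fmt["ByteRate"], 2))  # 107.38
```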

For Shits and Giggles

And of course, just as with PDF (pdftool/use_embedfile.html), I couldn't help myself and wrote a little routine to embed and/or extract chunks to or from raw files.
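Such a routine boils down to the same chunk loop plus a patched RIFF size field. A sketch working on in-memory bytes, not the routine mentioned above; extract_chunk, embed_chunk and the "demo" chunk ID are hypothetical, and reading and writing the raw files is left to something like pathlib's Path.read_bytes/write_bytes:

```python
import struct


def extract_chunk(wav: bytes, chunk_id: bytes) -> bytes:
    """Return the payload of the first subchunk with the given 4-byte ID."""
    pos, end = 12, min(8 + struct.unpack_from("<I", wav, 4)[0], len(wav))
    while pos + 8 <= end:
        cid, size = struct.unpack_from("<4sI", wav, pos)
        if cid == chunk_id:
            return wav[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the payload plus the RIFF pad byte
    raise KeyError(chunk_id)


def embed_chunk(wav: bytes, chunk_id: bytes, raw: bytes) -> bytes:
    """Append `raw` as a new subchunk at the end and patch the RIFF size field.

    Assumes a well-formed file; bytes beyond the announced RIFF size are dropped.
    """
    if len(chunk_id) != 4:
        raise ValueError("chunk IDs are exactly 4 bytes")
    body = wav[8:8 + struct.unpack_from("<I", wav, 4)[0]]  # "WAVE" + existing subchunks
    body += chunk_id + struct.pack("<I", len(raw)) + raw + b"\0" * (len(raw) & 1)
    return b"RIFF" + struct.pack("<I", len(body)) + body


wav = b"RIFF" + struct.pack("<I", 4) + b"WAVE"  # an empty WAVE container
wav = embed_chunk(wav, b"demo", b"hello")
print(extract_chunk(wav, b"demo"))  # b'hello'
print(len(wav), struct.unpack_from("<I", wav, 4)[0])  # 26 18
```

Appending at the end is the simplest choice; a parser that doesn't rely on chunk order shouldn't mind a chunk that comes after "data".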

Further Reading

I admit, this article started with quite a rant about the state of what the Internet offers to anyone looking (that is, googling) for information on parsing wave files. As said, the first results are somewhat disappointing. They do provide some kind of working solution, but they are not the best of answers. Anyhow, I don't want to throw the whole Internet under the retirement home bus.
Once you manage to escape the spitball range that is Stack Overflow, leave the blogosphere behind and re-enter the realm of web sites that look as if their CSS hasn't been vacuumed in a while, things change. You find web sites that were built for content, not looks. Two of them stood out, specifically
