I had a wonderful recording of a thunderstorm on a CF drive. Emphasis on _had_. I accidentally deleted it -- confusing it with another file on the same CF drive containing nothing but the sound of rain.

Identifying the problem

What does rm do? It does not literally destroy the file; it just removes it from the file allocation table (FAT), marking the bytes formerly known as 1001.wav as free space. The file's contents are still there. All I need to do is find them and copy them.

Lucky assumptions

Since I started recording on a blank disk and never deleted any recorded files, I assume that the file is not fragmented -- meaning that if the file is, say, 1000 bytes long, those bytes sit contiguously from a start offset (let's call it o) to o+1000.
Otherwise this would turn somewhat ugly, as we will see when looking at the structure of a WAV file.

Given knowledge

What's known about my deleted file? It contains wonderful noise made by a natural phenomenon. OK, not that helpful. What else? It is a 16-bit, 44.1 kHz stereo wave file. Start at the beginning: it's a wave file, so it will probably have a decent file header. Wikipedia confirms this: http://de.wikipedia.org/wiki/RIFF_WAVE (the German Wikipedia was much more helpful than the English one). It tells us that a) I should be looking for something such as

RIFF....WAVE

as a header, with b) the dots being four bytes that encode the total file length (little-endian, not counting the first 8 bytes of the file) -- which is great information to have for recovery.
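To make the header layout concrete, here is a minimal Python sketch of how such a header could be checked and its length field decoded. The function name riff_wave_length and the example header are made up for illustration; the only facts relied on are that the size field is a 32-bit little-endian number and that it does not count the first 8 bytes of the file.

```python
import struct


def riff_wave_length(header):
    """Return the total file length implied by a RIFF/WAVE header, or None.

    `header` must hold at least the first 12 bytes of the candidate file.
    """
    if len(header) < 12:
        return None
    # Layout: 4 bytes "RIFF", 4 bytes little-endian size, 4 bytes "WAVE".
    magic, size, form = struct.unpack("<4sI4s", header[:12])
    if magic != b"RIFF" or form != b"WAVE":
        return None
    # The size field counts everything that follows it, so the 8 bytes
    # used by "RIFF" and the size field itself have to be added back.
    return size + 8


# Hypothetical header announcing 36 bytes after the size field.
example = b"RIFF" + struct.pack("<I", 36) + b"WAVE"
print(riff_wave_length(example))
```

Anything that does not start with RIFF and carry WAVE in bytes 8 to 11 is rejected, which helps to weed out the word RIFF showing up by chance in other data.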


Now, here comes the painful part. I connected the flash drive to my FreeBSD machine, where it shows up as /dev/da0; the FAT partition is /dev/da0s1. I then made a (shiver) hexdump of the device in order to find the beginning of the wave file. Correction -- I will find the beginning of every wave file, but that's still good.

sudo hexdump -C /dev/da0s1 | less
Using less, I can search for the file header by typing /RIFF.
00044a00  52 49 46 46 f8 2f ad 11  57 41 56 45 66 6d 74 20  |RIFF./..WAVEfmt |
00044a10  10 00 00 00 01 00 02 00  44 ac 00 00 10 b1 02 00  |........D.......|
00044a20  04 00 10 00 73 6d 70 6c  cc 01 00 00 00 00 00 00  |....smpl........|
00044a30  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...there's a wave file. Looks good, although we still don't know what's in it. We need to test the contents, and the best test is to listen to it. So we need to extract at least part of that file (say, 10M), and we need a player that does not insist on a complete file before playing. But first things first -- file extraction. What do we know? We want 10M starting at offset 0x44a00 written to a file. What a task for dd!
sudo cat /dev/da0s1 | dd skip=0x44a00 bs=1 count=10M > wavfile
Yeah, unnecessary use of cat, but dd obeyed my orders and got demoted to #2 in the chain of command. Anyhow, mplayer (I admit it, a not-so-standard tool) played the file, and I was able to identify files by listening to their first 10M. Once I found the file I was looking for (let's say it was the first one, to keep the numbers in this example accurate), I had all the information necessary for full file recovery, since by now I knew both the file's offset and its length! Remember that the length is stored in the four bytes between RIFF and WAVE in the above hexdump: f8 2f ad 11, which read as a little-endian number is 0x11ad2ff8. To be exact, the file length is that number + 8, because the field does not count the 8 bytes taken up by RIFF and the length field itself. But hey, close enough.
So, my final rescue was
sudo cat /dev/da0s1 | dd skip=0x44a00 bs=1 count=0x11ad2ff8 > wavfile
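The same rescue can be sketched in Python. This is not what I ran; it is a rough equivalent that assumes the card has first been copied into an ordinary image file and that the wave file is contiguous. The function name recover_wave and the in-memory demo data are invented for the example. It takes the length from the header at the given offset (adding the 8 bytes the size field leaves out) and copies in large blocks rather than one byte at a time.

```python
import io
import struct


def recover_wave(src, offset, dst, block_size=1 << 20):
    """Copy one contiguous RIFF/WAVE file from `src` to `dst`.

    `src` and `dst` are binary file objects. Returns the number of bytes
    written; raises ValueError if no RIFF/WAVE header sits at `offset`.
    """
    src.seek(offset)
    header = src.read(12)
    if len(header) < 12:
        raise ValueError("image ends before a full header could be read")
    magic, size, form = struct.unpack("<4sI4s", header)
    if magic != b"RIFF" or form != b"WAVE":
        raise ValueError("no RIFF/WAVE header at offset %#x" % offset)

    remaining = size + 8  # size field excludes "RIFF" and itself
    src.seek(offset)
    written = 0
    while remaining > 0:
        block = src.read(min(block_size, remaining))
        if not block:  # image ended before the announced length
            break
        dst.write(block)
        written += len(block)
        remaining -= len(block)
    return written


# Demo on an in-memory stand-in for the card: junk, a tiny fake wave
# file, more junk.
wav = b"RIFF" + struct.pack("<I", 12) + b"WAVE" + b"12345678"
image = io.BytesIO(b"\x00" * 512 + wav + b"\xff" * 100)
out = io.BytesIO()
recover_wave(image, 512, out)
print(out.getvalue() == wav)
```

With a real image it would be called along the lines of recover_wave(open("card.img", "rb"), 0x44a00, open("wavfile", "wb")), where card.img is a hypothetical copy of /dev/da0s1.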


Had the file been fragmented, it would have been a PITA to assemble. Why? Because the pieces of the sample data inside the wave file would have been hard to find. The data chunk starts with a "data" keyword followed by a 4-byte length, and after that there are only values, values and values. And it is quite hard to distinguish noise (e.g. random bytes) from sound just by looking at its values in PCM representation. Yes, there are strategies for assembling such a file. But they are not half the fun of this solution.
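For a contiguous file, at least, finding the sample data is easy, because every chunk announces its own length. The following Python sketch walks the chunk list until it reaches the "data" keyword. The names find_chunk and make_chunk and the synthetic file are assumptions made for this example; the fake file merely mimics the fmt and smpl chunks visible in the hexdump above.

```python
import io
import struct


def find_chunk(f, riff_offset=0, wanted=b"data"):
    """Return (payload_offset, payload_size) of the first `wanted` chunk.

    `f` is a binary file object holding a RIFF/WAVE file that starts at
    `riff_offset`. Returns None if the header or the chunk is missing.
    """
    f.seek(riff_offset)
    header = f.read(12)
    if len(header) < 12:
        return None
    magic, riff_size, form = struct.unpack("<4sI4s", header)
    if magic != b"RIFF" or form != b"WAVE":
        return None

    end = riff_offset + 8 + riff_size
    pos = riff_offset + 12  # first chunk follows "WAVE"
    while pos + 8 <= end:
        f.seek(pos)
        chunk_header = f.read(8)
        if len(chunk_header) < 8:
            return None
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        if chunk_id == wanted:
            return pos + 8, chunk_size
        # Skip the payload; chunks are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None


def make_chunk(chunk_id, payload):
    """Build one chunk for the demo file (adds the pad byte if needed)."""
    pad = b"\x00" if len(payload) & 1 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Synthetic file: 16-bit, 44.1 kHz stereo fmt chunk, an empty-ish smpl
# chunk, and 8 bytes of "samples".
fmt = make_chunk(b"fmt ", struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16))
smpl = make_chunk(b"smpl", b"\x00" * 36)
data = make_chunk(b"data", b"\x01\x02\x03\x04" * 2)
body = b"WAVE" + fmt + smpl + data
wav = b"RIFF" + struct.pack("<I", len(body)) + body
print(find_chunk(io.BytesIO(wav)))
```

On a fragmented file this walk breaks down as soon as a chunk's payload continues somewhere else on the card, which is exactly the ugly case described above.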

Use for other purposes

What's been shown here to work for wave audio will surely also work for other files. All you need to know is the file header and how to guess the file's length. Bonus hint: it seems plausible that many programs won't care if an input file has trailing nonsense bytes and will just stick with what they can understand. So -- give it a try: take an old empty CF card and look for pictures. First step: get to know the file header...
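As a starting point for that experiment, here is a small Python sketch that scans an image for several well-known file signatures at once, doing in code what /RIFF did in less. The function name find_signatures and the table are assumptions for illustration; the JPEG and PNG byte sequences are the standard start-of-file markers. The image is read block by block, and a few bytes are carried over between blocks so that a signature straddling a block boundary is not missed.

```python
import io

# Start-of-file signatures; extend as needed.
SIGNATURES = {
    "riff": b"RIFF",
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def find_signatures(f, signatures=SIGNATURES, block_size=1 << 20):
    """Return a sorted list of (offset, name) for every signature hit in f."""
    overlap = max(len(sig) for sig in signatures.values()) - 1
    hits = []
    tail = b""  # last bytes of the previous window
    base = 0    # absolute offset of window[0]
    f.seek(0)
    while True:
        block = f.read(block_size)
        if not block:
            break
        window = tail + block
        for name, sig in signatures.items():
            i = window.find(sig)
            while i != -1:
                # Hits lying entirely inside the carried-over tail were
                # already reported for the previous window.
                if i + len(sig) > len(tail):
                    hits.append((base + i, name))
                i = window.find(sig, i + 1)
        keep = min(overlap, len(window))
        base += len(window) - keep
        tail = window[len(window) - keep:]
    return sorted(hits)


# Demo on a tiny fake image, with a small block size to exercise the
# boundary handling.
image = b"ab" + b"\xff\xd8\xff" + b"cd" + b"RIFF" + b"e" + b"\x89PNG\r\n\x1a\n"
print(find_signatures(io.BytesIO(image), block_size=4))
```

Short signatures such as the three JPEG bytes will also turn up by chance in unrelated data, so every hit is only a candidate that still needs the listen-to-it (or look-at-it) test.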


Why I did it can be heard in the file down/thunder-20140613.wav down/thunder-20140613.wav.mp3.