Here are some helper programs to decode, display, and encode US-style Caller ID signals. They don't include any code to get the audio data into the computer - that's your problem - but you shouldn't have any trouble doing that, or porting the code to other processors. If your processor can handle 8-bit sampling at 7200 or 9600 samples per second, and can hack twenty-one multiply-accumulates between samples, no worries...
nbfsk decodes individual bytes from the Bell 202 coded audio, sampled at either 7200 or 9600 samples per second.
nbcid decodes the message itself and displays the results.
cidmaker creates audio files with your choice of data and optionally overlaid noise, for testing purposes.
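If you want to check the bytes going into or coming out of these tools, here is a rough sketch of the message checksum as the Bellcore Caller ID format is usually described: all the bytes of a message (type, length, data and the checksum itself) add up to zero modulo 256. The function names are made up for illustration and are not taken from nbcid or cidmaker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from nbcid or cidmaker: returns the checksum
   byte that makes the modulo-256 sum of the whole message equal zero. */
uint8_t cid_checksum(const uint8_t *msg, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + msg[i]);
    return (uint8_t)(0u - sum);   /* two's complement of the running sum */
}

/* Hypothetical helper: returns 1 if a received message, including its
   trailing checksum byte, sums to zero modulo 256, otherwise 0. */
int cid_checksum_ok(const uint8_t *msg, size_t len_with_checksum)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len_with_checksum; i++)
        sum = (uint8_t)(sum + msg[i]);
    return sum == 0;
}
```

To build a test message, fill in everything except the last byte, store cid_checksum() of those bytes in the last position, and cid_checksum_ok() should then accept the whole thing. Flip any one bit and it should refuse.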
AVR machine code FSK decoder
In a fit of madness, I started a project to port the x86 C code above to an AVR Mega8 chip. Assembly code, naturally. In simulation, using the supplied AVR Studio stimulation file as sampled data, it produces exactly the same data output as nbfsk above, but only as far as the output of the individual bits. I never had the time to finish debugging the UART - though I think the sensible approach would be to feed the bit pattern straight into the serial input of the AVR and let the hardware do the hard work.
It's not easy. The AVR is no DSP, and while 8*8-bit multiplies are straightforward and fast, 16*16-bit ones are a bit slower. And the DSP needs that precision. Also, I wanted to keep as many of the internal registers unused as possible, to avoid the need to swap them in and out during the interrupt - slow... Finally, though the AVR is no slouch, at 16 MHz it doesn't have many cycles to do the work. I've got the processing time down to about 50 µs on each sample - they occur 106 µs apart - which I think is reasonable. It wasn't easy...
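To put those figures into clock cycles, here is a quick back-of-envelope helper. It uses only the numbers quoted above (16 MHz clock, about 50 µs of processing, samples 106 µs apart); the function name is just for illustration.

```c
#include <stdint.h>

/* Clock cycles that fit into a given number of microseconds.
   A clock of N MHz ticks N times per microsecond. */
uint32_t cycles_in(uint32_t clock_mhz, uint32_t microseconds)
{
    return clock_mhz * microseconds;
}
```

cycles_in(16, 50) comes to 800 cycles of processing per sample, out of the cycles_in(16, 106) = 1696 available between samples, so a bit under half of the processor is spoken for.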
I'm particularly proud of the optimisations to the FIR filter routine. I've never seen an FIR filter implemented this way (except in nbfsk :) and I guarantee it *will* make your head hurt. But on most hardware, the killer for time on an FIR is the cycle of multiply, accumulate, then shuffle everything along one stage. I can't do much about the multiply-accumulate, but this method has *no* memory moves once the data is in.
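The exact trick isn't spelled out here, so what follows is not the nbfsk routine. It is a minimal sketch of one well-known way to get an FIR with no memory moves: keep the samples in a circular buffer and move an index instead of the data. The tap count of 21 is only there to echo the multiply-accumulate figure quoted at the top, and the names and 16-bit types are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define FIR_TAPS 21   /* assumed tap count, for illustration only */

typedef struct {
    int16_t  hist[FIR_TAPS];  /* circular sample history */
    unsigned head;            /* index of the most recent sample */
} fir_t;

/* Clear the history so the filter starts from silence. */
void fir_init(fir_t *f)
{
    memset(f, 0, sizeof *f);
}

/* Store one new sample and return the filter output.
   coeff[0] multiplies the newest sample, coeff[FIR_TAPS-1] the oldest.
   Nothing in hist[] is ever shifted; only 'head' moves.
   Assumes the coefficients are scaled so the sum fits in 32 bits. */
int32_t fir_step(fir_t *f, const int16_t coeff[FIR_TAPS], int16_t sample)
{
    f->head = (f->head + 1) % FIR_TAPS;   /* land on the oldest slot */
    f->hist[f->head] = sample;            /* and overwrite it */

    int32_t acc = 0;
    unsigned idx = f->head;
    for (unsigned k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)coeff[k] * f->hist[idx];
        idx = (idx == 0) ? FIR_TAPS - 1 : idx - 1;   /* step back in time, wrapping */
    }
    return acc;
}
```

A handy sanity check is the impulse response: feed in a single 1 followed by zeros and the outputs should walk through the coefficients in order, then drop to zero once the 1 has been overwritten.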