Digital Audio Systems
We all have our CD players and MP3 players, and DVD audio is beginning to make an appearance. But have you ever given a thought to how all this works? Ever wondered what the “bit rate” and “sample rate” had to do with the price of fish? Well, if so, this post will hopefully answer a few questions. It’s a bit of a slog reading through, so I suggest breaking it down a bit to save your head from hurting. Oh, and if you do have any questions, please post them up. Oh… and excuse my MS Paint skills… or lack of!
Digital Audio Basics
Right, firstly we have our source sound. This could be our voice, an instrument, etc. This sound needs to be converted into a signal we can work with. To do this we need a transducer that will take real-world sound and convert it into an electrical signal. In most cases this will be some form of microphone. The problem is, the signal that we get from the mic will be an analogue signal. So… we need to convert it into a digital signal. To do this we use a little device called an Analogue to Digital Converter (ADC). The overall quality of this device is critical to the final resulting digital sound. Use a cheap, poor-quality converter and you’ll get a naff sound. This is why on digital mixers you rarely have more than a few analogue input channels, and why a decent digital desk (such as Yamaha’s O2R96) costs so blummin’ much! Once the signal is in the digital domain we can then use our computer to record it, manipulate it and do all sorts of other weird and wonderful things!
Analogue to Digital Conversion
This is the science bit, peeps. Let’s imagine for a second we’re recording a simple sine wave. This sine wave has a fixed amplitude (how loud it is) and a fixed frequency (its musical “pitch”).
It might look a little like the following:
For us to convert it into the digital domain, we first need to snip the wave up into regular segments along both axes: the X axis (time) and the Y axis (amplitude). If we now take a “sample” reading at each of those points, we get something that looks a bit like this:
This is a very basic approximation of our analogue source. How might we make it a bit better? Well, if you look at the pic you can see that the wave is staircase-like. To get a better representation of the source wave we need to increase the number of steps (samples) on the staircase… to something like this perhaps:
The increased sample rate has left us with a representation that is far closer to the original. But it’s far from perfect. The signal is still looking a little staircase-like, and we’re still having huge jumps on the Y axis. The resolution on the Y axis, showing the amplitude, needs increasing as well. So, if we increase our bit depth and up the sample rate again, we get the following:
This is looking much more like our original signal. The bit depth has given us a much better representation. This is very important to us, since the less like the original sound our digital representation is, the more hiss and noise we will hear when we play it back. This noise is known as “quantise error”, and is unfortunately an integral part of any digital audio system. The best way of avoiding it is to ensure that the bit depth is as high as reasonably possible. To put it into perspective, if we have a depth of 8 bits, then we have a possible 256 values, or steps, on the Y axis. If we boost that to 16 bits, we get a possible 65,536 steps! This will obviously go a long way to avoiding any quantise error. Just to reiterate though, the quantise error noise will always be there. We can’t remove it, we can just minimise it.
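If you fancy seeing those numbers in action, here’s a minimal Python sketch (standard library only, and the mapping to [-1, 1] is just my own simplification, not how any particular converter does it) of how bit depth sets the number of steps on the Y axis, and how the quantise error shrinks but never vanishes:

```python
def levels(bit_depth):
    """Number of steps available on the Y (amplitude) axis."""
    return 2 ** bit_depth

def quantise(x, bit_depth):
    """Snap a value in [-1.0, 1.0] to the nearest available step."""
    steps = 2 ** bit_depth - 1
    return round((x + 1) / 2 * steps) / steps * 2 - 1

print(levels(8))   # 256
print(levels(16))  # 65536

# The error (the hiss) shrinks as the bit depth rises, but never hits zero.
x = 0.3
print(abs(quantise(x, 8) - x) > abs(quantise(x, 16) - x))  # True
```

Note the error at 16 bits is tiny, but it’s still there, which is exactly the “can’t remove it, only minimise it” point above.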
So, for this section we have established the following: the higher the bit depth and sample rate, the better the digital representation will be, and therefore the better it will sound. If your head’s had enough for one day, stop here… otherwise carry on…
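As a quick recap of the whole sample-taking idea, here’s a little Python sketch (standard library only; the 440 Hz tone and 8 kHz rate are made-up example figures, not from any real kit) of reading a sine wave at regular points along the X axis:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Take one amplitude reading per sample period along the X (time) axis."""
    n_samples = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# A 440 Hz sine sampled at 8 kHz for 10 ms gives 80 readings.
samples = sample_sine(440, 8000, 0.01)
print(len(samples))  # 80
```

Each entry in that list is one “step” on the staircase; more samples per second means a finer staircase.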
A clever dude called Harry Nyquist
So our final sound can always be improved by upping the sample rate (X axis) or the bit depth (Y axis), can’t it? Well, yes and no. Like most things in life we have a law of diminishing returns. You do get to a point where the signal is so close to the original source that the human ear is unable to detect the difference. The other problem we have is the size of all this data. In recording studios, audio sample rates of 96kHz aren’t uncommon… Let’s put this into perspective. With a typical piece of music being 4 minutes long, each mono recording we make (each channel) at 96kHz and 16 bits will take up 46,080,000 bytes (96,000 samples per second × 2 bytes per sample × 240 seconds). That’s 46 meg of space per instrument, and you could have 16+ channels on the go at once. This is before we’ve even started to get into edits, second and third takes, etc. Hell, our final recording, mastered down to 2 channels, would end up being 92,160,000 bytes. 92 meg! Imagine if every tune you had was 92 meg or more. IBM, Western Digital et al would love it! This is overkill though: we don’t really need sample rates of this order unless we’re in a pro studio, especially as the human ear has limits…
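Those figures are just the raw arithmetic, which you can sanity-check with a couple of lines of Python (assuming uncompressed PCM at 16 bits, i.e. 2 bytes per sample):

```python
def recording_bytes(sample_rate_hz, bit_depth, seconds, channels=1):
    """Raw (uncompressed) size: one reading per sample, per channel."""
    return sample_rate_hz * (bit_depth // 8) * seconds * channels

# 4 minutes of mono at 96 kHz / 16-bit:
print(recording_bytes(96_000, 16, 4 * 60))               # 46080000
# The 2-channel master doubles it:
print(recording_bytes(96_000, 16, 4 * 60, channels=2))   # 92160000
```

Swap in 44,100 for the sample rate and you can see why CD-quality files are a fair bit friendlier on disk space.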
The human ear is a fantastic piece of kit. It works over a massive range of both frequency (pitch) and dynamics (volume). In terms of frequency, a good human ear can hear sounds ranging from 20Hz to 20kHz. From sounds that make the floor move to high-pitched sounds like nails down a blackboard, your ears do a fantastic job of picking it all up, virtually no matter how loud or quiet it is. Hell, I could rave on about our ears for hours, but I won’t. So what does this have to do with our digital audio? Nyquist’s Theorem.
Nyquist’s Theorem states that to reproduce an analogue signal properly in the digital domain, the sample rate must be at least twice as high as the highest frequency in the analogue signal. So, if our piece of music has a highest frequency of 10kHz, we must ensure the sample rate is at least 20kHz. With the human ear being able to detect frequencies up to 20kHz, it makes sense that we take samples no faster than ~40kHz; anything beyond that is wasted space, since the ear won’t be able to hear it.
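The flip side of Nyquist is worth a quick demo: if you sample too slowly, frequencies above half the sample rate don’t just disappear, they “fold back” and masquerade as lower ones (aliasing). Here’s a rough sketch of the folding arithmetic (my own helper, not from any library, and it ignores real-world anti-aliasing filters):

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency we'd actually capture: anything above half the sample
    rate folds back below it (aliasing)."""
    f = freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

# Below half the sample rate, the tone survives intact:
print(alias_frequency(10_000, 44_100))  # 10000
# A 30 kHz tone sampled at 44.1 kHz folds back to 14.1 kHz:
print(alias_frequency(30_000, 44_100))  # 14100
```

This is why real converters filter out everything above half the sample rate before sampling: a folded-back tone is far more audible (and nastier) than simply losing it.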
It’s no coincidence that the format of CD audio is 44.1kHz / 16-bit. This sample rate offers good-quality audio, covers the best of human ears, and is also tied into the fact that CD mastering used to use video equipment, but that’s something for another day.
Oversampling
So what about these stupidly high sample rates of 192kHz, and this 2x or 4x playback oversampling stuff, I hear you say? Well, remember earlier I mentioned that all digital audio systems suffer from quantise error? Oversampling is yet another way of minimising this hiss and noise. The amount of quantise noise in a system is always fixed. It can’t be removed in any way, but it can be “hidden”. Oversampling is a clever way of using the ear’s limitations to hide the noise so that we can’t actually hear it.
Imagine a rectangle. Let’s say, for example, it’s 4 units wide and 2 units high. This gives us an area inside the rectangle of 8 units (4x2). Now, suppose the area inside the rectangle is quantise noise, and the quantise noise is fixed. That means our noise level is 8 units. Still with me?
Now imagine we rearrange the rectangle so that instead of being 4x2, it’s 2x4. The area inside the rectangle is the same, isn’t it? Now what if we rearranged the rectangle again so it was 8x1? Still the same area…
Imagine that we put those shapes onto a graph. The X axis is frequency, the Y axis is amplitude. Which of those shapes has the lowest amplitude?
The 8x1…
Now let’s suppose the upper threshold of hearing is 4 units on the X axis. Since the threshold is 4, anything above 4 wouldn’t be heard. With the 8x1 rectangle, we’ve just halved the amount of quantise noise that’s audible! This would be 2x oversampling. We could take it even higher: bump it up to 16x0.5 or 32x0.25 and reduce the audible level even further!
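The rectangle analogy is easy to play with in Python. The numbers below are just the made-up units from above (8 units of total noise, hearing threshold at 4 on the X axis), not real measurements:

```python
TOTAL_NOISE = 8.0      # fixed area of the rectangle; it never changes
HEARING_LIMIT = 4.0    # upper threshold of hearing on the X (frequency) axis

def audible_noise(width):
    """Reshape the rectangle to `width` units wide; the area stays at 8.
    Only the part below the hearing limit counts as audible."""
    height = TOTAL_NOISE / width
    return min(width, HEARING_LIMIT) * height

print(audible_noise(4))   # 8.0 -> no oversampling: all of it audible
print(audible_noise(8))   # 4.0 -> 2x oversampling: half of it hidden
print(audible_noise(16))  # 2.0 -> 4x oversampling: even less audible
```

The total area never changes; all we’re doing is shoving more of it past the point where the ear gives up.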
In case that's as clear as mud, here's another couple of pics illustrating it in a similar way:
Imagine the first piccy is our typical CD player: 16-bit, 44.1kHz sample rate. The blue area in the graph is the noise. Now we can’t get rid of that, though we can “hide” it. If we oversample at, say, twice the rate, then the noise level has to drop, since the amount of noise in the system as a whole is constant; it cannot be altered.
The beauty of this is that we can’t hear anything above the red dotted line, so the noise level is effectively reduced, or hidden. The higher you oversample, the lower the audible quantise noise. Once the signal is oversampled, it has to be brought back down to 44.1kHz again, else it would play back at the wrong speed. The converter does this, and in the process reduces our noise overhead.
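Just to illustrate the “bring it back down” step, here’s a deliberately crude sketch: averaging each pair of samples to halve the rate. Real converters use proper low-pass (decimation) filters rather than a bare average, so treat this as a toy, not how your CD player does it:

```python
def decimate_2x(samples):
    """Crudely halve the sample rate by averaging each pair of readings.
    (Real kit uses a proper low-pass filter before dropping samples.)"""
    return [(a + b) / 2 for a, b in zip(samples[::2], samples[1::2])]

# Six readings at the oversampled rate become three at the target rate.
oversampled = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(decimate_2x(oversampled))  # [0.5, 2.5, 4.5]
```

Halving the sample count is what keeps the playback speed right; the filtering is what keeps the folded-up noise from sneaking back in.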
And that is the end of the analogue to digital process. Once it’s converted you can store your audio assuming you have enough room for it all… if not, it’s time for compression… which I think we’ll leave for another day!
Hope you found that useful and could understand it. As I said before, if anything puzzles you, chuck a question up and I’ll do my best to explain it a different way.