I began to get into bit depth and sample rate in my final mixing/mastering tutorial, and although we are not necessarily digital audio engineers, some basic background on what exactly bit depth and sample rate are is useful for anyone involved in digital music. It is something you are always working with, whether you know it or not, and is great background to have, whether it be to understand the basic building blocks of digital audio for personal gain or just to be able to look smart should the conversation ever arise.
So the first thing to understand is that bit depth and sample rate only exist in digital audio. In digital audio, bit depth describes amplitude (the vertical axis) and sample rate describes time, and therefore frequency (the horizontal axis). So by increasing the number of bits we use, we increase the amplitude resolution of our sound, and by increasing the number of samples per second, we increase its frequency resolution.
In an analogue system (and in nature) audio is continuous and smooth. In a digital system, the smooth analogue waveform is only approximated by samples and must be fixed to a limited number of amplitude values. When sampling a sound, the audio is split up into small slices (samples) and these samples are then fixed to one of the available amplitude levels. The process of fixing the signal to an amplitude level is called quantization and the process of creating the sample slices is, of course, called sampling.
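The two steps described above — slicing time into samples, then snapping each sample to an available amplitude level — can be sketched in a few lines of Python (the function name and parameters are my own, purely illustrative):

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bits, duration_s=1.0):
    """Sample a sine wave and quantize each sample to the nearest
    of the 2**bits available amplitude levels."""
    levels = 2 ** bits                       # e.g. 65,536 at 16 bit
    step = 2.0 / (levels - 1)                # spacing between levels in [-1, 1]
    n_samples = int(sample_rate * duration_s)
    out = []
    for n in range(n_samples):
        t = n / sample_rate                  # sampling: slice time into samples
        x = math.sin(2 * math.pi * freq_hz * t)
        q = round(x / step) * step           # quantization: snap to a level
        out.append(q)
    return out

# At a deliberately coarse 4 bits, the "staircase" approximation is obvious
coarse = sample_and_quantize(440, 8000, bits=4)
```

Note that the quantization error of any single sample can never exceed half a step between levels, which is exactly why adding bits (smaller steps) makes the approximation cleaner.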
In the below diagram, you can see a visualization of this where there is an organic sine wave playing for one second. It starts at 0 seconds and ends at 1 second. The blue bars represent the digital approximation of the sine wave where each bar is a sample and has been fixed to one of the available amplitude levels. (This diagram is of course far coarser than in real life.)
This one second of audio would have 44.1K, 48K, etc. samples going from left to right depending on the sample rate selected during recording and would cover -144 dB to 0 dB at 24 bit (or -96 dB to 0 dB at 16 bit). The resolution of the dynamic range (the number of possible amplitude levels for the sample to rest on) would be 65,536 at 16 bits and -get this- 16,777,216 if recorded at 24 bit.
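Those figures fall straight out of the arithmetic: each extra bit doubles the number of amplitude levels and adds roughly 6 dB of dynamic range. A quick sketch (helper names are mine):

```python
import math

def amplitude_levels(bits):
    """Number of discrete amplitude values a sample can take."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical dynamic range: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)

print(amplitude_levels(16))         # 65536
print(amplitude_levels(24))         # 16777216
print(round(dynamic_range_db(16)))  # 96
print(round(dynamic_range_db(24)))  # 144
```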
So increasing the bit depth obviously greatly increases our amplitude resolution and dynamic range. What is not so obvious is where the increase in dynamic range occurs. The added dBs go to the softer portion of the sound, since the amplitude can never go above 0 dB. This allows more delicate sounds (e.g. a reverb tail trailing down to -130 dB) to be heard which would otherwise have been cut short by the -96 dB floor of a 16 bit recording.
Rounding and Truncation
In digital audio, each sample is analyzed, processed, converted back to audio and pushed through the speakers. When a sample is processed (gain change, distortion, etc.) in your DAW, it is sent through a basic multiplication or division algorithm and the number representing the sample is changed accordingly and spat back out. Simple, if it weren't for the fact that we are not dealing with simple or round numbers (a gain boost of 1 dB requires multiplying by 1.122018454), so even an 8 or 4 bit sample can easily be extended well beyond our 24 bit sample space.
Since we have only 24 bits, these long numbers must be fit into that space. To do so, DSPs employ either rounding or truncation at the least significant bit (LSB - the last bit in a digital word, e.g. the 16th bit in a 16 bit sample). Rounding is fairly straightforward and acts as you would expect from basic arithmetic. Truncation simply drops the information after the LSB without any further analysis.
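A toy illustration of the problem (the values and helper names are mine, and a real DSP works on fixed point words rather than Python integers): a 1 dB boost of a clean integer sample immediately produces a fractional result that has to be rounded or truncated back into the available bits, and the two strategies can disagree by one LSB.

```python
GAIN_1DB = 10 ** (1 / 20)    # ≈ 1.122018454, the multiplier for a +1 dB boost

def boost_round(sample):
    """Apply the gain, then round to the nearest representable value."""
    return round(sample * GAIN_1DB)

def boost_truncate(sample):
    """Apply the gain, then simply drop everything past the LSB."""
    return int(sample * GAIN_1DB)

x = 999                      # a clean integer sample value
print(x * GAIN_1DB)          # 1120.896... - wider than the sample space
print(boost_round(x))        # 1121
print(boost_truncate(x))     # 1120 - one LSB below the rounded result
```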
This is obviously problematic in that both processes introduce error into the equation and these errors multiply as process upon process is accumulated through the signal chain. The positive side to this is that the LSB in a digital word is the softest amplitude of that word, so in a 16 bit sample the error is at -96 dB and at -144 dB for a 24 bit sample. Even so, the difference between a DSP with a good architecture and one that sounds awful largely rests in how the DSP manages these long words and compounded processes.
So, we now know that DSPs are necessarily riddled with error; that even the gross approximations they make of a naturally occurring phenomenon are themselves riddled with error. These errors not only make the audio sound less pristine than it otherwise would, but can introduce audible artifacts of their own.
To counteract these artifacts, a type of low amplitude, mathematically calculated noise (randomness) called dither is applied to the signal. This randomness breaks up any periodic errors in the signal which may create new frequencies or other artifacts. The dither noise is very low amplitude and although it is slightly audible at high levels it still creates a final product far better than without.
A waveform showing the effects of dither. Dither has been applied to the top waveform.
One thing to note about dither is that the noise is cumulative. In adding noise to the signal, you are essentially decreasing the signal to noise ratio (the ratio between usable signal and noise). If done repeatedly, this ratio continues to decrease while adding further randomization to a signal that no longer needs it. This is why dither is always applied as the last step of the mastering process, and only applied once.
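A minimal sketch of one common flavour of this idea, TPDF (triangular probability density function) dither, which sums two uniform random values to shape the noise before quantization (function names are my own, not any particular DSP's API):

```python
import random

def quantize(x, step):
    """Snap a sample to the nearest available amplitude level."""
    return round(x / step) * step

def quantize_with_dither(x, step, rng=random):
    """Add TPDF dither (difference of two uniforms, +/-1 step peak) before
    quantizing, decorrelating the quantization error from the signal."""
    noise = (rng.random() - rng.random()) * step   # triangular-PDF noise
    return quantize(x + noise, step)
```

Without dither, a sample sitting at 0.3 of a step always quantizes to the same wrong level (a periodic, signal-correlated error); with dither, the outputs vary randomly but average out to the true value.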
Dither has a relatively colorful history:
One of the earliest [applications] of dither came in World War II. Airplane bombers used mechanical computers to perform navigation and bomb trajectory calculations. Curiously, these computers (boxes filled with hundreds of gears and cogs) performed more accurately when flying on board the aircraft, and less well on ground. Engineers realized that the vibration from the aircraft reduced the error from sticky moving parts. Instead of moving in short jerks, they moved more continuously. Small vibrating motors were built into the computers, and their vibration was called dither from the Middle English verb "didderen," meaning "to tremble." ...modern dictionaries define dither as a highly nervous, confused, or agitated state. In minute quantities, dither successfully makes a digitization system a little more analog.
- Ken Pohlmann, Principles of Digital Audio
According to theory, 44.1K samples per second should be more than enough to cover every frequency within (and slightly outside of) the human range of hearing. You may have come across the Nyquist Theorem before which states that to avoid aliasing (a type of distortion) and to accurately recreate all frequencies during sampling, one must sample at least twice the rate of the highest frequency contained in a given signal (this theorem applies to media outside of audio, but we won't get into that here).
The human ear can supposedly hear up to 20,000 cycles per second (20 kHz), though most studies indicate it is more like 17 kHz at best, so accordingly a sample rate of 40K samples per second should be enough to capture every audible frequency. 44.1K is the industry standard; it was made that way for several reasons, and ultimately chosen by the oligarchy known as Sony.
To make a long story short(er), digital audio sample rates necessarily have to sit above the Nyquist rate, because in practice the signal must also be low pass filtered during A/D and D/A conversion to avoid aliasing at that step as well. The gentler the slope of the low pass filter, the easier (read: cheaper) it is to make. Thus an audio signal whose low pass filter starts at 20 kHz (letting through the entire audible spectrum) and has a gentle slope covering 2 kHz, for example, must be sampled at 44K samples per second: (20K highest frequency + 2K filter slope) × 2 per the Nyquist Theorem = 44K.
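That arithmetic is simple enough to express directly (the function name is mine):

```python
def minimum_sample_rate(highest_freq_hz, filter_transition_hz):
    """Nyquist: sample at least twice the highest frequency to be captured,
    including the low pass filter's transition band above the audible limit."""
    return 2 * (highest_freq_hz + filter_transition_hz)

print(minimum_sample_rate(20_000, 2_000))   # 44000
print(minimum_sample_rate(20_000, 2_050))   # 44100 - the CD standard
```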
Ultimately, the 44.1K standard was chosen after a struggle between Sony and Philips (they both had similar final proposals) and was chosen based on the mathematics behind audio sample rate and video tape anatomy; so that audio and video could reside on the same video cassette in a good fidelity to price ratio. However, 48K is now currently the standard for video related audio. CD audio remains at 44.1K.
This image shows the sample level of an 'organic' kick drum recording in Logic. You can see how the sound has been sampled and quantized from the sharp rectangular approximations of the waveform. The original drum sound would have had no such distortion.
Can You Hear It?
Some people claim to be able to hear a distinct difference between a 44.1K sample rate and, for example, a 96K sample rate. Most attribute this difference to the increased bandwidth being captured (96K can represent frequencies up to 48 kHz). Although I too have noticed subtle clarity changes when oversampling, it is incorrect to attribute these differences to the higher frequencies themselves (or at least they aren't directly related).
It has been shown through various tests that it is in fact the low pass filtering that creates audible differences and at higher sample rates those LPF artifacts fall outside of the audible spectrum. In increasing the filter cutoff from 22 kHz to 48 kHz when sampling, we decrease the demand on the filter to act in the audible range, thereby making sure more if not all filter artifacts remain in the ultrasonic spectrum.
This clears up the audible spectrum and gives the illusion that a higher bandwidth/sample rate creates more pristine audio. More pristine audio is indeed created, but it is an effect of the sample rate being high enough to counteract the artifacts of a poorly designed (unfortunately standard) low pass filter during A/D and D/A conversion.
So, that about covers it. I realize this may have been more of a lesson than a tutorial, but it is good information to have nonetheless. Knowing the tools you are working with is never a bad thing, and this is about as detailed as you will ever need to know the subject for any practical purpose as a music producer. Mastering engineers and audiophiles may need to look elsewhere, however ;)
Until next time.