Digital Audio Theory Article
So many people do not fully understand what a BIT is, and even more don't know how it relates to audio. Please bear with me as I explain how they relate to audio and go further into digital audio theory, hopefully without boring or losing people with concepts that can be hard to grasp at first. If you want any further digital audio theory topics covered, just let me know and I'll look into revising this article.
A bit is a 0 or a 1. Since a computer is pretty much just millions of switches, it works with ON and OFF states. This is called BINARY, since the numbers are in a Base 2 format.
In this table you see some 2 bit words and their equivalent numbers in the base that we are used to seeing numbers in (Base 10). They are two bit words since they have two ON or OFF states. I will stray from this for a little while to explain the audio side....
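The 2 bit words and their Base 10 equivalents can be listed with a few lines of Python. This is only a small sketch, and the function name is made up for the example.

```python
def two_bit_words():
    """Return every 2 bit word paired with its Base 10 value."""
    # format(n, "02b") writes n in binary, padded to two digits.
    return [(format(n, "02b"), n) for n in range(4)]

for word, value in two_bit_words():
    print(word, "=", value)
```

Going the other way, `int("10", 2)` turns a binary word back into a normal number.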
In this picture we see I'm not a very good artist LOL.. Besides this, I have tried to show that BITS are the Y axis on a graph and that each BIT encodes about 6 dB of DYNAMIC RANGE. The X axis is of course TIME, with the spacing between points set by the sample rate.
Most people know how an old-style television works. A dot is painted onto the screen, and the dot moves very fast across the screen painting the picture. Remember what happens when a bright light is shone into your eyes? After the light has been turned off you can still see a blurry dot. This is called persistence of vision, and it allows the brain to see the fast moving dot on a TV screen as a full picture! Dogs' eyes respond faster than ours, so a picture that looks steady to us can look like a flicker to them. The refresh rate of a TV must be more than about 30-40 hertz for a steady picture to be seen (lights run from 50 Hz mains flicker faster than this, which is why we don't notice it). In other words, more than 30-40 pictures/snapshots must be painted every second for our eyes to see a constant picture.
Getting back to audio now. Samples are like snapshots of sound, and just like early black and white projection movies, if they are not played back at a fast enough speed you will notice the gaps. I'm not sure what the equivalent speed of the ear is, but if someone yells into your ear you hear a ringing afterwards, so you could say there is a persistence of hearing too. More on sample rates later on... Back to BITS....
An 8 BIT word looks like this: "01001000". Now, as we add an extra BIT to a word length we DOUBLE the possible combinations. In terms of audio, we double the number of quantisation values, that is, the number of Y values we can round the level of the audio off to. In the digital realm you can't have 1 and a HALF; it has to be either 1 or 2. Hence the term quantisation, or rounding. The level of the audio has to be rounded to the nearest value that the word length allows.
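To make the rounding idea concrete, here is a rough sketch that rounds a level to the nearest allowed value for a given word length. It assumes the level is a number between 0.0 and 1.0 spread evenly over the available steps, which is a simplification chosen for the example and not how any particular converter is built.

```python
import math

def quantise(level, bits):
    """Round a level (0.0 to 1.0) to the nearest step a word of `bits` allows.

    Returns the step number and the level that step actually represents.
    """
    top_step = 2 ** bits - 1                  # steps are numbered 0 .. top_step
    step = math.floor(level * top_step + 0.5) # round to the nearest step
    return step, step / top_step

for bits in (2, 8, 16):
    step, stored = quantise(0.37, bits)
    print(f"{bits:2d} bit: step {step}, stored level {stored:.6f}")
```

With only 2 bits the level 0.37 has to be stored as roughly 0.33; with more bits the stored level gets much closer to the real one.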
As you can see by this table, by going from 16 BIT audio to 24 BIT audio we have gained 256 TIMES the number of values each word (sample) can take, because 8 extra bits means doubling 8 times over. That's why professionals will record in 20 bits or more. On a side note, most professionals will record at the same sample rate that the final product will be mastered to (e.g. 44.1 kHz), even when they have the capability of recording at 96 kHz or beyond. There are many reasons why, which I won't go into here. This is now changing, as DVD allows for higher sampling rates, and capturing the original sound at a higher sampling rate leaves the engineer more flexibility later on.
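The jump in the number of values is easy to check for yourself. This small sketch just raises 2 to the power of the word length; the function name is made up for the example.

```python
def quantisation_levels(bits):
    """Number of different values a word of `bits` bits can hold."""
    return 2 ** bits

for bits in (8, 16, 20, 24):
    print(f"{bits:2d} bit: {quantisation_levels(bits):,} values")

# How many times more values 24 bit has than 16 bit.
ratio = quantisation_levels(24) // quantisation_levels(16)
print("24 bit vs 16 bit:", ratio, "times")
```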
8 BITS make up 1 byte of storage, 16 BITS take up 2 bytes and 24 BITS take up 3 bytes. This is generally true, but there are some exceptions. I won't make it more complicated by going into this in more depth.
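As a rough guide to how this adds up, the sketch below works out bytes per sample and bytes per second for plain uncompressed audio. It assumes word lengths that are not a multiple of 8 get padded up to the next whole byte; that padding rule and the function names are assumptions made for the example.

```python
import math

def bytes_per_sample(bits):
    """Whole bytes needed to hold one sample of the given word length."""
    return math.ceil(bits / 8)

def bytes_per_second(bits, sample_rate, channels=1):
    """Storage used by one second of uncompressed audio."""
    return bytes_per_sample(bits) * sample_rate * channels

print(bytes_per_sample(16), "bytes per 16 bit sample")
print(bytes_per_second(16, 44100, channels=2), "bytes per second at 16 bit, 44.1 kHz, stereo")
print(bytes_per_second(24, 44100, channels=2), "bytes per second at 24 bit, 44.1 kHz, stereo")
```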
dB stands for decibel. 10 decibels make up one Bel. One decibel is approximately equal to the smallest change in volume of sound that the normal ear can detect. The scale of decibels is logarithmic: every increase of 10 dB represents a tenfold increase in sound power, which the ear hears as roughly a doubling of loudness. Because the scale is logarithmic, you cannot treat decibel values like normal numbers when adding and subtracting them.
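A quick sketch shows why decibels can't simply be added together. It assumes the sounds being combined are unrelated to each other, so that their powers add up; the function name is made up for the example.

```python
import math

def combine_db(*levels_db):
    """Combine unrelated sound sources given in dB by adding their powers."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

# Two 60 dB sources together come to about 63 dB, not 120 dB.
print(round(combine_db(60, 60), 2))
```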
What is Dynamic range
Dynamic range represents the difference between the maximum signal that can be recorded (0 dB, digital full scale) and the noise floor of your system. The noise floor is the noise present in your system without any signal present. A system with a high dynamic range will be quieter than one with a lower dynamic range.
1 bit can encode about 6 dB of dynamic range. Therefore a 24-bit system theoretically has a dynamic range of 144 dB (24 * 6 = 144) and a 16-bit system has 96 dB (16 * 6 = 96).
Current analog-to-digital converters typically reach full scale with an input of around +7 dBu. If they were to have 144 dB of dynamic range, they would have to be capable of resolving signals of roughly 100 nanovolts. That's about 100 one-billionths of a volt! Transistors and resistors produce noise in this range just by having electrons moving around due to heat. Even if the converters could be perfectly designed to read these levels, the low noise requirements of the surrounding circuitry such as power supplies and amplifiers would be so stringent that they would either be impossible or too expensive to build.
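The size of that smallest signal can be worked out from the dBu figure. This is a back-of-envelope sketch, assuming the usual definition of 0 dBu as about 0.775 volts RMS; by this arithmetic the result comes out on the order of 100 nanovolts.

```python
import math

# 0 dBu is the voltage that puts 1 mW into 600 ohms, about 0.775 V RMS.
DBU_REFERENCE_VOLTS = math.sqrt(0.6)

def dbu_to_volts(level_dbu):
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (level_dbu / 20)

full_scale_volts = dbu_to_volts(7)        # converter full scale
smallest_volts = dbu_to_volts(7 - 144)    # 144 dB below full scale

print(f"full scale: {full_scale_volts:.3f} V RMS")
print(f"144 dB down: {smallest_volts * 1e9:.0f} nV RMS")
```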
About 120 dB of dynamic range (average RMS) is as good as it gets to date with mass produced 24 bit converters.
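Using the 6 dB per bit rule of thumb from above, you can compare the theoretical figures with the 120 dB that real converters manage. A small sketch, with made-up function names:

```python
DB_PER_BIT = 6  # rule of thumb used in this article

def theoretical_dynamic_range_db(bits):
    """Theoretical dynamic range for a given word length."""
    return bits * DB_PER_BIT

def equivalent_bits(dynamic_range_db):
    """How many bits a measured dynamic range is 'worth' by the same rule."""
    return dynamic_range_db / DB_PER_BIT

print(theoretical_dynamic_range_db(16), "dB for 16 bit")
print(theoretical_dynamic_range_db(24), "dB for 24 bit")
print(equivalent_bits(120), "bits for a 120 dB converter")
```

So by this rule a 24 bit converter with 120 dB of dynamic range is really delivering about 20 bits' worth.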
To sample or graph a SINE wave you must have at least two points or co-ordinates per cycle in order to work out what the frequency is. For example you need the ORIGIN (y=0 normally) and either a MAXIMUM or a MINIMUM value. Because of this simple fact, to record a frequency you must have at least double that number as the sampling rate (strictly speaking, slightly more than double). E.g. to record a sine wave of 50 Hz you need a MINIMUM of 100 samples per second. This basic fact, which governs the minimum sampling rate, is called the Nyquist theorem. The Nyquist frequency is the highest frequency that you can record with a given sample rate, and it is half the sample rate. In the case of a recording with 44,100 samples per second (the sampling rate of CDs) the Nyquist frequency is 22,050 Hz. I could go into drawing pictures to prove this but I won't, because you have seen the quality of my drawings above. LOL. The Nyquist theorem is not something that you will hear much about, but it is good to know what it is and how it affects things in real life situations.
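The two numbers in this section are just a halving and a doubling, as this small sketch shows. The function names are made up for the example.

```python
def nyquist_frequency(sample_rate):
    """Highest frequency (Hz) a given sample rate can record."""
    return sample_rate / 2

def minimum_sample_rate(frequency):
    """Lowest sample rate (samples per second) for a given frequency."""
    return 2 * frequency

print(nyquist_frequency(44100))   # CD sample rate
print(minimum_sample_rate(50))    # the 50 Hz sine wave example
```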
Guitarists use the beat frequency to tune guitars with harmonics. When a guitarist picks a harmonic, the string can only vibrate with waves that fit the length of the string; by placing a finger lightly on the string at a particular fret you create what physics calls Nodes and Anti-Nodes. When playing two harmonics that are very close together in pitch you hear a rise and fall in the perceived level of the notes due to the BEAT frequency. I will now explain what the BEAT frequency is. Two frequencies sounding together produce a new frequency equal to the difference between them. The formula is F1 - F2 = BEAT.
Back to relating this to sampling. Something similar happens when you go over the Nyquist frequency: for every Hz that you go over it, you get an artifact (an alias) that lands the same number of Hz under it. Put another way, the artifact is the difference between the sample rate and the recorded frequency. WOW, that is hard to explain in simple terms. For example, if you record a 22,051 Hz sine wave with a 44.1 kHz sample rate you will get a 22,049 Hz tone in your audio due to going over the Nyquist frequency. If you record 22,080 Hz you will get 22,020 Hz, and so on. The further over you go, the lower the artifact lands, so a 44,070 Hz signal would come back as a 30 Hz rumble. Once again I will spare you the drawings I could do to prove this with graphs.
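Here is a rough sketch of the beat formula and of what happens to a tone that goes over the Nyquist frequency. It assumes the standard model in which a sampled tone mirrors around the Nyquist frequency and no filter is in the way; the function names are made up for the example.

```python
def beat_frequency(f1, f2):
    """Beat heard when two close frequencies sound together."""
    return abs(f1 - f2)

def alias_frequency(frequency, sample_rate):
    """Frequency that actually ends up in the recording."""
    folded = frequency % sample_rate          # sampling repeats every sample_rate Hz
    nyquist = sample_rate / 2
    # Anything over the Nyquist frequency mirrors back underneath it.
    return sample_rate - folded if folded > nyquist else folded

print(beat_frequency(440, 442))               # two slightly out of tune notes
for tone in (1000, 22051, 22080, 44070):
    print(tone, "Hz is recorded as", alias_frequency(tone, 44100), "Hz")
```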
When mastering tracks to go onto CDs, the material is EQ'ed so that nothing, or very little, over 18-19 kHz reaches the converters. Early CDs and CD players were said to be HARSH to the ears, since they used what's called a BRICK WALL filter to cut all frequencies off after a set point, sometimes as low as 15 kHz. I'm sure you have heard people complain about CDs and say they are un-warm and harsh; this is true of cheap CD players and older models. This abrupt cut-off wasn't natural and the ears picked it up, even though very few adults can hear past 16 kHz. I should expand on this or newbies to digital audio will argue the point: whilst a human cannot hear above, say, 16 kHz, we can sense whether those frequencies are present in a recording or not. High end CD players actually recreate the harmonics in the DA conversion right up to 30 kHz. Pioneer call this "Legato Link" technology if you wish to look it up.
A newborn baby can hear up to 20 kHz, and as the baby gets older the ears slowly lose this range. The more loud rock concerts you attend, the less you will be able to hear high frequencies. Every time your ears are ringing and everything seems quiet afterwards, you have done damage to your ears.
If the sound is filtered too much and too steeply it is very harsh, and if it is not filtered enough you will get artifacts in the recordings from frequencies over the Nyquist frequency. Some AD and DA converters use 180 kHz brick wall filters to help block RFI and EMI; this is one way that internal audio cards can be made much quieter than cheap sound cards.
I won't go into too much depth about dithering since many sites explain it already.
When dithering from 24 bit to 16 bit, the information stored in the last 8 bits is moved into the top 16 bits, which are the ones we want to keep. Truncating then throws away the last 8 bits. If you truncate before dithering you lose some of the audio information you have recorded, i.e. you throw away some of your quality! If you dither before truncating, you're adding small amounts of random noise to the audio to push the audio information up into the top 16 bits. When the digital audio is truncated most of the noise is thrown away, although some of it will be kept. Dithering gives you much smoother and more pleasant audio to listen to after you have reduced the word length. Read up on noise shaping to learn about advanced dithering techniques.
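A toy example helps show what dither buys you. The sketch below takes a very quiet 24 bit value (smaller than one 16 bit step) and reduces it to 16 bit with and without dither. It is only an illustration under some assumptions: the dither is simple triangular random noise about one 16 bit step either way, and the dithered version rounds to the nearest step instead of plain truncation so the average stays unbiased. Real mastering tools may do this quite differently.

```python
import math
import random

STEP = 256  # one 16 bit step measured in 24 bit units (2**8)

def truncate_to_16(sample_24):
    """Throw away the last 8 bits with no dither."""
    return sample_24 >> 8

def dither_to_16(sample_24, rng):
    """Add small triangular random noise, then round to a 16 bit step."""
    noise = rng.uniform(-STEP / 2, STEP / 2) + rng.uniform(-STEP / 2, STEP / 2)
    return math.floor((sample_24 + noise) / STEP + 0.5)

rng = random.Random(1)   # fixed seed so the run is repeatable
quiet_sample = 100       # below one 16 bit step, so truncation loses it
count = 20000

truncated = [truncate_to_16(quiet_sample) for _ in range(count)]
dithered = [dither_to_16(quiet_sample, rng) for _ in range(count)]

# Average level that survives, converted back to 24 bit units.
average_truncated = sum(truncated) / count * STEP
average_dithered = sum(dithered) / count * STEP
print("without dither:", average_truncated)
print("with dither:   ", round(average_dithered, 1))
```

Without dither the quiet signal vanishes completely. With dither each individual sample is noisy, but on average the original level is still there in the top 16 bits.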
Noise shaping is dithering that takes into account the Fletcher-Munson curves. These curves show where the human ear is most sensitive and where it is least sensitive across the audio frequencies. By concentrating the added noise in the areas where our ears cannot hear as well, or cannot hear at all, the noise that is added in dithering becomes pretty much completely inaudible. This is helped further by the fact that dither is normally around 90 dB below the maximum level of the material in 16 bit audio. There are many different noise shaping techniques in use, and depending on the recorded material one may be better than another. That's one reason why mastering should be left to professional studios who can dither properly and know which noise shaping technique will work best for your material.
Only very small amounts of noise are added when dithering.
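When calculating signal levels and comparing them to dB values you must use this formula, because the decibel is a logarithmic scale.
N(dB) = 20 * (log10(A) - log10(B)) = 20 * log10(A / B)
Here A and B are the two signal levels (for example voltages) being compared.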
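The formula is easy to try out in Python. This is just a small sketch with a made-up function name; A and B are taken to be signal levels such as voltages.

```python
import math

def level_difference_db(a, b):
    """Difference in dB between two signal levels a and b."""
    return 20 * (math.log10(a) - math.log10(b))

print(round(level_difference_db(2, 1), 2))    # doubling the level
print(round(level_difference_db(10, 1), 2))   # ten times the level
```

Doubling the level comes out at about 6 dB, which is where the "6 dB per bit" figure comes from, since every extra bit doubles the number of values.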
This is as far as I am going to go in this tute for now. Hope you learnt heaps and understood most if not all of it. If you have any comments about this, feel free to email me.