2.4. Quantization#

Recall from the introduction to this chapter that digitized audio has two defining properties: the sampling rate (covered earlier in this chapter) and the precision. This section concerns the precision of digital audio, but what exactly does that mean? To understand it, we’ll first need to take a detour to see how computers represent numerical data.

2.4.1. Background: digital computers and integers#

These days, most humans use the Hindu-Arabic (or decimal) numeral system to represent numbers. With decimal digits, we use the ten symbols \(0,1,2,\cdots,9\) to encode numbers as combinations of powers of ten (the base or radix of the system). For example, a number like \(132\) can be expanded out in terms of powers of 10:

\[ 132= \red{1} \cdot 10^2 + \red{3} \cdot 10^1 + \red{2} \cdot 10^0. \]

There’s nothing magical about the number 10 here: it was probably chosen to match up with the number of fingers (ahem, digits) most people possess. Any other base can work too.

Of course, computers don’t have fingers, so they might find decimal to be difficult. Computers do have logic gates though, which can represent true and false values, which we can interpret as \(1\) and \(0\) respectively. This leads us to binary numbers, which only use two symbols to encode numbers as combinations of powers of 2, rather than combinations of powers of 10.

In our example above, the number \(132\) could be represented as

\[\begin{align*} 132 &= \red{1} \cdot 128 + \red{0} \cdot 64 + \red{0} \cdot 32 + \red{0} \cdot 16 + \red{0} \cdot 8 + \red{1} \cdot 4 + \red{0} \cdot 2 + \red{0} \cdot 1\\ &= \red{1} \cdot 2^7 + \red{0} \cdot 2^6 + \red{0} \cdot 2^5 + \red{0} \cdot 2^4 + \red{0} \cdot 2^3 + \red{1} \cdot 2^2 + \red{0} \cdot 2^1 + \red{0} \cdot 2^0, \end{align*}\]

or, more compactly, as \(10000100_2\) (where the subscript lets us know we’re in binary). We refer to each position as a bit (short for binary digit).

For various technical reasons, computers don’t generally support arbitrarily large numbers. Instead, integers come in a few different “sizes” depending on how many bits we’ll need: usually, 8, 16, 32, or 64 bits. The example above is an 8-bit number, but it could just as easily have been written in 16-, 32- or 64-bit representation by using leading zeros: \(0000000010000100_2\) for 132 in 16-bit form.
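If you want to check a binary expansion like this without working out the powers of two by hand, numpy's np.binary_repr function (used here purely as an illustration) prints the bit pattern of an integer at a chosen width:

import numpy as np

# The 8-bit pattern for 132
print(np.binary_repr(132, width=8))   # 10000100

# The same number, padded out to 16 bits with leading zeros
print(np.binary_repr(132, width=16))  # 0000000010000100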

2.4.1.1. Negative numbers#

An \(n\)-bit number can represent \(2^n\) distinct numbers, but which numbers? We can interpret the bit representation as the numbers \(0, 1, \cdots, 2^n-1\), but this doesn’t provide an obvious way to represent negative numbers.

There’s an elegant solution to this problem if we imagine arranging binary numbers around a circle, as illustrated below for 3-bit numbers. We can think of counter-clockwise movement as incrementing by one, and clockwise movement as decrementing by one. In this view, the numbers beginning with 1 can be seen as negative numbers: \(111 = -1, 110 = -2, \dots\), and the numbers beginning with 0 are the non-negative numbers as discussed above. It’s beyond our scope here, but this representation of integers, known as two’s complement, has many nice properties, and is implemented by almost every modern computer for doing integer arithmetic.

[Figure: visualization of two's complement for 3-bit integers]

Fig. 2.6 Two’s complement representation of 3-bit integers (-4, -3, …, 3).#

To summarize, an \(n\)-bit two’s-complement integer can represent \(2^{n-1}\) distinct non-negative numbers (\(0, 1, \dots, 2^{n-1}-1\)), and \(2^{n-1}\) distinct negative numbers (\(-1, -2, \dots, -2^{n-1}\)). For example, an 8-bit number can take \(2^8 = 256\) distinct values: \(-128, -127, \dots, -2, -1, 0, 1, 2, \dots, 127\).
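The same np.binary_repr function handles negative inputs using two's complement when a width is given, so we can check the wrap-around behavior described above. A small sketch for the 3-bit case:

import numpy as np

# Two's complement bit patterns for all 3-bit integers
for q in [3, 2, 1, 0, -1, -2, -3, -4]:
    print('{:>2d} -> {}'.format(q, np.binary_repr(q, width=3)))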

This is a relatively minor detail in the bigger picture of digital signals, but it can help to understand why quantized signals look the way they do (as illustrated below).

2.4.2. Defining precision and quantization#

Precision, also known as bit depth, refers to how many bits are used to represent each sample in a digital signal. While we typically think of signals as taking on continuous real values, computers quantize these values to be drawn from a fixed, finite set of numbers.

High precision means that we have more distinct values, and can therefore faithfully represent smaller differences and get a more accurate representation of the underlying continuous signal. However, doing so comes at a cost: higher precision means we’re storing and transmitting more data. There’s a trade-off to be made between storage cost and the perceptual fidelity of the quantized signal. Investigating this thoroughly is beyond the scope of this text, but interested readers are encouraged to look into perceptual coding and lossy audio compression to learn more.

2.4.3. The effects of quantization#

Fig. 2.7 illustrates the effects of varying levels of quantization on samples from a continuous waveform. High precision values (16-bit) provide a good approximation to the original wave, but this approximation deteriorates as we reduce the precision. In the extreme case (1-bit quantization), each sample can take only one of two values (-1 or 0), which results in a highly distorted signal.

Fig. 2.7 A continuous signal (solid curve) is sampled (dots) and then quantized to several different bit depths. At low depths, the sample values are noticeably different from the original signal.#

2.4.4. Dynamic range#

Fundamentally, quantization reduces the number of distinct values that can be observed in a signal. Rather than the full range of continuous values, say voltages in the range \([-V, +V]\), we instead divide up the range into pieces of constant (quantized) value. For uniform quantization into an \(n\)-bit integer representation, we have \(2^n\) distinct numbers representing different values between \(-V\) and \(+V\). If \(q\) represents an \(n\)-bit quantized integer value \(-2^{n-1} \leq q \leq 2^{n-1}-1\), then we can map \(q\) to a value \(v(q)\):

(2.6)#\[v(q) = V \cdot \left(\frac{q}{2^{n-1}}\right),\]

as illustrated in Fig. 2.7. This process is lossy, in that not every continuous input value can be realized by a specific quantized number \(q\), and this fact raises many questions about how accurate the quantization process is.
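As a concrete check of (2.6), here is a minimal sketch (the function name v_of_q is ours, not a standard one) that maps every 3-bit code \(q\) to its value when \(V = 1\):

import numpy as np

# Map an n-bit integer code q to a value in [-V, +V), following (2.6)
def v_of_q(q, n, V=1.0):
    return V * (q / 2**(n - 1))

n = 3
# All representable 3-bit codes: -4, -3, ..., 3
codes = np.arange(-2**(n - 1), 2**(n - 1))
print(v_of_q(codes, n))   # [-1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75]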

We won’t go into all the details of analyzing quantization error, but one commonly used method to evaluate a quantization scheme is to measure its dynamic range: the ratio of its loudest value \(v_+\) to its quietest (non-zero) value \(v_-\), measured in decibels. Typically, the values in question are voltages; you may recall from chapter 1 that voltage is proportional to sound pressure, and that intensity is proportional to the square of pressure (and therefore, the square of voltage). As a result, to compute the dynamic range \(R_\text{dB}\) for quantized voltages, we’ll need to square them:

(2.7)#\[R_\text{dB} = 10 \cdot \log_{10} \left(\frac{v_+}{v_-}\right)^2 = 20 \cdot \log_{10} \frac{v_+}{v_-}.\]

To use this idea, we’ll need to calculate the smallest and largest (absolute) values attainable by (2.6). The smallest value, \(v_-\), is attained at \(q=1\), so that

\[v_- = v(1) = V \cdot \left(\frac{1}{2^{n-1}}\right).\]

The largest absolute value, \(v_+\), is attained by \(q= -2^{n-1}\):

\[v_+ = \left|v\left(-2^{n-1}\right)\right| = V \cdot \left(\frac{2^{n-1}}{2^{n-1}}\right).\]

When we form the ratio \(v_+ / v_-\), note that there is a common factor of \(V / 2^{n-1}\) that can be cancelled, resulting in:

\[\frac{v_+}{v_-} = 2^{n-1}.\]

Plugging this value into (2.7) yields

(2.8)#\[\begin{split}\begin{aligned} R_\text{dB} &= 20 \cdot \log_{10} 2^{n-1}\\ &= (n-1) \cdot 20 \log_{10} 2\\ &\approx (n-1) \cdot 6.02~[\text{dB}]\\ \end{aligned}\end{split}\]

Tip

Equation (2.8) gives rise to a commonly used rule of thumb: each bit of precision adds about 6 dB of dynamic range.

Equation (2.8) allows us to measure the dynamic range purely as a function of bit depth. The following table gives dynamic range values for several common choices of \(n\).

| Precision [bits] | Dynamic range [dB] |
|---|---|
| \(n=1\) | 0.0 |
| \(n=2\) | 6.02 |
| \(n=4\) | 18.06 |
| \(n=8\) | 42.14 |
| \(n=16\) (compact disc standard) | 90.31 |
| \(n=24\) | 138.47 |
| \(n=32\) | 186.64 |
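These values follow directly from (2.8); a quick sketch to reproduce them:

import numpy as np

# Dynamic range in dB as a function of bit depth, per equation (2.8)
for n in [1, 2, 4, 8, 16, 24, 32]:
    print('n = {:2d}: {:6.2f} dB'.format(n, (n - 1) * 20 * np.log10(2)))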

2.4.5. Floating point numbers#

The integer quantization scheme described above is commonly used for storing audio signals (e.g., in .wav format), but it is not always the most convenient representation for processing audio signals. Instead, most computers use floating point representations for numerical data, and have dedicated hardware for carrying out mathematical operations on floating point numbers. Floating point numbers differ from integer representations in several key ways, and they are the standard choice for representing fractional numbers and (approximately) continuous values. This is achieved by using a non-uniform spacing of numbers encoded in a format similar to scientific notation. Note that floating point numbers are still technically quantized, but in practice the quantization is so fine that we treat them as though they were not quantized at all.
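One way to see this non-uniform spacing is numpy's np.spacing function, which reports the gap between a given floating point number and the next representable one; the gap grows with the magnitude of the number:

import numpy as np

# Gap to the next representable 32-bit float, near 1 and near 1000
print(np.spacing(np.float32(1.0)))     # about 1.2e-07
print(np.spacing(np.float32(1000.0)))  # about 6.1e-05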

Floating point representations are defined by the IEEE 754 standard, which is quite a bit more detailed than we need to get into here. Rather than calculate the minimum and maximum values from the specification—which is doable, but tedious—we’ll instead see how to retrieve this information from the computer directly.

Like most numerical computation libraries, NumPy provides an interface for getting information about the number representations used behind the scenes. For floating point numbers, this is provided by the np.finfo function. This function accepts a data type as input (e.g., np.float32 for 32-bit floating point) and returns an object containing various constants, such as the largest and smallest numbers that can be represented. The code fragment below shows how to use this information, combined with (2.7), to compute the dynamic range for 32-bit floats.

import numpy as np

# Get the floating point information from numpy
print('32-bit floating point\n')
float32_info = np.finfo(np.float32)
print('Smallest value:\t ', float32_info.tiny)
print('Largest value:\t ', float32_info.max)

# Compute the dynamic range by comparing max and tiny
dynamic_range = 20 * (np.log10(float32_info.max) - np.log10(float32_info.tiny))
print('Dynamic range:\t {:.2f} [dB]'.format(dynamic_range))
32-bit floating point

Smallest value:	  1.1754944e-38
Largest value:	  3.4028235e+38
Dynamic range:	 1529.23 [dB]

Compared to the integer representations listed above, 32-bit floats provide a substantially higher dynamic range: given the same amount of storage (32 bits), integers have a dynamic range of about 186 dB, compared to roughly 1529 dB for floats! This is far in excess of the dynamic range of human hearing, and more than adequate for most signal processing applications. However, if for some reason 1529 decibels is not enough, we can repeat this calculation for 64-bit floats (sometimes called double precision).

# Repeat the above, but for 64-bit floats
print('64-bit floating point\n')

float64_info = np.finfo(np.float64)
print('Smallest value:\t ', float64_info.tiny)
print('Largest value:\t ', float64_info.max)

dynamic_range = 20 * (np.log10(float64_info.max) - np.log10(float64_info.tiny))
print('Dynamic range:\t {:.2f} [dB]'.format(dynamic_range))
64-bit floating point

Smallest value:	  2.2250738585072014e-308
Largest value:	  1.7976931348623157e+308
Dynamic range:	 12318.15 [dB]

On most modern computers and programming environments, 64-bit floating point is the default numerical representation unless you specifically request something else. All of the “continuous-valued” examples in this book use 64-bit floating point. However, many digital audio workstations (DAWs) and other audio processing software will provide the option to use 32-bit floating point because it reduces the amount of storage and computation necessary, and is still sufficient for most use cases.

2.4.6. But what does it sound like?#

We can simulate quantization numerically by defining a quantize function as below. We’ll then be able to synthesize a pure tone, and hear how it sounds when quantized to varying bit depths.

Warning: the lower bit depths can sound quite harsh.

import numpy as np
from IPython.display import display, Audio

def quantize(x, n_bits):
    '''Quantize an array to a desired bit depth
    
    Parameters
    ----------
    x : np.ndarray
        The data to quantize
        
    n_bits : integer > 0
        The number of bits to use per sample
        
    Returns
    -------
    x_quantize : np.ndarray
        x reduced to the specified bit depth
    '''
    
    # Specify our quantization bins: 
    #   2^n_bits values, evenly (linearly) spaced 
    # between the min and max of the input x
    bins = np.linspace(x.min(), x.max(), 
                          num=2**n_bits, 
                          endpoint=False)
    
    # Assign each sample to a bin, then map it back to that bin's value,
    # so the output covers the same range as the input
    return bins[np.digitize(x, bins) - 1]


# We'll make a 1-second example tone at 220 Hz
duration = 1
fs = 11025
freq = 220

# Our sample times
t = np.arange(duration * fs) / fs

# The continuous signal
x = np.cos(2 * np.pi * freq * t)

print('Original signal (float64)')
display(Audio(data=x, rate=fs))

# And play the audio at each bit depth
for bits in [16, 8, 4, 2, 1]:
    print('{}-bit'.format(bits))
    display(Audio(data=quantize(x, bits), rate=fs))
Original signal (float64)
16-bit
8-bit
4-bit
2-bit
1-bit

One thing you might notice from listening to the examples above is that the lower quantization levels have the same fundamental frequency, but sound somehow brighter. This is because quantization introduces discontinuities to the signal. A pure sinusoid varies smoothly and continuously, while its 1-bit quantized counterpart (a square wave) jumps abruptly between different sample values. Since low-frequency waves cannot change abruptly, we can infer that severe quantization induces high frequency content in the signal. We will make this intuition more precise in a few chapters (when we cover the Fourier transform).
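We can get a rough numerical confirmation of this by reusing the x and quantize defined above: the original sinusoid changes only slightly from one sample to the next, while the 1-bit version occasionally jumps by an entire quantization step.

# Largest sample-to-sample jump in the original signal
print(np.max(np.abs(np.diff(x))))

# Largest sample-to-sample jump in the 1-bit quantized signal
print(np.max(np.abs(np.diff(quantize(x, 1)))))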

2.4.7. Quantization in practice#

By now, we’ve seen how quantization works, what it looks like visually, and how it sounds. But how do we use this information practically?

16-bit quantization (65536 distinct values) is the standard for compact disc-quality audio, and it suffices for many practical applications. 24-bit quantization (16777216 distinct values) is also common, especially in music production and other applications with high-fidelity requirements.

Although audio data is stored this way (e.g., in .wav files), we don’t usually worry about quantization when processing audio in a computer. Instead, the most common pattern is to convert audio signals to floating point numbers (which are better approximations to continuous-valued real numbers), do whatever analysis or filtering we want to do, and then only quantize again when we need to save or play back audio at the end. Usually this happens behind the scenes, and you don’t need to worry about it directly.
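As a rough sketch of that pattern, converting 16-bit integer samples to floating point and back might look like the following (the scaling by \(2^{15}\) mirrors the convention in (2.6); other conventions exist):

import numpy as np

# Suppose x_int16 holds raw 16-bit PCM samples
x_int16 = np.array([0, 1000, -32768, 32767], dtype=np.int16)

# Convert to floating point values in roughly [-1, +1) for processing
x_float = x_int16.astype(np.float64) / 2**15

# ... analysis or filtering would happen here ...

# Quantize back to 16-bit integers for storage or playback
x_out = np.clip(np.round(x_float * 2**15), -2**15, 2**15 - 1).astype(np.int16)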