5.5. The Discrete Fourier Transform#

In this section, we will formally introduce the Discrete Fourier Transform (DFT). The DFT is the digital version of the Fourier transform, which was originally developed by Joseph Fourier in the early 19th century as a way to model heat flow with differential equations [Fou22]. The Fourier transform proved to be useful for many different applications, especially in audio and signal processing.

Many texts introduce the DFT by first starting with the continuous Fourier transform (using integral calculus), and then show the DFT as a discrete-time approximation of the continuous Fourier transform. We’ll take a different approach here, motivating the DFT from first principles entirely in the discrete time domain.

5.5.1. Dealing with phase#

Recall that in the previous section, when we compare a sinusoid \(\blue{x[n]}\) to a collection of reference signals

\[\red{y_m[n] = \cos\left(2\pi \cdot \frac{m}{N} \cdot n \right)},\]

the resulting similarity score \(\purple{S(x, y_m)}\) depends on the phase of the signal, and whether or not the frequency of \(\blue{x}\) is an analysis frequency.

In the worst case, if

\[\blue{x[n] = \sin\left(2\pi \cdot \frac{m}{N}\cdot n \right) = \cos\left(2\pi \cdot \frac{m}{N}\cdot n - \frac{\pi}{2} \right),}\]

then the similarity score will be 0 even if the frequency matches one of our reference signals. This representation therefore cannot distinguish an out-of-phase sinusoid from a null signal \(x[n] = 0\).

5.5.1.1. Comparing to multiple references#

In the specific case above, we could try to resolve the problem by changing the reference signals \(\red{y}\) to use \(\sin\) instead of \(\cos\). This will fix the problem if \(\blue{x}\) is a \(\sin\) wave, but then it will fail in exactly the same way if \(\blue{x}\) is a \(\cos\) wave.

But what if we try both?

Let’s see what happens if we take \(\blue{x}\) as a sinusoid at an analysis frequency with arbitrary phase, and compare it to two reference signals at the same frequency: one is \(\cos\) and one is \(\sin\).

In the following example, we’ll continue with the settings from the previous section: \(f_s = 20\) [samples / sec], \(N=40\) [samples], and \(\red{m=3}\) (equivalently, \(f_0 = 1.5\) [cycles/sec]). This gives an input signal

(5.9)#\[\blue{x[n] = \cos\left(2\pi \cdot \frac{m}{N} \cdot n + \phi \right)}\]

where \(\phi\) is allowed to vary.

We’ll compare to reference signals

\[\red{\cos\left(2\pi \cdot \frac{m}{N} \cdot n\right)}, \quad\quad\text{and}\quad\quad \green{\sin\left(2\pi \cdot \frac{m}{N} \cdot n\right)}\]

Fig. 5.13 A sinusoid \(\blue{x}\) at an analysis frequency but with varying phase \(\phi\) (top plot) is compared to two reference signals: \(\red{\cos}\) (second plot) and \(\green{\sin}\) (fourth plot), producing element-wise products (third and fifth plots). On the right, the resulting similarity scores for both reference signals are shown as \(\phi\) varies.#

Fig. 5.13 illustrates what happens to similarity scores as we vary the phase \(\phi\) of \(x\) while comparing to both cosine and sine waves. Note that although either score can be positive, negative, or 0, they cannot both be 0 simultaneously.

Recall from (5.8) that the similarity score for \(x\) and the reference signal \(\cos(2\pi \cdot n\cdot m/N)\) depends on \(\phi\): \(S = \frac{N}{2} \cdot \cos\phi\) (if \(m\neq N/2\)). By similar reasoning, we can calculate the similarity between \(x\) and the \(\sin\) reference signal:

\[\begin{align*} S &= \sum_{n=0}^{N-1} \blue{\cos\left(2\pi \cdot \frac{m}{N} \cdot n + \phi \right)} \cdot \green{\sin\left(2\pi \cdot \frac{m}{N} \cdot n \right)}\\ &= \sum_{n=0}^{N-1} \frac{1}{2} \cdot \blue{\sin\left(2\pi \cdot \frac{m+m}{N} \cdot n + \phi\right)} - \frac{1}{2} \cdot \green{\sin\left(2\pi \cdot \frac{m-m}{N} \cdot n + \phi \right)} & \text{by product-to-sum rule}\\ &= \sum_{n=0}^{N-1} \frac{1}{2} \cdot \blue{\sin\left(2\pi \cdot \frac{2m}{N} \cdot n + \phi \right)} - \frac{1}{2} \cdot \green{\sin\left(\phi\right)} & \text{cancel } m-m=0 \\ &= \left(\frac{1}{2}\sum_{n=0}^{N-1} \blue{\sin\left(2\pi \cdot \frac{2m}{N} \cdot n + \phi \right)}\right) - \frac{N}{2} \cdot \green{\sin\left(\phi\right)} & \text{pull} -\frac{1}{2}\sin(\phi) \text{ out of sum}\\ \end{align*}\]

If \(m \notin \{0, N/2\}\), then the \(\blue{\text{first term}}\) of the summation will total to 0 because it completes a whole number of cycles in \(N\) samples. The resulting score in this case is

\[S = -\frac{N}{2}\cdot \green{\sin\left(\phi\right)} \quad\quad\quad \text{if } m \notin \{0, N/2\}.\]

Otherwise, if \(m=0\) or \(m=N/2\), the first term simplifies to

\[\blue{\sin\left(2\pi \cdot \frac{2m}{N} \cdot n + \phi\right)} = \blue{\sin\left(2\pi \cdot \frac{0}{N} \cdot n + \phi\right)} = \blue{\sin\left(\phi\right)}.\]

Combining this with the derivation above, we get the total similarity score:

(5.10)#\[S = \frac{N}{2} \cdot \blue{\sin(\phi)} - \frac{N}{2} \cdot \green{\sin(\phi)} = 0 \quad\quad\quad \text{if } m \in \{0, N/2\}.\]

5.5.1.2. Combining terms#

Stepping back, if we take \(m \notin \{0, N/2\}\), pick an arbitrary phase \(\phi\), generate a signal \(x[n]\) according to Equation (5.9), and compare it to cosine and sine waveforms as above, we will observe the following similarity scores:

  • \(\purple{S(x, \cos) = \frac{N}{2} \cdot \cos(\phi)}\)

  • \(\cyan{S(x, \sin) = -\frac{N}{2} \cdot \sin(\phi)}\).

If we additionally scale \(x\) by an amplitude \(A\), this scaling factor would carry through all of the summations above:

  • \(\purple{S(A\cdot x, \cos) = \frac{N}{2} \cdot A \cdot \cos(\phi)}\)

  • \(\cyan{S(A\cdot x, \sin) = -\frac{N}{2} \cdot A \cdot \sin(\phi)}\).

This is starting to look pretty interesting. However, it would be better if both scores had the same sign. We can accomplish this by changing which signals we compare to: instead of comparing to \(\green{\sin\left(2\pi\cdot\frac{m}{N}\cdot n\right)}\), we’ll compare to \(\green{-\sin\left(2\pi\cdot\frac{m}{N}\cdot n\right)}\). Since this is just a multiplication by \(-1\), like the amplitude scaling above, we can push it through the similarity calculation and obtain:

  • \(\purple{S(A\cdot x, \cos) = \frac{N}{2} \cdot A \cdot \cos(\phi)}\)

  • \(\cyan{S(A\cdot x, -\sin) = \frac{N}{2} \cdot A \cdot \sin(\phi)}\).

Now this is interesting.

We started with a sinusoid \(x\) having amplitude \(A\) and phase offset \(\phi\). The resulting scores can be interpreted as horizontal and vertical coordinates of a point with radius \(A\cdot N/2\) and angle \(\phi\).

illustration of amplitude and phase encoded as complex numbers

Fig. 5.14 Left: three different waves of the form \(x[n] = A \cdot \cos(2\pi \cdot n \cdot m / N + \phi)\) are compared to \(\cos\) and \(-\sin\) reference signals with the same frequency. Right: interpreting the similarity scores for each signal as horizontal and vertical coordinates places each one on a circle of radius \(A \cdot N/2\) at angle \(\phi\).#

Rather than carry around two coordinates for each frequency comparison, we can combine them into a single complex number. We will associate the comparison to \(\cos\) with the real part, and the comparison to \(-\sin\) with the imaginary part.

5.5.2. The Discrete Fourier Transform#

Finally, we can formally define the discrete Fourier transform (DFT). The DFT takes as input a signal \(x[n]\) of \(N\) samples, and produces a sequence \(X[m]\) of \(N\) complex numbers representing amplitude and phase for each analysis frequency.

Definition 5.2 (The discrete Fourier transform (DFT))

For an arbitrary input signal \(\blue{x[n]}\) of \(\blue{N}\) samples, its discrete Fourier transform is a sequence of \(N\) complex numbers \(\darkblue{X[m]}\) (\(\red{m=0,1,\dots,N-1}\)) defined as follows:

(5.11)#\[\darkblue{X[m]} = \sum_{n=0}^{N-1} \blue{x[n]} \cdot \left( \red{\cos\left(2\pi \cdot \frac{m}{N}\cdot n\right)} - \mathrm{j}\cdot\green{\sin\left(2\pi \cdot \frac{m}{N}\cdot n\right)} \right)\]

or, equivalently using Euler’s formula (4.1),

(5.12)#\[\darkblue{X[m]} = \sum_{n=0}^{N-1} \blue{x[n]} \cdot \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot \frac{m}{N} \cdot n\right)}.\]

The DFT equation (5.11) can be implemented in code as follows:

import numpy as np

def dft(x):
    '''The Discrete Fourier Transform
    
    Parameters
    ----------
    x : np.ndarray
        The input signal
        
    Returns
    -------
    X : np.ndarray, same shape as x
        The DFT sequence: X[m] corresponds to analysis frequency index m
    '''
    
    # Get the number of samples = number of frequencies
    N = len(x)
    
    # Allocate the output array
    X = np.zeros(N, dtype=np.complex)
    
    # For each analysis frequency
    for m in range(N):
        # For each sample
        for n in range(N):
            # Compare to cos and -sin at this frequency
            X[m] = X[m] + x[n] * (np.cos(2 * np.pi * m / N * n) 
                                     - 1j * np.sin(2 * np.pi * m / N * n))
    # Return the DFT array
    return X

Note that this code will be extremely slow, and should not be used for long signals. Instead, you should probably use a fast Fourier transform (FFT) implementation, like the one provided by numpy:

import numpy as np

X = np.fft.fft(x)

5.5.2.1. Some facts about the DFT#

In the coming chapters, we will devote a significant amount of time to understanding everything we can about the DFT. For the time being, here is a brief list of basic facts that can be derived by a close examination of our process in building up the DFT.

For convenience, we will write \(\text{DFT}(x)\) to denote the entire sequence of DFT components \(\darkblue{X[0], X[1], X[2], \dots, X[N-1]}\).

  1. The DFT sequence \(\darkblue{X[0], X[1], \dots X[N-1]}\) does not index time like the sample array \(\blue{x[n]}\) does. Instead, each \(\darkblue{X[m]}\) measures the contribution of a sinusoid at a particular frequency to the entire signal \(\blue{x}\).

  2. The DFT sequence always has the same number of analysis frequencies as the number of samples \(\blue{N}\). The first \(N/2\) correspond to positive frequencies, the last \(N/2\) correspond to negative frequencies.

  3. The \(\red{m=0}\) component is always a real number—this follows immediately from (5.10). It is known as the direct current (DC) component, and always equals the sum of the sample values.

  4. For each \(\darkblue{X[m]}\), the magnitude \(|\darkblue{X[m]}|\) (np.abs(X[m])) corresponds to the amplitude of a sinusoid at frequency \(f_s \cdot \red{m} / \blue{N}\) present in the signal. Similarly, its angle in the complex plan, \(\angle \darkblue{X[m]}\) (np.angle(X[m])), corresponds to the phase of the sinusoid.

  5. The sequence of values \(|\darkblue{X[m]}|\) is known as the magnitude spectrum of \(\blue{x}\).

  6. The magnitude \(|\darkblue{X[m]}|\) is not the same as the real part \(\red{\mathsf{Re}X[m]}\). Remember that for a complex number \(z\),

\[|z| = \sqrt{\red{\left(\mathsf{Re}z\right)^2} + \purple{\left(\mathsf{Im}z\right)^2}}\]

Both the real and imaginary parts of the number contribute to both magnitude and phase.

5.5.3. Summary#

Developing the DFT takes a lot of work. So does understanding it completely. Don’t panic if this section was difficult to get through.

To finish off this chapter, the next section will show several examples signals and their corresponding DFTs.