5.6. Examples#

In this section, we’ll go in-depth with a few concrete example signals, and investigate their corresponding discrete Fourier transforms.

5.6.1. Impulse and delay#

As a first example, let’s look at an impulse:

\[\begin{split}\blue{x_|[n]} = \begin{cases} 1 & \text{if } n = 0\\ 0 & \text{otherwise}. \end{cases}\end{split}\]

In this example, the DFT equation (5.12) gives us

\[\begin{align*} \darkblue{X[m]} &= \sum_{n=0}^{N-1} \blue{x_|[n]} \cdot \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot m\cdot n / N \right)}\\ &= \blue{x_|[0]} \cdot \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot m \cdot 0 / N \right)} & \text{because }\blue{x_|[n\neq 0]} = 0\\ &= \blue{1} & \text{because } \purple{\exp(0)} = 1. \end{align*}\]

So each DFT component has value \(\darkblue{X[m]} = 1 = 1 + 0\mathrm{j}\).

More generally, we can consider a \(d\)-step delay signal:

\[\begin{split}\blue{x[n]} = \begin{cases} 1 & \text{if } n = d\\ 0 & \text{otherwise}. \end{cases}\end{split}\]

which gives us a DFT sequence:

\[\begin{align*} \darkblue{X[m]} &= \sum_{n=0}^{N-1} \blue{x[n]} \cdot \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot m\cdot n / N \right)}\\ &= \blue{x[d]} \cdot \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot m \cdot d/ N \right)} & \text{because }\blue{x[n\neq d]} = 0\\ &= \purple{\exp\left(-\mathrm{j}\cdot 2\pi \cdot m \cdot d / N \right)}. \end{align*}\]

Figure Fig. 5.15 illustrates this for a few different values of \(d\).

illustration of delayed impulses and their corresponding Fourier transforms

Fig. 5.15 Each row shows the time-domain (left) and frequency-domain (right) representation of a \(d\)-step delay signal (for varying values of \(d\). The shaded region corresponds to negative analysis frequencies.#

Viewed as a sequence (over \(m\)), the DFT of a delayed impulse produces two sinusoids (one real, one imaginary) which cycle \(d\) times over the duration of the signal.

This will be useful later on when we revisit convolution!

5.6.2. Sinusoids#

Since the DFT turns impulses into sinusoids, the next logical question to ask is what the DFT does to sinusoids? Luckily, we’ve already gone through the work of figuring this out in previous sections.

Let \(\blue{x[n]}\) be a sinusoid at analysis frequency index \(k \notin \{0, N/2\}\) with phase \(\phi\) and amplitude \(A\):

\[\blue{x[n] = A\cdot \cos\left(2\pi \cdot \frac{k}{N} \cdot n + \phi \right)}\]

For \(\red{m}=k\), we will have

\[\begin{align*} \darkblue{X[k]} &= \frac{N}{2}\cdot A \cdot \left( \red{\cos\left(\phi\right)} + \green{\mathrm{j}\cdot\sin(\phi)} \right)\\ &= \frac{N}{2}\cdot A \cdot \purple{\exp\left(\mathrm{j}\phi\right)}. & \text{By Euler's formula} \end{align*}\]

But what about the rest of the spectrum?

First, let’s look at \(\red{m=N-k}\). For the real part of the reference signal, we’ll have

\[\begin{align*} \red{\cos\left(2\pi \cdot \frac{N-k}{N} \cdot n \right)} &= \red{\cos\left(2\pi \cdot \frac{k-N}{N} \cdot n \right)} & \cos(\theta) = \cos(-\theta)\\ &= \red{\cos\left(2\pi \cdot \frac{k-N}{N} \cdot n + 2\pi \cdot \frac{N}{N}\cdot n\right)} & \text{Add a whole number of rotations}\\ &= \red{\cos\left(2\pi \cdot \frac{k-N + N}{N} \cdot n \right)} & \text{Collect like terms}\\ &= \red{\cos\left(2\pi \cdot \frac{k}{N} \cdot n \right)} & \text{Cancel } N-N=0, \end{align*}\]

so the real part of \(\darkblue{X[N-k]}\) is the same as for \(\darkblue{X[k]}\). For the imaginary part, we’ll get a sign flip:

\[\begin{align*} \green{-\sin\left(2\pi \cdot \frac{N-k}{N} \cdot n \right)} &= \green{+\sin\left(2\pi \cdot \frac{k-N}{N} \cdot n \right)} & -\sin(\theta) = \sin(-\theta)\\ &= \green{\sin\left(2\pi \cdot \frac{k-N + N}{N} \cdot n \right)}\\ &= \green{\sin\left(2\pi \cdot \frac{k}{N} \cdot n \right)}, \end{align*}\]

so the imaginary part of \(\darkblue{X[N-k]}\) is the opposite of \(\darkblue{X[k]}\). This gives us

\[\begin{align*} \darkblue{X[N-k]} &= \frac{N}{2}\cdot A \cdot \left( \red{\cos\left(\phi\right)} - \green{\mathrm{j}\cdot\sin(\phi)} \right)\\ &= \frac{N}{2}\cdot A \cdot \purple{\exp\left(-\mathrm{j}\phi\right)}. & \text{By Euler's formula} \end{align*}\]

For all other \(\red{m \neq k}\), we will get \(\darkblue{X[m] = 0}\).

illustration of a sinusoid with different phase offsets and their corresponding effect on the DFT

Fig. 5.16 Left: sinusoids at analysis frequency \(k=3\) with varying phase \(\phi\). Right: the DFT of a sinusoid is a pair of impulses located at \(k\) and \(N-k\). The height of the impulses depends on the phase \(\phi\).#

5.6.3. A real sound#

The examples above are helpful for building intuition about how the DFT behaves on idealized, synthetic signals. However, it’s also helpful to see how the DFT behaves on a real signal. Here, we’ll use the DFT to visualize a recording of a single note played on a trumpet.

# We'll use soundfile to load the signal
import soundfile as sf

# And IPython to play it back in the browser
from IPython.display import Audio, display

# A single note (D#5) played on a trumpet
# https://freesound.org/s/48224/
# License: CC BY-NC 3.0
x, fs = sf.read('48224__slothrop__trumpetf3.wav')

# Play the audio back
display(Audio(data=x, rate=fs))

# How many samples do we have?
N = len(x)

# Compute its DFT
X = np.fft.fft(x)

# and plot both the signal and its spectrum
fig, (ax_time, ax_freq) = plt.subplots(nrows=2)

# get the sample times for x
time = np.arange(N) / fs

# plot the signal
ax_time.plot(time, x)  

ax_time.set(xlabel='Time [seconds]', title='Signal $x[n]$')

# plot the magnitudes |X[m]|
ax_freq.plot(np.abs(X), color=colors[11])  

# We'll shade the negative frequency range
ax_freq.axvspan(N/2, N, color='k', alpha=0.1, zorder=-1)  
ax_freq.set(xlabel='Frequency index $m$', title='DFT magnitude $|X[m]|$');
<Figure size 432x288 with 2 Axes>
../../_images/f8d3e3adf2ea5d36bbfe848bca2061b1fe90a13988910d9c5290b7c5925973c9.svg

The first plot illustrates the raw waveform \(\blue{x[n]}\). While it’s possible to discern some properties of the audio from the plot (e.g., how its amplitude rises and decays over time), it’s virtually impossible to infer anything about the pitch or frequency content of the note.

The magnitude spectrum (second plot) shows some interesting structure, but it would be easier to understand if we use the actual analysis frequency values \(f_m\) rather than their index \(m\) for the horizontal axis.

We could calculate this manually using the rule

\[\begin{split}f_m = \begin{cases} \frac{m}{N} \cdot f_s & \text{if } 0 \leq m < N/2\\ \frac{m-N}{N} \cdot f_s & \text{if } N/2 \leq m < N. \end{cases}\end{split}\]

Fortunately, numpy gives a function that does exactly this for us: np.fft.fftfreq. It takes as input the number of samples \(N\) and the sample period \(t_s = 1/f_s\). So we can instead do:

freqs = np.fft.fftfreq(N, 1/fs)
ax_freq.plot(freqs, np.abs(X))
illustration of the DFT spectrum of a trumpet playing D-sharp 5

Fig. 5.17 The DFT spectrum \(\darkblue{|X[m]|}\) of a trumpet playing D#5 = 622.25 Hz, plotted as a function of analysis frequency \(f_m\). The middle and bottom plots contain the same information as the top, but zoomed into \(\pm 5\) KHz (middle) and \([550, 650]\) Hz (bottom). A prominent peak can be observed near \(f_m \approx 622.25\) (and -622.25).#

From Fig. 5.17 we can observe that the magnitude spectrum \(|X[m]|\) is sparse: it has small magnitudes for most frequencies (near 0). The first prominent peak (starting from 0 and looking to the right) that we can see is close to the frequency of the note: \(D\sharp 5\). If we want to find the actual frequency of the peak in the spectrum we can do so as follows:

# get our DFT frequencies
frequencies = np.fft.fftfreq(N, 1./fs)  

# find m index of largest magnitude
peak_m = np.argmax(np.abs(X))  

# Get the f0
f0 = frequencies[peak_m]  

Note that the peak is not exactly at the ideal pitch for \(D\sharp 5 = 622.25\): it lands about 10 cents flat at 618.8 Hz, with energy spread out around neighboring frequencies. This is still much closer to \(D\sharp 5\) than \(D5 = 587.33\) though, so we would still perceive it as a \(D\sharp\). This spread of energy is quite common, though it can be counter-intuitive the first time you see it. It happens because naturally generated frequencies are almost never exactly at our analysis frequencies. We’ll come back to this point in the next chapter.

If we continue looking at increasing frequencies \(f_m\) (middle plot), we will find other peaks evenly spaced. These correspond to the harmonics (also known as overtones or partials) of the note being played.

Definition 5.3 (Harmonic series)

For a given fundamental frequency \(f_0\), its harmonic series is given by

\[f_k = (k+1)\cdot f_0 \quad\quad k = 1, 2, 3, \dots\]

In the case of \(f_0 = 622.25\), its harmonic series is the sequence \(622.25, 1244.5, 1866.75, 2489.0, \dots\). The amount of energy at each harmonic of a fundamental frequency differs from one instrument to the next, and this is part of what gives each instrument its distinctive timbre.

If the signal is relatively simple (e.g., an isolated source playing a single note), the DFT is a great way to quickly get a sense of its frequency content. Do not be deceived by this example though: most signals are not simple, and a visual inspection of the DFT is not generally sufficient to understand all signals.

5.6.4. Summary#

We’ve now seen the DFT applied to several kinds of signals, producing different behaviors. Sometimes we can reason about the DFT analytically, as in the first two examples. Impulses and delays look like sinusoids in the frequency domain. Sinusoids look like (pairs of) impulses.

Other times, the best we can do is approach it empirically (as in the case of the trumpet recording), and observe the output of the DFT. With a little practice, one can learn to use the DFT to quickly get a sense of the contents of a signal that are not readily apparent from its waveform.

In the next few chapters, we’ll dig deeper into the theoretical properties of the DFT.