The Convolution Theorem
Contents
10.1. The Convolution Theorem#
Recall from (3.1) the definition of convolution between a signal \(\blue{x}\) (of length \(N\)) and impulse response \(\red{h}\) (of length \(K\)):
As we saw previously, this can be interpreted as constructing the output \(\purple{y}\) by mixing together several delayed copies of \(\blue{x}\) (index \(k\) corresponds to the delay in samples), each with a gain coefficient given by \(\red{h[k]}\).
In this section, our goal is to understand the frequency domain representation \(\magenta{Y} = \text{DFT}(\red{h} * \blue{x})\) in terms of the DFTs of the inputs \(\red{h}\) and \(\blue{x}\), which will be expressed succinctly by the convolution theorem.
10.1.1. Circular convolution#
Before we can properly apply the DFT to convolution, we’ll first need to deal with the differing assumptions of how negative sample indices are handled. When we first saw convolution, it was defined using the assumption that \(\blue{x[nk] = 0}\) if \(\blue{n < k}\), i.e., that the signal \(\blue{x}\) is silent before the 0th sample. Different ways of interpreting this assumption gave rise to the different convolution modes (full, valid, same).
The DFT, on the other hand, assumes that signals repeat indefinitely, so that \(\blue{x[nk] = x[nk + N]}\).
If we define convolution using the repetition assumption, we get what is known as circular convolution. The equation is exactly the same as (3.1); all that has changed is the interpretation of negative sample indices, which now wrap around to the end of the signal. This assumption can be encoded by using modular arithmetic to compute the delayed sample index: \(nk \mod N\).
(Circular convolution)
For a signal \(x\) of length \(N\) and impulse response \(h\) of length \(K\), the circular convolution between \(x\) and \(h\) is defined as:
10.1.2. The convolution theorem#
Now that we’ve defined circular convolution, we can formally state the convolution theorem, which is one of the most important theorems in signal processing.
(The Convolution Theorem)
Let \(\red{h}\) and \(\blue{x}\) be sequences of length \(N\), and let \(\purple{y}=\red{h}*\blue{x}\) denote the circular convolution between them.
The DFT of the convolution is the product of the DFTs:
Proof. By definition, the output signal \(\purple{y}\) is a sum of delayed copies of the input \(\blue{x[nk]}\), each scaled by the corresponding coefficient \(\red{h[k]}\).
By DFT linearity, we can think of the DFT \(\magenta{Y[m]}\) as a weighted combination of DFTs:
To reduce clutter, let’s define \(\darkblue{X_k} = \text{DFT}(\blue{x[nk]})\) to be the DFT of the \(k\)step delayed copy of \(x\).
Now, we can use the DFT Shifting Theorem to express the DFT of the delayed signal \(X_k\) in terms of the original signal \(\darkblue{X} = \text{DFT}(\blue{x})\) for each frequency index \(m\):
Substituting this into our derivation for \(\magenta{Y}\), we can continue:
which is exactly what we set out to prove.
10.1.3. Consequences of the convolution theorem#
Now that we have the convolution theorem, let’s take some time to explore what it gives us.
10.1.3.1. Fast convolution#
From the convolution theorem, we get \(\magenta{Y[m]} = \red{H[m]} \cdot \darkblue{X[m]}\). Applying the inverse DFT, we can recover the timedomain output signal:
where the product \(\red{H}\cdot \darkblue{X}\) is the elementwise product of \(\red{H}\) and \(\darkblue{X}\).
This gives us a recipe for computing the convolution \(\red{h}*\blue{x}\) using the DFT:
Pad \(\blue{x}\) and \(\red{h}\) to the same length (if necessary)
Compute DFTs \(\darkblue{X}\) and \(\red{H}\),
Multiply \(\magenta{Y[m]} = \red{H[m]} \cdot \darkblue{X[m]}\),
Take the inverse DFT of \(\purple{y} = \text{IDFT}(\magenta{Y})\).
If \(\red{h}\) and \(\blue{x}\) are of comparable length (say, both \(N\) samples), then this can be more efficient than the direct convolution algorithm. The direct convolution has an outer loop of \(N\) steps (one for each output sample) and an inner loop of \(N\) steps (one for each delay), for a total of \(\sim N^2\) computation.
However, the recipe above takes:
\(\leq N\) steps for padding
Two DFT calculations
\(N\) steps to multiply the spectra
One inverse DFT calculation
Steps 2 and 4 can be performed with \(\sim N + N \cdot \log N\) steps. The total computation time will therefore scale like \(N\cdot \log N \ll N^2\).
However, if \(h\) is much shorter than \(N\), this may not be worth it, as the direct method would take \(N\cdot K\) steps.
Standard convolution implementations like scipy.signal.convolve
typically compare the lengths of the signal to determine the most efficient means of computing the result.
10.1.3.2. Algebraic properties#
You may recall from the earlier section on properties of convolution that we asserted (without proof) that convolution is commutative:
and associative:
but we did not prove those properties. (We could have, but it would have been pretty tedious.)
The convolution theorem provides a more direct and intuitive way to see these properties:
Commutativity must exist because complex multiplication is commutative: \(\darkblue{X[m]}\cdot \red{H[m]} = \red{H[m]} \cdot \darkblue{X[m]}\).
Likewise for associativity: \(\red{H[m]} \cdot \darkblue{(G[m] \cdot X[m])} = \red{(H[m] \cdot G[m])} \cdot \darkblue{X[m]}\).
Inheriting these properties from multiplication is much easier than deriving them from scratch.
10.1.3.3. Filter analysis#
Finally, the convolution theorem provides a way to understand the effect a particular impulse response \(\red{h}\) might have on a signal. Initially, we thought of each \(\red{h[k]}\) as the gain applied to the delayed signal \(\blue{x[nk]}\).
In the frequency domain, each \(\red{H[m]}\) is a complex number, which we can denote as
where \(A \geq 0\) is the magnitude and \(\phi\) is the phase.
Using the inverse DFT, we can interpret \(\darkblue{X[m]} = B \cdot e^{\mathrm{j} \cdot \theta}\) as encoding the magnitude \(B\) and phase \(\theta\) of a particular sinusoid present in the signal \(x\).
Multiplying these two together, we see that \(\magenta{Y[m]}\) has magnitude \(A \cdot B\) and phase \(\phi + \theta\). That is, \(\red{H[m]}\) encodes how much each sinusoidal component of \(\blue{x}\) is amplified (or attenuated) (by \(A\)) and delayed (by \(\phi\)) when convolved with \(\red{h}\).
In the following sections, we’ll see how to put this idea into practice for filtering signals.
10.1.3.4. Dual convolution theorem#
One interesting corollary of the convolution theorem is the following:
(Convolution in frequency)
If \(\red{w}\) and \(\blue{x}\) are sequences of length \(N\), then elementwise multiplication in the time domain is equivalent to circular convolution in the frequency domain.
where frequencydomain convolution is defined as follows with \(\red{W = \text{DFT}(w)}\) and \(\darkblue{X = \text{DFT}(x)}\):
The proof of Corollary 10.1 is nearly identical to that of the convolution theorem, except that it uses a variation of the shifting theorem for the inverse DFT.
The dual convolution theorem is mainly useful as a theoretical device, as it can help us to understand the effects of elementwise multiplication. This occurs when windowing a signal, and if we apply the inverse DFT to each side of Corollary 10.1, we obtain
We won’t go into detail here, but the table below summarizes the relationships between DFT, IDFT, circular convolution, and elementwise multiplication.
Time domain 
Frequency domain 


Convolution \(\red{h}*\blue{x}\) 
\(\Leftrightarrow\) 
Multiplication \(\red{H}\cdot \darkblue{X}\) 
Multiplication \(\red{w}\cdot \blue{x}\) 
\(\Leftrightarrow\) 
Convolution \(\red{W}*\darkblue{X}\) 