3.3. Impulse Response

The convolution operation is closely related to the idea of an impulse response. In this section, we’ll work through what this all means, and how convolution can be related to acoustic wave propagation.

3.3.1. What is an impulse?

Before we go further, we’ll need to define an impulse. An impulse is an idealized signal consisting of a single 1, followed by (in theory at least) infinitely many zeros:

\[ x_\mid = [1, 0, 0, 0, \cdots]. \]

Impulses are theoretical constructs, and cannot exist in nature. The closest familiar sounds to an impulse would be things like a balloon popping, or tapping two hard objects together, but these sounds will only approximate an ideal impulse.

We can construct impulses inside a computer (or with a pencil and paper), and doing so can help us understand the behavior of many signal processing operations.
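As a minimal sketch of what this looks like in practice, the snippet below builds a finite-length stand-in for an ideal impulse using NumPy. The name `x_imp` and the length `N = 8` are arbitrary choices made for illustration; a true impulse would have infinitely many trailing zeros.

```python
import numpy as np

# Length of the finite stand-in for the ideal impulse (an arbitrary choice)
N = 8

# Start from all zeros, then set the first sample to 1
x_imp = np.zeros(N)
x_imp[0] = 1.0

print(x_imp)
```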

3.3.2. Impulse response of a filter

In general, a filter is any process that consumes one signal as input and produces a new signal as output. One might express this notationally as

\[ \purple{y} = \red{g}(\blue{x}) \]

for input signal \(x\) and filter operation \(g\).

The impulse response of a filter \(g\) is the signal \(y\) produced by applying \(g\) to an impulse:

\[ g\left(x_\mid\right). \]

This is a broad and abstract definition, but in casual conversation, when people refer to filters they most often mean linear filters. We’ll go one step further in this section, and assume that the filtering operation is a convolution between the input \(x\) and some fixed sequence \(h\). One may then ask: what is the impulse response of a convolutional filter?

Here, we’ll go back to the definition of convolution:

\[ \purple{y[n]} = \sum_{k=0}^{K-1} \red{h[k]} \cdot \blue{x[n-k]}. \]

Plugging in our definition of an ideal impulse \(x=x_\mid\), we see that \(x[n-k] = 1\) if \(n=k\) (so that \(n-k=0\)) and \(x[n-k] = 0\) otherwise. This means that we can simplify the calculation significantly:

\[\begin{split}\purple{y[n]} &= \sum_{k=0}^{K-1} \red{h[k]} \cdot \blue{x[n-k]}\\ &= \red{h[n]} \cdot x_\mid[0]\\ &= \red{h[n]}.\end{split} \tag{3.3}\]

That is, for \(0 \leq n < K\), the output \(y[n]\) is exactly \(h[n]\): the impulse response of a convolutional filter is the sequence \(h\) itself.

Put another way, the impulse response alone is enough to completely characterize a convolutional filter: no other information is necessary.
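The result above can be checked numerically. The following is a small sketch, assuming a made-up coefficient sequence `h` (the values are hypothetical and have no special meaning) and a length-10 stand-in for the impulse. It uses `np.convolve` to apply the filter and compares the first \(K\) output samples to \(h\).

```python
import numpy as np

# Hypothetical filter coefficients, chosen only for illustration
h = np.array([0.5, 0.25, -0.125, 0.0625])
K = len(h)

# Finite stand-in for the ideal impulse
x_imp = np.zeros(10)
x_imp[0] = 1.0

# Apply the convolutional filter to the impulse
y = np.convolve(x_imp, h)

# The first K output samples reproduce h
print(np.allclose(y[:K], h))
```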

3.3.3. Finite impulse response (FIR)

You may have heard the term finite impulse response (FIR) and wondered what it meant. In plain terms, an FIR filter is any filter whose impulse response goes to 0 and stays there after some finite number of samples. In general, this critical number of samples is a property of the filter in question, and will vary from one filter to the next.

In the case of convolutional filters, \(g(x) = h*x\) (for some finite sequence \(h\) of length \(K\)), the impulse response must go to 0 after \(K\) samples. This is because any output sample after that point will depend only on the trailing zeros in the impulse.

In short, convolutional filters have a finite impulse response.
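To see the finite part concretely, here is a short sketch that filters a longer impulse and locates the last nonzero output sample. The coefficients in `h` are again hypothetical, and the impulse length of 20 is arbitrary; under these assumptions the last nonzero sample should sit at index \(K-1\).

```python
import numpy as np

# Hypothetical coefficients with K = 3 (all nonzero)
h = np.array([1.0, -0.5, 0.25])
K = len(h)

# Finite stand-in for the ideal impulse, much longer than h
x_imp = np.zeros(20)
x_imp[0] = 1.0

y = np.convolve(x_imp, h)

# Index of the last nonzero output sample
last_nonzero = np.flatnonzero(y)[-1]
print(last_nonzero, K - 1)

# Everything from sample K onward is zero
print(np.all(y[K:] == 0))
```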

In later chapters, we’ll see examples of other kinds of filters which use feedback to achieve an infinite impulse response (IIR). But for now, there’s still much more to explore with convolutional filters.

3.3.4. Room impulse response

Beyond digital filters, you can also think about impulse responses of physical environments. Imagine placing a sound source and a microphone in a room, and for now, let’s assume that the room’s walls have perfectly (acoustically) reflective surfaces. Any sound emanating from the source will then have multiple paths to the microphone: the direct (shortest) path, as well as longer paths that reflect from each wall (or multiple walls). If the sound source produces an ideal impulse, we can observe the impulse arriving at the microphone at different delay times (corresponding to the different paths of arrival). Because each path has a different length, the sound arrives with lower intensity along the longer paths, which correspond to longer delay times.

This process is illustrated below, for an example where we have two perpendicular walls (and no floor or ceiling, just to keep things simple).

Fig. 3.6 A sound radiating from a source (square) emanating in all directions reflects off surfaces, resulting in multiple paths to a recording device (circle).

This entire physical process can be thought of as implementing a convolution. Each reflection path corresponds to a different delay \(k\), and the decrease in recorded intensity corresponds to the gain coefficient \(h[k]\) for that path.
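The mapping from paths to coefficients can be sketched in a few lines of code. Everything numeric here is an assumption made for illustration: the path lengths are invented, the sampling rate `fs` is arbitrary, the speed of sound is taken as roughly 343 m/s, and the gain model (amplitude proportional to 1 / distance, normalized to the direct path) is a common simplification rather than something derived in this section.

```python
import numpy as np

fs = 8000   # sampling rate in Hz (assumed)
c = 343.0   # approximate speed of sound in air, in m/s

# Hypothetical path lengths in meters: direct path first, then reflections
path_lengths = np.array([2.0, 3.5, 4.2, 6.1])

# Delay k of each path in samples, rounded to the nearest integer
delays = np.round(path_lengths / c * fs).astype(int)

# Assumed gain model: amplitude falls off as 1 / distance,
# normalized so that the direct path has gain 1
gains = path_lengths[0] / path_lengths

# Build h by placing each path's gain at its delay.
# np.add.at accumulates correctly even if two paths share a delay.
h = np.zeros(delays.max() + 1)
np.add.at(h, delays, gains)

print(delays)
print(gains)
```

Convolving any signal with this `h` would then mix together one delayed, scaled copy of the signal per path.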

Note that the impulse response depends not only on the physical environment, but also on the positioning of the sound source and microphone: if you move either (or both) of these, the impulse response will generally change.

The example above is significantly simplified from physical reality in several ways. First, a real room has three dimensions, and its floor and ceiling also provide reflective surfaces that increase the number of paths. Second, the surface materials play a large part in how sound is reflected or diffused when it strikes a wall, so the observed delayed signal would not generally be a perfectly scaled copy of the impulse. Third, as mentioned at the beginning of this section, it’s not physically possible to produce an ideal impulse, so what you would actually record at the microphone is the convolution of the room’s impulse response with whatever sound was actually produced by the source. (In practice, non-ideal impulses can also be used, as can other signals such as sinusoidal sweeps, but that’s a bit beyond the scope of this section.)
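The third point can be illustrated with a toy example. The sketch below assumes an idealized two-path impulse response `h` and a short burst `s` standing in for a non-ideal impulse such as a balloon pop; all delays, gains, and burst values are invented. Under this model, the recording contains one scaled copy of the burst per path, rather than a single spike per path.

```python
import numpy as np

# Idealized room impulse response with two paths
# (hypothetical delays and gains)
h = np.zeros(12)
h[2] = 1.0   # direct path
h[8] = 0.5   # one reflection

# A non-ideal "impulse": a short burst rather than a single 1
s = np.array([1.0, 0.6, 0.3])

# What the microphone would record under this model
recorded = np.convolve(h, s)

print(recorded[2:5])    # copy of s arriving along the direct path
print(recorded[8:11])   # copy of s scaled by the reflection's gain
```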

Despite the limitations of this example, it can still be instructive to think about convolution as a physical process, as it provides mechanisms which implement delay, gain, and mixing in a natural context.

3.3.4.1. What can I do with this?

Say you were able to perform the above experiment in an environment of your choice, resulting in an impulse response recording \(h\). Remember how earlier in this section, we saw that the impulse response of a convolution \(h*x\) is just \(h\)? This means that if we have \(h\), we can now apply it to any signal \(x\) (not just an impulse), and simulate the effect of hearing it in the environment characterized by \(h\).

The following code example demonstrates this process for a pre-recorded impulse response and input signal.

import numpy as np
from IPython.display import display, Audio

# We'll need soundfile to load waves
import soundfile

# Input signal is a short piano excerpt
# https://freesound.org/people/Piotr123/sounds/511749/
x, fs = soundfile.read('511749__piotr123__jazz-piano-intro-mono.wav')

# Impulse response is from a church: 
# https://freesound.org/s/474296/
h, fs2 = soundfile.read('474296__petherjwag__ir-church-01-mono.wav')

# Check that the sampling rates match
assert fs == fs2

# Convolve the signal with the impulse response
y = np.convolve(x, h)

display('Impulse response')
display(Audio(data=h, rate=fs))
display('Input signal')
display(Audio(data=x, rate=fs))
display('Output signal')
display(Audio(data=y, rate=fs))

I encourage you to try this out for yourself. If you don’t have the materials necessary to record your own impulse responses, there are plenty of examples available on Freesound.org.
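If you don’t have any recordings at hand, the same workflow can be tried with synthetic signals. The sketch below is only a rough stand-in: the "impulse response" is exponentially decaying noise rather than a measurement of a real space, and the sampling rate, tone frequency, and decay rate are all arbitrary assumptions. It also shows that the output is longer than the input by \(K-1\) samples, and rescales the result to a peak of 1 to avoid clipping when it is saved or played back.

```python
import numpy as np

fs = 8000  # sampling rate in Hz (assumed)

# Synthetic input: one second of a 440 Hz tone
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# Synthetic stand-in for a room impulse response:
# half a second of exponentially decaying noise (not a real recording)
rng = np.random.default_rng(0)
K = fs // 2
h = rng.standard_normal(K) * np.exp(-6 * np.arange(K) / K)

# Convolve the signal with the impulse response
y = np.convolve(x, h)

# The output is longer than the input by K - 1 samples
print(len(x), len(h), len(y))

# Rescale to a peak absolute value of 1
y = y / np.max(np.abs(y))
```

The resulting `y` can be passed to `Audio(data=y, rate=fs)` exactly as in the example above.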