3.1. Delay, gain, and mix

The term convolution gets thrown around quite a bit in signal processing, and it can sound more complicated than it really is. In the simplest terms, convolution consists of three basic operations:

  1. delaying a signal by some fixed number of samples,

  2. applying a gain to the delayed signal (changing its amplitude),

  3. mixing (adding) the delayed and gained signal with the original signal.
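As a rough sketch of how these three steps fit together, the snippet below applies them one at a time to a short array. The helper name `delay_gain_mix` and its parameters `k` and `g` are illustrative assumptions, not part of any library, and samples before the start of the signal are assumed to be zero.

```python
import numpy as np

def delay_gain_mix(x, k, g):
    """Hypothetical helper: mix x with a copy of itself that is
    delayed by k >= 0 samples and scaled by the gain g."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    # Step 1: delay by k samples (assume silence before the signal starts)
    delayed = np.zeros(N)
    if k < N:
        delayed[k:] = x[:N - k]

    # Step 2: apply a gain to the delayed signal
    scaled = g * delayed

    # Step 3: mix (add) with the original signal
    return x + scaled

x = np.array([1.0, 2.0, 3.0, 4.0])
y = delay_gain_mix(x, k=1, g=0.5)
print(y)
```

Each output sample here is the input sample plus half of the previous input sample, with the very first sample passed through unchanged.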

Before we get into the equations in full generality, let’s work through a couple of simple examples.

3.1.1. Example 1: delay and mix

As a first example, let’s consider the case where there is no gain applied to either the delayed or original signal, so we only have to worry about delay and mixing. If our input signal is \(x[n]\), and our delay is \(k>0\) samples, the output of this process will be a new signal \(y[n]\) defined as

\[ \purple{y[n]} = \blue{x[n]} + \blue{x[n-k]}. \]

That is, the \(n\)th output sample \(\purple{y[n]}\) is the sum of the input sample \(\blue{x[n]}\) and the input sample \(k\) steps back in time \(\blue{x[n-k]}\).

Fig. 3.1 demonstrates this process for an input \(x[n]\) generated by a square wave, and a delay of \(k=3\) samples.

Plot of a signal being mixed with a delayed copy of itself

Fig. 3.1 Top: a signal \(x[n]\) and a delayed copy of the signal \(x[n-3]\). Bottom: the sum of the two signals.

Before we move on, there are already a couple of observations we can make.

First, the output \(y[n]\) looks substantially different from the input \(x[n]\). Wherever \(x[n]\) and \(x[n-3]\) are both positive (or both negative), \(y[n]\) becomes larger (or smaller), reaching a peak amplitude of 2. When \(x[n]\) and \(x[n-3]\) have opposite signs, they cancel each other out, resulting in \(y[n]=0\) (e.g., at 0.2 seconds). Overall, the resulting signal \(y\) has a different shape than the input \(x\), more akin to a triangle wave than the square wave we started with. Different delays will produce different shapes, which we will perceive as changes in timbre.

Second, the first few samples of \(y[n]\) (in the shaded region) look different from the rest of the signal: these are the only places where the value \(1\) (rather than \(-2\), \(0\), or \(2\)) occurs. To understand this, we need to investigate the equation for \(y\) more carefully, and think about what happens when \(n < k\). If \(n < k\), then \(n-k < 0\), so the sample index \(n-k\) corresponds to a negative time index. As stated in chapter 1, we generally assume that a signal is silent for negative time indices (i.e., before recording started). So for the first few samples, we’ll have

\[ y[n] = x[n] + \red{\cancel{x[n-k]}} \quad \text{ if } n < k. \]

This period of time corresponding to \(n<k\) is sometimes referred to as the warm-up phase, where the filter we’re applying has not yet seen enough of the input signal to operate completely.

Having defined the behavior for the warm-up phase, we can now translate the equation for \(y[n]\) into code as follows:

# Make an output buffer the same size as the input
N = len(x)
y = np.zeros(N)

# Set our delay
k = 3

for n in range(N):
    if n >= k:
        y[n] = x[n] + x[n-k]
    else:
        y[n] = x[n]
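The same computation can also be written without an explicit loop. The following is a minimal sketch using NumPy slicing, checked against the loop version on a short made-up input (not the signal from Fig. 3.1) so that the warm-up samples are easy to see. It assumes `0 < k < len(x)`.

```python
import numpy as np

# A short, made-up square-like input: runs of four samples at +1 and -1
x = np.array([1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
k = 3

# Loop version, as in the text
N = len(x)
y_loop = np.zeros(N)
for n in range(N):
    if n >= k:
        y_loop[n] = x[n] + x[n-k]
    else:
        y_loop[n] = x[n]

# Slicing version: start from a copy of x, then add the delayed samples
# at every position n >= k. The first k samples stay equal to x[n],
# which is exactly the warm-up behavior.
y_vec = x.copy()
y_vec[k:] += x[:-k]

print(y_vec)
print(np.array_equal(y_loop, y_vec))
```

In this small input, the value 1 appears only in the first three output samples; every later sample is -2, 0, or 2, matching the behavior described for the shaded region of Fig. 3.1.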

A complete code example is given below.

import numpy as np
import scipy.signal
from IPython.display import Audio, display

# Our input will be a 100 Hz square wave for one second, sampled at 8 kHz.
fs = 8000
duration = 1
f0 = 100

times = np.arange(duration * fs) / fs
x = scipy.signal.square(2 * np.pi * f0 * times)

# The delay will be 8 samples for this example
k = 8

# Try changing k to see how it affects the sound of the output.
# Can you find a setting of k that makes the output silent?

# Initialize the output buffer to match the shape of x
N = len(x)
y = np.zeros(N)

# Compute y
for n in range(N):
    if n >= k:
        y[n] = x[n] + x[n-k]
    else:
        # At the start of the signal, x[n-k] doesn't exist yet
        # so pretend that x[n-k] = 0
        y[n] = x[n]
        

display('Input x[n]')
display(Audio(data=x, rate=fs))

display('Output y[n] = x[n] + x[n-{}]'.format(k))
display(Audio(data=y, rate=fs))

3.1.2. Example 2: delay + gain

In this example, we’ll mix two different delays, each with a different gain coefficient:

\[ y[n] = \frac{1}{2} x[n] - \frac{1}{2} x[n-1]. \]

Here, the delay-0 signal (\(x[n]\)) has a gain of \(+1/2\), and the delay-1 signal (\(x[n-1]\)) has gain \(-1/2\). Intuitively, whenever the input signal is not changing (i.e., \(x[n] = x[n-1]\)), then the output signal \(y[n]\) should be zero. Whenever the signal is changing, the output shows the direction of the change:

Plot of a signal being mixed with an inverted and delayed copy of itself

Fig. 3.2 Top: an input signal \(x[n]\). Bottom: the output signal.

Note that in this example, the gain coefficients can be either positive or negative.
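To make the "direction of change" behavior concrete, here is a small sketch on a made-up input (not the signal in Fig. 3.2). It uses `np.diff` with `prepend=0.0`, which computes \(x[n] - x[n-1]\) while treating the sample before the start of the signal as zero; that choice mirrors the silence assumption from Example 1.

```python
import numpy as np

# Made-up input: flat at 0, a step up to +1, then a step down to -1
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0])

# y[n] = 1/2 * x[n] - 1/2 * x[n-1], with x[-1] assumed to be 0
y = 0.5 * np.diff(x, prepend=0.0)

print(y)
```

The output is zero wherever the input is flat, positive (0.5) at the upward step, and negative (-1) at the larger downward step.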

The code below implements this filter on the same square wave as the previous example. Try modifying the gain coefficients below. How does the sound change if you make both coefficients positive? Or both negative?

import numpy as np
import scipy.signal
from IPython.display import Audio, display

# Our input will be a 100 Hz square wave for one second, sampled at 8 kHz.
fs = 8000
duration = 1
f0 = 100

times = np.arange(duration * fs) / fs
x = scipy.signal.square(2 * np.pi * f0 * times)

# Initialize the output buffer to match the shape of x
N = len(x)
y = np.zeros(N)

# Compute y
for n in range(N):
    if n >= 1:
        y[n] = 0.5 * x[n] - 0.5 * x[n-1]
    else:
        # At the start of the signal, x[n-1] doesn't exist yet
        # so pretend that x[n-1] = 0
        y[n] = 0.5 * x[n]
        

display('Input x[n]')
display(Audio(data=x, rate=fs))

display('Output y[n] = 1/2 * x[n] - 1/2 * x[n-1]')
display(Audio(data=y, rate=fs))