# Radix-2 Cooley-Tukey

## Contents

# 8.2. Radix-2 Cooley-Tukey#

As mentioned in the introduction to this chapter, there are many algorithms which are collectively referred to as “Fast Fourier Transforms”. In this section, we’ll see one of the earliest methods, (re-)discovered in 1965 by Cooley and Tukey [CT65], which can accelerate DFT calculations when \(N\) is an integral power of 2: \(N = 2^K\).

Modern FFT implementations use many tricks to speed up calculation, and generalize to arbitrary values for \(N\). That said, the core idea of the “radix-2 Cooley-Tukey” method has plenty of interest to offer, and sheds some light on how more advanced techniques can be developed.

## 8.2.1. Divide and Conquer#

We’ll start again with the DFT equation:

The key observation of Cooley and Tukey is that this summation can be broken apart in interesting ways.
Specifically, we can separate the summation into *even indices* \(n=0,2,4,\dots\) and odd indices \(n=1,3,5,\dots\).

Writing \(n=2k\) for even and \(n=2k+1\) for odd indices, we get:

All we’ve done so far is change variables: the summations are identical to the original definition. However, if we notice that \(2k / N = k / (N/2)\), we can interpret the first summation as the DFT \(\magenta{X_E[m]}\) where \(\purple{x_E[n]} = \blue{x[2n]}\) is the signal comprised of the even-numbered samples:

This trick doesn’t exactly work for the odd-part summation, but we can make it work with a little arithmetic. Observe that by the rule of exponents \(e^{a+b} = e^a \cdot e^b\),

The first factor is exactly the same as in the even case, and gives us the DFT basis.
The second factor, \(\red{e^{-2\pi \cdot \mathrm{j} \cdot m / N}\;\;}\) (commonly called a *“twiddle factor”*) does not depend on the variable of summation (\(k\)), and can therefore be factored out of the sum, giving us:

This says that the second summation is the DFT \(\magenta{X_O[m]}\) of the signal comprised of odd-indexed samples \(\purple{x_O[n]} = \blue{x[2n+1]}\), with an additional rotation of \(\red{-2\pi \cdot m / N}\).

Combining these two observations, we can express the original DFT of \(N\) samples in terms of two DFTs of \(N/2\) samples:

This expression is an example of the **divide-and-conquer** approach to problem solving.
When given a large problem to solve, it sometimes helps to break it into smaller problems whose solutions can be combined to solve the original problem.

### 8.2.1.1. Why is this helpful?#

So far, all we’ve done is express a single component of the DFT \(\darkblue{X[m]}\) *recursively* in terms of two smaller DFT’s, \(\magenta{X_E[m]}\) and \(\magenta{X_O[m]}\).
If we only cared about computing a single frequency index \(m\), this recursive formulation wouldn’t help us.
Remember, we can compute a single DFT coefficient \(\darkblue{X[m]}\) in \(N\) steps (the inner loop of the naive algorithm in the previous section), and we know that it can’t be done in fewer than \(N\) steps because each sample \(x[n]\) must be looked at.

Of course, we’re usually interested in computing the full DFT (all \(N\) frequencies).
The recursive decomposition of \(\darkblue{X[m]}\) is helpful here because it applies simultaneously to *all frequencies* \(m = 0, 1, 2, \dots, N-1\).
That is, if someone gave us the entire DFT calculations for \(\magenta{X_E}\) and \(\magenta{X_O}\), we could compute the entire DFT \(\magenta{X}\) in \(N\) additional steps by applying (8.1) (once for each \(m=0,1,2,\dots,N-1\)).
In principle, we should be able to do something like the following:

```
1def pseudo_fft(x, X_E, X_O):
2 '''Compute the DFT of x given the even and odd DFTs X_E, X_O'''
3
4 N = len(x)
5
6 # Combine the given even and odd spectra
7 X = np.zeros(N, dtype=np.complex)
8
9 for m in range(N):
10 X[m] = X_E[m] + np.exp(-2j * np.pi * m / N) * X_O[m]
11
12 return X
```

In reality, we won’t get \(\magenta{X_E}\) and \(\magenta{X_O}\) for free. However, if the total work taken to get \(\magenta{X_E}\) and \(\magenta{X_O}\) is less than the naive algorithm’s \(N^2\), this still amounts to an improvement.

## 8.2.2. How many frequencies?#

A careful look at (8.1) reveals something a little strange: the DFT \(\darkblue{X[m]}\) involves \(N\) frequencies (\(m=0,1,2,\dots, N-1\)), while the smaller DFTs \(\magenta{X_E[m]}\) and \(\magenta{X_O[m]}\) should have only \(N/2\) frequencies (\(m=0,1,2,\dots,N/2-1\)) since they are derived from signals of length \(N/2\). This introduces a problem for lines 9–10 in the code block above: how do we recover all \(N\) frequencies when only \(N/2\) are present in \(\magenta{X_E}\) and \(\magenta{X_O}\)?

Once again, **aliasing** can come to the rescue.

Recall that each \(\darkblue{X[m]}\) is the result of a comparison to a wave that completes \(\red{m}\) cycles in \(\blue{N}\) samples, or in units of Hz:

For (8.1) to make sense, \(\magenta{X_E[m]}\) should involve a comparison to a wave that complete \(\red{m}\) cycles in \(\blue{N' = N/2}\) samples. Note, however, that the sampling rate for the even signal \(\purple{x_E}\) is \(f'_s = f_s/2\) because the period between samples is double that of our original signal \(\blue{x}\). If \(\red{m} \geq N/2\), the aliasing equation (2.3) tells us that this frequency has an aliasing frequency \(\red{f_a}\) given by:

In plain words, this says that to find \(\magenta{X_E[m]}\) for \(\red{m} \geq N/2\), we should look at the DFT coefficient \(\magenta{X_E[m - N/2]}\). This is relationship is depicted visually in Fig. 8.3.

Frequency indices \(\red{m} < N/2\) do not need special aliasing treatment, as they will occur exactly where we expect them to in the DFT spectrum: \(\magenta{X_E[m]}\).

We can notationally combine these two cases (\(<N/2\) and \(\geq N/2\)) as

or, in Python code:

```
m % (N //2)
```

This allows us to correctly state the expression of \(\darkblue{X[m]}\) in terms of \(\magenta{X_E}\) and \(\magenta{X_O}\) for all \(\red{m}\):

The benefit of this form is that we only rely on \(N/2\) coefficients in both \(\magenta{X_E}\) and \(\magenta{X_O}\), but we are still able to recover all \(N\) coefficients \(\darkblue{X[m]}\). This, ultimately, is the source of the efficiency gains in FFT algorithms: the solutions to smaller sub-problems can be used multiple times to produce different parts of the output. \(\darkblue{X[m]}\) and \(\darkblue{X[m + N/2]}\) both depend on \(\magenta{X_E[m]}\) and \(\magenta{X_O[m]}\). We compute \(\magenta{X_E}\) and \(\magenta{X_O}\) once, but we get to use them twice.

## 8.2.3. A radix-2 algorithm#

Finally, combining the observations above, we can give a full definition of the radix-2 FFT algorithm.

The last ingredient to this algorithm, compared to the naive DFT implementation, is *recursion*.
To compute \(\magenta{X_E}\) and \(\magenta{X_O}\), the algorithm will **call itself** on the appropriately sub-sampled signal.
Once it has \(\magenta{X_E}\) and \(\magenta{X_O}\), it then combines the results and returns the output spectrum.

If the input has only one sample (\(N = 2^0 = 1\)), then there is only one analysis frequency (\(m=0\)), and \(X[0] = x[0]\) is just the sample value itself.
This represents the **base case**, the input which terminates the recursion.

```
1def fft2(x):
2 '''Compute the DFT of an input x of N = 2**k samples'''
3
4 # Get the length of the input
5 N = len(x)
6
7 # The DFT of a single sample is just the sample value itself
8 # Nothing else to do here, so return
9 if N == 1:
10 return x
11
12 else:
13 # Recursively compute the even and odd DFTs
14 X_even = fft2(x[0::2]) # Start at 0 with steps of 2
15 X_odd = fft2(x[1::2]) # Start at 1 with steps of 2
16
17 # Allocate the output array
18 X = np.zeros(N, dtype=np.complex)
19
20 # Combine the even and odd parts
21 for m in range(N):
22 # Find the alias of frequency m in the smaller DFTs
23 m_alias = m % (N//2)
24 X[m] = X_even[m_alias] + np.exp(-2j * np.pi * m / N) * X_odd[m_alias]
25
26 return X
```

### 8.2.3.1. Time analysis#

Analyzing the running time of the `fft2`

algorithm above requires a slightly more sophisticated approach than the naive DFT algorithm, due to the use of recursion.

Let \(T(N)\) represent the amount of time taken by the algorithm on an input of length \(N\). For now, \(T(N)\) is an unknown quantity, but this notation will help us solve for it. We’re not going to aim for a precise count of the operations performed by the algorithm, but merely a coarse upper bound on how the computation scales with the length of the input (\(N\)).

The base case, \(T(1)\), takes some constant amount of time, which we can call \(c\). This accounts for all computation up to line 10 where it returns from the \(N=1\) case.

If \(N>1\), then we see two recursive calls to inputs of length \(N/2\). This is followed by a loop of \(N\) steps (line 21), of which each step takes some constant amount of time that we’ll call \(d\), for a total of \(N\cdot d\). (We’ll absorb the amount of time to allocate the output array at line 18 into this quantity.) Using this observation, we can express the total time for \(T(N)\) recursively:

Each of these recursive calls, in turn has two more recursive calls plus \(d\cdot N/2\) work to combine the results:

and this process can repeat until the recursion stops at \(N=1\).

If we total up the non-recursive computation done at each *level* of the recursion, we can get a handle on the total amount of computation.
This is illustrated in Fig. 8.4.
To simplify notation, we’ll assume that \(c \leq d\), meaning that the arithmetic operations at lines 23–24 are more expensive than the base-case operations in lines 1–10.

What Fig. 8.4 shows us is that each level \(\ell\) of the recursion has \(2^\ell\) calls to `fft2`

, each of size \(N/2^\ell\).
Each of these does some linear amount of computation: \(d \cdot N / 2^\ell\), and if we sum these together, the result for level \(\ell\) is \(d \cdot N\).
The total number of levels is \(1 + \log_2 N\), since this counts how many times \(N\) can be divided by 2 before hitting the base case: \(N=1\) has one level, \(N=2\) should have two levels, \(N=4\) has 3 levels, and so forth.
Multiplying the per-level work by the number of levels gives us a total upper bound of

Since \(\log_2 N < N\), this comes out to be substantially more efficient than the \(d \cdot N^2\) work performed by the naive approach.

## 8.2.4. Summary#

In this chapter, we’ve seen that the radix-2 Cooley-Tukey algorithm is substantially faster than the naive DFT implementation. This is especially true when \(N\) is large, as is the common case in audio signal processing.

The code given above for `fft2`

is a rather simplistic implementation given to illustrate the basic concepts.
It can be made much more efficient in several ways, including: pre-computing and caching the “twiddle” factors, re-using a single output buffer rather than re-allocating arrays for each partial output, and so on.

The main limitation of the radix-2 method is that it only works if \(N\) is an integral power of 2: \(N= 1, 2, 4, 8, 16, \dots\). If \(N = 37\) (for example), this method cannot be used.

The radix-2 method is just one special case of the general method of Cooley and Tukey. In the radix-2 case, we divide an input of length \(N\) into 2 inputs of length \(N/2\). More generally, if \(N\) is divisible by some integer \(p\), we can divide into \(p\) inputs of length \(N/p\). The basic principle behind this more general “mixed-radix” approach is the same: the DFTs of the smaller cases are combined to form the larger case by applying the appropriate delay (“twiddle factor”) to each one. This more general approach retains the \(N\log N\) computational complexity for broader classes of input length (not just powers of 2).

Where the Cooley-Tukey method fails is when the input length \(N\) is a prime number (eg, 37, or 257), and cannot be divided evenly into pieces. In these cases, alternate methods have been developed which still achieve running time that scales like \(N\log N\).

In practice, modern FFT implementations—such as the Fastest Fourier Transform in the West (FFTW)—use many combinations of strategies to optimize the computation time for a given input length.

The take-home message is that computing DFTs can be done in much less than quadratic time:

Tip

The DFT for a signal of \(N\) samples can be computed (by FFT methods) in time \(\approx N \cdot \log_2 N\).