9.3. Exercises
Imagine you have a signal of length \(N=44100\) samples, sampled at \(f_s = 44100\) Hz. How many frames would you get if you take an STFT with frame length \(N_F=4096\) and hop length \(N_H=512\)?
Sometimes, one does not want to discard any samples when performing an STFT. This can be done by padding the signal with trailing zeros. In the configuration from question 1, what is the smallest number of samples that you would need to add to capture the entire signal?
Can you give a more general form for calculating the required padding, in terms of (unknown) parameters \(N, N_F, N_H\)?
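One way to check your answers to the questions above is by brute force. The sketch below assumes the framing convention that frame \(k\) starts at sample \(k \cdot N_H\) and is kept only if all \(N_F\) of its samples exist; the helper names count_frames and min_padding are hypothetical and do not appear in the text. The toy values are deliberately not the ones from the exercise.

```python
def count_frames(n, n_frame, n_hop):
    """Count complete frames by explicit enumeration.

    Assumes frame k starts at sample k * n_hop and is kept only if it
    fits entirely within the n available samples.
    """
    count = 0
    start = 0
    while start + n_frame <= n:
        count += 1
        start += n_hop
    return count


def min_padding(n, n_frame, n_hop):
    """Search for the smallest number of trailing zeros such that the
    last complete frame ends exactly at the end of the padded signal."""
    pad = 0
    while True:
        total = n + pad
        frames = count_frames(total, n_frame, n_hop)
        if frames > 0:
            # Index one past the final sample covered by the last frame
            covered = (frames - 1) * n_hop + n_frame
            if covered == total:
                return pad
        pad += 1


# Toy configuration (not the one from the exercise)
print(count_frames(10, 4, 2))  # frames start at samples 0, 2, 4, 6
print(min_padding(11, 4, 2))   # with 11 samples, the last sample is not covered
```

Because the search simply tries pad = 0, 1, 2, ... in order, it is slow for large hop lengths, but it does not depend on any closed-form expression and so can be used to test the general formula you derive.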
The SciPy package provides an STFT implementation, scipy.signal.stft, which uses a slightly different parametrization than the one presented in this chapter. Using a (non-trivial) test signal \(x\) of your choice, can you find parameter settings of scipy.signal.stft that produce identical output to the wstft function for \(N_F=2048\) and \(N_H=512\)?
Hint
It’s easiest to check the shape of the outputs first:
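import numpy as np
import scipy.signal
# [COPY in wstft definition from the text]
# The STFT as defined in this chapter
s1 = wstft(x, n_frame, n_hop, 'hann')
# scipy.signal.stft returns a tuple (frequencies, times, STFT matrix);
# keep only the STFT matrix
_, _, s2 = scipy.signal.stft(...)
# Check shapes
assert s1.shape == s2.shape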
Note: you may need to transpose s2 by saying s2 = s2.T so that the time and frequency dimensions are in the same order as ours.
After you get the shapes to line up, test for numerical equivalence by using np.allclose:
assert np.allclose(s1, s2)
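If the assertion fails, it can help to see how far apart the two outputs are rather than only a pass or fail. The following is a minimal sketch: make_test_signal and report_difference are hypothetical helper names (not functions from the text or from SciPy), and a sinusoid plus noise is only one possible choice of non-trivial test signal.

```python
import numpy as np


def make_test_signal(n=8192, fs=8000, f0=440.0, seed=0):
    """Build a non-trivial test signal: a sinusoid plus a little noise.

    All parameter values here are arbitrary choices for illustration.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    return np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)


def report_difference(s1, s2):
    """Print both shapes and, if they match, the largest absolute
    difference and the ratio of mean magnitudes."""
    print("shapes:", s1.shape, s2.shape)
    if s1.shape != s2.shape:
        return None
    max_err = np.max(np.abs(s1 - s2))
    print("max |s1 - s2|:", max_err)
    print("mean |s1| / mean |s2|:", np.mean(np.abs(s1)) / np.mean(np.abs(s2)))
    return max_err
```

Calling report_difference(s1, s2) before the final assertion shows whether the mismatch is tiny (floating-point round-off) or large. A magnitude ratio that is far from 1 may suggest that the two implementations scale their outputs differently, which is worth checking against the SciPy documentation.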