
# 5.1. Similarity

In this chapter, we’ll develop the concepts of the **frequency domain** and the **Discrete Fourier Transform** from first principles.
The underlying idea throughout is that a signal can be represented equivalently either by its sequence of sample values or as a combination of sinusoids.
The sequence of sample values, which we’ve been calling \(x[n]\), is known as the **time-domain** representation because the variable \(n\) that we vary to traverse the signal corresponds to time (the sample index).
The alternative representation developed in this chapter instead varies the **frequency** of the sinusoids and does not depend explicitly on time.

To make this all work, we’ll need to develop a way to convert from the time domain to the frequency domain. Our basic strategy will be to compare the input signal \(x[n]\) to a collection of fixed reference signals. The result of these comparisons will be a collection of similarity measurements, which (loosely speaking) measure “how much” of each reference signal is in \(x[n]\).
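As a rough illustration of this strategy, the sketch below compares one input signal against a small bank of cosine reference signals, using the sample-by-sample product-and-sum comparison defined in the next subsection. The signal length \(N=32\), the choice of cosines completing 0 through 4 cycles as references, and the helper name `compare_to_references` are all hypothetical choices made for illustration.

```python
import numpy as np

def compare_to_references(x, references):
    """Hypothetical helper: similarity of x to each reference signal.

    Each similarity is the sum of the sample-wise products of x and one reference.
    """
    return np.array([np.sum(x * ref) for ref in references])

# Assumed setup: N = 32 samples, reference cosines completing k = 0..4 cycles
N = 32
n = np.arange(N)
references = [np.cos(2 * np.pi * k * n / N) for k in range(5)]

# Input signal (assumed): a cosine completing exactly 3 cycles over N samples
x = np.cos(2 * np.pi * 3 * n / N)

# One similarity score per reference signal
scores = compare_to_references(x, references)
print(np.round(scores, 6))
```

In this particular example, only the reference whose frequency matches the input (k = 3) produces a large score, namely \(N/2 = 16\); the other scores are zero up to floating-point rounding.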

## 5.1.1. Measuring similarity

One could imagine many different ways to compare two signals \(\blue{x[n]}\) and \(\red{y[n]}\).

The definition of **similarity** that we’ll use is to go sample-by-sample, multiplying \(\blue{x[n]} \cdot \red{y[n]}\), and summing up the results to produce a single number \(\purple{S}\).
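In equations, where \(N\) is the number of samples in each signal, this looks as follows:

\[
\purple{S} = \sum_{n=0}^{N-1} \blue{x[n]} \cdot \red{y[n]}
\]

or, equivalently, in code: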

```python
def similarity(x, y):
    '''Compute similarity between two signals x and y

    Parameters
    ----------
    x, y : np.ndarrays of equal length

    Returns
    -------
    S : real number
        The similarity between x and y
    '''
    # Initialize similarity to 0
    S = 0

    # Accumulate the sample-wise products
    N = len(x)
    for n in range(N):
        S = S + x[n] * y[n]

    return S
```
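The loop above spells out each multiply-and-add step. With NumPy arrays, the same sum can be computed in a single call. The sketch below, assuming NumPy is available, repeats the loop-based `similarity` so that it runs on its own, and checks that `np.dot` (wrapped in a hypothetical helper `similarity_vectorized`) agrees with the loop on two random test signals.

```python
import numpy as np

def similarity(x, y):
    '''Loop-based similarity: sum of sample-wise products.'''
    S = 0
    for n in range(len(x)):
        S = S + x[n] * y[n]
    return S

def similarity_vectorized(x, y):
    '''Hypothetical helper: the same quantity, computed by NumPy.'''
    return np.dot(x, y)

# Two arbitrary test signals of equal length (N = 32)
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
y = rng.standard_normal(32)

print(np.isclose(similarity(x, y), similarity_vectorized(x, y)))  # True
print(similarity_vectorized([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The two versions may differ in the last few bits because the additions happen in a different order, which is why the comparison uses `np.isclose` rather than `==`.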

Fig. 5.1 provides a demonstration of this similarity calculation. Two signals \(\blue{x[n]}\) (a triangle wave, top plot) and \(\red{y[n]}\) (a cosine wave, center plot) with \(N=32\) samples are compared by summing the sample-by-sample product (\(\blue{x[n]} \cdot \red{y[n]}\), bottom plot).

Each step of the loop (equivalently, each term of the summation) is computed independently, and the height of the bar in the right-hand plot shows the running total. When the loop is finished (we’ve summed all samples \(n=0,1,\dots,31\)), we’ll have computed the total similarity \(\purple{S}\).
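The running total shown by the bar corresponds to a cumulative sum of the sample-wise products. The sketch below reproduces that computation with `np.cumsum`. The exact signals in Fig. 5.1 are not specified here, so the waveforms are assumptions: a triangle wave with one period over the \(N=32\) samples, and a cosine completing two cycles.

```python
import numpy as np

# Assumed setup (not necessarily the exact signals in Fig. 5.1): N = 32 samples
N = 32
n = np.arange(N)

# Triangle wave: -1 at n = 0, rising to +1 at n = N/2, then falling again
x = 1 - 4 * np.abs(n / N - 0.5)

# Cosine completing 2 full cycles over the N samples
y = np.cos(2 * np.pi * 2 * n / N)

# Sample-by-sample products (the bottom plot in the figure)
products = x * y

# Running total after each step of the loop (the bar height in the figure)
running_total = np.cumsum(products)

# The final running total is the total similarity S
S = running_total[-1]
print(S)
```

Entry `running_total[k]` holds the sum of the products for \(n = 0, \dots, k\), so it matches the value of `S` inside the loop after iteration `k`.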