# Objective Empirical Mode Decomposition metric

This is a summary. If you wish to read full paper, please visit About me section and click on appropriate link, or follow one of these: [1] [2]. Python code for metrics calculation is in Code section.

Part of my research has been concentrated on empirical mode decomposition (EMD). It’s quite nice decomposition method and can generate interesting results. Nevertheless, due to its empirically, it’s quite hard to determine what those results represent. Moreover, they will change if we change some parameters of the decomposition; parameters such as stopping criteria threshold, interpolation spline technique or even definition of extremum. Having two different decompositions of the same signal, one needs to decide which one is better. What researchers have done most often is to visually compare obtained decompositions and, based on their expertise, decide which one is better. Naturally this is not an ideal solution. Despite all our good (objective) intentions, observer’s bias is unavoidable. We have tried to mitigate this problem by referring to the original concept of EMD and intrinsic mode functions (IMFs). We have came up with metrics, which are based on IMF’s idealised properties: 1) an average frequency decreases with increase of IMF’s index, 2) distinct instantaneous frequency for each IMF and 3) disjoint Fourier spectra support for IMF’s amplitude and phase.

Metric M1

This metric is based on the empirical evidence for the decrease of average instantaneous frequency, simply referred to as the average frequency, with the increase of IMF’s index number. Although the order with which the IMF components are construced in general corresponds to the decreasing IMF average frequencies, there are instances when the instantaneous frequencies cross over between the components. Since it has been claimed that each IMF has a significant instantaneous frequency, such behaviour is unwelcome and hence it is penalised by this metric. Penalties are introduced when instantaneous frequency of an IMF with lower number (high average frequency) is smaller than instantaneous frequency of any IMF with higher number. The penalty value is proportional to the length of the crossing over effect, i.e.

$(1) \qquad \qquad \displaystyle m^{\text{I}}_{j} = \sum_{k=j+1}^{N} \int_{\dot{\phi}_{k}>\dot{\phi}_{j}} \frac{dt}{T},$

where k, j are IMFs’ indices. Formula (1) compares functions of instantaneous frequencies of two IMFs and returns the total duration, over which the IMF with higher index has lower frequency. The crossing over effect has been presented in Figure 1. It shows instantaneous frequency of each IMF as a function of time. Coloured regions indicate where the crossing over occurred. Summing over all pairs of IMFs allows us to assess results for particular EMD. Metric value for the whole set is given as

$(2) \qquad \qquad \displaystyle M_{\text{I}} = \sum_{j=1}^{N} m^{\text{I}}_{j}, \qquad M_{\text{I}} \in \left[0,\frac{N(N-1)}{2}\right],$

According to this measure, the best IMF set is the one, for which $M_{\text{I}}=0$, i.e. there are no crossing-over parts in instantaneous frequency domain. The worst case, $M_{\text{I}} = N(N-1)/2$, is when the order of all IMFs is reversed, i.e. when the first IMF is under all others and the last IMF is above all others. However, this theoretical upper limit is very unlikely and the corresponding IMF set could be still considered upon index reversal.

Fig. 1. Plot of instantaneous frequency for each IMF as a function of time. Coloured regions indicates where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.

Metric M2

Another validating measure is based on the Bedrosian theorem. It refers to the necessary conditions for the signal’s amplitude, $a(t)$, and phase, $\phi(t)$, to be exactly recoverable using Hilbert transformation. For signal $s(t) = a(t) \cos\left( \phi(t) \right)$ these conditions require that the support of amplitude and phase’s Fourier spectra are not overlapping. In other words, if the amplitude function, $f(t) = a(t)$, and the phase function, $g(t) = \cos\left(\phi(t)\right)$, then

$(3) \qquad \qquad \displaystyle \left\langle \mathcal{F}(f), \mathcal{F}(g) \right\rangle = 0 ,$

where $\mathcal{F}$ represents the Fourier transform and $\langle h(t), l(t) \rangle = \int h^*(t) l(t) dt$ is the dot product. Here it is assumed, that all functions belong to $L^2$ normed space.

Let $F_{j}^{a} = \left| \mathcal{F} \left( a_{j}(t) \right) \right|$ and $F_{j}^{\phi} = \left| \mathcal{F} \left( \cos\left(\phi_{j}(t)\right) \right) \right|$ be absolute values of Fourier transforms of $a_{j}$ and $\cos(\phi_{j})$, respectively, for $j$ IMF. Their normalised measure of overlapping spectra is given as

$(4) \qquad \qquad \displaystyle m_{j}^{\text{II}} = \frac{\left\langle F_{j}^{a}, F_{j}^{\phi} \right\rangle} {\sqrt{\| F_{j}^{a} \| \| F_{j}^{\phi} \|}} ,$

where $\| h \| = \langle h, h \rangle$ is a norm of a function h. Assumptions of Bedrosian’s theorem are completely fulfilled when spectra are not overlapping, thus minimum value of $m^{\text{II}}_j$ is zero. This allows for different definitions of a metric for the whole IMF set, depending on application of EMD. First proposition is based on a biggest value of overlap $m_{j}$ in considered decomposition, i.e.

$(5) \qquad \qquad M_{\text{II}} = \max_{j} \{ m_{j}^{\text{II}}\}, \qquad M_{\text{II}} \in [0,1],$

and the second refers to the cumulative overlap within the decomposed set, i.e.

$(6) \qquad \qquad M_{\text{III}} = \sum_{j=1}^{N} m_{j}^{\text{II}}, \qquad M_{\text{III}} \in [0, N],$

where in both cases N is the number of extracted IMFs. Zero for both metrics implies no overlap between amplitude’s and phase’s spectra in any of IMFs.

Visual interpretation of the validation measure (4) is presented in Figure 2. It shows example Fourier spectra of slowly changing amplitude (dashed line) and higher frequency phase (solid line). Gray-striped region indicates overlapping area of both spectra. Proposed value is a measure of ratio of the overlapping area to the total area under both functions.

Since metric $M_{\text{III}}$ is a sum over all IMFs, it also contains the one which maximises value $m^{\text{II}}_j$ (Eq. (4)). This means that $M_{\text{III}}$ for each decomposition has to be at least equal or higher than $M_{\text{II}}$.

Fig. 2. Example of comparing Fourier spectrum of the amplitude with the
spectrum of phase. Gray-striped area indicates where two functions overlap.

Application of the validation measures

Each of the presented metrics highlights different properties of the decomposition. Computing all three values is equivalent to finding a point $M=(M_{\text{I}}, M_{\text{II}},M_{\text{III}})$ in a 3-dimensional space, where each dimension relates to the specific metric. The best decomposition corresponds to the minimum over all the metrics, i.e. $M=(0,0,0)$, and the worst decomposition to $M=(\frac{N(N-1)}{2},1,N)$. For any other point one has to decide on the importance, or weight, for each of the proposed metrics, on the basis of the problem being considered. Although the distance in the M-space is not strictly defined, it can be any $L^p$ norm, thus we suggest using the weighted Manhattah metric, i.e.
$(7) \qquad \qquad \| M \| = w_1 M_{\text{I}} + w_2 M_{\text{II}} + w_3 M_{\text{III}} ,$
where $w_i$ are respective weights. Their values should reflect the relative importance of features one is concentrated on.

# EMD on audio – wav and recurrance plots

Update:
Code and direct access to examples can be found on my GitHub reccurrence-plot.

There was an idea that has been bothering me for past few months, but due to time restrictions and many commitments I just couldn’t do it. Then came the conference quite far away and few transit hour at an airport. And now, few words about it.

Question: What do you get when you perform EMD on an audio?

Sound is nothing else than a vibration of the air in area where we happen to be. Human ear can register vibration of frequency roughly between 20 Hz and 20 kHz. In order to fulfil Nyquist sampling (in essence, the fastest frequency one can detect is up-down-up-down…, i.e. half the sampling rate) we need to record with at least 40 kHz sampling rate. Taking into account some additional technical difficulties and just in case sounds are typically recorded at sampling rate around 44 kHz. But there is also stereo, and… well, my point is, there is a lot of data in 1 second of sounds. For most common methods of signal analysing, say short-time Fourier transform or filters, processing is relatively fast. However, in case of EMD it is not that good. The longer the signal the more extrema in it leading to extending time of spline computation (e.g. cubic needs to know all extrema) and increase number of iterations. Needless to say, decomposing 3 min song gives very small ratio of (rewards significance) / (time spent), which is the reason I’ll show you result for few seconds sounds.

First sound (14240 samples) to decompose is disconnect sound. Below is figure where in red is the original signal and following rows are its IMFs (only four first and residue). All plots were normalised so that the biggest deflection is 1.

Signal (red) of disconnect sound and its first five IMFs.

The typical result, nothing new. However, interesting is how those IMFs listen. Their audio version is available here: first, second, third and fourth IMFs. How to interpret them? Allow me not to interpret them. Thought they are quite ‘noisy’, I think they sound rather… interesting. The nicer part of this are the recurrence plots.

Shortly, it is an image describe by a function
$I(t, \tau) = \left\{ \begin{array}{rcl} 0 & \mbox{for} & |x(t)-x(\tau)| < \epsilon \\ 1 & \mbox{for} & |x(t)-x(\tau)| \geq \epsilon \\ \end{array} \right. ,$
meaning that, if two values within signal are closer than $\epsilon$ then we draw a dot. I have modified this function slightly to increase the dynamics and instead of binary output, it gives few more values. Modified function is:
$I_N(t, \tau) = \left\{ \begin{array}{lcr} n & \mbox{for} & n \epsilon \leq |x(t)-x(\tau)| < (n+1) \epsilon \\ N & \mbox{for} & |x(t)-x(\tau)| \geq N \epsilon \\ \end{array} \right.,$
where $\mathbb{N} \ni n \in [0, N-1]$.
Here is a snippet on how to generate plots using Python:


import pylab as plt
import numpy as np

def rec_plot(s, eps=None, steps=None):
if eps==None: eps=0.01
if steps==None: steps=10
N = s.size
S = np.repeat(s[None,:], N, axis=0)
Z = np.floor(np.abs(S-S.T)/eps)
Z[Z>steps] = steps

return Z

s = np.random.random(1000)
plt.imshow(rec_plot(s))
plt.show()


Due to the size of an image and readability, here are only smaller plots (5000 points starting from 2000th point). Below are plots for second IMF, fifth IMF and residue from subtracting five fist IMFs. With slower frequencies one can notice quite mesmerising patterns. First three IMFs are abundant in high frequencies, which makes it fast varying image. The rest, however, gracefully changes it colours in nice patterns.

Recurrence plot for second IMF of disconnected sound.

Recurrence plot for fifth IMF of disconnected sound.

Recurrence plot for all IMFs above fifth of disconnected sound.

Same analysis was performed for chainsaw sound. Below are its EMD decomposition and few recurrence plots. Signal has 21381 samples, which makes it much longer to analyse. Again, for recurrence plots only time series of 3000 points were displayed. I must admit that I feel mesmerised by those patterns. Here are additionally wav files for first, second, third, and fourth IMFs.

Signal (red) of chainsaw sound and its first five IMFs.

Recurrence plot for second IMF of chainsaw sound.

Recurrence plot for fifth IMF of chainsaw sound.

Recurrence plot for all IMFs above fifth of chainsaw sound.

## Empirical mode decomposition – Introduction

### Aside

Empirical mode decomposition (EMD) is a data-driven decomposition method and was originally proposed by Huang et. al in 1998 [1]. Since that time the method has gained a lot of attention in the science community. EMD has been applied in a wide range of different fields, including geophysics, biomedicine, neuroscience, finance and many more.

The method is defined by an algorithm as follows:

1. Identify all local extrema (both minima and maxima) in input signal $s(t)$.
2. If the number of extrema is less or equal 2 then $s(t)$ is considered as a trend ($r(t) := s(t)$ ) — finish with one component.
3. Estimate top (env_max) and bottom(env_min) envelopes by interpolating respectively local maxima and local minima with natural cubic splines.
4. Calculate local mean (mean of both envelopes) — $m(t) = 0.5(env_{max} + env_{min})$.
5. Subtract the mean from the input signal $h(t) = s(t) - m(t)$ .
6. If $h_j$ fulfills the stopping criteria, then it is considered an intrinsic mode function (IMF) (a component $c_j(t)$ ) and algorithm starts again from step 1 for a signal $s(t) := m(t)$. Otherwise, it starts with $s(t) := h(t)$.

Empirical evidence is that the algorithm converges to finite number of IMFs, with which input signal can be reconstructed :

$s(t) = \sum_n^N c_i (t) + r(t)$

where $N$ is the number of components.

Top (blue) and bottomo (red) envelopes of signal (green). Their average, described as carrier wave, is presented in dashed black.

Empirical mode decomposition on signal composed of three sine modes.

[1] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903-995, 1998.