Complete Ensemble EMD with Adaptive Noise (CEEMDAN) in Python

CEEMDAN is available in Python through PyEMD.

There are as many empirical mode decomposition (EMD) variations as many teams are working on it. Everyone notices that in general EMD is very helpful method, yet, there’s room for improvement. Few years back I have stopped doing modifications myself in exchange for working on mathematically sound model of coupled oscillator. Nevertheless, since I spent quite a lot of time on EMDs and have enjoy playing with it, from time to time something will catch my eye. This is what happened with Complete Ensemble EMD with Adaptive Noise (CEEMDAN).

As name suggests this is an expansion on the ensemble EMD, which was already covered. What’s the difference? In case of CEEMDAN we’re also decomposing our perturbation to the system, i.e. added noise. Method creates an ensemble of many perturbations, decomposes them using EMD and resulting IMFs are included to evaluate components of the input. I will refer to these components as cIMF. The actual algorithm was first proposed by Torres et. al [1], but shortly after an improvement in efficiency was proposed[2]. These updates refer mainly to noticing that for their purpose one doesn’t need to compute all IMFs and weight parameter can be progressively scaled as well. What exactly is this algorithm?

Let’s define operator IMFi(S) which returns ith IMF (pure EMD) of its input S and M(S) to provide local mean, i.e. M(S) = S – IMF1(S), then the algorithm is as follows:

1. Create Gaussian noise ensemble W={wi}, where $i\in[1..N]$, and decompose them using EMD.
2. For input signal S calculate grand average of local means from signal perturbed by scaled noise first IMF
$R_{1} = \frac{1}{N}\sum_{1}^{N} M(S+ \beta_0 IMF_{1}(w^{i}))$.
3. Assign first cIMF to be: C1 = S – R1.
4. Compute $R_{k}= \frac{1}{N} \sum_{i=1}^{N} M(R_{k-1} + \beta_{k-1} IMF_{k}(w^{i}))$.
5. Calculate kth cIMF as Ck = Rk-1 – Rk.
6. Iterate 4. and 5. until set of {S, Ck} fulfils EMD stopping criteria.

As it can be seen a family of parameters β has been included in the algorithm. These scalars refer to the amount of decomposed noise used to compute cIMFs. This is what the authors refer to as noise adaptive. These parameters are arbitrary, but it’s suggested in improved version [2] to set them as $\beta_{k}= \epsilon_{0} \sigma(R_k)$, where σ is standard divination of argument and ε is another arbitrary parameter. Looking at point 4. one can see that for ith residue we are using ith IMF computed from noise. This is a problem, because EMD decomposes signal into a finite set of components and it can happen that there isn’t ithIMF. In this case authors are suggesting to assume component to be equal 0.

Advantage of this variation comes from the fact that created decomposition {Ci} fully reconstructs input. This is in contrast to EEMD which doesn’t guarantee such completeness. However, with CEEMDAN questions rise regarding the meaning of added scaled IMFs of noise. Augmenting signal with ensemble of pure noise creates perturbations of input without any distinguished direction. As it has been observed by Flandrin et al. [3] when decomposing white noise EMD acts as a dyadic filter bank. This means that extracted IMFs will have preferred structure and adding them to input will be similar to adding vector with random length but particular direction.

Regardless of all, CEEMDAN is definitely an interesting method. Just purely by the number of citations it seems that I’m not the only one thinking that. I’ve included it to my Python PyEMD package, so feel free to play with it and leave some feedback.

References

[1] Torres ME, Colominas MA, Schlotthauer G, Flandrin P. A complete ensemble empirical mode decomposition with adaptive noise. InAcoustics, speech and signal processing (ICASSP), 2011 IEEE international conference on 2011 May 22 (pp. 4144-4147). IEEE.
[2] Colominas MA, Schlotthauer G, Torres ME. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomedical Signal Processing and Control.
[3] Flandrin P, Rilling G, Goncalves P. Empirical mode decomposition as a filter bank. IEEE signal processing letters.

Python Empirical Mode Decomposition on Image

One of the packages I intend long term maintain and support is Python implementation of Empirical Mode Decomposition (EMD) called PyEMD. I will skip introduction of the method as it has been explained in few other posts [1, 2, 3, …]. This blog entry is more about announcement of new feature which also means new version.

PyEMD version 0.2 is out. This means that PyEMD now supports 2D data (image) decomposition. Other visible improvements include documentation and more thorough testing both of code and data cases. Installation instructions are provided on the project’s webpage.

I am more than happy to include other improvements or suggestions. The next big step will be support for 3D and multi dimensional data. Please get in touch if you feel that there is something missing.

Image decomposition is based on the simple extremum definition: a point that is above (max) or below (min) surrounding. Behind the hood this is done using SciPy’s ndim maximum_filter. These are then connected using SmoothBivariateSpline. Stopping criteria can be chosen to be either based on the number of sifting operations or threshold values for mean and standard deviations.

Below is included exemplary decomposition, with the top image being input and the following two are the outputs. Exact formula with which the image was generated is
$sin(4\pi \cdot x) \cdot \left( cos(8\pi y + 8\pi x) + 3\right) + \left(5x + 2y - 0.4 \right) y + 2$. Python code generating this example is in provided in documentation in Examples/EMD2D.

On the Phase Coupling of Two Components Mixing in Empirical Mode Decomposition

Another of my papers [1] (see About) has been published recently. Unfortunately, I cannot present it here in a full scope. Below I’m presenting an abstract from the paper and invite interested people in contacting me. This topic is really interesting and there is plenty to be done.

On the Phase Coupling of Two Components Mixing in Empirical Mode Decomposition

Abstract:
This paper investigates frequency mixing effect of Empirical Mode Decomposition (EMD) and explores whether it can be explained by simple phase coupling between components of the input signal. The input is assumed to be a linear combination of harmonic oscillators. The hypothesis was tested assuming that phases of input signals’ components would couple according to Kuramoto’s model. Using a Kuramoto’s model with as many oscillators as the number of intrinsic mode functions (result of EMD), the model’s parameters were adjusted by a particle swarm optimisation (PSO) method. The results show that our hypothesis is plausible, however, a different coupling mechanism than the simple sine-coupling Kuramoto’s model are likely to give better results.

In a very, very brief, but figure-enhanced summary: we present that for certain range of frequencies ratio a mix in frequency appears which can be explained as a difference of original frequencies. Figure below presents Fourier transform F of correlation function between each pair of IMFs for different values of frequency f. All values were normalised, such that for given f the maximum is equal to one. Additionally, on the top projection of the figure two lines have been drawn – F1 = 13-f (dashed line) and F2 = 2(13 – f) (dash-dotted line).

[1] D. Laszuk, O. J. Cadenas, and S. J. Nasuto (July, 2016) On the Phase Coupling of Two Components Mixing in Empirical Mode Decomposition, Advances in Data Science and Adaptive Analysis, vol. 8, no. 1. [Link]

Objective Empirical Mode Decomposition metric

This is a summary. If you wish to read full paper, please visit About me section and click on appropriate link, or follow one of these: [1] [2]. Python code for metrics calculation is in Code section.

Part of my research has been concentrated on empirical mode decomposition (EMD). It’s quite nice decomposition method and can generate interesting results. Nevertheless, due to its empirically, it’s quite hard to determine what those results represent. Moreover, they will change if we change some parameters of the decomposition; parameters such as stopping criteria threshold, interpolation spline technique or even definition of extremum. Having two different decompositions of the same signal, one needs to decide which one is better. What researchers have done most often is to visually compare obtained decompositions and, based on their expertise, decide which one is better. Naturally this is not an ideal solution. Despite all our good (objective) intentions, observer’s bias is unavoidable. We have tried to mitigate this problem by referring to the original concept of EMD and intrinsic mode functions (IMFs). We have came up with metrics, which are based on IMF’s idealised properties: 1) an average frequency decreases with increase of IMF’s index, 2) distinct instantaneous frequency for each IMF and 3) disjoint Fourier spectra support for IMF’s amplitude and phase.

Metric M1

This metric is based on the empirical evidence for the decrease of average instantaneous frequency, simply referred to as the average frequency, with the increase of IMF’s index number. Although the order with which the IMF components are construced in general corresponds to the decreasing IMF average frequencies, there are instances when the instantaneous frequencies cross over between the components. Since it has been claimed that each IMF has a significant instantaneous frequency, such behaviour is unwelcome and hence it is penalised by this metric. Penalties are introduced when instantaneous frequency of an IMF with lower number (high average frequency) is smaller than instantaneous frequency of any IMF with higher number. The penalty value is proportional to the length of the crossing over effect, i.e.

$(1) \qquad \qquad \displaystyle m^{\text{I}}_{j} = \sum_{k=j+1}^{N} \int_{\dot{\phi}_{k}>\dot{\phi}_{j}} \frac{dt}{T},$

where k, j are IMFs’ indices. Formula (1) compares functions of instantaneous frequencies of two IMFs and returns the total duration, over which the IMF with higher index has lower frequency. The crossing over effect has been presented in Figure 1. It shows instantaneous frequency of each IMF as a function of time. Coloured regions indicate where the crossing over occurred. Summing over all pairs of IMFs allows us to assess results for particular EMD. Metric value for the whole set is given as

$(2) \qquad \qquad \displaystyle M_{\text{I}} = \sum_{j=1}^{N} m^{\text{I}}_{j}, \qquad M_{\text{I}} \in \left[0,\frac{N(N-1)}{2}\right],$

According to this measure, the best IMF set is the one, for which $M_{\text{I}}=0$, i.e. there are no crossing-over parts in instantaneous frequency domain. The worst case, $M_{\text{I}} = N(N-1)/2$, is when the order of all IMFs is reversed, i.e. when the first IMF is under all others and the last IMF is above all others. However, this theoretical upper limit is very unlikely and the corresponding IMF set could be still considered upon index reversal.

Fig. 1. Plot of instantaneous frequency for each IMF as a function of time. Coloured regions indicates where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.

Metric M2

Another validating measure is based on the Bedrosian theorem. It refers to the necessary conditions for the signal’s amplitude, $a(t)$, and phase, $\phi(t)$, to be exactly recoverable using Hilbert transformation. For signal $s(t) = a(t) \cos\left( \phi(t) \right)$ these conditions require that the support of amplitude and phase’s Fourier spectra are not overlapping. In other words, if the amplitude function, $f(t) = a(t)$, and the phase function, $g(t) = \cos\left(\phi(t)\right)$, then

$(3) \qquad \qquad \displaystyle \left\langle \mathcal{F}(f), \mathcal{F}(g) \right\rangle = 0 ,$

where $\mathcal{F}$ represents the Fourier transform and $\langle h(t), l(t) \rangle = \int h^*(t) l(t) dt$ is the dot product. Here it is assumed, that all functions belong to $L^2$ normed space.

Let $F_{j}^{a} = \left| \mathcal{F} \left( a_{j}(t) \right) \right|$ and $F_{j}^{\phi} = \left| \mathcal{F} \left( \cos\left(\phi_{j}(t)\right) \right) \right|$ be absolute values of Fourier transforms of $a_{j}$ and $\cos(\phi_{j})$, respectively, for $j$ IMF. Their normalised measure of overlapping spectra is given as

$(4) \qquad \qquad \displaystyle m_{j}^{\text{II}} = \frac{\left\langle F_{j}^{a}, F_{j}^{\phi} \right\rangle} {\sqrt{\| F_{j}^{a} \| \| F_{j}^{\phi} \|}} ,$

where $\| h \| = \langle h, h \rangle$ is a norm of a function h. Assumptions of Bedrosian’s theorem are completely fulfilled when spectra are not overlapping, thus minimum value of $m^{\text{II}}_j$ is zero. This allows for different definitions of a metric for the whole IMF set, depending on application of EMD. First proposition is based on a biggest value of overlap $m_{j}$ in considered decomposition, i.e.

$(5) \qquad \qquad M_{\text{II}} = \max_{j} \{ m_{j}^{\text{II}}\}, \qquad M_{\text{II}} \in [0,1],$

and the second refers to the cumulative overlap within the decomposed set, i.e.

$(6) \qquad \qquad M_{\text{III}} = \sum_{j=1}^{N} m_{j}^{\text{II}}, \qquad M_{\text{III}} \in [0, N],$

where in both cases N is the number of extracted IMFs. Zero for both metrics implies no overlap between amplitude’s and phase’s spectra in any of IMFs.

Visual interpretation of the validation measure (4) is presented in Figure 2. It shows example Fourier spectra of slowly changing amplitude (dashed line) and higher frequency phase (solid line). Gray-striped region indicates overlapping area of both spectra. Proposed value is a measure of ratio of the overlapping area to the total area under both functions.

Since metric $M_{\text{III}}$ is a sum over all IMFs, it also contains the one which maximises value $m^{\text{II}}_j$ (Eq. (4)). This means that $M_{\text{III}}$ for each decomposition has to be at least equal or higher than $M_{\text{II}}$.

Fig. 2. Example of comparing Fourier spectrum of the amplitude with the
spectrum of phase. Gray-striped area indicates where two functions overlap.

Application of the validation measures

Each of the presented metrics highlights different properties of the decomposition. Computing all three values is equivalent to finding a point $M=(M_{\text{I}}, M_{\text{II}},M_{\text{III}})$ in a 3-dimensional space, where each dimension relates to the specific metric. The best decomposition corresponds to the minimum over all the metrics, i.e. $M=(0,0,0)$, and the worst decomposition to $M=(\frac{N(N-1)}{2},1,N)$. For any other point one has to decide on the importance, or weight, for each of the proposed metrics, on the basis of the problem being considered. Although the distance in the M-space is not strictly defined, it can be any $L^p$ norm, thus we suggest using the weighted Manhattah metric, i.e.
$(7) \qquad \qquad \| M \| = w_1 M_{\text{I}} + w_2 M_{\text{II}} + w_3 M_{\text{III}} ,$
where $w_i$ are respective weights. Their values should reflect the relative importance of features one is concentrated on.

EMD on audio – wav and recurrance plots

Update:
Code and direct access to examples can be found on my GitHub reccurrence-plot.

There was an idea that has been bothering me for past few months, but due to time restrictions and many commitments I just couldn’t do it. Then came the conference quite far away and few transit hour at an airport. And now, few words about it.

Question: What do you get when you perform EMD on an audio?

Sound is nothing else than a vibration of the air in area where we happen to be. Human ear can register vibration of frequency roughly between 20 Hz and 20 kHz. In order to fulfil Nyquist sampling (in essence, the fastest frequency one can detect is up-down-up-down…, i.e. half the sampling rate) we need to record with at least 40 kHz sampling rate. Taking into account some additional technical difficulties and just in case sounds are typically recorded at sampling rate around 44 kHz. But there is also stereo, and… well, my point is, there is a lot of data in 1 second of sounds. For most common methods of signal analysing, say short-time Fourier transform or filters, processing is relatively fast. However, in case of EMD it is not that good. The longer the signal the more extrema in it leading to extending time of spline computation (e.g. cubic needs to know all extrema) and increase number of iterations. Needless to say, decomposing 3 min song gives very small ratio of (rewards significance) / (time spent), which is the reason I’ll show you result for few seconds sounds.

First sound (14240 samples) to decompose is disconnect sound. Below is figure where in red is the original signal and following rows are its IMFs (only four first and residue). All plots were normalised so that the biggest deflection is 1.

Signal (red) of disconnect sound and its first five IMFs.

The typical result, nothing new. However, interesting is how those IMFs listen. Their audio version is available here: first, second, third and fourth IMFs. How to interpret them? Allow me not to interpret them. Thought they are quite ‘noisy’, I think they sound rather… interesting. The nicer part of this are the recurrence plots.

Shortly, it is an image describe by a function
$I(t, \tau) = \left\{ \begin{array}{rcl} 0 & \mbox{for} & |x(t)-x(\tau)| < \epsilon \\ 1 & \mbox{for} & |x(t)-x(\tau)| \geq \epsilon \\ \end{array} \right. ,$
meaning that, if two values within signal are closer than $\epsilon$ then we draw a dot. I have modified this function slightly to increase the dynamics and instead of binary output, it gives few more values. Modified function is:
$I_N(t, \tau) = \left\{ \begin{array}{lcr} n & \mbox{for} & n \epsilon \leq |x(t)-x(\tau)| < (n+1) \epsilon \\ N & \mbox{for} & |x(t)-x(\tau)| \geq N \epsilon \\ \end{array} \right.,$
where $\mathbb{N} \ni n \in [0, N-1]$.
Here is a snippet on how to generate plots using Python:


import pylab as plt
import numpy as np

def rec_plot(s, eps=None, steps=None):
if eps==None: eps=0.01
if steps==None: steps=10
N = s.size
S = np.repeat(s[None,:], N, axis=0)
Z = np.floor(np.abs(S-S.T)/eps)
Z[Z>steps] = steps

return Z

s = np.random.random(1000)
plt.imshow(rec_plot(s))
plt.show()


Due to the size of an image and readability, here are only smaller plots (5000 points starting from 2000th point). Below are plots for second IMF, fifth IMF and residue from subtracting five fist IMFs. With slower frequencies one can notice quite mesmerising patterns. First three IMFs are abundant in high frequencies, which makes it fast varying image. The rest, however, gracefully changes it colours in nice patterns.

Recurrence plot for second IMF of disconnected sound.

Recurrence plot for fifth IMF of disconnected sound.

Recurrence plot for all IMFs above fifth of disconnected sound.

Same analysis was performed for chainsaw sound. Below are its EMD decomposition and few recurrence plots. Signal has 21381 samples, which makes it much longer to analyse. Again, for recurrence plots only time series of 3000 points were displayed. I must admit that I feel mesmerised by those patterns. Here are additionally wav files for first, second, third, and fourth IMFs.

Signal (red) of chainsaw sound and its first five IMFs.

Recurrence plot for second IMF of chainsaw sound.

Recurrence plot for fifth IMF of chainsaw sound.

Recurrence plot for all IMFs above fifth of chainsaw sound.

Ensemble empirical mode decomposition (EEMD)

As mentioned in previous posts, one of the main drawbacks of EMD is its poor stability. This means that slight changes to input signal will cause changes in the output. Sometimes small, sometimes big, but almost always some changes. Ensemble empirical mode decomposition (EEMD) [1] is based on the assumption that small perturbations to input signal will also perturb slightly results around the true output. In fairness, we don’t know whether the data we are using are noise free (almost never they are) and this contamination itself affects the output.

How does EEMD work? It creates a large set (an ensemble) of EMD boxes. Let’s assume there are N of those boxes, but typically $N \geq 100$. Each of them process data s(t), but with artificially injected noise n(t). The true result is a grand average over all sets for each component, i.e. $c_j(t) = \frac{1}{N} \sum_{i=0}^{N} EMD_j \left[ s(t) + n_i(t) \right]$,
meaning that component j is a grand average of all j components for each contaminated i data out of N sets. This should work because of the noises’ zero mean property. Originally it was suggested to use random white noise distribution with a finite amplitude of 0.2 data’s standard deviation. It was also mentioned that type of distribution and amplitude parameter are not optimal for all choices and one should adjust them for one’s own purpose. Another assumption is that all components have two parts: one related to true results and a pure noise, which again, because of mean property, should on average cancel each other out.

The original article, like all Huang’s, ale really well written and gives throughout view with pros and cons of the method. Although it is still very much empirical it has been applied, with success, to many problems (multidimentional decomposition [2] or artifact removal from biosignals [3]). Moreover, because of its straight forward parallel computing nature, there are many attempts to create GPU-sutiable computing algorithms [4, 5].

As a example I have generated a signal s(t) composed of 3 sine waves:
$s(t) = -(3 \cos(2\pi \cdot 4t) + 4 \sin(2\pi \cdot 13t) + \sin(2\pi \cdot 21t))$.
This signal is displayed in Figure 1. Fairly to say it is not complicated function.

1) Original input signal composed out of 3 harmonic functions – 4, 13 and 21 Hz.

Signal s(t) has been processed with both EMD and EEMD which results are presented on Figures 2 and 3 respectively. Additionally, on those figures, expected sine waves (the ones with similar frequencies) have been added coloured with red dashed lines. These examples are just to present possible outcomes of the methods rather than promote any of them, thus results are not the most “outstanding”. One can see that both methods have returned 2 slower components and completely omitted the fastest 21 Hz component.

There are few things here worth noting about EEMD. First of all, EEMD’s first component is always related to generated noise. Again, this is empirically “proven”, but it makes sense. Since we are using noise, i.e. fastest changing signal, and EMD returns IMFs from the highest frequency first, thus it shouldn’t be surprise to obtain noise related component in each performed EMD of ensemble. Secondly, the first component should usually be close to zero due to being sum of zero mean random numbers. Personally, I haven’t read anyone advocating this to be one of EEMD sensibility test, but since we are posing noise cancellation and we know that the first component is mostly related to them, then in my opinion, it should. Thirdly, returned components are not necessarily IMFs in original definition, i.e. their local mean is not always 0. It is simple result of adding (in grand average) two or more signals that are not phase synchronised and probably have different “frequencies”. For different injected noises EMD can produce different number of IMFs, which sometimes forces addition of components that most likely are not related to the same feature.

2) EMD decomposition of input signal. IMFs are drawn with blue lines, whereas red dashed line represents input component with the closest frequency.

3) Ensemble EMD (EEMD) output of input signal. Blue lines relates to grand average over all obtained components for respective IMF order, whereas red dashed lines match the closest component from input sum.

As a bonus, I have made EEMD of null signal (no signal) using exactly the same noise components as in presented example. Figure 4. presents the output. Unsurprisingly, the components are still not IMFs, but they preserve the order of average frequency, i.e. the average frequency decreases with increase of the component’s order number.

4) Ensemble EMD (EEMD) of null function with the same noise functions as used in previous example.

[1] Z. Wu and N. E. Huang, “Ensemble Empirical Mode Decomposition: a Noise-assisted Data Analysis Method,” Adv. Adapt. Data Anal., vol. 1, no. 1, pp. 1–41, Jan. 2009.
[2] Z. Wu, N. E. Huang, and X. Chen, “The multi-dimensional ensemble empirical mode decomposition method,” Adv. Adapt. Data Anal., vol. 1, no. 03, pp. 339–372, 2009.
[3] K. T. Sweeney, S. F. McLoone, and T. E. Ward, “The use of ensemble empirical mode decomposition with canonical correlation analysis as a novel artifact removal technique.,” IEEE Trans. Biomed. Eng., vol. 60, no. 1, pp. 97–105, Jan. 2013.
[4] L.-W. Chang, M.-T. Lo, N. Anssari, K. Hsu, N. E. Huang, and W. W. Hwu, “Parallel implementation of Multi-dimensional Ensemble Empirical Mode Decomposition,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 1621–1624.
[5] D. Chen, D. Li, M. Xiong, H. Bao, and X. Li, “GPGPU-aided ensemble empirical-mode decomposition for EEG analysis during anesthesia.,” IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 6, pp. 1417–27, Nov. 2010.

Soon after EMD was proposed researchers started analysing it and testing it for different sorts of applications. And obviously for papers, but it follows the other. One of the first discovered behaviours was acting as a dyadic filter bank on random data. This was shown in 2004 both on fractal Gaussian noise [1] and on white noise [2]. Those are very interesting papers and definitely worth reading.

Quick example of such behaviour is presented in Figures 1 and 2. The first one shows input data – 256 data samples of normal random noise with 0 mean and variance 1 – and its EMD decomposition. Each row relates to different IMF. Figure 2 presents Fourier spectrum for signals in Figure 1, i.e. first row is spectrum for whole signal, second is for first IMF, third for second IMF and so on. Vertical lines relates middle (red) and end (black) of dyadic (powers of two) bands. Values of those lines are given as 256/n^2 and 256/(n+1)^2 respectively for black and red. Although they are not ideal they fit quite nicely and closely.

What is the reason for such behaviour? Actually it’s not understood well. It has been show empirically that it exists, but as many EMD related features it lacks of mathematical explanation (again, [1] and [2] discuss this very nicely). Definitely it has to do with the type of input data (EMD on signal with 2 modes will replicate those modes, see previous post) and… spline types. Figures 3 and 4 shows exactly he same features as Figures 1 and 2, but where produced using Akima spline technique instead of Cubic natural splines. It can be seen that the vertical lines are not as close matching in Fig. 4 as they are in Fig. 2.

1) EMD decomposition of normal random data. First row is the input signal and the following relate to respective IMF. Plots are not in the same y scale.

2) Fourier spectrum made for original data and each IMF. Red and black vertical lines indicate respectively mean and end of filter bands. The exact values are powers of 2 – 256/(n+1)**2 for red and 256/n**2 for black, where n is the number of IMF. Plots are not in the same y scale.

3) EMD, using Akima’s spline instead of natural cubic, performed on normal random data. First row is the input signal and the following relate to respective IMF. Plots are not in the same y scale.

4) Fourier spectrum made for original data and each IMF from Fig. 3 (using Akima’s splines). Red and black vertical lines indicate respectively mean and end of filter bands. The exact values are powers of 2 – 256/(n+1)^2 for red and 256/n^2 for black, where n is the number of IMF. Plots are not in the same y scale.

[1] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical Mode Decomposition as a Filter Bank,” IEEE Signal Process. Lett., vol. 11, no. 2, pp. 112–114, Feb. 2004.
[2] Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proc. R. Soc. A Math. Phys. Eng. Sci., vol. 460, no. 2046, pp. 1597–1611, Jun. 2004.

Empirical mode decomposition – Introduction

Aside

Empirical mode decomposition (EMD) is a data-driven decomposition method and was originally proposed by Huang et. al in 1998 [1]. Since that time the method has gained a lot of attention in the science community. EMD has been applied in a wide range of different fields, including geophysics, biomedicine, neuroscience, finance and many more.

The method is defined by an algorithm as follows:

1. Identify all local extrema (both minima and maxima) in input signal $s(t)$.
2. If the number of extrema is less or equal 2 then $s(t)$ is considered as a trend ($r(t) := s(t)$ ) — finish with one component.
3. Estimate top (env_max) and bottom(env_min) envelopes by interpolating respectively local maxima and local minima with natural cubic splines.
4. Calculate local mean (mean of both envelopes) — $m(t) = 0.5(env_{max} + env_{min})$.
5. Subtract the mean from the input signal $h(t) = s(t) - m(t)$ .
6. If $h_j$ fulfills the stopping criteria, then it is considered an intrinsic mode function (IMF) (a component $c_j(t)$ ) and algorithm starts again from step 1 for a signal $s(t) := m(t)$. Otherwise, it starts with $s(t) := h(t)$.

Empirical evidence is that the algorithm converges to finite number of IMFs, with which input signal can be reconstructed :

$s(t) = \sum_n^N c_i (t) + r(t)$

where $N$ is the number of components.

Top (blue) and bottomo (red) envelopes of signal (green). Their average, described as carrier wave, is presented in dashed black.

Empirical mode decomposition on signal composed of three sine modes.

[1] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903-995, 1998.