Complete Ensemble EMD with Adaptive Noise (CEEMDAN) in Python

CEEMDAN is available in Python through PyEMD.

There are as many empirical mode decomposition (EMD) variations as many teams are working on it. Everyone notices that in general EMD is very helpful method, yet, there’s room for improvement. Few years back I have stopped doing modifications myself in exchange for working on mathematically sound model of coupled oscillator. Nevertheless, since I spent quite a lot of time on EMDs and have enjoy playing with it, from time to time something will catch my eye. This is what happened with Complete Ensemble EMD with Adaptive Noise (CEEMDAN).

As name suggests this is an expansion on the ensemble EMD, which was already covered. What’s the difference? In case of CEEMDAN we’re also decomposing our perturbation to the system, i.e. added noise. Method creates an ensemble of many perturbations, decomposes them using EMD and resulting IMFs are included to evaluate components of the input. I will refer to these components as cIMF. The actual algorithm was first proposed by Torres et. al [1], but shortly after an improvement in efficiency was proposed[2]. These updates refer mainly to noticing that for their purpose one doesn’t need to compute all IMFs and weight parameter can be progressively scaled as well. What exactly is this algorithm?

Let’s define operator IMFi(S) which returns ith IMF (pure EMD) of its input S and M(S) to provide local mean, i.e. M(S) = S – IMF1(S), then the algorithm is as follows:

  1. Create Gaussian noise ensemble W={wi}, where i\in[1..N], and decompose them using EMD.
  2. For input signal S calculate grand average of local means from signal perturbed by scaled noise first IMF
    R_{1} = \frac{1}{N}\sum_{1}^{N} M(S+ \beta_0 IMF_{1}(w^{i})).
  3. Assign first cIMF to be: C1 = S – R1.
  4. Compute R_{k}= \frac{1}{N} \sum_{i=1}^{N} M(R_{k-1} + \beta_{k-1} IMF_{k}(w^{i})).
  5. Calculate kth cIMF as Ck = Rk-1 – Rk.
  6. Iterate 4. and 5. until set of {S, Ck} fulfils EMD stopping criteria.

As it can be seen a family of parameters β has been included in the algorithm. These scalars refer to the amount of decomposed noise used to compute cIMFs. This is what the authors refer to as noise adaptive. These parameters are arbitrary, but it’s suggested in improved version [2] to set them as \beta_{k}= \epsilon_{0} \sigma(R_k), where σ is standard divination of argument and ε is another arbitrary parameter. Looking at point 4. one can see that for ith residue we are using ith IMF computed from noise. This is a problem, because EMD decomposes signal into a finite set of components and it can happen that there isn’t ithIMF. In this case authors are suggesting to assume component to be equal 0.

Advantage of this variation comes from the fact that created decomposition {Ci} fully reconstructs input. This is in contrast to EEMD which doesn’t guarantee such completeness. However, with CEEMDAN questions rise regarding the meaning of added scaled IMFs of noise. Augmenting signal with ensemble of pure noise creates perturbations of input without any distinguished direction. As it has been observed by Flandrin et al. [3] when decomposing white noise EMD acts as a dyadic filter bank. This means that extracted IMFs will have preferred structure and adding them to input will be similar to adding vector with random length but particular direction.

Regardless of all, CEEMDAN is definitely an interesting method. Just purely by the number of citations it seems that I’m not the only one thinking that. I’ve included it to my Python PyEMD package, so feel free to play with it and leave some feedback.


[1] Torres ME, Colominas MA, Schlotthauer G, Flandrin P. A complete ensemble empirical mode decomposition with adaptive noise. InAcoustics, speech and signal processing (ICASSP), 2011 IEEE international conference on 2011 May 22 (pp. 4144-4147). IEEE.
[2] Colominas MA, Schlotthauer G, Torres ME. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomedical Signal Processing and Control.
[3] Flandrin P, Rilling G, Goncalves P. Empirical mode decomposition as a filter bank. IEEE signal processing letters.


Installing Cartopy on Ubuntu 14.04 (or Travis-CI)

I’m becoming increasingly convinced that using GitHub to share projects, even in work-in-progress (WIP) state is quite beneficial. For one, someone can quickly point out that such project exists or suggest a tool which would simplify some operations. What’s more, with tools like Travis-CI or CodeCov it’s easy to follow whether the project is actually buildable or whether updates broke functionally of other features. Real convenience! Although, there is some price to pay for these goods.

One of the projects I’m currently working on is MapViz. Its aim is to provide easy to use visualization and summary for country and other administrative units data, mainly EuroStat dataset. This project heavily depends on Cartopy library – a great tool for handling geographical data, however, not so great when it comes to installing. It comes with many dependencies, some of which have to be installed separately. For example, the newest version of Cartopy requires Proj.4 lib in version 4.9, which is not available for Ubuntu older than 16.04. This is unfortunate, because Travis allows to use Ubuntu in versions 12.04 (precise) or 14.04 (trusty). For these only Proj.4 in version 4.8 is available and that’s not enough. Once these are obtained, installing Cartopy with `pip` is easy, it’s just:

$ pip install cartopy

but the trick is to install all dependencies.

The workaround is dirty, but it works. One can directly download and install Proj.4 and its dependencies (libproj9) from 16.04 (xenial) repo. For exact links go to libproj9 and libproj-dev, select your architecture and then click on any mirror link. (Just a note that by default Travis-CI uses amd64.) So, download it with wget and then install with dpkg. Here’s an example:

$ wget 
$ sudo dpkg -i libproj9_4.9.2-2_amd64.deb 

For comparison, here is my whole .travis.yml file:

dist: trusty
language: python
  - 2.7
  - 3.4
  - 3.5
  - sudo apt-get -qq update
  - sudo apt-get install -y libproj-dev proj-bin proj-data
  - sudo apt-get install -y python-pyproj
  - sudo apt-get install -y libc6
  - wget 
  - sudo dpkg -i libproj9_4.9.2-2_amd64.deb 
  - wget
  - sudo dpkg -i libproj-dev_4.9.2-2_amd64.deb
  - pip install -r requirements.txt
  - python install

Kuramoto in Stan (PyStan)

tl;dr: Project on github:

Stan is a programming language focused on probabilistic computations. Although it’s a rather recent language it’s been nicely received in data science/Bayesian community for its focus on designing model, rather than programming and getting stuck with computational details. Depending what is your background you might have heard about it either from Facebook and its Prophet project or as a native implementation for Hamiltonian Monte Carlo (HMC) and its optimised variation – No-U-Turn Sampler for HMC (NUTS).

For ease of including models in other programmes there are some interfaces/wrappers available, including RStan and PyStan.

Stan is not the easiest language to go through. Currently there are about 600 pages of documentation and then separate “documentations” for wrappers, which for PyStan isn’t very helpful. Obviously there’s no need for reading all of it, but it took me a while to actually understand what goes where an why. The reward, however, is very satisfying.

Since I’ve written a bit about Kuramoto models on this blog, it’s consistent if I share its implementation in Stan as well. Pystan-kuramoto project uses PyStan, but the actual Stan code is platform independent.

Currently there are two implementations (couldn’t come up with better names):

  • All-to-all, where Kuramoto model fit is performed to phase vector \vec{\Phi} with distinct oscillators, i.e. \vec{\Phi}_{N}(t) = \{\phi_1(t), \phi_2(t), \dots, \phi_N(t)\}.
  • All-to-one, where the model fits superposition (sum) of oscillators to phase time series \Phi_{N}(t) = \sum_{n=1}^{N} \phi_n(t).

In all honesty, this seems to be rather efficient. Optimisation is performed using penalized maximum likelihood estimation with optimization (L-BFGS). Before using it I wasn’t very knowledgeable in the algorithm, but now I’m simply amazed with its speed and accuracy. Thumbs up.

Update: Particle Swarm Optimisation in Python

It came to my attention that my PSO for Python is actually quite popular. Only after few people contacted me I’ve noticed that the public version was not the same that I’ve been using. It wasn’t bad, but definitely not as good. Thus, obviously, I’ve updated the version and cleaned it a bit.

Update programme is available from my github or from Code subpage.

What’s the difference? There are few.
– Initial values, unless provided, are psuedo-random generated using Halton sequence. This prevents from artificial clustering.
– Perturbation in form of a Gaussian noise should mitigate false local minima by forcing particles to search surrounding area.
– Added max_repetition threshold, which states the maximum number of obtaining the same optimum value. Upon reaching threshold program finishes.
– General improvement in performance.
– Improved usage documentation within the file.
– Program is now compatible with Python3.

Feel free to request any features.

There is an idea of adding progressive save, which would quit, resume and modify parameters at any point of computation.

Common dtype for NumPy arrays

Recent challenge I had was to convert two given numpy arrays such it’ll be possible to have them in common type without losing information. One can follow this link for stackoverflow entry.

Initially I thought about comparing dtypes, because in Python 2 it is allowed to do something like:

>>> np.float32 < np.float64
>>> np.int32 < np.float32
>>> np.int64 > np.float16

… which kind of makes sense(?). Well, except that int64 vs float16 which looks suspicious. It turns out that these are type objects and Python is simply comparing their locations in memory[Citation needed] and that is obviously not reliable. Actually, in Python 3 such comparison is forbidden and it fails.

As the answer to my stackoverflow question suggests one could try to use dtype.kind and dtype.itemsize to create own ordering. This is not difficult, but it should contain all types, such as (unsigned) ints, floats, bools, complex…

Fortunately, for my purposes, there is NumPy’s method which does what I want, i.e. numpy.find_common_type. It determines common type following standard coercion rules. With a help of this function my common conversion looks like:

import numpy as np

def common_dtype(x, y):
    dtype = np.find_common_type([x.dtype, y.dtype], [])
    if x.dtype != dtype: x = x.astype(dtype)
    if y.dtype != dtype: y = y.astype(dtype)

    return x, y

What to expect from such function?
float32 for float16 and float32
float32 for int16 and float32
int64 for int32 and uint16
float64 for int32 and float16

Behaves as it should.

Halton sequence in Python

Sometimes when we ask for random we don’t actually mean random by just random. Yes, pseudo-random.

Consider unitary distribution with ranges 0 and 1. Say you want to draw 5 samples. Selecting them at random would mean that we might end up with set of {0, 0.1, 0.02, 0.09, 0.01} or {0.11, 0.99, 0.09, 0.91, 0.01}. Yes, these values don’t seem very random, but that’s the thing about randomness, that it randomly can seem to not be random.

Depending on the purpose of our selection, these values might be just OK. After all, they came from that distribution. However, if our goal is to reconstruct the distribution, or extract information about with limited number of samples, it is often better to draw those samples in pseudo-random way. For example, in accordance to van der Corput sequences for 1D distributions or its generalized version Halton sequence.

The best practice for sampling N dimensional distribution is to use different prime numbers for each dimension. For example, when I need to sample a 5 dimensional unitary distribution, or search space, I will use bases of (5, 7, 11, 13, 17). This is to prevent periodic visits of the same position.

In case you are wondering what’s the difference between actual random and pseudo-random, here is a gist:

Both are good, but the actual random can produce many empty holes. What we like to have is a fair representation of all areas of our search space.

Thus, without further ado, here are some code snippets.

This is a definition of my prime generating generator:

def next_prime():
    def is_prime(num):
        "Checks if num is a prime value"
        for i in range(2,int(num**0.5)+1):
            if(num % i)==0: return False
        return True

    prime = 3
        if is_prime(prime):
            yield prime
        prime += 2

As for Halton sequence, as mentioned before it uses van der Corput sequence. Again, here is the definition:

def vdc(n, base=2):
    vdc, denom = 0, 1
    while n:
        denom *= base
        n, remainder = divmod(n, base)
        vdc += remainder/float(denom)
    return vdc

And finally, definition for the Halton sequence:

def halton_sequence(size, dim):
    seq = []
    primeGen = next_prime()
    for d in range(dim):
        base = next(primeGen)
        seq.append([vdc(i, base) for i in range(size)])
    return seq

To use all of this simply call halton_sequence(size, dim). These variables refer to the number of size of sample poll and the dimension of your problem. So if one wants to sample 3 dimensional space with 10 samples each it would be called as below. (Notice: first dimension has prime value 5, then it’s 7, 11, and following prime values.)

>>> halton_sequence(10, 3)
[0, 0.2, 0.4, 0.6, 0.8, 0.04, 0.24000000000000002, 0.44, 0.64, 0.8400000000000001],
[0, 0.14285714285714285, 0.2857142857142857, 0.42857142857142855, 0.5714285714285714, 0.7142857142857143, 0.8571428571428571, 0.02040816326530612, 0.16326530612244897, 0.30612244897959184],
[0, 0.09090909090909091, 0.18181818181818182, 0.2727272727272727, 0.36363636363636365, 0.45454545454545453, 0.5454545454545454, 0.6363636363636364, 0.7272727272727273, 0.8181818181818182]

Objective Empirical Mode Decomposition metric

This is a summary. If you wish to read full paper, please visit About me section and click on appropriate link, or follow one of these: [1] [2]. Python code for metrics calculation is in Code section.


Part of my research has been concentrated on empirical mode decomposition (EMD). It’s quite nice decomposition method and can generate interesting results. Nevertheless, due to its empirically, it’s quite hard to determine what those results represent. Moreover, they will change if we change some parameters of the decomposition; parameters such as stopping criteria threshold, interpolation spline technique or even definition of extremum. Having two different decompositions of the same signal, one needs to decide which one is better. What researchers have done most often is to visually compare obtained decompositions and, based on their expertise, decide which one is better. Naturally this is not an ideal solution. Despite all our good (objective) intentions, observer’s bias is unavoidable. We have tried to mitigate this problem by referring to the original concept of EMD and intrinsic mode functions (IMFs). We have came up with metrics, which are based on IMF’s idealised properties: 1) an average frequency decreases with increase of IMF’s index, 2) distinct instantaneous frequency for each IMF and 3) disjoint Fourier spectra support for IMF’s amplitude and phase.

Metric M1

This metric is based on the empirical evidence for the decrease of average instantaneous frequency, simply referred to as the average frequency, with the increase of IMF’s index number. Although the order with which the IMF components are construced in general corresponds to the decreasing IMF average frequencies, there are instances when the instantaneous frequencies cross over between the components. Since it has been claimed that each IMF has a significant instantaneous frequency, such behaviour is unwelcome and hence it is penalised by this metric. Penalties are introduced when instantaneous frequency of an IMF with lower number (high average frequency) is smaller than instantaneous frequency of any IMF with higher number. The penalty value is proportional to the length of the crossing over effect, i.e.

(1) \qquad \qquad \displaystyle m^{\text{I}}_{j} = \sum_{k=j+1}^{N} \int_{\dot{\phi}_{k}>\dot{\phi}_{j}} \frac{dt}{T},

where k, j are IMFs’ indices. Formula (1) compares functions of instantaneous frequencies of two IMFs and returns the total duration, over which the IMF with higher index has lower frequency. The crossing over effect has been presented in Figure 1. It shows instantaneous frequency of each IMF as a function of time. Coloured regions indicate where the crossing over occurred. Summing over all pairs of IMFs allows us to assess results for particular EMD. Metric value for the whole set is given as

(2) \qquad \qquad \displaystyle M_{\text{I}} = \sum_{j=1}^{N} m^{\text{I}}_{j}, \qquad M_{\text{I}} \in \left[0,\frac{N(N-1)}{2}\right],

According to this measure, the best IMF set is the one, for which M_{\text{I}}=0, i.e. there are no crossing-over parts in instantaneous frequency domain. The worst case, M_{\text{I}} = N(N-1)/2, is when the order of all IMFs is reversed, i.e. when the first IMF is under all others and the last IMF is above all others. However, this theoretical upper limit is very unlikely and the corresponding IMF set could be still considered upon index reversal.

Fig. 1. Plot of instantaneous frequency for each IMF as a function of time. Coloured regions indicates where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.

Fig. 1. Plot of instantaneous frequency for each IMF as a function of time. Coloured regions indicates where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.

Metric M2

Another validating measure is based on the Bedrosian theorem. It refers to the necessary conditions for the signal’s amplitude, a(t), and phase, \phi(t), to be exactly recoverable using Hilbert transformation. For signal s(t) = a(t) \cos\left( \phi(t) \right) these conditions require that the support of amplitude and phase’s Fourier spectra are not overlapping. In other words, if the amplitude function, f(t) = a(t), and the phase function, g(t) = \cos\left(\phi(t)\right), then

(3) \qquad \qquad \displaystyle \left\langle \mathcal{F}(f), \mathcal{F}(g) \right\rangle = 0 ,

where \mathcal{F} represents the Fourier transform and \langle h(t), l(t) \rangle = \int h^*(t) l(t) dt is the dot product. Here it is assumed, that all functions belong to L^2 normed space.

Let F_{j}^{a} = \left| \mathcal{F} \left( a_{j}(t) \right) \right| and F_{j}^{\phi} = \left| \mathcal{F} \left( \cos\left(\phi_{j}(t)\right) \right) \right| be absolute values of Fourier transforms of a_{j} and \cos(\phi_{j}), respectively, for j IMF. Their normalised measure of overlapping spectra is given as

(4) \qquad \qquad  \displaystyle m_{j}^{\text{II}} = \frac{\left\langle F_{j}^{a}, F_{j}^{\phi} \right\rangle} 		{\sqrt{\| F_{j}^{a} \| \| F_{j}^{\phi} \|}} ,

where \| h \| = \langle h, h \rangle is a norm of a function h. Assumptions of Bedrosian’s theorem are completely fulfilled when spectra are not overlapping, thus minimum value of m^{\text{II}}_j is zero. This allows for different definitions of a metric for the whole IMF set, depending on application of EMD. First proposition is based on a biggest value of overlap m_{j} in considered decomposition, i.e.

(5) \qquad \qquad M_{\text{II}} = \max_{j} \{  m_{j}^{\text{II}}\}, \qquad M_{\text{II}} \in [0,1],

and the second refers to the cumulative overlap within the decomposed set, i.e.

(6) \qquad \qquad M_{\text{III}} = \sum_{j=1}^{N} m_{j}^{\text{II}}, \qquad M_{\text{III}} \in [0, N],

where in both cases N is the number of extracted IMFs. Zero for both metrics implies no overlap between amplitude’s and phase’s spectra in any of IMFs.

Visual interpretation of the validation measure (4) is presented in Figure 2. It shows example Fourier spectra of slowly changing amplitude (dashed line) and higher frequency phase (solid line). Gray-striped region indicates overlapping area of both spectra. Proposed value is a measure of ratio of the overlapping area to the total area under both functions.

Since metric M_{\text{III}} is a sum over all IMFs, it also contains the one which maximises value m^{\text{II}}_j (Eq. (4)). This means that M_{\text{III}} for each decomposition has to be at least equal or higher than M_{\text{II}}.

Fig. 2.  Example of comparing Fourier spectrum of the amplitude with the spectrum of phase. Gray-striped area indicates where two functions overlap.

Fig. 2. Example of comparing Fourier spectrum of the amplitude with the
spectrum of phase. Gray-striped area indicates where two functions overlap.

Application of the validation measures

Each of the presented metrics highlights different properties of the decomposition. Computing all three values is equivalent to finding a point M=(M_{\text{I}}, M_{\text{II}},M_{\text{III}}) in a 3-dimensional space, where each dimension relates to the specific metric. The best decomposition corresponds to the minimum over all the metrics, i.e. M=(0,0,0), and the worst decomposition to M=(\frac{N(N-1)}{2},1,N). For any other point one has to decide on the importance, or weight, for each of the proposed metrics, on the basis of the problem being considered. Although the distance in the M-space is not strictly defined, it can be any L^p norm, thus we suggest using the weighted Manhattah metric, i.e.
(7) \qquad \qquad \| M \| = w_1 M_{\text{I}} + w_2 M_{\text{II}} + w_3 M_{\text{III}} ,
where w_i are respective weights. Their values should reflect the relative importance of features one is concentrated on.