Objective Empirical Mode Decomposition metric

This is a summary. If you wish to read the full paper, please visit the About me section and click on the appropriate link, or follow one of these: [1] [2]. Python code for the metrics calculation is in the Code section.

 

Part of my research has been concentrated on empirical mode decomposition (EMD). It's quite a nice decomposition method and can generate interesting results. Nevertheless, due to its empirical nature, it's quite hard to determine what those results represent. Moreover, they will change if we change some parameters of the decomposition, such as the stopping-criterion threshold, the interpolation spline technique or even the definition of an extremum. Having two different decompositions of the same signal, one needs to decide which one is better. What researchers have most often done is to visually compare the obtained decompositions and, based on their expertise, decide which one is better. Naturally this is not an ideal solution. Despite all our good (objective) intentions, observer's bias is unavoidable. We have tried to mitigate this problem by referring to the original concept of EMD and intrinsic mode functions (IMFs). We have come up with metrics based on the idealised properties of IMFs: 1) the average frequency decreases with increasing IMF index, 2) each IMF has a distinct instantaneous frequency, and 3) the Fourier spectra of an IMF's amplitude and phase have disjoint supports.

Metric M1

This metric is based on the empirical evidence for the decrease of the average instantaneous frequency, simply referred to as the average frequency, with the increase of the IMF's index number. Although the order in which the IMF components are constructed generally corresponds to decreasing IMF average frequencies, there are instances when the instantaneous frequencies cross over between the components. Since it has been claimed that each IMF has a significant instantaneous frequency, such behaviour is unwelcome and hence it is penalised by this metric. Penalties are introduced when the instantaneous frequency of an IMF with a lower index (high average frequency) is smaller than the instantaneous frequency of any IMF with a higher index. The penalty value is proportional to the duration of the crossing-over effect, i.e.

(1) \qquad \qquad \displaystyle m^{\text{I}}_{j} = \sum_{k=j+1}^{N} \int_{\dot{\phi}_{k}>\dot{\phi}_{j}} \frac{dt}{T},

where k, j are IMFs' indices. Formula (1) compares the instantaneous frequencies of two IMFs and returns the total duration, normalised by the signal length T, over which the IMF with the higher index has the lower frequency. The crossing-over effect is presented in Figure 1, which shows the instantaneous frequency of each IMF as a function of time; coloured regions indicate where the crossing over occurred. Summing over all pairs of IMFs allows us to assess the result of a particular EMD. The metric value for the whole set is given as

(2) \qquad \qquad \displaystyle M_{\text{I}} = \sum_{j=1}^{N} m^{\text{I}}_{j}, \qquad M_{\text{I}} \in \left[0,\frac{N(N-1)}{2}\right],

where N is the number of extracted IMFs. According to this measure, the best IMF set is the one for which M_{\text{I}}=0, i.e. there are no crossing-over parts in the instantaneous frequency domain. The worst case, M_{\text{I}} = N(N-1)/2, occurs when the order of all IMFs is reversed, i.e. when the first IMF lies below all others and the last IMF lies above all others. However, this theoretical upper limit is very unlikely, and the corresponding IMF set could still be considered after reversing the indices.
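For a concrete picture, here is a minimal sketch of how M_{\text{I}} could be computed, assuming the instantaneous frequencies of the N IMFs have already been estimated (e.g. via the Hilbert transform) and stored as rows of a NumPy array; the function name and the array layout are my own choices, not part of the paper.

import numpy as np

def metric_M1(inst_freq, dt):
    """Eqs. (1)-(2): total crossing-over duration, normalised by the signal length T.

    inst_freq -- 2D array (N x samples) of instantaneous frequencies, one row per IMF.
    dt        -- sampling step.
    """
    N, samples = inst_freq.shape
    T = samples * dt
    M1 = 0.0
    for j in range(N):
        for k in range(j + 1, N):
            # Time over which the higher-index (supposedly slower) IMF is faster
            M1 += np.sum(inst_freq[k] > inst_freq[j]) * dt / T
    return M1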

Fig. 1. Plot of the instantaneous frequency of each IMF as a function of time. Coloured regions indicate where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.


Metric M2

Another validating measure is based on the Bedrosian theorem. It refers to the necessary conditions for the signal's amplitude, a(t), and phase, \phi(t), to be exactly recoverable using the Hilbert transform. For a signal s(t) = a(t) \cos\left( \phi(t) \right) these conditions require that the supports of the Fourier spectra of the amplitude and the phase do not overlap. In other words, if the amplitude function is f(t) = a(t) and the phase function is g(t) = \cos\left(\phi(t)\right), then

(3) \qquad \qquad \displaystyle \left\langle \mathcal{F}(f), \mathcal{F}(g) \right\rangle = 0 ,

where \mathcal{F} represents the Fourier transform and \langle h(t), l(t) \rangle = \int h^*(t) l(t) dt is the dot product. Here it is assumed that all functions belong to the L^2 normed space.

Let F_{j}^{a} = \left| \mathcal{F} \left( a_{j}(t) \right) \right| and F_{j}^{\phi} = \left| \mathcal{F} \left( \cos\left(\phi_{j}(t)\right) \right) \right| be the absolute values of the Fourier transforms of a_{j} and \cos(\phi_{j}), respectively, for the j-th IMF. Their normalised measure of overlapping spectra is given as

(4) \qquad \qquad  \displaystyle m_{j}^{\text{II}} = \frac{\left\langle F_{j}^{a}, F_{j}^{\phi} \right\rangle} 		{\sqrt{\| F_{j}^{a} \| \| F_{j}^{\phi} \|}} ,

where \| h \| = \langle h, h \rangle is the norm of a function h. The assumptions of Bedrosian's theorem are completely fulfilled when the spectra do not overlap, thus the minimum value of m^{\text{II}}_j is zero. This allows for different definitions of a metric for the whole IMF set, depending on the application of EMD. The first proposition is based on the biggest overlap value m_{j}^{\text{II}} in the considered decomposition, i.e.

(5) \qquad \qquad M_{\text{II}} = \max_{j} \{  m_{j}^{\text{II}}\}, \qquad M_{\text{II}} \in [0,1],

and the second refers to the cumulative overlap within the decomposed set, i.e.

(6) \qquad \qquad M_{\text{III}} = \sum_{j=1}^{N} m_{j}^{\text{II}}, \qquad M_{\text{III}} \in [0, N],

where in both cases N is the number of extracted IMFs. Zero for both metrics implies no overlap between the amplitude's and phase's spectra in any of the IMFs.

A visual interpretation of the validation measure (4) is presented in Figure 2. It shows example Fourier spectra of a slowly changing amplitude (dashed line) and a higher-frequency phase (solid line). The gray-striped region indicates the overlapping area of both spectra. The proposed value is a measure of the ratio of the overlapping area to the total area under both functions.

Since metric M_{\text{III}} is a sum over all IMFs, it also contains the term which maximises the value m^{\text{II}}_j (Eq. (4)). This means that M_{\text{III}} for each decomposition has to be at least as large as M_{\text{II}}.
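As an illustration, here is a minimal sketch of how m^{\text{II}}_j, M_{\text{II}} and M_{\text{III}} could be computed, assuming the amplitude a_j(t) and phase \phi_j(t) of each IMF have already been extracted (e.g. via the Hilbert transform); the function name and data layout are my own choices.

import numpy as np

def metric_M2_M3(amplitudes, phases):
    """Eqs. (4)-(6): normalised spectral overlap per IMF, then its max and sum.

    amplitudes, phases -- lists of 1D arrays, one pair (a_j, phi_j) per IMF.
    """
    overlaps = []
    for a, phi in zip(amplitudes, phases):
        Fa = np.abs(np.fft.fft(a))
        Fp = np.abs(np.fft.fft(np.cos(phi)))
        # Eq. (4): inner product normalised with ||h|| = <h, h>,
        # which keeps the value within [0, 1]
        m = np.sum(Fa * Fp) / np.sqrt(np.sum(Fa * Fa) * np.sum(Fp * Fp))
        overlaps.append(m)
    return max(overlaps), sum(overlaps)   # M_II, M_III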

Fig. 2. Example of comparing the Fourier spectrum of the amplitude with the spectrum of the phase. The gray-striped area indicates where the two functions overlap.


Application of the validation measures

Each of the presented metrics highlights different properties of the decomposition. Computing all three values is equivalent to finding a point M=(M_{\text{I}}, M_{\text{II}}, M_{\text{III}}) in a 3-dimensional space, where each dimension relates to a specific metric. The best decomposition corresponds to the minimum over all the metrics, i.e. M=(0,0,0), and the worst decomposition to M=(\frac{N(N-1)}{2},1,N). For any other point one has to decide on the importance, or weight, of each of the proposed metrics, on the basis of the problem being considered. Although the distance in the M-space is not strictly defined and could be any L^p norm, we suggest using the weighted Manhattan metric, i.e.
(7) \qquad \qquad \| M \| = w_1 M_{\text{I}} + w_2 M_{\text{II}} + w_3 M_{\text{III}} ,
where w_i are the respective weights. Their values should reflect the relative importance of the features one is concentrating on.
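As a small sketch of Eq. (7), assuming the three metric values have already been computed (for example with the functions sketched above), combining them is a one-liner; the default weights here are placeholders.

def decomposition_score(M1, M2, M3, w=(1.0, 1.0, 1.0)):
    """Weighted Manhattan norm of the metric vector (Eq. (7)); lower is better."""
    return w[0]*M1 + w[1]*M2 + w[2]*M3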


Data Buffer (Python)

For the past few years I've been teaching a few modules for Master's students, such as programming or signal processing. In most of them, one of the main tasks is to write a data buffer. The usual requirement is to be able to return the last N samples or T seconds. Although conceptually most students understand this, implementing it seems to be quite difficult. But it is not. Unfortunately their first line of action, rather than thinking for themselves, is to search for a solution on the Internet. There are many attempts out there, better or worse, but since they are going to look for one anyway, I might as well provide my own solution. Hopefully it helps others as well.

Below is a DataBuffer class that keeps the last data of a given size, updated via its update method. The first implementation uses strictly Python's native lists, whereas the other uses NumPy arrays.

Python’s list:

class DataBuffer(object):
    def __init__(self, size):
        " Creates 2D list buffer of tuple 'size' size. "

        self.size = size
        self.buffer = [[0]*size[1] for i in range(size[0])]

    def update(self, data):
        " Updates buffer with input data (one new value per row). "
        self.buffer = [self.buffer[i][1:] + [data[i]] for i in range(self.size[0])]

    def getData(self, d1=(0, -1), d2=(0, -1)):
        " Returns buffer in given ranges (default: whole buffer). "

        # Work on copies so that the default arguments are never modified in place
        d1, d2 = list(d1), list(d2)

        # Converts negative indices into absolute values
        if d1[0] < 0: d1[0] = self.size[0] + int(d1[0]) + 1
        if d1[1] < 0: d1[1] = self.size[0] + int(d1[1]) + 1

        if d2[0] < 0: d2[0] = self.size[1] + int(d2[0]) + 1
        if d2[1] < 0: d2[1] = self.size[1] + int(d2[1]) + 1

        # Make sure that indices are in increasing order
        d1 = [min(d1), max(d1)]
        d2 = [min(d2), max(d2)]

        return [self.buffer[i][d2[0]:d2[1]] for i in range(d1[0], d1[1])]

NumPy arrays:

class DataBuffer(object):
    def __init__(self, size):
        "Creates buffer of 'size' size."

        self.size = size
        self.buffer = np.zeros(size)

    def update(self, data):
        "Updates buffer with data (shifts columns left, appends new column)."

        self.buffer[:, :-1] = self.buffer[:, 1:]
        self.buffer[:, -1] = data

    def getData(self, d1=(0, -1), d2=(0, -1)):
        "Returns buffer in given ranges (default: whole buffer)."

        # Convert negative indices into absolute values and sort them
        d1 = np.array(d1)
        d1[d1 < 0] += self.size[0] + 1
        d1.sort()

        d2 = np.array(d2)
        d2[d2 < 0] += self.size[1] + 1
        d2.sort()

        return self.buffer[d1[0]:d1[1], d2[0]:d2[1]]

Usage example:

if __name__ == "__main__":

    import numpy as np   # needed for the NumPy version of the buffer

    import random
    r = random.randint

    size = (5, 10)
    B = DataBuffer(size)

    # Fill the buffer with 20 rows of random samples
    for i in range(20):
        data = [r(0, 10) for _ in range(size[0])]
        B.update(data)

    print(B.getData([0, 2], [1, -2]))
    print(B.getData([2, -1], [1, 6]))
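For completeness, a shorter alternative is to let collections.deque handle the fixed-length behaviour; this is only a sketch with my own class name, not a drop-in replacement for the versions above.

from collections import deque

class DequeBuffer(object):
    def __init__(self, rows, cols):
        # One fixed-length deque per channel; a full deque drops its oldest sample
        self.rows = [deque([0]*cols, maxlen=cols) for _ in range(rows)]

    def update(self, data):
        for row, value in zip(self.rows, data):
            row.append(value)

    def getData(self):
        return [list(row) for row in self.rows]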

EMD on audio – wav and recurrence plots

Update:
Code and direct access to examples can be found on my GitHub reccurrence-plot.

There was an idea that had been bothering me for the past few months, but due to time restrictions and many commitments I just couldn't get to it. Then came a conference quite far away and a few transit hours at an airport. And now, a few words about it.

Question: What do you get when you perform EMD on an audio?

Sound is nothing else than a vibration of the air in the area where we happen to be. The human ear can register vibrations of frequency roughly between 20 Hz and 20 kHz. In order to fulfil the Nyquist criterion (in essence, the fastest frequency one can detect is up-down-up-down…, i.e. half the sampling rate) we need to record with at least a 40 kHz sampling rate. Taking into account some additional technical difficulties, and just in case, sounds are typically recorded at a sampling rate around 44 kHz. But there is also stereo, and… well, my point is, there is a lot of data in 1 second of sound. For the most common methods of signal analysis, say the short-time Fourier transform or filters, processing is relatively fast. However, in the case of EMD it is not that good. The longer the signal, the more extrema it contains, which extends the spline computation time (e.g. a cubic spline needs to know all extrema) and increases the number of iterations. Needless to say, decomposing a 3 min song gives a very small ratio of (reward significance) / (time spent), which is the reason I'll show you results for sounds of a few seconds.

The first sound (14240 samples) to decompose is a disconnect sound. Below is a figure where the original signal is in red and the following rows are its IMFs (only the first four and the residue). All plots were normalised so that the biggest deflection is 1.
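The decomposition itself can be reproduced along these lines, assuming the PyEMD package (or any other EMD implementation) is available and the file is called disconnect.wav; both are assumptions of this sketch rather than the exact code I used.

import numpy as np
from scipy.io import wavfile
from PyEMD import EMD   # assumed EMD implementation

rate, data = wavfile.read('disconnect.wav')   # hypothetical file name
if data.ndim > 1:
    data = data.mean(axis=1)                  # mix stereo down to mono
signal = data.astype(float)

imfs = EMD().emd(signal)                      # rows are consecutive IMFs
print(rate, imfs.shape)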

Signal (red) of disconnect sound and its first five IMFs.


The typical result, nothing new. However, what is interesting is how those IMFs sound. Their audio versions are available here: first, second, third and fourth IMFs. How to interpret them? Allow me not to interpret them. Though they are quite 'noisy', I think they sound rather… interesting. The nicer part of this are the recurrence plots.

In short, a recurrence plot is an image described by the function
I(t, \tau) = \left\{ \begin{array}{rcl} 0 & \mbox{for} & |x(t)-x(\tau)| < \epsilon \\ 1 & \mbox{for} & |x(t)-x(\tau)| \geq \epsilon \\ \end{array} \right. ,
meaning that if two values within the signal are closer than \epsilon, then we draw a dot. I have modified this function slightly to increase the dynamics, so that instead of a binary output it gives a few more values. The modified function is:
I_N(t, \tau) = \left\{ \begin{array}{lcr} n & \mbox{for} & n \epsilon \leq |x(t)-x(\tau)| < (n+1) \epsilon \\ N & \mbox{for} & |x(t)-x(\tau)| \geq N \epsilon \\ \end{array} \right.,
where \mathbb{N} \ni n \in [0, N-1].
Here is a snippet on how to generate plots using Python:


import pylab as plt
import numpy as np

def rec_plot(s, eps=0.01, steps=10):
    "Returns a quantised recurrence plot matrix for a 1D signal s."
    N = s.size
    S = np.repeat(s[None, :], N, axis=0)
    # Distance between every pair of samples, quantised in steps of eps
    Z = np.floor(np.abs(S - S.T)/eps)
    # Everything further apart than steps*eps gets the same (maximum) value
    Z[Z > steps] = steps

    return Z

s = np.random.random(1000)
plt.imshow(rec_plot(s))
plt.show()

Due to the size of the images and readability, here are only smaller plots (5000 points starting from the 2000th point). Below are plots for the second IMF, the fifth IMF and the residue after subtracting the first five IMFs. With slower frequencies one can notice quite mesmerising patterns. The first three IMFs are abundant in high frequencies, which makes for fast-varying images. The rest, however, gracefully change their colours in nice patterns.

Recurrence plot for second IMF of disconnected sound.


Recurrence plot for fifth IMF of disconnected sound.


Recurrence plot for all IMFs above fifth of disconnected sound.


The same analysis was performed for a chainsaw sound. Below are its EMD decomposition and a few recurrence plots. The signal has 21381 samples, which makes it much longer to analyse. Again, only time series of 3000 points are displayed in the recurrence plots. I must admit that I feel mesmerised by those patterns. Here are, additionally, wav files for the first, second, third, and fourth IMFs.

Signal (red) of chainsaw sound and its first five IMFs.


Recurrence plot for second IMF of chainsaw sound.


Recurrence plot for fifth IMF of chainsaw sound.


Recurrence plot for all IMFs above fifth of chainsaw sound.


Bayesian inference in Kuramoto model

For a while now, I've been involved with Kuramoto models and fitting them to data. Recently I've stumbled upon using Bayesian inference to predict parameters in time-evolving dynamics. A team from the Physics Department at Lancaster University has produced many papers on the subject, but to provide some reference it is worth looking at [1].

The technique refers to the general phase \phi dynamics of an oscillator, i.e.
\dot\phi_i = \omega_i + f_i (\phi_i) + g_i (\phi_i, \phi_j) + \xi_i ,
where \omega_i, f_i(\phi_i) and g_i(\phi_i, \phi_j) are the intrinsic frequency, the self-coupling and the coupling with other oscillators, respectively. The term \xi_i refers to noise; although the authors claim that it can be any type of noise, the calculations are performed for white Gaussian noise. I don't really want to go much into details, because at first sight it might look complicated. Just to point out a few steps that are necessary to introduce my examples:

  1. Rewrite the model in the form \dot\phi_i = \sum_{k=-K}^{K} C_{k}^{(i)} P_{i,k} (\Phi) + \xi_i(t), where C is the parameter vector, P provides the significant terms of our model (for example via a Fourier decomposition) and \Phi is the vector of all oscillators' phases (at some moment).
  2. Calculate the diagonal values of the Jacobian matrix, i.e. \frac{\partial P_{i,k}}{\partial \phi_{i}}.
  3. Set prior probabilities for the parameter vector C and its covariance \Sigma (or concentration \Xi = \Sigma^{-1}).

The problem is then reduced to minimising the minus log-likelihood function S. The authors also provide the exact formulas and an algorithm for finding the extremum. A tutorial on applying their algorithm can be found in [2].

Based on the MATLAB code provided by the authors, available on their webpage [3], I have written my own program in Python (code available in the Code section or here). As an experiment I have used a simple Kuramoto model with sinusoidal coupling between phases
\dot\phi_1 = 28 - 0.3 \sin( \phi_1 - \phi_2) - 0.1 \sin( \phi_1 - \phi_3 ) + 0.0 \sin( \phi_2 - \phi_3),
\dot\phi_2 = 19 + 0.3 \sin( \phi_1 - \phi_2) + 0.0 \sin( \phi_1 - \phi_3 ) - 0.9 \sin( \phi_2 - \phi_3),
\dot\phi_3 = 11 + 0.0 \sin( \phi_1 - \phi_2) + 0.1 \sin( \phi_1 - \phi_3 ) + 0.9 \sin( \phi_2 - \phi_3),
which means there are 12 parameters – three intrinsic frequencies (\omega_i = \{28, 19, 11\}) and a 3×3 K matrix representing the coupling between pairs of oscillators. To each oscillator small Gaussian noise was added (mean 0 and standard deviation 0.01). Signals were generated for t \in [0, 40] with sampling step dt = 0.001. Furthermore, the analysis was performed on 8 s segments with a 2 s step. Figure 1 shows the instantaneous frequencies (\dot\phi_i) of each oscillator. Figure 2 presents the calculated parameters for each segment (thus their time dependency). To be honest, I am quite surprised how well this technique works. I have performed more experiments, but to keep it short I am posting just one. There are some fluctuations in the obtained values, but those changes are very small and, most likely, negligible in a practical sense.
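For reference, the test signals above can be generated along these lines; this is only a sketch of the forward simulation (not the Bayesian inference), using simple Euler integration, and how exactly the noise enters is my assumption.

import numpy as np

def simulate(T=40.0, dt=0.001, noise_std=0.01):
    "Euler integration of the three coupled phase oscillators defined above."
    t = np.arange(0, T, dt)
    phi = np.zeros((3, t.size))
    for n in range(1, t.size):
        p = phi[:, n-1]
        d12, d13, d23 = np.sin(p[0]-p[1]), np.sin(p[0]-p[2]), np.sin(p[1]-p[2])
        dphi = np.array([28 - 0.3*d12 - 0.1*d13,
                         19 + 0.3*d12 - 0.9*d23,
                         11 + 0.1*d13 + 0.9*d23])
        phi[:, n] = p + dt*dphi + np.random.normal(0, noise_std, 3)
    return t, phi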

Fig. 1. Instantaneous frequencies of all oscillators in experiment.


Fig. 2. Extracted parameters for the presented dynamical system. The first column shows the intrinsic frequencies and the remaining columns the coupling values indicated in their titles. Horizontal black lines indicate the expected values.


[1] A. Duggento, T. Stankovski, P. V. E. McClintock, and A. Stefanovska, “Dynamical Bayesian inference of time-evolving interactions: From a pair of coupled oscillators to networks of oscillators,” Phys. Rev. E – Stat. Nonlinear, Soft Matter Phys., vol. 86, no. 6, pp. 1–16, 2012.
[2] T. Stankovski, A. Duggento, P. V. E. McClintock, and A. Stefanovska, “A tutorial on time-evolving dynamical Bayesian inference,” Eur. Phys. J. Spec. Top., vol. 223, no. 13, pp. 2685–2703, 2014.
[3] Nonlinear Biomedical Physics, Lancaster University http://py-biomedical.lancaster.ac.uk/.

Particle Swarm Optimisation in Python

[Updated version available. See newest post.]

tl;dr: For Python PSO code head to codes subpage.

One might expect that nowadays everything is on the internet and that it's only a matter of time before you find something of interest. At least that's what I expected. Unfortunately, sometimes finding the right thing and adjusting it to your own problem can take much more time than doing it yourself. This is a rule I often forget about. And it happened again. I was advised to try Particle Swarm Optimisation (PSO) for my problem. Since it has been some time since the introduction of that method, and since Python is quite a popular language, I expected that finding code to do just that wouldn't be a problem. True, it wasn't hard to find a few entries, but either the software used some obsolete packages, or it was difficult to install, or the instructions weren't clearly written (to me). Thus, after about 2 days of trying to figure everything out and apply it to my problem, I simply gave up. That is, I gave up searching and wrote my own code. In about 2 h.

What is Particle Swarm Optimisation (PSO)? Wikipedia gives quite a comprehensive explanation, with a long list of references on the subject. However, in just a few words: PSO is a metaheuristic, meaning that it does not require any a priori knowledge of the gradient or of the search space. It is an optimisation method with many units (particles) looking independently for the best solution. Each particle updates its position (set of parameters) depending on its velocity vector, the best position it has found during its search and the best position discovered by any particle in the swarm. The algorithm drives all particles towards the best known solution; however, it does not guarantee that the solution is globally optimal.

The algorithm depends on: the initial positions \vec{P}_0, the initial velocities \vec{V}_0, the acceleration ratio \omega and the influence variables of the best global \phi_g and the best local \phi_l parameters. One also has to declare how many particles will be used and the limit on the number of generations (or the acceptable error of the solution).

My program (shared in a Code section) can be executed in the following way:

pso = PSO( minProblem, bounds)
bestPos, bestFit = pso.optimize()

where minProblem is the function to be minimised (over an array of N parameters), and bounds is a 2D array (2 × N values) of minimum and maximum boundary values. The returned bestPos is the best set of parameters and bestFit is the smallest obtained value of the problem.
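To make the method concrete, below is a minimal, self-contained PSO sketch following the update rule described above. It is not the code from the Code section, and all names, defaults and the bounds handling are my own choices.

import numpy as np

def pso(f, bounds, n_particles=30, n_iter=200, w=0.7, phi_l=1.5, phi_g=1.5):
    "Minimises f over the box given by bounds = [mins, maxs]."
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = lo.size
    X = lo + (hi - lo)*np.random.rand(n_particles, dim)   # positions
    V = np.zeros_like(X)                                  # velocities
    P, Pf = X.copy(), np.array([f(x) for x in X])         # personal bests and their fitness
    g = P[Pf.argmin()].copy()                             # global best
    for _ in range(n_iter):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        # Velocity update: inertia plus pulls towards personal and global bests
        V = w*V + phi_l*r1*(P - X) + phi_g*r2*(g - X)
        X = np.clip(X + V, lo, hi)
        fx = np.array([f(x) for x in X])
        better = fx < Pf
        P[better], Pf[better] = X[better], fx[better]
        g = P[Pf.argmin()].copy()
    return g, Pf.min()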

A small example provided with the code fits a curve to noisy data. The input signal was S(t) = 4 t \cos( 4 \pi t^2) \exp(-3 t^2) + U(-0.025, 0.025), where U(a,b) is random noise drawn from the uniform distribution on [a, b]. The fitted function is F_{p}(t) = p_1 t \sin( p_2 \pi t^2 + p_3) \exp(- p_4 t^2), thus the expected values are P = {4, 4, π/2, 3}. See the figure with the fitted curve below. The obtained values are P = {4.61487992, 3.96829528, 1.60951687, 3.43047877}, which are not far off. I am aware that my example is a bit too simple for such a powerful search technique, but it is just a quick example.
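For illustration, the fitting setup could look roughly like this, reusing the pso() sketch from above; the bounds and the number of samples are my own guesses, not the values used in the original example.

import numpy as np

t = np.linspace(0, 1, 200)
S = 4*t*np.cos(4*np.pi*t**2)*np.exp(-3*t**2) + np.random.uniform(-0.025, 0.025, t.size)

def cost(p):
    # Sum of squared residuals between the fitted curve and the noisy signal
    F = p[0]*t*np.sin(p[1]*np.pi*t**2 + p[2])*np.exp(-p[3]*t**2)
    return np.sum((F - S)**2)

bounds = [[0, 0, 0, 0], [10, 10, np.pi, 10]]
bestPos, bestFit = pso(cost, bounds)
print(bestPos, bestFit)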

Curve fitting with PSO optimisation method. Blue and red lines are for input data and reconstruction function respectively.


Plots in matplotlib pylab

When trying to explain something to someone, it is usually much better to actually present it in a graphical form. For this I'm usually using Python and its module Matplotlib. It is a highly customisable package with lots and lots of different display techniques, both for 2D and 3D images. A great feature is that one can also create animations to include in presentations.

One of the modules in Matplotlib is called pylab. It is meant to provide the same functionality as Matlab's plotting functions. For my everyday activity that's typically more than enough. I'm not going to discuss the package in detail; I just want to mention a few nice options when displaying and saving a plot to a file.

By default, pylab leaves lots of whitespace for axes, titles and borders. On one hand this is good, because it makes everything look really nice, but on the other, we lose some space for the actual graphs. If you are going to discuss the plot while presenting, you can sometimes remove the "unnecessary" content.

Suppose one imports pylab with

import pylab as py  # including pylab lib

To remove ticks from axis:

frame = py.gca()  # returns (gets) current axis
frame.axes.get_xaxis().set_ticks([]) # removes all x axis ticks
frame.axes.get_yaxis().set_ticks([]) # removes all y axis ticks

To change the distance between subplots and margins:

py.subplots_adjust(left=leftMargin, bottom=bottomMargin,
                    right=rightMargin, top=topMargin,
                    wspace=widthSpace, hspace=heightSpace)

The parameters left, right, bottom and top refer to the locations of the grid corners, while wspace and hspace are, respectively, the horizontal and vertical spaces between subplots. Pylab can actually do this by itself, but the result is not always the best possible. The command to do it automatically is

py.tight_layout()

On top of that, one can also remove unnecessary white space in the output when saving to a file:

py.savefig(filename, bbox_inches='tight')

where 'tight' indicates that the surrounding margins should be cropped.

Here is a short snippet where those functions are applied:

import pylab as py
import numpy as np

N = 500
t = np.linspace(0, 1, N)
s1 = np.random.random(N)
s2 = np.random.random(N)
s3 = np.random.random(N)

py.subplot(311)
py.plot(t, s1)
py.ylabel('R1')
py.gca().axes.get_xaxis().set_ticks([])

py.subplot(312)
py.plot(t, s2)
py.ylabel('R2')
py.gca().axes.get_xaxis().set_ticks([])

py.subplot(313)
py.plot(t, s3)
py.ylabel('R3')
py.xlabel('Time')

py.savefig('output', dpi=150, bbox_inches='tight')
