Complete Ensemble EMD with Adaptive Noise (CEEMDAN) in Python

CEEMDAN is available in Python through PyEMD.

There are about as many empirical mode decomposition (EMD) variations as there are teams working on it. Everyone agrees that EMD is, in general, a very helpful method, yet there's room for improvement. A few years back I stopped making modifications myself in exchange for working on a mathematically sound model of coupled oscillators. Nevertheless, since I spent quite a lot of time on EMDs and enjoyed playing with them, from time to time something catches my eye. This is what happened with the Complete Ensemble EMD with Adaptive Noise (CEEMDAN).

As the name suggests, this is an expansion of the ensemble EMD (EEMD), which was already covered. What's the difference? In CEEMDAN we also decompose our perturbation to the system, i.e. the added noise. The method creates an ensemble of many perturbations, decomposes them using EMD, and the resulting IMFs are used when evaluating the components of the input. I will refer to these components as cIMFs. The algorithm was first proposed by Torres et al. [1], and shortly after an improvement in efficiency was proposed [2]. These updates mainly come from noticing that one doesn't need to compute all IMFs and that the weight parameter can be progressively scaled as well. What exactly is this algorithm?

Let's define the operator IMF_{i}(S), which returns the i-th IMF (pure EMD) of its input S, and M(S), which returns the local mean, i.e. M(S) = S - IMF_{1}(S). The algorithm is then as follows:

  1. Create a Gaussian noise ensemble W = {w^{i}}, where i \in [1..N], and decompose each realisation using EMD.
  2. For the input signal S, calculate the grand average of the local means of the signal perturbed by the scaled first IMF of the noise:
    R_{1} = \frac{1}{N}\sum_{i=1}^{N} M(S + \beta_{0} IMF_{1}(w^{i})).
  3. Assign the first cIMF to be C_{1} = S - R_{1}.
  4. Compute R_{k} = \frac{1}{N} \sum_{i=1}^{N} M(R_{k-1} + \beta_{k-1} IMF_{k}(w^{i})).
  5. Calculate the k-th cIMF as C_{k} = R_{k-1} - R_{k}.
  6. Iterate steps 4 and 5 until the residue R_{k} fulfils EMD's stopping criteria, i.e. it can no longer be decomposed.

As can be seen, a family of parameters β has been introduced into the algorithm. These scalars control the amount of decomposed noise used to compute the cIMFs, and this is what the authors refer to as noise adaptive. The parameters are arbitrary, but the improved version [2] suggests setting them as \beta_{k} = \epsilon_{0} \sigma(R_{k}), where σ is the standard deviation of its argument and ε_{0} is another arbitrary parameter. Looking at step 4, one can see that for the k-th residue we use the k-th IMF computed from the noise. This can be a problem, because EMD decomposes a signal into a finite set of components and a given noise realisation may not have a k-th IMF. In this case the authors suggest assuming that component to be equal to 0.
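To make the recursion concrete, here is a minimal Python sketch written against PyEMD's plain EMD class. The ensemble size, the fixed cap on the number of cIMFs and the noise scaling \beta_{k} = \epsilon_{0}\sigma(R_{k}) (as suggested in [2]) are illustrative simplifications, not the reference implementation:

import numpy as np
from PyEMD import EMD  # plain EMD is used for all inner decompositions

def ceemdan_sketch(S, trials=50, epsilon=0.005, max_cimfs=8, seed=0):
    # Rough sketch of the recursion above; for real work use PyEMD's CEEMDAN class.
    rng = np.random.RandomState(seed)
    emd = EMD()
    n = len(S)

    # Step 1: decompose an ensemble of white-noise realisations once, up front.
    noise_imfs = [emd.emd(rng.normal(size=n)) for _ in range(trials)]

    def nth_imf(imfs, k):
        # If a realisation has fewer than k+1 IMFs, assume a zero component.
        return imfs[k] if k < len(imfs) else np.zeros(n)

    def local_mean(x):
        # M(x) = x - IMF_1(x)
        return x - emd.emd(x)[0]

    # Steps 2-3: first residue and first cIMF.
    cimfs = []
    beta = epsilon * np.std(S)
    R = np.mean([local_mean(S + beta * nth_imf(w, 0)) for w in noise_imfs], axis=0)
    cimfs.append(S - R)

    # Steps 4-6: subsequent cIMFs (here simply capped at max_cimfs iterations).
    for k in range(1, max_cimfs):
        beta = epsilon * np.std(R)
        R_next = np.mean([local_mean(R + beta * nth_imf(w, k)) for w in noise_imfs], axis=0)
        cimfs.append(R - R_next)
        R = R_next
    cimfs.append(R)  # the final residue completes the reconstruction
    return np.array(cimfs)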

The advantage of this variation is that the resulting decomposition {C_{k}} fully reconstructs the input. This is in contrast to EEMD, which doesn't guarantee such completeness. However, with CEEMDAN questions arise regarding the meaning of adding scaled IMFs of noise. Augmenting a signal with an ensemble of pure noise creates perturbations of the input without any distinguished direction. As observed by Flandrin et al. [3], when decomposing white noise EMD acts as a dyadic filter bank. This means that the extracted IMFs have a preferred structure, and adding them to the input is similar to adding a vector with random length but a particular direction.

Regardless of all that, CEEMDAN is definitely an interesting method. Judging purely by the number of citations, it seems I'm not the only one thinking so. I've included it in my Python PyEMD package, so feel free to play with it and leave some feedback.
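For completeness, calling it through PyEMD looks roughly like this (the test signal below is made up purely for illustration):

import numpy as np
from PyEMD import CEEMDAN

t = np.linspace(0, 1, 1000)
s = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 22 * t) + 0.2 * t

ceemdan = CEEMDAN()
cIMFs = ceemdan(s)   # each row is a consecutive cIMF
print(cIMFs.shape)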

References

[1] Torres ME, Colominas MA, Schlotthauer G, Flandrin P. A complete ensemble empirical mode decomposition with adaptive noise. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 4144-4147.
[2] Colominas MA, Schlotthauer G, Torres ME. Improved complete ensemble EMD: A suitable tool for biomedical signal processing. Biomedical Signal Processing and Control.
[3] Flandrin P, Rilling G, Goncalves P. Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters.


Google wants back my microphone

My “writing” work currently goes somewhere else and I have little motivation to write anything here. But there's something that only the internet can help with, whether through actual help or simply by absorbing my annoyance.

In the past few days/weeks there has been some uproar about Facebook listening to us and later subtly suggesting products we talked about with others. With these stories it's hard to tell who is objective, so I'll just point to web searches and I'm sure you'll find some “evidence” – Google, Bing and DuckDuckGo. Let me also suggest the Reply All podcast, which recently had an episode on this, mysteriously called Is Facebook Spying on You?. Obviously Facebook denies all of this, but they do confirm having lots of information about you, whether that comes from you directly or from your friends.

Facebook and I have not been on good terms for a long time. To me it's more a fun social experiment than an actual social platform. Since it isn't on my phone there's nothing to complain about, but there's another omnipresent god – Google. I actually have one of its branded phones with Google Assistant turned on, so it had to be there and it had to listen to me.

Long story short, I removed microphone permissions from all Google services. Obviously some of them weren't happy with this, but I can't see how it should affect their usage. Except for Google Assistant or the occasional voice input feature, nothing should care, right? No. This is a really tough break-up, as from time to time I get vocal suggestions that are close to being commands. Google tells me: when it's safe you'll first need to use your phone's screen and tap the notification, then you can let the Google App access some things on your device. This is especially annoying when I'm listening to podcasts or music.

In the beginning this would go on and on, but now it's more like once a day. I don't think it has some “time decreasing” variable built in, so it must be something I do. More surprising is that even if I quickly unlock the phone, there is nothing new to grant permissions to. Also, it might only happen when the phone is locked, as I haven't seen it otherwise.

Free AWS is good. Not awesome, but good.

Amazon, with its Amazon Web Services (AWS), is pretty cool. It gives you access to a remote machine which you don't have to maintain. Actually, you don't have to do anything other than use it. The machines come in different flavours, but what tastes better than free? Granted, the free one is extremely limited, but surely we can squeeze something out of it. Right?

AWS instances, i.e. remote machines, differ in the amount of RAM, disk space, operating system, whether they have GPU access, and so on. As you can expect, the free tier instance is pretty low on all of these. To be more precise, the free tier instance is of the t2.micro type, which is a general purpose burstable instance with a single vCPU, 1 GiB of memory and EBS data storage (4 GB by default).

What is this good for? Depending on your needs, it might be good for almost anything that doesn't require whatever these instances are lacking. (Did I help?) Obviously it's not so good for heavy computations, training machine learning models or storing data. For those it's better to use other services like S3, DynamoDB, Lex or the general machine learning offerings. And in case of specific requirements, it's always better to just rent a more powerful instance.

These cheap instances, in my opinion, are very good for a few tasks. The main one is web scraping. This is a tedious task that requires little CPU power but constant access to the internet. Moreover, we don't really want to make many calls in a short period of time, so there needs to be a delay between downloads. That's either because we'd like to avoid being detected as a bot, or simply out of politeness to the owner of the server (not clogging their bandwidth).
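As an illustration (this is not the actual crawler.py/scraper.py from the project; the user-agent string and contact are placeholders), a polite downloader can be as simple as:

import time
import requests

def polite_download(urls, delay=5.0):
    # Fetch each URL in turn, pausing between requests so we don't clog the server.
    session = requests.Session()
    session.headers["User-Agent"] = "r-u-listening-crawler (example contact: me@example.com)"
    for url in urls:
        response = session.get(url, timeout=30)
        response.raise_for_status()
        yield url, response.content
        time.sleep(delay)  # politeness delay between downloads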

The internet is full of examples of scrapers for different types of data. I'm adding my own to the collection with the r-u-listening project. The core of the project is to allow users to find music similar to their input. It is a bit more than a recommender, but more on this project probably in the future. The scraper itself comes in two parts, i.e. crawler.py and scraper.py. The database I'm using is FreeMusicArchive.org, which goes with the slogan “It's not just free music; it's good music”. I do recommend it, and once I have something valuable I'd like to share it with them.

Unfortunately these instances don't come with much memory or storage. By default they have only 4 GB of storage, which, when downloading mp3 tracks, is enough for about 800 tracks (assuming about 5 MB per track). Again, as always, it depends on the task, but for machine learning algorithms we go with “the more, the merrier”.

As mentioned before, free tier instances allow up to 32 GB of EBS storage. To increase it, go to the EC2 service in your AWS console. In the menu on the left find Elastic Block Store (EBS) and select Volumes. Then select your volume, Actions, and Modify Volume. Simple, right? In all honesty, like many things in AWS.
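The same can be done programmatically, e.g. with boto3 (the region and volume id below are placeholders); note that after modifying the volume you still need to grow the filesystem on the instance itself (e.g. with resize2fs):

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # example region
# Grow the EBS volume backing the instance to 32 GB (placeholder volume id).
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=32)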

I've been using AWS for a while. I even finished the general AWS course, its Essentials training and a three-day onsite workshop on Architecting on AWS. It's all pretty simple and consistent. I like it.

Python Empirical Mode Decomposition on Image

One of the packages I intend to maintain and support long term is a Python implementation of Empirical Mode Decomposition (EMD) called PyEMD. I will skip the introduction of the method, as it has been explained in a few other posts [1, 2, 3, …]. This blog entry is more of an announcement of a new feature, which also means a new version.

PyEMD version 0.2 is out. This means that PyEMD now supports 2D data (image) decomposition. Other visible improvements include documentation and more thorough testing, both of the code and of data cases. Installation instructions are provided on the project's webpage.

I am more than happy to include other improvements or suggestions. The next big step will be support for 3D and multi-dimensional data. Please get in touch if you feel that something is missing.

Image decomposition is based on a simple extremum definition: a point that is above (max) or below (min) its surroundings. Under the hood this is done using SciPy's ndimage maximum_filter. The extrema are then connected into envelopes using SmoothBivariateSpline. The stopping criterion can be chosen to be based either on the number of sifting operations or on threshold values for the mean and standard deviation.
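As an illustration of that extremum definition (this is not PyEMD's exact code), local maxima and minima of an image can be found along these lines:

import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def find_2d_extrema(image, size=3):
    # A pixel is a local max (min) when it equals the max (min) of its neighbourhood.
    local_max = maximum_filter(image, size=size) == image
    local_min = minimum_filter(image, size=size) == image
    return local_max, local_min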

Below is an exemplary decomposition, with the top image being the input and the following two being the outputs. The exact formula with which the image was generated is
\sin(4\pi x) \cdot \left( \cos(8\pi y + 8\pi x) + 3\right) + \left(5x + 2y - 0.4 \right) y + 2. The Python code generating this example is provided in the documentation in Examples/EMD2D.
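If you prefer to reproduce it yourself, the input image and its decomposition can be generated roughly as follows (the EMD2D class name is taken from PyEMD 0.2; the grid size is arbitrary):

import numpy as np
from PyEMD import EMD2D

# Recreate the test image from the formula above on a unit-square grid.
x, y = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
img = np.sin(4 * np.pi * x) * (np.cos(8 * np.pi * y + 8 * np.pi * x) + 3) \
      + (5 * x + 2 * y - 0.4) * y + 2

emd2d = EMD2D()
IMFs = emd2d(img)   # the first axis indexes the extracted 2D components
print(IMFs.shape)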

Installing Cartopy on Ubuntu 14.04 (or Travis-CI)

I'm becoming increasingly convinced that using GitHub to share projects, even in a work-in-progress (WIP) state, is quite beneficial. For one, someone can quickly point out that such a project already exists or suggest a tool which would simplify some operations. What's more, with tools like Travis-CI or CodeCov it's easy to follow whether the project actually builds or whether updates broke the functionality of other features. A real convenience! Although there is some price to pay for these goods.

One of the projects I'm currently working on is MapViz. Its aim is to provide easy-to-use visualization and summaries for country-level and other administrative unit data, mainly the EuroStat dataset. This project depends heavily on the Cartopy library – a great tool for handling geographical data, however, not so great when it comes to installation. It comes with many dependencies, some of which have to be installed separately. For example, the newest version of Cartopy requires the Proj.4 library in version 4.9, which is not available for Ubuntu older than 16.04. This is unfortunate, because Travis only allows Ubuntu in versions 12.04 (precise) or 14.04 (trusty), for which only Proj.4 version 4.8 is available, and that's not enough. Once the dependencies are obtained, installing Cartopy with `pip` is easy, it's just:

$ pip install cartopy

but the trick is to install all dependencies.

The workaround is dirty, but it works. One can directly download and install Proj.4 and its dependencies (libproj9) from the 16.04 (xenial) repo. For the exact links, go to the libproj9 and libproj-dev package pages, select your architecture and then click on any mirror link. (Just a note that by default Travis-CI uses amd64.) So, download the packages with wget and then install them with dpkg. Here's an example:

$ wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj9_4.9.2-2_amd64.deb 
$ sudo dpkg -i libproj9_4.9.2-2_amd64.deb 

For comparison, here is my whole .travis.yml file:

dist: trusty
language: python
python:
  - 2.7
  - 3.4
  - 3.5
before_install:
  - sudo apt-get -qq update
  - sudo apt-get install -y libproj-dev proj-bin proj-data
  - sudo apt-get install -y python-pyproj
  - sudo apt-get install -y libc6
  - wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj9_4.9.2-2_amd64.deb 
  - sudo dpkg -i libproj9_4.9.2-2_amd64.deb 
  - wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj-dev_4.9.2-2_amd64.deb
  - sudo dpkg -i libproj-dev_4.9.2-2_amd64.deb
install:
  - pip install -r requirements.txt
script:
  - python setup.py install

Kuramoto in Stan (PyStan)

tl;dr: Project on github: https://github.com/laszukdawid/pystan-kuramoto


Stan is a programming language focused on probabilistic computations. Although it's a rather recent language, it has been nicely received in the data science/Bayesian community for its focus on designing the model rather than programming and getting stuck in computational details. Depending on your background, you might have heard about it either from Facebook and its Prophet project, or as a native implementation of Hamiltonian Monte Carlo (HMC) and its optimised variation – the No-U-Turn Sampler (NUTS).

To make it easy to include models in other programs, there are some interfaces/wrappers available, including RStan and PyStan.

Stan is not the easiest language to get through. Currently there are about 600 pages of documentation, plus separate “documentations” for the wrappers, which in PyStan's case aren't very helpful. Obviously there's no need to read all of it, but it took me a while to actually understand what goes where and why. The reward, however, is very satisfying.

Since I've written a bit about Kuramoto models on this blog, it's only consistent to share an implementation in Stan as well. The pystan-kuramoto project uses PyStan, but the actual Stan code is platform independent.

Currently there are two implementations (I couldn't come up with better names):

  • All-to-all, where the Kuramoto model is fitted to a phase vector \vec{\Phi} with distinct oscillators, i.e. \vec{\Phi}_{N}(t) = \{\phi_1(t), \phi_2(t), \dots, \phi_N(t)\}.
  • All-to-one, where the model fits a superposition (sum) of oscillators to a phase time series \Phi_{N}(t) = \sum_{n=1}^{N} \phi_n(t).
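For reference, both variants assume the usual Kuramoto phase dynamics, in which each oscillator has an intrinsic frequency and is pulled by the others through a sine coupling,
\dot{\phi}_{i}(t) = \omega_{i} + \sum_{j=1}^{N} k_{ij} \sin(\phi_{j}(t) - \phi_{i}(t)),
where the intrinsic frequencies \omega_{i} and coupling factors k_{ij} are the quantities being estimated (the notation here is generic rather than the exact variable names used in the Stan code).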

In all honesty, this seems rather efficient. Optimisation is performed using penalised maximum likelihood estimation (with L-BFGS). Before using it I wasn't very knowledgeable about the algorithm, but now I'm simply amazed by its speed and accuracy. Thumbs up.
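For orientation, calling such a model from PyStan looks roughly like this (the .stan file name and the data field names below are assumptions for illustration, not necessarily those used in pystan-kuramoto):

import numpy as np
import pystan

# Dummy phase time series: 3 oscillators, 500 samples (replace with real data).
phases = np.cumsum(np.random.normal(0.1, 0.01, size=(3, 500)), axis=1)

# Compile the Stan model once; the file name here is a placeholder.
model = pystan.StanModel(file="kuramoto_all_to_all.stan")

data = {
    "N": phases.shape[0],   # number of oscillators
    "T": phases.shape[1],   # number of time samples
    "dt": 0.01,             # sampling step
    "y": phases,            # observed phases
}

# Penalised maximum likelihood via L-BFGS (PyStan's optimizing interface).
fit = model.optimizing(data=data)
print(fit)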

Spellchecker in VIM

Only recently I’ve started using VIM as my default editor and… I’m amazed.

I'm not advocating here for its superiority over any other text editor. You might be into design or simply wish for a graphical interface. I'm into not using the mouse too much, so it actually fits my purposes quite well. Having said that, it took me a while to discover more of its commands than saving and exiting.

One of the features that, from an outsider's point of view (i.e. myself a month ago), it couldn't possibly have is a spellchecker. Not that I actively thought vim doesn't have a spellchecker, but rather that I passively ignored the idea of suggesting corrections in a terminal. I was wrong. Vim has a spellchecker and it's pretty good.

The magical command is:
:setlocal spell spelllang=XX
where for English XX is one of {en, en_au, en_ca, en_gb, en_nz, en_us}. If you have a problem setting up British, like I had, try the command:
:setlocal spell! spelllang=en_gb
i.e. with extra exclamation mark.

Actually, I suggest mapping the command to some keyboard key, so that pressing it toggles the highlighting on and off. To do this, edit your .vimrc file (Ubuntu: ~/.vimrc) and add the following line:
:map <F7> :setlocal spell! spelllang=en_gb<CR>
which means that whenever you press the F7 key it will toggle the command.

Once you have this set up and someone introduces mistakes into your text/code, you can simply navigate between them using ]s and [s for next and previous, respectively. Obviously you can also just navigate using the arrow keys or hjkl, but who doesn't want to feel like a hacker?

Assuming someone has played with your text and made it less perfect, you have a few options:

    z=   acceptance of the mistake and looking up some suggestions, or
    zg   denial, blissful ignorance: teach your dictionary this new fabulous word, or
    zw   the opposite of zg: explicitly mark the word as wrong.

If you change your mind about your last command, you can undo it using zuw or zug. I suggest a quick read about these commands in the Quick Start of the spell documentation, or :help spell.

Now! The default settings for spellchecking performance work against large files/projects, like a PhD thesis. If you notice that after a certain line the spellchecker doesn't inform you about obvious mistakes, then you should use this StackOverflow advice and increase your syn sync parameters. In short (though I suggest reading the SO question, answers and comments), you can increase minlines and maxlines in (Ubuntu) ~/.vim/after/syntax/tex.vim with:

syn sync maxlines=20000
syn sync minlines=500