Speeding up EEMD / CEEMDAN

Dawid Laszuk published on

tl;dr: The PyEMD documentation has a section on tweaks that speed up EEMD/CEEMDAN.

As the author of the PyEMD package, probably the most common question I receive is "Why does it take so long to execute EEMD/CEEMDAN?". That's a reasonable question because EEMD and CEEMDAN can be quite slow. Unfortunately, that is more about the nature of these methods than about the implementation. (Not saying that the implementation cannot be improved.)

The question is often followed by a description of the signal: it has 20k+ samples and some weekly seasonality, collected over a couple of years at sub-hour frequency. From the perspective of EMD et al., this means that there are many extrema. That in turn means that one needs plenty of disk/memory space to accommodate interim results (especially the splines), and that there's a "higher chance" of obtaining that odd extremum which is propagated through all siftings. Unfortunately, it is expected for the full EEMD/CEEMDAN evaluation to take minutes even if a single EMD takes a couple of seconds.
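To get a feel for the numbers, here is a minimal sketch that counts local extrema in a synthetic signal. The signal itself is an assumption made up for illustration (two years of half-hourly samples with a daily and a weekly cycle), and `count_extrema` is a hypothetical helper, not part of PyEMD.

```python
import math


def count_extrema(signal):
    """Count strict local maxima and minima in a sequence of floats."""
    count = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


# Hypothetical signal: two years of half-hourly samples
# with a daily cycle and a weaker weekly cycle.
samples_per_day = 48
n_samples = 2 * 365 * samples_per_day
signal = [
    math.sin(2 * math.pi * i / samples_per_day)
    + 0.5 * math.sin(2 * math.pi * i / (7 * samples_per_day))
    for i in range(n_samples)
]

n_extrema = count_extrema(signal)
print(f"{n_samples} samples, {n_extrema} extrema")
```

Even this perfectly smooth signal has roughly two extrema per day, and every one of them becomes a knot in the envelope splines on each sifting iteration. Real measurements are noisy, so they will typically have far more.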

Even though EEMD can be parallelized across the trials in the ensemble, every added noise realization slightly changes the signal. EMD is not robust: some perturbations will have no effect, while others might return a couple more IMFs than expected. CEEMDAN is even worse in performance because its components depend on each other, so it is serial in nature, with only some parts that are parallelizable.
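The effect of the added noise is easy to see without running any decomposition. The sketch below is only an illustration under assumed values: the sine signal, the noise standard deviation of 0.05 and the `count_extrema` helper are all made up here, and none of it is PyEMD code. It perturbs one clean signal several times, as an ensemble would, and counts the extrema that a decomposition would have to deal with in each trial.

```python
import math
import random


def count_extrema(signal):
    """Count strict local maxima and minima in a sequence of floats."""
    return sum(
        1
        for prev, cur, nxt in zip(signal, signal[1:], signal[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


random.seed(0)  # fixed seed so repeated runs give the same noise

# Clean signal: 10 periods of a sine over 2000 samples.
clean = [math.sin(2 * math.pi * i / 200) for i in range(2000)]
clean_count = count_extrema(clean)

noise_std = 0.05  # assumed noise level, for illustration only
trial_counts = []
for _ in range(5):
    noisy = [x + random.gauss(0.0, noise_std) for x in clean]
    trial_counts.append(count_extrema(noisy))

print("clean:", clean_count, "noisy trials:", trial_counts)
```

Each trial sees a different, and much larger, set of extrema than the clean signal, which is why the cost of individual trials varies and why the total time is far more than a single EMD run.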

I have added an F.A.Q. section to PyEMD's Readme file and updated PyEMD's documentation with a chapter on factors that affect performance. These include the data type used, the number of iterations, and the choice of envelope spline. Let me know if something is unclear or if there are other factors worth adding. It's been a while since I have played with EMD, so maybe there are some significant improvements that I should be aware of.