# Kuramoto in Stan (PyStan)

tl;dr: Project on github: https://github.com/laszukdawid/pystan-kuramoto

Stan is a programming language focused on probabilistic computations. Although it’s a rather recent language it’s been nicely received in data science/Bayesian community for its focus on designing model, rather than programming and getting stuck with computational details. Depending what is your background you might have heard about it either from Facebook and its Prophet project or as a native implementation for Hamiltonian Monte Carlo (HMC) and its optimised variation – No-U-Turn Sampler for HMC (NUTS).

For ease of including models in other programmes there are some interfaces/wrappers available, including RStan and PyStan.

Stan is not the easiest language to go through. Currently there are about 600 pages of documentation and then separate “documentations” for wrappers, which for PyStan isn’t very helpful. Obviously there’s no need for reading all of it, but it took me a while to actually understand what goes where an why. The reward, however, is very satisfying.

Since I’ve written a bit about Kuramoto models on this blog, it’s consistent if I share its implementation in Stan as well. Pystan-kuramoto project uses PyStan, but the actual Stan code is platform independent.

Currently there are two implementations (couldn’t come up with better names):

• All-to-all, where Kuramoto model fit is performed to phase vector $\vec{\Phi}$ with distinct oscillators, i.e. $\vec{\Phi}_{N}(t) = \{\phi_1(t), \phi_2(t), \dots, \phi_N(t)\}$.
• All-to-one, where the model fits superposition (sum) of oscillators to phase time series $\Phi_{N}(t) = \sum_{n=1}^{N} \phi_n(t)$.

In all honesty, this seems to be rather efficient. Optimisation is performed using penalized maximum likelihood estimation with optimization (L-BFGS). Before using it I wasn’t very knowledgeable in the algorithm, but now I’m simply amazed with its speed and accuracy. Thumbs up.

# Objective Empirical Mode Decomposition metric

This is a summary. If you wish to read full paper, please visit About me section and click on appropriate link, or follow one of these:  . Python code for metrics calculation is in Code section.

Part of my research has been concentrated on empirical mode decomposition (EMD). It’s quite nice decomposition method and can generate interesting results. Nevertheless, due to its empirically, it’s quite hard to determine what those results represent. Moreover, they will change if we change some parameters of the decomposition; parameters such as stopping criteria threshold, interpolation spline technique or even definition of extremum. Having two different decompositions of the same signal, one needs to decide which one is better. What researchers have done most often is to visually compare obtained decompositions and, based on their expertise, decide which one is better. Naturally this is not an ideal solution. Despite all our good (objective) intentions, observer’s bias is unavoidable. We have tried to mitigate this problem by referring to the original concept of EMD and intrinsic mode functions (IMFs). We have came up with metrics, which are based on IMF’s idealised properties: 1) an average frequency decreases with increase of IMF’s index, 2) distinct instantaneous frequency for each IMF and 3) disjoint Fourier spectra support for IMF’s amplitude and phase.

Metric M1

This metric is based on the empirical evidence for the decrease of average instantaneous frequency, simply referred to as the average frequency, with the increase of IMF’s index number. Although the order with which the IMF components are construced in general corresponds to the decreasing IMF average frequencies, there are instances when the instantaneous frequencies cross over between the components. Since it has been claimed that each IMF has a significant instantaneous frequency, such behaviour is unwelcome and hence it is penalised by this metric. Penalties are introduced when instantaneous frequency of an IMF with lower number (high average frequency) is smaller than instantaneous frequency of any IMF with higher number. The penalty value is proportional to the length of the crossing over effect, i.e. $(1) \qquad \qquad \displaystyle m^{\text{I}}_{j} = \sum_{k=j+1}^{N} \int_{\dot{\phi}_{k}>\dot{\phi}_{j}} \frac{dt}{T},$

where k, j are IMFs’ indices. Formula (1) compares functions of instantaneous frequencies of two IMFs and returns the total duration, over which the IMF with higher index has lower frequency. The crossing over effect has been presented in Figure 1. It shows instantaneous frequency of each IMF as a function of time. Coloured regions indicate where the crossing over occurred. Summing over all pairs of IMFs allows us to assess results for particular EMD. Metric value for the whole set is given as $(2) \qquad \qquad \displaystyle M_{\text{I}} = \sum_{j=1}^{N} m^{\text{I}}_{j}, \qquad M_{\text{I}} \in \left[0,\frac{N(N-1)}{2}\right],$

According to this measure, the best IMF set is the one, for which $M_{\text{I}}=0$, i.e. there are no crossing-over parts in instantaneous frequency domain. The worst case, $M_{\text{I}} = N(N-1)/2$, is when the order of all IMFs is reversed, i.e. when the first IMF is under all others and the last IMF is above all others. However, this theoretical upper limit is very unlikely and the corresponding IMF set could be still considered upon index reversal. Fig. 1. Plot of instantaneous frequency for each IMF as a function of time. Coloured regions indicates where the frequency crossing over occurs. Metric M_{\text{I}} penalises based on the length of those regions.

Metric M2

Another validating measure is based on the Bedrosian theorem. It refers to the necessary conditions for the signal’s amplitude, $a(t)$, and phase, $\phi(t)$, to be exactly recoverable using Hilbert transformation. For signal $s(t) = a(t) \cos\left( \phi(t) \right)$ these conditions require that the support of amplitude and phase’s Fourier spectra are not overlapping. In other words, if the amplitude function, $f(t) = a(t)$, and the phase function, $g(t) = \cos\left(\phi(t)\right)$, then $(3) \qquad \qquad \displaystyle \left\langle \mathcal{F}(f), \mathcal{F}(g) \right\rangle = 0 ,$

where $\mathcal{F}$ represents the Fourier transform and $\langle h(t), l(t) \rangle = \int h^*(t) l(t) dt$ is the dot product. Here it is assumed, that all functions belong to $L^2$ normed space.

Let $F_{j}^{a} = \left| \mathcal{F} \left( a_{j}(t) \right) \right|$ and $F_{j}^{\phi} = \left| \mathcal{F} \left( \cos\left(\phi_{j}(t)\right) \right) \right|$ be absolute values of Fourier transforms of $a_{j}$ and $\cos(\phi_{j})$, respectively, for $j$ IMF. Their normalised measure of overlapping spectra is given as $(4) \qquad \qquad \displaystyle m_{j}^{\text{II}} = \frac{\left\langle F_{j}^{a}, F_{j}^{\phi} \right\rangle} {\sqrt{\| F_{j}^{a} \| \| F_{j}^{\phi} \|}} ,$

where $\| h \| = \langle h, h \rangle$ is a norm of a function h. Assumptions of Bedrosian’s theorem are completely fulfilled when spectra are not overlapping, thus minimum value of $m^{\text{II}}_j$ is zero. This allows for different definitions of a metric for the whole IMF set, depending on application of EMD. First proposition is based on a biggest value of overlap $m_{j}$ in considered decomposition, i.e. $(5) \qquad \qquad M_{\text{II}} = \max_{j} \{ m_{j}^{\text{II}}\}, \qquad M_{\text{II}} \in [0,1],$

and the second refers to the cumulative overlap within the decomposed set, i.e. $(6) \qquad \qquad M_{\text{III}} = \sum_{j=1}^{N} m_{j}^{\text{II}}, \qquad M_{\text{III}} \in [0, N],$

where in both cases N is the number of extracted IMFs. Zero for both metrics implies no overlap between amplitude’s and phase’s spectra in any of IMFs.

Visual interpretation of the validation measure (4) is presented in Figure 2. It shows example Fourier spectra of slowly changing amplitude (dashed line) and higher frequency phase (solid line). Gray-striped region indicates overlapping area of both spectra. Proposed value is a measure of ratio of the overlapping area to the total area under both functions.

Since metric $M_{\text{III}}$ is a sum over all IMFs, it also contains the one which maximises value $m^{\text{II}}_j$ (Eq. (4)). This means that $M_{\text{III}}$ for each decomposition has to be at least equal or higher than $M_{\text{II}}$. Fig. 2. Example of comparing Fourier spectrum of the amplitude with the
spectrum of phase. Gray-striped area indicates where two functions overlap.

Application of the validation measures

Each of the presented metrics highlights different properties of the decomposition. Computing all three values is equivalent to finding a point $M=(M_{\text{I}}, M_{\text{II}},M_{\text{III}})$ in a 3-dimensional space, where each dimension relates to the specific metric. The best decomposition corresponds to the minimum over all the metrics, i.e. $M=(0,0,0)$, and the worst decomposition to $M=(\frac{N(N-1)}{2},1,N)$. For any other point one has to decide on the importance, or weight, for each of the proposed metrics, on the basis of the problem being considered. Although the distance in the M-space is not strictly defined, it can be any $L^p$ norm, thus we suggest using the weighted Manhattah metric, i.e. $(7) \qquad \qquad \| M \| = w_1 M_{\text{I}} + w_2 M_{\text{II}} + w_3 M_{\text{III}} ,$
where $w_i$ are respective weights. Their values should reflect the relative importance of features one is concentrated on.

# EMD on audio – wav and recurrance plots

Update:
Code and direct access to examples can be found on my GitHub reccurrence-plot.

There was an idea that has been bothering me for past few months, but due to time restrictions and many commitments I just couldn’t do it. Then came the conference quite far away and few transit hour at an airport. And now, few words about it.

Question: What do you get when you perform EMD on an audio?

Sound is nothing else than a vibration of the air in area where we happen to be. Human ear can register vibration of frequency roughly between 20 Hz and 20 kHz. In order to fulfil Nyquist sampling (in essence, the fastest frequency one can detect is up-down-up-down…, i.e. half the sampling rate) we need to record with at least 40 kHz sampling rate. Taking into account some additional technical difficulties and just in case sounds are typically recorded at sampling rate around 44 kHz. But there is also stereo, and… well, my point is, there is a lot of data in 1 second of sounds. For most common methods of signal analysing, say short-time Fourier transform or filters, processing is relatively fast. However, in case of EMD it is not that good. The longer the signal the more extrema in it leading to extending time of spline computation (e.g. cubic needs to know all extrema) and increase number of iterations. Needless to say, decomposing 3 min song gives very small ratio of (rewards significance) / (time spent), which is the reason I’ll show you result for few seconds sounds.

First sound (14240 samples) to decompose is disconnect sound. Below is figure where in red is the original signal and following rows are its IMFs (only four first and residue). All plots were normalised so that the biggest deflection is 1.

The typical result, nothing new. However, interesting is how those IMFs listen. Their audio version is available here: first, second, third and fourth IMFs. How to interpret them? Allow me not to interpret them. Thought they are quite ‘noisy’, I think they sound rather… interesting. The nicer part of this are the recurrence plots.

Shortly, it is an image describe by a function $I(t, \tau) = \left\{ \begin{array}{rcl} 0 & \mbox{for} & |x(t)-x(\tau)| < \epsilon \\ 1 & \mbox{for} & |x(t)-x(\tau)| \geq \epsilon \\ \end{array} \right. ,$
meaning that, if two values within signal are closer than $\epsilon$ then we draw a dot. I have modified this function slightly to increase the dynamics and instead of binary output, it gives few more values. Modified function is: $I_N(t, \tau) = \left\{ \begin{array}{lcr} n & \mbox{for} & n \epsilon \leq |x(t)-x(\tau)| < (n+1) \epsilon \\ N & \mbox{for} & |x(t)-x(\tau)| \geq N \epsilon \\ \end{array} \right.,$
where $\mathbb{N} \ni n \in [0, N-1]$.
Here is a snippet on how to generate plots using Python:


import pylab as plt
import numpy as np

def rec_plot(s, eps=None, steps=None):
if eps==None: eps=0.01
if steps==None: steps=10
N = s.size
S = np.repeat(s[None,:], N, axis=0)
Z = np.floor(np.abs(S-S.T)/eps)
Z[Z>steps] = steps

return Z

s = np.random.random(1000)
plt.imshow(rec_plot(s))
plt.show()


Due to the size of an image and readability, here are only smaller plots (5000 points starting from 2000th point). Below are plots for second IMF, fifth IMF and residue from subtracting five fist IMFs. With slower frequencies one can notice quite mesmerising patterns. First three IMFs are abundant in high frequencies, which makes it fast varying image. The rest, however, gracefully changes it colours in nice patterns.

Same analysis was performed for chainsaw sound. Below are its EMD decomposition and few recurrence plots. Signal has 21381 samples, which makes it much longer to analyse. Again, for recurrence plots only time series of 3000 points were displayed. I must admit that I feel mesmerised by those patterns. Here are additionally wav files for first, second, third, and fourth IMFs.

# Bayesian inference in Kuramoto model

For a while now, I’ve been involved in Kuramoto models and adjusting them to data. Recently I’ve stumbled upon using Bayesian inference on predicting parameters in time-evolving dynamics. Team from Physics Department at University of Lancaster has produced many papers, but to provide with some reference it might be worth to look at .

Technique refers to general phase $\phi$ dynamics in oscillator, i.e. $\dot\phi_i = \omega_i + f_i (\phi_i) + g_i (\phi_i, \phi_j) + \xi_i$,
where $\omega_i, f_i(\phi_i)$ and $g_i(\phi_i, \phi_j)$ are intrinsic frequency, self coupling and coupling with other oscillators, respectively. Term $\xi_i$ refers to noise and although authors claim that it can be any type of noise calculations are performed for white Gaussian type. I don’t really want to go much into details, because on first sight it might look complicated. Just to point out few steps that are necessary to introduce my examples:

1. Rewrite model in form of $\dot\phi_i = \sum_{k=-K}^{K} C_{k}^{(i)} P_{i,k} (\Phi) + \xi_i(t)$, where $c$ is parameters vector, $P$ provides significant terms (for example via Fourier decomposition) for our model and $\Phi$ is a vector of all oscillators’ phases (at some moment).
2. Calculate diagonal values of Jacobian matrix, i.e. $\frac{\partial P_{i,k}}{\partial \phi_{i}}$.
3. Set a priori probabilities for parameters $C$ vector and its covariance $\Sigma$ (or concentration $\Xi = \Sigma^{-1}$).

Then the problem is reduced to finding maximum of minus log-likelihood function S. Authors also provide the exact formulas and algorithm, which find the extremum. Tutorial on applying their algorithm can be found in .

Based on MatLab code provided be the authors, and available on their webpage , I have written my own program in Python (code available in Code section or here ). As an experiment I have used simple Kuramoto model with sinusoidal coupling between phases $\dot\phi_1 = 28 - 0.3 \sin( \phi_1 - \phi_2) - 0.1 \sin( \phi_1 - \phi_3 ) + 0.0 \sin( \phi_2 - \phi_3)$, $\dot\phi_2 = 19 + 0.3 \sin( \phi_1 - \phi_2) + 0.0 \sin( \phi_1 - \phi_3 ) - 0.9 \sin( \phi_2 - \phi_3)$, $\dot\phi_3 = 11 + 0.0 \sin( \phi_1 - \phi_2) + 0.1 \sin( \phi_1 - \phi_3 ) + 0.9 \sin( \phi_2 - \phi_3)$,
which means there are 12 parameters – three intrinsic frequencies ( $\omega_i = \{28, 19, 11\}$) and 3×3 K matrix representing coupling between pairs of oscillators. To each oscillator small Gaussian noise was added (mean 0 and standard deviation 0.01). Signals were generated for $t \in [0, 40]$ with sampling step $dt = 0.001$. Furthermore, analysis was performed on 8 s segments with 2 s step. Figure 1 shows instantaneous frequencies ( $\dot\phi_i$) for each oscillator. Figure 2 presents calculated parameters for each segment (thus time dependency). To be honest, I am quite surprised how good this technique works. I have performed more experiments, but to keep it short I am posting just one. There are some fluctuations in obtained values, but those changes are very small and most likely, in practical sense, negligible. Fig. 2. Extracted parameters for presented dynamical system. First column shows values on intrinsic frequencies and the rest respective to title coupling value. Horizontal black lines indicate what are expected values.

 A. Duggento, T. Stankovski, P. V. E. McClintock, and A. Stefanovska, “Dynamical Bayesian inference of time-evolving interactions: From a pair of coupled oscillators to networks of oscillators,” Phys. Rev. E – Stat. Nonlinear, Soft Matter Phys., vol. 86, no. 6, pp. 1–16, 2012.
 T. Stankovski, A. Duggento, P. V. E. McClintock, and A. Stefanovska, “A tutorial on time-evolving dynamical Bayesian inference,” Eur. Phys. J. Spec. Top., vol. 223, no. 13, pp. 2685–2703, 2014.
 Nonlinear Biomedical Physics, Lancaster University http://py-biomedical.lancaster.ac.uk/.

# Particle Swarm Optimisation in Python

[Updated version available. See newest post.]

tl;dr: For Python PSO code head to codes subpage.

One might expect, that currently there is everything on the internet and it’s only a matter of time to find something of an interest. At least that’s what I expected. Unfortunately, sometimes finding the right thing and adjusting it to your own problem can take much more time that doing it yourself. This is a rule about which I often forget. And it happened again. I was suggested to try Particle Swarm Optimisation (PSO) for my problem. Since it has been some time since the introduction of that method, and since Python is a quite popular language, I expected that finding code to just do that wouldn’t be a problem. True, it wasn’t hard for few entries, but either software used some obsolete packages, or it was difficult to install it, or the instructions weren’t clearly (to me) written. Thus, after about 2 days of trying to figure out everything and apply it to my problem, I simply gave up. That is, gave up searching and wrote my own code. In about 2 h.

What is Particle Swarm Optimisation (PSO)? Wikipedia gives quite comprehensive explanation, with a long list of references on this subject. However, in just few words: PSO is a metaheuristic, meaning that it does not require any a priori knowledge on gradient or on space. It is an optimisation method with many units (particles) looking independently for the best solution. Each particle updates its position (set of parameters) depending on its velocity vector, the best position it found during search and the best position discovered by any particle in swarm. Algorithm makes all particles go towards the best known solution, however, it does not guarantee that the solution will be globally optimal.

Algorithm depends on: initial positions $\vec{P}_0$, initial velocities $\vec{V}_0$, acceleration ratio $\omega$ and influence variable of the best global $\phi_g$ and the best local $\phi_l$ parameters. One also has to declare how many particles will be used and how many generations is the limit (or what is acceptable error of solution).

My program (shared in a Code section) can be executed in the following way:
 pso = PSO( minProblem, bounds) bestPos, bestFit = pso.optimize() 
where minProblem is a function to be minimised (for an array of N parameters), and bounds is 2D (2 x N values) array of minimum and maximum boundary values. The bestPos is the best set of parameters and bestFit is smallest obtained value for the problem.

A small example provided with the code fits a curve to noisy data. Input signal was $S(t) = 4 t \cos( 4 \pi t^2) \exp(-3 t^2)$ + U(-0.025,0.025)\$, where $U(a,b)$ is a random noise from unitary [a, b] distribution. Fitted function is $F_{p}(t) = p_1 t \sin( p_2 \pi t^2 + p_3) \exp(- p_4 t^2)$, thus the expected values are P={4, 4, pi/2, 3}. See figure for fitted curve below. Obtained values are P = { 4.61487992 3.96829528 1.60951687 3.43047877} which are not far off. I am aware that my example is a bit to simple for such powerful search technique, but it is just a quick example. Curve fitting with PSO optimisation method. Blue and red lines are for input data and reconstruction function respectively.

# Mathematica for solving coupled ordinary differential equation

Probably many know that Wolfram Mathematica is a great tool. I knew it as well, but now I’m actually observing first hand how powerful it really is. It is like with chilli pepper – everyone knows that it is hot, but you unless you taste it you won’t really understand it. My work involves solving and manipulating many ordinary differential equations (ODE) which quite often are coupled. Writing basic script in Python to do that isn’t hard. For simple cases one can use SciPy’s build-in function ode from class integrate (documentation). It isn’t very fast, and writing everything takes some time, but it was still faster than solution I saw for Matlab or C++. However, the winner, in my opinion, is Mathematica with it’s simple structure. I don’t have it often, but looking at code for generating and solving $n$ coupled ODEs and seeing how fast it’s performed makes me simply happy. Really simple!

Below is code to generate $n$ coupled ODEs and parameters for them. Note: Subscript[x, i] is only syntax and could be exchanged with $x_i$, but for copying purposes I left it in long form.

 n = 7; Do[Do[Subscript[k, i, j] = RandomReal[{0, 1}], {i, 1, n}], {j, 1, n}]; Do[Subscript[w, i] = RandomReal[{2 n, 4 n}], {i, 1, n}]; Do[Subscript[r0, i] = RandomReal[{1, 3}], {i, 1, n}]; Do[Subscript[p, i] = RandomReal[{1, 3}], {i, 1, n}];

 eqns = Table[{Subscript[ $\phi$, i]'[t] == Subscript[w, i] + Sum[Subscript[k, i, j] Sin[Subscript[ $\phi$, j][t] - Subscript[ $\phi$, i][t]], {j, 1, n}], Subscript[ $\phi$, i] == Subscript[p, i]}, {i, 1, n}]; vars = Table[Subscript[ $\phi$, i][t], {i, 1, n}]; 

sol = NDSolve[eqns, vars, {t, 0, 2 Pi}]; 

Whole magic is in function NDSolve (documentation), which solves numerically equations eqns for variables vars in provided range. Now all what’s left is to plot results. Again, very simple.

This code generates plots and display them in column. Output graphs below.
 FreqTable = Table[Plot[Evaluate[D[Subscript[ $\phi$, i][t] /. sol, t]], {t, 0, 2 Pi}, GridLines -> {{}, {Subscript[w, i]}}, AxesLabel -> {Time[s], InstFreq [1/s]}], {i, 1, n}]; GraphicsColumn[FreqTable, ImageSize -> Full] # Plots in matplotlib pylab

When trying to explain something to someone it is usually much better to actually present it in a graphical form. For this, I’m usually using Python and its module Matplotlib. It is highly customizable package with lots and lots different display techniques both for 2D and 3D images. Great feature is that one can also create animations to include in presentations.

One of the classes in Matplotlib is called PyLab. It is meant to have the same functionality as Matlab’s plotting functions. For my everyday activity that’s typical more than enough. I’m not going to discuss package in details. I just want to mention few nice options when displaying and saving plot to a file.

As a default, pylab leaves lots of whitespace for axis, titles and borders. On one hand this is good, because it makes everything look really nice, but on the other, we are losing some space for actual graphs. If you are going to discuss the plot while presenting, sometimes you can remove “unnecessary” content.

Supposedly one imports pylab with

import pylab as py  # including pylab lib


To remove ticks from axis:

frame = py.gca()  # returns (gets) current axis
frame.axes.get_xaxis().set_ticks([]) # removes all x axis ticks
frame.axes.get_yaxis().set_ticks([]) # removes all y axis ticks


To change the distance between subplots and margins:

py.subplots_adjust(left=leftMargine, bottom=bottomMargine,
right=rightMargine, top=topMargine,
wspace=widthSpace, hspace=highSpace)


Parameters left, right, bottom and top refer to location of the grid corners. Wspace and hspace are respectively horizontal and vertical spaces between subplots. Pylab can actually do it by itself, but the result is not always the best possible. Command to do it automatically is

py.tight_layout()


On top of that, one can also remove unnecessary white space in output when saving to file

py.savefig(filename, bbox_inches='tight')


where the ‘tight’ indicates cropping margins.

Here is a short snippet where those functions are applied:

import pylab as py
import numpy as np

N = 500
t = np.linspace(0, 1, N)
s1 = np.random.random(N)
s2 = np.random.random(N)
s3 = np.random.random(N)

py.subplot(311)
py.plot(t, s1)
py.ylabel('R1')
py.gca().axes.get_xaxis().set_ticks([])

py.subplot(312)
py.plot(t, s2)
py.ylabel('R2')
py.gca().axes.get_xaxis().set_ticks([])

py.subplot(313)
py.plot(t, s3)
py.ylabel('R3')
py.xlabel('Time')

py.savefig('output', dpi=150, bbox_inches='tight') 