Kuramoto in Stan (PyStan)

tl;dr: Project on github: https://github.com/laszukdawid/pystan-kuramoto


Stan is a programming language focused on probabilistic computations. Although it's a rather recent language, it has been well received in the data science/Bayesian community for letting you focus on designing the model rather than getting stuck in computational details. Depending on your background, you might have heard about it either from Facebook and its Prophet project, or as a native implementation of Hamiltonian Monte Carlo (HMC) and its optimised variant, the No-U-Turn Sampler (NUTS).

To make it easier to embed models in other programs, several interfaces/wrappers are available, including RStan and PyStan.

Stan is not the easiest language to work through. Currently there are about 600 pages of documentation, plus separate "documentations" for the wrappers, which in PyStan's case isn't very helpful. Obviously there's no need to read all of it, but it took me a while to understand what goes where and why. The reward, however, is very satisfying.

Since I've written a bit about Kuramoto models on this blog, it's only consistent that I share a Stan implementation as well. The pystan-kuramoto project uses PyStan, but the actual Stan code is platform independent.

Currently there are two implementations (I couldn't come up with better names); a minimal usage sketch follows the list:

  • All-to-all, where the Kuramoto model is fitted to a phase vector \vec{\Phi} of distinct oscillators, i.e. \vec{\Phi}_{N}(t) = \{\phi_1(t), \phi_2(t), \dots, \phi_N(t)\}.
  • All-to-one, where the model fits superposition (sum) of oscillators to phase time series \Phi_{N}(t) = \sum_{n=1}^{N} \phi_n(t).
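Neither variant needs anything PyStan-specific beyond compiling the model and fitting it. Below is a minimal usage sketch, assuming PyStan 2.x; the model file name and the data field names are placeholders of my own, not the actual ones from the repository.

import numpy as np
import pystan

# Dummy phase data: 3 oscillators, 1000 samples (stand-in for real measurements).
phases = np.cumsum(0.1 + 0.01*np.random.randn(3, 1000), axis=1)

# Compile the Stan model (the file name is a placeholder; see the repository).
model = pystan.StanModel(file="kuramoto_all2all.stan")

# Hypothetical data block layout; the actual field names are defined in the .stan file.
data = {"N": phases.shape[0],   # number of oscillators
        "T": phases.shape[1],   # number of samples
        "dt": 0.001,            # sampling step
        "phi": phases}          # observed phase time series

# Penalised maximum likelihood point estimates via L-BFGS.
fit = model.optimizing(data=data, algorithm="LBFGS")
print(fit)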

In all honesty, this approach seems rather efficient. Optimisation is performed with penalised maximum likelihood estimation using L-BFGS. Before using it I wasn't very familiar with the algorithm, but now I'm simply amazed by its speed and accuracy. Thumbs up.

Update: Particle Swarm Optimisation in Python

It came to my attention that my PSO (Particle Swarm Optimisation) implementation for Python is actually quite popular. Only after a few people contacted me did I notice that the public version was not the one I had been using myself. It wasn't bad, but definitely not as good. So, obviously, I've updated it and cleaned it up a bit.

The updated program is available from my github or from the Code subpage.

What's the difference? There are a few:
– Initial values, unless provided, are quasi-random, generated using a Halton sequence. This prevents artificial clustering.
– Perturbation in the form of Gaussian noise should mitigate getting stuck in false local minima by forcing particles to keep searching the surrounding area (see the sketch after this list).
– Added a max_repetition threshold, which sets the maximum number of times the same optimum value can be returned. Once the threshold is reached, the program finishes.
– General performance improvements.
– Improved usage documentation within the file.
– The program is now compatible with Python 3.
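As a rough, self-contained sketch of the perturbation and stopping ideas (illustrative only, not the actual module code; the name max_repetition follows the list above):

import numpy as np

def perturb(positions, sigma=0.1):
    "Add Gaussian noise so particles keep probing the area around their position."
    return positions + np.random.normal(0.0, sigma, positions.shape)

class RepetitionStop:
    "Finish once the same optimum value has been returned max_repetition times in a row."
    def __init__(self, max_repetition=25, tol=1e-12):
        self.max_repetition, self.tol = max_repetition, tol
        self.best, self.count = None, 0

    def __call__(self, value):
        if self.best is not None and abs(value - self.best) < self.tol:
            self.count += 1
        else:
            self.best, self.count = value, 0
        return self.count >= self.max_repetition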

Feel free to request any features.

There is an idea of adding a progressive save, which would allow quitting, resuming and modifying parameters at any point of the computation.

Halton sequence in Python

Sometimes when we ask for random we don't actually mean plain random; we want samples that look random but cover the space evenly. In other words, quasi-random.

Consider a uniform distribution over the range [0, 1]. Say you want to draw 5 samples. Selecting them at random might give a set like {0, 0.1, 0.02, 0.09, 0.01} or {0.11, 0.99, 0.09, 0.91, 0.01}. These values don't seem very random, but that's the thing about randomness: it can randomly appear to not be random.

Depending on the purpose of our selection, these values might be just OK. After all, they came from that distribution. However, if our goal is to reconstruct the distribution, or to extract information about it from a limited number of samples, it is often better to draw them in a quasi-random (low-discrepancy) way, for example according to the van der Corput sequence for 1D distributions or its generalised version, the Halton sequence.

The usual practice for sampling an N-dimensional distribution is to use a different prime base for each dimension. For example, when I need to sample a 5-dimensional uniform distribution, or search space, I will use bases of (5, 7, 11, 13, 17). This prevents periodic visits to the same positions.

In case you are wondering what the difference between actual random and quasi-random looks like, here is the gist:

Both are fine, but truly random draws can leave many empty holes. What we would like is a fair representation of all areas of our search space.

Thus, without further ado, here are some code snippets.

This is the definition of my prime-generating generator:

def next_prime():
    "Generator yielding consecutive odd primes: 3, 5, 7, 11, ..."
    def is_prime(num):
        "Checks if num is a prime value"
        for i in range(2, int(num**0.5)+1):
            if (num % i) == 0:
                return False
        return True

    prime = 3
    while True:
        if is_prime(prime):
            yield prime
        prime += 2

As for the Halton sequence, as mentioned before, it builds on the van der Corput sequence. Again, here is the definition:

def vdc(n, base=2):
    "Returns the n-th element of the van der Corput sequence for the given base."
    vdc, denom = 0, 1
    while n:
        denom *= base
        n, remainder = divmod(n, base)
        vdc += remainder/float(denom)
    return vdc

And finally, the definition of the Halton sequence:

def halton_sequence(size, dim):
    "Returns `dim` lists, each with `size` Halton numbers (one prime base per dimension)."
    seq = []
    primeGen = next_prime()
    next(primeGen)  # skip base 3, so the first dimension uses base 5
    for d in range(dim):
        base = next(primeGen)
        seq.append([vdc(i, base) for i in range(size)])
    return seq

To use all of this, simply call halton_sequence(size, dim). The arguments are the size of the sample pool and the dimension of your problem. So if one wants to sample a 3-dimensional space with 10 samples per dimension, it would be called as below. (Notice: the first dimension has prime base 5, then 7, 11 and the following primes.)

>>> halton_sequence(10, 3)
[
[0, 0.2, 0.4, 0.6, 0.8, 0.04, 0.24000000000000002, 0.44, 0.64, 0.8400000000000001],
[0, 0.14285714285714285, 0.2857142857142857, 0.42857142857142855, 0.5714285714285714, 0.7142857142857143, 0.8571428571428571, 0.02040816326530612, 0.16326530612244897, 0.30612244897959184],
[0, 0.09090909090909091, 0.18181818181818182, 0.2727272727272727, 0.36363636363636365, 0.45454545454545453, 0.5454545454545454, 0.6363636363636364, 0.7272727272727273, 0.8181818181818182]
]
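In practice, for example when seeding PSO particles, one usually wants the samples inside an arbitrary search box rather than the unit cube. A hypothetical helper for that (not part of the snippets above) could look like this:

import numpy as np

def scaled_halton(size, dim, lower, upper):
    "Map Halton samples from the unit cube onto per-dimension [lower, upper] bounds."
    unit = np.array(halton_sequence(size, dim)).T    # shape: (size, dim)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + unit*(upper - lower)

# Example: 10 starting positions in a 3-dimensional search space.
positions = scaled_halton(10, 3, lower=[-5, 0, 0], upper=[5, 1, 10])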

Bayesian inference in Kuramoto model

For a while now, I've been involved with Kuramoto models and fitting them to data. Recently I stumbled upon the use of Bayesian inference for estimating parameters in time-evolving dynamics. The team from the Physics Department at Lancaster University has produced many papers on this; as a starting reference it is worth looking at [1].

The technique addresses the general phase dynamics of an oscillator, i.e.
\dot\phi_i = \omega_i + f_i (\phi_i) + g_i (\phi_i, \phi_j) + \xi_i ,
where \omega_i, f_i(\phi_i) and g_i(\phi_i, \phi_j) are the intrinsic frequency, the self-coupling and the coupling with other oscillators, respectively. The term \xi_i refers to noise; although the authors claim it can be of any type, the calculations are performed for white Gaussian noise. I don't want to go too much into detail, because at first sight it might look complicated. Just to point out the few steps that are needed to introduce my examples:

  1. Rewrite the model in the form \dot\phi_i = \sum_{k=-K}^{K} C_{k}^{(i)} P_{i,k} (\Phi) + \xi_i(t), where C is the parameter vector, P provides the significant terms of the model (for example via Fourier decomposition) and \Phi is the vector of all oscillators' phases at a given moment (see the sketch after this list).
  2. Calculate the diagonal values of the Jacobian matrix, i.e. \frac{\partial P_{i,k}}{\partial \phi_{i}}.
  3. Set prior probabilities for the parameter vector C and its covariance \Sigma (or concentration \Xi = \Sigma^{-1}).
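As an illustration of step 1 for the three-oscillator example used below, the base functions P can be taken as a constant term plus the pairwise sine couplings. This is my own illustrative sketch, not code from the authors' implementation:

import numpy as np

def kuramoto_base(phi):
    "Base functions P for a 3-oscillator Kuramoto model: constant term + pairwise sines."
    p1, p2, p3 = phi
    return np.array([1.0,               # multiplies the intrinsic frequency
                     np.sin(p1 - p2),
                     np.sin(p1 - p3),
                     np.sin(p2 - p3)])

# With a parameter vector c_i per oscillator, the deterministic part of the
# dynamics is simply dphi_i = dot(c_i, kuramoto_base(phi)).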

The problem then reduces to minimising the negative log-likelihood function S. The authors also provide the exact formulas and an algorithm that finds the extremum. A tutorial on applying their algorithm can be found in [2].

Based on the MatLab code provided by the authors, available on their webpage [3], I have written my own program in Python (code available in the Code section or here). As an experiment I have used a simple Kuramoto model with sinusoidal coupling between phases
\dot\phi_1 = 28 - 0.3 \sin( \phi_1 - \phi_2) - 0.1 \sin( \phi_1 - \phi_3 ) + 0.0 \sin( \phi_2 - \phi_3),
\dot\phi_2 = 19 + 0.3 \sin( \phi_1 - \phi_2) + 0.0 \sin( \phi_1 - \phi_3 ) - 0.9 \sin( \phi_2 - \phi_3),
\dot\phi_3 = 11 + 0.0 \sin( \phi_1 - \phi_2) + 0.1 \sin( \phi_1 - \phi_3 ) + 0.9 \sin( \phi_2 - \phi_3),
which means there are 12 parameters: three intrinsic frequencies (\omega_i = \{28, 19, 11\}) and a 3×3 K matrix representing the coupling between pairs of oscillators. Small Gaussian noise (mean 0, standard deviation 0.01) was added to each oscillator. Signals were generated for t \in [0, 40] with sampling step dt = 0.001. The analysis was then performed on 8 s segments with a 2 s step. Figure 1 shows the instantaneous frequencies (\dot\phi_i) of each oscillator. Figure 2 presents the calculated parameters for each segment (and thus their time dependence). To be honest, I am quite surprised how well this technique works. I have performed more experiments, but to keep it short I am posting just one. There are some fluctuations in the obtained values, but these changes are very small and, in a practical sense, most likely negligible.
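For reference, here is a rough sketch of how such signals could be generated: plain Euler integration with the parameter values copied from the equations above (the treatment of the noise term is only approximate):

import numpy as np

dt, T = 0.001, 40.0
steps = int(T/dt)
omega = np.array([28.0, 19.0, 11.0])
# Rows: oscillators 1-3; columns: coefficients of sin(p1-p2), sin(p1-p3), sin(p2-p3).
A = np.array([[-0.3, -0.1,  0.0],
              [ 0.3,  0.0, -0.9],
              [ 0.0,  0.1,  0.9]])

phi = np.zeros((steps, 3))
for k in range(1, steps):
    p1, p2, p3 = phi[k-1]
    s = np.array([np.sin(p1 - p2), np.sin(p1 - p3), np.sin(p2 - p3)])
    noise = np.random.normal(0.0, 0.01, 3)   # small Gaussian noise per oscillator
    phi[k] = phi[k-1] + (omega + A.dot(s) + noise)*dt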

Fig. 1. Instantaneous frequencies of all oscillators in the experiment.

Fig. 2. Extracted parameters for the presented dynamical system. The first column shows the intrinsic frequencies and the remaining columns the coupling values indicated in their titles. Horizontal black lines indicate the expected values.

[1] A. Duggento, T. Stankovski, P. V. E. McClintock, and A. Stefanovska, “Dynamical Bayesian inference of time-evolving interactions: From a pair of coupled oscillators to networks of oscillators,” Phys. Rev. E – Stat. Nonlinear, Soft Matter Phys., vol. 86, no. 6, pp. 1–16, 2012.
[2] T. Stankovski, A. Duggento, P. V. E. McClintock, and A. Stefanovska, “A tutorial on time-evolving dynamical Bayesian inference,” Eur. Phys. J. Spec. Top., vol. 223, no. 13, pp. 2685–2703, 2014.
[3] Nonlinear Biomedical Physics, Lancaster University http://py-biomedical.lancaster.ac.uk/.