Free AWS is good. Not awesome, but good.

Amazon with it’s Amazon Web Service (AWS) is pretty cool. It gives you access to remote machine which you don’t have to maintain. Actually you don’t have to do anything other than use it. All machines come in different flavours, but what tastes better than free? Granted that it’s extremely limited, but surely we can squeeze something out of it. Right?

AWS instances, i.e. remote machines, differ in the amount of RAM, disc space, operating system, whether they have GPU access and so on. As you can expect free tier instance is pretty low on all measure values. To be more precise free tier instance is of t2.micro type, which is a general purpose burstable instance with a single CPU, 1 GiB memory and EBS data storage (default 4Gb storage).

What is this good for? Depending on the needs, this might be good for almost anything that doesn’t require whatever these instances are lacking. (Did I help?) Obviously. So it’s not so good for heavy computations, training machine learning models or storing data. First of all, it’s better to use for these some other services like S3, DynamoDB, Lex or general machine learning. However, in case of specific requirements, it’s always better just to rent(?) more powerful instance.

These cheap instances, in my option, are very good for few tasks. The main one is web scrapping. This is tedious task that requires small CPU bandwidth, but constant access to the internet. Moreover, we don’t really want to make many calls in small time period so there needs to be a delay between each download. That’s either because we would like to avoid being detect as a bot, or for simply politeness to the owner of the server (not clogging bandwidth).

Internet is full of examples of scrappers for different type of data. I’m adding my own to the collection with r-u-listening project. The core of the project is to allow for users to find similar music to their input. It is a bit more than recommender, but more on this project probably in the future. The scraper itself is more in two parts, i.e. crawler.py and scraper.py. The database that I’m using is FreeMusicArchive.org, which goes with slogan “It’s not just free music; it’s good music”. I do recommend it and once I have something valuable I’d like to share it with them.

Unfortunately these instances don’t come with big default memory and storage. By default they have only 4 Gb storage, which when downloading mp3 tracks will be enough for about 800 tracks (assuming about 5 Mb per track). Again, as always, it depends on the task, but for machine learning algorithms we go with The more, the merrier.

As mentioned before, free tier instances allow up to 32 Gb. To do so go to EC2 service in your AWS console. In the options tab (left side) find Elastic Block Store (EBS) and select Volumes. Then select your instance and Actions, and Modify Volume. Simple, right? In all honesty, like many things in the AWS.

I’ve been using AWS for a while. Even finished AWS general course, its essentials and 3 day onsite workshop on Architecting on AWS. All is pretty simple and consistent. I like it.

Python Empirical Mode Decomposition on Image

One of the packages I intend long term maintain and support is Python implementation of Empirical Mode Decomposition (EMD) called PyEMD. I will skip introduction of the method as it has been explained in few other posts [1, 2, 3, …]. This blog entry is more about announcement of new feature which also means new version.

PyEMD version 0.2 is out. This means that PyEMD now supports 2D data (image) decomposition. Other visible improvements include documentation and more thorough testing both of code and data cases. Installation instructions are provided on the project’s webpage.

I am more than happy to include other improvements or suggestions. The next big step will be support for 3D and multi dimensional data. Please get in touch if you feel that there is something missing.

Image decomposition is based on the simple extremum definition: a point that is above (max) or below (min) surrounding. Behind the hood this is done using SciPy’s ndim maximum_filter. These are then connected using SmoothBivariateSpline. Stopping criteria can be chosen to be either based on the number of sifting operations or threshold values for mean and standard deviations.

Below is included exemplary decomposition, with the top image being input and the following two are the outputs. Exact formula with which the image was generated is
sin(4\pi \cdot x) \cdot \left( cos(8\pi y + 8\pi x) + 3\right) + \left(5x + 2y - 0.4 \right) y + 2. Python code generating this example is in provided in documentation in Examples/EMD2D.

Installing Cartopy on Ubuntu 14.04 (or Travis-CI)

I’m becoming increasingly convinced that using GitHub to share projects, even in work-in-progress (WIP) state is quite beneficial. For one, someone can quickly point out that such project exists or suggest a tool which would simplify some operations. What’s more, with tools like Travis-CI or CodeCov it’s easy to follow whether the project is actually buildable or whether updates broke functionally of other features. Real convenience! Although, there is some price to pay for these goods.

One of the projects I’m currently working on is MapViz. Its aim is to provide easy to use visualization and summary for country and other administrative units data, mainly EuroStat dataset. This project heavily depends on Cartopy library – a great tool for handling geographical data, however, not so great when it comes to installing. It comes with many dependencies, some of which have to be installed separately. For example, the newest version of Cartopy requires Proj.4 lib in version 4.9, which is not available for Ubuntu older than 16.04. This is unfortunate, because Travis allows to use Ubuntu in versions 12.04 (precise) or 14.04 (trusty). For these only Proj.4 in version 4.8 is available and that’s not enough. Once these are obtained, installing Cartopy with `pip` is easy, it’s just:

$ pip install cartopy

but the trick is to install all dependencies.

The workaround is dirty, but it works. One can directly download and install Proj.4 and its dependencies (libproj9) from 16.04 (xenial) repo. For exact links go to libproj9 and libproj-dev, select your architecture and then click on any mirror link. (Just a note that by default Travis-CI uses amd64.) So, download it with wget and then install with dpkg. Here’s an example:

$ wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj9_4.9.2-2_amd64.deb 
$ sudo dpkg -i libproj9_4.9.2-2_amd64.deb 

For comparison, here is my whole .travis.yml file:

dist: trusty
language: python
python:
  - 2.7
  - 3.4
  - 3.5
before_install:
  - sudo apt-get -qq update
  - sudo apt-get install -y libproj-dev proj-bin proj-data
  - sudo apt-get install -y python-pyproj
  - sudo apt-get install -y libc6
  - wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj9_4.9.2-2_amd64.deb 
  - sudo dpkg -i libproj9_4.9.2-2_amd64.deb 
  - wget http://es.archive.ubuntu.com/ubuntu/pool/universe/p/proj/libproj-dev_4.9.2-2_amd64.deb
  - sudo dpkg -i libproj-dev_4.9.2-2_amd64.deb
install:
  - pip install -r requirements.txt
script:
  - python setup.py install

Kuramoto in Stan (PyStan)

tl;dr: Project on github: https://github.com/laszukdawid/pystan-kuramoto


Stan is a programming language focused on probabilistic computations. Although it’s a rather recent language it’s been nicely received in data science/Bayesian community for its focus on designing model, rather than programming and getting stuck with computational details. Depending what is your background you might have heard about it either from Facebook and its Prophet project or as a native implementation for Hamiltonian Monte Carlo (HMC) and its optimised variation – No-U-Turn Sampler for HMC (NUTS).

For ease of including models in other programmes there are some interfaces/wrappers available, including RStan and PyStan.

Stan is not the easiest language to go through. Currently there are about 600 pages of documentation and then separate “documentations” for wrappers, which for PyStan isn’t very helpful. Obviously there’s no need for reading all of it, but it took me a while to actually understand what goes where an why. The reward, however, is very satisfying.

Since I’ve written a bit about Kuramoto models on this blog, it’s consistent if I share its implementation in Stan as well. Pystan-kuramoto project uses PyStan, but the actual Stan code is platform independent.

Currently there are two implementations (couldn’t come up with better names):

  • All-to-all, where Kuramoto model fit is performed to phase vector \vec{\Phi} with distinct oscillators, i.e. \vec{\Phi}_{N}(t) = \{\phi_1(t), \phi_2(t), \dots, \phi_N(t)\}.
  • All-to-one, where the model fits superposition (sum) of oscillators to phase time series \Phi_{N}(t) = \sum_{n=1}^{N} \phi_n(t).

In all honesty, this seems to be rather efficient. Optimisation is performed using penalized maximum likelihood estimation with optimization (L-BFGS). Before using it I wasn’t very knowledgeable in the algorithm, but now I’m simply amazed with its speed and accuracy. Thumbs up.

Spellchecker in VIM

Only recently I’ve started using VIM as my default editor and… I’m amazed.

I’m not advocating here for its superiority over any other text editor. You might be into design or simply wish for a graphical interface. I’m into not-using-mouse too much, so it actually fits quite well my purposes. Having said that, it took me a while to discover it with more commands than saving and exiting.

One of the features that from the outsider point of view, i.e. myself a month ago, that it couldn’t possibly ever have is spellchecker. Not that I actively thought that vim doesn’t have a spellchecker, but rather that passively ignoring the idea of suggesting corrections in a terminal. I was wrong. Vim has spellchecker and it’s pretty good.

The magical command is:
:setlocal spell spelllang=XX
where for English XX is one of {en, en_au, en_ca, en_gb, en_nz, en_us}. If you have a problem with setting up British, like I had, try command:
:setlocal spell! spelllang=en_gb
i.e. with extra exclamation mark.

Actually I suggest mapping command to some keyboard key, so when you press it turn on/off highlighting mode. To do this edit your .vimrc file (Ubuntu: ~/.vimrc) and add following line:
:map :setlocal spell! spelllang=en_gb
which means that whenever you press key F7 it will toggle command.

Once you have this set and someone introduces mistakes into your text/code, you can simply navigate between these using ]s and [s for next and previous, respectively. Obviously you can also just navigate using keys or hjkl, but who doesn’t want to feel like hacker?

Assuming someone has played with your text and make it less perfect, you have two options:

    z=   acceptance of the mistake and listening for looking up some suggestions, or
    zw   teach your dictionary this new fabulous word, or
    zg   denial, blissful ignorance, snooze button.

If you change your mind with your last command you can undo it using zuw or zug. I suggest quick read about these commands in Quick Start from spell documentation or :help spell.

Now! The default setting for spellchecking performance are against large files/projects, like PhD thesis. If you notice that after certain line spellchecker doesn’t inform you about the obvious mistake then you should use this StackOverflow advice and increase your syn sync parameters. In short, but I suggest reading the SO question and answers (and comments), you can increase minlines and maxlines in (Ubuntu) ~/.vim/after/syntax/tex.vim with:

syn sync maxlines=20000
syn sync minlines=500

Easy clipboard in bash

Those who know me, know that I’m not a fan of using mouse. Whenever possible I try to avoid using it. Although, after years of experience, I am fluent in keyboarding there are still some tasks that I need mouse. Well, until quite recent.

The number one of annoying tasks is copying things from terminal to clipboard. Depending what exactly I needed to do with it I’d either select something with mouse and then copy, or stream output to file and then select it within editor. Some people found it weird, but yes, often copying through editor is much faster.

Revolution came with `xclip`. I’m not yet fluent in doing everything with it, but even with limited experience that I have it feels like a superpower. This program allows to copy into/from X clipboard.

Most unix distro have it installed by default or at least in package manager. In case of Ubuntu one can install it with:
$ sudo apt-get update
$ sudo apt-get install xclip

Examples:
Copy current directory into clipboard:
$ pwd | xclip

Paste whatever you have in clipboard:
$ xclip -o

Copy to global clipboard so you can use in any other program:
$ echo $PATH | xclip -selection clipboard

If this is too long to type, one can always use aliases, i.e. setting shorter name for long command. Either type directly into terminal (but that is only for current terminal), or update your ~/.bashrc file with:
alias "c=xclip"
alias "v=xclip -o"
alias "cs=xclip -selection clipboard"
alias "vs=xclip -o -selection clipboard"

Then copying content of a file to a clipboard:
$ cat file.txt | cs

… and CTRL+V wherever one wishes.

Yes, this is awesome!

Source:
StackOverflow: How can I copy the output of a command directly into my clipboard?

Kuramoto in Python

Code for Kuramoto in Python is available here or from code subpage.
Explanation on how to use it is on the bottom of this post.

Tiny introduction

Kuramoto[1, 2] is probably one of the most popular and successful models for coupled oscillators. There is plenty of information about it, but in brief summary it models oscillators’ phases to be dependent on scaled phase differences for all pairs of oscillator.

A while ago I have actually wrote on a specific method that I used to find optimal parameters for a Kuramoto model given some observation. (See Bayesian Dynamic Inference here.)

Mathematically speaking this model is defined by a set of coupled ordinary differential equations (ODEs). Given N oscillators, dynamics for each oscillator’s phase \phi_i is defined as i\in N is defined by \dot\phi_i = \omega_i + \sum_{j=0}^N k_{ij} \sin(\phi_i-\phi_j), where the summation is over all others oscillators. Note that one can leave coupling with itself, because in such case \sin(\phi_i - \phi_i) = 0, thus it doesn’t input anything into the final solution.

Quick note that Kuramoto coupling model can be used to reference model with coupling function that can be represented as a sum of harmonic functions. For example, second order means that we are including both k \sin(\Delta\phi) and k^{(2)}\sin(2\Delta\phi). In general Kuramoto model of order M would describe \dot\phi_i = \omega_i + \sum_{m=0}^{M} \sum_{j=0}^N k_{ij}^{(m)} \sin(m (\phi_i-\phi_j)).

Examples

Below are two examples. Both use the same core values with the exception that for the second system the coupling function defined by two harmonic terms. This naturally changes dynamics, but it shouldn’t change much average value, which is close to intrinsic frequency.

Exact values for the first experiment are presented in table below. Each row is a different oscillator with initial phase Φ0 and intrinsic frequency Ω. The following columns denote coupling strength between respective pairs of oscillators.

Φ0 ω k.1 k.2 k.3
1 0 28 0.2 1.1
2 π 19 0.5 -0.7
3 0 11 0.3 0.9

Phase dynamics, i.e. time derivative of obtained phases, are presented in Fig. 1. One can see that all plots are centred on intrinsic frequency with some modulations.

kuramoto_dphi_simple

Fig. 1. Phase dynamics, i.e. time derivative, in simple Kuramoto model.

Table below shows values used in the second simulation. Extra 3 columns denote scaling values used in second harmonic.

Φ0

ω k.1 k.2 k.3 k(2).1 k(2).2 k(2).3
1 0 28 0.2 1.1 -0.5 0.2
2 π 19 0.5 -0.7 -0.4 1.0
3 0 11 0.3 0.9 0.8 0.8

Again phase dynamics have been presented in a form of plot (Fig. 2). I think the difference is clear. Despite having similar mean values (approximately equal to intrinsic frequency), their modulations have change. Not only the frequency content of these modulations has changed, but also their amplitude.

kuramoto_dphi_2harmonic

Fig. 2. Phase dynamics, i.e. time derivative, in Kuramoto model with included second harmonic.

Using code

Except for downloading code from either github or code subpage, one is expected to have SciPy module. Kuramoto uses it to solve differential equations.

Script below shows an example of how one can use the Kuramoto module. When you run this, make sure that kuramoto.py is either in your path. Note also that most of the code is to just defining initial parameters and plotting the results. Actual execution of the module are two lines. Fig. 1 is the expected output.

import numpy as np
import pylab as plt
from kuramoto import Kuramoto

# Defining time array
t0, t1, dt = 0, 40, 0.01
T = np.arange(t0, t1, dt)

# Y0, W, K are initial phase, intrinsic freq and 
# coupling K matrix respectively
Y0 = np.array([0, np.pi, 0])
W = np.array([28, 19, 11])

K1 = np.array([[  0, 0.2,  1.1],
               [0.5,   0, -0.7],
               [0.3, 0.9,    0]])
               
K2 = np.array([[   0, -0.5, 0.2],
               [-0.4,    0, 1.0],
               [ 0.8,  0.8,   0]])
K = np.dstack((K1, K2)).T

# Passing parameters as a dictionary
init_params = {'W':W, 'K':K, 'Y0':Y0}

# Running Kuramoto model
kuramoto = Kuramoto(init_params)
odePhi = kuramoto.solve(T).T

# Computing phase dynamics
phaseDynamics = np.diff(odePhi)/dt

# Plotting response
nOsc = len(W)
for osc in range(nOsc):
    plt.subplot(nOsc, 1, 1+osc)
    plt.plot(T[:-1], phaseDynamics[osc])
    plt.ylabel("$\dot\phi_{%i}$" %(osc+1))
plt.show()

[1] Y. Kuramoto, “Chemical Oscillations, Waves, and Turbulence,” 1984, doi: 10.1007/978-3-642-69689-3.
[2] Steven H. Strogatz, “From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators,” 2000, doi: 10.1016/S0167-2789(00)00094-4.