Halton sequence in Python

Dawid Laszuk published on
3 min, 507 words

Sometimes when we ask for random we don't actually mean random by just random. Yes, pseudo-random.

Consider unitary distribution with ranges 0 and 1. Say you want to draw 5 samples. Selecting them at random would mean that we might end up with set of {0, 0.1, 0.02, 0.09, 0.01} or {0.11, 0.99, 0.09, 0.91, 0.01}. Yes, these values don't seem very random, but that's the thing about randomness, that it randomly can seem to not be random.

Depending on the purpose of our selection, these values might be just OK. After all, they came from that distribution. However, if our goal is to reconstruct the distribution, or extract information about with limited number of samples, it is often better to draw those samples in pseudo-random way. For example, in accordance to van der Corput sequences for 1D distributions or its generalized version Halton sequence.

The best practice for sampling N dimensional distribution is to use different prime numbers for each dimension. For example, when I need to sample a 5 dimensional unitary distribution, or search space, I will use bases of (5, 7, 11, 13, 17). This is to prevent periodic visits of the same position.

In case you are wondering what's the difference between actual random and pseudo-random, here is a gist: [gallery ids="869,868" type="rectangular"]

Both are good, but the actual random can produce many empty holes. What we like to have is a fair representation of all areas of our search space.

Thus, without further ado, here are some code snippets.

This is a definition of my prime generating generator:

def next_prime():
    def is_prime(num):
        "Checks if num is a prime value"
        for i in range(2,int(num**0.5)+1):
            if(num % i)==0: return False
        return True

    prime = 3
    while(1):
        if is_prime(prime):
            yield prime
        prime += 2

As for Halton sequence, as mentioned before it uses van der Corput sequence. Again, here is the definition:

def vdc(n, base=2):
    vdc, denom = 0, 1
    while n:
        denom *= base
        n, remainder = divmod(n, base)
        vdc += remainder/float(denom)
    return vdc

And finally, definition for the Halton sequence:

def halton_sequence(size, dim):
    seq = []
    primeGen = next_prime()
    next(primeGen)
    for d in range(dim):
        base = next(primeGen)
        seq.append([vdc(i, base) for i in range(size)])
    return seq

To use all of this simply call halton_sequence(size, dim). These variables refer to the number of size of sample poll and the dimension of your problem. So if one wants to sample 3 dimensional space with 10 samples each it would be called as below. (Notice: first dimension has prime value 5, then it's 7, 11, and following prime values.)

>>> halton_sequence(10, 3)
[
[0, 0.2, 0.4, 0.6, 0.8, 0.04, 0.24000000000000002, 0.44, 0.64, 0.8400000000000001],
[0, 0.14285714285714285, 0.2857142857142857, 0.42857142857142855, 0.5714285714285714, 0.7142857142857143, 0.8571428571428571, 0.02040816326530612, 0.16326530612244897, 0.30612244897959184],
[0, 0.09090909090909091, 0.18181818181818182, 0.2727272727272727, 0.36363636363636365, 0.45454545454545453, 0.5454545454545454, 0.6363636363636364, 0.7272727272727273, 0.8181818181818182]
]