Warning! Numerical approximation.

For the last three weeks (or even more) I’ve been comparing results from two programs: one written in Matlab and the other in Python. Although I put great effort into making them as similar as possible, the outputs were ‘a little bit’ different. Needless to say, I’ve wasted three weeks. Here’s the problem:

>>> import numpy as np
>>> rand = np.random.random
>>> r1 = rand(10)
>>> r2 = rand(10)*0.01
>>> R1 = r1 - 10*r2
>>> R2 = r1.copy()
>>> for i in range(10): R2 -= r2
... 
>>> R1 - R2
array([ -5.55111512e-16,   4.44089210e-16,  -1.11022302e-16,
        -1.11022302e-16,   0.00000000e+00,   4.44089210e-16,
         6.93889390e-18,  -5.55111512e-16,   0.00000000e+00,
         0.00000000e+00])

Here I created two random sets of data (r1, r2) and then assigned the value of r1 - 10*r2 to both R1 and R2. However, R2 is computed in steps: R2 = (…(((r1-r2)-r2)-r2)…-r2), if you prefer. The problem is that after each iteration the result is rounded to the nearest representable floating-point number. Odd as it sounds, without additional effort computers don’t represent every value exactly, as this would be inefficient. Thus, even after 10 iterations there is a small discrepancy between the results of the two methods. This error can accumulate further as the number of iterations grows! Depending on one’s data this noise may be insignificant, as it is only about 1e-15, but in my case it made a huge difference.
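
A minimal sketch of how one might guard against this when comparing two implementations: compare with a tolerance instead of expecting exact equality. The fixed seed, the variable names and the tolerance values are my assumptions for illustration; pick tolerances that suit your own data.

```python
import numpy as np

# Fixed seed so the run is repeatable (an assumption, not in the session above).
rng = np.random.RandomState(0)
r1 = rng.random_sample(10)
r2 = rng.random_sample(10) * 0.01

# One-shot computation versus ten separate subtractions.
direct = r1 - 10 * r2
stepwise = r1.copy()
for _ in range(10):
    stepwise -= r2

# Exact comparison is fragile; a tolerance-based one is not.
exactly_equal = np.array_equal(direct, stepwise)
close_enough = np.allclose(direct, stepwise, rtol=1e-12, atol=1e-12)  # assumed tolerances
max_abs_diff = np.max(np.abs(direct - stepwise))

print(exactly_equal, close_enough, max_abs_diff)
```

If the two programs agree only up to such a tolerance, the difference is most likely rounding noise rather than a bug in either of them.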

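What’s worth noting is that most of the errors here are multiples of 1.11e-16 (2^-53), which is half of the machine epsilon for the double-precision floats Python uses (2.22e-16). The one exception, 6.94e-18, is 2^-57, a smaller power of two.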
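
For reference, here is a short sketch of how the machine epsilon can be inspected. It assumes the usual IEEE 754 double precision, which is what Python floats and NumPy's float64 use on common platforms.

```python
import sys
import numpy as np

eps = np.finfo(np.float64).eps   # gap between 1.0 and the next larger double
half_eps = eps / 2               # about 1.11e-16, the unit seen in the errors

# The standard library reports the same value for Python floats.
same_as_stdlib = (eps == sys.float_info.epsilon)

# Adding eps to 1.0 is visible; adding half of it is rounded away.
eps_is_visible = (1.0 + eps > 1.0)
half_is_lost = (1.0 + half_eps == 1.0)

print(eps, half_eps, same_as_stdlib, eps_is_visible, half_is_lost)
```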

Python’s ‘if’ conditions

The ‘if’ condition, frequently used as it is, is not always well understood. Most people don’t care about the order of the conditions, which is fine. However, to get the most out of it, one should consider putting the less computationally demanding conditions first. Here is an example in Python:

>>> def one():
...   print('1')
...   return True
... 
>>> def two():
...   print('2')
...   return False
... 
>>> one() and two()
1
2
False
>>> two() and one()
2
False

As you can see, two conditions linked with ‘and’ are checked one at a time, starting from the leftmost. It is not magic. This is exactly what one would expect. If the ‘and’ expression in an ‘if’ statement can be decided partway through the evaluation, the remaining conditions are skipped!
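
Here is a small sketch of how this can be used in practice: put the cheap test first so the costly one runs only when needed. The functions cheap_check and expensive_check and the calls list are hypothetical names I made up for illustration; the loop inside expensive_check is just a stand-in for real work.

```python
calls = []  # records which checks actually ran

def cheap_check(x):
    calls.append('cheap')
    return x > 0

def expensive_check(x):
    calls.append('expensive')
    total = sum(i * i for i in range(100000))  # stand-in for costly work
    return total >= 0 and x % 2 == 0

# cheap_check(-4) is False, so 'and' never calls expensive_check.
if cheap_check(-4) and expensive_check(-4):
    print('both conditions hold')

# 'or' behaves the same way: a True on the left skips the right side.
result = cheap_check(4) or expensive_check(4)

print(calls, result)
```

In both cases only the cheap function ends up in calls, because the left operand alone was enough to decide the outcome.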