For the last 3 weeks (or even more) I’ve been comparing results from two programs: one written in Matlab and the other in Python. Although great effort has been put into making them as similar as possible, the outputs were ‘a little bit’ different. Needless to say, I’ve wasted three weeks. Here’s the problem:
>>> import numpy as np >>> rand = np.random.random >>> r1 = rand(10) >>> r2 = rand(10)*0.01 >>> R1 = r1 - 10*r2 >>> R2 = r1.copy() >>> for i in xrange(10): R2 -= r2 ... >>> R1 - R2 array([ -5.55111512e-16, 4.44089210e-16, -1.11022302e-16, -1.11022302e-16, 0.00000000e+00, 4.44089210e-16, 6.93889390e-18, -5.55111512e-16, 0.00000000e+00, 0.00000000e+00])
What I did there was creating two random sets of data (r1, r2) and then assigning value of r1 – 10*r2 to R1 and R2. However, the R2 is done in steps – R2 = ((((((((((r1-r2)-r2)-r2)-r2)-r2)…) -r2), if you prefer. The problem here is that with each iterations numbers are rounded to their nearest representation of the number. Oddly as it sounds, without additional effort computers don’t usually have representation of all values as this would be inefficient. Thus, even after 10 iterations there is a little discrepancy in results of two methods. This error grows and accumulates even more after greater number of iterations! Depending on one’s data this noise maybe insignificant, as it is just 1e-15, but in my case it did make a huge difference.
What’s worth noting is the error here is a multiple of 1.11e-16, which is half of a claimed Machine Epsilon for Python.