Why does changing the sum order returns a different result?
23.53 + 5.88 + 17.64
= 47.05
23.53 + 17.64 + 5.88
= 47.050000000000004
Both Java and JavaScript return the same results.
I understand that, due to the way floating point numbers are represented in binary, some rational numbers (like 1/3 - 0.333333...) cannot be represented precisely.
Why does simply changing the order of the elements affect the result?
(2.0^53 + 1) - 1 == 2.0^53 - 1 != 2^53 == 2^53 + (1 - 1)
). Hence, yes: be wary when choosing the order of sums and other operations. Some languages provide a built-in to perform "high-precision" sums (e.g. python's math.fsum
), so you might consider using these functions instead of the naive sum algorithm.
Maybe this question is stupid, but why does simply changing the order of the elements affects the result?
It will change the points at which the values are rounded, based on their magnitude. As an example of the kind of thing that we're seeing, let's pretend that instead of binary floating point, we were using a decimal floating point type with 4 significant digits, where each addition is performed at "infinite" precision and then rounded to the nearest representable number. Here are two sums:
1/3 + 2/3 + 2/3 = (0.3333 + 0.6667) + 0.6667
= 1.000 + 0.6667 (no rounding needed!)
= 1.667 (where 1.6667 is rounded to 1.667)
2/3 + 2/3 + 1/3 = (0.6667 + 0.6667) + 0.3333
= 1.333 + 0.3333 (where 1.3334 is rounded to 1.333)
= 1.666 (where 1.6663 is rounded to 1.666)
We don't even need non-integers for this to be a problem:
10000 + 1 - 10000 = (10000 + 1) - 10000
= 10000 - 10000 (where 10001 is rounded to 10000)
= 0
10000 - 10000 + 1 = (10000 - 10000) + 1
= 0 + 1
= 1
This demonstrates possibly more clearly that the important part is that we have a limited number of significant digits - not a limited number of decimal places. If we could always keep the same number of decimal places, then with addition and subtraction at least, we'd be fine (so long as the values didn't overflow). The problem is that when you get to bigger numbers, smaller information is lost - the 10001 being rounded to 10000 in this case. (This is an example of the problem that Eric Lippert noted in his answer.)
It's important to note that the values on the first line of the right hand side are the same in all cases - so although it's important to understand that your decimal numbers (23.53, 5.88, 17.64) won't be represented exactly as double
values, that's only a problem because of the problems shown above.
Here's what's going on in binary. As we know, some floating-point values cannot be represented exactly in binary, even if they can be represented exactly in decimal. These 3 numbers are just examples of that fact.
With this program I output the hexadecimal representations of each number and the results of each addition.
public class Main{
public static void main(String args[]) {
double x = 23.53; // Inexact representation
double y = 5.88; // Inexact representation
double z = 17.64; // Inexact representation
double s = 47.05; // What math tells us the sum should be; still inexact
printValueAndInHex(x);
printValueAndInHex(y);
printValueAndInHex(z);
printValueAndInHex(s);
System.out.println("--------");
double t1 = x + y;
printValueAndInHex(t1);
t1 = t1 + z;
printValueAndInHex(t1);
System.out.println("--------");
double t2 = x + z;
printValueAndInHex(t2);
t2 = t2 + y;
printValueAndInHex(t2);
}
private static void printValueAndInHex(double d)
{
System.out.println(Long.toHexString(Double.doubleToLongBits(d)) + ": " + d);
}
}
The printValueAndInHex
method is just a hex-printer helper.
The output is as follows:
403787ae147ae148: 23.53
4017851eb851eb85: 5.88
4031a3d70a3d70a4: 17.64
4047866666666666: 47.05
--------
403d68f5c28f5c29: 29.41
4047866666666666: 47.05
--------
404495c28f5c28f6: 41.17
4047866666666667: 47.050000000000004
The first 4 numbers are x
, y
, z
, and s
's hexadecimal representations. In IEEE floating point representation, bits 2-12 represent the binary exponent, that is, the scale of the number. (The first bit is the sign bit, and the remaining bits for the mantissa.) The exponent represented is actually the binary number minus 1023.
The exponents for the first 4 numbers are extracted:
sign|exponent
403 => 0|100 0000 0011| => 1027 - 1023 = 4
401 => 0|100 0000 0001| => 1025 - 1023 = 2
403 => 0|100 0000 0011| => 1027 - 1023 = 4
404 => 0|100 0000 0100| => 1028 - 1023 = 5
First set of additions
The second number (y
) is of smaller magnitude. When adding these two numbers to get x + y
, the last 2 bits of the second number (01
) are shifted out of range and do not figure into the calculation.
The second addition adds x + y
and z
and adds two numbers of the same scale.
Second set of additions
Here, x + z
occurs first. They are of the same scale, but they yield a number that is higher up in scale:
404 => 0|100 0000 0100| => 1028 - 1023 = 5
The second addition adds x + z
and y
, and now 3 bits are dropped from y
to add the numbers (101
). Here, there must be a round upwards, because the result is the next floating point number up: 4047866666666666
for the first set of additions vs. 4047866666666667
for the second set of additions. That error is significant enough to show in the printout of the total.
In conclusion, be careful when performing mathematical operations on IEEE numbers. Some representations are inexact, and they become even more inexact when the scales are different. Add and subtract numbers of similar scale if you can.
=)
+1 for your hex-printer helper... that's really neat!
Jon's answer is of course correct. In your case the error is no larger than the error you would accumulate doing any simple floating point operation. You've got a scenario where in one case you get zero error and in another you get a tiny error; that's not actually that interesting a scenario. A good question is: are there scenarios where changing the order of calculations goes from a tiny error to a (relatively) enormous error? The answer is unambiguously yes.
Consider for example:
x1 = (a - b) + (c - d) + (e - f) + (g - h);
vs
x2 = (a + c + e + g) - (b + d + f + h);
vs
x3 = a - b + c - d + e - f + g - h;
Obviously in exact arithmetic they would be the same. It is entertaining to try to find values for a, b, c, d, e, f, g, h such that the values of x1 and x2 and x3 differ by a large quantity. See if you can do so!
double d = double.MaxValue; Console.WriteLine(d + d - d - d); Console.WriteLine(d - d + d - d);
- the output is Infinity then 0.
This actually covers much more than just Java and Javascript, and would likely affect any programming language using floats or doubles.
In memory, floating points use a special format along the lines of IEEE 754 (the converter provides much better explanation than I can).
Anyways, here's the float converter.
http://www.h-schmidt.net/FloatConverter/
The thing about the order of operations is the "fineness" of the operation.
Your first line yields 29.41 from the first two values, which gives us 2^4 as the exponent.
Your second line yields 41.17 which gives us 2^5 as the exponent.
We're losing a significant figure by increasing the exponent, which is likely to change the outcome.
Try ticking the last bit on the far right on and off for 41.17 and you can see that something as "insignificant" as 1/2^23 of the exponent would be enough to cause this floating point difference.
Edit: For those of you who remember significant figures, this would fall under that category. 10^4 + 4999 with a significant figure of 1 is going to be 10^4. In this case, the significant figure is much smaller, but we can see the results with the .00000000004 attached to it.
Floating point numbers are represented using the IEEE 754 format, which provides a specific size of bits for the mantissa (significand). Unfortunately this gives you a specific number of 'fractional building blocks' to play with, and certain fractional values cannot be represented precisely.
What is happening in your case is that in the second case, the addition is probably running into some precision issue because of the order the additions are evaluated. I haven't calculated the values, but it could be for example that 23.53 + 17.64 cannot be precisely represented, while 23.53 + 5.88 can.
Unfortunately it is a known problem that you just have to deal with.
I believe it has to do with the order of evaulation. While the sum is naturally the same in a math world, in the binary world instead of A + B + C = D, it's
A + B = E
E + C = D(1)
So there's that secondary step where floating point numbers can get off.
When you change the order,
A + C = F
F + B = D(2)
To add a different angle to the other answers here, this SO answer shows that there are ways of doing floating-point math where all summation orders return exactly the same value at the bit level.
Success story sharing
May extend this later - out of time right now!
waiting eagerly for it @Jondouble
andfloat
, where for very large numbers, consecutive representable numbers are more than 1 apart.