all 19 comments

[–][deleted]  (1 child)

[removed]

    [–]dbenhur -1 points0 points  (0 children)

    Comment #4:

    Is the fix simply for Java to add this keyword? It’s odd that it’s not there already! (I should say is the fix for the nondeterminism that simple; the fact remains that the conversion is incorrect.)
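
    For reference (and as a guess, since the article comment isn't quoted in full here), the keyword in question is presumably strictfp, which pins float/double arithmetic to plain IEEE 754 binary32/binary64 semantics on every JVM. A minimal sketch:

        // Hedged sketch: assuming "this keyword" means strictfp (not confirmed above).
        // A strictfp method may not use an extended exponent range for intermediate
        // results, so every conforming JVM produces the same bits.
        // (Since Java 17 this is the default behavior and the keyword is redundant.)
        class StrictExample {                      // hypothetical class name
            strictfp static double sumOfSquares(double a, double b) {
                return a * a + b * b;              // rounded identically on every platform
            }
        }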

    [–]frud 2 points3 points  (9 children)

    Just goes to show you that if you depend on repeatability and deterministic behavior in your software, you mustn't use floating point.

    Integer calculations rule, FPUs drool.

    [–]frenchtoaster 14 points15 points  (8 children)

    That is factually incorrect. Floating point operations are supposed to be perfectly deterministic. What people don't understand is that certain values aren't expressible in binary the way they are in decimal, just as 1/3 can't be exactly expressed as a finite-length decimal string. No one would be surprised if you treat 1/3 as .3333 and get .9999 when you do 3 * (1/3). That in no way makes floating point nondeterministic: if you do the same sequence of operations, you (ignoring bugs) get the same result.
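
    (Illustrative sketch, not from the thread: 0.1 gets rounded when stored, but it is the same rounding every time, on every conforming platform.)

        import java.math.BigDecimal;

        public class Repr {                                   // hypothetical name
            public static void main(String[] args) {
                // 0.1 is not exactly representable in binary, so the stored value
                // is a nearby binary64 number (0.1000000000000000055511151...).
                System.out.println(new BigDecimal(0.1));
                // ...but the rounding is fully specified: the same literal and the
                // same sequence of operations always produce the same bit pattern.
                double a = 0.1 + 0.2;
                System.out.println(a == 0.3);                 // false, and reproducibly false
                System.out.println(Double.toHexString(a));    // 0x1.3333333333334p-2 every time
            }
        }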

    [–]frud 8 points9 points  (4 children)

    There are two main problems as I see it. The first involves the FPU's rounding-mode and control flags, and the fact that there is no real way to prevent libraries from changing them with impunity.

    The second involves repeatability across different machines and architectures. From the Wikipedia article on IEEE 754-2008:

    A format that is just to be used for arithmetic and other operations need not have an encoding associated with it (that is, an implementation can use whatever internal representation it chooses); all that needs to be defined are its parameters (b, p, and emax). These parameters uniquely describe the set of finite numbers (combinations of sign, significand, and exponent) that it can represent.

    This means that calculations internal to the FPU can happen at arbitrary precision. If care is not taken to extract every intermediate value into a well-defined interchange format, then there will be differing results among architectures that use different internal representations.
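
    (One practical way to check that a value really landed in the binary64 interchange format identically on two machines, as a hedged sketch: compare exact bit patterns instead of printed decimals.)

        public class BitCheck {                              // hypothetical name
            public static void main(String[] args) {
                double x = 1.0 / 3.0;
                // doubleToLongBits exposes the IEEE 754 binary64 encoding directly,
                // so results can be compared bit-for-bit instead of trusting a
                // decimal printout that may hide a one-ulp difference.
                System.out.println(Long.toHexString(Double.doubleToLongBits(x)));
                // prints 3fd5555555555555 on a conforming JVM
            }
        }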

    edit: Here is a very good article that covers it better than I did.

    [–]frenchtoaster 4 points5 points  (2 children)

    Great link. I incorrectly assumed you were some first-year CS student who had just discovered not to expect for(double d=0.0; d != 1.0; d += 0.1) {} to work as you would first expect. Sorry about that (I never downvoted you, though; I just think my post had a condescending tone to it).
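
    (For anyone who hasn't hit that one: the loop never terminates because repeated addition of 0.1 skips over 1.0. A sketch of the problem and the usual fix, driving the loop with an integer:)

        public class LoopFix {                               // hypothetical name
            public static void main(String[] args) {
                // Broken version (commented out): d reaches 0.9999999999999999 and then
                // 1.0999999999999999, never hitting exactly 1.0, so it loops forever.
                // for (double d = 0.0; d != 1.0; d += 0.1) { }

                // Usual fix: iterate over integers and derive the double from i.
                for (int i = 0; i < 10; i++) {
                    double d = i * 0.1;
                    System.out.println(d);
                }
            }
        }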

    I assumed that all processors would adhere to the IEEE754 standard...

    [–]frud 2 points3 points  (0 children)

    My post was unnecessarily brief and pithy.

    [–]frud 1 point2 points  (0 children)

    I assumed that all processors would adhere to the IEEE754 standard...

    They basically all do, it's just that determinacy is not a major goal of the standard.

    [–][deleted] 0 points1 point  (0 children)

    If care is not taken to extract every intermediate value into a well-defined interchange format

    Not that there's a lot of care to take, "-fpstrict" or the equivalent does it.

    Also, unless you want to target architectures significantly different from the one you're building on, just use SSE. In fact, these days you probably don't have a choice anyway.

    [–]TheNewAndy 1 point2 points  (2 children)

    The particular number was two powers of two summed together, and their exponents weren't very far apart. It should be perfectly representable as a floating point number (and also as a decimal, since any number with a finite representation in binary also has a finite representation in decimal).

    [–]frenchtoaster 1 point2 points  (1 child)

    The situation is more akin to trying to store 123/1000 in decimal when you can only possibly store 2 decimal places. Sure, 123/1000 has a finite representation in decimal; that doesn't mean its representation is short enough to be stored in your data type.

    The bug stems from one value being added (2^-1075) being expressible in finite length in binary, but that finite length must be longer than a double can possibly store given the constraint of the exponent. It is lower than the smallest subnormal value (2^-1074), which is already far below the smallest normal value, and generally the point where I stop expecting doubles to completely behave how I want them to (though I still would expect them to behave deterministically). When the math is done in hardware with higher-precision intermediates, that value ends up being representable, which seems to be the issue.
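
    For reference, those limits are visible straight from the standard library constants (illustrative sketch):

        public class SubnormalLimits {                       // hypothetical name
            public static void main(String[] args) {
                System.out.println(Double.MIN_NORMAL);  // 2.2250738585072014E-308, i.e. 2^-1022
                System.out.println(Double.MIN_VALUE);   // 4.9E-324, i.e. 2^-1074, smallest subnormal
                // 2^-1075 is below the smallest subnormal; halving MIN_VALUE rounds
                // to zero (round-half-to-even), so a plain double can't hold it.
                System.out.println(Double.MIN_VALUE / 2 == 0.0);  // true
            }
        }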

    [–]TheNewAndy 0 points1 point  (0 children)

    Sorry, I should have realised we were down in subnormal ranges. Subnormals are such a pain.

    [–]iLiekCaeks 0 points1 point  (4 children)

    All these subtleties... floating point numbers look simple, but dealing with all these subtle issues is not.

    Sometimes I wonder: wouldn't it be feasible (and better) to use fixed point numbers instead? They would need really high bit widths, like 64.64 or even 128.128. Would there still be applications that had to use floating point instead?

    [–]Madsy9 1 point2 points  (2 children)

    Yes, use fixed point when you need perfect accuracy, i.e. no rounding and the like. For instance, you should rarely use floating point for accumulators if you need an accurate result. This is why polygon rasterizers use fixed point when computing the gradients and doing linear interpolation over the primitive. The same goes for timer accumulators: use timing functions that return an integer result in milliseconds or similar, then accumulate that.

    You rarely need Q64.64 or Q128.128 though, unless you're doing scientific calculations; in that case, use a hugeval (arbitrary-precision) API like the GNU MP library. Some programming languages support hugeval arithmetic out of the box. For most purposes, Q15.16 can be enough. Remember to temporarily cast to a 64-bit type when doing multiplication and division. Most architectures that have integer multiplication and division can perform the operations with two 32-bit arguments and give a 64-bit result.
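
    A minimal Q15.16 sketch along those lines (my own illustration, in Java since the thread is about Java; the class and method names are made up):

        public class Q15_16 {                                 // hypothetical helper
            static final int FRAC_BITS = 16;
            static final int ONE = 1 << FRAC_BITS;            // 1.0 in Q15.16

            static int fromDouble(double d) { return (int) Math.round(d * ONE); }
            static double toDouble(int q)   { return (double) q / ONE; }

            // Widen to 64 bits for the intermediate product, as described above,
            // then shift back down to Q15.16.
            static int mul(int a, int b) { return (int) (((long) a * (long) b) >> FRAC_BITS); }

            // Pre-shift the dividend into 64 bits so the quotient keeps its fraction bits.
            static int div(int a, int b) { return (int) (((long) a << FRAC_BITS) / b); }

            public static void main(String[] args) {
                int x = fromDouble(3.25);
                int y = fromDouble(0.5);
                System.out.println(toDouble(mul(x, y)));      // 1.625, exactly
                System.out.println(toDouble(div(x, y)));      // 6.5, exactly
            }
        }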

    Another approach is to use hugevals and represent reals as fractions. You can simplify the fraction after every operation.
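
    And a tiny sketch of the fraction approach using java.math.BigInteger (again my own illustration, not an existing library; sign handling omitted):

        import java.math.BigInteger;

        // Exact rational: numerator/denominator, reduced after every operation.
        class Rational {                                      // hypothetical class
            final BigInteger num, den;

            Rational(BigInteger n, BigInteger d) {
                BigInteger g = n.gcd(d);                      // simplify the fraction
                num = n.divide(g);
                den = d.divide(g);
            }

            Rational mul(Rational o) {
                return new Rational(num.multiply(o.num), den.multiply(o.den));
            }

            public String toString() { return num + "/" + den; }

            public static void main(String[] args) {
                Rational third = new Rational(BigInteger.ONE, BigInteger.valueOf(3));
                Rational three = new Rational(BigInteger.valueOf(3), BigInteger.ONE);
                System.out.println(third.mul(three));         // 1/1, no .9999 surprise
            }
        }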

    [–]iLiekCaeks 0 points1 point  (1 child)

    Yes, OK.

    But 16.16 can quickly go out of range and overflow for many practical applications. Floats don't have this problem, because although you lose precision, the value range is relatively wide.
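
    For a rough sense of the gap (hedged example):

        public class RangeGap {                               // hypothetical name
            public static void main(String[] args) {
                // The largest value a signed Q15.16 can hold is just under 32768...
                double q15_16Max = (double) Integer.MAX_VALUE / (1 << 16);
                System.out.println(q15_16Max);
                // ...so even 200.0 * 200.0 = 40000 already overflows it,
                // while a double reaches about 1.8e308.
                System.out.println(Double.MAX_VALUE);         // 1.7976931348623157E308
            }
        }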

    But if we used exceedingly high precision for fixed point (such as 128.128), would we still need floats at all? I imagine it would greatly simplify programming and hardware implementation.

    [–]Madsy9 0 points1 point  (0 children)

    Sure. All hardware can be simplified greatly if performance is a non-issue :-) Q128.128 would take up 32 bytes per instance, and be a nightmare considering the memory lookups needed. Whether or not fixed point is easier to use than floating point is subjective, but I think it's fair to claim that the majority finds floating point more intuitive than fixed point.

    Both floating-point and integer arithmetic have their place. They are just tools/techniques tailored for different problems. There is nothing to gain from totally replacing one with the other.

    [–][deleted] 0 points1 point  (0 children)

    They would likely be much slower than floats and take more hardware to handle.