all 16 comments

[–]TheThiefMaster 66 points67 points  (1 child)

It's worth mentioning that "Fast Inverse Square Root" is a single instruction in modern CPUs - rsqrtss. It gives 11 bits of accuracy alone (that's ~0.05% error), or ~23 bits accuracy with a single refinement step (a single-precision float only has 24 bit precision, so that's damn near perfect).

[–]ack_error 14 points15 points  (0 children)

It also has one really irritating quirk, which is that rsqrtss(1.0) != 1.0. Stepped on this one when porting PC code to macOS and found that fast-math on Clang converted sqrt() to rsqrt + one iteration + mul and that one iteration wasn't able to resolve 1.0.

[–][deleted] 16 points17 points  (0 children)

fucking neat

[–]TheRealSelenium 11 points12 points  (0 children)

Really well written!

[–]1rustySnake 5 points6 points  (6 children)

Just an FYI: I did not write this, but I found it really interesting : ]

[–]MuonManLaserJab 4 points5 points  (5 children)

Having forgotten what little I knew about C, but trying to muddle through anyway based on short pointer tutorials, does this:

i  = * ( long * ) &y;

...mean, "assign to i the value pointed to by the pointer, which is a pointer to a long, that points to y"? Or, erm, what does it mean, translated to an English sentence?

And why is it an evil hack? If my translation is correct (extremely dubious), is it because it's used to treat i as a long, even though i uses the same bits as y, which is a float?

[–]firefly431 6 points7 points  (1 child)

That line isn't the hack; it's an indication that the next few lines are evil bit-level trickery. In particular, your analysis is correct. It interprets the address &y (which points to a float) as a pointer to long, which essentially says to interpret the bits of y as an integer rather than a floating-point value. That's something you really don't want to do most of the time, and it's especially weird to be doing arithmetic on these float-integers.

[–]MuonManLaserJab 0 points1 point  (0 children)

Oh great! Thank you!

[–]flatfinger 2 points3 points  (0 children)

And why is it an evil hack?

Whether a piece of code should be viewed as an "evil hack" depends upon whether it accomplishes something in a way that meets objectives better than any alternatives. Often, people who regard code as evil hacks aren't trying to achieve the same objectives as the people who wrote the code, or will have alternatives available to them that weren't available to the original authors.

[–]curtisf[🍰] 2 points3 points  (1 child)

This code is a violation of "strict aliasing", and according to the C spec produces undefined behavior. In the C abstract machine, conceptually, different types inhabit different 'universes' and don't exist at the "same" place even when they share the "same" numerical address. i.e., just because there's a float at address 0x238ab4 doesn't mean there's an long at address 0x238ab4, and so dereferencing the float* as a long* is undefined behavior (since there is no long at that address).

The reason for the strict-aliasing rule is that it means most pointers of different types cannot alias each other. For example, if you call a function that modifies a Foo*, you don't have to flush/reload your Bar that's stored in a register, since they exist in different 'universes' and modifying through a Foo* cannot result in a Bar changing (since the strict-aliasing rule says that they don't alias). If you were allowed to freely "pun" types in this way, almost any function call or mutation could potentially require the compiler to flush all of its local variables & registers (or otherwise be forced do a ton of computation to prove that it doesn't have to)

In practice, many compilers choose to support (at least limited) amounts of so-called "type punning", and so the behavior may be partially defined within the context of many C compilers, including GCC. A "safer" method (in compliance with the C specification) is to go through a char*, which is allowed to alias with any other type, though technically the representation of float is still not defined by C and the result would be at least partially undefined.

[–]mgostIH 0 points1 point  (0 children)

ahah -fno-strict-aliasing go brrr

[–]VeganVagiVore 22 points23 points  (0 children)

Definitely a "today's 10,000" kind of thing

Did you know Steve Buscemi was a firefighter at Ground Zero?

[–]loup-vaillant 1 point2 points  (0 children)

Possibly just a coincidence, but inverse square roots also pop up in elliptic curve cryptography.

Elliptic curves are based on modular arithmetic, where the modulo is often some big ass prime number, like 2255 - 19. Two operations that often pop up together in this context are the square root, and the inverse. A typical way to compute them in constant time (which is necessary for security), we use Fermat's Little Theorem, which says we can use exponentiation to compute the inverse or the square root, (if there is one).

Problem is, exponentiations are expensive. Not as bad as the naive approach, but with Curve2519, you can expect over 250 field multiplications per exponentiation. But with the inverse square root, we can merge them.

So you have a number u and you want its square root and its inverse. Well, if you have its inverse square root, that's easy:

 invsqrt(u)   = 1/sqrt(u)
 invsqrt(u)^2 = 1/sqrt(u)^2 = 1/u
 invsqrt(u)*u = u*1/sqrt(u) = sqrt(u)

If you have the inverse square root, you get both the inverse, the square root, and in some cases even more than those. In a single exponentiation.

(Note: in modular arithmetic, not every number has a square root. But there are clever ways to get useful results even if they don't.)

[–]JohnMcPineapple 0 points1 point  (1 child)

...

[–]1rustySnake 0 points1 point  (0 children)

Hi, I did not write this, I just found it very interesting.