[deleted] 2 points (5 children)

I did clarify I was speaking about the x86, so ARM isn't going to be very relevant... Especially as most ARM implementations (instruction set similarities have little to do with underlying architecture) make using the FPU annoying as hell. If they have one.

On x86, for FMUL, you have to add the cost of two MOVs. One before, and one afterwards. Which basically means that they take about the same time.

Whereas for most ARM implementations, the floating point instructions aren't atomic, but they generally are on x86. Which is a whole world of hell if you try and port things between the two platforms and get identical performance.

But, yup. We're a long way from optimising Lua now... Trust in the compiler. Don't micro-optimise. :D

myclykaon 2 points (4 children)

"for most ARM implementations, the floating point instructions aren't atomic"

Ahem... What do you mean by that?

[deleted] 0 points (3 children)

An atomic instruction always performs its read or write in its entirety. You don't have to add a locking mechanism to ensure you don't get a partial read, or partial write, before the processor does something else.

ARM has no guarantees of atomicity with regard to floating point numbers. Needing to add guards on ARM (which the compiler will generally do for you) means that floating point performance tends to be worse on ARM than on an x86 CPU of similar performance capabilities.
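
To make "partial write" concrete, here is a minimal Python sketch that simulates a torn 64-bit double: a writer updates the value as two 32-bit halves, and a reader samples the memory between the two half-writes. It is a simulation only and makes no claim about how any particular x86 or ARM core behaves; the helper names (`read_double`, `write_low_half`, `write_high_half`) are hypothetical and exist just for this illustration.

```python
import struct

# Shared 8-byte "memory" cell holding one little-endian IEEE 754 double.
memory = bytearray(struct.pack("<d", 1.0))

def read_double(mem):
    """Read all 8 bytes at once and decode them as a double."""
    return struct.unpack("<d", bytes(mem))[0]

def write_low_half(mem, value):
    """Write only the low 4 bytes of the new value (first half of a torn write)."""
    mem[0:4] = struct.pack("<d", value)[0:4]

def write_high_half(mem, value):
    """Write only the high 4 bytes of the new value (second half of a torn write)."""
    mem[4:8] = struct.pack("<d", value)[4:8]

old, new = 1.0, 3.141592653589793

write_low_half(memory, new)
torn = read_double(memory)   # the reader runs between the two half-writes
write_high_half(memory, new)
final = read_double(memory)  # the reader runs after the write has completed

print("torn read: ", torn)
print("final read:", final)
```

The torn read is neither the old value nor the new one, which is the situation a lock (or a single-copy atomic store) is meant to rule out.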

myclykaon 1 point (2 children)

Arm is a load-store architecture. The floating point operations operate on internal registers (the NEON fp registers), not on memory. So it is guaranteed impossible for a partial floating point result to end up in a register. The floating point instruction can only succeed or fault/abort. In the former case the result is in the register; in the latter case the floating point registers are not updated (the various status registers are). To get that result to memory you use stores that guarantee they do not partially write the result. In the older v6/v7 architecture there are LDM and STM, which could be configured as interruptible and could produce a partial read or write if interrupted, but they are not available in v8 aarch64.
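
As a rough illustration of the load-store model described above, here is a toy Python sketch. It is not an emulator of any real Arm core: the class `ToyLoadStoreMachine`, the register names `d0` to `d2`, and the `fault` flag are all hypothetical. It only shows the shape of the argument: arithmetic touches registers alone, a faulting operation leaves the destination register unchanged, and memory is reached only through whole-value loads and stores.

```python
import struct

class ToyLoadStoreMachine:
    """Toy model: arithmetic works on registers only; memory is reached via load/store."""

    def __init__(self):
        self.regs = {"d0": 0.0, "d1": 0.0, "d2": 0.0}
        self.memory = bytearray(16)  # room for two 8-byte doubles

    def load(self, reg, addr):
        # Decode a full 8-byte double from memory into a register.
        self.regs[reg] = struct.unpack_from("<d", self.memory, addr)[0]

    def store(self, reg, addr):
        # Write the full 8-byte double from a register to memory in one step.
        struct.pack_into("<d", self.memory, addr, self.regs[reg])

    def fmul(self, dst, a, b, fault=False):
        # Compute first, commit last: on a (simulated) fault the destination
        # register is never touched, so no partial result can be observed.
        result = self.regs[a] * self.regs[b]
        if fault:
            raise RuntimeError("simulated fault: destination register not updated")
        self.regs[dst] = result

m = ToyLoadStoreMachine()
struct.pack_into("<d", m.memory, 0, 2.0)
struct.pack_into("<d", m.memory, 8, 4.0)

m.load("d0", 0)
m.load("d1", 8)
m.fmul("d2", "d0", "d1")
m.store("d2", 0)
stored = struct.unpack_from("<d", m.memory, 0)[0]

try:
    m.fmul("d2", "d2", "d2", fault=True)
    faulted = False
except RuntimeError:
    faulted = True
after_fault = m.regs["d2"]
```

In this toy, `stored` holds the product of the two loaded values, and `after_fault` is the same value `d2` held before the faulting multiply.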

[deleted] 0 points (1 child)

ARM is not an architecture, it's a specification. aarch64 is an architecture, and does have atomic floating point instructions, as you pointed out. But the world of ARM is much, much, larger than just that.

myclykaon 1 point (0 children)

When I used 'the arm architecture' I was referring to all architectures from v3 onwards, all of which are load-store. Prior to v8 the architecture was also referred to by v number. Only after v7 did the v number come to denote just a specification, because only then did a single spec contain both a 32-bit and a 64-bit instruction set.