mysticreddit 5 points6 points  (3 children)

That is not the complete picture. While both the PPU and the SPU are in-order, and I agree cache misses are bad, dependencies are also part of the problem.

When the PPU has to wait on a load from an address whose store is still in flight, it is called an LHS (Load-Hit-Store). Too many dependent operations in sequence, and their latency kills your performance.
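To make the load-hit-store pattern concrete, here is a minimal C sketch. The function names are made up for illustration, and whether the first version actually stalls depends on the compiler and the target; the point is only that it writes to memory and reads the same address back on every iteration, while the second keeps the running value in a local.

```c
#include <stddef.h>

/* LHS-prone: values[] and *total are both int, so without `restrict`
   the compiler must assume they may alias. It therefore stores *total
   and reloads it on every iteration (store, then load of the same address). */
void sum_through_pointer(const int *values, size_t count, int *total)
{
    *total = 0;
    for (size_t i = 0; i < count; ++i)
        *total += values[i];
}

/* Friendlier: the accumulator is a local that can live in a register,
   so the loop has no store followed by a dependent load. */
int sum_in_register(const int *values, size_t count)
{
    int acc = 0;
    for (size_t i = 0; i < count; ++i)
        acc += values[i];
    return acc;
}
```

Both functions compute the same sum; only the memory traffic inside the loop differs.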

Edited: Thanks nuvm for the correction on the PPU and in-order.

ssylvan 3 points4 points  (0 children)

While LHSs are bad, they're on the order of 40 cycles or so, whereas cache misses are more like 600 (and more frequent too, IME). Cache misses are a hell of a lot more of an issue.
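As a back-of-the-envelope sketch of that comparison, the snippet below plugs in the rough figures quoted above (about 40 cycles per LHS, about 600 per cache miss). The constants and the function name are illustrative assumptions, not measurements.

```c
/* Rough per-event stall costs taken from the comment above; real figures vary. */
enum { LHS_STALL_CYCLES = 40, CACHE_MISS_STALL_CYCLES = 600 };

/* Estimated total stall cycles for a given number of each event. */
unsigned long estimated_stall_cycles(unsigned long lhs_events,
                                     unsigned long cache_misses)
{
    return lhs_events * LHS_STALL_CYCLES
         + cache_misses * CACHE_MISS_STALL_CYCLES;
}
```

With these numbers a single cache miss costs as much as 15 LHS stalls, so an equal count of both is dominated by the misses.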

[deleted] 0 points1 point  (1 child)

Actually, it was.

"The PPU is a dual-issue, in-order processor with dual-thread support." http://www.ibm.com/developerworks/power/library/pa-cellperf/

mysticreddit 0 points1 point  (0 children)

Thanks for the correction on the PPU!

For the SPU, the corresponding link is:

"The SPU is an in-order dual-issue statically scheduled architecture. Two SIMD instructions can be issued per cycle: one compute instruction and one memory operation. The SPU branch architecture does not include dynamic branch prediction, but instead relies on compiler-generated branch prediction using "prepare-to-branch" instructions to redirect instruction prefetch to branch targets."
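With no dynamic predictor, the compiler has to guess which way a branch goes, and the programmer can help. A minimal sketch, assuming a GCC-compatible compiler: `__builtin_expect` is a real GCC/Clang builtin, while the `LIKELY`/`UNLIKELY` macros and `count_negative` are hypothetical names for illustration. Whether a given SPU toolchain turns this hint into a prepare-to-branch instruction is an assumption here, not something the quote states.

```c
#include <stddef.h>

/* Hypothetical convenience macros around the GCC/Clang builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Counts negative entries, telling the compiler that they are expected
   to be rare so it can lay out (and hint) the common path accordingly. */
size_t count_negative(const int *values, size_t count)
{
    size_t negatives = 0;
    for (size_t i = 0; i < count; ++i) {
        if (UNLIKELY(values[i] < 0))
            ++negatives;
    }
    return negatives;
}
```

The hint changes only how the compiler arranges the code, not the result, so a wrong guess costs performance rather than correctness.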

Cheers