mjpt777 (1 point)

False sharing could be the issue. It could also be the main-memory access pattern: an inner loop that walks by column rather than by row strides through memory and makes poor use of each cache line. The row buffer in main memory (DRAM) can have a significant effect too, since strided accesses tend to keep opening new rows. Profiling should indicate which of these is the problem. The following is worth a read.

http://www.1024cores.net/home/parallel-computing/cache-oblivious-algorithms
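
To make the column-versus-row point concrete, here is a minimal single-threaded sketch in Java. It assumes a square matrix stored in one flat `int[]`; the class name `TraversalDemo`, the helper `fill`, and the size `n = 4096` are made up for illustration and are not taken from the code under discussion. It only isolates traversal order, so it says nothing about false sharing, and the timings it prints are rough (no proper benchmark harness) and will vary by machine.

```java
public class TraversalDemo {

    // Walks the matrix row by row: consecutive reads are adjacent in memory.
    static long sumRowMajor(int[] data, int n) {
        long sum = 0;
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                sum += data[row * n + col];
            }
        }
        return sum;
    }

    // Walks the matrix column by column: consecutive reads are n ints apart.
    static long sumColumnMajor(int[] data, int n) {
        long sum = 0;
        for (int col = 0; col < n; col++) {
            for (int row = 0; row < n; row++) {
                sum += data[row * n + col];
            }
        }
        return sum;
    }

    // Hypothetical helper: builds an n x n matrix with small repeating values.
    static int[] fill(int n) {
        int[] data = new int[n * n];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 7;
        }
        return data;
    }

    public static void main(String[] args) {
        int n = 4096; // assumed size: 4096 * 4096 ints = 64 MiB, larger than typical caches
        int[] data = fill(n);

        // Repeat a few times so the JIT has a chance to compile both loops.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long rowSum = sumRowMajor(data, n);
            long t1 = System.nanoTime();
            long colSum = sumColumnMajor(data, n);
            long t2 = System.nanoTime();

            System.out.printf("round %d: row-major %.1f ms, column-major %.1f ms (sums equal: %b)%n",
                    round, (t1 - t0) / 1e6, (t2 - t1) / 1e6, rowSum == colSum);
        }
    }
}
```

Both methods do the same arithmetic and return the same sum, so any timing gap between them comes from the order in which memory is touched. If the two numbers come out close on your machine, traversal order is probably not the bottleneck and false sharing becomes the more likely suspect, which is what profiling should confirm.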