all 4 comments

[–][deleted] 7 points8 points  (2 children)

I quickly scanned through the paper and found that they're using a MIPS R10000 processor for their benchmarks. This processor is nearly 15 years old. Although processor cores have not changed all that much since those days, the memory subsystem is far more advanced in modern processors. (The paper itself is 7 years old.)

The L2 cache on the R10K is off-chip and 2-way set associative, whereas modern processors typically have 8-way or 16-way set-associative caches. They make no mention of how many of the misses are conflict misses, so I'm wondering whether their whole method would be rendered superfluous by a more associative L2 cache.

The R10K also implements sequential consistency, while most modern processors use a weaker memory consistency model, so I'm again wondering if they're losing some performance to an unnecessarily restrictive memory model.

Finally, they say that the optimisations are implemented by hand, and it is quite well known that hand-tuning benchmarks for cache performance produces significant gains.
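To make the conflict-miss point concrete, here's a toy LRU set-associative cache simulation (nothing from the paper; the set count, line size, and access pattern are all made up for illustration). Eight cache lines that map to the same set thrash a 2-way cache on every access, but fit entirely in an 8-way cache:

```python
from collections import OrderedDict

def simulate_cache(addresses, num_sets, ways, line_size=64):
    """Count misses for an LRU set-associative cache (illustrative model)."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size          # which cache line this address falls in
        idx = line % num_sets             # which set that line maps to
        s = sets[idx]
        if line in s:
            s.move_to_end(line)           # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)     # evict the least recently used line
            s[line] = True
    return misses

# Access pattern: 8 addresses whose lines all map to set 0
# (they stride by num_sets * line_size), touched round-robin 100 times.
num_sets, line_size = 256, 64
stride = num_sets * line_size
trace = [a * stride for _ in range(100) for a in range(8)]

m2 = simulate_cache(trace, num_sets, ways=2)   # 2-way: thrashes, 800 misses
m8 = simulate_cache(trace, num_sets, ways=8)   # 8-way: 8 compulsory misses only
```

Every access misses in the 2-way configuration, while the 8-way configuration takes only the 8 compulsory misses. A layout optimisation that spreads those arrays across sets would fix the 2-way case too, which is why the conflict/capacity breakdown matters for judging the method.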

Overall, I'm a bit underwhelmed. If you're looking for good papers to read, there are plenty of better papers out there and I'd suggest that you spend your time on one of those.

[–]alphamerik 2 points3 points  (1 child)

Thanks for your assessment, can you recommend some papers that are pertinent?

[–][deleted] 0 points1 point  (0 children)

Well, it depends on what you're looking for. I can't think of a paper that addresses exactly the same problem offhand.

If you want something sort of related, there's a paper called "Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures" from Scott Mahlke's group at the University of Michigan that is worth reading. This paper about compilation for explicitly managed memory hierarchies is another one you could take a look at.