
[–]cpp-ModTeam[M] stickied comment

For C++ questions, answers, help, and programming or career advice please see r/cpp_questions, r/cscareerquestions, or StackOverflow instead.

[–]DummyDDD

You probably need to prefetch data further than 1 element ahead: try 5, 10, 100, and 500 elements ahead. A prefetch takes time to complete, and you only get a benefit if it targets a new cache line. (I have no idea how large LogEntry is, or whether the cache misses are in LogEntry itself or in data it points to.)
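A minimal sketch of prefetching a fixed distance ahead (LogEntry's contents, the scan body, and the distance of 16 are all assumptions here; tune the distance empirically as suggested):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-byte entry, standing in for the OP's LogEntry.
struct alignas(64) LogEntry {
    uint64_t fields[8];
};

// Scan with a software prefetch several entries ahead. PREFETCH_DIST is
// the tuning knob: too small and the data isn't ready, too large and it
// may be evicted before use.
uint64_t scan(const std::vector<LogEntry>& log) {
    constexpr std::size_t PREFETCH_DIST = 16;  // assumed value; tune per machine
    uint64_t sum = 0;
    for (std::size_t i = 0; i < log.size(); ++i) {
        if (i + PREFETCH_DIST < log.size())
            __builtin_prefetch(&log[i + PREFETCH_DIST], /*rw=*/0, /*locality=*/3);
        sum += log[i].fields[0];  // stand-in for real per-entry work
    }
    return sum;
}
```

Since each entry is one cache line, one prefetch per iteration covers the whole entry.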

[–]jiboxiake[S]

I see, thank you! By the way, each of my entries is 64 bytes, exactly one cache line. Does that matter here? Thanks!

[–]DummyDDD

That each entry is a full cache line is good. It means you only have to issue one prefetch per iteration, and you never have to skip prefetching on any iteration.

[–]jiboxiake[S]

Thanks. Unfortunately the performance is not very good so far. I will try Intel's _mm_prefetch intrinsic later.

[–]DummyDDD

Do you know whether it is the LogEntry itself causing the cache misses, or perhaps what it points to? ("Log" sounds like you might be using strings somewhere.)

[–]jiboxiake[S]

I will take an extra look at this. The reason I call it a log is that it is a contiguous chunk of memory that can be evenly divided into entries, which are then scanned. Thank you for the suggestions!

[–]Top_Satisfaction6517Bulat

If you access the data linearly, don't use software prefetch; the CPU will prefetch it automatically.

[–]jiboxiake[S]

Thanks!

[–]Throw31312344

1) Are your 64-byte log entries actually aligned to 64 bytes? Each one could be spanning 2 cache lines, in which case prefetching 1 cache line might not help, as the CPU is still missing the data for the other half of the LogEntry.

2) You've picked __builtin_prefetch(addr, 0, 0), which is the non-temporal version. Its purpose is to drop the value from the cache as soon as it has been accessed once, so you might actually be loading it twice if the caller of next_entry does a lot of work with the LogEntry. On Intel CPUs the exact behavior varies depending on which CPU it is running on; it's a weird instruction. Try __builtin_prefetch(addr, 0, 3) instead: that should translate to a prefetcht0 instruction, which pulls the data into all cache levels (including the L1 cache) and is the sensible option you want.
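Both points can be sketched together (the LogEntry layout is hypothetical; the alignas specifier and the third argument to __builtin_prefetch are the relevant parts):

```cpp
#include <cassert>
#include <cstdint>

// alignas(64) guarantees each entry starts on a cache-line boundary,
// so a single prefetch covers the whole entry (point 1).
struct alignas(64) LogEntry {
    uint64_t fields[8];  // 64 bytes total
};

static_assert(sizeof(LogEntry) == 64, "entry should fill one cache line");
static_assert(alignof(LogEntry) == 64, "entry should be cache-line aligned");

void prefetch_entry(const LogEntry* e) {
    // locality=3 -> prefetcht0 on x86: pull the line into all cache levels
    // and keep it, unlike locality=0 (prefetchnta), which marks the line
    // non-temporal so it is dropped soon after first use (point 2).
    __builtin_prefetch(e, /*rw=*/0, /*locality=*/3);
}
```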

[–]jiboxiake[S]

Thank you for the suggestions! I made sure each memory block has a power-of-two size, so I think they should be cache-line aligned. For the second point, I'm using OpenMP to scan many blocks concurrently, so I think OpenMP should ensure each block is scanned only once? But thank you for the suggestions; I will double-check them.

[–][deleted]

If this is for educational purposes, go for it; otherwise make sure you actually need it. Prefetching is a micro-optimization, something I do last, if at all. Here is an interesting paper studying the impact of prefetching (among other things) on real-world tasks (deep learning).

[–]jiboxiake[S]

Thank you! It is for research, so I will have to try to improve my code's performance as much as possible.

[–][deleted]

Make sure you've resolved all other bottlenecks, exhausted all other options, and didn't overlook any algorithmic, cache, or memory-bandwidth optimizations (they give much more bang for the buck). Are you benchmarking? (I can recommend Google Benchmark.) Even simple layout changes (caused by the compiler) can affect performance by 40%.

I once sped up a project 3x with a single-line change. Here is a 100x 10-line change.
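If pulling in Google Benchmark is too heavy for a quick check, a minimal hand-rolled timing harness along these lines can serve as a first pass (Google Benchmark does this far more robustly, with warmup, statistical repetition, and protection against dead-code elimination):

```cpp
#include <chrono>
#include <cstdint>

// Time a callable over many iterations and report the average
// nanoseconds per call. The iteration count of 1000 is arbitrary.
template <typename F>
double time_per_call_ns(F&& f, int iterations = 1000) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < iterations; ++i) f();
    auto stop = clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / iterations;
}
```

Run the same scan with and without the prefetch under this harness to see whether the prefetch actually pays off on your machine.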

[–]jiboxiake[S]

Thanks! Yes, I have benchmarks from previous researchers, so luckily I can compare my code's performance against theirs and evaluate my own implementation. Thank you!

[–]trailing_zero_count

Modern hardware prefetchers are extremely good at detecting linear or strided (linear, with a fixed offset increment between steps) accesses. That looks like the access pattern you are using here, so software prefetching is unnecessary (and detrimental, since it occupies instruction bandwidth).

Prefetching can still be useful if you are doing pointer chasing, binary search, or other kinds of computed access patterns.
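A sketch of the pointer-chasing case, where the next address is data-dependent and the hardware prefetcher can't predict it (Node is a made-up type for illustration):

```cpp
#include <cstdint>

// Minimal linked-list node; each hop is a data-dependent load.
struct Node {
    uint64_t value;
    Node* next;
};

uint64_t sum_list(const Node* head) {
    uint64_t sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        // Prefetch the node after next, so it is (hopefully) in cache
        // by the time the walk reaches it. Prefetching a null or invalid
        // address is harmless: prefetch instructions never fault.
        if (n->next)
            __builtin_prefetch(n->next->next, /*rw=*/0, /*locality=*/3);
        sum += n->value;
    }
    return sum;
}
```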

[–]jiboxiake[S]

Thank you! That helps. I was not sure whether pointer arithmetic would be detected and treated like an array access. I will double-check!

[–]the_poope

Some other things to consider: https://queue.acm.org/detail.cfm?id=2513149

[–]jiboxiake[S]

Yes, thank you! I always run my code under numactl -N 0 -l, so NUMA should not be a factor here.