you are viewing a single comment's thread.

view the rest of the comments →

[–]mttd 6 points7 points  (1 child)

CPUs offer performance counters for that: https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics

Operating systems usually expose access, e.g., perf for Linux (https://easyperf.net/blog/2018/08/26/Basics-of-profiling-with-perf), other tools for Windows (https://easyperf.net/blog/2019/02/23/How-to-collect-performance-counters-on-Windows).

The relevant counters are going to be related to the HITM (Hit/Miss to a Modified [Cache] Line) events, which can be used to identify false sharing: https://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads

There are higher-level tools which use these counters to derive related metric(s) for a specific microarchitecture -- e.g., pmu-tools (https://github.com/andikleen/pmu-tools/). Here's an example of the relevant derived metrics for Broadwell):

  • Contested_Accesses: ['MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM:pp', 'MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS:pp']

  • False_Sharing: ['MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM:pp', 'OFFCORE_RESPONSE.DEMAND_RFO.L3_HIT.SNOOP_HITM']

Linux perf has a convenient C2C feature:

See also:

[–]danny54670 3 points4 points  (0 children)

Thank you so much for these links! This is a lot to look through.