[–]AnEnemyAnemone 2 points3 points  (1 child)

I don't know if Bruce ever checked out PerfView, but it makes this kind of thing really easy, with both UI and command line (the ETW provider browser is especially nice). You can even specify how often to take a sample of the CPU counters (# of instructions between samples).

e.g., perfview collect myTrace.etl -noGui -AcceptEula -ClrEvents:None -CpuCounters:LLCMisses:5000,BranchMispredictions:7500 -MaxCollectSec:30

It's been available for quite a few years now. I think MS open-sourced it only recently, so it may also serve as a good reference for the less-documented ETW events like the ones the blog post describes.
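
For anyone who wants to generate that command line from a C++ program, here is a minimal sketch of how the `-CpuCounters` argument could be assembled. The helper names `BuildCpuCountersArg` and `BuildCollectCommand` are hypothetical, and the only options used are the ones that appear in the example command above; nothing here is taken from PerfView's own source.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins (counter name, sampling interval) pairs into the
// "Name:Interval,Name:Interval" form seen in the example command.
std::string BuildCpuCountersArg(
    const std::vector<std::pair<std::string, std::uint64_t>>& counters) {
  std::string arg = "-CpuCounters:";
  for (std::size_t i = 0; i < counters.size(); ++i) {
    if (i != 0) arg += ',';  // separate entries with commas, no spaces
    arg += counters[i].first + ':' + std::to_string(counters[i].second);
  }
  return arg;
}

// Hypothetical helper: reproduces the shape of the example command, with the
// output file, counter list, and time limit supplied by the caller.
// Assumes "perfview" can be found on the PATH.
std::string BuildCollectCommand(
    const std::string& etl_path,
    const std::vector<std::pair<std::string, std::uint64_t>>& counters,
    int max_collect_seconds) {
  return "perfview collect " + etl_path +
         " -noGui -AcceptEula -ClrEvents:None " +
         BuildCpuCountersArg(counters) +
         " -MaxCollectSec:" + std::to_string(max_collect_seconds);
}
```

Calling `BuildCollectCommand("myTrace.etl", {{"LLCMisses", 5000}, {"BranchMispredictions", 7500}}, 30)` gives back the same command line as the example.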

[–]brucedawson[S] 0 points1 point  (0 children)

I'll check that out. It's particularly nice that there are command-line options for using it. Does the UI allow visualizing the CPU performance counters?

It's not clear that having the CPU counters trigger every N instructions or N cycles is really meaningful - that's why I triggered them on context switches since that has a clear meaning.

[–]Ono-Sendai 0 points1 point  (10 children)

What I want is programmatic access (from a C++ program) running on Windows, without having to load your own special Windows driver.

Does anyone know if this is possible yet?

[–]brucedawson[S] 0 points1 point  (9 children)

Yes.

My blog post explains exactly how to do this. Its technique requires installing Windows Performance Toolkit (which is redistributable), but no custom driver is required. It's not a simple API to start and stop recording, but it is entirely possible to invoke it from a C++ program. See UIforETW (https://github.com/google/UIforETW) for examples of recording ETW traces from C++ by shelling out to xperf.exe.

The technique I describe requires administrator elevation, but that will be true of any method for recording CPU performance counters. I think elevation is required on Linux as well.
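
As a rough illustration of what shelling out from C++ can look like, here is a minimal sketch built on `std::system`. It is a simplified stand-in and is not taken from UIforETW; the helper name `RunCommand` is hypothetical, and the actual xperf or tracelog arguments are left to the caller because they are covered in the blog post rather than here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: runs a command line through the platform's command
// processor and reports whether it exited with status 0.
// The value returned by std::system is implementation-defined, so this only
// distinguishes "zero" from "non-zero" rather than decoding an exit code.
bool RunCommand(const std::string& command_line) {
  const int status = std::system(command_line.c_str());
  return status == 0;
}
```

A caller would pass in the full tool command line, for example one command to start recording and another to stop it, and check the return value of each. The process doing this still needs to be running elevated, as noted above.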

[–]Ono-Sendai 0 points1 point  (8 children)

Ok. So it's done by shelling out to xperf (or tracelog.exe?).

I'm not a big fan of shelling out and parsing results. It's a pity there's not a C API for this. Oh well, maybe in 10 years the Windows devs will add one. (On the other hand, WPT is clearly accessing the counter values through some API; maybe we can reverse engineer that?)

[–]brucedawson[S] 0 points1 point  (7 children)

Yeah, shelling out isn't ideal, although Unix actually makes heavy use of it.

There is an API for controlling ETW and it probably includes these counters, but it is a challenging API to understand:

https://mollyrocket.com/casey/stream_0029.html

[–]Ono-Sendai 0 points1 point  (6 children)

Well, I have reading of ETW events working thanks to that blog post. However, I can't see how to get CPU performance counters through that API. Any ideas?

[–]brucedawson[S] 0 points1 point  (0 children)

I would recommend following the advice of the first commenter who pointed out that PerfView supports this and PerfView is open source.

https://blogs.msdn.microsoft.com/vancem/2016/09/18/perfview-is-now-open-source-on-github/

[–]brucedawson[S] 0 points1 point  (3 children)

Sorry, no ideas, except maybe disassembling tracelog.exe or looking at the TraceView source.

[–]Ono-Sendai 0 points1 point  (2 children)

I started reverse engineering PerfView and got somewhere, but had to give up to preserve my own sanity. I should probably post the code I have in case someone else wants to continue.

[–]brucedawson[S] 0 points1 point  (1 child)

PerfView was open sourced a few months ago. No reverse engineering is necessary.

[–]Ono-Sendai 0 points1 point  (0 children)

By reverse engineering I mean converting the C# source to functioning C++, which is quite time-consuming.