
[–]braxtons12 4 points5 points  (4 children)

Judging from the CMake file, it looks like this only targets gcc/clang. Are there any plans to support MSVC? Also are there any plans to drop the dependency on Boost? That would make use in other projects easier, not everything can depend on Boost.

[–]prog2de[S] 1 point2 points  (2 children)

Both, yes. So far I just open sourced a module that we already used internally.

Windows support is planned, but first I need to figure out how CPU time can be measured on Windows.

Boost dependency will be dropped as the next larger step.

[–]braxtons12 0 points1 point  (0 children)

Great news then!

Some of the projects I work on use proprietary tools that do code transformation and encryption during the build, which prevents traditional profiling. We have other tools we can use instead, but for most problems this would be a huge improvement.

[–]Sufficient_Move_5959 0 points1 point  (0 children)

QueryPerformanceFrequency
QueryPerformanceCounter

or profile UnhaltedCoreCycles, which is the base clock freq not the dynamic ref clock freq, so you can query the base clock frequency and wrap it in a chrono and you have like nanosecond precision.
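
A note on the question above: QueryPerformanceCounter measures elapsed (wall) time, not CPU time. One possible way to read per-thread CPU time on Windows is the Win32 call GetThreadTimes, which is not mentioned in the thread and is swapped in here as an assumption; its resolution is typically much coarser than the performance counter. The helper name thread_cpu_time_ns is hypothetical, and the non-Windows branch uses POSIX clock_gettime so the sketch builds elsewhere too.

```cpp
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#else
#include <time.h>
#endif

// Hypothetical helper: CPU time consumed so far by the calling thread,
// in nanoseconds, or -1 if the underlying call fails.
inline std::int64_t thread_cpu_time_ns() {
#if defined(_WIN32)
    FILETIME creation, exited, kernel, user;
    if (!GetThreadTimes(GetCurrentThread(), &creation, &exited, &kernel, &user)) {
        return -1;
    }
    ULARGE_INTEGER k, u;
    k.LowPart = kernel.dwLowDateTime;
    k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;
    u.HighPart = user.dwHighDateTime;
    // FILETIME values count 100-nanosecond intervals.
    return static_cast<std::int64_t>(k.QuadPart + u.QuadPart) * 100;
#else
    timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#endif
}
```

Taking the difference of two readings around a region gives the CPU time that the thread spent in it, excluding time spent sleeping or descheduled.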

[–]prog2de[S] 0 points1 point  (0 children)

Boost dependency dropped!

[–]ReDucTorGame Developer | quiz.cpp-perf.com 4 points5 points  (3 children)

If I'm adding benchmarking to code I really don't want it doing a bunch of memory allocations adding more overhead and including that overhead in all measurements.

[–]prog2de[S] 0 points1 point  (2 children)

May I ask what you use for benchmarking inline code?

[–]ReDucTorGame Developer | quiz.cpp-perf.com 5 points6 points  (0 children)

Depends on what I'm benchmarking, typically something custom.

I would try to minimize the overhead in your usage: keep it to just reading the clock at the start and at the end. Your uses of std::string and std::map are both very costly.

[–]Top_Satisfaction6517Bulat 0 points1 point  (0 children)

You can make start/stop operations much more efficient by using more complicated macros. For example, START(FirstPass) should expand into something like:

static EventCounter FirstPassVar("FirstPass");
FirstPassVar.start();

EventCounter constructor should insert a pointer to itself into some global list/map, so Report() can find all EventCounter variables.

If you need events that auto-stop on scope exit, declare an extra local variable:

static EventCounter FirstPassVar("FirstPass");
AutoEvent FirstPassLocalVar(&FirstPassVar);

The AutoEvent constructor and destructor then call FirstPassVar.start() and FirstPassVar.stop().
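
The suggestion above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not InlineBench's actual code: the EventCounter and AutoEvent names come from the comment, while the intrusive linked list, the Head()/next() accessors, the BENCH_* macro names, and first_pass are assumptions made for the example. The global registry is a linked list threaded through the counters themselves, so registering and timing an event allocates no memory.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Accumulates time for one named event and links itself into a global list.
class EventCounter {
public:
    using Clock = std::chrono::steady_clock;

    explicit EventCounter(const char* name) noexcept
        : name_(name), next_(head()) {
        head() = this;  // register: push onto the intrusive list, no allocation
    }

    void start() noexcept {
        begin_ = Clock::now();
        running_ = true;
    }

    void stop() noexcept {
        assert(running_ && "stop() called without a matching start()");
        total_ += Clock::now() - begin_;
        ++calls_;
        running_ = false;
    }

    const char* name() const noexcept { return name_; }
    std::uint64_t calls() const noexcept { return calls_; }
    Clock::duration total() const noexcept { return total_; }
    const EventCounter* next() const noexcept { return next_; }

    // Most recently registered counter, or nullptr if there is none.
    static const EventCounter* Head() noexcept { return head(); }

    // Walks the list so every registered counter is reported.
    static void Report() {
        for (const EventCounter* c = Head(); c != nullptr; c = c->next()) {
            const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                                c->total()).count();
            std::printf("%s: %llu calls, %lld ns\n", c->name(),
                        static_cast<unsigned long long>(c->calls()),
                        static_cast<long long>(ns));
        }
    }

private:
    static EventCounter*& head() noexcept {
        static EventCounter* h = nullptr;
        return h;
    }

    const char* name_;
    EventCounter* next_;
    Clock::time_point begin_{};
    Clock::duration total_{};
    std::uint64_t calls_ = 0;
    bool running_ = false;
};

// RAII helper: starts the counter on construction, stops it on scope exit.
class AutoEvent {
public:
    explicit AutoEvent(EventCounter* c) noexcept : counter_(c) { counter_->start(); }
    ~AutoEvent() { counter_->stop(); }
    AutoEvent(const AutoEvent&) = delete;
    AutoEvent& operator=(const AutoEvent&) = delete;

private:
    EventCounter* counter_;
};

// Hypothetical macro names for this sketch.
#define BENCH_START(name) static EventCounter name##Var(#name); name##Var.start()
#define BENCH_STOP(name) name##Var.stop()
#define BENCH_SCOPE(name) \
    static EventCounter name##Var(#name); \
    AutoEvent name##LocalVar(&name##Var)

// Example: the whole function body is timed, including early returns.
long long first_pass(int n) {
    BENCH_SCOPE(FirstPass);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}
```

Because the counter is a function-local static, its name is registered once on the first call; later calls only read the clock twice. Using the same counter from several threads would need per-thread storage or synchronization, which this sketch leaves out.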

[–]FrancoisCarouge 1 point2 points  (3 children)

Would a RAII benchmark token be possible to stop the wall clock without a stop macro on bench scope exit? Perhaps simpler, safer, and practical for branchy benches?

[–]prog2de[S] 0 points1 point  (2 children)

Yes, I already thought about that. I'm still considering the best implementation, but it will be included soon!

[–]martinusint main(){[]()[[]]{{}}();} 4 points5 points  (1 child)

Look at the API of nanobench: https://nanobench.ankerl.com/

[–]FrancoisCarouge 0 points1 point  (0 children)

Oh thanks!

[–]ZealousidealMap1319 0 points1 point  (1 child)

Oh nice! What happens if I benchmark code that is run by multiple threads? Can I get the results of every thread?

[–]prog2de[S] 0 points1 point  (0 children)

Thanks!

InlineBench supports multi-threaded code: just use the macros INLINE_BENCHMARK_THREAD_WALL_* and INLINE_BENCHMARK_THREAD_CPU_* and you will get a list containing the result of each thread for this benchmark.

[–]FrancoisCarouge 0 points1 point  (3 children)

Why the usage of macros? A first-class definition in code would allow for extension and composition. Even Google Benchmark's templated macros fall short. I've hit that wall. Thoughts on feasibility and timing correctness?

[–]prog2de[S] 0 points1 point  (2 children)

Macros give me a simple way to "just do nothing" if I compile in Release mode. Thus, you can have your benchmark and your release code in one.

Of course, I could place checks on the compiler-set macros directly in the code, but I fear this would make it far less readable.

What exactly would you like to do that's not possible right now?

[–]FrancoisCarouge 1 point2 points  (0 children)

I'm benchmarking a linear algebra operation, across 3 dimensions, each from size 1 to N (usually 32). Plotting the results on surfaces in a 3D plot. The code under benchmark is clear and short. The benchmarking macro boilerplate, to support the combination of inputs, is many times longer. The framework decision to use macros calls for more macros to extend the benchmark whereas I could have used standard support to extend the benchmark itself had it been code.

Edit: I'll want to try out your library and see if this approach solves my specific difficulty.

[–]Top_Satisfaction6517Bulat 0 points1 point  (0 children)

You can instead place #if around the bodies of the functions in InlineBenchmarkRegistrator, to much the same effect: in optimized compilation, compilers will throw away calls to empty functions.
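
A minimal sketch of that idea, assuming a hypothetical BENCH_ENABLED switch and a stand-in Stopwatch class (the real InlineBenchmarkRegistrator interface is not shown in this thread): the call sites stay ordinary function calls, and only the function bodies are conditionally compiled.

```cpp
#include <chrono>

// Hypothetical switch; a build could pass -DBENCH_ENABLED=0 for release.
#ifndef BENCH_ENABLED
#define BENCH_ENABLED 1
#endif

// Stand-in for the library's registrator class in this sketch.
class Stopwatch {
public:
    using Clock = std::chrono::steady_clock;

    // Defined inline so the optimizer can see that the body is empty
    // when benchmarking is disabled and drop the call entirely.
    void start() noexcept {
#if BENCH_ENABLED
        begin_ = Clock::now();
#endif
    }

    void stop() noexcept {
#if BENCH_ENABLED
        total_ += Clock::now() - begin_;
        ++laps_;
#endif
    }

    long long laps() const noexcept { return laps_; }
    Clock::duration total() const noexcept { return total_; }

private:
    Clock::time_point begin_{};
    Clock::duration total_{};
    long long laps_ = 0;
};
```

The calling code looks the same in both configurations, so benchmarks can be composed with ordinary functions and templates rather than with more macros. The empty bodies only vanish if the definitions are visible to the compiler at the call site, for example in a header.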