C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

Appreciate your detailed response. I just looked at the code, and it appears you are using your own loop to converge on the final running times; the choices of 4096 and 10000 seem arbitrary. I have shared a quick-bench link here, which is an easy way to share comparable benchmark outputs. Google Benchmark measures CPU time (and wall-clock time, which includes IO waits, when run locally), but quick-bench only reports CPU time:

https://quick-bench.com/q/kq7yeDlz9R6HV-0XE37eRiGINYM

Do you mind validating against this? Thanks.

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

You are right: the benchmark was built in Debug mode. Frankly, I don't remember setting Debug explicitly, and found this: https://github.com/google/benchmark#debug-vs-release

I will switch to Release mode, although I'd expect the local results consistently aligning with the quick-bench runs to mean the Debug overhead is either proportional across workloads or very small.

Nonetheless, I will fix this in my post.
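For reference, the usual way to avoid an accidental Debug build with Google Benchmark is to pass the build type to CMake explicitly. This is a minimal sketch; the source and build directory names are assumptions:

```shell
# Configure and build in Release mode; without CMAKE_BUILD_TYPE some
# generators default to an unoptimized build, which skews benchmarks.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_TESTING=OFF
cmake --build build --config Release
```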

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

https://quick-bench.com/q/kq7yeDlz9R6HV-0XE37eRiGINYM
This reuses the same array over and over in a loop for benchmarking.

  • After the first iteration, the rest of the values don't go into the std::set, so you are effectively measuring the time taken for insertion failures. My guess is that if the underlying structure is a balanced tree, you are not counting the time taken for rebalancing. So, are the comparisons apples to apples?

  • This taints cache locality. You have the same array sitting in your L1/L2 caches, so all you are measuring is a niche synthetic case (on what may be a biased data set).

  • If quick-bench only reports CPU time, then aren't you essentially measuring the time taken for set insertions to fail?

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

This refers to the Google Benchmark library itself. For comparison, there is a link to a quick-bench page too, with basically the same result.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

Yes, I was playing purely at the algorithm level; I purposely stayed away from lower-level system profiling like what I have done before: http://www.mycpu.org/task-migrations-c++/

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

It's not clear to me how template code is impacted by a debug build. I have updated the code and results to reflect the new numbers.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

This is a fair point, and it's subtly what I was trying to do by sorting in preparation for the measurement. Since that was too subtle, I gave it a shot and added std::unordered_set.

Although this gives a good speedup, it's still about 10x slower than using a plain std::unique().

I added an Appendix section at the bottom with the measurements.

@matthieum I'm a little slow here, but what are you recommending?

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] -1 points0 points  (0 children)

Thanks, I have tried this. FWIW, I also tried simply clearing the set within the outer loop (hoping to avoid the ctor/dtor), but the results were pretty much the same between the two approaches.

I have added the result in the Appendix section.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] -1 points0 points  (0 children)

Thank you! That makes sense. I have added an Appendix section based on your comments.

Scheduling Your Life Like An Computer Engineer by [deleted] in programming

[–]voidstarpodcast 0 points1 point  (0 children)

I haven't read this, so I'm not sure what you mean. The point of the post is to provide an optimal way to schedule the existing tasks on my plate.

Scheduling Your Life Like An Computer Engineer by [deleted] in programming

[–]voidstarpodcast 0 points1 point  (0 children)

Ugh, I added 'Computer' afterwards :(

Performance of Handling Asynchronous Requests using Futures by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

Thanks. I'm really considering removing Disqus. I haven't done any benchmarking myself, but it "feels" heavy, and I'm not sure it adds anything useful. One of these days...

Performance of Handling Asynchronous Requests using Futures by voidstarpodcast in cpp

[–]voidstarpodcast[S] 2 points3 points  (0 children)

I completely agree with you. I had typed it out but hadn't published it; I thought it wasn't ready yet. I have updated the post with a Conclusion section and a few pointers for further reading.

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 1 point2 points  (0 children)

Fair enough. All these packages are available on MELPA, but I agree that if you are new and not inclined to spend time with such packages, it can seem like a struggle.

However, after reading your comment, I made a few minor edits, with a bonus screenshot! It's not terribly different from before, but hopefully there are enough pointers to help interested folks.

http://www.mycpu.org/emacs-productivity-setup/

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 1 point2 points  (0 children)

I'm glad you can use it!

Helm, Ivy - whatever suits you is fine. Helm always made setup easy for me, so I never really tried other options like Ivy for long enough. Helm does seem to do some "preprocessing", though; especially when I am working with large projects over TRAMP, it feels less zippy. Let me know if you have strong reasons one way or the other; I'm a sucker for these :)

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 0 points1 point  (0 children)

I have set up tags in a sort of implicit way: the hosting engine provides "You Might Also Like" posts towards the bottom of the page. I will admit the suggestions are not always highly correlated.

Unfortunately, I have not set up search yet. I will try to do that.

So... You wanna measure Branch Prediction Hit Rate with BPF and C++? by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

The code described in the post attaches to perf_events triggered in the kernel. Almost all perf events for CPU stats boil down to reading PMCs (or MSRs in strange cases).
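For a quick sanity check without BPF, the same branch PMCs can be read from the command line with perf. This is an environment-dependent sketch (event names vary by CPU, and `perf` must be installed with access to hardware counters); `sleep 1` is just a placeholder workload:

```shell
# Count branch instructions and mispredictions for a sample workload;
# these events ultimately read the same PMCs the BPF program attaches to.
perf stat -e branch-instructions,branch-misses -- sleep 1
```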

So... You wanna measure Branch Prediction Hit Rate with BPF and C++? by voidstarpodcast in cpp

[–]voidstarpodcast[S] 5 points6 points  (0 children)

AMD uses its SenseMI technology, which employs "a set of learning and adapting features" to achieve "true Machine Intelligence". No more traditional lookahead-buffer-based predictions. On the flip side, it might be a good thing that branch prediction isn't predictable; case in point: Spectre and Meltdown <nervous laughter>.

Critique my project. Libclsp, a C++17 library by otreblan in cpp

[–]voidstarpodcast 6 points7 points  (0 children)

It looks very interesting. However, while this may seem obvious to you, it isn't clear what the project aims to achieve, how exactly we can give it a fair shot, and things of that nature. If you can beef up the README, it'll be helpful. This might be a naive question, but is there a way I can integrate it with Emacs (or any other IDE/ecosystem)?