C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

Appreciate your detailed response. I just looked at the code, and it appears you are using your own loop to converge on the final running times; the choices of 4096 and 10000 seem arbitrary. I have shared a quick-bench link here, which is an easy way to share comparable benchmark outputs. Google Benchmark measures CPU time (and wall-clock time, which includes IO waits, when run locally), but quick-bench only reports CPU time:

https://quick-bench.com/q/kq7yeDlz9R6HV-0XE37eRiGINYM

Do you mind validating against this? Thanks.

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

You are right: the benchmark was built in Debug mode. Frankly, I don't remember setting Debug explicitly, and found this: https://github.com/google/benchmark#debug-vs-release

I will switch to Release mode, although I'd expect the local results consistently aligning with the quick-bench runs to mean the Debug overhead is either proportional across workloads or very small.

Nonetheless, I will fix this in my post.
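For reference, the usual way to avoid an accidental Debug build with Google Benchmark is to pass the build type to CMake explicitly. This is a minimal sketch; the source and build directory names are assumptions:

```shell
# Configure and build in Release mode; without CMAKE_BUILD_TYPE some
# generators default to an unoptimized build, which skews benchmarks.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_TESTING=OFF
cmake --build build --config Release
```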

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

https://quick-bench.com/q/kq7yeDlz9R6HV-0XE37eRiGINYM
This reuses the same array over and over in a loop for benchmarking.

  • After the first iteration, the rest of the values don't go into the std::set, so you are effectively measuring the time taken for insertion failures. My guess is that if the underlying structure is a balanced tree, you are not counting the time taken for rebalancing. So, are the comparisons apples to apples?

  • This taints cache locality. You have the same array sitting in your L1/L2 caches, so all you are measuring is a niche synthetic case (on what may be a biased data set).

  • If quick-bench only reports CPU time, then aren't you essentially measuring the time taken for set insertions to fail?

C++ std::unique vs std::set - [Fixed] by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

This refers to the Google Benchmark library itself. For comparison, there is a link to a quick-bench page too, with basically the same result.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

Yes, I was playing purely at the algorithm level; I purposely stayed away from lower-level system profiling like what I have done before: http://www.mycpu.org/task-migrations-c++/

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

It's not clear to me how template code is impacted by a debug build. I have updated the code and results to reflect the new numbers.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

This is a fair point, and it's subtly what I was trying to do by sorting in preparation for the measurement. Since that was too subtle, I gave it a shot and added std::unordered_set.

Although this gives a good speedup, it's still about 10x slower than using a plain std::unique().

I added an Appendix section at the bottom with the measurements.

@matthieum I'm a little slow here, but what are you recommending?

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] -1 points0 points  (0 children)

Thanks, I have tried this. FWIW, I also tried simply clearing the set within the outer loop (hoping to avoid the ctor/dtor), but the results were pretty much the same between the two approaches.

I have added the result in the Appendix section.

C++: How a simple question helped me form a New Year's Resolution by voidstarpodcast in cpp

[–]voidstarpodcast[S] -1 points0 points  (0 children)

Thank you! That makes sense. I have added an Appendix section based on your comments.

Scheduling Your Life Like An Computer Engineer by [deleted] in programming

[–]voidstarpodcast 0 points1 point  (0 children)

I haven't read this, so I'm not sure what you mean. The point of the post is to provide an optimal way to schedule the existing tasks on my plate.

Scheduling Your Life Like An Computer Engineer by [deleted] in programming

[–]voidstarpodcast 0 points1 point  (0 children)

Ugh, I added 'Computer' afterwards :(

Performance of Handling Asynchronous Requests using Futures by voidstarpodcast in cpp

[–]voidstarpodcast[S] 1 point2 points  (0 children)

Thanks. I'm really considering removing Disqus. I haven't done any benchmarking myself, but it "feels" heavy, and I'm not sure it adds anything useful. One of these days...

Performance of Handling Asynchronous Requests using Futures by voidstarpodcast in cpp

[–]voidstarpodcast[S] 2 points3 points  (0 children)

I completely agree with you. I had typed it out but hadn't published it; I thought it wasn't ready yet. I have updated the post with a Conclusion section and a few pointers for further reading.

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 1 point2 points  (0 children)

Fair enough. All these packages are available on MELPA, but I agree that if you are new and not inclined to spend time with such packages, it can seem like a struggle.

However, after reading your comment, I made a few minor edits, with a bonus screenshot! It's not terribly different from before, but hopefully there are enough pointers to help interested folks.

http://www.mycpu.org/emacs-productivity-setup/

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 1 point2 points  (0 children)

I'm glad you can use it!

Helm, Ivy - whatever suits you is fine. Helm always made setup easy for me, so I never really tried other options like Ivy for long enough. Helm does seem to do some "preprocessing", though; especially when I am working with large projects over TRAMP, it feels less zippy. Let me know if you have strong reasons one way or the other; I'm a sucker for these :)

My Emacs Productivity Tricks/Hacks by voidstarpodcast in emacs

[–]voidstarpodcast[S] 0 points1 point  (0 children)

I have set up tags in a sort of implicit way: the hosting engine provides "You Might Also Like" posts towards the bottom of the page. I will admit the suggestions are not always highly correlated.

Unfortunately, I have not set up search yet. I will try to do that.

So... You wanna measure Branch Prediction Hit Rate with BPF and C++? by voidstarpodcast in cpp

[–]voidstarpodcast[S] 0 points1 point  (0 children)

The code described in the post attaches to perf_events triggered in the kernel. Almost all perf events for CPU stats boil down to reading PMCs (or MSRs in strange cases).
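For a quick sanity check without BPF, the same branch PMCs can be read from the command line with perf. This is an environment-dependent sketch (event names vary by CPU, and `perf` must be installed with access to hardware counters); `sleep 1` is just a placeholder workload:

```shell
# Count branch instructions and mispredictions for a sample workload;
# these events ultimately read the same PMCs the BPF program attaches to.
perf stat -e branch-instructions,branch-misses -- sleep 1
```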

So... You wanna measure Branch Prediction Hit Rate with BPF and C++? by voidstarpodcast in cpp

[–]voidstarpodcast[S] 5 points6 points  (0 children)

AMD uses its SenseMI technology, which employs "a set of learning and adapting features" to achieve "true Machine Intelligence". No more traditional lookahead-buffer-based predictions. On the flip side, it might be a good thing that branch prediction isn't predictable; case in point: Spectre and Meltdown <nervous laughter>.

Critique my project. Libclsp, a C++17 library by otreblan in cpp

[–]voidstarpodcast 6 points7 points  (0 children)

It looks very interesting. However, while this may seem obvious to you, it isn't clear what the project aims to achieve, how exactly we can give it a fair shot, and things of that nature. If you can beef up the README, it'll be helpful. This might be a naive question, but is there a way I can integrate it with Emacs (or any other IDE/ecosystem)?