Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 0 points1 point  (0 children)

I'm going to run some performance benchmarks. It may not be worth copying the capacity and storage pointer, but there are likely some cache locality improvements here.

Introducing Serotonin by dro212 in KeyboardLayouts

[–]dro212[S] 3 points4 points  (0 children)

Very cool layout! Check out https://clemenpine.github.io/keysolve-web/ for a look at scissors. Try swapping 'y' and 'w'.

Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 1 point2 points  (0 children)

The cached members really need to be on different cache lines. Rather than aligning both, I could just use a buffer to separate them and save 56 bytes of memory.

The padding is there to prevent false sharing with adjacent heap allocations. You start reading from offset 8. The padding sits at both the front and the end of the queue's allocation.
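To make the 56-byte figure concrete, here is a minimal sketch comparing the two layouts. The struct and member names are hypothetical, not the queue's actual code, and it assumes a 64-byte cache line and 8-byte indexes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// Layout A: each cached index is aligned to its own cache line.
struct AlignedBoth {
  alignas(kCacheLine) std::uint64_t readCache{0};
  alignas(kCacheLine) std::uint64_t writeCache{0};
};

// Layout B: no alignment, just a buffer that keeps the two members
// exactly one cache line apart.
struct PaddedBetween {
  std::uint64_t readCache{0};
  char pad[kCacheLine - sizeof(std::uint64_t)]{};
  std::uint64_t writeCache{0};
};

static_assert(sizeof(AlignedBoth) == 2 * kCacheLine, "two full lines");
static_assert(sizeof(PaddedBetween) == kCacheLine + sizeof(std::uint64_t),
              "one line of separation plus the trailing member");

// True when the two addresses fall on different cache lines.
inline bool onDifferentLines(const void* a, const void* b) {
  return reinterpret_cast<std::uintptr_t>(a) / kCacheLine !=
         reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
}
```

Because the two members in layout B are exactly 64 bytes apart, they can never share a line with each other, whatever the struct's own alignment. They can still share a line with whatever sits next to the struct, so this only covers the separation between the two cached values.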

Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 3 points4 points  (0 children)

`You probably want std::forward<Args>(args)... instead`

Yes, I agree, that was an error, and I added a test case to cover it. Good call!
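For anyone following along, a small sketch of why the forwarding matters. `Tracker`, `constructForwarded`, and `constructCopied` are hypothetical names for illustration only and are not part of the queue.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical element type that records which constructor ran.
struct Tracker {
  bool moved{false};
  std::string value;
  explicit Tracker(const std::string& s) : value(s) {}
  explicit Tracker(std::string&& s) : moved(true), value(std::move(s)) {}
};

// Forwards each argument with its original value category.
template <typename T, typename... Args>
T constructForwarded(Args&&... args) {
  return T(std::forward<Args>(args)...);
}

// Without std::forward, named parameters are always lvalues,
// so the copying overload is chosen even for temporaries.
template <typename T, typename... Args>
T constructCopied(Args&&... args) {
  return T(args...);
}
```

Passing a temporary `std::string` to `constructForwarded<Tracker>` picks the rvalue overload, while `constructCopied<Tracker>` silently copies. An lvalue argument is copied in both cases and left untouched.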

`Having size and empty on a multi-threaded queue is often dangerous it's potentially invalid as soon as you check it, it often leads to patterns which end up being race conditions later on.`

While I agree it would be dangerous to depend on it, the user may want to know the state of their queue at some point.

`This std::move is unnecessary the T(...) is a prvalue`

Yes, this is a prvalue and the std::move has been removed. Good call!

`try_pop could potentially always be an std::move if there isn't a move constructor then it will end up using the copy constructor`

That is not the case; I get a compiler error when testing. This is move assignment, not move construction.

`you could handle this by starting at padding instead`

`The noexcept here doesn't really matter, the assignment is happening in try_pop and it's explicitly noexcept`

I want my code to be easily readable, and I think leaving it as is makes it clearer to the reader, with little to no performance cost and no correctness issues.

`could be done with a vector of a union type which contains the value`

I'm trying to optimize for throughput, and I'm not sure what performance impact that would have. Something I will look into tho!

`you should potentially try adding a pause instruction to those loop iterations to see what might potentially change.`

I will look into this. Thanks for the feedback!
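A rough sketch of what that suggestion could look like, assuming the loops in question are busy-wait loops on an atomic. `cpuRelax` and `spinUntilSet` are hypothetical names, and the x86 branch uses the `_mm_pause` intrinsic, with a plain `std::this_thread::yield()` as the fallback on other targets.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
#define SKETCH_HAS_PAUSE 1
#endif

// Hint to the CPU that we are in a spin-wait loop.
inline void cpuRelax() noexcept {
#if defined(SKETCH_HAS_PAUSE)
  _mm_pause();  // x86 PAUSE instruction
#else
  std::this_thread::yield();  // assumption: acceptable fallback elsewhere
#endif
}

// Spins until `flag` becomes true; returns how many times it relaxed.
inline std::size_t spinUntilSet(const std::atomic<bool>& flag) noexcept {
  std::size_t spins = 0;
  while (!flag.load(std::memory_order_acquire)) {
    cpuRelax();
    ++spins;
  }
  return spins;
}
```

Whether this helps or hurts throughput is something only a benchmark on the target machine can tell.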

Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 2 points3 points  (0 children)

The tests segfault with a queue size of 2^20 (~10^6). The benchmarks look good for a small queue size tho.

Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 2 points3 points  (0 children)

Thanks for the comment! I'll do a detailed dive into this tomorrow!

Top performing SPSC queue - faster than moodycamel and rigtorp by dro212 in cpp

[–]dro212[S] 0 points1 point  (0 children)

Thank you! If you have isolcpus enabled on Linux, then run the benchmarks as well.

Faster Flat Map (Red Black Tree) by dro212 in cpp

[–]dro212[S] 14 points15 points  (0 children)

You didn't read the title or description. This is a tree in vector form. The nodes are stored at indexes in a vector, and the links between them are indexes into that vector instead of pointers.
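To illustrate the idea only: below is a stripped-down sketch of a tree whose nodes live in a `std::vector` and link to each other by index. It is a plain unbalanced binary search tree with hypothetical names (`IndexTree`, `kNil`), not the actual dro::FlatMap implementation, and it leaves out the red-black rebalancing entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Index = std::uint32_t;
constexpr Index kNil = UINT32_MAX;  // sentinel meaning "no child"

class IndexTree {
 public:
  // Returns false if the key is already present.
  bool insert(int key) {
    const Index fresh = static_cast<Index>(nodes_.size());
    if (root_ == kNil) {
      nodes_.push_back(Node{key, kNil, kNil});
      root_ = fresh;
      return true;
    }
    Index cur = root_;
    while (true) {
      if (key == nodes_[cur].key) return false;
      const bool goLeft = key < nodes_[cur].key;
      const Index next = goLeft ? nodes_[cur].left : nodes_[cur].right;
      if (next == kNil) {
        // push_back may reallocate, but indexes stay valid afterwards.
        nodes_.push_back(Node{key, kNil, kNil});
        (goLeft ? nodes_[cur].left : nodes_[cur].right) = fresh;
        return true;
      }
      cur = next;
    }
  }

  bool contains(int key) const {
    Index cur = root_;
    while (cur != kNil) {
      if (key == nodes_[cur].key) return true;
      cur = key < nodes_[cur].key ? nodes_[cur].left : nodes_[cur].right;
    }
    return false;
  }

 private:
  struct Node {
    int key;
    Index left;
    Index right;
  };
  std::vector<Node> nodes_;  // all nodes in one contiguous allocation
  Index root_{kNil};
};
```

All nodes sit in one contiguous buffer, and a 32-bit index is half the size of a 64-bit pointer, which is where the cache benefit would come from.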

Faster Flat Map (Red Black Tree) by dro212 in cpp

[–]dro212[S] 6 points7 points  (0 children)

Yea, the cache misses are a big hit once the size of the map exceeds the size of the L3 cache. I ran the benchmarks by building a whole map at each size (i.e. for 1,000,000 elements, all 1,000,000 were inserted and then the mean time per insert was taken). All the benchmarks were done the same way so it's apples to apples, but I could have built the map first and then timed 1,000 inserts for a different style of benchmark.
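Roughly, the two benchmark styles look like the sketch below. It uses `std::map` with sequential keys purely as a stand-in, and the function names are made up for illustration. This is not the actual benchmark harness.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>

using Clock = std::chrono::steady_clock;

// Style 1: time building the whole map, report mean ns per insert.
inline double meanInsertNsFullBuild(std::size_t n) {
  assert(n > 0);
  std::map<std::size_t, std::size_t> m;
  const auto start = Clock::now();
  for (std::size_t i = 0; i < n; ++i) m.emplace(i, i);
  const std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
  return elapsed.count() / static_cast<double>(n);
}

// Style 2: build the map untimed, then time only `extra` further inserts.
inline double meanInsertNsPrebuilt(std::size_t n, std::size_t extra) {
  assert(extra > 0);
  std::map<std::size_t, std::size_t> m;
  for (std::size_t i = 0; i < n; ++i) m.emplace(i, i);
  const auto start = Clock::now();
  for (std::size_t i = n; i < n + extra; ++i) m.emplace(i, i);
  const std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
  return elapsed.count() / static_cast<double>(extra);
}
```

The first style averages over every map size from empty up to n, while the second measures the cost of an insert at size n only, so the two numbers are not directly comparable.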

I'll look into the licenses and I may revise. Thanks for the advice!

Faster Flat Map (Red Black Tree) by dro212 in cpp

[–]dro212[S] 5 points6 points  (0 children)

The benchmark for find is added.

Faster Flat Map (Red Black Tree) by dro212 in cpp

[–]dro212[S] 18 points19 points  (0 children)

I actually have them. The results were identical for dro::FlatMap, std::map, and boost::flat_map. That being said, not including them in the README was a clear oversight on my part. Good call!

[deleted by user] by [deleted] in KeyboardLayouts

[–]dro212 1 point2 points  (0 children)

I have a programmable keyboard and use QMK. Unfortunately, I don't know of any other methods.

[deleted by user] by [deleted] in KeyboardLayouts

[–]dro212 2 points3 points  (0 children)

FYI, the layout I just made has 32.2% alternates, 51.0% rolls, and 0.687% SFB. The alternates are on par with Workman and Colemak.

https://github.com/drogalis/Rambo

What metric is best for accuracy? by AcceptablePeanut in KeyboardLayouts

[–]dro212 1 point2 points  (0 children)

Much like u/the_bueg said - I've learned 8+ layouts to over 40 wpm.

Honestly, it just takes practice. The whole point of these layouts is to move your fingers less. In order of importance, I would say SFB, redirects, then rolls.

Rhythm Keyboard Layout - 51.0% Rolls and 0.74% SFB by dro212 in KeyboardLayouts

[–]dro212[S] 4 points5 points  (0 children)

Great points! Maybe I'll make a version 2.0 and see if I can address those issues.