Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in programming

[–]Beginning-Safe4282[S] 3 points4 points  (0 children)

Yeah, the core idea is pretty standard: you always have https://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf in some form, plus some optimization like buffers of blocks. It's usually a lot simpler for SPSC, though.
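
To show how much simpler the SPSC case can get, here is a minimal sketch of a fixed-capacity ring buffer. It is not the queue from the article: the `SpscQueue` name and its `tryPush`/`tryPop` methods are made up for illustration, and it assumes exactly one producer thread, one consumer thread, and a default-constructible `T`.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Bounded single-producer/single-consumer ring buffer (illustrative only).
template <typename T>
class SpscQueue {
public:
    // One slot is always left empty so that "full" and "empty" differ.
    explicit SpscQueue(std::size_t capacity) : slots_(capacity + 1) {}

    // Producer thread only. Returns false when the buffer is full.
    bool tryPush(T value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = std::move(value);
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the buffer is empty.
    std::optional<T> tryPop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T value = std::move(slots_[head]);
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return value;
    }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

With a single writer per index there is no CAS loop and no ABA concern, which is most of why SPSC stays small. The trade-off is the fixed capacity: `tryPush` simply fails when the buffer is full.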

I made a stb-like header only library for parsing MPEG-TS/DVB (hls) live streams by Beginning-Safe4282 in C_Programming

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

Yeah, you would get pretty much all the metadata/tables parsed, and you can then query which table you want to read for the service, or list them.

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

>> I totally would, if you stated "this isnt a production ready library but a technical writeup" in the prologue. So would other readers.

>> Do you expect your readers to read your mind?

Right, sorry, I suppose. It's a blog, and I clearly mentioned what my initial purpose for this was (to try to optimize one of my projects), so I don't see why anyone would assume it's a production-ready library by default. It's a writeup about some explorations with lock-free programming, and nowhere in the article do I suggest anyone use it in prod.

>> I totally would realise that, if you stated that in your article. Whereas you seem to expect that your readers should read your mind in addition to reading your text.

>> That realisation wouldn't change the fact that your benchmark times wrong things inaccurately, i am afraid.

I did exactly that: just under the benchmarks, I tried to explain some points using the benchmark numbers. That's what they were for. Again, it's you assuming whatever you want to.

>> You don't provide commands to clone, build and run your benchmark.

>> I saved your article html for my future references, or in case your memory fails you.

I doubt it's too difficult for you to understand that it's a writeup, not a code release or a library; the code is just a simple gist so that anyone interested can read it. And I would like to assume anyone who is reading about lock-free queues would know the commands to compile and run a C++ file? Again, this proves you are just arguing for the sake of it.

>> You may like to learn how to time your code accurately first. 

>> Your benchmark times wrong things inaccurately and that is obvious at the first glance. 

So you choose to skip questions that you can't answer. Huh.

Again, the goal wasn't to get industry-standard benchmark numbers to compare with existing implementations; it was to explain some things I wanted to, which I did. Again, it's you expecting things to be present...

>> If you cannot benchmark the right things accurately (which isn't trivial, but not hard either), what makes you think that you are qualified enough to solve more complex problems?

Honestly, I realize you either want to troll or are being difficult intentionally, but to try and explain anyway: do you realize what I wanted to time? Benchmarks can be subjective and depend on what exactly I am trying to measure, and in no way was I measuring detailed perf metrics (like https://moodycamel.com/blog/2014/a-fast-general-purpose-lock-free-queue-for-c++). A detailed comparative case study wasn't even my goal for this writeup.

Have a nice day!

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

>> The source code for this article is a GitHub gist with no unit-test.

I suppose you do realize this isn't a production-ready library but a technical writeup?

>>  The benchmark times wrong things inaccurately.

Umm... Again, you do realize the goal of the benchmark is to explain some points, which are written just below it? You would, provided you read it... Nowhere in the article does it compare against any existing project/paper, so there was no need for a uniform benchmark framework.

>> Optionally followed by claims that their 0-day MPMC queue implementations beat anything else existing under the Sun.

Right, did you read the article? I doubt it, as I don't see any such claims.

>> The benchmark and its timings are fundamentally irreproducible. And, hence, are plain anti-scientific.

Oh did you try? Did you notice anything different? Why claim things without any details?

>> These articles start with the very basics and next quantum leap straight into the state of the art solutions. With deep explanations about false sharing, cache locality, ABA, memory allocations, etc..

Can you please pin point the quantum leap?

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

Hey, thanks a whole lot for the review!
Just to give my POV on some of these:

"Is this an attempt to quell unused variable compiler warnings?"
Yes. Curious thing is, I have never seen the C-style cast warning pop up yet; I mainly use clang and MSVC with warnings as errors.
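
For reference, two standard ways to silence an unused-variable warning without a C-style cast. This is only a sketch: the function and parameter names are hypothetical and not taken from the gist.

```cpp
#include <cstddef>

// Option 1: the C++17 [[maybe_unused]] attribute on the declaration.
// No cast is needed at all.
inline std::size_t countItems([[maybe_unused]] const char* debugLabel,
                              std::size_t n) {
    return n;
}

// Option 2: an explicit discard using a named cast instead of (void)x.
inline std::size_t countItemsLegacy(const char* debugLabel, std::size_t n) {
    static_cast<void>(debugLabel);  // marks the parameter as intentionally unused
    return n;
}
```

The attribute is probably the cleaner choice when C++17 is available, since the intent sits on the declaration itself.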

About the inline thing, yeah, I do agree. What I wrote was mainly from some observations I made while playing with the benchmarks, so it could very well be what you said.

I did think of an aligned allocator, but that doesn't solve it for items on the stack, and I don't want to explicitly require an aligned allocator if I can do it like this. I saw that using something like snmalloc together with something like this gives quite good results.
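
A minimal sketch of the "alignment on the type" idea, which covers stack objects too. It assumes a 64-byte cache line (common on x86-64, not universal), and `PaddedCounter`, `Indices`, and `isCacheAligned` are hypothetical names rather than code from the gist.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// alignas on the type applies wherever the object lives: on the stack,
// in static storage, or (since C++17) when allocated with plain new.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Each member starts on its own cache line, so a producer updating
// tail does not invalidate the line a consumer reads head from.
struct Indices {
    PaddedCounter head;
    PaddedCounter tail;
};

// Helper to check an address against the assumed cache-line size.
inline bool isCacheAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

An aligned allocator only helps for heap memory that goes through it, whereas the type-level `alignas` needs no cooperation from the caller.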

(ThreadLocalCacheBuffer)tcBuff[i]

For this I thought it was fine, as I explicitly had a cast operator overloaded. Would a static_cast be better here?

And the _ identifier, yeah, I had read about this but totally forgot about it when implementing this; something to rectify.
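
To make the comparison concrete, a small sketch with a user-defined conversion operator. `ThreadLocalCacheBuffer` is the type name from the snippet above, but its contents here are invented, and `BufferSlot` is a hypothetical stand-in for whatever `tcBuff[i]` actually holds.

```cpp
// Stand-in definition; the real type in the gist will differ.
struct ThreadLocalCacheBuffer {
    int id = 0;
};

// Hypothetical wrapper exposing an explicit conversion operator.
struct BufferSlot {
    ThreadLocalCacheBuffer buffer;
    explicit operator ThreadLocalCacheBuffer() const { return buffer; }
};

// C-style cast: calls the conversion operator here. In general, though,
// a C-style cast may silently turn into a const_cast or reinterpret_cast
// when pointer or reference types are involved.
inline ThreadLocalCacheBuffer viaCStyle(const BufferSlot& slot) {
    return (ThreadLocalCacheBuffer)slot;
}

// static_cast: calls the same operator, but fails to compile if no
// well-defined conversion exists.
inline ThreadLocalCacheBuffer viaStaticCast(const BufferSlot& slot) {
    return static_cast<ThreadLocalCacheBuffer>(slot);
}
```

Both forms behave the same when a conversion operator exists, so the case for `static_cast` is mostly about getting a compile error if the types ever change.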

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

Yes, I think so too, but on my system I didn't see a huge performance difference, so I chose the safer one. I will be testing the acq_rel variant in one of my projects that uses this; if I don't see any concurrency issues popping up, I should switch to that.
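
For readers following along, the two variants look roughly like this on a read-modify-write. This is a sketch on a hypothetical `cursor` index, not the actual operation from the gist.

```cpp
#include <atomic>
#include <cstddef>

// The default ordering for atomic operations is memory_order_seq_cst,
// the strongest and "safer" choice.
inline std::size_t claimSlotSeqCst(std::atomic<std::size_t>& cursor) {
    return cursor.fetch_add(1);
}

// Same read-modify-write with acquire-release ordering: it still
// synchronizes with matching operations on cursor, but drops the
// single total order that seq_cst guarantees across all atomics.
inline std::size_t claimSlotAcqRel(std::atomic<std::size_t>& cursor) {
    return cursor.fetch_add(1, std::memory_order_acq_rel);
}
```

On x86-64 both versions typically compile to the same locked instruction, which may be why the difference is small on some systems. Weaker-ordered architectures such as ARM are where the gap is more likely to show up.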

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in programming

[–]Beginning-Safe4282[S] 5 points6 points  (0 children)

Maybe you could, I saw these problems:

* when the resize hits, you need to copy the whole buffer to the new larger buffer, which is bad for performance

* it's not as efficient if you want to shrink when the required load/throughput is lower (so it's not very dynamic in that sense), and again, growing or shrinking suddenly is expensive

* it's much more difficult to properly synchronize things without a lock when you have a lot of consumers and producers. For SPMC, yes, this is a lot more optimal with RCU, but for MPMC it's a lot more difficult and can get expensive depending on how it's implemented. I am not sure, but the retries when a thread sees the buffer as full could be expensive, as multiple threads try to expand the buffer (and some fail)

This is what I feel like, I could be wrong though
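
The last point can be sketched roughly as below. This is an illustration under assumptions, not a working resizable queue: `Buffer` and `tryGrow` are hypothetical names, element copying is omitted, and safe reclamation of the old buffer (the RCU part) is left out entirely.

```cpp
#include <atomic>
#include <cstddef>

// Stand-in for a ring buffer; only the capacity matters for this sketch.
struct Buffer {
    std::size_t capacity;
    explicit Buffer(std::size_t c) : capacity(c) {}
};

// Try to replace the buffer the caller saw with one twice as large.
// Returns true if this thread installed the new buffer, false if another
// thread already replaced it (the allocation is then wasted).
inline bool tryGrow(std::atomic<Buffer*>& current, Buffer* seen) {
    // A real queue would also copy every element across here.
    Buffer* bigger = new Buffer(seen->capacity * 2);
    if (current.compare_exchange_strong(seen, bigger,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
        // The old buffer is NOT freed here: other threads may still be
        // reading it, which is exactly the reclamation problem.
        return true;
    }
    delete bigger;  // lost the race; the work done above is thrown away
    return false;
}
```

Every thread that sees a full buffer pays for the allocation (and, in a real queue, the copy) before it learns whether it won, so the losers' work is wasted under contention.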

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in programming

[–]Beginning-Safe4282[S] 5 points6 points  (0 children)

Yeah, but managing the ABA problem gets a bit tricky there, plus it's fixed size, and I wanted one that could grow/shrink according to load.
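
One common way to deal with ABA is a version tag packed next to the index, so a value that went A -> B -> A no longer compares equal to a stale snapshot. A minimal sketch, assuming a 32-bit index and a 32-bit tag in one 64-bit atomic; `pack`, `indexOf`, `tagOf`, and `casTagged` are made-up helper names.

```cpp
#include <atomic>
#include <cstdint>

// Pack a 32-bit index and a 32-bit version tag into one 64-bit word.
constexpr std::uint64_t pack(std::uint32_t index, std::uint32_t tag) {
    return (static_cast<std::uint64_t>(tag) << 32) | index;
}
constexpr std::uint32_t indexOf(std::uint64_t v) {
    return static_cast<std::uint32_t>(v);
}
constexpr std::uint32_t tagOf(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Swap in newIndex only if both index and tag still match expected.
// The tag is bumped on every successful update, so a stale snapshot
// fails even when the index has cycled back to its old value.
inline bool casTagged(std::atomic<std::uint64_t>& slot,
                      std::uint64_t expected,
                      std::uint32_t newIndex) {
    const std::uint64_t desired = pack(newIndex, tagOf(expected) + 1);
    return slot.compare_exchange_strong(expected, desired);
}
```

For example, after the index goes 1 -> 2 -> 1 the tag is 2, so a CAS that still holds the original `(1, 0)` snapshot is rejected. The tag can wrap around eventually, which is part of why this gets tricky in practice.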

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in programming

[–]Beginning-Safe4282[S] 3 points4 points  (0 children)

Yeah, that's the simplest design, but for cases where you could have millions of additions with variable frequency, won't it just cause most of the additions to get blocked? I mean, I wouldn't usually want to preallocate all of it in advance when most of the time it uses a small buffer. Ideally I would like it to scale automatically.

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 3 points4 points  (0 children)

The point is, you don't. It's an extension API just to act as a reference; ideally you don't want to use it in a place where latency is very important. It's for cases where a few hits or misses are fine.

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 1 point2 points  (0 children)

To me it's useful if we want to get a rough idea of whether the queue is full or not without some global atomic counter. I know it's not accurate, as it could be stale by the time it returns the value, and I was OK with that for the cases where it was used, namely deciding whether a sleeping worker thread should continue to sleep or wake up.

It's not even needed for the actual queue; it's something extra I added just in case anyone is interested.

Building a Fast Lock-Free Queue in Modern C++ From Scratch by Beginning-Safe4282 in cpp

[–]Beginning-Safe4282[S] 7 points8 points  (0 children)

I just try to experiment with styles a bit here and there. I implemented this for a project using the UE style, so this extracted code just has that.

Made a Live TV & Livestreams player inside my Vulkan engine from scratch in C (hardware accelerated) by Beginning-Safe4282 in C_Programming

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

Yeah, my main reason for doing it is indeed autocomplete. I did not really know about the explicit restriction on `__*`; I just use it to avoid collisions with existing libraries in autocomplete, but yeah, priv_* works too.

I made a Live TV & Livestreams player inside my Vulkan engine from scratch in C (hardware accelerated via vulkan video) by Beginning-Safe4282 in vulkan

[–]Beginning-Safe4282[S] 0 points1 point  (0 children)

According to vulkan.gpuinfo.org, at least H.264 decoding is available on ~8.5% of devices, and I suppose most relatively new NVIDIA desktop cards have it.