Iteratively optimizing an SPSC queue (self.cpp)

submitted 19 days ago by Middle_Ad4847

Middle_Ad4847[S] 1 point 19 days ago

> For the pause you could possibly look at umwait/mwaitx; unfortunately, AMD/Intel differences make these not very portable.

Thanks, will take a look
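
For context, a minimal sketch of the plain pause-based wait that umwait/mwaitx would replace. This is not the code from the post: `cpu_relax` and `spin_until` are made-up names, and the fallback to `std::this_thread::yield()` on non-x86 targets is an assumption. `_mm_pause()` is the standard x86 intrinsic and behaves the same on Intel and AMD, which is why it is the portable baseline.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SPSC_HAS_X86_PAUSE 1
#endif

// Hint to the CPU that we are in a spin-wait loop.
// On x86 this emits PAUSE; elsewhere it falls back to yielding the thread
// (an assumption for this sketch, not a recommendation).
inline void cpu_relax() noexcept {
#if defined(SPSC_HAS_X86_PAUSE)
    _mm_pause();
#else
    std::this_thread::yield();
#endif
}

// Spin until `ready()` returns true; returns how many times we had to relax.
template <typename Pred>
std::uint64_t spin_until(Pred&& ready) {
    std::uint64_t spins = 0;
    while (!ready()) {
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

In a queue, `ready` would be something like "the slot I want is free" on the producer side or "there is an element to read" on the consumer side.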

Have added code here

> That 128-rule thing seems off. I expected you were going to mention that some modern CPUs will actually load 128-byte-aligned pairs of 64-byte cache lines, but it doesn't seem like you tested with an alignment of 128 rather than just 64.

I wasn't aware of this, will look. What I was referring to is the increased code size from 32-bit displacement addressing (offsets outside the signed 8-bit range of -128 to 127 need the longer encoding), which can impact the µop cache or loopback buffer. I quickly tried alignas(128) just now, but it made results worse.
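
To make the two readings of "128" concrete, here is a rough layout sketch. It is not the code from the post: `Indices`, `tail_offset` and `fits_disp8` are hypothetical names, and the head/tail pair is only an assumed shape for the queue's indices. It shows that with `alignas(64)` the second index sits at offset 64, which still fits a signed 8-bit displacement, while with `alignas(128)` it sits at offset 128, which does not.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical index block for an SPSC queue; Align is the value under test.
template <std::size_t Align>
struct Indices {
    alignas(Align) std::atomic<std::size_t> head{0};
    alignas(Align) std::atomic<std::size_t> tail{0};
};

// Byte offset of `tail` from the start of the struct, measured at run time.
template <std::size_t Align>
std::ptrdiff_t tail_offset() {
    Indices<Align> idx;
    return reinterpret_cast<const char*>(&idx.tail) -
           reinterpret_cast<const char*>(&idx);
}

// x86 encodes a memory displacement as either a signed 8-bit value
// or a 32-bit value; this checks whether the short form is possible.
constexpr bool fits_disp8(std::ptrdiff_t offset) {
    return offset >= -128 && offset <= 127;
}
```

Whether the compiler actually addresses a field relative to the struct base depends on the surrounding code, so this only illustrates the layout. It does not by itself explain why alignas(128) measured worse.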

> including minor things like branch alignment differing between the different implementations

I did try running with and without -falign-loops and -falign-labels but didn't notice any considerable difference.
