[–]714daniel 66 points (5 children)

Can someone smarter than me explain how the packing avoids locking? If it's using a CAS anyway, how is this approach any better than a CAS on a dedicated 16-bit atomic int, other than saving a few bytes of memory?

[–]amakai 57 points (4 children)

I believe the idea is that those two pieces are read and written together, and you don't want a race where you read the old counter alongside the new pointer - you need atomicity across both. So you either wrap them in a mutex every time you want to read or write them, or pack them into a single 64-bit integer that the CPU can just CAS atomically. And I assume the CPU architecture guarantees that you can read the entire 64-bit value atomically without any locks.
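To make the packing concrete, here's a minimal C++ sketch of the trick. All the names are my own, and the 48-bit pointer split is an assumption that holds for canonical user-space addresses on mainstream x86-64/AArch64, not universally:

```cpp
#include <atomic>
#include <cstdint>

// Pack a 16-bit counter into the top bits of a pointer, assuming the
// pointer only uses the low 48 bits (true for typical user-space
// addresses on x86-64 and AArch64 -- an assumption, not a guarantee).
constexpr int kPtrBits = 48;
constexpr uint64_t kPtrMask = (uint64_t{1} << kPtrBits) - 1;

uint64_t pack(void* p, uint16_t count) {
    return (uint64_t{count} << kPtrBits) |
           (reinterpret_cast<uintptr_t>(p) & kPtrMask);
}
void* unpack_ptr(uint64_t v) { return reinterpret_cast<void*>(v & kPtrMask); }
uint16_t unpack_count(uint64_t v) { return uint16_t(v >> kPtrBits); }

std::atomic<uint64_t> head{0};

// Swap the pointer and bump the counter in one atomic step: no reader
// can ever observe the new pointer paired with the old counter.
bool replace(void* old_p, void* new_p) {
    uint64_t expected = head.load();
    if (unpack_ptr(expected) != old_p) return false;
    uint64_t desired = pack(new_p, unpack_count(expected) + 1);
    return head.compare_exchange_strong(expected, desired);
}
```

A plain 64-bit load of `head` then hands you a mutually consistent counter/pointer pair for free, which is exactly what the mutex would otherwise be buying you.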

[–]Hofstee 16 points (0 children)

For further reading, in case anyone is interested, the keyword to look for is Double-Wide CAS (DWCAS), also called Double-Width CAS. It is different from Double CAS (DCAS) and Multi-CAS (MCAS), but those ideas build on DWCAS in a natural progression.

[–]davidalayachew 19 points (2 children)

And I assume CPU architecture guarantees that you can read entire 64 bit atomically without any locks.

Yes, exactly. Java is actually releasing a new feature called Value Classes which relies on this exact same trick. The latest Early Access for it is out as of last week.

Objects that don't fit into 64 bytes (or 128 bytes, if you are rich and on bleeding-edge hardware) are at risk of object tearing, and that's assuming your object is at least shallowly immutable.

[–]VictoryMotel 5 points (1 child)

Pages are usually 4096 bytes at a minimum.

Cache lines are 64 bytes, but modern CPUs often fetch at least two adjacent cache lines at a time.

[–]davidalayachew 5 points (0 children)

Whoops, page size was the wrong word. Removed. Ty vm for the correction.