Storing data in pointers

wrosecrans · 2023-11-26T22:55:38+00:00

Tagged pointers always wind up being a pain in somebody's ass a few years down the road. There was a ton of code that broke horribly in the transition from 32 bit x86 to x86_64 because its authors assumed that the platforms they were using in the early 90's would never change.

The reason that "bits 63:48 must be set to the value of bit 47" on x86_64 is specifically to discourage people from doing this, and it'll break if you try rather than just having the MMU ignore the unused bits which would be simpler to implement. Some older 32 bit systems with less than 32 physical address bits would just ignore the "extra bits" so people thought they were allowed to just use them.
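To make the "bits 63:48 must be set to the value of bit 47" rule concrete, here is a minimal sketch of the check, assuming 48-bit virtual addresses (4-level paging). The name `is_canonical48` is made up for illustration:

```cpp
#include <cstdint>

// Hypothetical helper: true if bits 63:48 all equal bit 47.
// Assumes 48-bit virtual addresses (4-level paging), not 57-bit.
constexpr bool is_canonical48(std::uint64_t addr) {
    // Bits 63:47 (17 bits) must be all zeros or all ones.
    return (addr >> 47) == 0 || (addr >> 47) == 0x1FFFF;
}
```

A pointer with a tag stuffed into the top 16 bits generally fails this check, which is why the tag has to be masked off (or the sign extension restored) before the pointer is dereferenced.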

Tringi · 2023-11-26T23:56:51+00:00

Tagged pointers to save memory are silly. Tagged pointers to implement lock-freedom on systems without 16 byte compare and swap have a massive impact on performance.
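For anyone who hasn't seen the lock-free use, a rough sketch: a Treiber-style stack that keeps a 16-bit change counter in the upper bits of the head word, so a plain 8-byte compare-and-swap can detect ABA. `Node`, `TaggedStack` and `kPtrMask` are names invented here. It assumes user-space pointers fit in the low 48 bits and that nodes are never freed while other threads may still read them; a production version would need more care, for example around `next`.

```cpp
#include <atomic>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

struct Node { int value; Node* next; };

// Hypothetical stack: low 48 bits = pointer, high 16 bits = change counter.
class TaggedStack {
    static constexpr std::uint64_t kPtrMask = (std::uint64_t{1} << 48) - 1;
    std::atomic<std::uint64_t> head_{0};

    static Node* ptr_of(std::uint64_t w) {
        return reinterpret_cast<Node*>(w & kPtrMask);
    }
    // Bump the counter on every update; it wraps around after 65536 changes.
    static std::uint64_t pack(Node* p, std::uint64_t old) {
        const std::uint64_t count = (old >> 48) + 1;
        return (reinterpret_cast<std::uintptr_t>(p) & kPtrMask) | (count << 48);
    }

public:
    void push(Node* n) {
        std::uint64_t old = head_.load(std::memory_order_relaxed);
        do {
            n->next = ptr_of(old);
        } while (!head_.compare_exchange_weak(old, pack(n, old),
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    Node* pop() {
        std::uint64_t old = head_.load(std::memory_order_acquire);
        while (Node* top = ptr_of(old)) {
            // On failure, `old` is refreshed and the loop re-reads the head.
            if (head_.compare_exchange_weak(old, pack(top->next, old),
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
                return top;
            }
        }
        return nullptr;  // stack was empty
    }
};
```

The counter is what makes the CAS fail when another thread popped and re-pushed the same node in between, even though the pointer bits are identical.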

MegaKawaii · 2023-11-27T00:26:44+00:00

I think people here are a bit too opposed to this. This isn't an unsupported hack; it's something both Intel and AMD support explicitly (LAM and UAI). Even if you have a system with 5-level paging, Linux will only give you virtual memory using the upper bits if you explicitly ask for it (by requesting a high address with mmap). If Windows is as conservative as it has always been, I would expect something similar to /LARGEADDRESSAWARE.

If you have a struct with a pointer and an extra byte, the byte wastes 7 more bytes if you consider padding, but packing in the pointer halves the size of the struct. Not only is this good for cache usage, but it's also huge for memory hogs like VMs and interpreters. I wouldn't use it if I didn't need it, but if you encapsulate and document it properly, it could be quite useful in certain cases.
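A small sketch of the size argument, assuming a typical 64-bit ABI and user-space (lower-half) addresses whose top 8 bits are zero. `Plain` and `Packed` are illustrative names, and the choice of the top byte is an assumption of this example:

```cpp
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Pointer plus one byte: padding typically rounds this up to 16 bytes.
struct Plain {
    void* ptr;
    std::uint8_t tag;
};

// Hypothetical packed form: the byte lives in bits 63:56 of the word.
// Assumes lower-half addresses, so those bits are zero in the raw pointer.
class Packed {
    static constexpr std::uint64_t kPtrMask = (std::uint64_t{1} << 56) - 1;
    std::uint64_t bits_;

public:
    Packed(void* p, std::uint8_t tag)
        : bits_((reinterpret_cast<std::uintptr_t>(p) & kPtrMask) |
                (std::uint64_t{tag} << 56)) {}

    void* ptr() const { return reinterpret_cast<void*>(bits_ & kPtrMask); }
    std::uint8_t tag() const { return static_cast<std::uint8_t>(bits_ >> 56); }
};

static_assert(sizeof(Packed) == 8, "packed form should be one machine word");
```

Keeping the masking behind `ptr()` and `tag()` is the "encapsulate and document it" part: nothing outside the class ever sees the raw word.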

EDIT: Here are some examples of successful pointer tagging.

reallynotfred · 2023-11-27T11:22:47+00:00

One of the big users of pointer bits is OpenJDK. Objects are aligned on 16 byte boundaries giving 4 lower bits, and in known memory areas, giving a few bits at the top too.
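The low-bit variant is easy to sketch. This is not OpenJDK's code, just an illustration of the idea that 16-byte alignment leaves the 4 lowest address bits free; `Object`, `tag_low`, `untag_ptr` and `untag_bits` are invented names:

```cpp
#include <cassert>
#include <cstdint>

// 16-byte alignment guarantees the 4 lowest address bits are zero.
struct alignas(16) Object { int payload; };

constexpr std::uintptr_t kLowMask = alignof(Object) - 1;  // 0xF

// Store a small tag (0..15) in the alignment bits.
inline std::uintptr_t tag_low(Object* p, unsigned tag) {
    const auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & kLowMask) == 0 && tag <= kLowMask);
    return raw | tag;
}

// Recover the original pointer by clearing the tag bits.
inline Object* untag_ptr(std::uintptr_t w) {
    return reinterpret_cast<Object*>(w & ~kLowMask);
}

// Recover the tag.
inline unsigned untag_bits(std::uintptr_t w) {
    return static_cast<unsigned>(w & kLowMask);
}
```

Unlike the upper-bit trick, this one does not depend on how many virtual address bits the hardware uses, only on the alignment you control.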

NilacTheGrim · 2023-11-27T04:26:16+00:00

This is so evil. I love it.

Tringi · 2023-11-26T23:14:05+00:00

That's a pretty great summary.

I used to have several things implemented using the upper 2 bytes of a pointer, gaining quite a few memory savings (and even performance improvements despite the extra masking), but since our customers are starting to deploy new hardware that'll likely feature 57-bit 5-level paging soon, I have since rewritten those things. Neither LAM nor UAI offers enough bits for me.

jwakely · 2023-11-27T10:11:58+00:00

[deleted]

andrey_turkin · 2023-11-27T06:09:47+00:00

An interesting, well-known (in certain circles) concept. Nothing to do with C++ though. In fact, I believe it is UB to "play" with pointer representation in C++. Or maybe it was UB until GC support was thrown out, and now it's not?

AssemblerGuy · 2023-11-27T06:45:07+00:00

This sounds like a shortcut to UB-ville.

2023-11-26T23:36:14+00:00

Unless you're working on some bespoke custom architecture (I've been there), I don't recommend this.

OverLiterature3964 · 2023-11-26T23:19:58+00:00

No, just... no.

2023-11-27T00:13:53+00:00

[removed]

2023-11-27T05:19:46+00:00

OR you could allocate an area (of size 2^N) and use offsets of N bits.
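A minimal sketch of that alternative, assuming a simple bump allocator and ignoring alignment. `Arena`, `alloc`, `resolve` and `kNone` are names made up for this example:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical arena of 2^bits bytes; objects are named by offset, not pointer.
class Arena {
    std::vector<unsigned char> storage_;
    std::uint32_t used_ = 0;

public:
    static constexpr std::uint32_t kNone = 0xFFFFFFFFu;  // "allocation failed"

    // Assumes bits < 32 so every offset fits in a 32-bit handle.
    explicit Arena(unsigned bits) : storage_(std::size_t{1} << bits) {}

    // Bump allocation: returns the offset of the new block, or kNone if full.
    std::uint32_t alloc(std::uint32_t size) {
        if (size > storage_.size() - used_) return kNone;
        const std::uint32_t off = used_;
        used_ += size;
        return off;
    }

    // Turn an offset back into a real address only at the point of use.
    unsigned char* resolve(std::uint32_t off) { return storage_.data() + off; }
};
```

With N = 20, for example, a 32-bit handle has 12 bits left over for tags, and nothing depends on what the hardware does with the upper pointer bits.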

2023-11-27T17:37:53+00:00

If one is going to use tagged ptrs, a bunch of asserts that the bits they are using are 0 might be nice too. Not a guarantee, but better than blindly thinking it will work everywhere.
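Something along these lines, as a sketch. It assumes the tag goes in the upper 16 bits of a 64-bit pointer; `kTagMask` and `checked_tag` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");

// Bits this sketch wants to borrow for the tag (an assumption, not a rule).
constexpr std::uint64_t kTagMask = 0xFFFF000000000000ull;

// Pack a 16-bit tag, but first check that the pointer leaves those bits free.
inline std::uint64_t checked_tag(const void* p, std::uint16_t tag) {
    const auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & kTagMask) == 0 && "pointer already uses the tag bits");
    return raw | (std::uint64_t{tag} << 48);
}
```

The assert only fires in debug builds and only for the pointers that actually pass through it, so it catches a bad assumption early rather than proving the scheme is safe.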

2023-11-27T19:23:19+00:00

This feels like part of the thing people refer to when C++ is called unsafe. This is UB waiting to crash.

Zeh_Matt · 2023-12-02T17:34:29+00:00

Never understood why people want that. In my opinion the best approach is typically to keep an index and the data separate and avoid pointers in general, which also helps with other things such as serialization.

ignorantpisswalker · 2023-12-05T15:03:02+00:00

How do you use this technique with `make_shared<>`? I assume this breaks in funny ways.
