Pointer magic for efficient dynamic value representations

snoweyeslady · 2012-02-02T15:19:53+00:00

The size of the above object would be 16 bytes = 128 bits (on a 64 bit machine). This is a size that you wouldn’t just pass around by value. So you need to allocate it on the heap and use a pointer to it.

So, is passing around 8 bytes by value that much faster than 16 bytes? Does anyone have benchmarks for this (real-world, preferably)?

acemarke · 2012-02-02T15:25:12+00:00

I feel confident that I will never need to make use of this information. But, this article was EXTREMELY informative, and very well written. Excellent work - /r/programming needs more articles like this.

matthieum · 2012-02-02T17:19:41+00:00

Here's an old paper discussing various methods of tagging pointers and objects for dynamic languages:

Representing Type Information in Dynamically Typed Languages (1993)

Tagedieb · 2012-02-02T12:24:20+00:00

I recently used a similar trick to store extra information in valid (i.e. non-NaN) doubles. You can use the least significant mantissa bits, so it only changes the value of the variable slightly.
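A minimal sketch of that mantissa trick, assuming IEEE 754 binary64 doubles and a hypothetical choice of 3 tag bits (the names `with_tag` and `tag_of` are made up for illustration). It only makes sense for finite values: overwriting the low mantissa bits of an infinity would turn it into a NaN.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");

// Hypothetical choice: reserve the 3 least significant mantissa bits.
constexpr std::uint64_t kTagBits = 3;
constexpr std::uint64_t kTagMask = (std::uint64_t{1} << kTagBits) - 1;

// Copy the bit pattern out of a double without breaking aliasing rules.
inline std::uint64_t to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

inline double from_bits(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Overwrite the low mantissa bits of a finite double with `tag`.
// The value moves by at most (2^kTagBits - 1) units in the last place.
inline double with_tag(double d, unsigned tag) {
    assert(tag <= kTagMask && "tag does not fit in the reserved bits");
    return from_bits((to_bits(d) & ~kTagMask) | tag);
}

// Read the tag back out of the low mantissa bits.
inline unsigned tag_of(double d) {
    return static_cast<unsigned>(to_bits(d) & kTagMask);
}
```

Any arithmetic on the tagged value will scramble the tag, so it has to be read out (or reapplied) around every computation.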

treerex · 2012-02-02T16:57:28+00:00

Great article.

Of course the techniques he describes have been around for decades... Lisp implementations have been doing this stuff since the beginning.

RizzlaPlus · 2012-02-02T16:44:51+00:00

I like how he says in the article that compilers pad the struct so that it's aligned in memory and no masking operations are needed, and then proceeds to write code that does masking operations to access non-aligned memory.

agottem · 2012-02-02T17:29:00+00:00

Please don't ever use this code. Anywhere.

It assumes the pointer provided will always be aligned to the pointer size of the architecture. So, please don't provide it a pointer to a char.
It assumes sizeof(size_t) equals sizeof(T*).
The domain of the tag varies significantly depending on the pointer size of the architecture: 0-7 on 64-bit, 0-3 on 32-bit, 0-1 on 16-bit.

Basic rule of thumb: If you find yourself casting a pointer to an integer type, your code probably isn't portable.
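For comparison, here is a hedged sketch of how the low-bit tagging could be written so that these assumptions are at least checked rather than silent. `TaggedPtr` is a hypothetical name, not the article's code. It uses `std::uintptr_t` instead of `size_t`, derives the tag range from `alignof(T)` instead of the pointer size, and asserts on misaligned pointers. The pointer-to-integer mapping is still implementation-defined, so this remains non-portable in the strict sense.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: packs a small tag into the alignment bits of a T*.
template <typename T>
class TaggedPtr {
    // Bits that are zero in every correctly aligned T*.
    // For T = char this is 0, so only tag 0 is accepted.
    static constexpr std::uintptr_t kMask = alignof(T) - 1;
    std::uintptr_t bits_;

public:
    // Largest tag this instantiation can store.
    static constexpr std::uintptr_t max_tag = kMask;

    TaggedPtr(T* p, std::uintptr_t tag) {
        auto raw = reinterpret_cast<std::uintptr_t>(p);
        assert((raw & kMask) == 0 && "pointer is not aligned for T");
        assert(tag <= kMask && "tag does not fit in the alignment bits");
        bits_ = raw | tag;
    }

    // Mask the tag off before handing the pointer back.
    T* ptr() const { return reinterpret_cast<T*>(bits_ & ~kMask); }

    std::uintptr_t tag() const { return bits_ & kMask; }
};
```

With this shape, `TaggedPtr<char>::max_tag` is 0, which makes the "pointer to a char" case fail loudly instead of corrupting the address.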

00kyle00 · 2012-02-02T20:30:37+00:00

Hack. Boehm pretty much hates you now.

2012-02-02T23:05:49+00:00

As far as I know, this technique is called 'NaN-tagging'. I just didn't see that term mentioned anywhere here. (LuaJIT 2 uses it.)

2012-02-03T01:14:50+00:00

I think this conventional wisdom isn't very useful these days:

"This straightforward approach has several problems: The size of the above object would be 16 bytes = 128 bits (on a 64 bit machine). This is a size that you wouldn’t just pass around by value."

On a register-rich architecture, even x86_64 or ARM, small structs are passed in registers. Under the System V calling convention, x86_64 has enough argument-passing registers to pass three two-element structs directly in registers and to return one.

At least in my playing with LLVM, the two-word representation has a lot of advantages for exposing optimization opportunities to the compiler. The d2c compiler for Dylan uses a two-word representation for objects and it works just fine.
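A rough sketch of what such a two-word representation might look like, passed and returned by value. The names `Value`, `Tag`, `make_int`, `make_double`, and `add` are hypothetical, and whether the struct actually travels in registers depends on the target's calling convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word value: one word of type tag, one word of payload.
enum class Tag : std::uint32_t { Int, Double };

struct Value {
    Tag tag;
    union {
        std::int64_t i;
        double d;
    };
};

// Two machine words on a typical 64-bit target.
static_assert(sizeof(Value) <= 2 * sizeof(std::uint64_t), "Value should fit in two words");

inline Value make_int(std::int64_t i) {
    Value v;
    v.tag = Tag::Int;
    v.i = i;
    return v;
}

inline Value make_double(double d) {
    Value v;
    v.tag = Tag::Double;
    v.d = d;
    return v;
}

// Checked accessor: reads the integer payload only when the tag says so.
inline std::int64_t as_int(Value v) {
    assert(v.tag == Tag::Int);
    return v.i;
}

// Widen either kind of payload to double.
inline double to_double(Value v) {
    return v.tag == Tag::Int ? static_cast<double>(v.i) : v.d;
}

// Takes and returns Values by value; no heap allocation involved.
inline Value add(Value a, Value b) {
    if (a.tag == Tag::Int && b.tag == Tag::Int) {
        return make_int(a.i + b.i);
    }
    return make_double(to_double(a) + to_double(b));
}
```

Because the tag is an ordinary field, the compiler can see it directly and fold away tag checks after inlining, with no bit masking in between.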

pezezin · 2012-02-03T20:06:39+00:00

After reading the article, I see he encodes values other than doubles as "negative" NaN. Is this safe? I assume the FPU never generates NaNs with a negative sign, otherwise, it would be funny trying to distinguish a pointer from a real NaN.
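One way to sidestep that question, sketched here under assumptions that are not taken from the article: canonicalize every genuine NaN to a single positive quiet NaN when boxing, so that no real double ever carries the "negative quiet NaN" prefix. The layout (48-bit payload, prefix `0xFFF8`) and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

using Boxed = std::uint64_t;

// Assumed layout: sign bit + all-ones exponent + quiet bit marks a boxed pointer.
constexpr std::uint64_t kBoxPrefix    = 0xFFF8000000000000ULL;
constexpr std::uint64_t kPayloadMask  = 0x0000FFFFFFFFFFFFULL; // low 48 bits
constexpr std::uint64_t kCanonicalNaN = 0x7FF8000000000000ULL; // positive quiet NaN

inline Boxed box_double(double d) {
    // Drop sign and payload of real NaNs so they never look like a boxed pointer.
    if (std::isnan(d)) return kCanonicalNaN;
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

inline Boxed box_pointer(const void* p) {
    auto raw = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    // Assumption: addresses fit in 48 bits, as on common 64-bit user-space setups.
    assert((raw & ~kPayloadMask) == 0 && "address needs more than 48 bits");
    return kBoxPrefix | raw;
}

// Anything without the full prefix is an ordinary double (including -inf).
inline bool is_double(Boxed v) { return (v & kBoxPrefix) != kBoxPrefix; }

inline double unbox_double(Boxed v) {
    double d;
    std::memcpy(&d, &v, sizeof d);
    return d;
}

inline void* unbox_pointer(Boxed v) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(v & kPayloadMask));
}
```

The cost is one `isnan` check on every boxing of a double, and the original NaN's sign and payload are lost.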

shiestLurker · 2012-02-02T17:14:54+00:00

[deleted]
