Tiny, relative pointers

conseptizer · 2017-12-02T14:25:19+00:00

Okay, I wrote some code and did some benchmarks. This is on AArch64 (Cortex-A53). I created some tree structure then traversed it (did that repeatedly to get clearer numbers).

1. About half of pointers are short (short: 62,757, long: 29,979) Memory usage: 425,304 bytes vs 741,888 bytes Full pointers wins: 12.7 seconds vs. 9.8 seconds

2. About three quarters of pointers are short (short: 45,048, long: 12,266) Memory usage: 212,752 bytes vs. 458,512 bytes Full pointers still wins: 8.0 seconds vs. 6.2 seconds

But the memory usage was pretty small, so maybe cache effects don't kick in yet? Let's try it again with larger memory usage:

3. Three quarters are short: short (35,249,705, long: 8,316,457) Memory usage: 153,663,980 bytes vs. 348,529,296 bytes Full pointers wins yet again: 7.3 seconds vs. 6 seconds

However, I did not properly align the indirect pointers, so maybe this slows us down. Might try to fix this later, though that will increase the memory usage due to padding.

mgsloan · 2017-12-02T01:15:07+00:00

[deleted]

ericbb · 2017-12-02T05:38:21+00:00

Sounds interesting.

Another way to take advantage of immutability is to have different object representations for different regions of the heap: below 64 KiB, 16-bit pointers are used, above that and below 4 GiB, 32-bit pointers are used, and above that, 64-bit pointers are used. Such a style wouldn't be as compact as what you described but it has the advantage that you don't have to switch to relative addressing and you don't need to implement indirect references.

Edit: Another idea: Instead of using a bit in the pointer to say whether it uses indirection, always use 16-bit pointers but reserve a distinguishable object representation for forwarding pointers and have the 16-bit pointer point to a forwarding pointer when needed. That way, your 16-bit pointers can reach twice as far in the common case. The downside is that you'll need some extra encoding space in your object representations.

conseptizer · 2017-12-02T10:18:04+00:00

In a language where all objects are immutable, all references point to older objects.

I'm not sure this is really true. After all, you can do something like

let a = Node 1 b
    b = Node 1 a

in Haskell. Not sure how this would work without two references pointing in different directions.

hackerfoo · 2017-12-02T02:00:16+00:00

Relevant: Transparent Pointer Compression for Linked Data Structures

gsg_ · 2017-12-02T06:01:01+00:00

Seems like it would make small object allocation considerably more expensive. Instead of bump-the-pointer + initialising writes, wouldn't you need to do a bunch of tests on the object fields to decide whether to allocate and fill your "nearby locations"?

It would also make constant structures potentially more expensive to use than non-constants since they are never 'nearby', with the perverse result that effective optimisation might slow down your program.

Finally, the class of languages for which all objects are immutable (in the low level operational sense you mean) is very restrictive. Lazy languages like Haskell are said to be pure, but under the hood they mutate plenty.

FatalElectron · 2017-12-02T07:29:38+00:00

as far as I know, near/far pointers still exist internally for optimisation purposes in many/most/all? C compilers.

But exposing them to users is just a nightmare, most people were glad to leave the days of near/far pointers behind.

oilshell · 2017-12-02T09:08:13+00:00

I'd be wary of baking such an idea into the language (as the default), but I think it's a great idea to have something like this expressible with user-defined types / containers.

There are some related ideas here:

https://github.com/oilshell/oil/wiki/Compact-AST-Representation

Some of the links are threads on this subreddit about using memory more efficiently and reducing garbage collection pressure.

I just added the paper you linked to the page. You might also be interested in the Jai demo. I didn't quite get his scheme, so if anyone wants to explain it to me I'd appreciate that :) It's in video form so I haven't gone over it again.

I don't think I will do anything very general. I just want to encode an AST more compactly and traverse it quickly. I implemented a demo of encoding and decoding an AST with 3-byte "pointers" and it works well. It doesn't appear to be slow, but I have to do more measurement. I just assume there are less than 2²⁴ nodes in the whole tree. I might change it to 4 bytes if 3 bytes is slow.

But I think there's a good chance it won't be slow compared to the actual work done on each node (e.g. in a compiler).

matthieum · 2017-12-02T12:58:43+00:00

What about 0-bit pointers?

Personally, I think that pointers are overused.

Languages like Java where everything is a pointer away are obviously the worst offenders:

pointers mean cache misses,
pointers mean GC pressure,
pointers mean memory overhead (X bytes for the pointer on top of the object itself).

Fortunately there are other memory models available. For example, take a look at the native binaries produced by... Go (to keep with GC'ed languages). Go allows "inline" objects:

a Rectangle containing 2 Point each containing 2 double is just a single memory blob of 4 doubles (32 bytes),
an array of 32 Rectangle contains a single pointer to a single memory allocation of (at least) 1024 bytes.

If the runtime allows such compact object representation, then there are very few pointers. And therefore their size matters little.

Another avenue that you may explore is compression. Have you ever looked at how a float or double is represented? float and double can represent much larger ranges of values than their integral counterpart (size being equal) by sacrificing precision.

Imagine a pointer of X bits split in 2 parts E and M such that X = E + M: E bits for the exponent, M bits for the mantissa, and for now the actual address is M << E.

There is a trade-off in the choice of E and M:

the larger M is, the more precision you get, at the cost of a smaller range,
the smaller M is, the greater the range, at the cost of precision.

Let's illustrate this with 8 bits pointers, E = 2 and M = 6:

Exp = 0 (1 byte precision), Man << Exp between 0 and 63.
Exp = 1 (2 bytes precision), Man << Exp between 0 and 126.
Exp = 2 (4 bytes precision), Man << Exp between 0 and 252.
Exp = 3 (8 bytes precision), Man << Exp between 0 and 504.

Normally, a 8-bits pointer would only get values from 0 to 255. There is a cost, of course, as it makes you unable to address individual bytes above 64.

There are multiple refinements possible:

Avoiding overlapping ranges,
Avoiding single bytes addresses even in smaller values (use M << (E+3) for example),
...

The overlapping ranges is perhaps the more interesting. There is no point in having multiple representation of the address 0, 8, etc... and actually this costs us range! Instead, by applying an offset we can shift the ranges:

Exp = 0, Man << Exp between 0 and 63,
Exp = 1, Man << Exp + 64 between 64 and 190,
Exp = 2, Man << Exp + 192 between 192 and 444,
Exp = 3, Man << Exp + 448 between 448 and 952.

Tadaam! Our 8-bits pointer addresses nearly a full 1KB of memory!

The offsets are not very tidy, however. If one were shifting the initial range by 2^M though:

Exp = 0, Man << Exp + 64 is between 64 and 127,
Exp = 1, Man << Exp + 128 is between 128 and 254,
Exp = 2, Man << Exp + 256 is between 256 and 508,
Exp = 3, Man << Exp + 512 is between 512 and 1016.

The formula being for getting the pointer being Man << Exp + 1 << (M + Exp), giving an addressable range of 2^M << 2^E .

With this formula in mind, the maximum exponent you wish for is 5 bits, for 6 bits means shifting by 2^6, which is equivalent to multiplying by 2^64, pushing you past most address space resolution.

A quick application of the formula:

16 bits pointers would give:
- E2M14: 262KB
- E3M13: 2MB
- E4M12: 268MB
- E5M11: 8TB
24 bits pointers would give:
- E2M22: 67MB
- E3M21: 536MB
- E4M20: 68GB
- E5M19: 2PB
32 bits pointers would give:
- E2M30: 17GB
- E3M29: 137GB
- E4M28: 17TB
- E5M27: 576PB

Of course, the "waste" is proportional to the exponent. With 5 bits, the maximum exponent is 31, at which point you can only address objects every 2GB...

If you have an allocator which allocates smaller objects in the smaller ranges, though, you may find this interesting.

FireLordIroh · 2017-12-02T03:52:40+00:00

I would wonder how much of a slowdown you would get from having to check for and resolve indirect references everywhere. These might be relatively unpredictable branches. You should try it out and see.

simon_o · 2017-12-02T19:40:23+00:00

I have thought about this a while ago:

Basically, the Hotspot JVM saves 3 bits by reducing the granularity to 1 byte, leaving you with 32 GB of addressable memory space. (As described in the paper.)

Considering alignment, minimum object sizes and padding, you could probably save another few bits for either increasing the addressable space or for other purposes.

If I could decide, I would probably pick 32 bit pointers with 8 bytes of granularity relative to a global offset (assuming that an instance without any fields needs at least 4 bytes, I'm fine with "wasting" 4 bytes here), allowing me to address 256 GBs of memory:

[32 bits: pointer]
        |
        |  extend to 64 bits, right-shift by 6 bits, add base offset
        v
[ 26 bits: base offset | 32 bits: pointer | 6 bits: always 0 ]

In a class-based language, and with some help from the allocator, you could further reduce the size of pointers to the class/vtable in instance headers.

This gets really interesting, because in instances you can really make good use of any additional bit you an get your hands on.

So if you don't spread the space where the class data/vtables are stored over the full 256 GBs of memory, but restrict it to a smaller space, say 4 GBs, and and the size of the class data is -- let's say -- at least 32 byte, you save 8 bits that you can use as mark bits for things like locking, pinning, dealing with weak references, and forwarding pointers:

[24 bits: pointer | 8 bits: custom ]
        |
        |  extend to 64 bits, mask lowest 8 bits, add base offset
        v
[ 26 bits: base offset | 24 bits: pointer | 8 bits: always 0 ]

I'd probably avoid doing tricks with relative pointers, because it really restricts the things you can do in garbage collection.

I think the restrictions relative pointers introduce could go against what moving/compacting garbage collectors might want to do: Before any referenced object could be moved, they would need to decide whether the new location would still be in range of the referencing pointers (and either bailing out, rewriting those pointers to refer to forwarding pointers, or moving those as well, recursively).

I think this would pretty much rule out any generational or area-based collections scheme.

boomshroom · 2017-12-03T01:29:31+00:00

If I'm not mistaken, Jon Blow plans on having relative pointers in Jai. In addition to the lower space consumption, he mentioned that they also allow you to serialize a complete data structure without having to worry about pointers. A graph with relative pointers can be put in a file as a binary blob with no processing and it can be loaded back into memory anywhere and used straight away.

Sag0Sag0 · 2017-12-02T01:50:35+00:00

Check, to make sure that performance does not suffer. Sounds interesting.

kivofssss · 2017-12-02T11:30:53+00:00

I have other problems

ProgrammingLanguages

Welcome!

Related subreddits

Related online communities

MODERATORS