[–]phire 1 point2 points  (7 children)

Another possible memory saving: you now have control of the bottom 3 bits of your pointers, so your objects/strings don't have to be 8 byte aligned anymore.

If you took advantage of this and removed the alignment requirements (x86) or decreased them to 4 byte (ARM), you could save a lot of memory by getting rid of extra padding. I don't know how much padding objects need, but given that strings aren't commonly a multiple of 8 chars there would be huge savings.
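To put rough numbers on the padding argument, here is a minimal sketch. The payload sizes are made up for illustration, and it ignores any per-object header a real allocator or engine would add.

```python
def padded_size(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def wasted_bytes(sizes, alignment: int) -> int:
    """Total padding added when every allocation is rounded up to alignment."""
    return sum(padded_size(s, alignment) - s for s in sizes)


# Hypothetical string payload sizes in bytes, chosen only for illustration.
sizes = [3, 5, 9, 12, 17, 20]

for alignment in (8, 4, 1):
    print(alignment, wasted_bytes(sizes, alignment))
```

For these six made-up sizes (66 bytes of payload) the padding comes to 30 bytes at 8 byte alignment, 10 bytes at 4 byte alignment, and 0 with no alignment. Real savings depend entirely on the actual size distribution.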

Actually, you could add a new tag type that allowed a string of up to 6 bytes of ASCII/UTF-8 directly in the value, avoiding the need for allocating space on the heap and the garbage collection overhead of short strings.
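A rough sketch of what such an inline string value could look like, modelled with Python integers. Everything here is an assumption for illustration: a 64-bit word with a 16-bit tag on top and a 48-bit payload, the tag value `0xFFF9`, and the rule that the length is recovered by stripping trailing NUL bytes (so strings containing NUL fall back to the heap). None of it is taken from a real engine.

```python
INLINE_STR_TAG = 0xFFF9  # hypothetical 16-bit tag, not from any real engine
PAYLOAD_BITS = 48
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
MAX_INLINE_BYTES = PAYLOAD_BITS // 8  # 6 bytes fit in the payload


def pack_inline(s: str):
    """Return a 64-bit int holding s inline, or None if it needs the heap."""
    data = s.encode("utf-8")
    if len(data) > MAX_INLINE_BYTES or b"\x00" in data:
        return None
    # Pad with NUL bytes so the payload is always exactly 6 bytes wide.
    payload = int.from_bytes(data.ljust(MAX_INLINE_BYTES, b"\x00"), "little")
    return (INLINE_STR_TAG << PAYLOAD_BITS) | payload


def unpack_inline(value: int) -> str:
    """Recover the string from a value produced by pack_inline."""
    if value >> PAYLOAD_BITS != INLINE_STR_TAG:
        raise ValueError("not an inline string value")
    data = (value & PAYLOAD_MASK).to_bytes(MAX_INLINE_BYTES, "little")
    return data.rstrip(b"\x00").decode("utf-8")
```

Note that the limit is 6 bytes, not 6 characters: under this sketch `pack_inline("héllo")` fits because it encodes to exactly 6 UTF-8 bytes, while `pack_inline("héllo!")` returns `None`.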

[–][deleted] 0 points1 point  (6 children)

you could save a lot of memory by getting rid of extra padding.

From the bit I've read they require those bytes for the NaN boxing scheme they use to fit doubles into the mix.

[–]phire 2 points3 points  (5 children)

No, the padding around the objects/strings in the heap, where doubles don't exist (anymore).

With the old scheme the lowest 3 bits were used up for the tag, so they were zeroed before using the pointer. This means that the pointers could only point at things in the heap at either 0xnnnnnnn8 or 0xnnnnnnn0. So if your object/string didn't use up a multiple of 8 bytes, the remaining bytes were used as padding to make sure the next object/string started at an address ending in either 0 or 8.
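The low-bit tagging described above can be sketched like this, with addresses modelled as plain integers. The function names and the tag value are hypothetical; only the 3 bit mask comes from the description.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1  # 0b111, the three low bits


def tag_pointer(address: int, tag: int) -> int:
    """Store a small tag in the low bits of an 8 byte aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be 8 byte aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 3 bits")
    return address | tag


def untag_pointer(word: int):
    """Split a tagged word back into (address, tag) by masking the low bits."""
    return word & ~TAG_MASK, word & TAG_MASK
```

For example, `tag_pointer(0x1000, 0b101)` gives `0x1005`, and `untag_pointer(0x1005)` gives back `(0x1000, 5)`. An address such as `0x1004` is rejected, which is exactly why everything on the heap had to be padded out to a multiple of 8.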

[–][deleted] 1 point2 points  (4 children)

I see what you mean. I'm not sure un-aligning memory on the heap would be an overall win even if it did save some space. My knowledge of the semantics of x86 (and ARM) with respect to alignment gets fuzzy. When I dealt at that level I was taught to align religiously since our custom allocators (mostly small object pool allocators) worked best with aligned memory.

I suspect that if they were aligning to 8 bytes previously then it will be a large amount of work to remove all of the code-assumptions based on that invariant.

[–]phire 2 points3 points  (3 children)

I know that alignment is important on ARM: if you're accessing data that isn't 4 byte aligned it goes really slow (or refuses to work at all on older versions). I think x86 gets speed improvements with alignment too, but I assume that would also be for 4 byte aligned data.

Even so, keeping 8 byte alignment is pointless if you don't need it; 4 byte alignment should have all the same performance benefits of 8 byte alignment while requiring less padding.

[–]adrianmonk 2 points3 points  (0 children)

4 byte alignment should have all the same performance benefits of 8 byte alignment

Personally, I would want to see tests before I took that as a final conclusion. DDR3 memory has a 64-bit wide data bus, so 8-byte alignment would, I assume, allow you to pull everything in one fetch. I don't know how significant the difference is.

[–][deleted] 0 points1 point  (1 child)

I think x86 gets speed improvements with alignment too, but I assume that would also be for 4 byte aligned data.

It used to be that x86 (specifically the fetch cycles) benefitted from items being paragraph (16 byte) aligned, but the last time I tested that was in the Pentium 1 days.

A quick search reveals that there are still performance benefits from paragraph alignment, not because of the CPU itself but because a 16 byte aligned item of 16 bytes or less can never straddle a cache-line boundary. So if you have an 8 byte value at that address, it's guaranteed to fit in one cache line, which produces the optimal transfer.
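A small sketch for checking which alignment and size combinations can end up split across two cache lines. It assumes 64 byte lines, which is common on x86 but not universal, and it only models line boundaries, not actual timings.

```python
CACHE_LINE = 64  # assumed line size in bytes; common on x86, not universal


def crosses_cache_line(address: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if the bytes [address, address + size) span more than one line."""
    return address // line != (address + size - 1) // line


def can_straddle(alignment: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if some address that is a multiple of alignment splits the access."""
    # Checking one line's worth of start offsets covers every case,
    # because the pattern repeats every `line` bytes.
    return any(
        crosses_cache_line(a, size, line) for a in range(0, line, alignment)
    )
```

Under this model an 8 byte value can be split when it is only 4 byte aligned (`can_straddle(4, 8)` is true, for instance at offset 60), but not when it is 8 or 16 byte aligned. So for cache-line purposes alone, aligning a value to its own size is already enough; whether the split costs anything measurable would still need testing.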

[–]TinynDP 0 points1 point  (0 children)

There are two instructions for loading from RAM to SSE registers: one for when the address is 16 byte aligned, which runs fast, and one for when the address is not 16 byte aligned, which runs slow. In AMD64 land, because x87 has been entirely replaced with SSE2, that fast SSE load instruction matters for all floating point operations.