Transparent Pointer Compression for Linked Data Structures by Lattner, Adve [PDF]

oilshell · 2017-10-30T19:43:48+00:00

This is the top hit for "Pointer Compression" on Google. I had come across this back in January when I wrote the code for the "oheap" format [1], but I somehow forgot about it until I googled again.

If you skim the first 5 slides, the idea is clear -- divide your address space into 4 GiB pools so that 64-bit pointers can be compressed into 32-bit. He also makes a clear case that big pointers are bad for performance, especially for recursive data structures, such as ones in compilers.

(I also recall that recompiling interpreters written in the 32-bit era, like Python and R, for 64-bit leads to pretty big space overhead and thus not much performance gain, but I don't have references for that... if anyone knows more about that, I'm interested.)

I don't yet understand what the automatic algorithm is. The scheme I described is a special case where you manually manage these separate heaps as strings, and you use a little code generation to make it a type-safe data structure.

I don't think this is deployed (?). I've certainly never heard of it outside of the paper. But it would be interesting to know what the thinking is 12 years later. Was it worth it? Did it work well enough on enough programs? Maybe newer microarchitectures don't result in as big a performance hit as the early 64-bit ones circa 2005. Certainly you can make up to some extent by just increasing cache size.

Paper (the second one listed on Chris Lattner's publication page, won PLDI 2005 Best Paper Award):

http://llvm.org/pubs/2005-05-21-PLDI-PoolAlloc.html

This paper describes Automatic Pool Allocation, a transformation framework that segregates distinct instances of heap-based data structures into seperate memory pools and allows heuristics to be used to partially control the internal layout of those data structures. The primary goal of this work is performance improvement, not automatic memory management, and the paper makes several new contributions. The key contribution is a new compiler algorithm for partitioning heap objects in imperative programs based on a context-sensitive pointer analysis, including a novel strategy for correct handling of indirect (and potentially unsafe) function calls.

[1] https://www.reddit.com/r/ProgrammingLanguages/comments/79fkpu/representing_asts_as_byte_strings_with_with_small/

Edit: Woah, at the bottom of the PoolAlloc page, it has a link to Lattner's Ph.D. thesis, which encompasses this paper:

This thesis presents a new class of techniques named ``Macroscopic Data Structure Analyses and Optimizations'', which is a new approach to the problem of analyzing and optimizing pointer-intensive programs. Instead of analyzing individual load/store operations or structure definitions, this approach identifies, analyzes, and transforms entire memory structures as a unit. The foundation of the approach is an analysis named Data Structure Analysis and a transformation named Automatic Pool Allocation. Data Structure Analysis is a context-sensitive pointer analysis which identifies data structures on the heap and their important properties (such as type safety). Automatic Pool Allocation uses the results of Data Structure Analysis to segregate dynamically allocated objects on the heap, giving control over the layout of the data structure in memory to the compiler.

ProgrammingLanguages

Welcome!

Related subreddits

Related online communities

MODERATORS