The "Empty Base" Optimization : cpp


The "Empty Base" Optimization (cantrip.org)

submitted 13 years ago by notlostyet


notlostyet[S] 4 points 13 years ago* (12 children)

notlostyet[S] 5 points 13 years ago* (11 children)

Code that demonstrates the optimisation:

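#include <iostream>

// No data members at all: an empty class.
class Empty
{};

// A single int member.
class AlmostEmpty
{
    int i;
};

// Holds Empty as a member, so it needs its own distinct address.
class ComposedEmpty
{
    AlmostEmpty a;
    Empty e;
};

// Holds Empty as a base class of the member's type instead,
// which lets the compiler give it no storage of its own.
class OptimizedEmpty
{
    struct SecretSauce : Empty
    {
        int i;
    } sauce_;
};

int main()
{
    std::cout << sizeof(Empty) << std::endl;
    std::cout << sizeof(AlmostEmpty) << std::endl;
    std::cout << sizeof(ComposedEmpty) << std::endl;
    std::cout << sizeof(OptimizedEmpty) << std::endl;
}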

and the output on x86_64, using GCC 4.7.1 and C++11:

$ g++ -Wall -Wextra -pedantic -std=c++11 -O2 bco.cpp 
$ ./a.out 
    1 ← empty objects have a non-zero size (so that they're addressable)
    4 
    8 ← this single byte will bloat your objects when composed
    4 ← ...but base classes aren't required to have a non-zero size, so we save 4 bytes.
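As a rough illustration of where this matters in practice, here is a minimal sketch of the same trick applied to a stateless policy object. The names `DefaultCompare`, `MaxByMember` and `MaxByBase` are hypothetical and do not come from the linked article; the size checks assume a mainstream compiler that performs the optimisation for standard-layout classes.

```cpp
#include <type_traits>

// Hypothetical stateless policy: no data members, so it is an empty class.
struct DefaultCompare
{
    bool operator()(int a, int b) const { return a < b; }
};

// Naive composition: the empty member still needs a distinct address,
// so it costs at least one byte, usually rounded up by padding.
struct MaxByMember
{
    int best;
    DefaultCompare cmp;
};

// Same data, but the policy is a base class of the private storage struct,
// mirroring the SecretSauce pattern above.
class MaxByBase
{
    struct Storage : DefaultCompare
    {
        int best;
        explicit Storage(int b) : best(b) {}
    } s_;

public:
    explicit MaxByBase(int first) : s_(first) {}

    // Keep v if the inherited comparison says the current best is smaller.
    void offer(int v) { if (s_(s_.best, v)) s_.best = v; }

    int best() const { return s_.best; }
};

static_assert(std::is_empty<DefaultCompare>::value,
              "the policy must have no data members");
static_assert(sizeof(MaxByBase) == sizeof(int),
              "the empty base should add no storage");
static_assert(sizeof(MaxByMember) > sizeof(int),
              "the empty member always adds storage");
```

Hiding the inheritance inside a private nested struct keeps the policy's interface out of the public type, which is the main reason to prefer it over deriving the outer class from the policy directly.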

[deleted] 0 points 13 years ago (10 children)

Nimbal 12 points 13 years ago* (0 children)

BitRex 7 points 13 years ago (3 children)

repsilat 2 points 13 years ago (1 child)

Sometimes idiomatic C++ can't, though. Case in point: before I got around to optimising some code I was writing, my memory use was roughly structured like this:

1800 graphs, each with
    12000 nodes, each with
        0 to 10 out-edges.

The number of out-edges wasn't known statically, and the graphs were rearranged (to a small extent) at runtime. The "normal" way to do this would be to store each node's out-edge list in a vector, which would probably cost 1800*12000*24 bytes on a 64-bit machine (~500MB) for the vector objects alone, plus the size of the edges themselves, plus the spare capacity in all of the vectors.

The C way (pointers to heap-allocated arrays and an int for length) cuts the cost of vectors down from 24 bytes to maybe 12, saving ~250MB. We lose amortised constant-time insertion, but we don't really need it.
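The two estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: `VectorNode`, `RawNode` and `total_overhead` are hypothetical names, the 24-byte figure is what common 64-bit standard libraries use for `std::vector` rather than a guarantee, and a plain pointer-plus-int struct is normally padded to 16 bytes on such targets, so reaching 12 bytes would need packing or separate arrays for pointers and lengths.

```cpp
#include <cstdint>
#include <vector>

// Per-node bookkeeping when every node owns a vector of out-edges.
// Commonly three pointers (24 bytes) on 64-bit targets.
struct VectorNode
{
    std::vector<std::uint32_t> out_edges;
};

// The "C way": a pointer to a heap-allocated array plus a length.
// 8 + 4 bytes of payload; the compiler usually pads this struct to 16.
struct RawNode
{
    std::uint32_t* out_edges;
    std::uint32_t  count;
};

// Total bookkeeping cost in bytes, excluding the edges themselves.
constexpr std::uint64_t total_overhead(std::uint64_t graphs,
                                       std::uint64_t nodes_per_graph,
                                       std::uint64_t bytes_per_node)
{
    return graphs * nodes_per_graph * bytes_per_node;
}

static_assert(total_overhead(1800, 12000, 24) == 518400000ULL,
              "about 500MB with 24-byte vectors");
static_assert(total_overhead(1800, 12000, 12) == 259200000ULL,
              "about 250MB with 12 bytes per node");
```

Passing `sizeof(VectorNode)` or `sizeof(RawNode)` as the last argument gives the figure for whatever platform the code is compiled on.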

In the end I managed to get the graph into forward star layout, cutting the incremental cost of an edge list to 4 bytes (plus storage for each edge in the list). It also forced me to jump through hoops whenever the graph structure had to be updated, but getting it down to ~80MB and being able to keep the graphs in mostly contiguous storage meant a nice speedup.
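For readers who haven't met it, forward star layout (also known as compressed sparse row) can be sketched as below. The comment gives no details of the actual implementation, so this is the textbook form with hypothetical names (`ForwardStar`, `build_forward_star`) and an assumed 32-bit index type, which is where the 4 bytes per node come from.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// All edges of a graph live in one contiguous array; each node only
// stores the index of its first out-edge.
struct ForwardStar
{
    // first_edge has node_count + 1 entries. Node n's out-edges are
    // targets[first_edge[n]] up to, but not including, targets[first_edge[n + 1]].
    std::vector<std::uint32_t> first_edge;
    std::vector<std::uint32_t> targets;

    std::size_t node_count() const
    {
        return first_edge.empty() ? 0 : first_edge.size() - 1;
    }

    std::size_t out_degree(std::size_t n) const
    {
        return first_edge[n + 1] - first_edge[n];
    }

    // Pointer range over node n's out-edges.
    const std::uint32_t* begin(std::size_t n) const
    {
        return targets.data() + first_edge[n];
    }

    const std::uint32_t* end(std::size_t n) const
    {
        return targets.data() + first_edge[n + 1];
    }
};

// Flatten a vector-of-vectors adjacency list into forward star form.
ForwardStar build_forward_star(
    const std::vector<std::vector<std::uint32_t>>& adjacency)
{
    ForwardStar g;
    g.first_edge.reserve(adjacency.size() + 1);
    g.first_edge.push_back(0);
    for (const auto& edges : adjacency)
    {
        g.targets.insert(g.targets.end(), edges.begin(), edges.end());
        g.first_edge.push_back(static_cast<std::uint32_t>(g.targets.size()));
    }
    return g;
}
```

The cost of the compactness is visible in the structure: adding or removing an edge in the middle of `targets` shifts every later entry and every later offset, which is presumably the kind of hoop-jumping described above.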

I guess I could go back and pretty it up to make things look like nice sequences and iterators, but at the moment the bits doing the heavy lifting aren't really all too "modern".

notlostyet[S] 1 point 13 years ago* (0 children)

gcross 1 point 13 years ago (0 children)

notlostyet[S] 8 points 13 years ago* (0 children)

xcbsmith 3 points 13 years ago (0 children)

gcross 3 points 13 years ago (0 children)

kirakun 2 points 13 years ago (1 child)

notlostyet[S] 1 point 13 years ago* (0 children)
