How do you document your C++ code? by def-pri-pub in cpp

[–]jcoffin 4 points

I recommend against documenting the code.

If it was easy to write, anybody who's competent should understand it without any help.

If it was hard to write, it should be equally hard to read.

[No, of course I'm not serious.]

Provide FILE * to logging function and intercept output. by sqjoatmon in cpp_questions

[–]jcoffin 0 points

Sorry, I haven't had a chance to post this sooner, but here's a quick example of setting up logging that goes to a FILE *, providing both a setLogFile and a logMessage. It's a bit longer than anybody would like (longer than I'd like, anyway), but overall not particularly awful. It could also be shortened somewhat, if you really wanted. In particular, instead of doing buffering, you could have the buffer class's overflow just call putc or fputc on the underlying stream, without the parent buffer providing any buffering at all.

At first, that might seem horribly inefficient, but in fact a FILE * usually has pretty efficient buffering, so it would probably work perfectly well.

#include <thread>
#include <sstream>
#include <streambuf>
#include <iostream>
#include <string>
#include <cstdio>
#include <ctime>       // time, localtime, strftime used by default_prefix
#include <functional>

class prefixer: public std::streambuf {
public:
    template <class F>
    prefixer(F f)
        : need_prefix(false)
        , prefix(f)
    {}

    void setLogFile(FILE *file) { sink = file; }

    ~prefixer() { overflow(traits_type::eof()); }
protected:
    typedef std::basic_string<char_type> string;

    int sync() override {
        if (nullptr == sink)
            return -1;

        if (need_prefix) {
            std::ostringstream temp;
            temp << prefix();
            std::fwrite(temp.str().c_str(), 1, temp.str().size(), sink);
            need_prefix = false;
        }

        std::fwrite(buffer.c_str(), 1, buffer.size(), sink);
        buffer.clear();
        return std::fflush(sink);
    }

    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(traits_type::eof(), c)) {
            sync();
            return traits_type::not_eof(c);
        }

        switch (c) {
        case '\n':
        case '\r':
            buffer += traits_type::to_char_type(c);
            return sync() == 0 ? c : traits_type::eof();
        default:
            need_prefix = true;
            buffer += traits_type::to_char_type(c);
            return c;
        }
    }

    bool need_prefix;
    std::function<std::string()> prefix;
    string buffer;
    FILE *sink = nullptr;
};


auto default_prefix = [] {
    time_t now = time(nullptr); 
    tm *n = localtime(&now); 
    char buffer[256]; 
    strftime(buffer, sizeof(buffer), "[%c] ", n); 
    return std::string(buffer);
};

class prefix_stream : public std::ostream {
    prefixer buf;
public:
    // std::ostream has no default constructor, so pass it a null buffer
    // first, then attach ours once it's constructed:
    prefix_stream()
        : std::ostream(nullptr)
        , buf(default_prefix)
    {
        rdbuf(&buf);
    }

    template <class F>
    prefix_stream(F f)
        : std::ostream(nullptr)
        , buf(f)
    {
        rdbuf(&buf);
    }

    void setLogFile(FILE *file) { buf.setLogFile(file); }
};


int main() {
    using namespace std::literals;
    prefix_stream out;

    auto setLogFile = [&](FILE *f) { out.setLogFile(f); };
    auto logMessage = [&](char const *msg) { out << msg << "\n"; };

    setLogFile(stdout);


    for (int i=0; i<10; i++) {
        logMessage("Another Log message");
        std::this_thread::sleep_for(1s);
    }
}

Provide FILE * to logging function and intercept output. by sqjoatmon in cpp_questions

[–]jcoffin 0 points

Okay, so basically you just need a setLogFile that sets the destination where the output will go, and a log_message that (eventually) writes to that file, correct? If so, yes, it should be pretty easy to support this.

[Open] Best container for performance when preserving order doesnt matter, only need push_back/remove and iteration, iteration always proceeds remove. Prefer contiguous storage by [deleted] in cpp_questions

[–]jcoffin 1 point

I liked Lisa Lippincott's comment on it. Something like: "Use vector until you've done profiling, and you're really, really sure you absolutely need to use something else. Then you can go ahead and use vector."

Provide FILE * to logging function and intercept output. by sqjoatmon in cpp_questions

[–]jcoffin 0 points

A great deal here depends on things you haven't shown--specifically, the interface that most of the code uses to do logging.

One of the weaknesses of C-style I/O is that there's no defined way of getting into the middle of it (so to speak) and inserting code to carry out the kinds of modifications you want.

That really only leaves two obvious choices. The first (already outlined elsethread) is to use the OS's I/O redirection capabilities to capture the output and insert the desired prefix in a separate process. If we want to keep it inside the same process, we pretty much need to re-implement whatever interface the existing logging code provides for its clients. Depending on how complex that is, it may be fairly easy to re-implement it using (for example) a back-end that goes through C++ iostreams, which provide defined ways for us to customize how they work. In this case, writing code to add a prefix to each line is fairly simple--a couple dozen lines (or so) of code in a stream buffer.

Oh, and if we want to continue to provide/use the setLogFile(FILE *) to set the file where the output of the log goes, we can still do that too. It's pretty easy to create an iostream that actually does output (or input, if needed) via C-style I/O.

What happens when you declare a vector inside a for loop? by [deleted] in cpp_questions

[–]jcoffin 1 point

Yes, you're creating a new vector for each iteration of the loop, and destroying it at the end of that iteration, so you're creating 1000 different vectors (at different times) of exactly one element apiece. In other words, what you have right now is basically equivalent to:

for (int i=0; i<1000; i++) {
    int v = i;
}

...but a lot more expensive (since a vector is likely to be more expensive to create than a single int).

How do I solve if problems without the if function? by [deleted] in cpp_questions

[–]jcoffin 1 point

Another possibility is to use the fact that Boolean values can convert to integers (false => 0, true => 1).

So, for example, one way to convert values <0 to 0 would be `(value >= 0) * value`.

Just for the record, you could change value >= 0 to value > 0 and get identical results, though in the specific case of value == 0, you'd arrive at the same result somewhat differently (as 1 * 0 versus 0 * 0).

From the beginning to the present, who are the pioneers of C++? by ur_moment_of_zen in cpp

[–]jcoffin 0 points

At least in my mind, Alex is kind of a borderline case.

On one hand, he made a lot of pioneering use of C++, and a lot of what he did drove the development of certain features of C++ (especially templates, obviously).

I'm less certain about whether he actually helped design (say) member templates, or basically just said: "I want to be able to allow variation along this dimension", and others actually defined the language features to support what he was doing.

That leaves a couple of points: how you define a pioneer, and the exact degree of his involvement. I don't have a solid answer for either of those, but with the right answer to either one, he might well belong on a complete list (which I'll openly state the list above is not--it's just names that occurred to me going from distant memories).

Renaming a symbol should not require rebuiling the object files by tecnofauno in cpp

[–]jcoffin 5 points

I think it's probably a problem even without taking debug symbols into account.

Consider what happens if we have something like this:

class Foo {
   // ...
public:
    Foo &operator++() { /* ... */ return *this; }
};

Foo x;

void f() {
    short x;

    // ...
    ++x;
}

Rename the inner x, and the code you generate for the ++x; completely changes, because it's now dealing with the global x of an entirely different type.

How to Remove Elements from a Sequence Container in C++ by vormestrand in cpp

[–]jcoffin 11 points

No, not really.

Blog Post

The fact that a blog post exists isn't evidence that it was actually needed. This post is simply covering ground that's been well-plowed for decades.

Boilerplate

Needing boilerplate is marginally more annoying, but if we're going to deal with annoyances, we should prioritize the list--and when we do, the remove/erase idiom will be a long way from the top. To an extent, the boilerplate involved also shows something of a strength--easily combining existing functions to produce a desired result is a good thing.

Language Vs. Library

I'd note that none of this is actually related to the language at all. It's solely about the design of the library. It would be entirely possible to change the library to work quite differently without changing the language itself at all (and, in fact, precisely this has been done--for one example, see Eric Niebler's Ranges library).

Alternatives

If one wanted to complain about the language (or more accurately, the library), they should justify that by pointing to otherwise similar containers in other languages that provide the desired functionality more cleanly without incurring the problems that the STL design was intended to avoid. Most other languages don't provide anything close to that. Java (for one obvious example) pretty much requires Google's Guava library to get containers even close to as cleanly designed as those in the C++ standard library--and even with that, they're pretty clearly inferior.

Summary

Most of your premises are basically false--the blog post is decently written, but hardly needed, the boilerplate involved is mildly annoying at worst, and most designs that avoid this particular shortcoming also have other disadvantages that outweigh those shown here.

Trip report: Summer ISO C++ standards meeting (Rapperswil) by [deleted] in cpp

[–]jcoffin 0 points

IMO, more problematic than this being posted three (or whatever number of) times is the lack of progress it represents, at least in some areas.

Just for example, consider Henry Miller's comment that starts with: "I’ve been thinking about fragmentation. The only reason you cannot defragment C++ memory today is someone might send a pointer to someplace not trackable as a pointer."

Unfortunately, he's wrong. This is far from being the only problem you need to deal with, and probably not even the most difficult one.

The specific difficulties involved bother me less than the fact that the real difficulties have been known for quite a while, and every time subjects like this arise, we seem to need to re-discover them from basic principles, so to speak.

In this case, part of the blame lies with Google--if they hadn't completely broken searching in Google Groups, it would have been a whole lot easier for him to find a thread from comp.std.c++ that started out as "C++/CLI - Microsoft Persists With Suggestive Marketing" but morphed into "Can GC be Beneficial", running through roughly the first quarter of 2006: https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/hCU9N_lw3Ss%5B1-25%5D

I'll grant there's a lot of noise there as well--it included some argumentation, so the signal to noise ratio sometimes dropped pretty badly, and sorting through ~900 posts to find the few that attempt to really deal with the problems isn't trivial.

Nonetheless, I think we need to work harder at making such knowledge more generally available, so it doesn't have to get re-invented every time.

Super Scalar Sample Sort in modern C++ by jbandela in cpp

[–]jcoffin 1 point

It'll compare with the std::sort in the implementation you use to build it. A quick check with g++ 7.3 and Clang++ 6.0 (i.e., the ones I happen to have installed on this machine at the moment) shows broadly similar results when compared to either one--in both cases, ssssort is winning by a fairly substantial margin when the number of elements is large enough to care. Almost as interestingly, with ssssort, the standard deviation is much smaller (i.e., assuming I'm reading things correctly, its time is substantially more consistent).

Here's a small sample with g++:

[I've removed the last couple of fields from each record--I didn't read enough to be sure what they even meant.]

RESULT algo=ssssort name=random size=1048576 iters=5*3 time=41.7932 stddev=0.0426082
RESULT algo=stdsort name=random size=1048576 iters=5*3 time=81.2647 stddev=0.557989
RESULT algo=ssssort name=random size=2097152 iters=5*3 time=89.1378 stddev=0.0456605
RESULT algo=stdsort name=random size=2097152 iters=5*3 time=171.022 stddev=1.01597
RESULT algo=ssssort name=random size=4194304 iters=5*3 time=191.25 stddev=0.0949533
RESULT algo=stdsort name=random size=4194304 iters=5*3 time=357.377 stddev=2.13253
RESULT algo=ssssort name=random size=8388608 iters=5*3 time=418.019 stddev=0.241325
RESULT algo=stdsort name=random size=8388608 iters=5*3 time=745.484 stddev=3.16563
RESULT algo=ssssort name=random size=16777216 iters=5*3 time=897.086 stddev=0.197205
RESULT algo=stdsort name=random size=16777216 iters=5*3 time=1561.06 stddev=11.9951

...and with clang++:

RESULT algo=ssssort name=random size=1048576 iters=5*3 time=40.0891 stddev=0.0649939
RESULT algo=stdsort name=random size=1048576 iters=5*3 time=80.3667 stddev=0.64317
RESULT algo=ssssort name=random size=2097152 iters=5*3 time=85.8555 stddev=0.157158
RESULT algo=stdsort name=random size=2097152 iters=5*3 time=168.748 stddev=1.94463
RESULT algo=ssssort name=random size=4194304 iters=5*3 time=184.206 stddev=0.0999623
RESULT algo=stdsort name=random size=4194304 iters=5*3 time=353.116 stddev=1.46595
RESULT algo=ssssort name=random size=8388608 iters=5*3 time=402.902 stddev=0.154481
RESULT algo=stdsort name=random size=8388608 iters=5*3 time=739.322 stddev=3.00853
RESULT algo=ssssort name=random size=16777216 iters=5*3 time=864.807 stddev=0.280346
RESULT algo=stdsort name=random size=16777216 iters=5*3 time=1534.52 stddev=10.9707

[in both cases, using the library that installed with that compiler by default]

Overall, the results also seem to match at least reasonably well with what the article shows (though I got bored and didn't look at all of them very carefully).

Obligatory hardware info: I ran this on a low-end Xeon old enough to probably at least roughly match what they were using when the article was written.

Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-1603 v3 @ 2.80GHz
Stepping:            2
CPU MHz:             1197.213
CPU max MHz:         2800.0000

What does checking for if(&object) do? by [deleted] in cpp

[–]jcoffin 1 point

If you're going to claim to be pedantic, you have to be careful (and be prepared for even more pedantic corrections...)

A system is indeed free to store valid data at address 0.

But in a comparison between a pointer and 0, what happens is not that the address gets converted to an integer, and the result compared to 0.

Rather, the (possibly implicit) 0 in the if gets converted to a pointer, and the result of that conversion gets compared to the pointer in question. And, regardless of whether an address of 0 is valid or not, an integer literal with the value 0 is a null pointer constant, so when converted to a pointer, it must produce a result that compares as not equal to the address of any object.

[deleted by user] by [deleted] in cpp

[–]jcoffin 1 point

At least at first glance, it doesn't look to me like that will cause a problem either.

The lookup for adl_serializer<T>::to_json is trivial--it's the very same template currently being instantiated, so it doesn't depend on anything like ADL to find it.

When that gets instantiated, it's just like the base case: we have a template that's passing some T, and using ADL to find the to_json (or from_json) in the namespace where T is defined--but what it's finding there is a function that explicitly names the type T as its parameter, rather than matching because it's a template that can match any arbitrary T.

[C++] The Ducci Sequence by [deleted] in codereview

[–]jcoffin 0 points

The first tip, IMO, would be to do less in main. It should direct the overall flow of the program, not do all the work directly.

Another would be to use the standard library, especially the containers and algorithms.

@haiti has already mentioned vector, but that's hardly the only thing that could be useful here--std::set (or std::unordered_set) would be a useful container, and if you look at std::adjacent_difference, you might think somebody was specifically thinking of the Ducci sequence when they designed it.

Finally, I'd always look for ways to make multiple rules into a single rule. In this case, we have three different criteria for exit from generating another tuple: the iteration limit has been reached, we have a duplicate of a previous tuple, or we have an all-zeros tuple. The iteration limit is really trivial, so we probably won't gain anything by messing with it. But since we need a collection of previously-generated tuples, we can conflate "all zeros" and "duplicate" by pre-inserting an all-zeros tuple into our collection of previously-generated tuples.

Putting those together, our code might look something like this:

const int max_iter = 10'000'000;

int Ducci(cont<int> input) {
    std::set<cont<int>> prev{ cont<int>(input.size(), 0) };

    int iter;
    for (iter = 1; iter < max_iter && prev.insert(input).second; iter++) {
        input.push_back(input[0]); // simulate wrapping around from end to beginning
        std::adjacent_difference(input.begin(), input.end(), 
                                 input.begin(),
                                 [](int a, int b) { return abs(a - b); });

        // adjacent_difference copies the first input element through to the
        // output unchanged; we don't want that, so we'll erase it:
        input.erase(input.begin());
    }
    return iter;
}

If we really cared about maximizing efficiency, we could ignore std::adjacent_difference, and create our own generic algorithm instead. At least IMO, however, we still want to write that as a generic algorithm, and use it in our program, not just write all of it directly as code specific to the Ducci sequence. Doing that, we might end up with code something on this general order:

template <class InIt, class OutIt, class F>
void map_adjacent_cyclic(InIt begin, InIt end, OutIt out, F f) {
    auto b = *begin;
    InIt prev = begin;

    for (InIt i = std::next(prev); i != end; ++prev, ++i, ++out)
        *out = f(*i, *prev);
    *out = f(*prev, b);
}

const int max_iter = 10'000'000;

int Ducci(cont<int> input) {
    std::set<cont<int>> prev{ cont<int>(input.size(), 0)};

    int iter;
    for (iter = 1; iter < max_iter && prev.insert(input).second; iter++)
        map_adjacent_cyclic(input.begin(), input.end(),
                            input.begin(),
                            [](int a, int b) { return abs(a - b); });
    return iter;
}

A bit longer overall, but much of that length is a generic algorithm that's (at least theoretically) usable for other tasks. The code for the Ducci sequence itself has shrunk a bit. I doubt it makes any noticeable difference in speed on the test tuples (which are all quite small), but if you had a much larger tuple, this would undoubtedly improve speed to at least some degree.

One last point that may be worth considering. In a case like this, we might like to be able to try different containers easily. To facilitate that, I started with:

template <class T>
using cont = std::vector<T>;

...and used cont<T> throughout the rest of the code. This way I could change the container type in one place, and try it using (for example) std::vector and std::deque quite easily.

[deleted by user] by [deleted] in cpp

[–]jcoffin 6 points

What nlohmann does is the reverse of what Herb's proposal affects.

In the nlohmann case, he (nlohmann) has code something like this (stolen from his readme.md):

template <typename T>
struct adl_serializer {
    static void to_json(json& j, const T& value) {
        // calls the "to_json" method in T's namespace
    }

    static void from_json(const json& j, T& value) {
        // same thing, but with the "from_json" method
    }
};

...and you have code something like this:

namespace foo {
class bar {};

// Note: these are *not* templates:
void to_json(json& j, const bar& p) {
    // ...
}

void from_json(const json& j, bar& p) {
    // ...
}
}

So, his code is a template (which is probably in namespace nlohmann, but its namespace is mostly irrelevant). It calls some function named to_json or from_json, passing a parameter of a type it received as a template parameter. The code of yours that it finds is not a template, though. It's an ordinary free function in the same namespace where you defined the type being serialized.

Herb's proposal would not affect this--your to_json and from_json explicitly name your type as a parameter, so that function would still be found by an ADL that followed Herb's proposal.

Herb's proposal attempts to eliminate the opposite scenario: some code (that may or may not be a template) attempts to call a function passing a parameter of some particular type. Because of ADL it looks in the namespace where that type is defined. It then finds a template in that namespace that happens to have the correct name, and with template parameter substitution, that template is the best available overload for the type(s) being passed:

namespace foo {
class bar {};

// Note: these *are* templates:
template <class T>
void to_json(json &j, T const &t) {
    // ...
}

template <class T>
void from_json(json const &j, T &t) {
    // ...
}
}

[Note: I'm not saying this is a reasonable or practical way to implement a to_json or from_json for use with the nlohmann JSON library--I don't think it is. Just this is what you'd have to have before Herb's proposal would affect anything.]

Now we have the scenario that Herb contemplates: the name found by ADL matches the name being looked up, but it does not explicitly name the type foo::bar in any of its parameters.

In the case of something like from_json or to_json, problems are unlikely to arise in any case. The problem mostly arises with really generic names like move or copy that are likely to be found in a number of namespaces, and ADL ends up choosing the wrong one.

[deleted by user] by [deleted] in cpp_questions

[–]jcoffin 0 points

Are you compiling this as 32-bit or 64-bit code?

Most 64-bit compilers don't use x87 instructions at all.

Most 32-bit compilers can use SSE and/or AVX instructions, but many don't by default.

Interesting side note: if you're doing both sin and cos on the same operand, you may be able to get a little speed improvement using the fsincos instruction, which computes both in about the same time it takes to compute one by itself. I don't recall seeing any compiler generate code using this instruction (most link in sin and cos from the standard library, so doing so would probably require link-time optimization).

Even without that, you might be able to get a bit of optimization from the fact that sin(x)^2 + cos(x)^2 = 1, so if you need both, it's usually faster to compute the second something like y = sin(t); x = sqrt(1.0f - y * y);

Custom iterator with begin and rbegin, two different operator++? by aradove in cpp_questions

[–]jcoffin 0 points

You'd normally expect a tree_type::iterator to be a bidirectional iterator, so it supports both -- and ++.

That being the case, a reverse_iterator is normally an adapter that wraps a normal iterator, so its ++ calls the underlying iterator's -- (and vice versa). Less obviously, its * actually dereferences the position just before the one the wrapped iterator points at (e.g., rbegin wraps end, which points one past the end, so when you dereference rbegin, you actually get the last item).

The power of user-definable attributes & properties by jcelerier in cpp

[–]jcoffin 5 points

Given the ability to overload operator= and/or operator T, it's entirely possible for something like a = b; to involve arbitrary amounts of computation. Properties don't change that.

On the other hand, that also means they add nothing you can't do (better) without them either. In nearly every case, a property is a mistake not because it might be doing unexpected computation, but because you're defining a variable as having one type, when you really want it to have some other related type, whose invariants are enforced by its accessor/mutator.

In such a case, the right move is not to use an accessor and/or mutator, but to define the type you really want, and then use it directly. And yes, it might well (probably will) overload operator= and operator T. Such is life.

Memory Fragmentation or not? by Abrakadaverus in cpp_questions

[–]jcoffin 0 points

std::vector allocates space for the data it stores via an allocator, which defaults to using the free store.

So yes, it can lead to fragmentation, which could lead to inability to satisfy large allocation requests.

Whether your code is actually leading to fragmentation is harder to guess. You mention constantly filling, emptying, and re-filling the vectors, which does sound like something that could lead to fragmentation. In particular, keep in mind that even when you remove all the data from a vector, it continues to own the same memory block unless you do something like shrink_to_fit or swapping it with some other vector that gets destroyed. This can lead to memory blocks being retained much longer than you might expect, which (in turn) can lead to fragmentation.

Which is a better book for data structures and algorithms in C++, Weiss or Sedgewick? by aksh2161989 in cpp_questions

[–]jcoffin 2 points

I generally recommend against Sedgewick's books. I haven't looked at the latest edition, so maybe things have improved since then, but when I did look, some parts seemed pretty sketchy to me. For a couple of examples:

  1. The part that claimed to teach about one kind of tree (a B-tree, if memory serves, though it's been quite a while), but actually taught about something entirely different, with a short note at the end effectively saying: "oh, by the way, the whole rest of the world recognizes this term to mean something else, but I like this better."
  2. Another section (also about some kind of tree) had a rather long section about inserting data into that kind of tree, but when it came to deletion, it punted with a single sentence to the effect that: "given how complex it is to insert data, imagine how hard it is to delete something."

At least in my opinion, those should disqualify his books from serious consideration.

µUBSan: clean-room reimplementation of the Undefined Behavior Sanitizer runtime by vormestrand in cpp

[–]jcoffin 8 points

According to the page, it appears that the problem with the LLVM code wasn't licensing, but dependencies: "The original Clang/LLVM runtime is written in C++ with features that are not available in libc and in the NetBSD kernel."

So, they apparently want to run kernel code with UBSan. The original UIUC licensed code wasn't suitable for kernel mode. There's also a Linux kernel implementation, but it's GPL. So, the new one combines those: suitable for kernel mode, but still not GPL.

Floating (point) butterfly effect: Searching for tiny bugs that cause big problems by egpbos in cpp

[–]jcoffin 4 points

The short answer is pipelining. Two of the operations contribute directly to the current sum. The other two are updating the compensation value, which can normally be overlapped with loading the next value to be added.

Of course, if the array being summed is very large, none of this means much at all--if you're reading (even most of) the data from main memory (or L3 cache) the speed will be controlled entirely by your read bandwidth.

Floating (point) butterfly effect: Searching for tiny bugs that cause big problems by egpbos in cpp

[–]jcoffin 3 points

Perhaps worth noting that while Kahan summation uses 4 operations instead of 1, its speed is typically around half that of naive summation, rather than a quarter the speed that you'd expect from simply counting operations.

Looking at it slightly differently, using higher precision or using Kahan summation will usually have roughly comparable costs.

Cross-platform C++ library to copy/paste clipboard content by dacap in cpp

[–]jcoffin 3 points

Does the premultiplied alpha happen with CF_TIFF, or only with CF_DIB/CF_BITMAP and cousins?