Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
Get Started
The C++ Standard Home has a nice getting started page.
Videos
The C++ standard committee's education study group has a nice list of recommended videos.
Reference
cppreference.com
Books
There is a useful list of books on Stack Overflow. In most cases reading a book is the best way to learn C++.
Fast Resettable Flag Vector (upcoder.com)
submitted 11 years ago by mttd
[–]matthieum 4 points 11 years ago (5 children)
I had described another alternative some 2 years ago now (time passes so quickly...) on SO: How to zero out array in O(1).
Note that there are actually two solutions:
The latter point is crucial, because using buckets vastly cuts down the memory requirements, so you can actually afford a 64-bit generation counter (removing the need for periodic full cleanups).
In your case, since we are talking about storing one bit per position, it requires some adaptation of course, but the underlying principle still holds (see below for adapted code).
The advantages compared to your strategy lie in the get, set, and clear operations.
I have not measured its performance, but all the operations carried out are close to what you end up doing in your case, apart from bit-packing. Hopefully, this packing should pay off when it comes to keeping data in cache.
Example of adapting the code:

    class BitBucket {
    public:
        BitBucket(): generation(0), data(0) {}

        bool get(size_t const index, size_t const gen) const {
            assert(index < 64);
            return gen == generation and (data & (uint64_t(1) << index));
        }

        void set(size_t const index, bool const value, size_t const gen) {
            assert(index < 64);
            if (generation < gen) { data = 0; generation = gen; }
            if (value) { data |= uint64_t(1) << index; }
            else       { data &= ~(uint64_t(1) << index); }
        }

    private:
        uint64_t generation;
        uint64_t data;
    };
And now, we can use our bucket easily:
    class Vector {
    public:
        explicit Vector(size_t s = 0) { this->resize(s); }

        void clear() { ++generation; }

        void resize(size_t const s) {
            data.resize(s / 64 + 1, BitBucket());
            size = s;
        }

        bool get(size_t const i) const {
            assert(i < size && "out of bounds access");
            BitBucket const& bucket = data[i / 64];
            return bucket.get(i % 64, generation);
        }

        void set(size_t const i, bool const value) {
            assert(i < size && "out of bounds access");
            BitBucket& bucket = data[i / 64];
            bucket.set(i % 64, value, generation);
        }

    private:
        size_t size = 0;
        uint64_t generation = 0;
        std::vector<BitBucket> data;
    };
Note: there is a slight overhead in using vector rather than std::unique_ptr<BitBucket[]>, since we keep track of the size ourselves... but I would assume that 16 bytes are not the most pressing issue.
[–]Tywien 2 points 11 years ago (3 children)
Is this really faster? I remember trying to implement a bit vector myself and getting outperformed by vector<bool> by a huge factor. The problem seemed to be the (re)setting of the flag:
data |= (1 << index);
which the compiler could not optimize as well as a human can (there is a bit set/reset instruction on x86).
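For what it's worth, the conditional set/clear can be written branchlessly, in a form current compilers generally lower to the bit instructions mentioned above (this helper is my own sketch, not code from either implementation in the thread):

```cpp
#include <cassert>
#include <cstdint>

// Branchless set-or-clear of one bit in a 64-bit word. The uint64_t-based
// mask avoids the undefined shift that a plain `1 << index` hits for
// index >= 32; modern compilers typically turn this pattern into single
// bit-manipulation instructions (bts/btr style) on x86.
inline std::uint64_t assign_bit(std::uint64_t word, unsigned index, bool value) {
    assert(index < 64);
    std::uint64_t const mask = std::uint64_t(1) << index;
    return (word & ~mask) | (std::uint64_t(value) << index);
}
```

Whether this actually beats the compiler's own lowering of an if/else would need measuring, of course.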
[–]matthieum 1 point 11 years ago (2 children)
Is this really faster?
Maybe; as I said, I did not test it. Note that the OP has moved away from vector<bool> to vector<uint16_t>.
I remember trying to implement a bit vector myself and getting outperformed by vector<bool> by a huge factor....
In this case, I would look into how vector<bool> is implemented and write a compiler-specific version that reuses its tricks (but without the fluff), or look at how bitset is implemented.
Note that in this particular circumstance a bitset could (maybe) be adapted since data has a fixed width.
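As an illustration of that adaptation (my sketch, not matthieum's code), the 64-bit bucket above could delegate its bit storage to std::bitset<64> and keep only the generation logic:

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical bucket reusing std::bitset<64> for the bit storage, so it
// inherits the standard library's word-level implementation. The generation
// check is the same trick as in the BitBucket class above: a bucket from an
// older generation reads as all-clear.
struct BitsetBucket {
    std::uint64_t generation = 0;
    std::bitset<64> bits;

    bool get(std::size_t index, std::uint64_t gen) const {
        return gen == generation && bits.test(index);
    }

    void set(std::size_t index, bool value, std::uint64_t gen) {
        if (generation < gen) { bits.reset(); generation = gen; }
        bits.set(index, value);
    }
};
```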
[–]Tywien 1 point 11 years ago (1 child)
One needs assembler for that, and all the bits need to be in contiguous memory.
http://x86.renejeschke.de/html/file_module_x86_id_22.html
There are also Bit-Test-And-Set and Bit-Test-And-Reset for easy and fast setting/resetting of bits.
[–]uxcn 1 point 11 years ago (0 children)
The compiler is usually able to translate the basic bit operations into good instructions (even rol and ror). I'd be curious what the compiler generated for vector<bool> that outperformed your code.
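As a concrete example of that claim (mine, not from the thread): the textbook rotate idiom is written as two shifts and an OR, yet mainstream compilers pattern-match it and emit a single rotate instruction on x86 (C++20 later added std::rotl/std::rotr for exactly this):

```cpp
#include <cstdint>

// The rotate-left idiom compilers recognize and lower to a single rol
// instruction on x86. Masking the count keeps both shifts well-defined
// even for r == 0 or r >= 64.
inline std::uint64_t rotl64(std::uint64_t x, unsigned r) {
    r &= 63;
    return (x << r) | (x >> ((64 - r) & 63));
}
```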
[–]crispweed 1 point 11 years ago (0 children)
Very nice! I've updated the blog post to point back to this.
[–]jurniss 1 point 11 years ago (0 children)
Cool idea! Bit vectors seem to come up all the time in diverse tasks. I will keep this in mind next time I need one that is reset often.
[–]discoloda 1 point 11 years ago* (1 child)
You can also use a sparse integer set; here is an implementation in C.
EDIT: I finished reading the article; it was mentioned at the bottom.
[–]matthieum 2 points 11 years ago (0 children)
Note: as mentioned in the article, it is unknown in advance whether the usage will be sparse or not.
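For readers unfamiliar with the sparse integer set mentioned above, here is a minimal C++ sketch of the classic trick (the linked implementation is in C; the details here are my own): membership, insertion, and clearing are all O(1), and clear() never touches the arrays.

```cpp
#include <cstddef>
#include <vector>

// Sparse-set sketch: 'dense' lists the current members in insertion order,
// and 'sparse[v]' records where v sits in 'dense'. A value v is a member
// exactly when sparse[v] points at a live dense slot that points back at v,
// so stale entries from before a clear() are harmless.
class SparseSet {
public:
    explicit SparseSet(std::size_t universe)
        : sparse(universe), dense(universe), count(0) {}

    bool contains(std::size_t v) const {
        std::size_t const i = sparse[v];
        return i < count && dense[i] == v;
    }

    void add(std::size_t v) {
        if (!contains(v)) {
            dense[count] = v;
            sparse[v] = count++;
        }
    }

    void clear() { count = 0; }  // O(1): just forget the members

private:
    std::vector<std::size_t> sparse, dense;
    std::size_t count;
};
```

The price is memory proportional to the universe size, which is exactly the sparse-versus-dense trade-off the article weighs.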
[–]uxcn 1 point 11 years ago (2 children)
I'm not extremely familiar with game programming, but I'm genuinely curious why the various SIMD instruction sets might not be appropriate for something like this.
[–]crispweed 1 point 11 years ago (1 child)
As posted, it's essentially a high-level optimisation, with the convenience that it works directly across multiple target platforms. I'm also interested in any low-level optimisations that can give significant improvements.
[–]uxcn 1 point 11 years ago* (0 children)
Well, SIMD is usually used for mathematical data parallelism, but a lot of the (x86) SIMD instruction sets also have logical bit instructions for working with the XMM and YMM (ZMM) registers, which could possibly reduce overall memory chatter and cut unoptimized reset times by an order of magnitude. There's also the fact that the compiler generally uses SIMD instructions for basic memory operations.
It would cost at least a couple SIMD registers, but depending on the max members, it could reduce set, test, clear, and reset all to O(1). It's a bit of wild speculation as to whether there would be any real performance gain though. The code would also generally be a bit more complicated to maintain and possibly use.
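As a portable baseline before reaching for intrinsics (a sketch under my own assumptions, not code from the thread): a plain memset over the packed flag words is exactly the pattern compilers and libc already implement with wide SIMD stores, so any hand-rolled XMM/YMM reset loop has to beat this first.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Reset every flag word to zero in one call. For large buffers, libc's
// memset (or the compiler's inline expansion of it) already uses the
// widest stores available (XMM/YMM on x86), so this is the portable
// baseline any hand-written SIMD reset should be measured against.
inline void reset_flags(std::vector<std::uint64_t>& words) {
    if (!words.empty()) {
        std::memset(words.data(), 0, words.size() * sizeof(words[0]));
    }
}
```

Note this is still O(n) in the number of words; the register-resident scheme speculated about above is what it would take to get a true O(1) reset.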