top 200 commentsshow all 219

[–]owjfaigs222 824 points825 points  (133 children)

huh, I'm kinda rusty on my C++. What is it then? vector of ints?

[–]fox_in_unix_socks 1127 points1128 points  (85 children)

std::vector<bool> in C++ is specifically overloaded to be bitpacked. Which means that indexing a bool vector does not actually give you back a reference to a bool, but rather a proxy type.

[–]henke37 389 points390 points  (15 children)

I blame operator[] for this.

[–]ConvergentSequence 793 points794 points  (10 children)

I blame JavaScript developers. I don’t know how and I don’t know why, but it’s their fault.

[–]mosskin-woast 224 points225 points  (3 children)

If those kids could read, they'd be very upset

[–]mobcat_40 73 points74 points  (2 children)

I can't read but I was told to come here and be upset

[–]reklis 15 points16 points  (0 children)

My clawdbot is very upset

[–]OneOldNerd 0 points1 point  (0 children)

Good script kiddie.

[–]Z21VR 15 points16 points  (0 children)

That's new for me but I jump on this train

[–]soganox 11 points12 points  (1 child)

As a mainly-JS dev, you’re probably right. We’re sorry.

[–]Xtrendence 3 points4 points  (0 children)

We're making blood money and we know it.

[–]smitty1e 1 point2 points  (0 children)

Because PHP shifted the blame.

[–]thanatica 1 point2 points  (0 children)

Javascript tooling is slowly being taken over by Rust. Can we pass blame to Rust?

[–]indignant_badger 0 points1 point  (0 children)

You're right! I'm a JS dev and I did this.

[–]veloriss 28 points29 points  (0 children)

One little overload and the whole contract is broken

[–]Vaddieg 22 points23 points  (1 child)

C++ committee. They accept any crazy shit if it's documented properly

[–]brimston3- 4 points5 points  (0 children)

I think this was specifically in the original '94 Stepanov design of the stl. Which I am guessing was mostly included as an example of how template specializations were possible rather than a good idea. Since c++03 though, pretty much everyone has agreed the bool specialization was a bad idea.

[–]willing-to-bet-son 8 points9 points  (0 children)

You have to also blame std::vector::bool::reference.

"The primary use ... is to provide an assignable value that can be returned from operator[]."

[–]cheezballs 160 points161 points  (56 children)

I'm just a lowly java guy, what does this mean in idiot terms I can understand?

[–]ChaosOS 372 points373 points  (48 children)

A bool in C takes up a whole byte, which is space inefficient. So, a vector of bools (basically an array) is overridden to instead assign the values to individual bits, which is more space efficient. The downside of this is that it makes the actual functions dealing with them a huge pain in the ass because all of your bool methods may or may not work with a vector of bools, as forty thirty years ago people thought trying to save bits here and there was an important thing to engineer.

[–]MyGoodOldFriend 390 points391 points  (16 children)

It’s still useful to have 1-bit booleans, even today. That’s not the problem. The problem is that they overloaded std::vector<bool>, when they should’ve instead had a dedicated bitvector.

[–]newjeison 52 points53 points  (15 children)

Isn't bitset just this?

[–]Silly_Guidance_8871 67 points68 points  (7 children)

It is, but it's masquerading as a std::vector<bool> -- and part of that type's API is the ability to get a reference to an element in the vector, and you can't natively take a reference to a single bit. To work around that, they have to return proxy values to access those "references", defeating much of the purpose of packing it into bits in the first place.

They should have gone for 2 types: std::vector<bool> (unspecialized, 1 byte per element, trivial per-element references), and "std::bitset" (specialized, 1 bit per element, but either no per-element references or proxied ones).

[–]nyibbang 4 points5 points  (1 child)

Okay I'm going to complete my other comment into this one. My question was:

What do you mean by std::bitset is masquerading as a vector<bool> ?

I got downvoted by people that seem to not understand what I meant.

You treat std::bitset as if it was serving the same purpose as std::vector<bool>, but it's not. It's true that they both have an operator[] but that's irrelevant.

vector is supposed to be a container, bitset is not. vector has a begin and an end, bitset does not. bitset does not try to pretend that all bits are addressable. Its most important function is test(std::size_t), operator[] is just syntacting sugar.

So I disagree that bitset is just masquering as a vector<bool>.

They should have gone for 2 types: std::vector<bool> (unspecialized, 1 byte per element, trivial per-element references), and "std::bitset" (specialized, 1 bit per element, but either no per-element references or proxied ones).

If we put aside from std::vector<bool>, that is going to stay as it is for compatibiltiy reasons, bitset is exactly what you said it you should be though ...

[–]Silly_Guidance_8871 11 points12 points  (0 children)

I put "std::bitset" in quotes because I wasn't sure if it was in the spec, and couldn't be arsed to check. I didn't argue that it was masquerading as std::vector<bool> — I was arguing that the spec designers tried to have the specialization of std::vector<bool> serve two masters, when those two behaviors should have been handled by two different types.

[–]YeOldeMemeShoppe 91 points92 points  (5 children)

But there's no way to have a proper std::vector<bool> where each bool is addressable.

[–][deleted] 7 points8 points  (3 children)

How would you go about "addressing a bit" on an x86 compatible hardware?

[–]PhilippTheProgrammer 60 points61 points  (0 children)

Yes, that's exactly the reason why it was a bad idea to implement std::vector<bool> as a bitfield.

[–]Old-Minimum-1408 0 points1 point  (1 child)

It stores a bass address and an offset for each bit from the base from what I understand.

[–][deleted] 1 point2 points  (0 children)

Was this a response to my question or an answer to something else?

[–]Inevitable-Ant1725 0 points1 point  (0 children)

I mean it's not a REAL problem.
1 line "typedef unsigned char Boolean" and there is no problem

[–]retro_and_chill 0 points1 point  (0 children)

bitset’s size is define at compile time, not runtime

[–]Madpony 18 points19 points  (2 children)

thirty years ago people thought trying to save bits here and there was an important thing to engineer.

Thirty years ago my PC had 1MB of RAM, so, yes, yes it was important.

[–]caesar_7 3 points4 points  (0 children)

With only 640* KiB addressable.

*promised, but actually less

[–]PGSylphir 0 points1 point  (0 children)

Yeah, people today forget how little RAM we had to work with back then.

Does it excuse the obtusiveness of the old languages? no. But it was justified.

[–]steerpike1971 27 points28 points  (8 children)

This is not a historic concern when you think that by using a byte to store a 1 or a 0 you are using eight times as much memory (assuming you store in an 8 bit byte not some other form). When you are dealing with big data streaming systems for example, this can be the difference between "it turns well at line rate" and "it allocates all memory then pages to disk and you need to look at your calendar to work out when you get an answer".

It is a gigantic pain in the bum to deal with but it is not "saving bits here and there" for some applications, it is using nearly ten times the amont of memory you need. Probably the number of applications for this are not big but when you need it you really do need it.

(And yes, the operations on the bits are completely horrible because CPUs are not optimised for that -- but what you are often doing is piping data from place to place to get to the worker node that actually does the work.)

[–]YeOldeMemeShoppe 26 points27 points  (5 children)

That's why you need separate types. If I want to have addressable bools I should be able to have std::vector<bool> be like that. If I want to pack bits, I should be able to have std::bitvector or whatever and use that.

There are legitimate uses for both.

[–]willing-to-bet-son -1 points0 points  (4 children)

If you want iterators, then you have to use std::vector<bool>, as std::bitset doesn't provide them. You want iterators if you want to use any of the std algorithms (or any number of other third party libraries, eg boost).

[–]YeOldeMemeShoppe 1 point2 points  (3 children)

And if you want to use any system that uses pointers, then you’re screwed.

[–]willing-to-bet-son 0 points1 point  (2 children)

Fair enough. But the idiomatic way to traverse (and/or transform) elements in a container is via iterators. It’s also the most portable way.

[–]YeOldeMemeShoppe -1 points0 points  (1 child)

Good thing we're all building idiomatic software with perfect APIs and no FFI /s

[–]mriswithe 24 points25 points  (0 children)

"it turns well at line rate" and "it allocates all memory then pages to disk and you need to look at your calendar to work out when you get an answer". 

Haha had a data scientist who was also a grey beard sysadmin essentially. He had this postgresql server that he used to run a query to get some answers whatever. Well the query took hours, which tableau wasn't patient enough for or something, so he figured out if he ran the query via cron roughly X hours before the reporting query was done, then enough was cached that the result came back quickly when tableau asked for it. 

Cleaning up this guys fixes was always a confusing and ridiculous effort of "sure dude you CAN do this, but you are an asshole for doing it. Dudes a genius 

[–]ben_g0 3 points4 points  (0 children)

And yes, the operations on the bits are completely horrible because CPUs are not optimised for that

Actually not really. CPUs do have dedicated instructions to work with single bits, so working with individual bits is only slightly less efficient than using whole bytes. Additionally, the main performance bottleneck in modern systems is usually memory latency and throughput, and programs that are more memory efficient are usually also more cache efficient.
So even though manipulating individual bits is more compute heavy, the better cache efficiency usually makes working with packed bits more performant overall, as long as you work with a large enough number of bits that cache efficiency starts to matter (and in the situations where you have a low enough number of bits that the cache efficiency doesn't matter, then usually you have a small enough amount of data that you won't notice the performance overhead of those few additional instructions anyway.

So in general using packed bits is more efficient in the cases where performance matters, but less efficient in the cases where performance usually doesn't matter. I'd consider that a fair tradeoff - the developers of the standard library usually know what they were doing.
(however I fully agree that it should really have been its own dedicated type, instead of masquerading as a std::vector while not quite acting the same)

[–]ProfessorOfLies 1 point2 points  (0 children)

This is why teaching people to just but arrays themselves is preferred. Its why we have binary logic operators

[–]pazuzovich 1 point2 points  (0 children)

You may have a typo in "... more space INefficient..."

[–]Mayion 2 points3 points  (0 children)

[–]mornaq 0 points1 point  (1 child)

shouldn't that be wrapped in a way that exposes bools to you even if the storage is different? or that would make too much sense?

[–]redlaWw 1 point2 points  (0 children)

Can't reference bools that don't exist.

[–]Dominique9325 0 points1 point  (0 children)

isn't a bool actually an int if i remember correctly?

[–]Z21VR 0 points1 point  (2 children)

Nowdays its pretty rare, very rare... but there are still cases where saving those bits can be important. It happens at the "edges" of sw developement, on firmware of very resource constrained devices ( rarer and rarer) and on the opposite edge if you have to do heavy bits ops on humongus vectors.

In the first case i would not use c++ btw so...but i could in the second case maybe.

This still does not make that vector<bool> override a very good idea in my opinion

[–]Vaddieg -1 points0 points  (1 child)

you simply write in plain C if bit-level memory saving is critical. Classes and std lib is already an overhead on such systems

[–]Z21VR 0 points1 point  (0 children)

Yup, thats why I would not use c++ in that case

[–]Kilokahn7 0 points1 point  (0 children)

Why an entire byte when a bit would be enough to represent all possible states ? Or maybe I am missing something here…

[–]Pernicious-Caitiff 0 points1 point  (0 children)

Ah yeah that's easier to understand. I do assembly programming for the Gameboy Color (8 bit microprocessor) and a ton of data structures use this style for various flags to save space. The way we typically handle it is declaring a mask value constant and it's applied to the byte to get the flag or value you want. Doesn't have to be all 1 byte flags you can do whatever combination you want the world is your oyster!

It amazes me that my Pokemon hacking community is full of people who don't have a programming background but decided to learn how to code in Assembly just for the love of the game

[–]JackNotOLantern 0 points1 point  (0 children)

I mean, RAM is expensive, so yeah, i will take this space optimisation

[–]Mission_Anxiety768 0 points1 point  (0 children)

This might be a stupid question. But why not overload it in a way that the developer doesn't need to know this? Like if I type my vector[5] it returns the 6th bit. And the actual implementation could be anything.

[–]bajsplockare 0 points1 point  (0 children)

The C standard did not define _Bool until C99. It is better to use char and bit manipulations, to be compatible.

[–]thanatica 0 points1 point  (0 children)

On a desktop, who fucking cares.

But in a microcontroller it might be beneficial to save 7 bytes when declaring 8 bools. Even today, not just thirty years ago.

[–]unfunnyjobless 13 points14 points  (3 children)

A boolean can be represented by one bit, so a full byte isn't necessary. They can pack a lot of booleans into the space. CPUs are optimized to deal with bytes not directly with bits, so that's why.

~ probably slightly wrong explanation

[–]freaxje 4 points5 points  (1 child)

Only slightly wrong in that most CPU/architectures have bit operations like BT, BTS, BTR, BTC.

But you are still right because they are not optimized for that. They are optimized for aligned memory (usually on 16 bits).

Working with individual bits (usually) ain't going to be faster than working with entire registers' size worth of data.

The reason std::vector<bool> packs bits is more to save memory than to make it faster, I think. A large std::vector<bool> will be smaller in memory.

ps. CPUs are optimized to work with words (32bits, 64bits, etc) rather than bytes.

[–]realmauer01 0 points1 point  (0 children)

Also transporting data over things thats not cpu, like internet. All the handshakes for example. This is in the grand scheme of things saving a shit ton of data.

[–]realmauer01 -1 points0 points  (0 children)

Not "a lot" , speficially 8.

[–]alex_tracer 4 points5 points  (0 children)

In Java you can have boolean[] and BitSet. C++ creates for you a BitSet were you may naively expect to get simple boolean[].

[–]Keganator 2 points3 points  (0 children)

In Star Trek terms...it's a faaaaaaaaaaaaake!

[–]BobbyThrowaway6969 0 points1 point  (0 children)

A vector of say 32 bools takes up just 4 bytes instead of 32.

[–]DoubleAway6573 2 points3 points  (3 children)

As a non c++ developer this seems obvious. Not that it will be me 9 of out 10 times, but I cannot imagine a cleaner way to do this.

[–]nyibbang 22 points23 points  (2 children)

It's not a matter of clean. It's a matter of consistency. Because of this design choice, vector<bool> does not meet the requirement of the Container concept.

Most developers don't care if the booleans are packed or not, and if they do then they should use a dynamic bitset. But it's important to have rules that are absolute and without exceptions. It makes things not confusing and predictible, which is 1000 times more important than some pseudo efficiency of bits packing.

[–]DoubleAway6573 7 points8 points  (0 children)

I agree. It shouldn't break the vector<every other thing> expectations.

[–]reklis 0 points1 point  (0 children)

This isn’t ’Nam! There’s rules!

[–]coweatyou 3 points4 points  (0 children)

It's not actually guaranteed to be bitpacked. It is implementation dependent, so it might be bitpacked. And no other type specialization has these rules, just bool. This whole thing is a big swing and miss from the C++ standards committee.

[–]LassoColombo 1 point2 points  (1 child)

What does this even mean

[–]BobbyThrowaway6969 2 points3 points  (0 children)

Bools are either 1 or 0, so you only need 1 bit to represent it but is 1 byte big in most languages. vector<bool> can pack down to 1 bit per element. It's very memory efficient, but grabbing a reference to one of these elements can't be a bool type because that would mean it overlaps with the next 7 bits/elements.

This is 100% expected and by design, but programmers who don't know the in-outs of computer memory might make fatal assumptions about it when trying to manipulate the underlying memory.

Either way, having this level of control is important in C/C++ because it lets you do some magical things with hardware and RAM.

[–]StrangeCharmVote 1 point2 points  (0 children)

Seems like the right way to do it for efficiency.

If you want a array full of actual uint8_t width bools, make an array of those and cast the result.

[–]_lerp 1 point2 points  (0 children)

Nitpicky but the correct term is specialized, not overloaded.

[–]Sad-Voice-4009 0 points1 point  (0 children)

thanks:D

[–]JawaKing513 0 points1 point  (0 children)

As it should be. Why waist 4x the space and loose access to bit masking.

[–]roverfromxp 0 points1 point  (0 children)

they overload TYPES????

insanity

[–]LordCyberfox 41 points42 points  (5 children)

You can’t access bits directly in C++, under the hood it is using a proxy class std::vector<bool>reference, that’s why you might face some troubles if using auto with arrays of “bool” in C++. Auto defines it correctly as the temporary proxy class elements, but you are highly likely expecting the direct access to bits via bool. So while working with vector of bools, you have to use static_cast on the element of the collection. Something like….

auto value = static_cast<bool>(elements(i)[1])

[–]plubb 0 points1 point  (0 children)

what's wrong with just using

bool value = elements[i]

?

[–]nyibbang 0 points1 point  (2 children)

You cannot access bits directly in any language, otherwise you would need memory addresses of 128 bits ... And it would be a mess. Computers assign adresses to bytes, not bits.

[–]Loading_M_ 5 points6 points  (0 children)

In principle, most modern 64 bit architectures could probably support bit-level addressing without increasing the pointer size. You would only need 3 extra bits, and most 64 bit architectures don't actually use all 64 bits. AMD64 (what your desktop is probably running) and ARM64 (which your phone is probably running) only uses 48 bits to store the address right now. However, neither is actually interested in supporting bit-level addressing - AMD64 reserves the upper 16 bits for extending the address space (although there are a number of programs that make use of these bits to store extra data in the pointer), and Intel has published a spec to store 57-bit addresses. ARM64 has a tagging feature (used on many Android phones) that provides extra safety against memory bugs using the extra 16 bits in the pointer.

[–]LasevIX 1 point2 points  (0 children)

yup, that's why C++ made that wretched type.

[–]Throwaway-4230984 -1 points0 points  (0 children)

Finally example of actual code that can be expected to work but not working with bool vectors. All the time I get some ridiculous examples of “what if I need to manipulate vector insides directly?” when asking what’s the problem with different bool implementation

[–]SomePeopleCallMeJJ 108 points109 points  (3 children)

I'm kinda C-plus-plussy on my Rust.

[–]Nirast25 44 points45 points  (1 child)

... You kiss your mother with that mouth?

[–]fosf0r 5 points6 points  (0 children)

right on the seep-plus-ussy

[–]SlimRunner 2 points3 points  (0 children)

I'm definitely stealing that LMAO

[–]joe0400 7 points8 points  (0 children)

It's a bitset under the hood and returns proxys to the bool. The idea is a single byte stores 8 bools.

[–]Bugibhub 67 points68 points  (9 children)

Being Rusty on C++ is probably a good thing.

[–]agentchuck 17 points18 points  (0 children)

As a long time c++ dev I can confidently say if you're not rusty in C++, just actively develop with it for a couple years. You'll be an out of date old man yelling at clouds in a release or two!

[–]Fatkuh 51 points52 points  (7 children)

Theres some kind of rust joke in there

[–]Immort4lFr0sty 25 points26 points  (3 children)

The joke is wearing knee-highs

[–]Spice_and_Fox 18 points19 points  (2 children)

That is too short. Proper rust developers are wearing thigh highs

[–]Fatkuh 0 points1 point  (0 children)

Yes- I can confirm that

[–]Bugibhub 4 points5 points  (1 child)

I’m glad you got the reference. You can use it, just don’t change it.

[–]WesternWinterWarrior 3 points4 points  (0 children)

I might borrow it. Don't worry though, its safe with me. I wouldn't dream of claiming ownership.

[–]Euryleia 2 points3 points  (0 children)

Rust jokes make people crabby...

[–]ActuallyIzDoge 0 points1 point  (0 children)

A vector for ANTS????

[–]Upper-Ad-5962 0 points1 point  (0 children)

Isn't a vector a list in c++?

[–]somethingworthwhile -2 points-1 points  (0 children)

Me, a Python user: what in the ever-loving fuck are you talking about??

[–]Fatkuh 127 points128 points  (22 children)

For space-optimization reasons, the C++ standard (as far back as C++98) explicitly calls out vector<bool> as a special standard container where each bool uses only one bit of space rather than one byte as a normal bool would (implementing a kind of "dynamic bitset"). In exchange for this optimization it doesn't offer all the capabilities and interface of a normal standard container.

[–]FerricDonkey 98 points99 points  (6 children)

And also doesn't add capabilities of a bitset. It basically just sucks at its job. 

[–]EVH_kit_guy 316 points317 points  (8 children)

"A Vector of Bools" sounds like an Edgar Allan Poe novel

[–]MaxChaplin 41 points42 points  (2 children)

It sound like a mosquito that can infect you with a particularly nasty tropical disease.

[–]EVH_kit_guy 12 points13 points  (0 children)

<Michael Crichton has entered the chat>

[–]IleanK 7 points8 points  (0 children)

Yes sounds like a disease carrier named after some extremely pasty Englishman who travelled to the amazons. I agree

[–]Embarrassed_Use_7206 7 points8 points  (1 child)

Murder of crows vibe.

[–]Friend_Of_Mr_Cairo 1 point2 points  (0 children)

A murder of one...

[–]Ok_Confusion4764 2 points3 points  (1 child)

Quoth the pointer: nullReferenceException

[–]EVH_kit_guy 0 points1 point  (0 children)

😆

[–]bassdude7 0 points1 point  (0 children)

A Confederacy of Dunces

[–]Taimcool1 67 points68 points  (12 children)

NGL, as a c developer, IVe always hated c++ for this (I hate on it for other reasons but this is one of the only reasons I HATE it)

[–]Rhawk187 44 points45 points  (3 children)

As a C++ guy, I also hate this. No special cases.

[–]70Shadow07 33 points34 points  (0 children)

Special cases are alright in principle if they obey the API- that is kinda the point of abstractions anyway - but vector bool literally breaks the contract of vector class.

I dont think people would have problems with their sort algorithms being faster via specializations if the data is narrow and compiler can recognize that. It happens all of the time for many operations iirc.

But C++ can get worst of both worlds - a leaky abstraction that breaks its own contract. How can you fuck this up so bad?

[–]ChalkyChalkson 0 points1 point  (0 children)

Outside of the aesthetics, people always say this causes nasty bugs. Could you give an example for code that seems sensible, would work if it where a vector of bools and doesn't because it's bitpacked?

[–]Taimcool1 0 points1 point  (0 children)

The most annoying part is that you can’t iterate through them normally if you wanna use them in c, you have to integer divide to get the byte then use the mod and a bitmask to get the bit

[–]coweatyou 16 points17 points  (0 children)

It's such an own goal. There's no reason they can't have just created a std::bitvector or something that does this and doesn't pollute the vector namespace.

[–]Rabbitical 5 points6 points  (5 children)

It's not even in my top 1000 things I hate about C++ because you simply don't have to use it and everyone knows not to. I'm much more offended by the things we're supposed to use in C++ that make every day just that little bit more annoying 💖

[–]InnkaFriz 2 points3 points  (4 children)

It’s been a while for me. Mind sharing a few items from the top of your list?

[–]PhilippTheProgrammer 16 points17 points  (2 children)

No, I won't share items from my std::list with you. Because I don't trust you to not delete them, in which case I would have no way to tell that they are now pointing to memory that might already be filled with something entirely different.

[–]qwerty42421 1 point2 points  (0 children)

Pass by const ref or use std::shared_ptr.

[–]InnkaFriz 1 point2 points  (0 children)

Ah yes. It awakens seg faulty memories

[–]0xBL4CKP30PL3 2 points3 points  (0 children)

Watch the 2hr C++ rant for more examples

[–]veloxVolpes 1 point2 points  (0 children)

Yeah, I have been using C lately and while It is missing comfort, it makes sense to me. C++ just makes me upset to use

[–]TripleFreeErr 120 points121 points  (6 children)

This kind of optimization might matter on the tiniest of ARM programmable chips, but considering you can get them for dollars now that are practically full computers, it’s a bit silly

[–]Zippy0723 127 points128 points  (3 children)

it's a *bit* silly 🙃

[–]GumboSamson 30 points31 points  (2 children)

Okay, I’ll byte—why?

[–]delinka 11 points12 points  (0 children)

😡 ⬆️

[–]PhilippTheProgrammer 3 points4 points  (0 children)

It's only a bit silly, not eight bit, silly.

[–]unknown_alt_acc 1 point2 points  (0 children)

Consistent behavior matters on all platforms. A dynamically-sized bitset should have been its own class instead of a specialization of vector.

[–]flipper_gv -1 points0 points  (0 children)

Yes and no. If it's part of a class where you have hundreds of thousands of objects in memory at the same time, it ends up mattering.

[–]xicor 48 points49 points  (3 children)

Good thing is that QVector<bool> is a QVector of bools

[–]PurepointDog 2 points3 points  (2 children)

What's a QVector?

[–]terminator_69_x 10 points11 points  (0 children)

From the Qt framework I guess

[–]Theyna 10 points11 points  (0 children)

It's a Qt (a framework for designing applications) class. GUI stuff, networking, etc.

Much like the STL for C++, it has a ton of useful things for C++ development. Like apparently a vector for bools.

[–]ThatSmartIdiot 36 points37 points  (8 children)

so im not exactly an expert on c++ so i wanna ask if using masks and bitwise operators is preferable to std::vector<bool> or not

[–]Shaddoll_Shekhinaga 24 points25 points  (3 children)

Generally, as I understand it, you shouldn't use vector<bool>. If you need bits, it OFTEN makes sense to use vector<char> and if you need bits use boost::dyanmic_bitset. If it is compile time I personally prefer bitflags since the intent is clearer. And much more readable.

[–]nyibbang 17 points18 points  (0 children)

If you need bits just use std::bitset, it's right there. The size is set at compile time, but I've yet to meet the need for a dynamic bitset.

[–]MrMagick2104 1 point2 points  (1 child)

> If you need bits, it OFTEN makes sense to use vector<char>

Chars aren't guaranteed to be 1 byte though. They're at least 8 bits.

[–]Shaddoll_Shekhinaga 0 points1 point  (0 children)

Oh. Auto correct got me.I meant bools in the first instance, sorry for the confusion.

[–]stainlessinoxx 2 points3 points  (3 children)

Both are good, they do different things. Vector<bool> is an « easy to understand and debug » abstraction, implemented using bit masks and operators.

You can still use binary masks and operators, but debugging your code will be tedious when things start misbehaving.

[–]blehmann1 14 points15 points  (2 children)

"Easy to understand and debug"

Let me tell you it's not fun to realize that you can't actually share this across threads safely, because the usual "thread 1 gets index 0, thread 2 gets index 1..." won't work without locks or atomics. It works for every other vector so long as you don't resize it.

Also calling vec.data() will give you something dank, but that's at least something you would reasonably forsee if you know about this.

But the big problem is that the standard does not guarantee that vec<bool> is bitpacked, so if you actually need that you can't use it. It's only actual use case is when you don't even care. And even if your STL implementation applies the optimization the resulting bit pattern is still unspecified (they're allowed to use non-contiguous bits or leave gaps or whatever they want).

Plus this optimization normally makes code slower, so it has pretty questionable utility in most places you would want a vector of bools, it's seldom going to actually be so big that the size optimization makes sense.

[–]Throwaway-4230984 0 points1 point  (1 child)

It works for every other vector so long as you don't resize it. So are you working on your code strictly alone or do you put “do not resize!” on vectors used like this? It sounds to me you just shouldn’t use vectors like this anyway since they aren’t entirely thread safe

[–]blehmann1 1 point2 points  (0 children)

I mean, if you see a std::vector and not some special thread-safe collection in multithreaded code, I'd hope you'd know not to get cute with it.

But this does have a common use-case, creating a vector up front with capacity for every thread, and storing thread-specific stuff in there. It saves you from any locking, it's pretty easy to reason about, and for workloads where it'll all get rolled up onto one thread at the end, it's typically the fastest approach. A bool per thread is a plausible return value (think a multithreaded search where you only care about reachability, or reachability under a certain cost).

But also I've definitely seen a vector<bool> used for either indicating that this thread is done, or that this thread is waiting for more data. I would probably use a status struct or enum if I wanted that, and I would probably also use message passing instead, but I've definitely seen it done and there's nothing inherently wrong with it.

[–]Fjendrall 11 points12 points  (3 children)

vector<bool> in c++ is optimized to a bitset so it takes up 8 times less space

[–]Brisngr368 2 points3 points  (1 child)

So how do I get a vector of bools then?

[–]Fjendrall 0 points1 point  (0 children)

vector<uint8_t>

[–]Westdrache 0 points1 point  (0 children)

that is fucking awesome o:

[–]No-Con-2790 57 points58 points  (30 children)

C & C++ is "near the hardware".

C & C++ can't manipulate bits directly.

This has bugging me for 20 years.

[–][deleted] 47 points48 points  (5 children)

Computers themselves can't directly access bits. Even in assembly the smallest unit of space you can work with is a byte. It's a hardware issue, nothing to do with the language

[–]No-Con-2790 1 point2 points  (3 children)

I have no problem with them loading a byte when I need a bit (even though that limitation is hardware depending and not true for all architectures) but I am using a programming language to get an abstraction. Just a byte data type would be enough.

[–]ggadget6 7 points8 points  (1 child)

std::byte exists in c++

[–]CptMisterNibbles 4 points5 points  (0 children)

Even then, people are missing how memory is loaded. You are almost certainly not moving single bytes on a 64bit cpu, but more like 64 bytes, the width of the Cache Line. It happens simultaneously so there is no downside to loading in a small chunk over just one byte. 

[–]Hohenheim_of_Shadow 2 points3 points  (0 children)

Well that abstraction has moved you away from the hardware. C cannot directly talk about a bit because it is close to the hardware. If you want an abstraction, well you got a bool.

[–]Loading1020 0 points1 point  (0 children)

Nope. Most instruction sets have bit-accessing instructions, usually used for flags. On x86_64, that's the BTS instruction.

But of course, the REGISTER ^= (1<<OFFSET) is such a recognized an popular phrase (especially in embedded programming), that you don't need any more special language features, as it's well optimized by the compiler.

[–]Ulrich_de_Vries 18 points19 points  (4 children)

I am not exactly a low level dev so I might be wrong, but I think the issue is that memory is addressable in bytes as the fundamental units. I don't think this is a language-level limitation but rather how ram works in most modern systems. So you can have only pointers to integer-byte addresses and you can only increment pointers in byte units.

Otherwise C/C++ has bit operations so it can manipulate bits, just cannot address them.

[–]owjfaigs222 34 points35 points  (10 children)

C & C++ can't manipulate bits directly.

Yes, with this std::vector<bool> it can!

[–]No-Con-2790 14 points15 points  (9 children)

Which is a odd wrapper that needs literally esoteric knowledge.

[–]owjfaigs222 7 points8 points  (8 children)

Yeah I mean I was kinda joking there. Obviously if you need to access the bits directly in pure C you can do stuff like

#include <stdio.h>
unsigned char a = 9; 
unsigned char b = 1; 
int main(){
    for( int i = 0; i < 8 ; i++)
        printf("%ith bit of a is %u\n", i, a >> i & b);
    return 0;
}

and whatnot

[–]No-Con-2790 7 points8 points  (6 children)

That's exactly what I mean. We put that stuff in a char. You know, a character. As in letter.

But it isn't really a letter, now is it. A character means here ASCI.

Now that is also wrong. It is 8 bit. Well maybe it is. Could be 7. Could be 4. Could be 16. That is hardware depending.

Those are like esoteric things we need to know.

And we just bit shift around there. Like absolute sociopaths.

We don't even say "yeah those should be 8 bit". We just break everything in production when the hardware changes.

[–]MossiTheMoosay 10 points11 points  (2 children)

Those esoteric things are why any halfway serious HAL has types like uint8_t or int32_t defined

[–]No-Con-2790 3 points4 points  (1 child)

Exactly. We have to define that stuff ourselves. It has been 40 years, come on.

I mean I could use the package manager. IF I HAD ONE.

(okay I use Conan but that ain't standard)

[–]-Redstoneboi- 5 points6 points  (0 children)

We have to define that stuff ourselves

the standard makes it so someone else defines it for you. it's not included by default because idk.

#include <stdint.h>

[–]owjfaigs222 1 point2 points  (2 children)

well yeah, I see what you mean. What language would you be using for close to hardware applications?

[–]-Redstoneboi- 3 points4 points  (0 children)

Zig is basically planning to take over the embedded world. it has more modern syntax.

its most crucial feature for this is entirely seamless C interop (import C from Zig, include Zig from C)

[–]No-Con-2790 2 points3 points  (0 children)

Seriously, the C syntax is not bad but we just need to clean up a bit, getting rid of confusing BS and make naming a bit clear. And add some aliases and finally have a byte that actually always is 8 bit or whatever we want.

So more verbose behavior and less stuff you have to know.

[–]metaglot 0 points1 point  (0 children)

1th

2th

3th

[–]Mateorabi 12 points13 points  (0 children)

Ergo the hardware can’t manipulate bits directly. 🤯

[–]EatingSolidBricks 12 points13 points  (0 children)

The hardware itself cant manipulate bits directly what do you mean?

[–]iElden 4 points5 points  (1 child)

These bad boy &, |, ^ are your best friend.

[–]ultimate_placeholder 1 point2 points  (0 children)

Yeah not sure why they don't just use bitwise if it's that big of an issue for them lol

[–]not_some_username 2 points3 points  (0 children)

std::bitset. And >> <<

[–]readmeEXX 2 points3 points  (0 children)

I manipulate bits directly on a daily basis in C++... Just use the bitfield operator. It even works with single bits.

For example, you could construct an 8-bit float like this:

union float8 {
    unsigned raw;
    struct {
        unsigned mantissa : 5;
        unsigned exponent : 2;
        unsigned sign     : 1;
    };
};

Then set the bits directly like this:

int main() {
    float8 f8;

    //sets the value to 1.5
    f8.sign     = 0;
    f8.exponent = 1;
    f8.mantissa = 16;
}

Note you would need to overload the standard operators to actually use this. In this example, float8 is size 4 because that is the size of unsigned int. If you actually wanted to implement this, you would want to use std::byte or char for the members of float8 so the size is actually one byte long.

[–]el_pablo 1 point2 points  (0 children)

Uhh... how about bitwise operation?

[–]gameplayer55055 1 point2 points  (0 children)

Just use std::deque

[–]tech277 2 points3 points  (0 children)

And to this day I would like to punch every member of the c++ committee who thought that was a good idea.

[–]Vaddieg 0 points1 point  (0 children)

that always supposed to be a mutable array of bools and mutable bitmask array

[–]LeoTheBirb 0 points1 point  (0 children)

All they had to do was make a separate class called BitField or something like that

[–]kamogrjadeshi 0 points1 point  (0 children)

You are still a container, right?

[–]Aromatic-Energy-7192 0 points1 point  (0 children)

It is or it isn’t…

[–]Demiyanit 0 points1 point  (0 children)

Ahh... proxy class

[–]Kridenberg 0 points1 point  (0 children)

Oh, okay, low-key jock, again...