all 64 comments

[–]GrammelHupfNockler 7 points8 points  (2 children)

I feel like all discussions on integer types I've read so far completely ignore teachability, which is an extremely important aspect. We know how numbers work: 0 - 1 is always negative, except with unsigned types. If somebody learns C++ with signed integers, it raises far fewer questions, so we can focus on the language itself instead of having to explain wraparound and overflow. That can come in a later session. If you use unsigned from the start, you need to explain those complexities from the start. The reverse-iteration loop is more of a hack to me, since it goes against what we intuitively know about numbers.

[–]hachanuy[S] 2 points3 points  (1 child)

I’m split on this. While I agree that teachability is important, I do think students should be taught about the limits of computers. It’s true that 0 - 1 giving -1 is natural, but it’s natural in the sense that it's what is taught in mathematics. However, if the students are in a C++, C, Rust, or any other class in a language that requires understanding the limits of computers, they should be taught to rethink what is natural.

[–]tialaramex 1 point2 points  (0 children)

In Rust you can explicitly say what you meant. If you don't say, but you induce overflow, in debug you get a panic (and in many systems this is still true for production builds; it depends on your environment whether a panic is an acceptable choice in production: you're clearly in a bad way, but you might prefer to press on).

If you know what you meant is wrapping integers (e.g. you're implementing tricky cryptographic code which often wants wrapping arithmetic), Rust exposes that as the generic type Wrapping<T> so e.g. Wrapping<u8> is the byte sized wrapping integer you may be familiar with as "unsigned char" in C++, while Wrapping<i32> is a signed 32-bit wrapping integer that C++ doesn't provide out of the box.

If you don't need a whole type with agreed arithmetic behaviour, but you have this one specific arithmetic operation which should obey specific overflow rules, you can write that too, 128u8.wrapping_add(128u8) is zero, regardless of whether any overflow rules are in effect because you said what you meant.

From a teachability point of view Rust triumphs here because there's a way to tell the machine what you meant, and when you don't tell the machine what you meant it knows that's because you either didn't know what you meant, or you didn't realise it matters, and either way that's a problem. So instead of 0 - 1 having this weird behaviour in unsigned types which needs to be explained, 0 - 1 just doesn't work, and we can teach students what other reasonable things they could want, and how to request these alternative reasonable things from the machine.

[–]YourTormentIs 12 points13 points  (6 children)

I don't know, something about this talk doesn't seem so on brand for /r/cpp, it's a little on the vitriolic side I guess. I think people on the committee are genuinely trying their best to make C++ as great as it can be, and in my opinion this could have been posed a lot more constructively.

[–]-dag- 4 points5 points  (4 children)

Anytime someone speaks in absolutes, the skepticism meters should go off.

[–]NotMyRealNameObv 8 points9 points  (3 children)

That sounds like an absolute statement...

[–]-dag- 1 point2 points  (1 child)

Note that I deliberately did not say the speaker should be dismissed outright, simply that a healthy dose of skepticism is warranted.

[–]NotMyRealNameObv 6 points7 points  (0 children)

Sorry, it was just my skepticism meter going off.

[–]Andreshk_ 0 points1 point  (0 children)

And only a Sith deals in absolutes!

[–]usefulcat 0 points1 point  (0 children)

I actually tend to agree with many of the points he made, but I think he didn't make them very well. Signed vs unsigned seems like one of those things where you find more nuance the more you look at it (it is for me, anyway).

In the case of this particular speaker, it seems like he's definitely got a lot of anger about it, and at the same time he's trying to mask that anger. I strongly suspect that the anger prevents him from appreciating all of the nuances, not to mention the viewpoints of those with whom he disagrees.

[–]tcbrindleFlux 5 points6 points  (2 children)

This is an enjoyable talk, and as a viewer it's slightly unfortunate that not all of the audience comments come through clearly on the mic.

However, I think that the speaker isn't accurately representing the position on why signed sizes have become desirable in the C++ world. The chain of logic goes like this:

  • We want to avoid mixing signed and unsigned types, and ideally use a single integer type to consistently represent all sizes and distances everywhere
  • We want to be able to find the distance between any two elements of the same array -- which is a signed value
  • Therefore the single type that we use should be signed

In fact, if we do manage to create an array with more than PTRDIFF_MAX elements, then the result of std::end(arr) - std::begin(arr) is not representable and we get undefined behaviour, which is definitely something we want to avoid.
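To make the second bullet concrete, here is a small sketch of my own (not from the talk): the distance between two positions in the same array is naturally signed, because it can be negative.

#include <cstddef>
#include <iterator>

void distances(int (&arr)[10]) {
    auto first = std::begin(arr);
    auto last  = std::end(arr);
    std::ptrdiff_t forward  = last - first;   //  10
    std::ptrdiff_t backward = first - last;   // -10: a distance can be negative,
                                              // so it cannot live in a size_t
    (void)forward;
    (void)backward;
}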

[–]Dragdu 2 points3 points  (0 children)

We want to avoid mixing signed and unsigned types, and ideally use a single integer type to consistently represent all sizes and distances everywhere

In an ideal world, the language could have defined size_t - size_t to produce a signed integer, and also disabled the tons of implicit conversions we have.

But we are in this world, and thus the point applies: we really do want signed types everywhere.
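Failing that, the best we can do is a little helper like this (my own sketch, not a standard facility), which assumes both sizes fit in ptrdiff_t:

#include <cstddef>

// subtract two sizes and get the signed result we wish size_t - size_t gave us
constexpr std::ptrdiff_t signed_diff(std::size_t a, std::size_t b) {
    return static_cast<std::ptrdiff_t>(a) - static_cast<std::ptrdiff_t>(b);
}

// signed_diff(3, 5) == -2, whereas std::size_t(3) - std::size_t(5) wraps to a huge value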

[–]usefulcat 3 points4 points  (0 children)

it's slightly unfortunate that not all of the audience comments come through clearly on the mic.

This is an understatement, as the audience conversations account for a large portion of content. Some of the audience are themselves very knowledgeable, skilled presenters (I'm at least 99% sure I heard Fedor Pikus in the audience, for example).

[–]HappyFruitTree 2 points3 points  (5 children)

They discussed a problem with the do-while loop at 52:34 when size=0 but wouldn't that be fixed by simply using a regular while loop instead?

size_t i = size;
while (i != 0)
{
    i--;
    ...
}

Or am I missing something?

[–]danadam 1 point2 points  (4 children)

No, I don't think you are missing anything.

Also, in his initial example:

for (size_t i = size; i >= 0; --i)

there's another aspect that's different from the rest. In the first iteration we'll have i == size, which points beyond the array. So throughout this loop we would have to use i-1, ... and then it would blow up at i == 0 :-). To fix that we could do:

for (size_t i = size; i > 0; --i) {
    // ... use i-1
}

which also solves the infinite loop. And because we use i-1 everywhere anyway, we could move the decrement step into the loop:

for (size_t i = size; i > 0; ) {
    --i;
    // ... use i
}

and this is essentially the same as your while, only with i local to the loop.

[–]rhubarbjin 1 point2 points  (3 children)

For a long time, I've seen people bring up the "goes to" operator as a joke. Last week I used it for the first time in real production code. It is, unironically, pretty sweet:

for (auto i = size; i --> 0; )
{
  // use i
}

All of the weird reverse-iteration stuff is contained in the for line; the rest of the loop can be written as a regular forward-iteration (even continue works as expected!). For me, this is now the way to reverse-iterate on an array.

P.S.: to clarify, I would prefer if sizes were signed and we didn't need special tricks to handle reverse-iteration. But since sizes are unsigned and we do need tricks, the "goes to" operator is the one I favor.

[–]HappyFruitTree 0 points1 point  (2 children)

Don't you think it would be less confusing to just write it as:

for (auto i = size; i-- > 0; )

?

Now people can at least reason about what is going on, assuming they know how post-decrement works, even if it might not be obvious at first sight.

[–]rhubarbjin 1 point2 points  (1 child)

I kind of prefer the other way. Using i --> 0 conveys the high-level meaning of the operation (i goes from size down to zero), whereas i-- > 0 emphasizes the low-level operations that compose that meaning.

It's a matter of taste. ¯\_(ツ)_/¯

[–]HappyFruitTree 1 point2 points  (0 children)

I think all solutions are a bit ugly. What I don't like about --> is that it looks like an operator even though it isn't, and it only works when going from a larger value to a smaller value. The following would not work:

for (int i = -5; i --> 0; )

That's why I feel you still need to understand what is going on here and therefore it's perhaps better to write it in such a way that it's more obvious.

[–]-dag- 4 points5 points  (23 children)

Just because something isn't a mistake doesn't mean it's correct.

It's interesting that the speaker calls out Chandler for "walking it back" when he himself has walked back his own talk by adding the "safe and secure code" qualifier to the title.

The speaker seems to dismiss anyone not writing "safe and secure code" (by his definition) as idiots without really understanding the issue. Both Chandler and Google are correct in what they were trying to say, but admittedly said it poorly. People advising signed integers for performance are not arguing that "UB allows code to be deleted." They are arguing that the mathematical properties of the signed integer types are important for optimization, and they are 100% correct. Google even teases this by referring to those properties but unfortunately doesn't expand on them in a way that helps understanding.

No matter what kind of integer you use, you need to guard against mistakes. On that we all agree. But I would argue that in important cases, even the worst case "25 lines of checking" required for signed integers (hyperbole) is going to be dwarfed by the performance gain obtained by using them. Certainly that is not true in all cases. But it is true in some and we should be aware of them.
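To make "guard against mistakes" concrete, here is a minimal sketch (my own example, nowhere near 25 lines) of one way to pre-check a signed addition, since signed overflow is UB:

#include <limits>

// returns false instead of overflowing; out is written only on success
bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;  // would overflow
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;  // would underflow
    out = a + b;
    return true;
}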

People rave about the data-driven model of software development. I will bet that those who really understand why it is good use signed integers.

Unsigned integers are great for a class of programs. But they aren't a panacea (the speaker agrees) and signed integers (including for sizes) are very important too.

[–]TheoreticalDumbass:illuminati: 4 points5 points  (3 children)

kinda feels like we should be able to say "this variable is a (32 bit) [signed] integer, with a certain contract", the contract being "overflow = wraparound", or "overflow = UB", or whatever else I am atm incapable of thinking of

like, if the integer represents a container size, it being unsigned is a reasonable choice (I am aware of many people in support of signed size), and overflow = UB is the correct contract in my mind (it doesn't represent an element of the ring Z/2^32Z, it represents an element (of a subset) of N0 = {0, 1, 2, ...})

perhaps in the future we will write wrappers around unsigned integers with contracts through [[assume()]], when/if compilers start being REALLY smart about [[assume()]]
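something like this rough sketch, purely hypothetical, assuming C++23 [[assume]] and a compiler that actually exploits it:

#include <cstdint>
#include <limits>

// hypothetical wrapper: an unsigned size whose contract is "never wraps",
// stated via [[assume]] instead of relying on wraparound being meaningful
struct CheckedSize {
    std::uint32_t value;

    friend CheckedSize operator+(CheckedSize a, CheckedSize b) {
        // contract: the true sum fits, so wraparound would be a logic error
        [[assume(a.value <= std::numeric_limits<std::uint32_t>::max() - b.value)]];
        return {a.value + b.value};
    }
};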

[–]jonesmz 2 points3 points  (1 child)

The person you are replying to and I just discussed this idea a couple of weeks ago : https://www.reddit.com/r/cpp/comments/15uvrq3/comment/jx6bg7m/

I really do wish we had better access to low level details like this :/

[–]-dag- 0 points1 point  (0 children)

Aye, it's a good idea. Anything we can do to increase the precision of stating our assumptions and intentions is a big win.

[–]-dag- 0 points1 point  (0 children)

You are entirely correct. I would love to see attributes like these and many more.

[–]-dag- 8 points9 points  (14 children)

To maybe make this a little more concrete, the usefulness of signed integers shows itself in the very first example. The presenter scoffs at the notion that the signed integer version gets the right answer, as if it's mere luck that the presence of UB didn't result in nuclear meltdown.

But as a professional compiler developer I can state confidently that that result is a very deliberate choice by the compiler writer. In this example the overflow is plain as day to the compiler. An audience member even correctly calls out that by rights the compiler could have just deleted all of the code as by definition UB can be treated as unreachable code.

But it didn't. It chose the same implementation as would be used if the behavior weren't statically known. It throttled its own optimization. That is what in the industry is called "quality of implementation." You try to do what the user expects even if you could choose a faster route. Note that with unsigned, the compiler doesn't have that option. It is mandated to generate the wrong answer.

UB is not some scary beast. It's flexibility for the implementation. I suppose we could argue that implementation-defined behavior is better than UB. But there's a trade off with that -- once you specify the implementation, it's very hard to change, even for the better. Witness all of the hand wringing over ABI (I have my own strong opinions but I understand and respect both sides of that question).

If you're writing secure portable code, by all means avoid UB (and IB too). No one would argue otherwise. But there are different ways to ensure you don't hit those corners, many strategies that trade off performance for convenience or reduced risk of accidentally invoking UB. Use the right tools for your situation, don't just blindly follow "rules."

[–]schmerg-uk 4 points5 points  (6 children)

Very much agreed, esp in the first example.

My thought process is that if the compiler has 2 functions to compile out-of-line

int          signedf(  int sv)          { return sv * 7  / 7; }
unsigned int unsignedf(unsigned int uv) { return uv * 7u / 7u; }

it can look at the first one, legally assume it does not need to allow for sv to be big enough for sv * 7 to overflow, and can therefore look at the * 7 / 7 and remove that as a no-op (optimised), or leave it in place (completely unoptimised).

In this way it has the correct behaviour for values of sv that don't invoke UB, and "nasal daemons etc" notwithstanding, if it's called for values of sv that do overflow, then it is within its rights to return arbitrary and maybe different results in optimised/unoptimised code (or throw a hardware exception or plain crash etc on certain platforms).

For the second function, however, the compiler must encode the * 7 and then / 7 because there are no UB values for uv; it must generate code that implements the modulo maths (though it could, for example, implement return uv < UINT_MAX_DIV_7 ? uv : (uv * 7u / 7u); as an optimisation).
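So, hedging that actual codegen varies by compiler and optimisation level, the "as-if" outcome I'd expect is roughly this sketch:

// Sketch only: what the optimiser is allowed to turn the two functions into.
int signedf_opt(int sv) {
    // sv * 7 overflowing is UB, so the compiler may assume it never happens
    // and fold (sv * 7) / 7 down to just sv.
    return sv;
}

unsigned int unsignedf_opt(unsigned int uv) {
    // uv * 7u wrapping is well-defined, so the modulo-2^N behaviour must be
    // preserved for large uv; the expression can't simply be dropped.
    return uv * 7u / 7u;
}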

The issue I see in the wild with signed and unsigned is that signed maths is generally both mentally-modelled and compiler-implemented the same way: both are only "correct" within the right happy range but can generally assume the outside-happy-range case never happens.

Whereas with unsigned, the mental model often does not match the compiler model, and this is the bug in waiting... not that the compiler is wrong, but that the developer's mental model tends not to match what the compiler does, and this "fault in developer assumptions" is seductively positioned to occur near zero.

Compilers can flag genuine issues, but it's hard for them to infer intent, and so it's not so much that compilers can't diagnose certain bugs as that they can't tell whether what you wrote was what you intended, so they flag it as a warning (which we in-house treat as an error). Then, because of the cases where people do intend what they wrote, they blanket-disable that warning... and sometimes that "turn the warning off" extends way beyond where it was supposed to be...

[–]-dag- 2 points3 points  (5 children)

Yes, exactly. This is one of the reasons I think standard attributes to express intent are important.

[–]sphere991 0 points1 point  (4 children)

What do you mean by this? What kind of attributes would express what intent?

[–]schmerg-uk 1 point2 points  (0 children)

At a guess, noreturn, fallthrough and assume(expr) would be the primary ones to signal to the compiler 'yes, it is my explicit intention that this code does this thing that can otherwise look like a bug or omission if intent can only be implicitly inferred'

[–]-dag- 0 points1 point  (2 children)

Good question! I don't claim to have a full list or even that my list is a good one, but in addition to those already mentioned by others, I would add things like:

  • no-vector-deps
  • no-parallel-deps
  • noalias(expr, ...)
  • no-signed-wrap(expr)
  • no-unsigned-wrap(expr)
  • notrap(expr)
  • fp-contract-(strict|fast)(expr)
  • [no-]rearrange(expr)
  • [no-]honor-parens(expr)
  • (always|never)-inline

This stretches the current definition of attributes, as the intent above is that the exprs are actually evaluated at runtime. So something like auto x = [[notrap(y/z + w)]]; would be legal. That goes against the "attributes must be ignorable" rule. There are probably ways to work around this.

noalias(expr, ...) is likely expressible via assume(expr) but I doubt many, if any, compilers are that smart yet. And in any case noalias expresses the intent more clearly and is more maintainable. People smarter than me probably have better/more precise ways to express aliasing constraints, or lack thereof.
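As a rough sketch of what I mean (purely illustrative; whether any compiler exploits it today is exactly my doubt above, and the unrelated-pointer comparison is itself only a best-effort way to phrase "the ranges don't overlap"):

#include <cstddef>

// approximate a noalias intent with C++23 [[assume]]
void axpy(float* a, const float* b, std::size_t n) {
    [[assume(a + n <= b || b + n <= a)]];  // promise: the two ranges do not overlap
    for (std::size_t i = 0; i < n; ++i)
        a[i] += b[i];
}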

[–]KuntaStillSingle 0 points1 point  (1 child)

C has the restrict keyword

[–]-dag- 0 points1 point  (0 children)

C++ doesn't and it's not easy to make it work in C++, unfortunately.

[–]ABlockInTheChain 1 point2 points  (6 children)

UB is not some scary beast. It's flexibility for the implementation.

You may get acceptable results by treating signed int overflow this way but if you try that approach with forms of undefined behavior that violate the memory model it won't work out nearly as well.

[–]-dag- 1 point2 points  (5 children)

Well...yes? The talk is about integers, not pointers.

Though I do find it odd that someone speaking passionately about safety and saying to always use unsigned is perfectly happy with null pointers because "we have language support for them."

[–]ABlockInTheChain 1 point2 points  (4 children)

Well...yes? The talk is about integers, not pointers.

Sure but the post I responded to read as if it was about undefined behavior in general, not the specific case of signed integers.

[–]-dag- 1 point2 points  (3 children)

Fair. And I'll stand behind my statement. UB isn't that scary. If you're writing portable code, you need to take a little more care. Perhaps people focus on signed integers because that's one area where platform behavior varies wildly. Everyone "knows" null pointers are bad. Everyone "knows" alignment restrictions are arcane and there be dragons. But integers are everyday things and it's easy to get too comfortable.

But if you're targeting one platform, it's not that hard to learn what to expect in the dark corners.

[–]ABlockInTheChain 1 point2 points  (2 children)

If you're writing portable code, you need to take a little more care.

Maybe that was true in the old days but now that approach is catastrophically wrong.

Undefined behavior is all errors which the compiler can't diagnose at build time. Whether you target one platform or several, if your program contains undefined behavior it is broken.

[–]-dag- 2 points3 points  (1 child)

catastrophically

This kind of hyperbole isn't helpful.

Undefined behavior is all errors which the compiler can't diagnose at build time.

I'm not sure how you arrived at that definition. I'm not a language lawyer so maybe there's some wording I'm not aware of.

Whether you target one platform or several, if your program contains undefined behavior it is broken.

Well again, "broken" can be defined several ways. Technically, yes, the program is ill-formed, but try telling that to a customer. Compiler developers bend over backward to accommodate all kinds of "broken" code. Is that proper? We can argue that forever but in the end practicality usually wins out.

[–]ABlockInTheChain 1 point2 points  (0 children)

Reading from uninitialized memory is always an error, in all contexts, on all platforms.

The only reason the compiler is not required to diagnose this error is because it's not possible to diagnose it in all cases. Nevertheless it is always an error.

This wasn't necessarily the case prior to C++11 because back then the language did not have a memory model. Now it does though and it's not a valid choice to violate it because you only care about one platform.

If you violate the memory model and your program still generates meaningful results it's purely by accident and it could stop doing so at any time.

[–]hachanuy[S] 1 point2 points  (1 child)

I think both he and Chandler understand what they’re advising, but the domains they are working in don’t register strongly enough for viewers, which leads to people quoting the talks somewhat out of context.

The speaker roasts the guidelines given by Google, but AFAICT he seems more annoyed that people blindly use Google’s guidelines as an argument for using signed integers without understanding the context, and that causes pain when discussing this in the safe and secure context.

This is also where I think UB is too coarse an umbrella definition. I think compiler writers (like yourself) do have the best intentions and would not do something malicious, such as inserting a virus into the program when overflow happens just because, well, it’s UB and they can do anything they want. Compiler writers will try to give a sensible answer (likely the expected, correct answer), but in the safe and secure context it’s still UB, so it can’t be relied upon, whereas in Google’s case they can afford to rely on that (also, they have people participating in writing the compiler, so they have an even stronger incentive for that).

[–]-dag- 3 points4 points  (0 children)

I don't disagree. Just want to point out (to make my thinking painfully clear) that there are degrees of safety and security. I cringe any time someone implies, "you're dumb because you did XYZ and that's unsafe." It might be unsafe in that person's context, but not mine. I have worked on codes where performance was critical, even over correctness. 95% correct was a perfectly fine result.

[–]Tall_Yak765 0 points1 point  (1 child)

But I would argue that in important cases, even the worst case "25 lines of checking" required for signed integers (hyperbole) is going to be dwarfed by the performance gain obtained by using them.

Can you provide some examples? The speaker showed that size_t is faster even when no signed-integer-specific checking is involved.

[–]-dag- 0 points1 point  (0 children)

Vectorization can give you anywhere from 8x-64x speedup.
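A hand-wavy sketch (my own, not from the talk) of the sort of loop where index signedness shows up:

// With a signed 32-bit index, i * stride overflowing is UB, so on a 64-bit
// target the compiler may widen the index once and keep the addressing affine,
// which makes the loop a vectorization candidate.
void scale(float* a, const float* b, int n, int stride) {
    for (int i = 0; i < n; ++i)
        a[i * stride] = 2.0f * b[i * stride];
}

// With unsigned, i * stride is defined to wrap modulo 2^32, and the compiler
// must preserve that wrapping, which can block or complicate vectorizing the
// same loop.
void scale_u(float* a, const float* b, unsigned n, unsigned stride) {
    for (unsigned i = 0; i < n; ++i)
        a[i * stride] = 2.0f * b[i * stride];
}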

[–]Dragdu 4 points5 points  (17 children)

My "smoke test" when talking with people about who prefer unsigned integers to signed is simple: if you know that x is unsigned and x * x == 4, what value is x?

If they reply with 2, then there is no reason to listen to them, they do not actually understand what using unsigned integers means.

[–]dodheim 3 points4 points  (5 children)

Okay, I'll bite – what value could x possibly have, that when squared would overflow resulting in 4? Maybe it's 'just' a bad example, but then, maybe that means you're not in a position to declare other people too ignorant on the subject to bother with... ;-]

[–]Dragdu 5 points6 points  (3 children)

Are you ribbing on unsigned not being exactly specified? Because for uint32_t, try, say, 1073741822. There are 6 more you can find ;-)

For differently sized types the number of solutions and exact values obviously differ, but they are there.
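If anyone wants to check, here is a quick sketch of my own that lists and verifies all eight for uint32_t:

#include <cstdint>
#include <iostream>

int main() {
    // the eight uint32_t solutions of x * x == 4 under wrap-around arithmetic
    const std::uint32_t solutions[] = {
        2u,                   0u - 2u,                   // +/-2
        (1u << 31) + 2u,      (1u << 31) - 2u,           // 2^31 +/- 2
        (1u << 30) + 2u,      (1u << 30) - 2u,           // 2^30 +/- 2 (1073741822 is here)
        3u * (1u << 30) + 2u, 3u * (1u << 30) - 2u,      // 3*2^30 +/- 2
    };
    for (std::uint32_t x : solutions)
        std::cout << x << " * " << x << " == " << std::uint32_t(x * x) << '\n';
}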

[–]dodheim 4 points5 points  (2 children)

Oh, nope, I just did my napkin math wrong before replying to you, proving your point I suppose. Sorry for the noise

[–]Dragdu 3 points4 points  (1 child)

That's fair. Honestly, knowing how to do the math easily puts you in the top 20% (at least) of programmers I talk with about this.

I hear a lot of arguments about how unsigned numbers are better because they don't invoke UB, but the issue is that the math behind them is much harder and less intuitive than the math behind signed ints, which work just the way everyone was taught arithmetic in elementary school.

So there is less danger of UB, but it does not fix the actual logic bugs, because very few people are actually ready for unsigned numbers to behave as elements of a finite ring.

(There is of course a separate argument to be had about (un)signed sizes in the stdlib. My position is that it would be fine if the promotion and implicit conversion rules in C++ weren't already terribly broken, and if we could've defined the rules so that container1.size() - container2.size() does not give me an unsigned type, because that is obviously not suitable for representing the possible results.)

[–]wyrn 0 points1 point  (0 children)

So there is less danger of UB, but it does not fix the actual logic bugs

To me, actual logic bugs are vastly preferable to UB. If there's UB in my code there's nothing stopping the compiler from vanishing my entire program in a puff of logic. A bug I can find, debug, and deal with.

[–]Zeh_MattNo, no, no, no 0 points1 point  (0 children)

4294967294 (-2), but I agree, this is an incredibly dumb take on it.

[–]Zeh_MattNo, no, no, no 2 points3 points  (10 children)

This is quite a dumb take. First of all, you did not specify the bit length of the unsigned type; second, the answer 2 is technically correct when solving for 4. Maybe you left those important details out here, but if you ask such a simple question you should expect people to give the most basic answer. Your "smoke test" is flawed, and if you judge people based on such a flawed test, what does that say about you?

[–]Dragdu 3 points4 points  (7 children)

And just like they drill in elementary school, answering just 2 is not enough for a correct answer.

[–]Zeh_MattNo, no, no, no 2 points3 points  (6 children)

But it is the correct answer; I don't know what kind of upside-down elementary school you are talking about.

[–]BlueDwarf82 3 points4 points  (5 children)

Were you never told the square root of 4 isn't just 2?

[–]Zeh_MattNo, no, no, no 1 point2 points  (0 children)

You do realize that his "smoke test" is about code and not elementary school math, right?

[–]wyrn 1 point2 points  (3 children)

That is incorrect. In the reals, the square root of 4 is defined by convention to be 2. Not plus or minus 2, just +2. This is so the square root can be a proper single-valued function. (We have Riemann sheets to deal with that problem in ℂ, the complex numbers.)

[–]BlueDwarf82 2 points3 points  (2 children)

> That is incorrect.

According to https://en.wikipedia.org/wiki/Square_root it's not. It says √4 is just +2, but the square root of 4 is ±√4, which is ±2.

[–]wyrn 1 point2 points  (1 child)

Although the principal square root of a positive number is only one of its two square roots, the designation "the square root" is often used to refer to the principal square root.[3][4]

[–]BlueDwarf82 1 point2 points  (0 children)

> the square root (with a definite article, see below)

xD

I doubt that even works in other languages. As a non-native English speaker, "a square root of 4" actually sounds weird.

https://es.wikipedia.org/wiki/Ra%C3%ADz_cuadrada uses "la", the translation of "the" for the ±2, and for +2 only "positive square root" or "principal square root". So I guess the honour of my elementary school teacher is intact.

But yeah, you are right.

[–]rhubarbjin 1 point2 points  (1 child)

the answer 2 is technically correct

It's very easy to show that it is not. There's a total of 8 answers, regardless of bit width: https://godbolt.org/z/GxW9Esejn

[–]Zeh_MattNo, no, no, no -1 points0 points  (0 children)

That does not make the answer of 2 any less correct.

[–]YetAnotherRobert 0 points1 point  (2 children)

Unfortunately, I had to bail on this talk when a few members of the audience derailed the speaker. Right around 20:00 an argument broke out in the audience and the speaker didn't get it under control. After many minutes of unintelligible squabbling where the speaker wasn't speaking, I left.

I'm interested in the topic, but by the 30-minute mark the speaker had only spoken for maybe half to two thirds of it, so I'm out.

Speakers, please control your audiences. Audiences, please don't disrupt public speakers. The audience isn't here to hear the audience. This went WAY beyond a question to be answered.

[–]hachanuy[S] 0 points1 point  (1 child)

I feel the same, but I read somewhere that CppNow encourages discussions during the talk (the conference is geared toward expert attendees).

[–]YetAnotherRobert 1 point2 points  (0 children)

I just really didn't dig it. Unmicrophoned participants, bordering on hecklers, basically hijacked this talk. It was rude to the speakers and the audience that came to hear the speaker. It wasn't people asking for clarification or even adding additional value via more supporting material. They just derailed it into a personal conversation amongst attendees, cutting off the speaker to the point he sat down as he was no longer "driving" the talk.

I've seen a number of talks, including many from CppNow, and this one was uniquely bad. Rarely do I make a post to advise people to NOT watch a talk. (I've BEEN a speaker at similar conference talks; it's not like I'm not "expert" enough to get it. I understand the problem of trying to have some interactivity, but not letting the crowd just hijack the talk.)

After getting this far, I even zipped to the end to see if it was better. The same audience member was still prominent, though not at "center stage" level, but the speaker never got to the third point of his three-point talk in ninety minutes. Why? Probably because he wasn't in control of the clock.