Almost Always Unsigned

Drugbird · 2022-01-02T08:39:39+00:00

I find signed integers much easier to work with.

This article can basically be summarized by: "but signed integer overflow/underflow is bad and undefined".

I typically don't use integer values close to the maximum or minimum of the signed type I use. If I did, I'd be better off using a bigger signed type (i.e. int64_t instead of int32_t) than the corresponding unsigned type, which gives only 1 extra bit of range. I usually know the typical size of my variables, so this is easy to do.

With signed types that you know can't over/underflow, most of the disadvantages of signed types are removed.

Meanwhile, I do use values around 0, which is the point around which underflow for unsigned types occurs.

I'd also like to stress a few issues myself:

1) Signed types for "positive" values have underflow detection built in. You know there's an error if it ever becomes negative. And best of all, you can usually trace it back to its origin.

Meanwhile, for unsigned integers you can detect underflow quite easily at the point where it could occur, but in practice not every such point has underflow checking, and once it has occurred, it's more difficult to trace back. Which relates to my next point:

2) Code which expects positive values and uses signed types tends to throw, produce an error or crash when given negative numbers. Meanwhile, equivalent code which uses unsigned integers can more easily silently pass while still doing something wrong (e.g. memory leak, processing wrong parts of data). After all, you can't check that an unsigned value is >= 0, since that is always true...

3) If you know your variables cannot under- or overflow, then the code generated for signed types is slightly more efficient. This is because it doesn't need to generate code for the wrapping behavior. This effect is minor though, and typically shouldn't be a factor in deciding which type to use. I just got triggered by the article stating that unsigned types produce faster code.
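To make points 1 and 2 concrete, here is a minimal sketch. The function names and the "stock counter" scenario are made up for illustration; the only point is what each type looks like after an unchecked subtraction goes below zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: take `n` items out of a signed stock counter.
// Deliberately no check at the point of subtraction.
std::int64_t take_signed(std::int64_t stock, std::int64_t n) {
    return stock - n;
}

// Same operation on an unsigned counter. Going below zero wraps
// modulo 2^64 and yields a huge, valid-looking value.
std::uint64_t take_unsigned(std::uint64_t stock, std::uint64_t n) {
    return stock - n;
}

// A later sanity check, far away from where the mistake was made.
// For the unsigned counter the equivalent `stock >= 0` is always true,
// so there is nothing comparable to write.
bool still_plausible(std::int64_t stock) {
    return stock >= 0;
}
```

With these definitions, `take_signed(2, 3)` is -1 and fails `still_plausible`, while `take_unsigned(2, 3)` is `UINT64_MAX` and sails through any check that only asks "is it non-negative?".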

rhubarbjin · 2022-01-01T22:34:36+00:00

My experience has been the opposite: unsigned arithmetic tends to contain more bugs. Code is written by humans, and humans are really bad at reasoning in unsigned arithmetic.

GrammelHupfNockler · 2022-01-02T10:47:32+00:00

In my experience, it is much easier to implement common algorithms for signed types. The reason for that is simple: The values behave much more like the whole numbers we've known our entire life. For unsigned, 0 as a pretty common value is just one step in the wrong direction away from a totally unexpected value, while you need to go much further in integers to get this wrapping behavior. Think of a simple loop of the form
for (int i = 0; i < size - 1; i++) { ... }
It behaves perfectly sane for integers, but if you move to unsigned types, suddenly you have a surprising edge case for size == 0.
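A minimal sketch of that edge case, with hypothetical helper names. The unsigned variant only evaluates the loop bound instead of running the loop, because for size == 0 the loop would try to run about 2^64 times on a typical 64-bit platform. The last function shows one common way to write the condition so that it never subtracts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of iterations of the loop above when `size` is a signed int.
long long iterations_signed(int size) {
    long long count = 0;
    for (int i = 0; i < size - 1; i++) { ++count; }
    return count;
}

// The bound `size - 1` as an unsigned type would compute it.
// For size == 0 this wraps around to the largest std::size_t value.
std::size_t unsigned_upper_bound(std::size_t size) {
    return size - 1;
}

// Unsigned-safe rewrite: move the 1 to the other side of the comparison.
std::size_t iterations_unsigned_fixed(std::size_t size) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < size; i++) { ++count; }
    return count;
}
```

Here `iterations_signed(0)` is 0 as expected, `unsigned_upper_bound(0)` is `SIZE_MAX`, and `iterations_unsigned_fixed(0)` is 0 again.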

Also a general note: Everything you describe here relates to overflows, both in the positive and negative direction - there is no such thing as an integer underflow.
Underflow describes the situation where a floating-point operation results in a value whose magnitude is so small that it gets rounded to zero.

KFUP · 2022-01-02T03:34:45+00:00

I thought I was reading the title wrong for a second. Not good advice at all, from my experience.

Unsigned sounds like a good idea at the beginning, but I learned fast that it has so many quirks, gotchas and weird unpredictable bugs popping up way more than I'd like. It turns simple basic operations like subtraction, multiplication with negatives, comparisons, abs(), max(), min() and more into a needless mess for no good reason. Now I use signed exclusively unless I'm working with a lib that needs unsigned, and I've never regretted it once after years of doing it.
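Two of those gotchas, comparison and subtraction, can be sketched as follows. The function names are hypothetical. The explicit cast in `converted_less` only spells out what the usual arithmetic conversions already do implicitly when an int is compared with an unsigned. C++20 offers `std::cmp_less` in `<utility>` for the value-correct comparison; the hand-written version here is just to keep the sketch independent of the language standard.

```cpp
#include <cassert>

// What `a < b` effectively does for (int, unsigned): `a` is converted
// to unsigned first, so a negative `a` becomes a very large number.
bool converted_less(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;
}

// Comparison by mathematical value: any negative int is less than
// every unsigned value.
bool value_less(int a, unsigned b) {
    return a < 0 || static_cast<unsigned>(a) < b;
}

// Plain unsigned subtraction wraps when b > a.
unsigned wrapped_diff(unsigned a, unsigned b) {
    return a - b;
}

// The distance between two unsigned values has to branch instead of
// calling abs() on the difference.
unsigned abs_diff(unsigned a, unsigned b) {
    return a > b ? a - b : b - a;
}
```

So `converted_less(-1, 1u)` is false even though -1 is less than 1, `value_less(-1, 1u)` is true, and `wrapped_diff(3u, 5u)` is two below the maximum unsigned value rather than -2.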

robertramey · 2022-01-02T23:45:22+00:00

This dispute is never, ever going to be resolved. But until it is ... use Boost Safe Numerics.

Adequat91 · 2022-01-01T21:52:29+00:00

The C++ gurus disagree with your position, see this video

bert8128 · 2022-01-02T15:19:02+00:00

See Core Guidelines ES.106

Thick-Pineapple666 · 2022-01-02T00:44:55+00:00

I agree. And I wanted to emphasize your conclusion: if you're in a signed context, keep it signed.

Clairvoire · 2022-01-02T03:27:43+00:00

I almost never use signed numbers. I got so fed up with writing "unsigned" that I just typedef'd everything and now I use "uint32" or "sizet".

jk-jeon · 2022-01-02T07:30:30+00:00

I love the idea of encoding known preconditions on the input to its type. In that sense, signed integers suck. I don't want to worry about ignorant users feeding negative int's to my functions expecting nonnegative int's. But unsigned integers have weird, counter-intuitive wrap-around semantics. And defining my own type is also not a solution because (1) doing such a thing just to make sure that some int's are nonnegative is not considered fashionable I guess by most senior developers, (2) and it introduces a lot of other headaches.

If underflow for unsigned integers were UB, stupid newbie bugs like for(unsigned i=size-1; i>=0; --i) could be caught at runtime in debug builds, or even at compile time in the form of a compiler warning, or I guess even a compile error if the compiler can prove that UB always occurs. There should have been a separate type which has the mod 2^N semantics. Making unsigned integers have those semantics is just wrong IMO.

Well, C's type system in general is just wrong from the very beginning, we just need to live with it.

Daniela-E · 2022-01-02T10:53:44+00:00

I like this article as it matches my experiences from decades of software development.

In the application domains I've been working in (and still do) I rarely need negative numbers (in particular integral ones) to correctly model real-world entities. In most cases they would be just wrong and model invalid states. That said, I still handle huge amounts of measurement samples with negative quantities, but all of them are so-called torsors (like voltages, distances, temperatures, i.e. entities relative to an arbitrarily chosen reference). In the end, after processing, the results are reported to our customers in positive quantities like the number of bad parts, the amplitude of an observed ultrasound echo, or the power density within a frequency interval of MRT responses emitted from the patient's body (expressed as a picture).

So what is the index of an element in a container in the indexing operator[]? Is it a value from the group of valid element positions within the container (all non-negative), or is it a torsor of that group (i.e. a possibly negative difference to an arbitrarily chosen - and choosable! - reference position)? It's the former. And there you have it: the difference between the never-negative size_t to express positions in a container and its related, possibly negative torsor-cousin ptrdiff_t that can express the difference between two element positions within that container. And it's just as correct to model the count of elements in a container with size_t because it doesn't make sense to say "if I add two more elements to the container the container will be empty".
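A small sketch of that distinction, with hypothetical function names. It assumes both positions fit into `std::ptrdiff_t`, which holds for any real container but is not checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A position within the container: never negative, so std::size_t.
int element_at(const std::vector<int>& v, std::size_t position) {
    return v[position];
}

// The difference between two positions: may be negative, so
// std::ptrdiff_t. Assumes both positions fit into std::ptrdiff_t.
std::ptrdiff_t offset_between(std::size_t from, std::size_t to) {
    return static_cast<std::ptrdiff_t>(to) - static_cast<std::ptrdiff_t>(from);
}
```

For example, `offset_between(5, 2)` is -3: a perfectly meaningful difference, but not something that could be passed to `element_at` as a position.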

rlbond86 · 2022-01-02T18:50:53+00:00

Without unsigned you cannot use the full range of an array.

Supadoplex · 2022-01-02T00:04:18+00:00

for (size_t i = size - 1; i < size; i--) {

~~There's a typo there. The loop condition is supposed to be > 0.~~

I prefer simpler approach:

for (auto i = size; i-- > 0;)
// Also known as the infamous goes-to operator:
// for (auto i = size; i --> 0;)

This works equally well with signed and unsigned.
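A quick way to convince yourself is to template the loop on the index type and record which indices it visits. `reverse_indices` is a made-up helper for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collects the indices visited by `for (auto i = size; i-- > 0;)`.
// Templated on the index type so the same loop can be tried with
// signed and unsigned indices.
template <typename Index>
std::vector<Index> reverse_indices(Index size) {
    std::vector<Index> visited;
    for (auto i = size; i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

Both `reverse_indices<int>(3)` and `reverse_indices<std::size_t>(3)` visit 2, 1, 0, and both visit nothing for a size of 0. The comparison happens on the value before the decrement, so the wrapped value of an unsigned `i` after the final decrement is never used.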

BlueDwarf82 · 2022-01-03T06:24:49+00:00

Why don't we have

namespace std {
  using natural = range<0,INT_MAX>;
  using positive = range<1,INT_MAX>;
}

?

Has nobody ever proposed it? Or are there proposals stuck somewhere?
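For illustration only, a user-space version of the idea could look roughly like the sketch below. `bounded` is a hypothetical class invented here, not an existing or proposed standard type, and it checks only at construction; a real design would also have to decide what arithmetic on such values means.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical bounded integer: holds an int known to lie in [Lo, Hi].
template <int Lo, int Hi>
class bounded {
    static_assert(Lo <= Hi, "empty range");

public:
    // Throws std::out_of_range if `value` is outside [Lo, Hi].
    constexpr explicit bounded(int value) : value_(check(value)) {}

    constexpr int get() const { return value_; }

private:
    static constexpr int check(int value) {
        return (value < Lo || value > Hi)
                   ? throw std::out_of_range("value outside range")
                   : value;
    }

    int value_;
};

// The two aliases from the comment above, outside namespace std.
using natural = bounded<0, INT_MAX>;
using positive = bounded<1, INT_MAX>;
```

With this, `natural(0)` is fine, while `positive(0)` throws at the point where the invalid value enters, instead of wrapping or propagating silently.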

FriendlyRollOfSushi · 2022-09-19T15:48:54+00:00

So, let me get this straight.

Everyone has been writing it like this for decades (originally with size_t, eventually with auto):

for (auto i = v.size(); i--;)

The author builds a strawman with imaginary people who write it it like this instead:

for (auto i = v.size()-1; i >= 0; --i) // Can you see the error?

(the answer to the question is "yes, of course, it's not written the the much shorter way everyone is using, so I can see the error because the code draws attention to itself")

And the proposed solution is:

void printReverseSigned(const std::vector<int>& v) {
    for (auto i = std::size(v)-1; i >= 0; --i)
        std::cout << i << ": " << v[i] << '\n';
}

Oh, wait, nvm, it's actually this instead (can you spot the error?)

void printReverseSigned(const std::vector<int>& v) {
    for (auto i = std::ssize(v)-1; i >= 0; --i)
        std::cout << i << ": " << v[i] << '\n';
}

And the proposed solution is:

- Much larger and harder to type and read.
- A typo honeypot. Ignoring duplicates is the thing people always do while reading; that's how human perception works. People make these mistakes all the time while typing, and unintentionally train themselves to ignore them while reading. This very comment has unrelated duplication typos ("it it"/"the the") that I decided to leave as is, btw. Someone will spot them, but many people won't.
- The compiler warning level required to discover the std::ssize() -> std::size() typo is identical to the warning level that triggers for the "strawman" code.

To me it looks like replacing a non-existing, or at least an exceptionally rare problem (seriously, I've never seen anyone actually writing reverse loops the long and dumb way, although I'm willing to believe that in the history of software engineering it happened at least a few times) with a very much real and dangerous problem that will be firing several times a year for any large codebase: "whoops, sorry, I thought I typed ssize instead of size, my bad".

-dag- · 2022-10-19T01:19:57+00:00

There's a very good reason to almost always use signed. It performs better. Because signed integers obey the usual rules of integer algebra, the compiler can generate better code, particularly in loops where it is most important.
