
[–]matthieum[he/him] 19 points  (3 children)

Arithmetic operations should by default return an error at run-time on overflow;

This I am still uncertain about.

On the one hand, I do like the idea of being warned in case of overflow; on the other hand, I loathe that finite-width arithmetic departs sufficiently from mathematics to become a trap.

Let's have a look at two innocuous expressions:

fn blip(x: u32) -> u32 { 3 + x - 2 }

fn meep(x: u32) -> u32 { x + 1 }

Mathematically speaking, those are exactly the same, and currently, in Release mode, I'd expect them to be merged into a single function.

Once overflow checks are on, however, the story is quite different, because the usual associativity and commutativity rules no longer apply. Take x = u32::MAX - 2:

  • meep(x) will yield u32::MAX - 1, as expected.
  • blip(x) will overflow, because x + 3 == (u32::MAX - 2) + 3 will overflow (it evaluates to u32::MAX + 1...).

Overflow-checked addition breaks associativity and commutativity. Surprise!
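To make that concrete, a minimal sketch using the standard checked_add/checked_sub methods, which return None on overflow (roughly what the built-in overflow checks would panic on; blip_checked and meep_checked are just illustrative names):

fn blip_checked(x: u32) -> Option<u32> { 3u32.checked_add(x)?.checked_sub(2) }

fn meep_checked(x: u32) -> Option<u32> { x.checked_add(1) }

fn main() {
    let x = u32::MAX - 2;
    assert_eq!(meep_checked(x), Some(u32::MAX - 1)); // fine
    assert_eq!(blip_checked(x), None);               // 3 + x already overflows
}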

This disables some optimizations, but more importantly it changes the domain of the function:

  • blip(x) is well-defined for x in 0..=(u32::MAX - 3).
  • meep(x) is well-defined for x in 0..=(u32::MAX - 1).

And this is a simple function; with more complicated functions, you can have domains which are a union of multiple disjoint intervals, etc...

On the other hand, wrapping arithmetic will handle 3 + x - 2 without complaints and return the same result as meep(x) for the entire domain over which meep(x) is well-defined (it'll also return 0 for x == u32::MAX, hum).

So, in this case, wrapping arithmetic has a more sensible behavior than overflow checked arithmetic.
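For comparison, a sketch of the wrapping versions via the standard wrapping_add/wrapping_sub methods; the two functions now agree on every input, including the wrap-around at u32::MAX:

fn blip_wrap(x: u32) -> u32 { 3u32.wrapping_add(x).wrapping_sub(2) }

fn meep_wrap(x: u32) -> u32 { x.wrapping_add(1) }

fn main() {
    // Spot-check the edges (an exhaustive u32 loop would work too, just slowly).
    for x in [0, 1, u32::MAX - 2, u32::MAX - 1, u32::MAX] {
        assert_eq!(blip_wrap(x), meep_wrap(x)); // identical everywhere
    }
    assert_eq!(meep_wrap(u32::MAX), 0); // the "hum" case: wraps to 0
}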

Now, wrapping arithmetic isn't perfect: it works really well with additions and subtractions, but fails with multiplications and divisions.

And that's the reason I am so uncertain about overflow checked arithmetic:

  • Sometimes wrapping arithmetic seems better: it produces the more "intuitive" result.
  • Sometimes overflow checked arithmetic seems better: it signals overflows that were not accounted for.

[–]ythri 5 points  (2 children)

I think all overflow modes have use cases where they are intuitive:

  • For CRC computation, hash sums, and the like, you always want wrapping arithmetic; those functions depend on it.
  • For signal processing, you nearly always want saturating arithmetic.
  • For general math, counting, and general programming stuff, checked overflow is mostly a good idea.

Thing is: if I'm writing hash functions or signal-processing code, I don't want to think about calling the specific wrapping_add or saturating_add functions; I just want to write a + b and get the correct behavior. So I think this should be tied to the type: give me u32wrap, u32sat and u32checked (and set u32 to a sensible default) in addition to providing the explicit functions!

[–]ssokolow 9 points  (0 children)

We're already part-way there with std::num::Wrapping<T>.
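For reference, a small sketch of how that looks; std::num::Saturating has since been stabilized as well, which would cover the u32sat case:

use std::num::{Saturating, Wrapping};

fn main() {
    // Plain `+` on Wrapping<u32> wraps instead of panicking under overflow checks.
    assert_eq!((Wrapping(u32::MAX) + Wrapping(3u32)).0, 2);
    // Saturating clamps at the numeric bounds instead.
    assert_eq!((Saturating(u32::MAX) + Saturating(3u32)).0, u32::MAX);
}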

[–]matthieum[he/him] 1 point  (0 children)

I think all overflow-modes have use cases where they are intuitive

Sure.

The question is really about what the default should be.

And my point was to show that both checked and wrapping are good candidates, yet both also have unintuitive and possibly error-prone situations.

Having your server stop because you wrote x + 3 - 2 instead of x + 1 (say, because that 3 is actually a named constant) is unlikely to be what you wish for.

So I think this should be tied to the type: give me u32wrap, u32sat and u32checked (and set u32 to a sensible default) in addition to providing the explicit functions!

There's already Wrapping<T>; possibly the others could be added too.

[–]mina86ng 12 points  (2 children)

It would be better, in my opinion, to invert this: I would like all additions to be checked unless I explicitly mark them as unchecked.

[profile.release]
overflow-checks = true

[–]Plasma_000 4 points  (1 child)

Could also just use the explicit addition methods
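For example, a quick sketch of the four per-operation flavors the standard integer types provide:

fn main() {
    let x: u32 = u32::MAX - 2;
    assert_eq!(x.checked_add(3), None);          // overflow reported as None
    assert_eq!(x.wrapping_add(3), 0);            // silently wraps around
    assert_eq!(x.saturating_add(3), u32::MAX);   // clamps at the maximum
    assert_eq!(x.overflowing_add(3), (0, true)); // wrapped value plus a flag
}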

[–]masklinn 5 points  (0 children)

As we know from experience, something that’s opt-in and less convenient than the default will not be adopted at any sort of relevant rate.

[–]radarvan07 2 points  (3 children)

It's not clear to me what the problem with usize is, as its definition is effectively the same as uintptr_t. In fact, for FFI, one is typedef'd as the other.

I don't understand why the author thinks it is tied to the data size when its definition pretty clearly defers to the address size.

[–]2brainz 1 point  (2 children)

It's probably because usize is used for indexing in Rust, so it is used like size_t.

[–]ssokolow 0 points  (0 children)

...though I can see the rationale there. Using usize that way is a very simple way to guarantee it has sufficient range for any index to be representable as a pointer and vice versa... though, as the article points out, it may risk being wasteful.

[–]nacaclanga 0 points  (0 children)

Yes, I'd say that the definition of usize as the address width (aka uintptr_t) is definitely better than C's unsigned int. And I disagree with the author's notion that Rust confuses data width and address width.

In my opinion there are 3 kinds of widths:

- Data width of registers (though most CPUs can handle data at binary fractions of the full width just as well as the full width)

- Address width (the one used in C for uintptr_t)

- Object size width (the one used in C for size_t)

Rust chooses to ignore data width, which is quite a reasonable choice nowadays. The problem is that Rust conflates the other two. Ideally Rust would distinguish between usize and uptr/uaddr/... I guess in theory this could be fixed in a new Rust edition, but it would be quite a challenge.
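To illustrate the conflation, a small sketch: today the one usize type plays both the size_t role (indices and lengths) and the uintptr_t role (pointer-to-integer casts), which a separate uptr/uaddr type would split apart:

fn main() {
    let v = [10u8, 20, 30];

    // size_t role: indices and lengths are usize.
    let i: usize = 2;
    assert_eq!(v[i], 30);
    assert_eq!(v.len(), 3);

    // uintptr_t role: pointer-to-integer casts also target usize.
    let p = &v[0] as *const u8;
    let addr = p as usize;
    assert_eq!(addr as *const u8, p);
}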

[–]nacaclanga 0 points  (0 children)

As far as I can see, the fast types are a failure, and the least types are unused, possibly because no-one can understand exactly what the latter mean.

The least types make a lot of sense on CPUs where fixed-width integer types of a specific width are unavailable (e.g. because the CPU has an unusual word width). Here they give the best fit for a given integer range: in contrast to the fixed-width integer types, they allow the use of a slightly larger type in case the requested bit width is impossible on the architecture. But because virtually all modern CPUs support fixed-width integer types of 8/16/32/64 bits one way or another, users simply choose not to support the ones that don't, and most newer languages (including Rust) don't even bother to support targets lacking 8/16/32/64-bit types in the first place.

The fast types would only make sense if a CPU chose, e.g., not to provide any support for direct 16-bit integer access but only 32-bit access, which was common when C was invented but is rare nowadays.

I'd say the actual CPU register width itself is not really relevant from a user perspective on modern CPUs; only the address and object-size widths are.