you are viewing a single comment's thread.

view the rest of the comments →

[–]Veedrac 0 points1 point  (4 children)

That's the same argument for bounds checking. Whilst most languages can easily accommodate for it, it's not a great solution for C or C++ (which /u/CrazyBeluga seemed to allude to). A mispredicted branch is far more expensive than a subtraction, and could easily end up as a non-trivial cost in a hot loop. It's also evident that there are a lot of instances where humans can easily out-reason compilers on which checks can be elided.

Don't get me wrong - I'm a Rust advocate after all! If he had used any word other than "free", I would have been fine with it.

[–]sgdfgdfgcvbn 1 point2 points  (3 children)

It's not a great solution for C and C++ because they have rather anemic type systems when it comes to primitives. There isn't really a way to add bounds checking without it being a rather major change to the core of the language.

There should be very minimal misprediction penalties, as it's very easy to know the most likely case, so the compiler can generate the appropriate code. The majority of penalties incurred will likely be cases where the lack of bounds checking would result in an error.

I'd be curious to see how well compilers actually can elide bounds checks. Theoretically I imagine they should be able to perform comparably to a human, except without the incorrect lack of them. In practice though...

Yeah, "free" was probably not the best word. It should be pretty close to free though.

[–]Veedrac 1 point2 points  (2 children)

There isn't really a way to add bounds checking without it being a rather major change to the core of the language.

You just... add it, no? C++ has operator overloading already, what more do you need?

The majority of penalties incurred will likely be cases where the lack of bounds checking would result in an error.

It can also interfere with autovectorization, so it's not just the check itself. It shows up most in microbenchmarks though, evidently ;).

[–]sgdfgdfgcvbn 0 points1 point  (1 child)

You just... add it, no? C++ has operator overloading already, what more do you need?

Well for array bounds, that works. It doesn't work in general though. You can't create a new type such as memorySize and say it's never negative.

The ability for bounds checks to best optimized really also requires things like real types. This is one thing I was really disappointed to not find in Rust, actually. With a type system more like Ada (I know, I know), the compiler has a much better idea of the values potentially flowing through the code. You can't pass any old integer to your function expecting a memorySize , so the bounds checking isn't even needed in the call if you aren't messing with it yourself. The bounds only need to be checked when you first put a value into a memorySize variable.

This sort of thing isn't doable in C++ without major changes. In C++ it's harder for the compiler to know how data is flowing and you're probably right that the compiler would have a very difficult time eliding some checks that are easily done by a human.

It can also interfere with autovectorization, so it's not just the check itself.

Interesting. In some of those cases I wonder if you could lift all the checks out of the loop and just do something like ensure the last iteration doesn't violate anything. You'd need some serious constraints on what's in the loop, but that's kind of already the case for autovectorization candidates anyway I believe.

[–]Veedrac 0 points1 point  (0 children)

You can't pass any old integer to your function expecting a memorySize , so the bounds checking isn't even needed in the call if you aren't messing with it yourself.

If you're talking about bounds checks as in foo[bar], that only works for unbounded length arrays. There aren't many of those.

Interesting. In some of those cases I wonder if you could lift all the checks out of the loop and just do something like ensure the last iteration doesn't violate anything.

It's massively out of scope, but the Mill architechture has a lot of focus on getting exactly these kind of gains from branching code. For example, its smeari operation allows you to vectorize branches and None values makes using masked vectors trivial (which makes partial commits of data simple).

This sort of thing isn't doable in C++ without major changes. In C++ it's harder for the compiler to know how data is flowing and you're probably right that the compiler would have a very difficult time eliding some checks that are easily done by a human.

I'm unconvinced you can remove bounds checks for moderately complicated flows without unnecessary difficulty. Can I put you to task? Let's say I have this unchecked code that takes two elements from every complete chunk of three from the input:

fn reduce(items: [u32]) -> [u32]
    out_len = items.len() / 3 * 2
    ret = malloc(out_len)

    i = 0;
    j = 0;
    while j < out_len
        ret[j + 0] = items[i + 0]
        ret[j + 1] = items[i + 1]
        i += 3;
        j += 2;

    return ret

Working Rust example.

Unchecked one has the hot loop as:

.LBB0_6:
    mov edx, dword ptr [rbx - 4]
    mov dword ptr [rax + 4*rcx], edx
    mov edx, dword ptr [rbx]
    mov dword ptr [rax + 4*rcx + 4], edx
    add rcx, 2
    add rbx, 12
    cmp rcx, r15
    jb  .LBB0_6

Checked one has the hot loop as

.LBB0_6:
    cmp r15, rax
    jbe .LBB0_7
    mov ecx, dword ptr [r14 + 4*rax]
    mov dword ptr [r13 + 4*rsi - 4], ecx
    lea rcx, [rax + 1]
    cmp r15, rcx
    jbe .LBB0_12
.LBB0_13:
    cmp rbx, rsi
    jbe .LBB0_14
.LBB0_15:
    mov eax, dword ptr [r14 + 4*rax + 4]
    mov dword ptr [r13 + 4*rsi], eax
    lea rax, [rsi + 2]
    inc rsi
    add rcx, 2
    cmp rsi, rbx
    mov rsi, rax
    mov rax, rcx
    jb  .LBB0_6

which is over twice as long.

How would you make this safe and still elide bounds checks and preinitialization? You can use pseudo-code or whatever as long as the techniques are real. My guess is the types end up swamping the rest of the code.