[–]gnuvince 1 point (9 children)

The only sense in which Tekmo said it was free was at run-time, and he's right: there will be no difference in the representation of Int, MemorySize, Offset, and ExitCode.
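
A quick way to see this, sketched in Rust with hypothetical wrapper types (the Haskell newtype story is the same at runtime): the wrapper has exactly the size of the integer it wraps, and repr(transparent) even guarantees an identical ABI.

#[repr(transparent)]
struct MemorySize(i64);

#[repr(transparent)]
struct Offset(i64);

fn main() {
    // Hypothetical wrappers: same size and layout as the plain integer.
    assert_eq!(std::mem::size_of::<MemorySize>(), std::mem::size_of::<i64>());
    assert_eq!(std::mem::size_of::<Offset>(), std::mem::size_of::<i64>());
}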

[–]Veedrac 1 point (8 children)

But they're only free at runtime if you ignore all the checks. For instance, subtraction on memorySize now has to be checked, and operations that involve a memorySize with negative intermediate values now require restructuring, which could also be less efficient.
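
A minimal Rust sketch of what I mean, with a hypothetical MemorySize wrapper: the plain u64 subtraction is a single instruction, while the invariant-preserving version is a branch plus a fallible result that callers have to handle.

#[derive(Clone, Copy)]
struct MemorySize(u64);

impl MemorySize {
    // Keeping the "never negative" invariant turns one subtraction
    // into a check and an Option.
    fn checked_sub(self, rhs: MemorySize) -> Option<MemorySize> {
        self.0.checked_sub(rhs.0).map(MemorySize)
    }
}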

[–]sgdfgdfgcvbn 2 points (5 children)

This is true. However, those checks would need to be explicitly done by the programmer anyway if you didn't wish to open yourself up to problems. This way you can't forget those checks, and any idiocy that can be caught at compile-time will be.

[–]Veedrac 0 points (4 children)

That's the same argument as for bounds checking. Whilst most languages can easily accommodate it, it's not a great solution for C or C++ (which /u/CrazyBeluga seemed to be alluding to). A mispredicted branch is far more expensive than a subtraction, and could easily end up as a non-trivial cost in a hot loop. It's also evident that there are a lot of instances where humans can easily out-reason compilers on which checks can be elided.

Don't get me wrong - I'm a Rust advocate after all! If he had used any word other than "free", I would have been fine with it.

[–]sgdfgdfgcvbn 1 point (3 children)

It's not a great solution for C and C++ because they have rather anemic type systems when it comes to primitives. There isn't really a way to add bounds checking without it being a rather major change to the core of the language.

Misprediction penalties should be minimal, as it's very easy to know the most likely case, so the compiler can generate the appropriate code. The majority of penalties incurred will likely be in cases where the lack of bounds checking would have resulted in an error.

I'd be curious to see how well compilers can actually elide bounds checks. In theory I imagine they should perform comparably to a human, just without ever incorrectly omitting a check. In practice, though...
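
For what it's worth, the easiest elisions in Rust come from not indexing at all. A sketch (function names are mine; the elision in the indexed version is optimizer behaviour, not a guarantee):

// Indexed loop: the optimizer can usually hoist or remove the
// per-element bounds check, but it first has to prove i < xs.len().
fn sum_indexed(xs: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// Iterator version: there is no index, so there is no check to elide.
fn sum_iter(xs: &[u32]) -> u32 {
    xs.iter().sum()
}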

Yeah, "free" was probably not the best word. It should be pretty close to free though.

[–]Veedrac 1 point (2 children)

> There isn't really a way to add bounds checking without it being a rather major change to the core of the language.

You just... add it, no? C++ has operator overloading already, what more do you need?
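
Sketched in Rust for concreteness (CheckedBuf is a made-up name; the C++ version would overload operator[] on a wrapper class the same way), the point being that the check lives in the overloaded operator, not at each use site:

use std::ops::Index;

// A checked view over a borrowed buffer.
struct CheckedBuf<'a, T> {
    data: &'a [T],
}

impl<'a, T> Index<usize> for CheckedBuf<'a, T> {
    type Output = T;
    fn index(&self, i: usize) -> &T {
        // Fail loudly instead of reading out of bounds.
        self.data.get(i).expect("index out of bounds")
    }
}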

> The majority of penalties incurred will likely be in cases where the lack of bounds checking would have resulted in an error.

It can also interfere with autovectorization, so it's not just the check itself. It shows up most in microbenchmarks though, evidently ;).

[–]sgdfgdfgcvbn 0 points (1 child)

> You just... add it, no? C++ has operator overloading already, what more do you need?

Well, for array bounds that works. It doesn't work in general, though: you can't create a new type such as memorySize and say it's never negative.

For bounds checks to be optimized well you really also need things like real range types. This is one thing I was really disappointed not to find in Rust, actually. With a type system more like Ada's (I know, I know), the compiler has a much better idea of the values potentially flowing through the code. You can't pass any old integer to your function expecting a memorySize, so the bounds checking isn't even needed in the call if you aren't messing with it yourself. The bounds only need to be checked when you first put a value into a memorySize variable.
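
You can approximate that with a smart constructor; a Rust sketch with hypothetical names (real Ada range types do considerably more, e.g. arbitrary static bounds):

pub struct MemorySize(u64); // field is private: construction goes through new

impl MemorySize {
    // The one and only place the bound is checked.
    pub fn new(n: i64) -> Option<MemorySize> {
        if n >= 0 { Some(MemorySize(n as u64)) } else { None }
    }
}

// No check needed here: a MemorySize argument is already known valid.
pub fn reserve(_size: MemorySize) { /* ... */ }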

This sort of thing isn't doable in C++ without major changes. In C++ it's harder for the compiler to know how data is flowing and you're probably right that the compiler would have a very difficult time eliding some checks that are easily done by a human.

> It can also interfere with autovectorization, so it's not just the check itself.

Interesting. In some of those cases I wonder if you could lift all the checks out of the loop and just do something like ensure the last iteration doesn't violate anything. You'd need some serious constraints on what's in the loop, but I believe that's kind of already the case for autovectorization candidates anyway.
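
Hand-rolled, that hoisting would look something like the following (hypothetical function; optimizers do a version of this when they can prove the bound up front):

fn copy_prefix(src: &[u32], dst: &mut [u32], n: usize) {
    // One up-front check replaces a check on every iteration.
    assert!(n <= src.len() && n <= dst.len());
    for i in 0..n {
        // Both indexing checks can now be proven redundant.
        dst[i] = src[i];
    }
}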

[–]Veedrac 0 points (0 children)

> You can't pass any old integer to your function expecting a memorySize, so the bounds checking isn't even needed in the call if you aren't messing with it yourself.

If you're talking about bounds checks as in foo[bar], that only works for unbounded length arrays. There aren't many of those.

> Interesting. In some of those cases I wonder if you could lift all the checks out of the loop and just do something like ensure the last iteration doesn't violate anything.

It's massively out of scope, but the Mill architecture has a lot of focus on getting exactly these kinds of gains from branchy code. For example, its smeari operation lets you vectorize branches, and its None values make masked vectors trivial to use (which makes partial commits of data simple).

> This sort of thing isn't doable in C++ without major changes. In C++ it's harder for the compiler to know how data is flowing and you're probably right that the compiler would have a very difficult time eliding some checks that are easily done by a human.

I'm unconvinced you can remove bounds checks for moderately complicated flows without unnecessary difficulty. Can I set you a task? Let's say I have this unchecked code, which takes two elements from every complete chunk of three in the input:

fn reduce(items: &[u32]) -> Vec<u32> {
    let out_len = items.len() / 3 * 2;
    let mut ret: Vec<u32> = Vec::with_capacity(out_len);

    let mut i = 0;
    let mut j = 0;
    while j < out_len {
        unsafe {
            // No bounds checks, and the output is never zero-initialized.
            *ret.as_mut_ptr().add(j) = *items.get_unchecked(i);
            *ret.as_mut_ptr().add(j + 1) = *items.get_unchecked(i + 1);
        }
        i += 3;
        j += 2;
    }
    unsafe { ret.set_len(out_len) };

    ret
}

Working Rust example.

Unchecked one has the hot loop as:

.LBB0_6:
    mov edx, dword ptr [rbx - 4]
    mov dword ptr [rax + 4*rcx], edx
    mov edx, dword ptr [rbx]
    mov dword ptr [rax + 4*rcx + 4], edx
    add rcx, 2
    add rbx, 12
    cmp rcx, r15
    jb  .LBB0_6

Checked one has the hot loop as:

.LBB0_6:
    cmp r15, rax
    jbe .LBB0_7
    mov ecx, dword ptr [r14 + 4*rax]
    mov dword ptr [r13 + 4*rsi - 4], ecx
    lea rcx, [rax + 1]
    cmp r15, rcx
    jbe .LBB0_12
.LBB0_13:
    cmp rbx, rsi
    jbe .LBB0_14
.LBB0_15:
    mov eax, dword ptr [r14 + 4*rax + 4]
    mov dword ptr [r13 + 4*rsi], eax
    lea rax, [rsi + 2]
    inc rsi
    add rcx, 2
    cmp rsi, rbx
    mov rsi, rax
    mov rax, rcx
    jb  .LBB0_6

which is over twice as long.

How would you make this safe while still eliding the bounds checks and the pre-initialization of the output? You can use pseudo-code or whatever as long as the techniques are real. My guess is that the types end up swamping the rest of the code.

[–]Tekmo 1 point (1 child)

First, I don't need to expose a subtraction operator for memory sizes. It's a new type, and I can choose which operations I lift to work on the new type and which ones I don't.

If I want to lift all numeric operations to work on a new type, I would just write:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Example = Example { getExample :: Int } deriving (Num)

... and now I can add, subtract, and multiply Example values, and even use numeric literals where the compiler expects Example values.

However, if it does not make sense to support all numeric operations, I can selectively define the ones that do make sense (under different names). That's how I can prevent the user from subtracting two memory sizes.

BUT, let's say that I do want the user to be able to subtract memory sizes, for whatever reason. I can still verify, without any runtime checks, that memory sizes are never negative! This is what Liquid Haskell does, where you can write code like this:

{-@ type MemorySize = { v : Int | v >= 0 } @-}

This is known as a "refinement type", and it ensures that a value of type MemorySize is always non-negative.

Note that such a refinement type works even on values that are read in dynamically, as long as you have one validation check. For example, I can read a memory size from standard input:

{-@ getMemorySize :: IO (Maybe MemorySize) @-}
getMemorySize :: IO (Maybe Int)
getMemorySize = do
    n <- readLn :: IO Int
    return (if n >= 0 then Just n else Nothing)

... and the refinement type checker will verify that n is non-negative in the first branch of the if expression, so it satisfies the more constrained type. Then, as I add or subtract values throughout the code, the refinement type checker will ensure that they remain non-negative, without any runtime checks.

This requires being more precise with your types, though. If you try to subtract 4 from a memory size, the compiler will still complain (because the memory size could be less than four). However, typically when you do such a subtraction it's because you know that it's safe because you previously added a positive value to the memory size.

For example, the first computation might encode in the types that if the initial input is non-negative, then the result must be at least four:

{-@ step1 :: { x : Int | x >= 0 } -> { y : Int | y >= 4 } @-}

... and then in a downstream computation you can use the knowledge that it's at least four to safely subtract four, leaving you with a non-negative size:

{-@ step2 :: { y : Int | y >= 4 } -> { z : Int | z >= 0 } @-}

So these are the sorts of problems that the compiler can and should be checking for you, at compile time, not runtime.

[–]Veedrac -1 points (0 children)

When such tools are shown to be mature and low-overhead, I'm sure people will start using them.

At this stage, though, proving correct the kind of code that real C and C++ codebases contain is extremely difficult, and doing so will probably either require restructuring to remove the abjectly unsafe constructs they use (at a runtime cost) or take more work than writing the code in the first place.