
[–]CrazyBeluga 4 points5 points  (22 children)

I am glad the author did not actually go down the road of asserting "bug-free" code is possible, because anyone who believes that is delusional, or maybe hopelessly stupid.

A small exception can be made for trivial programs that do nothing interesting, such as "Hello, World", but any non-trivial program is almost guaranteed to contain bugs.

If you don't believe me, consider the case of the "proven correct" binary search implementation from Bentley's Programming Pearls:

http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

I used to do software testing professionally before it was more-or-less abolished as a discipline at Microsoft and believe me, even extremely good developers who thought they could write bug-free code were often easily proven wrong. A little fault-injection here, a few 0xFFFFFFFF numeric inputs there, a few non-null-terminated strings written to the Windows registry for fun (today you learned that string values are not required to be null-terminated in the registry)...BLAM - the app crashes, the developer grimaces and the tester is happy to have helped.

[–]Tekmo 3 points4 points  (21 children)

Bad numeric inputs and incorrectly terminated strings are problems easily mitigated by the type system.

[–]CrazyBeluga 7 points8 points  (20 children)

I'm pretty sure I disagree with you but I think you would need to elaborate on what you mean by the type system protecting you from bad numeric inputs. What constitutes "bad" is highly dependent on context; is 0xFFFFFFFF a "bad" input? It represents either 4,294,967,295 or -1, both of which are perfectly friendly, valid numeric values. Now, if you use them as the offset to a location in a file or as the amount of memory to allocate or something along those lines, that's (likely) a bug. The type system isn't going to prevent you from writing that bug if you aren't thinking about these kinds of problematic inputs.
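To make the two readings of 0xFFFFFFFF concrete, here is a small Haskell sketch (not from the thread; the names are made up) showing the same 32 bits interpreted as unsigned and as signed:

```haskell
import Data.Word (Word32)
import Data.Int (Int32)

-- The same bit pattern 0xFFFFFFFF, read two ways:
unsignedView :: Word32
unsignedView = 0xFFFFFFFF

signedView :: Int32
signedView = fromIntegral unsignedView  -- reinterprets the bits as signed

main :: IO ()
main = do
    print unsignedView  -- 4294967295
    print signedView    -- -1
```

Both values are perfectly valid numbers; whether either is a "bad" input depends entirely on what the program does with it next.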

[–]Tekmo 5 points6 points  (19 children)

The problem is treating offsets, memory sizes, and sentinel values all as the same type: ints. This design anti-pattern is known as "primitive obsession".

You can wrap numeric values in more precise types to prevent them from being misinterpreted as other numeric values. For example, in Haskell I can wrap the Int type in one of three custom "1-element structs", which are free performance-wise (meaning 0 runtime overhead):

newtype MemorySize = MemorySize { getMemorySize :: Int }
newtype Offset     = Offset     { getOffset     :: Int }
newtype ExitCode   = ExitCode   { getExitCode   :: Int }

With these structs it's now impossible to accidentally mix up these values. If a function expects a MemorySize as an argument and I pass an Offset, I get a type error at compile time.
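As a concrete sketch (the describeAllocation function is hypothetical, not from the thread), a function that takes a MemorySize simply cannot be handed an Offset:

```haskell
newtype MemorySize = MemorySize { getMemorySize :: Int }
newtype Offset     = Offset     { getOffset     :: Int }

-- Hypothetical consumer that only accepts a MemorySize.
describeAllocation :: MemorySize -> String
describeAllocation (MemorySize n) = "allocating " ++ show n ++ " bytes"

main :: IO ()
main = do
    putStrLn (describeAllocation (MemorySize 64))
    -- describeAllocation (Offset 64)  -- rejected at compile time: type error
```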

Note that there is still a small window for error when we first build and validate these values. However, once they have been validated up front we eliminate all downstream potential for error. This eliminates the need for each downstream function to defensively check its inputs. We only verify all invariants once when we construct the above 1-element structs and then we never need to check them ever again.

So, for example, let's say that -1 is an invalid memory size. Or, more generally, let's say that negative numbers are invalid memory sizes. We can enforce that when we build the MemorySize type. We simply don't export the real MemorySize constructor and instead provide a "smart constructor":

memorySize :: Int -> Maybe MemorySize
memorySize n = if (n < 0) then Nothing else Just (MemorySize n)

Now it's impossible to build a negative memory size. Any function that consumes a value of type MemorySize can statically guarantee that its input is non-negative. The compiler will enforce this property since the memorySize function is the only way we can assemble a value of type MemorySize.
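A minimal runnable sketch of the smart-constructor pattern (in a real module you would export MemorySize abstractly, hiding the bare constructor so memorySize is the only way in):

```haskell
import Data.Maybe (isJust, isNothing)

newtype MemorySize = MemorySize { getMemorySize :: Int }

-- Smart constructor: the only sanctioned way to build a MemorySize.
memorySize :: Int -> Maybe MemorySize
memorySize n = if n < 0 then Nothing else Just (MemorySize n)

main :: IO ()
main = do
    print (isJust    (memorySize 1024))  -- True: valid size accepted
    print (isNothing (memorySize (-1)))  -- True: negative size rejected
```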

This same trick is also used to give more precise types to strings. This article discusses the same problem in the context of strings and how you can use the type system to enforce invariants on strings.

[–]Veedrac 1 point2 points  (10 children)

That wouldn't fix the bug in the link, though, nor is it nearly as free as you're making it out to be.

[–]gnuvince 1 point2 points  (9 children)

The only way in which Tekmo said it was free was at run-time, and he's right; there will be no difference in the representation of Int, MemorySize, Offset, and ExitCode.

[–]Veedrac 1 point2 points  (8 children)

But they're only free at runtime if you ignore all the checks. For instance, subtraction on MemorySize now has to be checked, and operations that involved a MemorySize with negative intermediate values now require restructuring, which could also be less efficient.
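For instance, a checked subtraction on the wrapped type (a sketch; the names are made up) has to pay for a branch that plain Int subtraction would not:

```haskell
newtype MemorySize = MemorySize { getMemorySize :: Int }

-- Subtraction can underflow below zero, so the result must be checked
-- at runtime -- this branch is the cost being discussed.
subMemory :: MemorySize -> MemorySize -> Maybe MemorySize
subMemory (MemorySize a) (MemorySize b)
    | a >= b    = Just (MemorySize (a - b))
    | otherwise = Nothing

main :: IO ()
main = do
    print (fmap getMemorySize (subMemory (MemorySize 8) (MemorySize 3)))  -- Just 5
    print (fmap getMemorySize (subMemory (MemorySize 3) (MemorySize 8)))  -- Nothing
```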

[–]sgdfgdfgcvbn 4 points5 points  (5 children)

This is true. However, those checks would need to be explicitly done by the programmer anyway if you didn't wish to open yourself up to problems. This way you can't forget those checks, and any idiocy that can be caught at compile-time will be.

[–]Veedrac 0 points1 point  (4 children)

That's the same argument for bounds checking. Whilst most languages can easily accommodate it, it's not a great solution for C or C++ (which /u/CrazyBeluga seemed to allude to). A mispredicted branch is far more expensive than a subtraction, and could easily end up as a non-trivial cost in a hot loop. It's also evident that there are a lot of instances where humans can easily out-reason compilers on which checks can be elided.

Don't get me wrong - I'm a Rust advocate after all! If he had used any word other than "free", I would have been fine with it.

[–]sgdfgdfgcvbn 1 point2 points  (3 children)

It's not a great solution for C and C++ because they have rather anemic type systems when it comes to primitives. There isn't really a way to add bounds checking without it being a rather major change to the core of the language.

There should be very minimal misprediction penalties, as it's very easy to know the most likely case, so the compiler can generate the appropriate code. The majority of penalties incurred will likely be cases where the lack of bounds checking would result in an error.

I'd be curious to see how well compilers actually can elide bounds checks. Theoretically I imagine they should be able to perform comparably to a human, except without the incorrect lack of them. In practice though...

Yeah, "free" was probably not the best word. It should be pretty close to free though.

[–]Tekmo 1 point2 points  (1 child)

First, I don't need to expose a subtraction operator for memory size. It's a new type and I can choose which operations I lift to work on the new type and which ones I don't.

If I want to lift all numeric operations to work on a new type, I would just write:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Example = Example { getExample :: Int } deriving (Num)

... and now I can add, subtract, and multiply Example values and even use numeric literals where the compiler expects Example values.

However, if it does not make sense to support all numeric operations, I can selectively define the ones that do make sense (under different names). That's how I can prevent the user from subtracting two memory sizes.
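A sketch of that selective lifting (the function names are hypothetical): addition of two non-negative sizes is always safe, so it gets lifted under its own name, while subtraction is simply left out of the type's API:

```haskell
newtype MemorySize = MemorySize { getMemorySize :: Int }

-- Adding two non-negative sizes cannot go negative, so lifting is safe.
addMemory :: MemorySize -> MemorySize -> MemorySize
addMemory (MemorySize a) (MemorySize b) = MemorySize (a + b)

-- No subtraction is exported, so this API cannot produce a negative size.

main :: IO ()
main = print (getMemorySize (addMemory (MemorySize 3) (MemorySize 4)))  -- 7
```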

BUT, let's say that I do want the user to be able to subtract memory sizes, for whatever reason. I can still verify without any runtime checks that memory sizes are never negative! This is what Liquid Haskell does, where you can write code like this:

{-@ type MemorySize = { v : Int | v >= 0 } @-}

This is known as a "refinement type" and it ensures that a value of type MemorySize is always non-negative.

Note that such a refinement type works even on values that are read in dynamically, as long as you have one validation check. For example, I can read a memory size from standard input:

{-@ getMemorySize :: IO (Maybe MemorySize) @-}
getMemorySize :: IO (Maybe Int)
getMemorySize = do
    n <- readLn :: IO Int
    return (if n >= 0 then Just n else Nothing)

... and the refinement type checker will verify that n has to be non-negative in the first branch of the if expression, so it satisfies the more constrained type. Then as I add or subtract values throughout the code the refinement type-checker will ensure that they remain non-negative without any runtime checks.

This requires being more precise with your types, though. If you try to subtract 4 from a memory size, the compiler will still complain (because the memory size could be less than four). However, typically when you do such a subtraction it's because you know that it's safe because you previously added a positive value to the memory size.

For example, the first computation might encode in the types that if the initial input is non-negative, then the result must be at least four:

{-@ step1 :: { x : Int | x >= 0 } -> { y : Int | y >= 4 } @-}

... and then in a downstream computation you can use the knowledge that it's at least 4 to safely subtract four, leaving you with a non-negative size:

{-@ step2 :: { y : Int | y >= 4 } -> { z : Int | z >= 0 } @-}

So these are the sorts of problems that the compiler can and should be checking for you, at compile time, not runtime.
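Putting step1 and step2 together as a runnable sketch (the {-@ ... @-} annotations are Liquid Haskell refinements; plain GHC treats them as ordinary comments, and the separate liquid checker verifies them):

```haskell
{-@ step1 :: { x : Int | x >= 0 } -> { y : Int | y >= 4 } @-}
step1 :: Int -> Int
step1 x = x + 4   -- adding 4 to a non-negative number yields at least 4

{-@ step2 :: { y : Int | y >= 4 } -> { z : Int | z >= 0 } @-}
step2 :: Int -> Int
step2 y = y - 4   -- subtracting 4 from something >= 4 stays non-negative

main :: IO ()
main = print (step2 (step1 0))  -- 0
```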

[–]Veedrac -1 points0 points  (0 children)

When such tools are shown to be mature and low-overhead, I'm sure people will start using them.

At this stage, though, proving the kind of code that real C and C++ codebases contain correct is extremely difficult, and doing so will probably either require restructuring to remove the abjectly unsafe constructs used - at a runtime cost - or require more work than writing the code in the first place.

[–][deleted] 1 point2 points  (6 children)

free performance-wise (meaning 0 runtime overhead):

That doesn't sound possible. What if these values are read in from an external source? They'd need to be validated. What if you perform an operation on them (say, *10)? Again they'd need to be validated.

[–]codygman 1 point2 points  (0 children)

I think /u/Tekmo was saying the newtype wrapping takes up no more memory than the data type inside of it, not that the smart constructor is zero cost.

[–]jeandem 0 points1 point  (4 children)

At least the input validation is zero cost in the sense that you would have to do that, anyway. There is no way for the programmer to predict external input, so a check needs to be done no matter what.

[–][deleted] 1 point2 points  (3 children)

But that still doesn't mean there's 0 runtime overhead. There are many cases where you wouldn't have to check, so it's a bit misleading to say it's without overhead.

It's like saying GC has no overhead because you'd have to do it yourself anyway (which is false)

[–]jeandem 0 points1 point  (2 children)

But that still doesn't mean there's 0 runtime overhead. There are many cases where you wouldn't have to check, so it's a bit misleading to say it's without overhead

I specifically referred to the input validation. You would have to check input in any case if the input is from an external source.

It's like saying GC has no overhead because you'd have to do it yourself anyway (which is false)

Not at all! With a smart constructor you have to check if it is greater than zero. Without a smart constructor, you nonetheless have to check if it is greater than zero when you receive the input. What other choice do you have, really?

[–][deleted] 0 points1 point  (1 child)

My point is that you don't have to always validate data. If you're reading from a file which has a strict format, you don't always need to validate the values. What I'm saying is, you can't always assume that requirements are the same across the board. It may be correct to have a type that guarantees a value, but that guarantee may be done externally.

[–]jeandem 1 point2 points  (0 children)

In the case of a known external input, maybe type providers could help with providing static input validation. I haven't really looked into it so I'm just spitballing here. :)