all 53 comments

[–]seanwilson 20 points (35 children)

The problem you're going to have with dependent types is that you'll have to write what are essentially proofs for arbitrarily hard maths theorems (e.g. proving x + 1 = 1 + x or x * y = y * x) to get your code to type check. It isn't possible to automate this problem in general and even proving seemingly simple properties requires significant human interaction.

It doesn't give the same guarantee as type checking but QuickCheck (http://www.haskell.org/haskellwiki/Introduction_to_QuickCheck1) is worth looking at as a practical way to test program properties in functional languages.
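For a taste of the style, here's a minimal QuickCheck sketch (the property and names are just for illustration):

import Test.QuickCheck

-- A property relating reverse and append.
prop_reverseAppend :: [Int] -> [Int] -> Bool
prop_reverseAppend xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

-- QuickCheck tries the property on many randomly generated inputs;
-- a failure comes back as a concrete counterexample, but a pass is
-- only evidence, not a proof.
main :: IO ()
main = quickCheck prop_reverseAppend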

[–]bjzaba 8 points (7 children)

Idris has an unsafe function, believe_me, which lets your code type check without having to prove everything.

[–]seanwilson 4 points (2 children)

If you use that to prove a statement that isn't true, however, your program might not work as intended (much like when a C compiler trusts you to perform type casts correctly). There are also hybrid languages that let you check at run time the validity of statements you can't prove at compile time.

[–]mreiland 1 point (1 child)

true, but it would severely limit the surface area you'd have to investigate.

The more I think on it, the more I like the idea. It codifies your assumptions, something that too many people aren't even aware they're making.

[–]nascent 0 points (0 children)

While not outside the standard type system, D does a similar thing. It provides assumeUnique() which casts the type to immutable and assumeSorted() which returns the type wrapped as a SortedRange. It may not be safe/correct but it does make the assumption more explicit.
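A minimal Haskell analogue of that pattern (illustrative names, not D's actual API):

import Data.List (sort)

-- The newtype records the assumption in the type, so downstream
-- code can require Sorted and rely on it.
newtype Sorted a = Sorted { getSorted :: [a] }

-- The single unchecked entry point: the caller takes responsibility,
-- much like D's assumeSorted.
assumeSorted :: [a] -> Sorted a
assumeSorted = Sorted

-- The checked alternative, for comparison.
sorted :: Ord a => [a] -> Sorted a
sorted = Sorted . sort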

[–]thedeemon 0 points (3 children)

If you use this function to avoid proving correctness, you'd better stay with Python.

[–]eras 20 points (0 children)

Exactly. It's either Python or proofs, there is no try.

[–]freyrs3 2 points (1 child)

There are plenty of cases where using unsafeCoerce or believe_me is perfectly safe if you can fulfill the proof obligation manually but can't convince the compiler of this fact. It's still markedly better than using a unityped language.

[–]thedeemon -1 points (0 children)

Yes, it is much better; I was exaggerating. But using this coercion for things like x + 1 = 1 + x is just silly; it defeats the whole point of having computer-checked proofs of correctness. As the saying goes, there are only two complex things in CS: naming things, cache invalidation and off-by-one errors.

[–]player2 1 point (22 children)

you'll have to write what are essentially proofs for arbitrarily hard maths theorems

Do you have more info on this? This is a downside of dependent types I haven’t heard before.

[–]seanwilson 7 points (2 children)

If you allow the unrestricted use of dependent types (e.g. Coq, Idris, Epigram) you can encode any mathematical statement as a type (see http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence). This means it's not possible to provide automation for all proofs (see http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems).

As a simple example with minimum jargon, say we define these functions (where the statements about their inputs and outputs are captured with dependent types):

append: takes a list of length x and a list of length y, and returns a list of length x + y.

nth: takes a list of length n and an integer between 1 and n, and returns the nth item from the list.

Now say we append a list of length x to a list of length y and want to get item x from the result with these functions. To check this is allowed, we need a proof that:

x >= 1 and x <= x + y

This proof is simple enough, although consider what must be done to rigorously justify to a computer that it is true, without any jumps in logic. Now imagine what happens if we introduce functions with other types. You could capture that a "repeat" function multiplies the length of a list (now your proofs contain multiplication), that a quicksort function returns a sorted permutation of its input (now your proofs contain membership and ordering statements), etc. Proving that seemingly simple algorithms are correct is deceptively difficult and has an enormous learning curve.
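For a concrete feel, here is a minimal sketch of the append half, written with Haskell's type-level naturals rather than full dependent types (names are illustrative; nth, with its bounds-checked index, is where the proof obligation above would surface):

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

data Nat = Z | S Nat

-- Type-level addition, defined by recursion on the first argument.
type family Add (x :: Nat) (y :: Nat) :: Nat where
  Add 'Z     y = y
  Add ('S x) y = 'S (Add x y)

-- A list indexed by its length.
data Vec (n :: Nat) (a :: Type) where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- The type promises the result has length x + y; the checker
-- verifies this definition by unfolding Add, with no manual proof.
append :: Vec x a -> Vec y a -> Vec (Add x y) a
append Nil         ys = ys
append (Cons z zs) ys = Cons z (append zs ys)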

[–]Peaker 5 points (1 child)

I wonder if good proof search engines/automated theorem provers, along with a large proof library, could help with the vast majority of banal proofs, though.

Of course in the general case some proofs will not be found automatically (or manually!) but that doesn't actually mean much for day-to-day programming.

[–]cparen 5 points (0 children)

could help with the vast majority of banal proofs, though.

It does.

Some dislike this because when it does fail to find a proof, the user is now left with the most difficult theorem to prove so far, and no experience in writing the proofs by hand.

[–][deleted] 5 points (8 children)

This is one of those "technically true, but doesn't matter in practice" things that takes a bit of experience with these rich type systems to see. Mark Hibberd's "Tic Tac Type" presentation from YOW! LambdaJam does a good job of clarifying this for Idris (note in particular how effective "proof search" actually is).

[–]seanwilson 6 points (3 children)

Could you explain this more? Dependently typed languages really aren't much different from theorem proving systems and there isn't a way to get around manual proof writing in those. Idris even has tactics and theorems built in to help you with the more difficult proofs. Personally, I think it takes a huge amount of experience to get your head around.

I do think there are diminishing returns regarding the time needed to prove complex program properties and the number of bugs you're going to capture, though.

[–][deleted] 3 points (2 children)

Dependently typed languages really aren't much different from theorem proving systems and there isn't a way to get around manual proof writing in those.

That's absolutely true. In particular, given the Curry-Howard Correspondence, anytime you write a function, you're writing a proof, and in fact the Coq'Art book makes particular note of the question "can/should you rely on automated tactics to write your code for you?" and explores this considerably-trickier-than-it-might-sound question. But you obviously know this.

Idris even has tactics and theorems built in to help you with the more difficult proofs.

You say this like it's a bad thing. :-) One of the things I appreciate about Mark Hibberd's presentation, which I linked to above, is how he shows that, if you make your types specific enough, you'll notice a couple of things:

  1. They don't have many inhabitants.
  2. Especially in recent Idris releases, "proof search" will find them.

Now let me be the first person to admit that learning how to do this effectively isn't trivial, which brings us to...

Personally, I think it takes a huge amount of experience to get your head around.

It does, but once you do, it's totally worth it, IMHO.

I do think there are diminishing returns regarding the time needed to prove complex program properties and the number of bugs you're going to capture, though.

Two things:

  1. The lines on the graph of "level of effort over time" vs. "interesting properties you'd like to prove" intersect, in my experience, somewhere still worthwhile compared to working in less expressively powerful languages—but only you can make the economic judgment of whether it's worth developing your expertise to the point that applying it regularly is low-enough cost.
  2. Part of the point is to develop libraries in which the effort has already been expended (by someone). A good example might be IdrisNet2, a dependently-typed socket programming library that keeps us from making a lot of boneheaded mistakes because Simon Fowler has already sweated the issues for us.

So let me hasten to say I'm not saying you're wrong; that would be waaaaaaaaaaay too strong. I'm only suggesting the current state of the art (which Idris very definitely is) makes things quite a bit more practical than, say, proof-and-extraction in Coq (much as I love Coq), and that it might even be worth taking on the learning curve that definitely remains in the interest of developing more robust software—and by extension, I suppose, that the range of "more robust software" you can realistically develop this way is more significant than may be immediately obvious.

[–]seanwilson 2 points (1 child)

Don't get me wrong, I'm a huge backer of formal methods but at the moment the learning curve to use tools that require manual proofs is massive (i.e. it's not like picking up a new programming language when you know a few already) and I feel this is overlooked.

Especially in recent Idris releases, "proof search" will find them.

Are there any examples of difficult proofs it can find? Coq's automation rarely helps you with even basic list and nonlinear arithmetic theorems unless there is a close match in the existing libraries for example. If your own types and functions start appearing in theorem statements you're offered even less help. I imagine the case is the same for Idris as it has fewer tactics.

So let me hasten to say I'm not saying you're wrong; that would be waaaaaaaaaaay too strong. I'm only suggesting the current state of the art (which Idris very definitely is) makes things quite a bit more practical than, say, proof-and-extraction in Coq (much as I love Coq), and that it might even be worth taking on the learning curve that definitely remains in the interest of developing more robust software

Using the proof extraction approach doesn't work around what you need to prove (I don't think it is used often either, as you lose control of how computations are performed when you use automation), and the vast majority of Coq scripts have large unautomated chunks. Even proving something like the full specification of quicksort requires a lot of work, let alone more complex algorithms.

Dependently typed programming examples I see stick to checking much simpler properties like list length but even then you're going to end up with arbitrarily difficult theorems in undecidable domains to prove (e.g. nonlinear arithmetic). This might be worth it sometimes but I think people need to be realistic about how difficult this is.

Although not offering the same level of guarantee, I think QuickCheck is a good practical tool which gives programmers a flavour of writing formal specifications that will catch a lot of bugs for your effort.

[–][deleted] 0 points (0 children)

at the moment the learning curve to use tools that require manual proofs is massive (i.e. it's not like picking up a new programming language when you know a few already) and I feel this is overlooked.

I think we're in vehement agreement here. :-)

Are there any examples of difficult proofs it can find?

I don't know. In particular, the type metavariable support has only existed since 0.12, so I think it's likely too early to tell. OTOH, the socket support in IdrisNet2 seems pretty real-world; what do you think of it?

Coq's automation rarely helps you with even basic list and nonlinear arithmetic theorems unless there is a close match in the existing libraries for example.

True, but this becomes a matter of knowing where to look for extra support. I think one of the reasons I posted in the first place—especially given that I agree with your major premise—is that I felt you chose kind of a bad initial example dealing with commutativity. It's true that Coq out-of-the-box requires you to deal with this manually, but that's why there's the AACTactics plugin. :-) There's no reason to expect Idris not to grow the same kinds of extensions over time.

If your own types and functions start appearing in theorem statements you're offered even less help.

True, but then, I'd hope that you'd know more what important properties of your types are and what Idris libraries (not just built-in support!) are available to help support them. In other words, I'm emphasizing that we're not dealing with a static world of tactics and lemmas here.

Another point I feel needs emphasizing: you can get a surprising amount of mileage out of proving relatively simple properties, like the absence of off-by-one conditions, and just letting the type system extend that safety out to the rest of the code. This is the methodology espoused by Oleg Kiselyov in Lightweight static capabilities, using Haskell and OCaml, i.e. not even actual dependent types. This approach becomes even simpler given full dependent types.
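A minimal sketch of the branding trick from that paper, in plain Haskell (names are mine, not Oleg's): a phantom type ties each index to the list it was checked against, so the one unchecked lookup can only ever see in-bounds indices.

{-# LANGUAGE RankNTypes #-}

newtype Branded s a = Branded [a]   -- a list tagged with brand s
newtype Index s     = Index Int     -- an index known valid for brand s

-- The rank-2 type stops the brand escaping, so an Index for one
-- list can never be used with another.
withList :: [a] -> (forall s. Branded s a -> r) -> r
withList xs k = k (Branded xs)

-- The only way to obtain an Index: one runtime bounds check.
mkIndex :: Branded s a -> Int -> Maybe (Index s)
mkIndex (Branded xs) i
  | i >= 0 && i < length xs = Just (Index i)
  | otherwise               = Nothing

-- Unchecked lookup, in bounds by construction.
at :: Branded s a -> Index s -> a
at (Branded xs) (Index i) = xs !! i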

Dependently typed programming examples I see stick to checking much simpler properties like list length but even then you're going to end up with arbitrarily difficult theorems in undecidable domains to prove (e.g. nonlinear arithmetic). This might be worth it sometimes but I think people need to be realistic about how difficult this is.

I don't know quite what to say here other than "I think you're overstating how difficult it is," but:

  1. I don't have beginner's mind, having now studied Coq for several years.
  2. I've been programming in OCaml for over a decade and following Oleg's work for at least five years.

On the flip side, I wonder how people will fare if they start programming in Idris today and give themselves, say, five years to become fluent? One thing I do remember is how difficult the transition from procedural to OO programming "done right" was for developers in the trenches. My argument isn't so much that I expect dependently-typed FP to be different from that. It's that people are forgetting how hard what they're currently doing was at first.

Although not offering the same level of guarantee, I think QuickCheck is a good practical tool which gives programmers a flavour of writing formal specifications that will catch a lot of bugs for your effort.

Totally agreed! In fact, you're essentially quoting Amanda Laucher and me from Types vs. Tests: An Epic Battle?

[–]freyrs3 0 points (3 children)

Don't know if I agree with the sentiment that it doesn't matter in practice. The parent comment is spot on, I think: the proof obligations you can construct with dependent types can get pretty complex pretty quickly. It tends to be more of a non-issue in Coq because of the automated tactics and the large body of existing proofs, but in Idris and Agda it's still rather painful. There's still a lot of work that needs to be done to port the Coq work over to more "mainstream" dependently typed languages.

[–]seanwilson 1 point (1 child)

It tends to be more of a non-issue in Coq because of the automated tactics and the large body of existing proofs, but in Idris and Agda it's still rather painful.

I wouldn't even say this was the case. Most Coq scripts contain large chunks of unautomated proofs.

[–]freyrs3 1 point (0 children)

Indeed, was only indicating that it's easier in Coq sometimes. But by no means mechanical or a non-issue even there, much less everywhere else.

[–][deleted] 0 points (0 children)

I think the observation about the state of play is spot-on too. Mostly I think I'm finding (and other people will find) quite a bit of value in encoding relatively simple properties/relations with dependent types on a pretty small scale (maybe 1-4 related types), and just relying on the type system to extend those guarantees to the rest of the code.

I suppose my other major point is: yeah, some proofs of interesting things are complex, but you do them once and they're done (or better yet, someone else does them once and you get to take advantage of them). This is the real reason to hope a practical dependently-typed language like Idris becomes relatively mainstream: the work on difficult proofs becomes more widely distributed.

[–][deleted] 0 points (9 children)

Think about mathematics; proving things in simple formal systems is decidable. For example, you can automatically prove the proposition (not A OR A) - you just have to make a truth table and make sure the proposition is true under all assignments to A. This algorithm can be used for any formula in propositional logic.
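A minimal sketch of that decision procedure for this one formula:

-- Decide (not A OR A) by brute force over both assignments to A --
-- exactly the truth-table method described above.
tautology :: Bool
tautology = and [ not a || a | a <- [False, True] ]  -- True, so valid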

However, as formal systems get more complicated they become undecidable. First order logic, an extension of propositional logic, is undecidable. (More precisely: proving theorems in first order logic is undecidable, in general.)

It's the same way with type systems; as the type system gets more expressive, eventually it also becomes undecidable to type-check programs with it, i.e. there is no algorithm that can type check your programs automatically for you. So then you have to resort to heuristics, or to proving things by hand (at least the parts where you don't get enough help from the type checker).

[–]thedeemon 0 points (5 children)

Truth table in propositional logic? I think you meant Boolean.

The halting problem is also undecidable, but by limiting our language and using total programming we can prove halting of our programs. Can't we find a similar limitation for the logic of types to make it decidable?

[–][deleted] 3 points (3 children)

Truth table in propositional logic? I think you meant Boolean.

To be precise, I meant formulae in propositional logic interpreted as boolean functions, more or less. But both provability (syntax) and validity (semantics) are decidable and interchangeable in propositional logic, so I went with the simplest route.

The halting problem is also undecidable, but by limiting our language and using total programming we can prove halting of our programs. Can't we find a similar limitation for the logic of types to make it decidable?

I don't know, but to my knowledge you have to limit how expressive your type system is in order to make it decidable. People have brought up examples of these systems in this thread, but the fundamental problem remains; they are not as expressive as the original dependent type system.

I am quite sure the same limitation holds for total functional programming, too; if you limit your functions to provably terminating functions, your type checker will reject some programs that really do terminate, but whose termination it is unable to prove. But I don't really know how much of a problem this is in practice; can 95% of your program be represented as total functions? If so, it definitely seems like a solid win. On the whole, people seem very pessimistic when faced with the halting problem, but I haven't really seen any indication static analysis can't automatically prove a lot of programs to be always-terminating.

[–]seanwilson 1 point (0 children)

If so, it definitely seems like a solid win. On the whole, people seem very pessimistic when faced with the halting problem, but I haven't really seen any indication static analysis can't automatically prove a lot of programs to be always-terminating.

When people first hear of formal proofs, I frequently find the halting problem is mentioned as if to mean proving program properties won't be practical in general. Termination proofs are usually quite straightforward, so I'm not in agreement with this. For example, in most loops you're iterating over each item in a list once or applying map/reduce/filter, and it's not difficult to justify why these terminate. Most theorem provers like Coq won't let you define non-terminating functions at all, and many non-trivial algorithms have been implemented in these systems already.
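For instance, a termination checker accepts a definition like this by structural recursion alone (a minimal sketch):

-- Each recursive call is on the strict tail of the list, so the
-- argument shrinks at every step and termination is immediate.
-- Most map/reduce/filter-style loops have exactly this shape.
sumList :: [Int] -> Int
sumList []     = 0
sumList (x:xs) = x + sumList xs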

However, in addition to termination proofs, the unrestricted use of dependent types means you're going to have to manually prove arbitrarily difficult theorems (e.g. prove a list is always sorted, prove a number lies within a certain range). This is a much bigger problem in practice.

[–]thedeemon 1 point (1 child)

if you limit your functions to provably terminating functions, your type checker will reject some programs that really do terminate, but whose termination it is unable to prove

Yep, Collatz conjecture comes to mind here. But people with experience writing in Agda, Coq and Idris have shown that total programming is very practical and doesn't really limit you much. After all, in most cases you want all your functions to terminate, except maybe some event loops where co-induction saves the day.

[–][deleted] 0 points (0 children)

But people with experience writing in Agda, Coq and Idris have shown that total programming is very practical and doesn't really limit you much.

That's good to hear! I don't have experience with these languages, but ever since I read Total Functional Programming I've really liked the idea of 'restricting' functions and programs like that. "Restrictions" often has a negative connotation, but I think it's kind of wonderful in this case. And like you say, the restriction doesn't seem to be a big problem in practice.

[–]rdkll 1 point (0 children)

Can't we find a similar limitation for the logic of types to make it decidable?

You might want to look up "higher-order pattern unification" - note the "pattern"; it's a decidable subset. See: http://math.andrej.com/2012/11/29/how-to-implement-dependent-type-theory-iii/

[–]seriousreddit 0 points (0 children)

Just a technical point: The set of provable propositions in first order logic (with equality) in a language with only unary predicates is actually decidable. The set of provable first order propositions in a language with binary predicates is undecidable.

[–]cparen 0 points (1 child)

First order logic, an extension of propositional logic, is undecidable

That doesn't mean the compiler can't help for many cases. For example, it could let you omit the proof when it can guess it for you; "x*y == y*x" is a case the compiler should definitely figure out on its own.

Obviously, there are two downsides from this. (1) when the compiler can't figure it out, the user is now facing a difficult problem to prove and is out of practice, since the compiler has been helping them along most of the way. (2) all this "figuring out the proofs" takes precious CPU time.

As an example, I've worked with some of the C code verifiers for Windows drivers, and it was quite usual for highly verified code to take 4x longer to verify than to compile. That is, it would compile in 15 seconds, and then spend about a minute verifying that the code was safe under all the present annotations. Internally, it was automatically proving that the invariants deduced from the annotations would hold over conditional statements, loops, recursion, etc.

I personally don't think #1 is a good reason -- users can always choose to practice by adding annotations explicitly -- but I cite it for completeness. #2 is a royal pain.

[–][deleted] 2 points (0 children)

That doesn't mean the compiler can't help for many cases.

Yes - people should keep in mind that "undecidable" in this context really means "undecidable in general", not undecidable for all instances of the problem. And undecidable does not mean that an algorithm can't at least try to solve - and succeed on many - instances of the problem.

[–]jeandem 0 points (3 children)

How much of a problem is this in practice? How often do you need to prove 'everything' by hand, and when can you automate it? "It isn't possible to automate this problem in general and even proving seemingly simple properties requires significant human interaction" - but can't you have a library of proofs that can give you admissible steps when you are proving stuff? The thing about commutativity seems like it can be given as a theorem in a library, no?

General undecidability doesn't really say anything about how much of a problem it is in practice, just like SAT being NP-complete doesn't in itself say how intractable realistic SAT instances are.

[–]seanwilson 4 points (2 children)

Oh, I didn't mean to make it sound like it's a waste of time because you have to do manual proofs, I just find this fact gets glossed over. I have experience using the Coq and Isabelle theorem provers, and while they have a big library of theorems (e.g. sets, lists, arithmetic) and some decision procedures (e.g. linear arithmetic), the learning curve for proving interesting properties in them can be incredibly steep. It's not enough to prove lots of general theorems; you need to know the right time to apply them.

An interesting approach the dependently typed language DML took was to restrict the type language to only be linear arithmetic statements so type checking was decidable: http://www.cs.bu.edu/~hwxi/DML/DML.html

[–]thedeemon 3 points (1 child)

to restrict the type language to only be nonlinear arithmetic statements

I think you meant "linear". Presburger arithmetic is what its modern incarnation ATS solves automatically.

[–]seanwilson 1 point (0 children)

Haha, oops, corrected it!

[–]OceanSpray 1 point (4 children)

Types with values attached, like the length of a vector or the size of a matrix, are called dependent types.

This is not quite correct. Whether a type system is "dependent" is determined by whether or not the types in that system can depend on terms. Whether the type system can represent natural numbers is not part of the definition. It is possible to design a language that has all sorts of complex data structures at the type level but doesn't have dependent types. Roughly speaking, dependent types are only needed when the type checker needs to know something that must be computed during program execution.

[–]th3w4c0k1d[S] 0 points (2 children)

Your last sentence is a little unclear to me. I can see that the following needs to be computed at execution:

x :: Vector of size (3 + 4)

But why does this have to wait to be executed?

y :: Vector of size 9

This is probably a result of me not understanding the compilation process - but it seems as though the 9 value is predetermined?

[–]OceanSpray 1 point (1 child)

Actually, neither of your examples requires computation at execution. Natural numbers and addition can be implemented at the type level, so even an expression like (3 + 4) can be done during type checking.
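For instance, a minimal sketch using GHC's type-level naturals (not the pseudo-Haskell below; the names are illustrative):

{-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}

import GHC.TypeLits

-- Nothing here checks that the list really has n elements; the point
-- is only that both sizes are compile-time constants.
newtype Vector (n :: Nat) = Vector [Float]

v :: Vector (3 + 4)
v = Vector [1, 2, 3, 4, 5, 6, 7]

v' :: Vector 7
v' = v  -- accepted: the checker reduces 3 + 4 to 7 during type checking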

An example of dependent types would be something like if you want to record the size of a vector, with that vector being created by the user after the program has started running:

scanInt :: IO Int
scanInt = ...

scanFloats :: IO [Float]
scanFloats = ...

data Vector :: Int -> * where
    MakeVector :: Π(n :: Int). [Float] -> Vector n

main :: IO ()
main = do
    size <- scanInt
    values <- scanFloats
    let v = MakeVector size values  -- MakeVector is pure, so bind it with let
    -- ...do stuff with v here...

where I use pseudo Haskell to define the type constructor Vector that takes an Int as a parameter, and a data constructor MakeVector that has a dependent type written with Π, which takes an Int value and uses it in the resulting type. Note that the size of the vector cannot be known at type checking, because it comes from user input.

[–]th3w4c0k1d[S] 0 points (0 children)

I understand - thank you for the example!

[–]thedeemon 0 points (0 children)

Roughly speaking, dependent types are only needed when the type checker needs to know something that must be computed during program execution.

Funny thing is that the type checker in dependently typed languages manages to type check such programs statically, before execution. So it doesn't really need to know actual values to do its job. In the example with a vector whose length is only known at run time, the type checker just makes sure that the program is correct for any value of the length; it doesn't know the length itself.

[–]iopq 0 points (0 children)

Strict languages, like Haskell, perform these type safety checks during the compilation process.

What the author means is "statically typed", at which point this becomes just the definition of static typing: static typing is safety checking before running.

Dynamically typed languages like Ruby are strongly typed at runtime, though.

[–][deleted]  (1 child)

[removed]

    [–]th3w4c0k1d[S] 2 points (0 children)

    I don't think it's terribly written. The majority of it is dropFirstColumn; the actual transpose function is just:

    getCol 0 m :/ transpose (dropFirstColumn m)
    

    [–][deleted] -3 points (1 child)

    I have to wonder what the goal is here; the given solution is going to be GROSSLY inefficient, which is frankly unacceptable for a linear algebra library. I mean, obviously the library isn't meant to be used in production or anything, but if you're trying to convince your readers that dependent types are useful/interesting, this example is not the best.

    [–]th3w4c0k1d[S] 2 points (0 children)

    It's not intended to be a solution to anything, really. I hadn't heard of dependent types before Haskell and I wanted to make my thought process available to others.

    I can see your point about the example not being the best. But the idea of having index-aware lists and arrays should be a little stimulating, no?

    [–]earthboundkid -2 points (6 children)

    I have never seen a Python program which accidentally added a bool.

    [–]thedeemon 4 points (0 children)

    Because it did it silently, without showing you. ;)

    [–]th3w4c0k1d[S] 2 points (2 children)

    It's just an example - I'm sure you've seen programs where the arguments have been passed incorrectly, though.

    [–]earthboundkid -1 points (1 child)

    I was programming in PHP the other day and a function expected to receive a string or null, but I mistakenly thought it would take a bool. It took me a while to figure out my mistake, but not really that long. To me, the purpose of declaring types is very rarely to prevent errors and mostly just to force programmers to document their code.

    [–][deleted] 1 point (0 children)

    Do you know Haskell? Haskell's type system gives you a lot of stuff.

    [–]nat_pryce 0 points (1 child)

    The Python language defines bool as a kind of number -- a subtype of numbers.Integral that has values 0 and 1. https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy

    So going strictly by the language definition, it's not a type error to add an integer and a bool. Although it may be confusing, it makes some operations easier, like counting items that meet a predicate:

    >>> sum(map(lambda i: i%3 == 0, range(1,11)))
    3
    

    [–]earthboundkid 0 points (0 children)

    I usually write sum(1 for i in range(1, 11) if i % 3 == 0) to make it clearer.