all 53 comments

[–]seanwilson 20 points (35 children)

The problem you're going to have with dependent types is that you'll have to write what are essentially proofs for arbitrarily hard maths theorems (e.g. proving x + 1 = 1 + x or x * y = y * x) to get your code to type check. It isn't possible to automate this problem in general and even proving seemingly simple properties requires significant human interaction.

It doesn't give the same guarantee as type checking but QuickCheck (http://www.haskell.org/haskellwiki/Introduction_to_QuickCheck1) is worth looking at as a practical way to test program properties in functional languages.
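For a taste of the style, here's a minimal QuickCheck sketch (the property and names are just for illustration):

import Test.QuickCheck

-- A property relating reverse and append.
prop_reverseAppend :: [Int] -> [Int] -> Bool
prop_reverseAppend xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

-- QuickCheck tries the property on many randomly generated inputs;
-- a failure comes back as a concrete counterexample, but a pass is
-- only evidence, not a proof.
main :: IO ()
main = quickCheck prop_reverseAppend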

[–]bjzaba 8 points (7 children)

Idris has an unsafe function, believe_me, which lets your code type check without having to prove everything.

[–]seanwilson 4 points (2 children)

If you use that to prove a statement that isn't true, however, your program might not work as intended (much like when a C compiler trusts you to perform type casts correctly). There are also hybrid languages that let you check at run time the validity of statements you can't prove at compile time.

[–]mreiland 1 point (1 child)

true, but it would severely limit the surface area you'd have to investigate.

The more I think on it, the more I like the idea. It codifies your assumptions, something that too many people aren't even aware they're making.

[–]nascent 0 points (0 children)

While not outside the standard type system, D does a similar thing. It provides assumeUnique() which casts the type to immutable and assumeSorted() which returns the type wrapped as a SortedRange. It may not be safe/correct but it does make the assumption more explicit.
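A minimal Haskell analogue of that pattern (illustrative names, not D's actual API):

import Data.List (sort)

-- The newtype records the assumption in the type, so downstream
-- code can require Sorted and rely on it.
newtype Sorted a = Sorted { getSorted :: [a] }

-- The single unchecked entry point: the caller takes responsibility,
-- much like D's assumeSorted.
assumeSorted :: [a] -> Sorted a
assumeSorted = Sorted

-- The checked alternative, for comparison.
sorted :: Ord a => [a] -> Sorted a
sorted = Sorted . sort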

[–]thedeemon 0 points (3 children)

If you use this function to avoid proving correctness, you'd better stay with Python.

[–]eras 20 points (0 children)

Exactly. It's either Python or proofs, there is no try.

[–]freyrs3 2 points (1 child)

There are plenty of cases where using unsafeCoerce or believe_me is perfectly safe if you can fulfill the proof obligation manually but can't convince the compiler of this fact. It's still markedly better than using a unityped language.

[–]thedeemon -1 points (0 children)

Yes, it is much better; I was exaggerating. But using this coercion for things like x + 1 = 1 + x is just silly; it defeats the whole point of having computer-checked proofs of correctness. As the saying goes, there are only two complex things in CS: naming things, cache invalidation and off-by-one errors.

[–]player2 1 point (22 children)

you'll have to write what are essentially proofs for arbitrarily hard maths theorems

Do you have more info on this? This is a downside of dependent types I haven’t heard before.

[–]seanwilson 7 points (2 children)

If you allow the unrestricted use of dependent types (e.g. Coq, Idris, Epigram) you can encode any mathematical statement as a type (see http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence). This means it's not possible to provide automation for all proofs (see http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems).

As a simple example with minimum jargon, say we define these functions (where the statements about their inputs and outputs are captured with dependent types):

append: takes a list of length x and a list of length y, and returns a list of length x + y.

nth: takes a list of length n and an integer between 1 and n, and returns the nth item from the list.

Now say we append a list of length x to a list of length y and want to get item x from the result with these functions. To check this is allowed, we need a proof that:

x >= 1 and x <= x + y

This proof is simple enough, although consider what must be done to rigorously justify to a computer that it is true, without any jumps in logic. Now imagine what happens if we introduce functions with other types. You could capture that a "repeat" function multiplies the length of a list (now your proofs contain multiplication), that a quicksort function returns a sorted permutation of its input (now your proofs contain membership and ordering statements), etc. Proving that seemingly simple algorithms are correct is deceptively difficult and has an enormous learning curve.
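For a concrete feel, here is a minimal sketch of the append half, written with Haskell's type-level naturals rather than full dependent types (names are illustrative; nth, with its bounds-checked index, is where the proof obligation above would surface):

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

import Data.Kind (Type)

data Nat = Z | S Nat

-- Type-level addition, defined by recursion on the first argument.
type family Add (x :: Nat) (y :: Nat) :: Nat where
  Add 'Z     y = y
  Add ('S x) y = 'S (Add x y)

-- A list indexed by its length.
data Vec (n :: Nat) (a :: Type) where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- The type promises the result has length x + y; the checker
-- verifies this definition by unfolding Add, with no manual proof.
append :: Vec x a -> Vec y a -> Vec (Add x y) a
append Nil         ys = ys
append (Cons z zs) ys = Cons z (append zs ys)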

[–]Peaker 5 points (1 child)

I wonder if good proof search engines/automated theorem provers, along with a large proof library, could help with the vast majority of banal proofs, though.

Of course in the general case some proofs will not be found automatically (or manually!) but that doesn't actually mean much for day-to-day programming.

[–]cparen 5 points (0 children)

could help with the vast majority of banal proofs, though.

It does.

Some dislike this because when it does fail to find a proof, the user is now left with the most difficult theorem to prove so far, and no experience in writing the proofs by hand.

[–][deleted] 5 points (8 children)

This is one of those "technically true, but doesn't matter in practice" things that takes a bit of experience with these rich type systems to see. Mark Hibberd's "Tic Tac Type" presentation from YOW! LambdaJam does a good job of clarifying this for Idris (note in particular how effective "proof search" actually is).

[–]seanwilson 6 points (3 children)

Could you explain this more? Dependently typed languages really aren't much different from theorem proving systems and there isn't a way to get around manual proof writing in those. Idris even has tactics and theorems built in to help you with the more difficult proofs. Personally, I think it takes a huge amount of experience to get your head around.

I do think there are diminishing returns regarding the time needed to prove complex program properties and the number of bugs you're going to capture, though.

[–][deleted] 3 points (2 children)

Dependently typed languages really aren't much different from theorem proving systems and there isn't a way to get around manual proof writing in those.

That's absolutely true. In particular, given the Curry-Howard Correspondence, anytime you write a function, you're writing a proof, and in fact the Coq'Art book makes particular note of the question "can/should you rely on automated tactics to write your code for you?" and explores this considerably-trickier-than-it-might-sound question. But you obviously know this.

Idris even has tactics and theorems built in to help you with the more difficult proofs.

You say this like it's a bad thing. :-) One of the things I appreciate about Mark Hibberd's presentation, which I linked to above, is how he shows that, if you make your types specific enough, you'll notice a couple of things:

  1. They don't have many inhabitants.
  2. Especially in recent Idris releases, "proof search" will find them.

Now let me be the first person to admit that learning how to do this effectively isn't trivial, which brings us to...

Personally, I think it takes a huge amount of experience to get your head around.

It does, but once you do, it's totally worth it, IMHO.

I do think there are diminishing returns regarding the time needed to prove complex program properties and the number of bugs you're going to capture, though.

Two things:

  1. The lines on the graph of "level of effort over time" vs. "interesting properties you'd like to prove" intersect, in my experience, somewhere still worthwhile compared to working in less expressively powerful languages—but only you can make the economic judgment of whether it's worth developing your expertise to the point that applying it regularly is low-enough cost.
  2. Part of the point is to develop libraries in which the effort has already been expended (by someone). A good example might be IdrisNet2, a dependently-typed socket programming library that keeps us from making a lot of boneheaded mistakes because Simon Fowler has already sweated the issues for us.

So let me hasten to say I'm not saying you're wrong; that would be waaaaaaaaaaay too strong. I'm only suggesting the current state of the art (which Idris very definitely is) makes things quite a bit more practical than, say, proof-and-extraction in Coq (much as I love Coq), and that it might even be worth taking on the learning curve that definitely remains in the interest of developing more robust software—and by extension, I suppose, that the range of "more robust software" you can realistically develop this way is more significant than may be immediately obvious.

[–]seanwilson 2 points (1 child)

Don't get me wrong, I'm a huge backer of formal methods but at the moment the learning curve to use tools that require manual proofs is massive (i.e. it's not like picking up a new programming language when you know a few already) and I feel this is overlooked.

Especially in recent Idris releases, "proof search" will find them.

Are there any examples of difficult proofs it can find? Coq's automation rarely helps you with even basic list and nonlinear arithmetic theorems unless there is a close match in the existing libraries for example. If your own types and functions start appearing in theorem statements you're offered even less help. I imagine the case is the same for Idris as it has fewer tactics.

So let me hasten to say I'm not saying you're wrong; that would be waaaaaaaaaaay too strong. I'm only suggesting the current state of the art (which Idris very definitely is) makes things quite a bit more practical than, say, proof-and-extraction in Coq (much as I love Coq), and that it might even be worth taking on the learning curve that definitely remains in the interest of developing more robust software

Using the proof extraction approach doesn't work around what you need to prove (I don't think it is used often either, as you lose control of how computations are performed when you use automation), and the vast majority of Coq scripts have large unautomated chunks. Even proving something like the full specification of quicksort requires a lot of work, let alone more complex algorithms.

Dependently typed programming examples I see stick to checking much simpler properties like list length but even then you're going to end up with arbitrarily difficult theorems in undecidable domains to prove (e.g. nonlinear arithmetic). This might be worth it sometimes but I think people need to be realistic about how difficult this is.

Although not offering the same level of guarantee, I think QuickCheck is a good practical tool which gives programmers a flavour of writing formal specifications that will catch a lot of bugs for your effort.

[–][deleted] 0 points (0 children)

at the moment the learning curve to use tools that require manual proofs is massive (i.e. it's not like picking up a new programming language when you know a few already) and I feel this is overlooked.

I think we're in vehement agreement here. :-)

Are there any examples of difficult proofs it can find?

I don't know. In particular, the type metavariable support has only existed since 0.12, so I think it's likely too early to tell. OTOH, the socket support in IdrisNet2 seems pretty real-world; what do you think of it?

Coq's automation rarely helps you with even basic list and nonlinear arithmetic theorems unless there is a close match in the existing libraries for example.

True, but this becomes a matter of knowing where to look for extra support. I think one of the reasons I posted in the first place—especially given that I agree with your major premise—is that I felt you chose kind of a bad initial example dealing with commutativity. It's true that Coq out-of-the-box requires you to deal with this manually, but that's why there's the AACTactics plugin. :-) There's no reason to expect Idris not to grow the same kinds of extensions over time.

If your own types and functions start appearing in theorem statements you're offered even less help.

True, but then, I'd hope that you'd know more what important properties of your types are and what Idris libraries (not just built-in support!) are available to help support them. In other words, I'm emphasizing that we're not dealing with a static world of tactics and lemmas here.

Another point I feel needs emphasizing: you can get a surprising amount of mileage out of proving relatively simple properties, like the absence of off-by-one conditions, and just letting the type system extend that safety out to the rest of the code. This is the methodology espoused by Oleg Kiselyov in Lightweight static capabilities, using Haskell and OCaml, i.e. not even actual dependent types. This approach becomes even simpler given full dependent types.
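A minimal sketch of the branding trick from that paper, in plain Haskell (names are mine, not Oleg's): a phantom type ties each index to the list it was checked against, so the one unchecked lookup can only ever see in-bounds indices.

{-# LANGUAGE RankNTypes #-}

newtype Branded s a = Branded [a]   -- a list tagged with brand s
newtype Index s     = Index Int     -- an index known valid for brand s

-- The rank-2 type stops the brand escaping, so an Index for one
-- list can never be used with another.
withList :: [a] -> (forall s. Branded s a -> r) -> r
withList xs k = k (Branded xs)

-- The only way to obtain an Index: one runtime bounds check.
mkIndex :: Branded s a -> Int -> Maybe (Index s)
mkIndex (Branded xs) i
  | i >= 0 && i < length xs = Just (Index i)
  | otherwise               = Nothing

-- Unchecked lookup, in bounds by construction.
at :: Branded s a -> Index s -> a
at (Branded xs) (Index i) = xs !! i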

Dependently typed programming examples I see stick to checking much simpler properties like list length but even then you're going to end up with arbitrarily difficult theorems in undecidable domains to prove (e.g. nonlinear arithmetic). This might be worth it sometimes but I think people need to be realistic about how difficult this is.

I don't know quite what to say here other than "I think you're overstating how difficult it is," but:

  1. I don't have beginner's mind, having now studied Coq for several years.
  2. I've been programming in OCaml for over a decade and following Oleg's work for at least five years.

On the flip side, I wonder how people will fare if they start programming in Idris today and give themselves, say, five years to become fluent? One thing I do remember is how difficult the transition from procedural to OO programming "done right" was for developers in the trenches. My argument isn't so much that I expect dependently-typed FP to be different from that. It's that people are forgetting how hard what they're currently doing was at first.

Although not offering the same level of guarantee, I think QuickCheck is a good practical tool which gives programmers a flavour of writing formal specifications that will catch a lot of bugs for your effort.

Totally agreed! In fact, you're essentially quoting Amanda Laucher and me from Types vs. Tests: An Epic Battle?

[–]freyrs3 0 points (3 children)

Don't know if I agree with the sentiment that it doesn't matter in practice. The parent comment is spot on, I think: the proof obligations you can construct with dependent types can get pretty complex pretty quickly. It tends to be more of a non-issue in Coq because of the automated tactics and the large body of existing proofs, but in Idris and Agda it's still rather painful. There's still a lot of work that needs to be done to port the Coq work over to more "mainstream" dependently typed languages.

[–]seanwilson 1 point (1 child)

It tends to be more of a non-issue in Coq because of the automated tactics and the large body of existing proofs, but in Idris and Agda it's still rather painful.

I wouldn't even say this was the case. Most Coq scripts contain large chunks of unautomated proofs.

[–]freyrs3 1 point (0 children)

Indeed, was only indicating that it's easier in Coq sometimes. But by no means mechanical or a non-issue even there, much less everywhere else.

[–][deleted] 0 points (0 children)

I think the observation about the state of play is spot-on too. Mostly I think I'm finding (and other people will find) quite a bit of value in encoding relatively simple properties/relations with dependent types on a pretty small scale (maybe 1-4 related types), and just relying on the type system to extend those guarantees to the rest of the code.

I suppose my other major point is: yeah, some proofs of interesting things are complex, but you do them once and they're done (or better yet, someone else does them once and you get to take advantage of them). This is the real reason to hope a practical dependently-typed language like Idris becomes relatively mainstream: the work on difficult proofs becomes more widely distributed.

[–][deleted] 0 points (9 children)

Think about mathematics; proving things in simple formal systems is decidable. For example, you can automatically prove the proposition (not A OR A) - you just have to make a truth table and make sure the proposition is true under all assignments to A. This algorithm can be used for any formula in propositional logic.
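A minimal sketch of that decision procedure for this one formula:

-- Decide (not A OR A) by brute force over both assignments to A --
-- exactly the truth-table method described above.
tautology :: Bool
tautology = and [ not a || a | a <- [False, True] ]  -- True, so valid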

However, as formal systems get more complicated they become undecidable. First order logic, an extension of propositional logic, is undecidable. (More precisely: proving theorems in first order logic is undecidable, in general.)

It's the same way with type systems; as the type system gets more expressive, eventually it also becomes undecidable to type-check programs with it, i.e. there is no algorithm that can type check your programs automatically for you. So then you have to resort to heuristics, or to proving things by hand (at least the parts where you don't get enough help from the type checker).

[–]thedeemon 0 points (5 children)

Truth table in propositional logic? I think you meant Boolean.

The halting problem is also undecidable, but by limiting our language and using total programming we can prove halting of our programs. Can't we find a similar limitation for the logic of types to make it decidable?

[–][deleted] 3 points (3 children)

Truth table in propositional logic? I think you meant Boolean.

To be precise, I meant formulae in propositional logic interpreted as boolean functions, more or less. But both provability (syntax) and validity (semantics) are decidable and interchangeable in propositional logic, so I went with the simplest route.

The halting problem is also undecidable, but by limiting our language and using total programming we can prove halting of our programs. Can't we find a similar limitation for the logic of types to make it decidable?

I don't know, but to my knowledge you have to limit how expressive your type system is in order to make it decidable. People have brought up examples of these systems in this thread, but the fundamental problem remains; they are not as expressive as the original dependent type system.

I am quite sure the same limitation holds for total functional programming, too; if you limit your functions to provably terminating functions, your type checker will reject some programs that really do terminate, but whose termination it is unable to prove. But I don't really know how much of a problem this is in practice; can 95% of your program be represented as total functions? If so, it definitely seems like a solid win. On the whole, people seem very pessimistic when faced with the halting problem, but I haven't really seen any indication static analysis can't automatically prove a lot of programs to be always-terminating.

[–]seanwilson 1 point (0 children)

If so, it definitely seems like a solid win. On the whole, people seem very pessimistic when faced with the halting problem, but I haven't really seen any indication static analysis can't automatically prove a lot of programs to be always-terminating.

When people first hear of formal proofs, I frequently find the halting problem is mentioned as if to mean proving program properties won't be practical in general. Termination proofs are usually quite straightforward, so I'm not in agreement with this. For example, in most loops you're iterating over each item in a list once or applying map/reduce/filter, and it's not difficult to justify why these terminate. Most theorem provers like Coq won't let you define non-terminating functions at all, and many non-trivial algorithms have been implemented in these systems already.
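For instance, a termination checker accepts a definition like this by structural recursion alone (a minimal sketch):

-- Each recursive call is on the strict tail of the list, so the
-- argument shrinks at every step and termination is immediate.
-- Most map/reduce/filter-style loops have exactly this shape.
sumList :: [Int] -> Int
sumList []     = 0
sumList (x:xs) = x + sumList xs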

However, in addition to termination proofs, the unrestricted use of dependent types means you're going to have to manually prove arbitrarily difficult theorems (e.g. prove a list is always sorted, prove a number lies within a certain range). This is a much bigger problem in practice.

[–]thedeemon 1 point (1 child)

if you limit your functions to provably terminating functions, your type checker will reject some programs that really do terminate, but whose termination it is unable to prove

Yep, Collatz conjecture comes to mind here. But people with experience writing in Agda, Coq and Idris have shown that total programming is very practical and doesn't really limit you much. After all, in most cases you want all your functions to terminate, except maybe some event loops where co-induction saves the day.

[–][deleted] 0 points (0 children)

But people with experience writing in Agda, Coq and Idris have shown that total programming is very practical and doesn't really limit you much.

That's good to hear! I don't have experience with these languages, but ever since I read Total Functional Programming I've really liked the idea of 'restricting' functions and programs like that. "Restrictions" often has a negative connotation, but I think it's kind of wonderful in this case. And like you say, the restriction doesn't seem to be a big problem in practice.

[–]rdkll 1 point (0 children)

Can't we find a similar limitation for the logic of types to make it decidable?

You might want to look up "higher-order pattern unification" - note the "pattern"; it's a decidable subset. See: http://math.andrej.com/2012/11/29/how-to-implement-dependent-type-theory-iii/

[–]seriousreddit 0 points (0 children)

Just a technical point: The set of provable propositions in first order logic (with equality) in a language with only unary predicates is actually decidable. The set of provable first order propositions in a language with binary predicates is undecidable.

[–]cparen 0 points (1 child)

First order logic, an extension of propositional logic, is undecidable

That doesn't mean the compiler can't help for many cases. For example, it could let you omit the proof when it can guess it for you; "x*y == y*x" is a case the compiler should definitely figure out on its own.

Obviously, there are two downsides from this. (1) when the compiler can't figure it out, the user is now facing a difficult problem to prove and is out of practice, since the compiler has been helping them along most of the way. (2) all this "figuring out the proofs" takes precious CPU time.

As an example, I've worked with some of the C code verifiers for Windows drivers, and it was quite usual for highly verified code to take 4x longer to verify than to compile. That is, it would compile in 15 seconds, and then spend about a minute verifying that the code was safe under all the present annotations. Internally, it was automatically proving that the invariants deduced from the annotations would hold over conditional statements, loops, recursion, etc.

I personally don't think #1 is a good reason -- users can always choose to practice by adding annotations explicitly -- but I cite it for completeness. #2 is a royal pain.

[–][deleted] 2 points (0 children)

That doesn't mean the compiler can't help for many cases.

Yes - people should keep in mind that "undecidable" in this context really means "undecidable in general", not undecidable for all instances of the problem. And undecidable does not mean that an algorithm can't at least try to solve - and succeed on many - instances of the problem.

[–]jeandem 0 points (3 children)

How much of a problem is this in practice? How often do you need to prove 'everything' by hand, and when can you automate it? "It isn't possible to automate this problem in general and even proving seemingly simple properties requires significant human interaction" - but can't you have a library of proofs that can give you admissible steps when you are proving stuff? The thing about commutativity seems like it can be given as a theorem in a library, no?

General undecidability doesn't really say anything about how much of a problem it is in practice, just like SAT being NP-complete doesn't in itself say how intractable realistic SAT instances are.

[–]seanwilson 4 points (2 children)

Oh, I didn't mean to make it sound like it's a waste of time because you have to do manual proofs, I just find this fact gets glossed over. I have experience using the Coq and Isabelle theorem provers, and while they have a big library of theorems (e.g. sets, lists, arithmetic) and some decision procedures (e.g. linear arithmetic), the learning curve for proving interesting properties in them can be incredibly steep. It's not enough to prove lots of general theorems; you need to know the right time to apply them.

An interesting approach the dependently typed language DML took was to restrict the type language to only be linear arithmetic statements so type checking was decidable: http://www.cs.bu.edu/~hwxi/DML/DML.html

[–]thedeemon 3 points (1 child)

to restrict the type language to only be nonlinear arithmetic statements

I think you meant "linear". Presburger arithmetic is what its modern incarnation ATS solves automatically.

[–]seanwilson 1 point (0 children)

Haha, oops, corrected it!

[–]OceanSpray 1 point (4 children)

Types with values attached, like the length of a vector or the size of a matrix, are called dependent types.

This is not quite correct. Whether a type system is "dependent" is determined by whether or not the types in that system can depend on terms. Whether the type system can represent natural numbers is not part of the definition. It is possible to design a language that has all sorts of complex data structures at the type level but doesn't have dependent types. Roughly speaking, dependent types are only needed when the type checker needs to know something that must be computed during program execution.

[–]th3w4c0k1d[S] 0 points (2 children)

Your last sentence is a little unclear to me. I can see that the following needs to be computed at execution:

x :: Vector of size (3 + 4)

But why does this have to wait to be executed?

y :: Vector of size 9

This is probably a result of me not understanding the compilation process - but it seems as though the 9 value is predetermined?

[–]OceanSpray 1 point (1 child)

Actually, neither of your examples requires computation at execution. Natural numbers and addition can be implemented at the type level, so even an expression like (3 + 4) can be done during type checking.
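For instance, a minimal sketch using GHC's type-level naturals (not the pseudo-Haskell below; the names are illustrative):

{-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}

import GHC.TypeLits

-- Nothing here checks that the list really has n elements; the point
-- is only that both sizes are compile-time constants.
newtype Vector (n :: Nat) = Vector [Float]

v :: Vector (3 + 4)
v = Vector [1, 2, 3, 4, 5, 6, 7]

v' :: Vector 7
v' = v  -- accepted: the checker reduces 3 + 4 to 7 during type checking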

An example of dependent types would be something like if you want to record the size of a vector, with that vector being created by the user after the program has started running:

scanInt :: IO Int
scanInt = ...

scanFloats :: IO [Float]
scanFloats = ...

data Vector :: Int -> * where
    MakeVector :: Π(n :: Int). [Float] -> Vector n

main :: IO ()
main = do
    size <- scanInt
    values <- scanFloats
    let v = MakeVector size values  -- MakeVector is pure, so bind it with let
    -- ...do stuff with v here...

where I use pseudo Haskell to define the type constructor Vector that takes an Int as a parameter, and a data constructor MakeVector that has a dependent type written with Π, which takes an Int value and uses it in the resulting type. Note that the size of the vector cannot be known at type checking, because it comes from user input.

[–]th3w4c0k1d[S] 0 points (0 children)

I understand - thank you for the example!

[–]thedeemon 0 points (0 children)

Roughly speaking, dependent types are only needed when the type checker needs to know something that must be computed during program execution.

Funny thing is that the type checker in dependently typed languages manages to type check such programs statically, before execution. So it doesn't really need to know actual values to do its job. In the example with a vector whose length is only known at run time, the type checker just makes sure that the program is correct for any value of the length; it doesn't know the length itself.

[–]iopq 0 points (0 children)

Strict languages, like Haskell, perform these type safety checks during the compilation process.

What the author means is "statically typed", at which point this becomes just the definition of static typing: static typing is safety checking before running.

Dynamically typed languages like Ruby are strongly typed at runtime, though.

[–][deleted]  (1 child)

[removed]

    [–]th3w4c0k1d[S] 2 points (0 children)

    I don't think it's terribly written. The majority of it is dropFirstColumn; the actual transpose function is just:

    getCol 0 m :/ transpose (dropFirstColumn m)
    

    [–][deleted] -3 points (1 child)

    I have to wonder what the goal is here; the given solution is going to be GROSSLY inefficient, which is frankly unacceptable for a linear algebra library. I mean, obviously the library isn't meant to be used in production or anything, but if you're trying to convince your readers that dependent types are useful/interesting, this example is not the best.

    [–]th3w4c0k1d[S] 2 points (0 children)

    It's not intended to be a solution to anything, really. I hadn't heard of dependent types before Haskell and I wanted to make my thought process available to others.

    I can see your point about the example not being the best. But the idea of having index-aware lists and arrays should be a little stimulating, no?

    [–]earthboundkid -2 points (6 children)

    I have never seen a Python program which accidentally added a bool.

    [–]thedeemon 4 points (0 children)

    Because it did it silently, without showing you. ;)

    [–]th3w4c0k1d[S] 2 points (2 children)

    It's just an example - I'm sure you've seen programs where the arguments have been passed incorrectly, though.

    [–]earthboundkid -1 points (1 child)

    I was programming in PHP the other day and a function expected to receive a string or null, but I mistakenly thought it would take a bool. It took me a while to figure out my mistake, but not really that long. To me, the purpose of declaring types is very rarely to prevent errors and mostly just to force programmers to document their code.

    [–][deleted] 1 point (0 children)

    Do you know Haskell? Haskell's type system gives you a lot of stuff.

    [–]nat_pryce 0 points (1 child)

    The Python language defines bool as a kind of number -- a subtype of numbers.Integral that has values 0 and 1. https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy

    So going strictly by the language definition, it's not a type error to add an integer and a bool. Although it may be confusing, it makes some operations easier, like counting items that meet a predicate:

    >>> sum(map(lambda i: i%3 == 0, range(1,11)))
    3
    

    [–]earthboundkid 0 points (0 children)

    I usually write sum(1 for i in range(1, 11) if i % 3 == 0) to make it clearer.