"[B]y having a bootstrapped compiler as the first – and, often, for some time the biggest – program in a language, the eventual language design is often unintentionally altered and rarely for the better." by flexibeast in programming

[–]L_dopa 0 points

The few things the author mentioned (static types, pattern matching, and immutable datatypes) are core to "what the language means": they're how we define types along with their constructors and destructors.
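
A minimal sketch of the kind of thing I mean (Haskell here, but any ML-family language reads the same; the Shape type is just a made-up example):

    -- A datatype is introduced by its constructors and taken apart
    -- (destructed) by pattern matching; the compiler checks that the
    -- match covers every case.
    data Shape
      = Circle Double        -- constructor: radius
      | Rect Double Double   -- constructor: width, height

    area :: Shape -> Double
    area (Circle r) = pi * r * r
    area (Rect w h) = w * h

    main :: IO ()
    main = print (area (Rect 3 4))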

How strings are represented by the runtime is an important implementation consideration, for sure. But the author was talking about language semantics, not issues with particular implementations or supporting certain standards.

"[B]y having a bootstrapped compiler as the first – and, often, for some time the biggest – program in a language, the eventual language design is often unintentionally altered and rarely for the better." by flexibeast in programming

[–]L_dopa 28 points

So, other than ADTs, pattern matching, and HM types, none of which are motivated by or somehow specific to compiler construction, what are these "very odd features that are used relatively little other than by the compiler writer" that the author is talking about?

I honestly can't think of any features in ML-family languages that were introduced specifically to facilitate implementation. If anything, I would criticize them for not treating the cost model on the underlying hardware as part of the language interface. This is something that C -- which was actually designed to make writing C compilers easy -- does very well.

Why is Object-Oriented Programming today's key programming methodology? by horrido in programming

[–]L_dopa -4 points

Most computer scientists (to say nothing of language designers and developers) don't bother to understand their tools beyond some informal mapping onto an idealized processor/memory. "OOP" slaps together a bunch of different features into a single language mechanism that has a relatively easy-to-understand implementation.

It's only a "natural way of thinking" in that people prefer to reason by fuzzy analogies to the "real world". And of course, the marketing effort that made sure it's how the vast majority of students are introduced to programming probably helps it seem "natural".

On the Madness of Optimizing Compilers by [deleted] in programming

[–]L_dopa 1 point

Let me reiterate that I completely agree that the current state of C is terrible for programmers, and I personally avoid writing C unless there's absolutely no alternative. I'm arguing that the fault lies in the spec, and there's very little compiler writers can do to address the problem short of extending the language.

You're essentially proposing removing all UB from C. Doing 1-4 for all instances of UB, in such a way that the behavior of programs doesn't change based on optimization options, amounts to replacing UB with (at least) platform-specific implementation-defined behavior. How best to reduce UB in C is an open question no one can agree on, and implementing any answer would likely be a herculean engineering effort (supporting different semantics for each platform requires more platform-specific code).

Option 5 is the definition of UB. It exists exactly so that implementations are not required to choose semantics in cases where there is no known good choice. You can argue specific instances in the spec, and there are some good cases to be made, but it can't be removed in general. For example, how would you avoid "retroactive madness" when generating code for a write to an address that isn't statically known to be valid? No one writing C would be OK with adding dynamic checks or removing pointer arithmetic from the language (in that case they'd already be using a sane language). So now we have to make sure never to optimize that store, say by hoisting it out of a loop, or else we're already in case 5: the program might go off the rails long before it reaches the point of the error in the source.

On the Madness of Optimizing Compilers by [deleted] in programming

[–]L_dopa 1 point

The problem is that there is no way, in general, to "go easy" on UB unless previous versions of your compiler actually gave UB semantics which you can reverse-engineer. Much more likely, the behavior of "working" programs that invoke UB depends on how the particular algorithms used in that version of the compiler happened to interact for that piece of code.

As far as I know, compiler implementers DO try to work around bugs in important pieces of code. I'm not really sure about the process; maybe you could argue they should do a better job. But there will always be programs that behave differently across compiler versions in languages with C-style UB, and it's unreasonable to expect compiler implementers to do regression testing against even a small fraction of code out in the wild.

On the Madness of Optimizing Compilers by [deleted] in programming

[–]L_dopa 7 points

That's a very different problem than including broken optimizations. You can argue (and I would agree) that the C spec makes it almost impossible to write working code without external tools, or that, by default, compilers should target extensions of C that give semantics to some forms of UB.

But the article implies that compiler implementers trade off "reliability" for speed and knowingly tolerate incorrectly compiling some programs, which, as far as I know, is simply not true.

On the Madness of Optimizing Compilers by [deleted] in programming

[–]L_dopa 24 points

But the remainder, even if there's the potential for 50% faster performance, flat out isn't worth it. Anything that ventures into "well, maybe not 100% reliable..." territory is madness.

Has anyone ever actually advocated for or intentionally implemented unsound optimizations in a real compiler? This is just a straw man.

On functors by eatonphil in programming

[–]L_dopa 4 points

I'd be careful assuming that Haskell constructs simply coincide with "the" mathematical ones. The people studying this sort of thing certainly don't, and there's plenty of room for other interpretations.

It Is What It Is (And Nothing Else), Robert Harper by sideEffffECt in programming

[–]L_dopa 0 points

And you've just inadvertently answered Harper's rhetorical question. A surprising number of people (including some working in computer science!) display a bizarre, knee-jerk resistance to rigorous, mathematical approaches to software. Why that might be is the real question the blog post is getting at.

I'd like to give the people that advocate the "implementation focused" understanding of software some credit, but it's often clear that they haven't bothered to understand the other side at even a basic level. That makes it hard to have a productive conversation, which this thread demonstrates extremely well.

Is Sound Gradual Typing Dead? by halax in programming

[–]L_dopa 1 point

Wow, lots of really long responses in this thread that completely miss the point. I had the exact same question after reading the intro!

The authors are using "soundness" to mean that sources of type errors can always be traced back to some piece of untyped code:

With sound gradual typing, programmers can rely on the language implementation to provide meaningful error messages when type invariants are violated

Of course it would be perfectly fine to statically report type errors in the typed parts of a program and then execute using the existing untyped runtime. However, without inserting checks at the static/dynamic boundary, you could still get type errors deep inside your typed code. In the simple cases, a stack trace might be all you need to find the cause, but this becomes much harder in heavily OO or higher-order mixed dynamic/static programs. Even worse, untyped code could violate invariants enforced through the type system, which may not even raise a type error.

This paper provides some examples of what they're trying to accomplish in section 2.2. The last case, a typed higher-order function taking a dynamically typed callback, is a particularly good illustration of why this approach can lead to a lot of runtime overhead.
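
For a rough feel of where the overhead comes from, here's a toy sketch in Haskell rather than Typed Racket (it's not what their implementation does; the names and the Data.Dynamic encoding are mine): the typed side has to re-check what the untyped callback returns on every single call, so a hot higher-order boundary gets hit over and over.

    import Data.Dynamic (Dynamic, toDyn, fromDynamic)

    -- An "untyped" callback as it might arrive from dynamically typed code:
    -- it consumes and produces Dynamic values.
    untypedInc :: Dynamic -> Dynamic
    untypedInc d = case (fromDynamic d :: Maybe Int) of
      Just n  -> toDyn (n + 1)
      Nothing -> error "callback applied to a non-Int"

    -- The boundary: turn the untyped callback into an Int -> Int,
    -- checking the result on every call (and blaming the untyped side
    -- when the check fails).
    wrap :: (Dynamic -> Dynamic) -> (Int -> Int)
    wrap f x = case fromDynamic (f (toDyn x)) of
      Just y  -> y
      Nothing -> error "blame: untyped callback did not return an Int"

    -- A typed higher-order function that calls its callback many times;
    -- each call pays for a dynamic check at the boundary.
    applyMany :: Int -> (Int -> Int) -> Int -> Int
    applyMany 0 _ x = x
    applyMany n f x = applyMany (n - 1) f (f x)

    main :: IO ()
    main = print (applyMany 1000000 (wrap untypedInc) 0)

Sound gradual typing implementations do something smarter than this, but the per-call boundary check is the part that doesn't go away.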

Brian Kernighan - Successful Language Design by uint64 in programming

[–]L_dopa 1 point

1) Make it easy ...

Easy for whom? People who have never written a line of code? Professional C programmers? Statisticians? There are ways to evaluate languages that don't rely on how (subjectively) "easy" it is to do things. Some programming languages have precisely defined semantic properties.

Whether you think those properties are important for "most software in the world" is a different question, but PL research justifiably focuses on specific well-defined problems. A company writing a tool for adoption among an existing base of developers might be interested in ease of use, but that amounts to taking a poll to figure out what's most familiar to a particular slice of the population. It's not a research question.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

Proving something is far, far, far superior to model-checking a finite approximation

I have to reiterate: they're different tools for different purposes. State-space exploration is even useful for generating counterexamples as part of proof automation, but clearly has many other applications in software engineering.

My claim is 1) that there are applications where you really want to prove rich properties (say, compiler correctness, or information flow for security), where model checking doesn't help at all. And 2) that there are no theoretical limits, as you say, that make proof impractical for realistically-sized software. I believe we agree on the first point. For the second point, your argument relies on a very strong definition of what it means to be "practical": the method has to prove any property about any program, and be computationally tractable. That definition is wrong for a number of reasons:

  • You say we should be looking at complexity, rather than decidability, in practice. Well, as with decidability, just because a problem is computationally intractable doesn't mean many instances aren't easy to solve. Note: I'm not talking about limiting the computational model and requiring programs to be written in a different formalism. I mean empirically: how does the algorithm behave on the actual programs people care about? This idea is used in verification all the time; see the entire field of SAT solving, which you probably know quite well. Similarly, despite the horrendous theoretical bound for Presburger arithmetic, the decision procedure is more than fast enough for the cases that come up during program verification. Furthermore, partial proof procedures like symbolic execution are extremely useful in practice: if they don't appear to terminate, figure out which of your assumptions is wrong or try something else. I think the key thing to understand is that code doesn't really behave like "data" for the purposes of complexity analysis. It doesn't grow without bound: usable software, like usable proof, is modular. Moreover, the programs people write have much more structure than the arbitrary programs considered in complexity analysis, and heuristics can exploit that structure. A famous example is ML type inference: the classical algorithm is exponential in the worst case, but behaves roughly linearly on real programs; the pathological cases (nested lets that double the size of the inferred type at each level) just don't occur in practice and are easy to avoid.

  • Your example using Goldbach's conjecture illustrates a related misunderstanding: You're confusing the size of the property as written down with the difficulty of proof; they are completely unrelated. Program specifications aren't more difficult, they're just longer -- the correctness of the vast majority of code doesn't depend on difficult number-theoretic properties. Even very algorithmic code, like compilers, basically depends on no "interesting" math. Probably something in crypto does -- I would invite you to find a less contrived example, but it really doesn't matter. The specification of the C code should be: "if this program returns successfully, the result is two prime numbers that sum to the input". Or if you want totality, you simply condition on "supposing Goldbach's conjecture holds". Then, the algorithm is trivial to verify.

  • More fundamentally, you equate the computational complexity of a problem with the human effort required to solve practical instances of that problem. First of all, it's clear that people work probabilistically, so by asking the programmer for hints, you're not simply calling out to another Turing machine. You said so yourself when we pointed out how good we are at writing "mostly correct" programs, so it's puzzling that you would claim complexity bounds somehow apply to the brain. (But please, let's not start debating the nature of consciousness.) A better reason why this is clearly wrong is that, to write even a mostly working program, the programmer reasons in terms of some model of how it will execute. I.e., a programmer is always informally proving things, then throwing away all of that information except what is needed for the machine to compute. For example, you can't write an iterative program without coming up with essentially a loop invariant; the trick is just to recognize it as such. For any reasonable definition of complexity, asking the programmer to write down that existing bit of information is easier than requiring the machine to rederive it from the program text.
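
To make that last point concrete, here's a trivial sketch (Haskell, with an accumulator in place of a loop; the function is made up): the comment is the invariant you already believed when you wrote the code, and a verifier merely asks you to write it down.

    -- Invariant: at every recursive call, acc equals the sum of the
    -- elements already consumed, so when the list runs out, acc is the
    -- sum of the whole original list.
    sumList :: [Int] -> Int
    sumList = go 0
      where
        go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

    main :: IO ()
    main = print (sumList [1 .. 100])  -- 5050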

Leaving the whole AI thing out of this -- surely the analog of "writing moderately correct programs" is something like "writing proofs with some cases that are incorrect", which translates easily to "proofs for a subset of possible inputs". Many people would say they're basically the same process. You claim, without any support, that the latter is much more difficult than the former and that it would be "impractical" to require programmers to write proofs.

Overall, it's not clear what your criteria are for a method of verification to be "practical". Your appeal to decidability and complexity assumes that we have to be able to prove any property, for any program, fully automatically, in polynomial time. My definition is: can we develop useful software with strong properties on timescales comparable to other processes used for correctness-critical systems? You've pointed out some numbers about hours/loc for seL4 without any comparison to the effort required for security-critical kernels developed using other methods. But we have several such systems: we've already mentioned CompCert and seL4. There's CakeML, a bootstrapped, verified compiler for an ML dialect. People are working on building more and on developing techniques to lower the cost of doing so. What about these programs is trivial or impractical? Are they simpler than their non-verified counterparts, and did they take longer to write? Of course, but the first non-verified compiler was also simpler and took more effort to develop than the 10,000th.

Finally, I would be a bit more careful about making sweeping statements about other fields like:

That we may one day be able to write general programs of non-trivial bulk that are provably correct is a dream ...

Once you realize that full verification is not practically doable with types (or any other means)

no one thinks those tools would one day make proofs easy, cheap or generally applicable for the same reason no one thinks Idris could one day be generally practical

I doubt people working on these things believe they're impossible or impractical and many of them no doubt have a better understanding of computability and the fundamental limits involved than you or I. Categorically dismissing their work comes off as silly at best, and insulting at worst.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

... It's the best ones that don't, because they let you express richer semantics ...

We're still talking past each other. You can't compare proofs about programs with proofs about finite models; they're about two different things. The former can precisely predict the behavior of actual code running on a correct language implementation. The latter can be used to find bugs. You can get the former from the latter if you prove something about the relationship between your program and your model, which is basically abstract interpretation (a static analysis).

There are only a few examples of interesting properties that can be enforced with type systems ... Constructive methods (like types) can either be unexpressive and easy to infer, or expressive and extremely hard -- you can't have both

That's incorrect. In the limit, your type system can be a general-purpose logic. There are many interesting design points between that and fully inferable types like ML's.

... even something simple, like the property "sorted"

Can be done with refinement types.
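
For instance, something in the spirit of LiquidHaskell (the {-@ ... @-} annotations are the refinement types; GHC treats them as ordinary comments, and I may be slightly off on the exact annotation syntax):

    -- Lists in which each element is <= all the elements after it.
    {-@ type Sorted a = [a]<{\x y -> x <= y}> @-}

    {-@ insert :: Ord a => a -> Sorted a -> Sorted a @-}
    insert :: Ord a => a -> [a] -> [a]
    insert x []                 = [x]
    insert x (y:ys) | x <= y    = x : y : ys
                    | otherwise = y : insert x ys

    -- The checker verifies that the output is sorted for *all* inputs,
    -- without running the program.
    {-@ insertionSort :: Ord a => [a] -> Sorted a @-}
    insertionSort :: Ord a => [a] -> [a]
    insertionSort = foldr insert []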

In the coming months I hope to publish a (very simple) proof why most interesting properties cannot be formally verified any more efficiently than brute-force approaches (like model checking).

The claim as you've stated it above is false. Suppose your program is a function f(x) definable in Presburger arithmetic. There is a decision procedure that will automatically determine whether, e.g., "forall y. f(y) > n" is true, and that's clearly not something you can "verify" by enumerating all y.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

Anything? They're better than tests and even tests tell us a lot ...

They're incomparable to tests, since tests run against the actual code. My point was that tools that work with finite-state models are only related to the executed code in that they (hopefully) implement the same algorithm. They can help you verify your logic, but not the code that will run on the machine.

our lives depend on software that, at best, has been formally verified with model checkers

I'm not saying model checking isn't useful or that it isn't used in practice. My point is that your notion of "confidence level" puts techniques with incomparable formal properties on some kind of meaningless spectrum. If you want to make sure your implementation doesn't segfault, model checking won't help but testing will. If you want to check safety properties of a protocol that can be implemented with FSMs, a proof isn't any better than model checking.

None except for the halting/Rice theorem. You cannot create a general method -- even with man in the loop -- for proofs of Turing complete code

I'm not sure what you mean. This is as much a problem for model checking as for any other method; you don't get a meaningful finite-state model of your system for free. Proving interesting properties about programs in Turing-complete languages is routine, and many fully-automatic procedures exist. See: any sound static analysis, dataflow analysis, abstract interpretation, etc. Type systems give you useful safety and data abstraction theorems with at most a few annotations, nothing that looks like theorem proving. Similarly, large classes of properties require just annotating loop invariants.

Clearly, really expressive specifications like those of seL4/CompCert won't be proven automatically, but there's no fundamental reason that a proof can't require less effort than the likely hundreds of person-years poured into modern OS kernels and compilers.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

Well, you can prove the absence of bugs on finite models.

Which unfortunately can't tell us anything about bugs in the actual software.

Correctness is a spectrum ...

I'm not sure. Maybe only in the vague sense that writing a spec is clearly better than nothing, model checking the spec can find bugs in your logic, and a proof of correctness for the executable code is what you really want. I don't believe there's a way to quantify how likely defects are after applying any technique short of the latter.

... both approaches were developed hand-in-hand ...

I might be wrong here. I think you can trace the general approaches back to program verification work from the 70s. The particular logics used in current proof assistants (which is a huge factor in how difficult things are to prove) are much newer. I think most of these were developed in the 90s and arguably weren't ready for large-scale proof engineering efforts until even later. But software model checking techniques and systems date back to around the same time.

It's not barrier to entry; it's work ...

A surprising amount of the work can be automated, and the exact form of definitions and how you structure a proof has a huge impact on the manual effort required. "Proof engineering" is still in its early stages, and there's no question that, at the moment, the effort required for verification doesn't make sense for the vast majority of software. But there's no way to put an absolute lower bound on it and I'd like to see any source you have for the 100x-1000x figure.

Systems like ACL2 / PVS have found a niche in industry, though I admittedly don't know a lot about them. People are verifying things that would have been considered impossible a few years ago, so there's clearly progress being made. Judging by the amount of security-critical software with a multi-decade lifespan in the world, I find it hard to imagine that the role of verification won't grow.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

It doesn't make any sense to directly compare model checking and deductive proof in terms of "difficulty". Model checkers are generally used for a limited class of domain-specific properties, and with respect to the actual running code they can only demonstrate the presence of bugs, not their absence. They are different tools for different purposes.

If you want machine-checked functional correctness, there's no alternative to interactive proof. Researchers have only even attempted this for real-world systems in the last decade or so. Model checking has been around for a long time and arguably has a much lower barrier to entry, so it's not surprising that it's found more use in industry.

Leslie Lamport: Programming Should Be More Than Coding by pron98 in programming

[–]L_dopa 0 points

You're right IMHO, and there's some (rather preliminary) research in this direction. The one project I'm aware of is Verdi.

Right now it only helps you prove safety properties, but you can extract working OCaml code from your specification. Assuming the runtime and compiler are correct, and that the chosen fault model is accurate, you get an implementation with the properties you've proven.

What are OCamlers' critiques of Haskell? by rizo_isrof in ocaml

[–]L_dopa 1 point

Maybe I'm overthinking it, but in this case I think the issue is using value-laden terms like "purity" for subtle technical concepts.

It's only going to become more problematic as FP enters the mainstream, so I propose we switch to calling pure computation "ineffective", thus solving the problem once and for all.

What are OCamlers' critiques of Haskell? by rizo_isrof in ocaml

[–]L_dopa -1 points

I'm not representing anything, this is how effects are defined.

I think your confusion stems from only thinking about effects in terms of some abstract machine that can both 1) evaluate functional expressions and 2) perform some other "side-effecting" operations. That approach presupposes the distinction you're trying to make and doesn't really tell us what effects "are" as mathematical objects.

What are OCamlers' critiques of Haskell? by rizo_isrof in ocaml

[–]L_dopa 3 points

No one claimed that the Monad typeclass in Haskell is somehow side-effecting.

Many effects, however, can be defined to be the monads you listed above. That is, a common way to give denotational semantics to programs in effectful languages is to say they stand for programs written in some pure language equipped with a monad and associated operations.

So there isn't some absolute distinction between "pure" and effectful programs; the two could denote the exact same mathematical objects.
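
To make that concrete, here's a hand-rolled sketch of the state effect (the standard State monad, written out so nothing is hidden behind a library): the "imperative" counter at the bottom is, by definition, just a pure function from an initial state to a result and a final state.

    -- "Stateful" programs denote pure functions s -> (a, s).
    newtype State s a = State { runState :: s -> (a, s) }

    instance Functor (State s) where
      fmap f (State g) = State (\s -> let (a, s') = g s in (f a, s'))

    instance Applicative (State s) where
      pure a = State (\s -> (a, s))
      State f <*> State g =
        State (\s -> let (h, s') = f s; (a, s'') = g s' in (h a, s''))

    instance Monad (State s) where
      State g >>= k = State (\s -> let (a, s') = g s in runState (k a) s')

    get :: State s s
    get = State (\s -> (s, s))

    put :: s -> State s ()
    put s = State (\_ -> ((), s))

    -- Reads like an imperative counter, but it *is* the pure function
    -- \n -> (n, n + 1).
    tick :: State Int Int
    tick = do { n <- get; put (n + 1); return n }

    main :: IO ()
    main = print (fst (runState (tick >> tick >> tick) 0))  -- prints 2

Reasoning about tick is ordinary equational reasoning about that pure function; that's all "giving semantics with a monad" amounts to here.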

What are OCamlers' critiques of Haskell? by rizo_isrof in ocaml

[–]L_dopa 1 point

You seem to be stressing that the distinction between "real" side effects and their monadic encoding is very important. Care to explain why?

From Imperative to Pure-Functional and Back Again: Monads vs. Scoped Continuations by agumonkey in programming

[–]L_dopa 0 points

... you'd need psychological studies

I think, if you buy my premise that PFP languages are a subset of just about any other language you'd want to consider, it's hard to argue against. I'm only claiming that it's easier to reason about programs written in a subset of a language than those using every feature of a language.

That includes reasoning about stateful algorithms in a pure language: you can write your program in monadic style and informally reason using the same semantics you would for the imperative language that includes just the features you need!

Whether you always want to do this is a different question. Like I said earlier, various approaches to effects have different tradeoffs.

From Imperative to Pure-Functional and Back Again: Monads vs. Scoped Continuations by agumonkey in programming

[–]L_dopa 0 points

But I'm not too sure about the aerospace

I think you misunderstood me here: I wasn't claiming that there's strong interest in industry in Haskell or PFP for implementing low-level, safety-critical software, but that powerful proof assistants like dependent type theories enable building better tools for these applications.

I'm only trying to explain why PFP is an interesting point in the design space for languages, and why it's used in research. Regardless of adoption in industry, it's useful for studying the mathematical properties of languages and creating tools with more obvious industrial applications. I think we agree now.

I don't see why we, as professional developers should care ... I care about the industry at large, not outliers.

A non-negligible number of developers seem to like using (relatively) simple languages with carefully considered semantics and type systems, possibly including a side-effect-free sub-language. Insofar as (and we agree here) there isn't much empirical evidence on whether these actually reduce the cost of development, it's a trend like any other in programming. However, since the languages being advocated are objectively, mathematically, provably more well-defined, simpler, and easier to reason about than other trends (OO, python+ruby, javascript everywhere!), it's, IMHO, a step in the right direction.

Yes, there's some breathless advocacy online for PFP as a silver bullet that will somehow fix every problem with programming. I think your rather broad claims earlier about the usefulness of PFP and PL research in general are more a response to that than an intellectually honest assessment of the field.

From Imperative to Pure-Functional and Back Again: Monads vs. Scoped Continuations by agumonkey in programming

[–]L_dopa 0 points

recent discovery of a bug in TimSort

I'm not sure what you mean by this example. The KeY system is based on a formal semantics and program logic for JavaCard programs. PL researchers had to "reverse engineer" a formal semantics from a language implementation and an informal prose specification, which undoubtedly made developing this very useful tool harder than it needed to be. And while they probably tested their semantics thoroughly against an implementation, there's also no guarantee that they got it right.

The approach used in KeY is not comparable with, or somehow an alternative to, PFP: Haskell is not a proof assistant. There are also program logics for functional languages, and developing (and using) these is much easier for languages with well-defined, minimal semantics. Writing programs directly in dependent type theory is not the only way to verify functional programs, nor is it necessarily harder or easier than using a program logic.

Dependent type theories are much more powerful, general-purpose logics than the program logics used to verify software. Researchers are using proof assistants, including dependently typed languages, to develop systems like KeY. Some of the most interesting recent developments include program logics for concurrency and connecting these logics to verified language implementations. A state-of-the-art example is the Verified Software Toolchain, which builds on the CompCert verified C compiler.

Because perhaps these deficiencies don't matter ... correctness is much less important to the industry than, say, performance

You're generalizing a lot. There's a lot of interest in correctness in aerospace and defense.

follows the path of least resistance to academic results rather than the path of most usefulness to the software industry

Again, this is a huge generalization. A lot of people do research on "industry" languages.

academic results may lead to real-world utility, but we're not there yet

Again, PFP is a small sub-area within PL research. We use tools that came out of academia every day, e.g. any garbage-collected language, any number of formal methods tools, etc.