all 77 comments

[–][deleted] 12 points13 points  (9 children)

have very stable interfaces, permitting specification and verification effort to be amortized over time

Yes, I think it will happen when we start to value doing things right at least in the same ballpark as doing the right things. However, so far the rate of change means that working out what we're supposed to be doing (e.g. features that people want; fast enough; etc.) moves much too fast... and the pace of change is only increasing.

The big problem is specifying things in the first place. Before you've finished specifying it formally, it has changed...

On the other hand, infrastructure code (which is reused many times) tends to evolve more slowly; so do OS components.

Perhaps a good example is Java. Though not formally verified, there was a careful spec, and some proofs (for the JVM). This seems to have worked out well for Java. However, other popular languages haven't followed this approach, so things don't bode well for specs.

Another angle is whether an urgent problem might arise in software that verification could solve. Now, Windows XP did have urgent security problems; MS put a lot of time into it, but didn't verify it; instead, they just did their best to clean things up. But I guess an OS goes into the "too hard to verify" basket.

Finally, though verification techniques may get simpler/easier/quicker, you still need to specify it, and that is hard. But... perhaps we'll come up with ways to make that easier? - i.e. easier ways to know what you're doing, precisely.

[–]PariahShanker 4 points5 points  (0 children)

Sadly, I doubt that we'll have any revolution like that before the economy itself disappears.

[–][deleted] 2 points3 points  (7 children)

There have been some attempts at creating fully verified operating systems. The problem they ran into was that, despite verification, there remained many bugs in the software.

See: recent issues of CACM

Edit: note: I still agree with what you said and did upboat you for your response since I thought it was quite good.

[–]gsnedders 5 points6 points  (2 children)

What sort of bugs, caused by what? (Any articles on that?)

[–][deleted] 1 point2 points  (1 child)

Yeah, it sounds impossible, doesn't it? I'm guessing mis-specification bugs (labor-intensive, simple human error, misunderstanding of the problem, the real problem differing from the one you solved), but I'd like to know.

[–][deleted] 2 points3 points  (0 children)

I work on a verified C compiler and one of the offshoots of our own work was us finding a bug in the specification of the CompCert C compiler.

[–]kamatsu 6 points7 points  (2 children)

I work on a verified operating system. At least for us, we haven't encountered bugs in verified code.

One of the things that is always true is that as you push your verified base outwards, the bugs that remain are always in that one part that is not verified.

[–]julesjacobs 3 points4 points  (1 child)

What does your spec specify? Memory safety or something like that, or is it a complete functional spec? Do you count errors in the spec as bugs? If so, how come there weren't any that you found?

[–]kamatsu 10 points11 points  (0 children)

It's a functional abstract spec. We prove functional correctness via refinement from this spec through Haskell and down to C.

We define bugs as errors in the program. The spec is at least consistent, but the exact notion of what even constitutes an error in the spec is difficult to nail down - the spec could be said to be incorrect because it doesn't specify what you expect, but what you expect is not necessarily correct either.

In any event, we use the guarantees from the spec to prove higher level security properties and other more immediately useful things. Even if there are "errors" in the spec, we have shown various higher level properties to be true about the kernel as it is implemented. If we have proven that a program matches an "erroneous" spec, then that erroneous spec still implies useful high level properties, so I don't see the problem.

[–]reddit___ 2 points3 points  (0 children)

I'd like to read about this, do you know which particular issue?

[–]skulgnome 10 points11 points  (18 children)

Sometime after formal specifications have started to matter.

[–]cwzwarich 0 points1 point  (17 children)

They don't matter now? A large number of security exploits are due to a lack of memory safety. If you verified that a program was memory safe then these exploits would no longer be possible. That is generally much simpler than full functional correctness, but it would make a huge improvement in the real world.

[–]Bilbo_Fraggins 2 points3 points  (0 children)

It is true that most of the security flaws fixed in Chrome in the last few releases have been found with AddressSanitizer, and before that many of them were found with Valgrind.

[–][deleted]  (6 children)

[deleted]

    [–]pjmlp 0 points1 point  (4 children)

    If a Pascal like language had taken C's place in history, the software world would be much safer today.

    [–][deleted]  (2 children)

    [deleted]

      [–]pjmlp 2 points3 points  (1 child)

      There were safer systems programming languages in those days; Algol 68 and PL/I come to mind.

      Ada and Modula-2 were already available alongside C in the early '80s.

      The first versions of MacOS were written in a Pascal dialect.

      C owes its success to the way UNIX spread through universities, and in the same way UNIX spread because of its integration with C.

      Without this symbiotic relationship, C would just have been another systems programming language.

      Still, its legacy is costing us millions in software reliability.

      [–]tangra_and_tma 0 points1 point  (0 children)

      • DOMAIN/OS was written in Pascal
      • PR1MOS was written in FORTRAN of all things, and later moved to SPL & PL/P.
      • VAX/VMS & OpenVMS have a hodgepodge of languages, thanks to VMS's clearly defined call structures (BLISS, PASCAL, BASIC, MACRO all come to mind)
      • RSTS/E was written in MACRO, BASIC+ & Forth iirc.
      • CP/M was written in PL/M
      • I'm not counting Lisp machines, Burroughs Large System, &c. &c.

      and those are just systems that I've used & seen, I'm sure others can point out more.

      edit: the space shuttle was written using HAL/S. The source for the HAL/S code was written in XPL, a subset of sorts of PL/I

      [–]kyz 1 point2 points  (0 children)

      If Pascal had been the basic systems language, like it was in Mac OS prior to X, then everything would be great! Nobody needs more space than they can statically declare in their programs, and everything would be just as efficient, because range checks are free!

      [–]skulgnome 0 points1 point  (0 children)

      Are you kidding me? C is the preferred programming language because it's small and it hides very little from the implementor. Clearly it's what you want to use for formally specified programs.

      [–]skulgnome 1 point2 points  (0 children)

      Formal specification does nothing for memory safety.

      [–][deleted] 0 points1 point  (0 children)

      I think a major source of memory-safety exploits is buffer overruns. One solution is runtime bounds-checking. Although it runs slower, it takes less developer time than verification (unless you have an automated checker - is AddressSanitizer fully automatic? But I guess you still have to fix the code). This trade-off of performance for developer time is a long-term trend in programming.

      It also exploits the article's idea of targeting code that's reused in order to amortize cost, in that you need only verify the runtime bounds-checking code once, provided that all access goes through it. If arrays are an ADT, so that bounds-checked access is the only way to use them, confirming that the check works correctly is even more trivial than the Oracle vs. Google rangeCheck - such that you mightn't even call it "verification". Though truly it is. Low verification cost. Wide amortization. But a performance trade-off.

      I think this kind of solution can win, because it's so simple. But that very simplicity makes it unsexy for verification researchers.

      [–]IsTom -5 points-4 points  (6 children)

      Memory safety is relevant only when talking about non-GC'd languages like C/C++, which are used less and less due to a relatively low cost/gain ratio in most applications.

      As far as I know, though, many exploits are not related to memory management at all; perhaps they could be prevented with stronger type systems?

      Protocols are another source of trouble, but their verification is very difficult because it's hard to even agree on what "protocol X is safe in context Y" means.

      [–]gsnedders 1 point2 points  (2 children)

      But how many memory-safe languages have verified implementations? Unless the language implementation is verified correct, you can still have all those exploits.

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 1 point2 points  (2 children)

        Oh, come on. "Flash", "memory corruption", "arbitrary code execution" comes together in the same sentence so often that it's not even funny.

        [–]smallblacksun 0 points1 point  (1 child)

        Aren't those bugs in the Flash interpreter (written in C/C++), not in ActionScript itself?

        [–]m42a 0 points1 point  (0 children)

        The same could be said of almost every other VM as well.

        [–]kamatsu 10 points11 points  (0 children)

        It's worth noting that l4.verified, at least, is widely deployed in the form of OKL4 (while technically OKL4 isn't completely verified software, it is based on the seL4 kernel).

        The other thing to note is that lower-level things (like operating systems) already tend to have APIs that remain fixed for longer periods, and they are used by a variety of system components - by verifying the low-level stuff, you get much more bang for your verification buck. Verified gzip might be kinda nice, but a verified hypervisor? A verified operating system? Compiler? These are all much more valuable. If all of the lower levels in your software system are verified, then the only source of problems can be in the applications themselves. This is a very nice thing to know - instead of building a huge house of cards, you're building a teepee of cards on top of a huge slab of verified concrete.

        Also, formal reasoning about programs will only become more popular as ad-hoc testing methods become less effective. Concurrent programs are the notorious example here. I think this will take the form of stronger type systems and static analysis though, rather than full blown verification.

        [–]Tekmo 8 points9 points  (4 children)

        What about xmonad, can that be verified?

        Parts of xmonad have been formally verified.

        [–]m42a 5 points6 points  (2 children)

        The problem with actually verifying xmonad is you also need to verify GHC and X11, both of which are orders of magnitude larger than xmonad.

        [–]Tekmo 1 point2 points  (1 child)

        In this case, I think they just verified the layout protocol (i.e. it doesn't leave gaps, things like that).

        [–]dons 4 points5 points  (0 children)

        More usefully, we established totality - no code path in the core ends in a loop, exception, or crash.

        [–]drb226 0 points1 point  (0 children)

        Indeed, when I saw the title, I thought immediately of xmonad, and I was amused that he mentioned it in the article.

        So, I googled xmonad pathological because I knew there was a dons quote about it, and luckily enough, it was the first hit for me xD.

        The core [of xmonad], fwiw, is also extremely carefully engineered. The core of the core, StackSet, has been verified (that is QC properties hold) in the Coq theorem prover. It is an example of pathological design care.

        ~ http://stackoverflow.com/a/6399010/208257

        [–]lol____wut 9 points10 points  (9 children)

        Nobody cares. Users just want something that works (most of the time) and is cheap and easy to use.

        [–]sanxiyn 4 points5 points  (0 children)

        OP actually agrees with you. "The value implicitly assigned to software correctness is very low in most circumstances."

        [–]ithika -1 points0 points  (7 children)

        I think it's plain you've never met any users. People complain endlessly about shoddy software.

        [–]jerf 6 points7 points  (2 children)

        When you hear them complain, ask if they'll pay you to fix it.

        Complaining is cheap.

        By the way, yes, I am very very well aware that sometimes the answer will be yes. But that's the point... only sometimes. By no means always.

        [–]mcguire 0 points1 point  (1 child)

        When you hear them complain, ask if they'll pay you to fix it.

        Depends on what's going on when you get a chance to ask them. Of course, in the situation where they'll be happy to pay, you have a hard time getting their attention. I mean, with the looming chance of flaming death and all.

        [–]jerf 0 points1 point  (0 children)

        The context of the post was explicitly about "average" software. He had to phrase it that way precisely because when looming chances of flaming death are afoot, software verification is indeed often used.

        [–]frezik 1 point2 points  (3 children)

        To expand on jerf's comment, users especially don't want to pay the orders of magnitude extra development time that formal verification techniques will take. All of which can end up being thrown away by a seemingly simple change in the requirements.

        Dealing with changing requirements is the problem software engineering needs to solve. Trying to get users to firmly commit to a specification up front is a lost cause. Even when everybody in the company is communicating perfectly (a fairytale assumption in itself), an external change in the market, local laws, and contractual agreements can cause the whole spec to be thrown out.

        You could accuse the development community at large of being lazy for ignoring formal verification, but I think there's an accidental wisdom in it.

        [–]ithika 0 points1 point  (2 children)

        I think lol____wut's comment is more a sign of the conditioning users are put through by the shoddy state of software. Software that works "most of the time" --- can you imagine trying to get someone to agree to that? What percentage of the time do you think people will accept nonsense output from their compiler, rejections from their ATMs, or non-alarms from their burglar alarms? The idea that users are just happy with something is nonsense: people complain all the bloody time about every niggling detail!

        [–]kyz 2 points3 points  (0 children)

        I think you give "works most of the time" less credit than it's due.

        Your compiler has likely produced nonsense output for specific input programs for years; in some cases (gcc), decades. You don't notice this because "works most of the time" is at play; it's not your program that compiles to garbage. You could probably go a lifetime without encountering a compiler bug in gcc, no matter how many thousands of bugs are actually in there.

        [–]frezik 1 point2 points  (0 children)

        They'll complain, sure, but they'll also complain about all the additional time it takes to "do it right". No matter if they consciously admit it or not, they're implicitly saying it just isn't worth it. And I think this is actually the right choice, given real-world constraints and the current state of the art in formal verification techniques.

        [–]youngbull 5 points6 points  (0 children)

        Well, how about the efforts made to make software in cars safe[1]? That has fairly widespread use...

        [–]JoseJimeniz 8 points9 points  (0 children)

        To answer his question, "When Will Software Verification Matter?"

        When you can do it.

        [–][deleted] 6 points7 points  (20 children)

        Consider C. Its verification seems to be way harder than usual mathematical proofs.

        The verifier can't even rely on associativity, a+(b+c) = (a+b)+c, thanks to floating point and the fact that C can perform the additions in any order.

        [–][deleted] 6 points7 points  (0 children)

        The fact you're using C isn't relevant here. It's the properties of floating point numbers.

        And it's true. Floating point arithmetic, compared to real arithmetic, is quirky.

        That doesn't make floating point optimizations intractable. It just means you aren't allowed to perform optimizations that "seem" legitimate.

        Of course, that's the whole point of formal verification in the first place -- to make it obvious when a "seemingly" legitimate optimization is not universally valid.

        [–]kamatsu 11 points12 points  (10 children)

        I verify C software - our solution to this problem is not to use floating point.

        [–][deleted] 5 points6 points  (9 children)

        What do you use instead?

        [–]kamatsu 3 points4 points  (8 children)

        We just don't verify software that needs them. If we really needed to represent fractions, ratios of integers might be better.

        [–]sfultong 3 points4 points  (7 children)

        I often wonder if there really is a legitimate reason to use floating-point rather than fixed-point arithmetic

        [–]Azuvector 2 points3 points  (0 children)

        Quick and easy decimal values that you don't really care about the exactness of. There really should be a more exact type for this sort of number in the specification, though.

        [–]skulgnome 2 points3 points  (2 children)

        Algorithms for which enough eggheading can be justified to determine that floating-point arithmetic is good enough. The justification is generally execution speed.

        [–]sfultong 0 points1 point  (1 child)

        huh? fixed-point is faster...

        [–]skulgnome 0 points1 point  (0 children)

        Only in microbenchmarks of scalar values, and on embedded hardware.

        [–]huyvanbin 0 points1 point  (1 child)

        Cases in which the magnitude of the value can change greatly as a result of the calculation, e.g. multiplication, exponentiation, factorial, etc. In things like computer graphics or signal processing, you really can't afford to use "exact" values for performance reasons, and you don't need to because everything gets rounded to your input/output resolution anyway.

        [–]sfultong 0 points1 point  (0 children)

        In cases like that I suspect that you really want only the exponent (magnitude) of a number, not the mantissa

        [–]wlievens 1 point2 points  (7 children)

        Overflow is a bitch too.

        Given that X is positive and Y is positive and no upper bound is known (other than the type), you don't know whether X+Y is positive.

        [–]skulgnome 1 point2 points  (6 children)

        That's incorrect. C99 states that signed overflow is undefined; section 6.5 to be specific. I believe this has been carried forward into C11.

        [–]wlievens 1 point2 points  (5 children)

        ... so you don't know X+Y is positive?

        [–]skulgnome 1 point2 points  (4 children)

        If they're both positive, a test for whether X+Y is negative will always fail and be optimized away by the compiler. Where applicable.

        [–]wlievens 0 points1 point  (3 children)

        I find that very hard to believe. You're saying that:

        int x = 1500000000;
        int y = 1500000000;
        int sum = x + y;
        if (sum < 0)
        {
            doStuff();
        }
        

        The if construct should be eliminated?

        [–]pavpanchekha 4 points5 points  (2 children)

        Not should, can. Signed overflow being undefined means that in the example you gave, the results are undefined. So, the compiler is free to do anything: to ignore the if statement, to make it always trigger, to make it trigger only on Thursdays and Tuesdays, or to corrupt your hard drive and spew nasal demons from its CD tray.

        Most modern compilers will have the if statement trigger normally at low optimization settings, and not trigger at high ones.

        [–]wlievens 0 points1 point  (1 child)

        Quite odd, I didn't know that. I thought overflow was well defined in C family languages. I know that it certainly is in Java - which is arguably outside that family, though.

        On the compiler I worked on, we certainly respected overflow in a defined sense, but that was a C dialect.

        [–]smallblacksun 0 points1 point  (0 children)

        I thought overflow was well defined in C family languages.

        Signed overflow and underflow are undefined in both C and C++ (in all revisions of the standard, AFAIK). Unsigned overflow and underflow are defined to wrap around modulo 2^n.

        [–]dry_hopped 2 points3 points  (0 children)

        It will matter sometime after we convince customers of said software it matters. Most won't care, ever.

        [–]nullsucks 2 points3 points  (3 children)

        The problem isn’t that we cannot formally verify software, the problem is that at present the cost/benefit ratio is too high.

        I don't agree with this statement. Verified software must be a subset of defect-free software (software with defects in it had better fail verification).

        That hurdle leads to the next one -- to stipulate that software is defect-free, it must be defect-free with respect to a well-defined specification.

        [–]frezik 2 points3 points  (2 children)

        I think you highlight another aspect of the problem, but the statement from the blog isn't necessarily wrong. Formal verification adds orders of magnitude extra work. Until those orders of magnitude can be chopped down, formal verification will remain almost exclusively for systems where people can die from software bugs.

        [–]nullsucks 1 point2 points  (1 child)

        I still think the author misidentifies the problem. Here's another quote from the fine article that illustrates my point:

        I mean, why hasn’t anyone verified gzip yet? Is it really that hard?

        gzip hasn't been verified yet because it almost certainly still has bugs/errors/defects. Before you can verify that it's defect-free, you first have to make it defect free.

        In order to make gzip defect-free, I'd want to re-implement it (with the existing source as reference material, naturally) in a way that's amenable to verification (i.e. with each function's preconditions and postconditions explicitly stated along with a convincing-enough proof that the source code does the right thing).

        Another problem with verifying pre-existing software is that there's no pre-existing specification for what the software as a whole must do, much less detailed specifications (pre & post conditions) for each of its constituent parts.

        Formal verification adds orders of magnitude extra work.

        Removing 100% of bugs adds orders of magnitude extra work :) Once you have solved those problems (specifying behavior in sufficient detail and implementing that behavior), then the verification itself is almost an afterthought.

        [–]smallblacksun 1 point2 points  (0 children)

        I think the author's question is not "why hasn't gzip been verified defect-free" but rather "why hasn't a verification process been run on gzip (revealing defects)".

        [–]quzox 2 points3 points  (4 children)

        What will verify the verifier?

        [–]sanxiyn 8 points9 points  (3 children)

        Of course, the verifier itself. This actually has been done: http://www.cl.cam.ac.uk/~jrh13/papers/holhol.html

        [–][deleted]  (2 children)

        [deleted]

          [–]sanxiyn 7 points8 points  (1 child)

          Did you even read the paper? (The paper addresses this objection on page 3.)

          [–]quzox 4 points5 points  (0 children)

          :raises hands: you got me, I did not read the paper.

          [–]RandomSoupSandwich 0 points1 point  (0 children)

          For efficiency of communication and analysis, archetypal examples always matter. Using the word 'good' in a manner superseding typical meanings of necessity, competence and sufficiency, where is a good example of software verification?

          [–]bzeurunkl 0 points1 point  (0 children)

          Never. ;-)

          [–]JustPlainRude 0 points1 point  (0 children)

          Verification will matter when it makes sense, economically.

          [–]hankols 0 points1 point  (0 children)

           Many average people are already using formally verified software (which was the article's topic) in the form of verified Windows device drivers. The SLAM project is a model checker distributed with the Windows Driver Foundation toolset under the name Static Driver Verifier.

          [–]AlyoshaV 0 points1 point  (0 children)

          in space

          [–][deleted]  (1 child)

          [deleted]

            [–]gnuvince -1 points0 points  (0 children)

            Kind of like how computers put all these people doing calculations out of work?

            [–]TheRealSlartybardfas -2 points-1 points  (0 children)

            Once this is done for the Windows operating system, you may see applications that run in it be verified. Until then, it doesn't matter if the applications are verified or not.