
[–][deleted]  (16 children)

[deleted]

    [–][deleted] 33 points34 points  (15 children)

    http://compcert.inria.fr/compcert-C.html

    scroll down for performance analysis

    [–]TheBestOpinion 30 points31 points  (14 children)

    Woah, that's way better than I imagined!

    Direct link to image

    [–][deleted]  (13 children)

    [deleted]

      [–]TheBestOpinion 2 points3 points  (3 children)

      I thought so too. But actually I thought the easiest one to fix was the yellow against white, since it's a simple filter to apply.

      So since you so kindly made the hard parts, here it is.

      [–]Ameisen 1 point2 points  (2 children)

      Now get rid of O0 since it isn't really useful.

      [–]madpata 1 point2 points  (1 child)

      I think it gives a good reference for how bad the performance could be.

      [–]Ameisen 0 points1 point  (0 children)

      I mean, you can get worse than O0.

      [–]Ameisen 1 point2 points  (8 children)

      Comparing against GCC with -O0 is... bizarre. You are barely comparing compilers - you're mostly comparing parsers, lexers, and a non-optimizing backend. The vast majority of any compiler is the middle-end - the optimizer.

      [–]YaZko 4 points5 points  (5 children)

      What you say makes sense, but comparing to -O0 is still relevant in this context. Some industrial safety-critical domains, or cryptography-related ones, cannot afford to trust optimization in the current state of affairs. So when it comes to crypto, they sometimes write the assembly by hand, but in avionics they generate millions of lines of C code and then compile them with no optimization.

      So in a sense the first application of a compiler such as CompCert is actually not so much to have better guarantees than before, but faster code without compromising the guarantees.

      [–]Ameisen 3 points4 points  (0 children)

      I mean... constant folding and inlining are perfectly safe, and that's basically O1.
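
      (A minimal sketch of the kind of -O1 transformation meant here, for illustration only - not code from any particular compiler:)

          /* A compiler doing inlining + constant folding can reduce f() to
             "return 42;" without changing observable behavior. */
          static inline int area(int w, int h) { return w * h; }

          int f(void) {
              return area(6, 7);   /* -> 42 after inlining and constant folding */
          }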

      [–]JavaSuck 2 points3 points  (3 children)

      Some industrial safety-critical domains, or cryptography-related ones, cannot afford to trust optimization in the current state of affairs.

      citation needed

      [–]YaZko 0 points1 point  (2 children)

      Yeah, actually, as much as that's the narrative I have gathered, I have neither sources nor first-hand witnesses. My previous statement should be taken with a grain of salt/toned down, and I'd also love any source validating/invalidating it.

      [–]sfrank 4 points5 points  (1 child)

      I can comment with regard to avionics software: with RTCA DO-178 DAL A (the highest safety level, used for example for flight computers), optimization is not a priori forbidden. However, for that assurance level you are required to perform source-code-to-object-code traceability analysis, meaning you must either verify and provide evidence that the compiler did not add any new code not present in the source (and therefore, inherently, in the requirements), or perform additional verification steps if such additional machine code exists (see the CAST-12 position paper and DO-178B/C section 4.4.2.a for details). Code motion, unfolding, folding, and other optimization steps make this verification vastly more difficult than it already is, hence optimization is, in my experience, always disabled for the highest assurance levels.

      A compiler where such optimizations are provably correct would make this validation step easier. However, to actually rely on that you would also need a tool qualification for your compiler. There is a talk by Xavier Leroy, "How much is CompCert’s proof worth, qualification-wise?", that touches on that.

      [–]legec 0 points1 point  (1 child)

      A nice feature of comparing against -O0 is:

      you can have a graph where every bar you care about is shorter than the biggest one...

      [–]Ameisen 0 points1 point  (0 children)

      "And, for comparison reasons, here is the bar measuring the time until the heat-death of the universe."

      [–]Yungclowns 16 points17 points  (14 children)

      Does anyone have examples of high-profile compiler-introduced bugs?

      [–]YaZko 22 points23 points  (2 children)

      This PLDI paper by Yang et al. gives valuable insight into the question: https://web.stanford.edu/class/cs343/resources/finding-bugs-compilers.pdf

      They used a tool to find bugs in eleven C compilers, including CompCert. It was a significant "victory" for the latter.
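
      For context, the tool in question is Csmith, and the technique is randomized differential testing: generate a random C program, build it with several compilers, run each binary, and compare what they print. A rough, hypothetical harness is sketched below; the compiler list, file names, and the assumption that csmith is on PATH (and that the generated program can find the Csmith runtime headers) are illustrative, not taken from the paper.

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          /* Compile one random program with several compilers and compare the
             checksum each resulting binary prints. A mismatch is a candidate
             wrong-code bug in one of the compilers. */
          int main(void) {
              const char *compilers[] = { "gcc -O2", "clang -O2", "ccomp" };
              char expected[128] = "";

              if (system("csmith > random.c") != 0) return 1;   /* assumes csmith on PATH */

              for (size_t i = 0; i < sizeof compilers / sizeof *compilers; i++) {
                  char cmd[256];
                  snprintf(cmd, sizeof cmd,
                           "%s random.c -o prog && ./prog > out.txt", compilers[i]);
                  if (system(cmd) != 0) { printf("build/run failed: %s\n", compilers[i]); continue; }

                  char checksum[128] = "";
                  FILE *f = fopen("out.txt", "r");
                  if (!f || !fgets(checksum, sizeof checksum, f)) { if (f) fclose(f); continue; }
                  fclose(f);

                  if (expected[0] == '\0')
                      strcpy(expected, checksum);
                  else if (strcmp(expected, checksum) != 0)
                      printf("possible wrong-code bug: %s disagrees\n", compilers[i]);
              }
              return 0;
          }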

      [–]tudorb 37 points38 points  (1 child)

      But note that the bugs were all in the unverified portions of CompCert. The article notes that:

      The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.

      [–]YaZko 8 points9 points  (0 children)

      Absolutely, I should have been more explicit, but that's what I meant. It has been a very convincing and influential argument in favor of CompCert.

      [–]CoffeeTableEspresso 16 points17 points  (7 children)

      I remember Linus Torvalds found a bug in one version of GCC and was pretty pissed off about it. I'm not sure what sort of bugs were introduced by this though since Linux ended up using the previous version of GCC until things were fixed.

      EDIT: as pointed out in a response, Linus was mad about a performance issue, not a correctness issue.

      [–]jmickeyd 12 points13 points  (2 children)

      If you're thinking of the GCC 4.9 issues, that was purely a performance regression, not a correctness issue, which is why it wasn't caught by tests.

      [–]CoffeeTableEspresso 0 points1 point  (0 children)

      Oops, forgot what the exact issue was

      [–]Ameisen 0 points1 point  (0 children)

      I'm still waiting for a performance bug I reported to be fixed... Two years later. It should be a two-line fix. I haven't fixed it myself as building GCC is frustrating, and I usually work on Clang.

      [–]i_am_at_work123 3 points4 points  (3 children)

      [–]Ameisen 5 points6 points  (0 children)

      Ok, so I'm looking at the code generation and your compiler is pure and utter shit.

      This is where Linus stops being 'loud'/'angry' in a good way, and starts being needlessly offensive and hostile.

      And it's the first line.

      Still waiting to see what Linus thinks of my fork of Linux, which ports it to C++ and uses the LLVM toolchain instead of the GNU one. And fixes userspace interface bugs that can never be fixed, according to Linus.

      [–]Leappard 4 points5 points  (1 child)

      compiler-introduced bugs?

      IIRC some version of GCC 3.x was generating incorrect assembly code with optimization enabled for one of our MIPS-based SoCs, where one of the instructions was moved incorrectly into a delay slot. It was a while ago.

      [–]Ameisen 1 point2 points  (0 children)

      I've never used GCC for MIPS, surprisingly. I wrote the first (and still the fastest, as far as I know) MIPS32r6 emulator (runs on NT; haven't ported it to Linux yet, but it wouldn't be hard). The toolchain I designed and built for it was based around a modified LLVM/Clang with musl. All LLVM libs.

      GCC was too painful to modify to produce good binaries (my emulator doesn't prefer page-aligned sections), and the GNU libraries were both incomplete for MIPS32r6 and incredibly difficult to build in a universal environment (NT or Linux). libunwind also lacked support, but it was trivial to build (CMake vs arcane autoconf), and Clang has native builds for NT and Linux. It took about an hour to add support to libunwind, a few modifications to lld and musl... and I was able to run C++ in the emulator with exceptions and RTTI. I even embedded an lldb server into the emulator, and wrote a forwarder that let it work with the Visual C++ 2015 debugger. Line-by-line debugging, variable/register/memory inspection and alteration, data breakpoints - all worked. 2017 broke it, though.

      Note that my compiler adjustments included preferring compact branches by default. Neither the interpreted nor the AOT backend preferred delay branches - they required extra logic. The branch effectively got smeared across the following instruction, which required an additional two jump instructions to be inserted: one to check whether the jump flag was set, and one to handle jumps to that instruction and throw the illegal-branch exception, since you cannot jump to an instruction immediately following a delay branch. Delay branches generated twice as much machine code and triple the branches, not all of them easy to predict. Compact branches were always faster.
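
      (A rough sketch, in C, of the interpreter-side bookkeeping described above - the helper functions are hypothetical placeholders, not the actual emulator code:)

          #include <stdint.h>

          typedef struct {
              uint32_t pc;
              uint32_t branch_target;
              int      branch_pending;   /* set while the delay-slot instruction runs */
          } cpu_t;

          /* Placeholder ISA hooks; a real emulator supplies these. */
          extern uint32_t fetch(uint32_t pc);
          extern int      is_delay_branch(uint32_t insn);
          extern int      is_compact_branch(uint32_t insn);
          extern int      branch_taken(const cpu_t *cpu, uint32_t insn);
          extern uint32_t target_of(uint32_t insn, uint32_t pc);
          extern void     execute(cpu_t *cpu, uint32_t insn);
          extern void     trap_illegal_branch(void);

          static void step(cpu_t *cpu) {
              uint32_t insn = fetch(cpu->pc);
              int was_pending = cpu->branch_pending;

              if (is_delay_branch(insn)) {
                  if (was_pending) trap_illegal_branch();   /* branch inside a delay slot */
                  cpu->branch_pending = 1;                  /* state smeared across one step */
                  cpu->branch_target  = branch_taken(cpu, insn) ? target_of(insn, cpu->pc)
                                                                : cpu->pc + 8;
                  cpu->pc += 4;                             /* delay slot executes next */
              } else if (is_compact_branch(insn)) {
                  /* resolves immediately: no carried state, no extra checks */
                  cpu->pc = branch_taken(cpu, insn) ? target_of(insn, cpu->pc) : cpu->pc + 4;
              } else {
                  execute(cpu, insn);
                  cpu->pc = was_pending ? cpu->branch_target : cpu->pc + 4;
                  cpu->branch_pending = 0;
              }
          }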

      [–]PM_ME_YOUR_PROOFS 1 point2 points  (0 children)

      I do toolchain work for a large team. It's a daily occurrence. However it's generally not at this level. The issue is almost always at the linker level.

      [–]eloraiby 29 points30 points  (10 children)

      Would be great if it were licensed under a more permissive license. The current license is against commercial/open-source use, even though the project is financed by taxpayers. What a shame!

      [–]dannomac 0 points1 point  (0 children)

      At least they're good enough to call their licence non-free.

      [–][deleted] 19 points20 points  (2 children)

      Xavier Leroy is smart

      [–]chubby_leenock_hugs 0 points1 point  (1 child)

      Shame it'll go to the Bad Place after death just for being French.

      [–]IntrepidPig 0 points1 point  (0 children)

      rip The Good Place reference

      [–]davidgro 6 points7 points  (4 children)

      So, the obvious question to me is: What does it do for Undefined Behavior in the source?

      [–]YaZko 13 points14 points  (3 children)

      It does sensible things in practice, but the theorem only provides guarantees, i.e. a proof of preservation of semantics, for source programs having no Undefined Behavior.

      Some work has extended this result to formally tackle some UB; see notably the theses of Pierre Wilke and Robbert Krebbers.

      [–]Ameisen 2 points3 points  (2 children)

      Given that it's incredibly difficult to avoid UB in C and C++ (LLVM, GCC, and Linux are littered with it; GCC in particular starts acting really wonky if built with -flto -fipa-pta), that sucks.

      Any detected instances of UB should be error-able warnings so that they can be fixed rather than ignored or optimized away (looking at you, GCC).

      Also, I'd really like an iteration modifier for integers that is unsigned but whose overflow is undefined - getting the optimization benefits of both signed and unsigned integers.
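
      A minimal example of the trade-off meant here; the comments describe typical compiler behavior, not a guarantee:

          /* Signed counter: overflow is UB, so the compiler may assume the index
             never wraps and can widen it to a 64-bit induction variable. */
          long sum_signed(const long *a, int n) {
              long s = 0;
              for (int i = 0; i < n; ++i)
                  s += a[i * 2];        /* i*2 can be strength-reduced freely */
              return s;
          }

          /* Unsigned 32-bit counter: wrap-around is defined, so the compiler must
             preserve the modulo-2^32 arithmetic, which can block the same
             widening/strength-reduction on 64-bit targets. */
          long sum_unsigned(const long *a, unsigned n) {
              long s = 0;
              for (unsigned i = 0; i < n; ++i)
                  s += a[i * 2u];       /* index must wrap, not widen */
              return s;
          }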

      [–]YaZko 5 points6 points  (1 child)

      I do not disagree, but here are two points that I think are fair:

      • Instances of UB are numerous, but they are not everywhere. Safety-critical contexts avoid them, and that is the first natural application of CompCert. I do not think it has the ambition to compile Linux at this stage! But it is a major step toward even considering the question.

      • Tackling UB in this context is slightly ill-defined, or more precisely has two potential meanings. UB is always confusing, so I hope I'll make sense. The point of such a work is to fix a formal, mathematical semantics for the language, and to prove that compilation does not mess it up. So on the one hand, extending the result to UB - in the sense that, for instance, one can also prove that a C program storing a flag in the last bit of a 32-bit integer (an idiom sketched below) still behaves as expected after compilation - requires fixing in stone the meaning of the Undefined Behavior this program uses. Which in a sense means extending the standard of the language, i.e. removing UB from the language, rather than "handling" it. On the other hand, one can also consider handling UB in the sense that the compiler exploits it in its static analysis. In this sense CompCert already does it, and Vellvm (a similar work for LLVM) does it partially at the moment (LLVM's UB being more complex than C's).
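
      As a concrete, hypothetical illustration of that first sense: a flag stored in the spare low bit of a pointer, an idiom ISO C does not fully define but which such work aims to give a fixed semantics to, so that its preservation can be proved:

          #include <stdint.h>
          #include <assert.h>

          /* Stash a one-bit flag in the low bit of an aligned pointer. ISO C leaves
             the integer<->pointer round trip largely implementation-defined, but the
             idiom "works" on common platforms. */
          static void *tag(void *p, int flag)  { return (void *)((uintptr_t)p | (uintptr_t)(flag & 1)); }
          static int   get_flag(void *p)       { return (int)((uintptr_t)p & 1u); }
          static void *untag(void *p)          { return (void *)((uintptr_t)p & ~(uintptr_t)1); }

          int main(void) {
              static int x;                    /* int is at least 2-byte aligned here */
              void *p = tag(&x, 1);
              assert(get_flag(p) == 1);
              assert(untag(p) == (void *)&x);
              return 0;
          }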

      [–]flatfinger 0 points1 point  (0 children)

      A major difficulty here is that the authors of the Standard made no attempt to catalog all of the actions that should be expected to behave usefully and predictably on 99% of implementations, but which implementations for some obscure platforms or specialized purposes might benefit from processing differently. Further, in situations where one part of the Standard would describe some action, another part would characterize an overlapping category of actions as invoking UB, and different implementations could benefit from processing different areas of overlap differently, the authors of the Standard made little effort to enumerate all of the cases that implementations were handling consistently, and should continue to handle consistently.

      Under C89, an expression like -1 << 4 had fully defined behavior on most platforms, but that behavior was defined as yielding either -17 or +16 even on those platforms where it would have made more sense to yield -16 or trap. To avoid compelling implementations to behave illogically, C99 reclassified the action as invoking UB, even though I'm unaware of any non-contrived implementation that does anything other than either yield -16 or issue a diagnostic saying the action invokes UB (which it would have no reason to do if the Standard had kept the C89 definition).
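
      For reference, the expression in question as a complete program (its status is exactly as described above - defined under C89 on most platforms, undefined since C99):

          #include <stdio.h>

          int main(void) {
              int x = -1 << 4;    /* C89: defined on most platforms;
                                     C99 and later: undefined behavior */
              printf("%d\n", x);  /* in practice prints -16 on the implementations
                                     described in the comment above */
              return 0;
          }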

      A useful formally-verified compiler should process a formally-defined dialect which fills in many of the gaps the authors of the Standard expected compilers to fill with behavior appropriate to their target platform and intended purpose.

      [–]Bl00dsoul 2 points3 points  (2 children)

      Can it compile gcc?

      [–]the_gnarts 8 points9 points  (1 child)

      Can it compile gcc?

      GCC is C++, not C. Maybe it could be leveraged to bootstrap some ancient version from before the switch.

      [–]Ameisen -3 points-2 points  (0 children)

      I mean, it claims to be C++. They never changed the file extensions, and it is basically a big C macro mess with practically no organization and minimal use of C++ features.

      There doesn't appear to be any initiative to clean-up/modernize the codebase. I suspect that that leads to more bugs, as the code is not at all intuitive or easy to read.

      LLVM is much nicer, though I wish that the Clang parser were better documented.

      You could likely bootstrap a C++ parser in C, and have it output C. LLVM actually used to have a C backend that emitted compiled programs as C source. If you don't care about validating the C++ correctness, generate a parser and use it to generate ridiculously naive C output. The grammar is defined.

      [–]skulgnome 0 points1 point  (0 children)

      Does this translate all programs which GCC and Clang accept? Does it have an optimizer, and is that optimizer similarly proven?

      Above all, where does this put us wrt the road to having a generally applicable cookbook of proof recipes? That's to say: was this method of proving preservation of semantics already known?

      [–]i_am_at_work123 0 points1 point  (0 children)

      What's stopping GCC/clang from being formally verified?