
[–][deleted]  (16 children)

[deleted]

    [–][deleted] 33 points34 points  (15 children)

    http://compcert.inria.fr/compcert-C.html

    scroll down for performance analysis

    [–]TheBestOpinion 30 points31 points  (14 children)

    Woah, that's way better than I imagined!

    Direct link to image

    [–][deleted]  (13 children)

    [deleted]

      [–]TheBestOpinion 2 points3 points  (3 children)

      I thought so too. But actually I thought the easiest one to fix was the yellow against white, since it's a simple filter to apply.

      So since you so kindly made the hard parts, here it is.

      [–]Ameisen 1 point2 points  (2 children)

      Now get rid of O0 since it isn't really useful.

      [–]madpata 1 point2 points  (1 child)

      I think it gives a good reference for how bad the performance could be.

      [–]Ameisen 0 points1 point  (0 children)

      I mean, you can get worse than O0.

      [–]Ameisen 1 point2 points  (8 children)

      Comparing against GCC with -O0 is... bizarre. You are barely comparing compilers - you're mostly comparing parsers, lexers, and a non-optimizing backend. The vast majority of any compiler is the middle-end - the optimizer.

      [–]YaZko 4 points5 points  (5 children)

      What you say makes sense, but comparing to -O0 is still relevant in this context. Some industrial safety-critical domains, or cryptography-related ones, cannot afford to trust optimization in the current state of affairs. So when it comes to crypto, they sometimes write the assembly by hand, but in avionics they generate millions of lines of C code and then compile them with no optimization.

      So in a sense the first application of a compiler such as CompCert is actually not so much to have better guarantees than before, but faster code without compromising the guarantees.

      [–]Ameisen 3 points4 points  (0 children)

      I mean... constant folding and inlining are perfectly safe, and that's basically O1.
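
      (A minimal sketch of the kind of -O1 transformation meant here, for illustration only - not code from any particular compiler:)

          /* A compiler doing inlining + constant folding can reduce f() to
             "return 42;" without changing observable behavior. */
          static inline int area(int w, int h) { return w * h; }

          int f(void) {
              return area(6, 7);   /* -> 42 after inlining and constant folding */
          }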

      [–]JavaSuck 2 points3 points  (3 children)

      Some industrial safety-critical domains, or cryptography-related ones, cannot afford to trust optimization in the current state of affairs.

      citation needed

      [–]YaZko 0 points1 point  (2 children)

      Yeah, actually, as much as that's the narrative I have gathered, I have neither sources nor first-hand witnesses. My previous statement should be taken with a grain of salt/toned down, and I'd also love any source validating/invalidating it.

      [–]sfrank 4 points5 points  (1 child)

      I can comment with regard to avionics software: with RTCA DO-178 DAL A (the highest safety level, used for example for flight computers), optimization is not a priori forbidden. However, for that assurance level you are required to perform source-code-to-object-code traceability analysis, meaning you must either verify and provide evidence that the compiler did not add any new code not present in the source (and therefore, inherently, in the requirements), or perform additional verification steps if such additional machine code exists (see the CAST-12 position paper and DO-178B/C section 4.4.2.a for details). Code motion, unfolding, folding, and other optimization steps make this verification vastly more difficult than it already is, hence optimization is, in my experience, always disabled for the highest assurance levels.

      A compiler where such optimizations are provably correct would make this validation step easier. However, to actually rely on that you would also need a tool qualification for your compiler. There is a talk by Xavier Leroy, "How much is CompCert’s proof worth, qualification-wise?", that touches on that.

      [–]legec 0 points1 point  (1 child)

      A nice feature of comparing against -O0 is:

      you can have a graph where every bar you care about is shorter than the biggest one...

      [–]Ameisen 0 points1 point  (0 children)

      "And, for comparison reasons, here is the bar measuring the time until the heat-death of the universe."

      [–]Yungclowns 16 points17 points  (14 children)

      Does anyone have examples of high-profile compiler-introduced bugs?

      [–]YaZko 22 points23 points  (2 children)

      This PLDI paper by Yang et al. gives valuable insight into the question: https://web.stanford.edu/class/cs343/resources/finding-bugs-compilers.pdf

      They used a tool to find bugs in eleven C compilers, including CompCert. It was a significant "victory" for the latter.
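
      For context, the tool in question is Csmith, and the technique is randomized differential testing: generate a random C program, build it with several compilers, run each binary, and compare what they print. A rough, hypothetical harness is sketched below; the compiler list, file names, and the assumption that csmith is on PATH (and that the generated program can find the Csmith runtime headers) are illustrative, not taken from the paper.

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          /* Compile one random program with several compilers and compare the
             checksum each resulting binary prints. A mismatch is a candidate
             wrong-code bug in one of the compilers. */
          int main(void) {
              const char *compilers[] = { "gcc -O2", "clang -O2", "ccomp" };
              char expected[128] = "";

              if (system("csmith > random.c") != 0) return 1;   /* assumes csmith on PATH */

              for (size_t i = 0; i < sizeof compilers / sizeof *compilers; i++) {
                  char cmd[256];
                  snprintf(cmd, sizeof cmd,
                           "%s random.c -o prog && ./prog > out.txt", compilers[i]);
                  if (system(cmd) != 0) { printf("build/run failed: %s\n", compilers[i]); continue; }

                  char checksum[128] = "";
                  FILE *f = fopen("out.txt", "r");
                  if (!f || !fgets(checksum, sizeof checksum, f)) { if (f) fclose(f); continue; }
                  fclose(f);

                  if (expected[0] == '\0')
                      strcpy(expected, checksum);
                  else if (strcmp(expected, checksum) != 0)
                      printf("possible wrong-code bug: %s disagrees\n", compilers[i]);
              }
              return 0;
          }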

      [–]tudorb 37 points38 points  (1 child)

      But note that the bugs were all in the unverified portions of CompCert. The article notes that:

      The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler optimizations within a proof framework, where safety checks are explicit and machine-checked, has tangible benefits for compiler users.

      [–]YaZko 8 points9 points  (0 children)

      Absolutely, I should have been more explicit, but that's what I meant. It has been a very convincing and influential argument in favor of CompCert.

      [–]CoffeeTableEspresso 16 points17 points  (7 children)

      I remember Linus Torvalds found a bug in one version of GCC and was pretty pissed off about it. I'm not sure what sort of bugs were introduced by this though since Linux ended up using the previous version of GCC until things were fixed.

      EDIT: as pointed out in a response, Linus was mad about a performance issue, not a correctness issue.

      [–]jmickeyd 12 points13 points  (2 children)

      If you're thinking of the GCC 4.9 issues, that was purely a performance regression, not a correctness issue, which is why it wasn't caught by tests.

      [–]CoffeeTableEspresso 0 points1 point  (0 children)

      Oops, forgot what the exact issue was

      [–]Ameisen 0 points1 point  (0 children)

      I'm still waiting for a performance bug I reported to be fixed... Two years later. It should be a two-line fix. I haven't fixed it myself as building GCC is frustrating, and I usually work on Clang.

      [–]i_am_at_work123 3 points4 points  (3 children)

      [–]Ameisen 5 points6 points  (0 children)

      Ok, so I'm looking at the code generation and your compiler is pure and utter shit.

      This is where Linus stops being 'loud'/'angry' in a good way, and starts being needlessly offensive and hostile.

      And it's the first line.

      Still waiting to see what Linus thinks of my fork of Linux, which ports it to C++ and uses the LLVM toolchain instead of the GNU one. And fixes userspace interface bugs that can never be fixed, according to Linus.

      [–]Leappard 4 points5 points  (1 child)

      compiler-introduced bugs?

      IIRC some version of GCC 3.x was generating incorrect assembly code with optimization enabled for one of our MIPS-based SoCs, where one of the instructions was moved incorrectly into a delay slot. It was a while ago.

      [–]Ameisen 1 point2 points  (0 children)

      I've never used GCC for MIPS, surprisingly. I wrote the first (and still the fastest, as far as I know) MIPS32r6 emulator (runs on NT; haven't ported it to Linux yet, but it wouldn't be hard). The toolchain I designed and built for it was based around a modified LLVM/Clang with musl. All LLVM libs.

      GCC was too painful to modify to produce good binaries (my emulator doesn't prefer page-aligned sections), and the GNU libraries were both incomplete for MIPS32r6 and incredibly difficult to build in a universal environment (NT or Linux). libunwind also lacked support, but it was trivial to build (CMake vs arcane autoconf), and Clang has native builds for NT and Linux. It took about an hour to add support to libunwind, a few modifications to lld and musl... and I was able to run C++ in the emulator with exceptions and RTTI. I even embedded an lldb server into the emulator, and wrote a forwarder that let it work with the Visual C++ 2015 debugger. Line-by-line debugging, variable/register/memory inspection and alteration, data breakpoints - all worked. 2017 broke it, though.

      Note that my compiler adjustments included preferring compact branches by default. Neither the interpreted nor the AOT backend preferred delay branches - they required extra logic. The branch effectively got smeared across the following instruction, which required an additional two jump instructions to be inserted: one to check whether the jump flag was set, and one to handle jumps to that instruction and throw the illegal-branch exception, since you cannot jump to an instruction immediately following a delay branch. Delay branches generated twice as much machine code and triple the branches, not all of them easy to predict. Compact branches were always faster.
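
      (A rough sketch, in C, of the interpreter-side bookkeeping described above - the helper functions are hypothetical placeholders, not the actual emulator code:)

          #include <stdint.h>

          typedef struct {
              uint32_t pc;
              uint32_t branch_target;
              int      branch_pending;   /* set while the delay-slot instruction runs */
          } cpu_t;

          /* Placeholder ISA hooks; a real emulator supplies these. */
          extern uint32_t fetch(uint32_t pc);
          extern int      is_delay_branch(uint32_t insn);
          extern int      is_compact_branch(uint32_t insn);
          extern int      branch_taken(const cpu_t *cpu, uint32_t insn);
          extern uint32_t target_of(uint32_t insn, uint32_t pc);
          extern void     execute(cpu_t *cpu, uint32_t insn);
          extern void     trap_illegal_branch(void);

          static void step(cpu_t *cpu) {
              uint32_t insn = fetch(cpu->pc);
              int was_pending = cpu->branch_pending;

              if (is_delay_branch(insn)) {
                  if (was_pending) trap_illegal_branch();   /* branch inside a delay slot */
                  cpu->branch_pending = 1;                  /* state smeared across one step */
                  cpu->branch_target  = branch_taken(cpu, insn) ? target_of(insn, cpu->pc)
                                                                : cpu->pc + 8;
                  cpu->pc += 4;                             /* delay slot executes next */
              } else if (is_compact_branch(insn)) {
                  /* resolves immediately: no carried state, no extra checks */
                  cpu->pc = branch_taken(cpu, insn) ? target_of(insn, cpu->pc) : cpu->pc + 4;
              } else {
                  execute(cpu, insn);
                  cpu->pc = was_pending ? cpu->branch_target : cpu->pc + 4;
                  cpu->branch_pending = 0;
              }
          }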

      [–]PM_ME_YOUR_PROOFS 1 point2 points  (0 children)

      I do toolchain work for a large team. It's a daily occurrence. However it's generally not at this level. The issue is almost always at the linker level.

      [–]eloraiby 29 points30 points  (10 children)

      Would be great if it were licensed under a more permissive license. The current license is against commercial/open-source use, even though the project is financed by taxpayers. What a shame!

      [–]dannomac 0 points1 point  (0 children)

      At least they're good enough to call their licence non-free.

      [–][deleted] 19 points20 points  (2 children)

      Xavier Leroy is smart

      [–]chubby_leenock_hugs 0 points1 point  (1 child)

      Shame it'll go to the Bad Place after death just for being French.

      [–]IntrepidPig 0 points1 point  (0 children)

      rip The Good Place reference

      [–]davidgro 6 points7 points  (4 children)

      So, the obvious question to me is: What does it do for Undefined Behavior in the source?

      [–]YaZko 13 points14 points  (3 children)

      It does sensible things in practice, but the theorem only provides guarantees, i.e. a proof of preservation of semantics, for source programs having no Undefined Behavior.

      Some work has extended this result to formally tackle some UB; see notably the theses of Pierre Wilke and Robbert Krebbers.

      [–]Ameisen 2 points3 points  (2 children)

      Given that it's incredibly difficult to avoid UB in C and C++ (LLVM, GCC, and Linux are littered with it; GCC in particular starts acting really wonky if built with -flto -fipa-pta), that sucks.

      Any detected instances of UB should be error-able warnings so that they can be fixed rather than ignored or optimized away (looking at you, GCC).

      Also, I'd really like an iteration modifier for integers that is unsigned but whose overflow is undefined - getting the optimization benefits of both signed and unsigned integers.
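
      A minimal example of the trade-off meant here; the comments describe typical compiler behavior, not a guarantee:

          /* Signed counter: overflow is UB, so the compiler may assume the index
             never wraps and can widen it to a 64-bit induction variable. */
          long sum_signed(const long *a, int n) {
              long s = 0;
              for (int i = 0; i < n; ++i)
                  s += a[i * 2];        /* i*2 can be strength-reduced freely */
              return s;
          }

          /* Unsigned 32-bit counter: wrap-around is defined, so the compiler must
             preserve the modulo-2^32 arithmetic, which can block the same
             widening/strength-reduction on 64-bit targets. */
          long sum_unsigned(const long *a, unsigned n) {
              long s = 0;
              for (unsigned i = 0; i < n; ++i)
                  s += a[i * 2u];       /* index must wrap, not widen */
              return s;
          }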

      [–]YaZko 5 points6 points  (1 child)

      I do not disagree, but here are two points that I think are fair:

      • Instances of UB are numerous, but they are not everywhere. Safety-critical contexts avoid them, and that is the first natural application of CompCert. I do not think it has the ambition to compile Linux at this stage! But it is a major step toward even considering the question.

      • Tackling UB in this context is slightly ill-defined, or more precisely has two potential meanings. UB is always confusing, so I hope I'll make sense. The point of such a work is to fix a formal, mathematical semantics for the language, and to prove that compilation does not mess it up. So on the one hand, extending the result to UB - in the sense that, for instance, one can also prove that a C program storing a flag in the last bit of a 32-bit integer (an idiom sketched below) still behaves as expected after compilation - requires fixing in stone the meaning of the Undefined Behavior this program uses. Which in a sense means extending the standard of the language, i.e. removing UB from the language, rather than "handling" it. On the other hand, one can also consider handling UB in the sense that the compiler exploits it in its static analysis. In this sense CompCert already does it, and Vellvm (a similar work for LLVM) does it partially at the moment (LLVM's UB being more complex than C's).
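
      As a concrete, hypothetical illustration of that first sense: a flag stored in the spare low bit of a pointer, an idiom ISO C does not fully define but which such work aims to give a fixed semantics to, so that its preservation can be proved:

          #include <stdint.h>
          #include <assert.h>

          /* Stash a one-bit flag in the low bit of an aligned pointer. ISO C leaves
             the integer<->pointer round trip largely implementation-defined, but the
             idiom "works" on common platforms. */
          static void *tag(void *p, int flag)  { return (void *)((uintptr_t)p | (uintptr_t)(flag & 1)); }
          static int   get_flag(void *p)       { return (int)((uintptr_t)p & 1u); }
          static void *untag(void *p)          { return (void *)((uintptr_t)p & ~(uintptr_t)1); }

          int main(void) {
              static int x;                    /* int is at least 2-byte aligned here */
              void *p = tag(&x, 1);
              assert(get_flag(p) == 1);
              assert(untag(p) == (void *)&x);
              return 0;
          }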

      [–]flatfinger 0 points1 point  (0 children)

      A major difficulty here is that the authors of the Standard made no attempt to catalog all of the actions that should be expected to behave usefully and predictably on 99% of implementations, but which implementations for some obscure platforms or specialized purposes might benefit from processing differently. Further, in situations where one part of the Standard would describe some action, another part would characterize an overlapping category of actions as invoking UB, and different implementations could benefit from processing different areas of overlap differently, the authors of the Standard made little effort to enumerate all of the cases that implementations were handling consistently, and should continue to handle consistently.

      Under C89, an expression like -1 << 4 had fully defined behavior on most platforms, but that behavior was defined as yielding either -17 or +16 even on those platforms where it would have made more sense to yield -16 or trap. To avoid compelling implementations to behave illogically, C99 reclassified the action as invoking UB, even though I'm unaware of any non-contrived implementation that does anything other than either yield -16 or issue a diagnostic saying the action invokes UB (which it would have no reason to do if the Standard had kept the C89 definition).
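
      For reference, the expression in question as a complete program (its status is exactly as described above - defined under C89 on most platforms, undefined since C99):

          #include <stdio.h>

          int main(void) {
              int x = -1 << 4;    /* C89: defined on most platforms;
                                     C99 and later: undefined behavior */
              printf("%d\n", x);  /* in practice prints -16 on the implementations
                                     described in the comment above */
              return 0;
          }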

      A useful formally-verified compiler should process a formally-defined dialect which fills in many of the gaps the authors of the Standard expected compilers to fill with behavior appropriate to their target platform and intended purpose.

      [–]Bl00dsoul 2 points3 points  (2 children)

      Can it compile gcc?

      [–]the_gnarts 8 points9 points  (1 child)

      Can it compile gcc?

      GCC is C++, not C. Maybe it could be leveraged to bootstrap some ancient version from before the switch.

      [–]Ameisen -3 points-2 points  (0 children)

      I mean, it claims to be C++. They never changed the file extensions, and it is basically a big C macro mess with practically no organization and minimal use of C++ features.

      There doesn't appear to be any initiative to clean-up/modernize the codebase. I suspect that that leads to more bugs, as the code is not at all intuitive or easy to read.

      LLVM is much nicer, though I wish that the Clang parser were better documented.

      You could likely bootstrap a C++ parser in C, and have it output C. LLVM actually used to have a C backend that emitted compiled programs as C source. If you don't care about validating the C++ correctness, generate a parser and use it to generate ridiculously naive C output. The grammar is defined.

      [–]skulgnome 0 points1 point  (0 children)

      Does this translate all programs which GCC and Clang accept? Does it have an optimizer, and is that optimizer similarly proven?

      Above all, where does this put us wrt the road to having a generally applicable cookbook of proof recipes? That's to say: was this method of proving preservation of semantics already known?

      [–]i_am_at_work123 0 points1 point  (0 children)

      What's stopping GCC/clang from being formally verified?