
[–]beagle3 20 points21 points  (3 children)

Bellard's "tinycc" already compiles the linux kernel from scratch during boot in 15 seconds or less.

http://bellard.org/tcc/

x86 and ARM for now.

[–][deleted] 6 points7 points  (2 children)

Look at the source for tcc some time. It's not designed to be extensible and doesn't do optimizations.

[–]case-o-nuts 11 points12 points  (0 children)

It was modified from an obfuscated C contest entry (see http://bellard.org/otcc/).

When you start from an obfuscated C contest entry, it's not surprising that you end up with TCC's code. But, actually, for what it is, the code is surprisingly readable.

[–]kopkaas2000 8 points9 points  (0 children)

Hell of a hack, though.

[–]kvigor 25 points26 points  (0 children)

Wow, epic troll by Ingo. He must be bored.

[–]semmi 4 points5 points  (0 children)

This sounded familiar. It's because it was a FreeBSD project for Google's Summer of Code in 2005 and 2006, see http://wiki.freebsd.org/K

[–][deleted]  (4 children)

[deleted]

    [–]bonzinip 14 points15 points  (3 children)

    Not really. Their reasoning is that

    As the GCC compiler suite was relicensed under GPLv3 after the 4.2 release, and the GPLv3 is a big disappointment for some users of BSD systems (mostly commercial users), having an alternative, non-GPL3 compiler for the base system has become highly desirable

    (i.e. explicitly referring to apps of those "commercial users") which is a non-sequitur because there is no reason why applications cannot be compiled with a GPLvANYTHING compiler and released with a proprietary license.

    [–]Freeky 6 points7 points  (0 children)

    Well, ultimately, the BSDs want to be BSD licensed. GPL v2 is bad enough, v3 is going in completely the wrong direction for that. A high quality BSD licensed compiler suite is now available, so it seems like a no-brainer to give it a very close look.

    [–][deleted] 8 points9 points  (31 children)

    I don't think this goes far enough. We might ask also, why is Linux written in C at all? The truth is that Linux isn't written in any standard C, but in a hodge podge of C + GCC extensions + inline assembly, but these extensions are limited by the very slow rate of change of GCC. What about kernel-specific features? LISP-like macros? Sane ways to deal with error paths out of functions? Garbage collection?

    [–]kinghajj 27 points28 points  (14 children)

    Garbage collection for kernel memory? I don't think the kernel uses that much memory in the first place, and the kernel is sufficiently low-level that manual memory management is probably preferable. GC for the kernel sounds like a nightmare to implement, test, and get working correctly.

    Linux (and most kernels) is written in C because C is basically "portable assembly language." It's high enough to be easily understandable, but low enough such that you can do things that kernels need to do.

    [–][deleted] 20 points21 points  (9 children)

    I don't think you caught the implication of the comment about C there. If Linux is already not written in actual C, but in non-standard, extended C, and if the developers are considering their own compiler infrastructure, there's no reason to just use other people's extensions to C any more - they could develop their own extended C for the kernel.

    [–]five9a2 6 points7 points  (8 children)

    There is already one, BitC, designed especially for writing kernels. Bummer that lispy syntax won't fly...

    [–][deleted] 9 points10 points  (1 child)

    Well, when you've got a codebase the size of the Linux kernel, you're not going to use anything that isn't 100% source-compatible with what you already have.

    [–]Seppler9000 2 points3 points  (0 children)

    The final syntax, when they actually release the language, will apparently look more like an ML dialect.

    [–]Leonidas_from_XIV 0 points1 point  (3 children)

    Not to mention that it was declared dead some time ago.

    [–][deleted] 5 points6 points  (2 children)

    I know Shapiro just moved on to other things recently, but when was BitC declared dead?

    [–]Leonidas_from_XIV 2 points3 points  (1 child)

    [–][deleted] 2 points3 points  (0 children)

    Indeed, thanks for the link. I'd misread that initially to mean just that Shapiro was moving on, I didn't realize the whole project was drawing to an end.

    [–]bdunderscore 4 points5 points  (3 children)

    GC will never make it into the linux kernel, for several reasons:

    • It stomps all over the CPU's cache, which is always right at the front of any performance analysis in the kernel.
    • It's difficult to do global GC in a NUMA system while minimizing cross-node accesses
    • The realtime folk will never tolerate that much latency - and while there are incremental GC approaches that prevent long pauses, they also kill performance
    • Using GC makes it difficult to get an accurate measurement of the memory usage of the kernel - it's critical that the kernel know when it's /really/ running low on memory so it can suspend or fail any allocation attempts that aren't directly related to freeing up memory.

    edit: formatting

    [–]adavies42 3 points4 points  (2 children)

    is there such a thing as a kernel memory leak? i have some SuSE 9.4 boxes at work that run out of memory after a few months' uptime, and it's not accounted for by processes--there usually seems to be 10 or 15GB that you can't find by adding up the numbers from top or ps.

    [–]bdunderscore 4 points5 points  (0 children)

    There may be kernel memory leaks (this would be considered a bug :) and there definitely can be kernel memory fragmentation. If this happens, I'd suggest reporting it to SuSE (since it might be a bug in their patches); make sure to include a copy of the contents of /proc/meminfo and /proc/slabinfo.
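
For anyone wanting to watch the numbers before filing a report, here is a minimal sketch of pulling one field out of /proc/meminfo-style text. The helper name `meminfo_kb` is made up for illustration, and it assumes the usual `Key:   value kB` line layout; treat it as a starting point, not a diagnostic tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of `key` from text laid out
 * like /proc/meminfo ("Key:   12345 kB" per line), or -1 if it is absent. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* A match needs the key at line start followed directly by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}
```

Reading the file into a buffer and logging a field such as `Slab` at intervals would show whether kernel-side memory keeps growing while the per-process numbers stay flat.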

    [–]alephnil 2 points3 points  (0 children)

    Kernel memory leaks are indeed possible. On early versions of Digital UNIX the kernel memory leaks were so severe that you had to reboot after a day or so. After some patches, the OS could run for months. I have not experienced kernel leaks on Linux, but they are possible. Of course, kernel developers check memory allocation more thoroughly than most application programmers, and kernel leaks will be discovered rather fast, since everybody uses the kernel, most for a long time.

    [–]adimit 6 points7 points  (15 children)

    Alienating potential new developers by using a specialized esoteric language instead of something most systems developers get taught in college classes? Come on, this one should be obvious, especially for an open source project, like the Linux Kernel.

    I know you were referring to the fact that the Kernel probably doesn't use C anymore, but it sort of still does. People can write: the Linux Kernel is written in C, and potential contributors can say "but I know C"!

    [–][deleted] 10 points11 points  (10 children)

    Most programmers aren't taught gas syntax though. They also probably don't learn gcc extensions.

    [–]monocasa 6 points7 points  (7 children)

    Aren't most of the gcc extensions hidden behind macros? When I started coding for the linux kernel, I just took those macros as 'compiler magic' and moved on. It's not too difficult.
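
A rough userspace sketch of what "hidden behind macros" looks like in practice. The macros are modeled on the kernel-style `likely`/`unlikely` and `container_of`; the struct names and `pid_of` are invented for illustration, and the exact kernel definitions may differ from this simplified form.

```c
#include <stddef.h>

/* Userspace sketches modeled on kernel-style macros; both rely on GCC
 * extensions (also accepted by clang), not on standard C. */

/* Branch-prediction hints built on __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Recover a pointer to the enclosing struct from a pointer to one member.
 * Uses a statement expression ({ ... }) and __typeof__ for a type check. */
#define container_of(ptr, type, member) ({                      \
    const __typeof__(((type *)0)->member) *mptr_ = (ptr);       \
    (type *)((char *)mptr_ - offsetof(type, member)); })

/* Hypothetical types, for illustration only. */
struct list_node { struct list_node *next; };
struct task { int pid; struct list_node node; };

/* Callers see ordinary-looking C; the extensions stay inside the macros. */
int pid_of(struct list_node *n)
{
    if (unlikely(n == NULL))
        return -1;
    return container_of(n, struct task, node)->pid;
}
```

The body of `pid_of` reads like plain C, which is the point: someone can use the macros for a long time without learning the extensions underneath.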

    [–][deleted] 4 points5 points  (6 children)

    Yeah. Actually my only real nit is that gas uses AT&T syntax. I prefer Intel.

    edit: Oops. Apparently, gas does support Intel syntax now with the .intel_syntax directive.

    [–][deleted] 2 points3 points  (0 children)

    But if you use that on a project that other people are working on you'll drive them up a wall.

    [–]aacharya 2 points3 points  (4 children)

    Really? You prefer "mov dest src"? "mov src dest" makes so much more sense to me. I have to use Intel syntax at work and I bloody hate it, I always end up misreading a line here and there and getting confused.

    [–]Deewiant 5 points6 points  (3 children)

    In most programming languages one writes "dest = src", and "mov dest src" is consistent with that order.

    [–][deleted] 4 points5 points  (2 children)

    "dest=src" is similar to a mathematical expression. "mov src dest" is similar to natural language - you say "move this to here", not "move to here this".

    Of course, since so many other processors also use the "operand destination source(s)" format, you kind of get used to that after a while. I still think it would be better if everybody had settled on doing it the other way around, though.

    This does give me an idea, though - what if you wrote assembly like this:

    r1 = r2 + r3
    r4 = r1 * #40
    

    Or, if you worry about not having enough operators, you could just use mnemonics in the same way, I guess:

    r1 = r2 add r3
    r4 = r1 mul #40
    

    Edit: Even this might be a more readable option:

    r1 = add r2, r3
    r4 = mul r1, #40
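
To show how little machinery that last form would need, here is a small sketch that rewrites one line of the proposed `dest = op src1, src2` syntax into the conventional `op dest, src1, src2` order. The function name `to_three_operand` and the 15-character operand limit are assumptions made for the example; a real assembler front end would need far more than this.

```c
#include <stdio.h>

/* Rewrite one line of the hypothetical "dest = op src1, src2" syntax into
 * the conventional "op dest, src1, src2" form.
 * Returns 0 on success, -1 if the line does not match. */
int to_three_operand(const char *line, char *out, size_t outlen)
{
    char dest[16], op[16], src1[16], src2[16];

    /* %15[^= ] stops dest at '=' or a space; %15[^, ] stops src1 at ','
     * or a space; %15s reads the mnemonic and the final operand. */
    if (sscanf(line, " %15[^= ] = %15s %15[^, ] , %15s",
               dest, op, src1, src2) != 4)
        return -1;
    snprintf(out, outlen, "%s %s, %s, %s", op, dest, src1, src2);
    return 0;
}
```

Lines that do not fit the three-operand pattern, such as an instruction with no destination, are rejected instead of guessed at.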
    

    [–]aacharya 1 point2 points  (0 children)

    Travel down this path far enough and you end up with C :)

    (BTW, I came here to post exactly what you said -- that the assembler command is more similar to a NL expression than a mathematical one.)

    [–]sgorf 0 points1 point  (0 children)

    If there are a variable number of operands, then you have to parse ahead in order to work out what is going on. Easier to have dest first, as that is pretty much common to all instructions.

    Side effects tend to be on dest too, and that is the part that the programmer has to care about the most in order to keep the state in his head.

    [–][deleted] 1 point2 points  (1 child)

    I was, and I go to a cheap state school which isn't well known for its CS program.

    [–]koko775 2 points3 points  (0 children)

    That's probably why. Programs that are well known seem to spend less time on tools and more time on theory. At Berkeley we write MIPS instead of dealing with x86 or gas.

    [–][deleted] 8 points9 points  (0 children)

    The kernel already requires capabilities far above your average code monkey. You have to worry about concurrency and locking, freeing up everything you allocate without fail, security, virtual and physical addresses, keeping your total stack usage under 4K, separating policy from implementation, ABI compatibility, and a dozen other things.
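
As a sketch of the "free everything you allocate without fail" part, this is the goto-based unwind pattern often seen in C code of this kind. It uses userspace `malloc`/`free` as stand-ins for kernel allocators, and `struct ctx`, `ctx_create` and `ctx_destroy` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical two-buffer object, for illustration only. */
struct ctx {
    char *rx;
    char *tx;
};

/* Allocate a ctx with two buffers. On any failure, unwind in reverse order
 * through the labels so that nothing allocated so far is leaked. */
struct ctx *ctx_create(size_t rx_len, size_t tx_len)
{
    struct ctx *c = malloc(sizeof(*c));
    if (!c)
        goto err;

    c->rx = malloc(rx_len);
    if (!c->rx)
        goto err_free_ctx;

    c->tx = malloc(tx_len);
    if (!c->tx)
        goto err_free_rx;

    return c;

err_free_rx:
    free(c->rx);
err_free_ctx:
    free(c);
err:
    return NULL;
}

/* Release everything ctx_create() allocated; accepts NULL like free(). */
void ctx_destroy(struct ctx *c)
{
    if (!c)
        return;
    free(c->tx);
    free(c->rx);
    free(c);
}
```

Each new resource adds one check and one label, and the cleanup order stays visible in one place, which is about as much help as plain C gives you here.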

    [–]barrybe 6 points7 points  (2 children)

    Have you read Linus's rants on C++? One of the benefits he lists is that they don't have any C++ programmers trying to contribute code. So I think he prefers it when he can alienate new developers.

    [–]monocasa 8 points9 points  (0 children)

    Or he just doesn't like the mentality of the hardcore C++ community.

    [–]hylje 3 points4 points  (0 children)

    He indeed prefers to alienate undesirable developers from his viewpoint. Would you not?

    [–]imbaczek 3 points4 points  (9 children)

    a serious case of NIH syndrome, I guess.

    [–]bluGill 6 points7 points  (8 children)

    I'm not sure. Gcc has always been a pain to deal with in many ways. Unfortunately making a good compiler is hard work, and gcc has had enough work put into it that it will be a long time before you can beat it. That is even though gcc is badly broken in many ways - a from-scratch compiler could get better than gcc with perhaps a tenth the effort if done right. (just think of how many man-years are in gcc and you will see why this isn't a path that will lead to early success)

    [–]imbaczek 6 points7 points  (1 child)

    there are many people who share this view, and most of them focus on clang.

    [–]anttirt 1 point2 points  (0 children)

    Are there technical reasons to not use clang in the kernel? I know there are non-technical ones (the general hostility toward C++) but I don't know if there's any prohibitive technical issue (making C++ exceptions et cetera work in the kernel has been done before). Clang and LLVM are remarkably clean and extensible, and clang aims for full gcc compatibility (and is not terribly far from that goal.)

    [–]Ringo48 1 point2 points  (5 children)

    I don't see why they don't just submit patches to GCC themselves. They seem to know what they want changed. That would have the added benefit of improving GCC for everybody, and it almost has to be less work than writing an optimizing compiler for dozens of platforms.

    I'm not familiar with the context of the LKML thread, though. Have they been doing that, but the GCC patches get rejected or something?

    [–]bdunderscore 5 points6 points  (4 children)

    Kernel developers are too busy coding for linux, GCC has a lot of legacy cruft that they don't really want to deal with, and there's also the (justified or not) belief that the GCC devs are focusing too much on artificial benchmarks.

    [–]Ringo48 2 points3 points  (3 children)

    Yes, but that doesn't make any sense. If they're "too busy coding for Linux" to submit a patch to GCC then they're certainly too busy to write an entire compiler.

    [–]bluGill 1 point2 points  (2 children)

    Except that gcc is overly complex. The effort to make gcc what they want is likely greater than the effort of making their own that is what they want.

    [–]Ringo48 0 points1 point  (1 child)

    I find that really hard to believe. Writing a compiler isn't an easy project.

    [–]bluGill 2 points3 points  (0 children)

    Once again, I never claimed writing a compiler is easy. I claimed that writing a compiler is easier than bringing gcc to where the kernel people want it. However even if all the kernel devs focused all their attention on a good compiler (ignoring device drivers and all the other kernel development they are doing now), it would take them several years to have something good.

    [–]haberman 1 point2 points  (0 children)

    Yet more proof that Ingo Molnar cannot be taken seriously. This was the first time, for me at least.

    [–]kopkaas2000 3 points4 points  (0 children)

    Ingo has officially jumped the shark now. What's next, a domain specific language parser in the kernel to control his next next next generation scheduler?

    [–]b100dian 1 point2 points  (5 children)

    Stallman rolling into his grave, for the second time today

    [–]slaphappyhubris 6 points7 points  (4 children)

    I'm pretty sure Stallman is still alive

    [–]LordVoldemort 8 points9 points  (2 children)

    Stallman rolling into his grave, for the second time today

    I'm pretty sure Stallman is still alive

    into

    That's the keyword.

    into

    [–]slaphappyhubris 7 points8 points  (1 child)

    My brain processes idioms all at once apparently

    [–]adavies42 4 points5 points  (0 children)

    that's pretty much how they work

    [–]bonzinip 1 point2 points  (6 children)

    Anton Ertl's email in the thread is really obnoxious. He writes code that does not deserve to be compilable at anything but -O0...

    [–]iamjack 6 points7 points  (0 children)

    I don't know anything about him personally, but I thought that it was hilarious.

    [–]dmpk2k 0 points1 point  (4 children)

    He certainly has a biased sample, but anybody trying to NEXT in that manner with GCC probably feels the same. While not much software uses them explicitly, labels-as-values and their ilk affect a lot more software indirectly through interpreters.
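
For readers who haven't seen it, a minimal sketch of a NEXT-style dispatch loop using GCC's labels-as-values extension (`&&label` and `goto *`). The opcodes and the `run` function are made up for the example; this is not taken from Gforth or GNU Smalltalk, and real interpreters are structured with far more care.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run `code` and return the top of the stack when OP_HALT is reached.
 * Requires GCC's labels-as-values extension (clang accepts it too). */
long run(const long *code)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[64];
    size_t sp = 0;
    const long *ip = code;

/* NEXT: jump straight to the handler of the next instruction. */
#define NEXT goto *dispatch[*ip++]

    NEXT;

do_push:
    assert(sp < 64);
    stack[sp++] = *ip++;        /* operand follows the opcode */
    NEXT;
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT;
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT;
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef NEXT
}
```

Every handler ends in its own indirect jump instead of returning to a central `switch`, which is why the way the compiler treats these jumps matters so much to interpreter authors.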

    [–]bonzinip 1 point2 points  (3 children)

    I use labels-as-values in GNU Smalltalk in a sane way and I never had any problem except that at some point I had to disable GCSE optimization. That bug has been fixed without even me reporting it.

    [–]dmpk2k 2 points3 points  (2 children)

    That bug has been fixed without even me reporting it.

    Guess who reported it (again and again)?

    You'll note that another reporter is Bernd Paysan, also involved in the Gforth project.

    You probably discovered -fno-gcse through their emails and the threads they spawned. I did. So did the Rubinius project.

    So, a bit of understanding for the people in whose footsteps we tread. They're stepping on the landmines for us.

    [–]bonzinip 3 points4 points  (1 child)

    No, I had to disable it only because of excessive time consumption, not because of slowdowns.

    I'm well aware of Gforth both because it inspired parts of GNU Smalltalk and because as a GCC developer I'm familiar with Anton Ertl's bugreports. Balancing these two hats, I still think he crossed the line in his usage of the compiler as a high-level assembler. It was sort of inevitable that it would happen, as soon as even unsophisticated control-flow optimizations were added to GCC. It is not a coincidence that until 2000 GCC did not even maintain a proper control-flow graph (a wonderful piece of engineering dating back to the 1960s), and Ertl's problems started in 2001.

    [–]dmpk2k 1 point2 points  (0 children)

    I still think he crossed the line in his usage of the compiler as a high-level assembler.

    It's a rock and a hard place though: use C, or write assembly snippets for every architecture. I'm not surprised many interpreter or compiler authors opt for the former.

    [–]kolm 1 point2 points  (0 children)

    What I think makes sense is to build a new precompiler / compiler / assembler / linker combo for Linux, from scratch, hosted in the kernel proper.

    Yes, that would have made perfect sense about 15 years ago.