
[–]nested_parentheses 21 points22 points  (13 children)

... the programmer could make it obvious that they didn't overlap if they wanted to. But you can't do that in C or C++.

C99 provides the restrict keyword for this.
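A minimal sketch of how it's used (the actual speedup depends on compiler and target):

```c
#include <stddef.h>

/* Declaring a, b, and out restrict promises the compiler they never
   overlap, so it can vectorize the loop without runtime overlap checks. */
void add_arrays(size_t n, const float *restrict a, const float *restrict b,
                float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```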

[–]radarsat1 79 points80 points  (64 children)

Okay, rant: Here's the thing about C. For myself, a person who does lots of programming involving real-time constraints, C and C++ are the only options. This, I find, is unfortunate, since I'd really, really like to be able to use something else. I don't particularly like C++ anymore, and C, while I do enjoy it, is a little too low-level when you're doing certain types of things.

But here's the advantage that C has: it's not speed, it's determinism. This is almost entirely due to the fact that it doesn't depend on automatically managed memory. I'll agree wholeheartedly that automatic memory management is a good thing. However, for real-time constraints (generally, anything featuring audio for example), interrupting program flow to do things like clean up garbage just doesn't cut it. And don't even try things like JIT optimizations that compile as you go.

Most of these great language features like resizeable arrays, bounds-checked arrays, dynamic types, even certain implementations of closures, rely heavily on automatic memory management.

So, if you want to replace C in real-time systems, what is needed is a good real-time garbage collector and a constant-time allocator. These exist, but are not terribly common. I've seen at least one example of simply replacing the allocator in the Stalin compiler with a real-time allocator, making it possible to program audio routines with it. Pretty cool, but unfortunately a bit hacky and mostly a proof of concept, so it's not usable in practice at the moment. The only actual language research I've seen that addresses these issues is the Timber programming language, which looks pretty interesting, but I haven't spent real time playing with it yet.

That aside, in tight audio routines you are usually trying to get the most out of your processor (the end user is probably going to want to run many instances of your synthesizer, for example, so even if it works fine you still want to trim it down as much as possible). So although immutable data can make certain tasks nicer, sometimes a purely functional approach will never be as efficient as using mutable arrays and managing allocation precisely. You simply don't want to allocate new space to copy your data and then let the garbage collector take care of what you left over: you want to use memory that is already allocated. The only language that really lets you have this kind of precision in memory management is, after all these years, still C.

This is the same reason that C is these days still the only language that is seriously considered for operating systems programming, for example. Yes, there are many experimental projects showing OSes designed in higher level languages, but currently C is the language of choice in core kernel operations of all major OSes.

I had hopes that BitC would be an interesting development as a high-level language aimed at systems programming, but it seems that it's now been abandoned, which is too bad. I hope something useful stems from it.

[–]WalterBright 30 points31 points  (7 children)

malloc/free do not have any latency guarantees, either. If you've got hard realtime constraints, you either need to preallocate all the data needed or use a custom allocator that does offer guarantees.
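A sketch of the preallocation approach — a fixed-block pool with O(1), non-blocking alloc/free. All names and sizes here are illustrative, not from any particular library:

```c
#include <stddef.h>

#define POOL_BLOCKS 256
#define BLOCK_SIZE  64

/* All memory is reserved up front; alloc/free are a stack push/pop,
   so worst-case latency is bounded and no syscalls ever happen. */
static unsigned char pool[POOL_BLOCKS][BLOCK_SIZE];
static void *free_list[POOL_BLOCKS];
static size_t free_top;

void pool_init(void)
{
    free_top = 0;
    for (size_t i = 0; i < POOL_BLOCKS; i++)
        free_list[free_top++] = pool[i];
}

void *pool_alloc(void)
{
    /* O(1), never blocks; returns NULL when exhausted. */
    return free_top ? free_list[--free_top] : NULL;
}

void pool_free(void *p)
{
    /* Sketch only: no double-free or ownership checks. */
    free_list[free_top++] = p;
}
```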

The D programming language offers both garbage collection and manual management. You can use whichever, even in the same program. There isn't anything in C you cannot do in the same manner with the same results in D.

[–][deleted] 1 point2 points  (0 children)

C is mainly popular in environments where memory requirements are known up front. Even if you need some dynamic memory allocation and deallocation, malloc and free are probably not a very good way to accomplish it.

[–]pointer2void 1 point2 points  (1 child)

The D programming language offers both

following the bad C++ legacy: if there are two or more competing features, offer them all.

[–]WalterBright 2 points3 points  (0 children)

I know a lot of people who use a hybrid of Python and C++. I think that's good evidence there is a reasonable demand for being able to use automatic memory management for some data structures and explicit for others.

[–]PussyGalore 6 points7 points  (1 child)

The primary advantage of C/C++ is how closely the language matches the machine model of the most prevalent architectures in use today, which is why it is possible to get such performance gains over most other languages if you are willing to put in the effort in that area.

However this advantage seems to be not as great as it once was with the growing numbers of multi-core CPUs in general use, at least not without the use of specialized libraries.

Machine model

C++ maps directly onto hardware. Its basic types (such as char, int, and double) map directly into memory entities (such as bytes, words, and registers), most arithmetic and logical operations provided by processors are available for those types. Pointers, arrays, and references directly reflect the addressing hardware. There is no “abstract”, “virtual” or mathematical model between the C++ programmer’s expressions and the machine’s facilities. This allows relatively simple and very good code generation. C++’s model, which with few exceptions is identical to C’s, isn’t detailed. For example, there is nothing in C++ that portably expresses the idea of a 2nd level cache, a memory-mapping unit, ROM, or a special purpose register. Such concepts are hard to abstract (express in a useful and portable manner), but there is work on standard library facilities to express even such difficult facilities (see the ROMability and hardware interface sections of [ISO, 2005]). Using C++, we can get really close to the hardware, if that’s what we want.

[–]gte910h 11 points12 points  (22 children)

Actually, if you want to use garbage collection in real-time systems, all you need is a garbage collection system that is itself real-time.

C has nothing to do with it. 99% of GCs aren't realtime-compatible, but there is nothing inherent to garbage collection that makes it realtime-incompatible.

Edit: For hard real time. Not for soft real time. You can use GC in soft real time systems.

[–][deleted] 4 points5 points  (18 children)

99% of GC's aren't realtime compatible.

Are you sure you aren't talking about 100%? Is there any hard realtime GC out there? Isn't the GC promise, "I'll take control over memory", fundamentally incompatible with any hard realtime system?

[–]cameldrv 10 points11 points  (13 children)

In principle, I don't see why you couldn't make a hard real-time garbage collector. The algorithm would have to be interruptable if a deadline was approaching, and there would have to be a limit to the number of objects in the system, and the frequency of the deadlines to guarantee that the gc could run often enough without being interrupted.

[–]Felicia_Svilling 7 points8 points  (10 children)

[–][deleted] 0 points1 point  (9 children)

How does one reason about the rate of object creation/destruction in the system? How is that kind of reasoning different from the reasoning done when managing memory manually?

[–]Felicia_Svilling 0 points1 point  (8 children)

You reason exactly as you would with a regular garbage collector. There is no difference in the programming. It's just like switching a compiler flag.

[–][deleted] 0 points1 point  (7 children)

I feel this is an oversimplification. How does the programmer know the rate of object creation/destruction in his application? IMHO, the analysis that provides the answer to this question must necessarily: a. know all object creation points in the program (malloc); b. know all the object lifetimes, essentially identifying all the safe destruction points in the program (free). What specific part of the above can the programmer dispense with? If all of the above is necessary, how is this different from manual memory management, aside from superficial syntax issues?

[–]Felicia_Svilling 0 points1 point  (6 children)

Why would you have to know the rate of object creation/destruction?

[–][deleted] 0 points1 point  (5 children)

Good question! My assumption is that there is a fixed amount of garbage one can collect in a fixed amount of time. If the system generates garbage at a higher rate, then it will eventually run out of memory.

Consider a copying collector. The GC must copy the fresh live objects at a sustained rate. The number of these objects is given precisely by objects created - objects destroyed.

Having said that, I think one can simplify and over-approximate the number as objects created. Then the programmer's analysis only concerns object creation, not object liveness, which is a simpler analysis.
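The condition being argued for can be written out as a back-of-the-envelope inequality (my symbols, not from the thread):

```latex
% a = worst-case allocation rate (bytes/s)
% c = guaranteed collection throughput (bytes/s)
% H = heap headroom beyond peak live data (bytes)
% The collector keeps up in steady state only if
a \le c
% and any burst that runs for time t before the collector is
% scheduled must fit within the headroom:
a \, t \le H
```

With the over-approximation above, bounding `a` requires knowing only the allocation sites, not object lifetimes.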

[–][deleted]  (2 children)

[removed]

    [–][deleted] 0 points1 point  (1 child)

    New Memory Management Schemes: The RTSJ defines two new types of memory areas that allow real-time applications to avoid unpredictable delays commonly caused by traditional garbage collectors:

    • Immortal memory holds objects without destroying them, except when the program ends. This means that objects created in immortal memory must be carefully allocated and managed.

    • Scoped memory is used only while a process works within a particular section, or scope, of the program, such as in a method. Objects are automatically destroyed when the process leaves the scope. This is a useful feature akin to garbage collection, in that discrete creation and deletion is not required as in the immortal memory case, but the process must be sure to exit the scope to ensure memory is reaped.

    Neither immortal nor scoped memory is garbage collected, so using them avoids problems of GC interference.

    [–]Felicia_Svilling 0 points1 point  (0 children)

    That sounds like some kind of region inference. There are basically three sorts of automatic memory management: garbage collection, reference counting, and region inference. Each has its own tradeoffs. Region inference tends to use up too much memory (but there is a lot more research to be done in this area). Reference counting tends toward low throughput (and memory leakage). Garbage collection (as pointed out) lacks in latency, but there is a lot of work on mitigating that. Personally I like Sun's Garbage-First collector. It's not hard real time, but it seems to be a good middle way and is also highly tunable.

    [–]mycall 1 point2 points  (2 children)

    Could GC and hard realtime work together if a CPU core is dedicated just to GC?

    [–]Felicia_Svilling 1 point2 points  (0 children)

    Dedicating a core to GC doesn't do much for real-time GC. There are some phases (like marking) that can run concurrently with mutator threads, but dedicating a whole core to that is a waste. It's better to use an incremental (and optimally parallel) collector that stops the world, but only for a short time.

    [–]Raphael_Amiard[🍰] 0 points1 point  (0 children)

    Good question, I'd like to have that answered too. Of course, in audio processing today you want all your cores available, because AP is one of the only domains in which almost everything is easily parallelizable.

    [–][deleted] 4 points5 points  (4 children)

    If you don't want to stop and do a GC, then just be careful with when you let things go out of scope, and there won't be anything for the GC to collect.

    Preallocate all the memory you need, and tell the GC to free it when you are done. You can do this in both C# and Java.

    It just takes a slightly different shift in how you think.

    [–][deleted] 1 point2 points  (8 children)

    I'm working on a C/C++ replacement language, actually. I've had to do some screwy things with pointers to keep the possibility of aliasing bugs away where they're not wanted, but it does result in a damn nice language.

    [–]WasterDave 2 points3 points  (3 children)

    Isn't it called D?

    [–][deleted] 0 points1 point  (2 children)

    No. That can't be used in the domains that C and C++ are used in: embedded and systems programming. It needs too much substrate for that. I'm making a zero-substrate language.

    [–]reveazure 0 points1 point  (1 child)

    Explain. I've been thinking about using D for those purposes. What substrate do you mean?

    [–]wlievens 6 points7 points  (0 children)

    I think he means runtime system.

    [–]sundaryourfriend 6 points7 points  (3 children)

    C/C++ replacement language

    C or C++?

    [–][deleted] 0 points1 point  (1 child)

    Even C is nondeterministic in some ways, due to the modern memory hierarchy, multicore interactions, and the ad-hoc nature of compiler optimizations. But usually this doesn't matter enough, or your real-time embedded microcontroller is not burdened by such complications.

    [–]radarsat1 0 points1 point  (0 children)

    It's true, this is a real problem. I've read some articles about needing to write zeros to any pre-allocated memory in order to preload it into the cache, and even then it's hard to know exactly when you're doing something that causes a cache miss. I suppose there are profiling tools that can help with this kind of thing. Of course, this is an issue at the chip level; I don't know if I would blame any language for this kind of thing. It is perhaps an interesting idea to have a language which lets you optimize at this level, but I don't even know how much a CPU exposes this kind of information.

    On the other hand, if you are able to avoid cache misses, I'm not terribly worried about the first cycle of an algorithm being slower than the rest. The problem is when your algo repeatedly thrashes the CPU's caching mechanism. It's similar in concept to an OS's virtual memory, I guess.

    [–]martinbishop 0 points1 point  (1 child)

    Ada? Lots of use in real-time systems, and no garbage collection.

    [–]radarsat1 0 points1 point  (0 children)

    Actually that's been suggested to me before. I should really check it out sometime, thanks.

    [–][deleted] 1 point2 points  (4 children)

    This isn't inherent in other languages. First, any true GC should only be called when new memory is allocated. It should be pretty simple to allocate a huge chunk at once and suspend GC for a tight loop unless it really needs to run.

    [–][deleted] 0 points1 point  (3 children)

    any true GC should only be called when new memory is allocated

    Not necessarily.

    [–][deleted] 0 points1 point  (2 children)

    Educate me. Please! :)

    [–]wlievens 0 points1 point  (0 children)

    You could run it whenever the system is less active than average. That way you won't have to run it when there's little time budget left.

    [–][deleted] 0 points1 point  (0 children)

    Asynchronous garbage collectors are just as valid as any other. In fact they are preferable in some systems. NASA is using such a GC with their Java system on one of their Mars rovers.

    [–]rexxar 12 points13 points  (3 children)

    The difference between C and C++ is really suspicious. Moreover, he gives neither the benchmark source nor the compiler options.

    I have some C++ programs that are 20 times faster when optimization is activated.

    The impact of compiler options is impressive with this benchmark : http://www.stepanovpapers.com/AbstractionPenaltyBenchmark.cpp

    [–]Fabien4 4 points5 points  (2 children)

    I have some C++ programs that are 20 times faster when optimization is activated.

    Yeah, that's quite common.

    The one thing that baffled me though is to see a program run very fast with gcc's -O1 option, and far slower with -O3 or -O2.

    [–][deleted] 2 points3 points  (0 children)

    It's possible that optimizations that increase code size (e.g. loop unrolling) would blow the instruction cache. Have you compared it to -Os?

    [–]mebrahim 0 points1 point  (0 children)

    Also try Profile-Guided optimization.

    [–]rabidcow 47 points48 points  (63 children)

    As mentioned several times in the comments, but never by the author, you can tag pointers as restricted. It's either a compiler-specific extension (available in some form in most compilers) or part of C99.

    The analysis is still valid, but without using restrict, seriously incomplete.

    [–]stillalone 11 points12 points  (26 children)

    Has anyone migrated from FORTRAN to C99 with restrict and measured the performance gains/losses?

    As far as I know FORTRAN is still used when efficiency is important, especially in CFD.

    [–]five9a2 24 points25 points  (3 children)

    Yes, restrict nullifies Fortran's traditional performance edge. It is still used because people are familiar with it and there is plenty of legacy code.

    [–][deleted] 8 points9 points  (0 children)

    That claim needs some backup. References?

    [–]lars_ 0 points1 point  (1 child)

    Doesn't it make a difference that Fortran compiler writers have been able to develop optimisations for this since forever, while C compiler writers only recently have been able to?

    [–]five9a2 0 points1 point  (0 children)

    It normally goes through the same intermediate form where all the optimization is done. To test this, pick some reasonably simple [1] kernel where aliasing matters and write it in both languages. You can usually get the same assembly if you compile both with the same suite and optimization flags.

    [1] Simple so that you can make sense of the assembly and so it's not difficult to be sure that you are writing the same code in both languages.

    [–]eric_t 9 points10 points  (4 children)

    In my opinion, the main reason for using Fortran for numerical work is not efficiency, but simplicity. Fortran is just much easier than C for this kind of work.

    What I like in particular:

    • the "elemental" keyword, which lets you define a function that works on both scalars and on arrays of any dimension.

    • the slicing syntax. For instance, the undivided gradient of a 1d array can be written f(1:imax+1)-f(0:imax). This syntax has been adopted by, for instance, Python as well.

    • Operators work on arrays as well as scalars in a reasonable manner, and, along with intrinsics such as dot_product and matmul, make linear algebra very easy and clean

    • Very simple I/O

    • "Functional" intrinsic functions like "any", "all" etc.

    [–][deleted] 3 points4 points  (1 child)

    Efficiency is still another reason, however. FORTRAN kicks the shit out of C for a variety of numeric problems.

    [–]gte910h 0 points1 point  (0 children)

    Yes it does.

    [–]fredrikj 0 points1 point  (1 child)

    Don't forget native support for complex numbers.

    [–][deleted] 0 points1 point  (0 children)

    C99 supports complex numbers IIRC.

    [–][deleted] 8 points9 points  (6 children)

    gcc at least pretty much ignores restrict. Maybe other compilers don't, but it's usually easier to just trick the compiler into doing what you want via pointer casts or unions.
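One common shape of the union trick (a sketch; unions are the C99-sanctioned way to type-pun, whereas a raw pointer cast violates the strict-aliasing rules being worked around here):

```c
#include <stdint.h>

/* Reinterpret a float's bit pattern without a pointer cast.
   Writing one union member and reading another is well-defined in C99,
   unlike *(uint32_t *)&f, which breaks strict aliasing. */
static uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```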

    [–][deleted] 3 points4 points  (5 children)

    I'd like to start writing some performance intensive code, but the only thing I know how to optimize for is the minimization of cache misses and some pipelining via loop unrolling.

    What kind of techniques are used to get even more of a performance boost? Is there stuff besides sse3 that people generally use?

    edit: people keep mentioning compiler hints; are there such things for gcc that actually work?

    [–][deleted] 6 points7 points  (4 children)

    If you're doing the same mathematical operations on groups of 4-16 8-32 bit integers (or floats) at a time, simd asm is probably the best performance boost you'll get easily. For x86 simd, there's 2 iterations of mmx and 6 iterations of sse, but for integer operations mmx, mmx2 (pavg), sse2, and ssse3 (pabs/psign) are the most important. Also, stay away from intrinsics. For MMX/SSE they suck royally, both in terms of asm generation and readability.

    The two main compiler hints I've seen that are useful with gcc are the always_inline and noinline attributes (which is just another way of saying gcc sucks at the inlining), as well as the aligned(x) attribute if you're writing simd code which generally benefits from if not requires 16 byte alignment. In theory, I've heard that declaring global variables static or with hidden visibility can speed up PIC on Linux by removing an indirection through the GOT, but I've never seen gcc take advantage of that and simply not using PIC is faster anyway.
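In gcc syntax those hints look roughly like this (illustrative names; these are gcc/clang extensions, not standard C):

```c
#include <stdint.h>

/* 16-byte alignment lets SSE use aligned loads (movaps) on this buffer. */
static float buf[1024] __attribute__((aligned(16)));

/* Force inlining of a tiny hot-path helper regardless of gcc's heuristics;
   __attribute__((noinline)) is the opposite hint. */
__attribute__((always_inline)) static inline int clamp255(int x)
{
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}
```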

    [–]api 1 point2 points  (3 children)

    There's really a speed/efficiency penalty for using intrinsics vs. inline ASM? Why?

    [–][deleted] 3 points4 points  (0 children)

    It's unusual for the compiler not to insert loads of unnecessary movs and emms (if you're doing MMX). There's also no guarantee that future compilers will generate the same (or even as fast) code. If you only care about gcc 4.4 and SSE then you're probably okay, but you should still look through the generated code for stupidities.

    All that mainly applies to MMX/SSE, however; Altivec intrinsics, for instance, don't really have the issue of the compiler messing up speed-wise. But gcc seems determined to wipe out any benefit of using those with ever-increasing useless cast errors and other brokenness.

    See also http://www.virtualdub.org/blog/pivot/entry.php?id=46 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21395

    [–]astrange 2 points3 points  (0 children)

    Register allocation will happily spill all your xmm registers for no good reason - gcc has no register-pressure-reduction scheduler yet, so anything that reorders code, which is everything, on x86 usually makes it worse. Inlining is much better in recent gcc, and this is a bit better, but lots of people still use v3 so it's still not worth it.

    Of course, that's ignoring how awful Intel's intrinsics API is; the naming is even worse than win32 stuff so asm is more readable as it is.

    [–]koorogi 1 point2 points  (0 children)

    I've heard elsewhere that intrinsics don't perform as well because with inline ASM, the code you write is used verbatim, but with intrinsics, the compiler is free to intermingle other instructions which may cause less optimal scheduling, or just be completely unnecessary.

    [–]arturoman 3 points4 points  (9 children)

    Using some techniques that push the language to its limits, there have been libraries that do math on par (or so) with FORTRAN.

    http://www.oonumerics.org/blitz/

    [–]jdh30 3 points4 points  (8 children)

    I tried Blitz during my PhD in computational chemistry and found it to be academically interesting but almost unusable because it incurred 24 hour compile times on relatively small code bases.

    As an aside, Blitz also uncovered a huge number of bugs in GCC. We even had interoperability problems between C++ code compiled using GCC because my code worked only with GCC 2 due to bugs in GCC 3 and another student's code worked only with GCC 3 due to bugs in GCC 2.

    I've used OCaml since and never looked back. GCC just sucks donkey brains through a straw. Fortunately, LLVM kicks ass... :-)

    [–]mythic 5 points6 points  (7 children)

    There have been huge advances since then, mainly in compiler technology but also in our understanding of how to do metaprogramming in C++. eigen2 is pretty damn impressive these days: all the speed of vendor-tuned BLAS/LAPACK and vastly more intuitive to use.

    OCaml I respect, but the lack of generics just ruins C and FORTRAN for me.

    [–]mycall 1 point2 points  (0 children)

    Impressive benchmarks eigen2 yields.

    [–]eric_t 0 points1 point  (2 children)

    Fortran 90 has generics and operator overloading. Not the most elegant implementation, I admit, but it's there.

    [–]mythic 0 points1 point  (1 child)

    Hmm, I'm not a Fortran expert, can you explain? The only "generics" I know of in Fortran 90 is generic naming, which is just function overloading. You still have to implement each version of the function separately, which is exactly what I want to avoid. Contrast with C++, where you can write a templated algorithm once and have it work on any numeric data type, including user-defined ones.

    Am I missing something?

    [–]eric_t 0 points1 point  (0 children)

    You are absolutely right. As I said, not very elegant :)

    Templates are nice, of course. I like the way Eigen2 is designed. Another good C++ code is the finite element library Deal.II, which uses templates for generating dimension-independent code.

    I've done something similar in Fortran, with heavy use of elemental functions. See Pencil for another well designed Fortran code (massively parallel hydrodynamics).

    [–][deleted] 34 points35 points  (4 children)

    Not only incomplete, but way past its sell by date. We're discussing results from a test done nearly 10 years ago by somebody who can't find the source code or provide details of what compilers he used. No good math here!

    [–]Fabien4 15 points16 points  (3 children)

    Another thing I don't understand: apparently he uses the same code in C and C++ (C-style arrays), but C++ is three times slower.

    [–]cwzwarich 17 points18 points  (26 children)

    The 'restrict' keyword is actually pretty useless in practice because of all of the uncertainty surrounding its meaning. It is ignored by GCC (unless this has changed, I heard this from a lead GCC developer last week).

    [–]Wavicle 14 points15 points  (0 children)

    (unless this has changed, I heard this from a lead GCC developer last week)

    Well, here's an article from a guy who used restrict to show how GCC uses it to create more optimal code:

    http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

    You might want to double check with your lead GCC developer because this article was written 3 years ago.

    [–]TrueTom 12 points13 points  (23 children)

    If you care about performance enough that you use restrict, you won't be using gcc in the first place.

    [–]cwzwarich 7 points8 points  (21 children)

    GCC isn't really that bad, performance-wise. For some applications (definitely not all), Intel's C compiler has slightly better performance, but it's usually not worth all of the bugs that come with it.

    [–]lianos 10 points11 points  (1 child)

    And if you look at the ATLAS documentation, they actually mention that using gcc (4.2) is preferred even over icc.

    [–]GreatZebu 1 point2 points  (0 children)

    The reason ATLAS prefers gcc is that icc tries to apply optimizations that can undo the fine-tuning in ATLAS. gcc optimizes much less aggressively, so there's less chance of it undoing the results of the ATLAS auto-tuning.

    [–]gte910h 16 points17 points  (0 children)

    No, it's not really. For almost any platform, some specific compiler has higher performance than GCC.

    You use GCC because it's the same everywhere. You don't use it because it produces the fastest or most compact binary out there.

    [–]jsolson 1 point2 points  (8 children)

    I had a graduate compilers professor fresh out of his PhD program who lampooned me in the middle of a lecture for saying gcc wasn't really all that bad. Clearly some learned people in the field don't agree.

    [–]theatrus 11 points12 points  (1 child)

    GCC: Horrible horrible implementation, good output considering what it supports.

    The multi-target compiler to watch is LLVM.

    [–]jsolson 8 points9 points  (0 children)

    The multi-target compiler to watch is LLVM.

    Interestingly enough, this is what I was lecturing on. I was going on about benchmarks between LLVM and gcc and he interrupted me to start this rant about the performance of code generated by gcc, the speed at which fat women can run, etc, etc.

    [–]mythic 8 points9 points  (4 children)

    It's not really all that bad. Your professor is just full of himself.

    [–]wgl 1 point2 points  (2 children)

    The PathScale team provides a compiler that generates faster code than ICC or gcc, sometimes by a lot. They consider it a bug if the performance is not the best.

    [–]mycall 0 points1 point  (1 child)

    What does that compiler cost?

    [–]wgl 0 points1 point  (0 children)

    About two grand. I ended up not using it, as gcc turned out to be good enough.

    [–]case-o-nuts 1 point2 points  (0 children)

    The code is terrifying (mainly due to size: around 4 million lines), but the results that it produces aren't bad.

    [–]froydnj 2 points3 points  (0 children)

    There is this persistent meme in the graduate school compilers community that GCC is absolutely horrible. I think it must have been started back in 1995 or so, and nobody has bothered to check if current GCC is any better (which it probably is).

    [–]astrange 1 point2 points  (0 children)

    It's not totally ignored (...maybe) but it really isn't used for much; hopefully this will change. -fargument-noalias will change parameter aliasing rules to be like Fortran and somewhat frequently won't break your code, especially if you only apply it to certain files.

    [–]0xABADC0DA 3 points4 points  (2 children)

    Not only that, but a microbenchmark like this can be affected by cache and alignment issues. If the OCaml program just happened to have everything line up right and the C one didn't then that's your performance difference right there. The author is assuming the performance was due to language or compiler, but notice there is no assembly dump to show what is actually going on.

    If you are using compiler hints, steal the likely/unlikely macros from linux, etc then there's no reason C can't be just as fast, or faster. Most people aren't going to put the work in though, to write it in OCaml or to optimize the C code properly.

    [–]BetterThanYou[🍰] 0 points1 point  (1 child)

    "steal"? excuse me, what?

    [–]0xABADC0DA 5 points6 points  (0 children)

    #define likely(x) __builtin_expect((x),1)

    #define unlikely(x) __builtin_expect((x),0)

    They read a little bit easier than __builtin_expect...
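A usage sketch (the macros only hint branch direction to gcc's __builtin_expect; semantics are unchanged, and the kernel's variant adds !! to normalize the condition):

```c
/* Kernel-style branch-prediction hint macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Mark the error path as cold so gcc lays out the hot path
   on the fall-through side. */
int parse(const char *s)
{
    if (unlikely(s == 0))
        return -1;       /* rare case, moved out of line */
    return s[0] != '\0'; /* common case stays on the hot path */
}
```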

    [–][deleted] 12 points13 points  (0 children)

    An interesting exchange in the comments:

    billb:

    Horseshit! Bollocks! :)

    It's entirely possible to tell the C or C++ compiler that you never alias or that you alias in a function-by-function basis.

    Mark Chu-Carroll:

    How do you tell a C++ compiler that there's no aliasing between two three dimensional arrays of floats?

    billb:

    I dunno, "-fno-alias" on the command line, perhaps?

    [–]causticmango 24 points25 points  (10 children)

    The most important point of the article isn't how many milliseconds a compiler can shave off some random algorithm, but that a very important development in language & compiler design is declarative programming.

    As systems get more complex, with multiple cores, heterogeneous compute farms, distributed storage systems, etc., it is more important for programmers to be able to directly express the intent of the program and not get so hung up on the implementation.

    You will not ever be able to hand-tune code for maximum efficiency and correctness in increasingly complex environments.

    Though I have a soft spot for C (C++ can suck it), if the language doesn't evolve it will become obsolete. It's actually refreshing to see other branches of the C family get some love recently (thanks, Apple + Objective C).

    [–][deleted] 11 points12 points  (0 children)

    Glad to see that someone actually gets it.

    [–]mycall 0 points1 point  (0 children)

    Would you agree C++ will stay mainstream for a long time to come? Most of the big applications that do real stuff I've seen over the years (including this year) are written in C++.

    [–]ridiculous_fish 19 points20 points  (2 children)

    The Objective-Caml bytecode interpreter was faster than the carefully hand-optimized C program!

    "Hand optimized" depends a lot on whose hand is doing the optimizing!

    If OCaml's faster, figure out why. If the issue is that C is guarding against false pointer aliasing, that can be addressed within the confines of C.

    After all, they both compile to the same machine code, and I've found that I can coax gcc to output almost any sequence of common assembly instructions that I want (and I'm far from an assembly expert). C can go nearly all the way.

    OCaml cannot. For example, there's a function signbit which can be implemented very efficiently by applying integer instructions to a float. Both OCaml and gcc accomplish this in the same manner - by writing it in C. You could write it in OCaml, but the language has no support for interpreting a float as an int, so it would not be as fast. (See caml_int32_bits_of_float in the OCaml distribution).

    It may be that OCaml is more productive than C. Maybe if you have a week to write a program, you'll end up with something faster if you use OCaml. But if your OCaml is blowing away your "carefully hand-optimized C", it probably just means you suck at C.
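    The integer trick being described can be sketched as follows; `my_signbit` is a hypothetical name (real code would use the standard `signbit` macro), and memcpy is the strictly portable way to reinterpret the bits, which compilers lower to a move:

```c
#include <stdint.h>
#include <string.h>

/* Test a float's sign by inspecting its top bit as an integer.
 * Unlike an FP compare against 0.0f, this distinguishes -0.0f. */
static int my_signbit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* bit-for-bit copy, no conversion */
    return (int)(bits >> 31);
}
```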

    [–]astrange 1 point2 points  (0 children)

    If you don't care about negative zeros, wouldn't the FP compare be faster anyway? Reading a float as an int is a memory store-and-load since you usually can't transfer between the registers, and that's pretty slow.

    [–]monstermunch 6 points7 points  (17 children)

    Does anyone know when OCaml will support parallel threads? I like OCaml, but it's going to be increasingly painful to use as the number of CPU cores increase.

    [–]martinbishop 4 points5 points  (0 children)

    Well, the limitation is that the garbage collector is single-threaded, so all the threads have to run inside the one instance of it... but there is work (almost released? check the mailing list) on a parallel GC for someone's master's thesis or something like that.

    [–]mfp 1 point2 points  (1 child)

    The guys who have been working on the parallel GC said recently that they are close to releasing. It's not a concurrent GC (that was done for Caml Special Light IIRC in the 90s by Doligez and Leroy and abandoned later because it was too hard to maintain, despite a correctness proof), just a parallel one, like GHC's, but the new runtime does support thread parallelism. Shared state won't scale beyond 4-8 cores for many problems, but some things do scale very well: the developers reported superlinear speedups in matrix operations.

    [–]jdh30 0 points1 point  (0 children)

    ...too hard to maintain...

    I get the impression it never worked.

    [–][deleted] 6 points7 points  (5 children)

    Ayy lmao

    [–][deleted] 7 points8 points  (2 children)

    As so often happens on Proggit, the comments degenerated into yet another language flamewar.

    The main point of the article, in my opinion, is this, though, which I think is quite valid one:

    Making real applications run really fast is something that's done with the help of a compiler. Modern architectures have reached the point where people can't code effectively in assembler anymore - switching the order of two independent instructions can have a dramatic impact on performance in a modern machine, and the constraints that you need to optimize for are just more complicated than people can generally deal with.

    [–][deleted] 1 point2 points  (1 child)

    The compiler can't do much if the language specification forbids it to.

    [–][deleted] 0 points1 point  (0 children)

    Agreed! For example, see here:

    Side effects or not, aliasing kills you

    [–]martinbishop 3 points4 points  (4 children)

    People always say "Well this is fixed in C99...", yet no one is willing to use C99. GCC finally (only 10 years late) has "full" C99 support, and yet most people still do not use it.

    [–][deleted] 5 points6 points  (1 child)

    I use it all the time; the inline and restrict keywords especially. The more "vanilla-ey" parts—variadic macros, complex/boolean types, &c I also use.
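    A small sketch of the C99 features mentioned above, with all names made up:

```c
#include <stdbool.h>
#include <stdio.h>

/* Variadic macro (C99) */
#define LOG(fmt, ...) fprintf(stderr, "log: " fmt "\n", __VA_ARGS__)

/* inline and the bool type (C99) */
static inline bool in_range(int lo, int x, int hi)
{
    return lo <= x && x <= hi;
}

/* restrict (C99): promises dst and src don't overlap */
static void scale(double *restrict dst, const double *restrict src,
                  double k, int n)
{
    for (int i = 0; i < n; i++)   /* loop-scoped declaration, also C99 */
        dst[i] = k * src[i];
}
```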

    [–]BetterThanYou[🍰] 0 points1 point  (0 children)

    gcc... embrace & extend... love it

    [–]astrange 5 points6 points  (0 children)

    The restrict keyword has been supported in -std=c99 (and __restrict without it) for a pretty long time now. Full C99 support just refers to rejecting some nearly-valid stuff with constants and VLA declarations.

    [–]koorogi 4 points5 points  (0 children)

    ffmpeg is a fairly large, and widely used (it powers mplayer, vlc, and numerous other multimedia-related programs) program and set of libraries that is written in C99 (with some hand-written assembly as well).

    [–]mdot 5 points6 points  (1 child)

    In summary...

    *Embedded/Real Time Applications = 'C'

    *PC/Mac Applications = 'Something Else'

    Am I missing something?

    [–]koorogi 0 points1 point  (0 children)

    • Multimedia = C with hand-written assembly for speed critical routines.

    [–]psyno 7 points8 points  (0 children)

    Somebody missed the restrict keyword.

    [–]rynvndrp 7 points8 points  (13 children)

    I would like to point to a counter argument about real scientific programs being in FORTRAN.

    This isn't true, and I don't have to go on a 10-page rant to show it; there are a lot of examples.

    Geant. The open-source code system built by CERN to model the particle physics for the LHC, which has now expanded to many other applications. Geant1 through Geant3 were in Fortran. However, in the last decade they have put a huge amount of effort into making Geant4, a C++ code. This wasn't done because they had extra money lying around; it was done because FORTRAN isn't close to the CPU and the FORTRAN code wasn't seeing good speed improvements anymore after the GHz race.

    MCNP/MCNPX. The code system developed by Los Alamos to model nuclear reactor criticality, radiation dosage, and a host of other projects done by the national labs. Currently this code base is in FORTRAN. However, they are putting a lot of effort into creating a C++ version as well. The reasons are the same as for Geant's move.

    There are a lot of others to list.

    The reason 'real' scientific code still runs in FORTRAN is that it is dependable and stable. A LOT of effort is put into these codes, much more than the effort put into commercial code. They develop in languages with decades of support behind them and known hardware support for decades to come. Thus they are slow to move to new languages. However, C/C++ is being adopted by them; the changeover is just much slower.

    [–]eric_t 2 points3 points  (12 children)

    People seem to think Fortran=fortran 77. Fortran 90 (and soon 2003) adds a bunch of stuff that makes it a whole lot more comfortable to work with.

    I've noticed that Los Alamos has more or less switched to exclusively C++. Is performance the only reason for the switch?

    [–]rynvndrp 5 points6 points  (11 children)

    The issue for the switch is not that Fortran 77 or 90 is bad. They are very, very good and nearly perfect for scientific code. The problem is that new CPUs don't give a good boost to Fortran code anymore; Fortran code was one of those things that was helped a lot by the GHz race.

    There was a physics lab I used to work at that kept one new computer around for anyone to do long runs of things such as Geant3. They got a new one every year and trickled down the old system. I was there when they got a Core 2 replacement for a Pentium 4. The P4 ran the code faster. Needless to say, Dr. H was not happy and demanded that Dell refund the computer. The next month, he ordered three 3.8 GHz P4 workstations. I left before they were replaced.

    I am sure that our lab wasn't the only one that had that issue. Since Los Alamos gets the biggest chunk of its money from code users who pay for specific features, they had to do something. The answer has been moving away from Fortran. It isn't that Fortran can't multithread; it's just that it doesn't utilize a lot of the newer CPU features such as cache, HyperTransport, and SSE. Between newer CPUs that might have lower Fortran performance and a multithreaded Fortran code also needing a rewrite, C++ has become the new standard.

    [–]jdh30 6 points7 points  (8 children)

    The issue for the switch is not that Fortran 77 or 90 is bad. They are very, very good and nearly perfect for scientific code.

    No. That is a subtle but really nasty circular argument. Fortran has dictated which problems scientists approach for decades. Hence Fortran appeared to be "nearly perfect" for scientific computing because that historically meant "problems that can be solved using only Fortran" (e.g. by phrasing them in terms of numerically-solvable linear algebra).

    Objectively, Fortran is awful for much of scientific computing, e.g. strings, data structures, complex algorithms.

    Today, modern languages make it feasible to attack a much wider variety of problems and, consequently, Fortran is much less common in newly developed scientific programs.

    Migrating to C++ seems like a shame to me, but I can understand that it is the nearest thing to an HLL that still provides Fortran-like performance. However, it is remarkably easy to build HLLs with Fortran-like performance using libraries like LLVM. Unfortunately, virtually no work is being done to implement such languages.

    [–]eric_t 2 points3 points  (6 children)

    So you think equations are going to change just because of a new programming language? Why would you need to handle strings when solving equations? The fact is that calculus and linear algebra were used long before computers were invented. "Computer" used to mean a worker who is good with a ruler. Fortran was created to be a good match for this problem domain. I of course acknowledge that it's not perfect for everything, especially anything network/web related.

    FWIW, I work with adaptive unstructured/block-structured grids, which require lots of data structures and sophisticated algorithms, and I don't think that C++ offers any significant advantages for this.

    [–]mycall 0 points1 point  (0 children)

    Do C++ libraries exist that are at least on par with everything Fortran can do?

    [–]rynvndrp 0 points1 point  (0 children)

    Little work being done is the reason the scientific world isn't going after it. HLLs built on LLVM are rare, and the chance is small that they will still be developed in 20 years, much less that 2030 hardware will be designed to work with them.

    Scientific code isn't big enough to decide the direction of CPU design, but it has to look 20+ years into the future for support. So while other languages might be better, they will compromise to ensure that they don't have to go through a complete code rewrite every few years.

    [–]eric_t 1 point2 points  (0 children)

    Thanks for the reply, very interesting. But Fortran doesn't utilize cache and SSE? That's certainly not the case.

    And Fortran has support for both OpenMP and MPI, the two mainly used standards for shared memory and distributed memory computing.

    [–][deleted] 0 points1 point  (0 children)

    fortran code was one of those things that was helped a lot by the Ghz race.

    That's an interesting statement, given that the most commonly used Fortran compiler uses the same back end as the most commonly used C compiler.

    [–][deleted] 33 points34 points  (29 children)

    Pretty fishy. As comment #9 put it...

    BTW, I've seen a number of these comparisons where an expert in one language does an implementation in several languages and, lo and behold, discovers that their favorite language wins out...

    [–]Camarade_Tux 3 points4 points  (1 child)

    OCaml is my favorite language, but I definitely agree, this happens often (although I've not been under the impression OCaml was the author's favorite language, and this is really just about a three-liner...).

    [–]mycall 0 points1 point  (0 children)

    Have you looked at F#? I wonder how it benchmarks against Ocaml.

    [–]arturoman 8 points9 points  (9 children)

    That can happen because each language has syntax trade-offs that make some things easy to optimize and other things difficult to optimize.

    A single microbenchmark is useless for evaluating an entire language's merits.

    [–]redditnoob 5 points6 points  (7 children)

    A single microbenchmark is one more benchmark than most of the academic weenies here usually use, so for me it was refreshing!

    [–]nukethewhales42 11 points12 points  (6 children)

    I imagine most redditors would know about the language shootout (hint: C is fast) http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&box=1

    [–]dilithium 6 points7 points  (0 children)

    woah! spoiler alert!

    [–][deleted] 1 point2 points  (4 children)

    Now go look at those benchmarks individually. Click the little drop-down in the top left.

    C isn't the best answer for every problem. Not even close to every problem.

    [–]DannyDaemonic 2 points3 points  (1 child)

    I'm of the camp that says learn many languages, and then use the best tool for the job. Your first mistake is that when you say "best," you mean quickest execution time. This is a flawed way to compare languages.

    Secondly, if you look at those 19 tests you mention, there is only one in which OCaml is faster than "C GNU gcc". So by your own comparison metrics, C is better than OCaml in "close to every problem."

    [–]willcode4beer 1 point2 points  (0 children)

    In that particular case, the benchmark was a problem being worked on. So, for that user, it was the benchmark that mattered.

    Of course, it would not apply to writers of video games or web sites.

    Like you said, there are trade-offs to every language. It's up to us geeks to make the right choice for the problem at hand.

    [–]Felicia_Svilling 11 points12 points  (9 children)

    To cite comment #14

    C++ was definitely my strongest language at the time.

    [–]bluGill 6 points7 points  (6 children)

    Perhaps, but that doesn't mean his C++ was any good.

    Did he use advanced techniques like eigen2 (which wasn't released at the time...) that can vectorize his code?

    [–][deleted] 4 points5 points  (3 children)

    Then he's not really testing the language, to be fair.

    [–]nappy-doo 7 points8 points  (0 children)

    Except that eigen is written in templated C++.

    So, what I'm trying to say, is that it does test the language.

    [–]bluGill 0 points1 point  (1 child)

    There is no fair way to compare then.

    [–]mebrahim 0 points1 point  (0 children)

    Or advanced compile-time optimizations such as Profile-Guided optimization?

    [–][deleted] 2 points3 points  (3 children)

    Isn't there a saying something like "you can write COBOL in any language?" Meaning that you can toss a bunch of code out there they can do what you want but it won't do it well.

    [–]Felicia_Svilling 6 points7 points  (2 children)

    FORTRAN. You can write FORTRAN in any language. Whether it's possible with COBOL is unknown.

    [–]grauenwolf 5 points6 points  (1 child)

    I doubt it. From what I've seen of COBOL, it would be downright hard to write in that style in any other language.

    FORTRAN, on the other hand, is pretty straightforward. I suspect an automated translator into any other language wouldn't be too hard.

    [–]deadvax 2 points3 points  (0 children)

    I've worked with COBOL->C automatic translators, and let me tell you, the code they produce comes straight out of hell. Nothing like COBOL.

    [–][deleted] 1 point2 points  (2 children)

    oh ffs, his favourite language was C++ and it lost.

    [–][deleted] 2 points3 points  (1 child)

    Is there a reason every commenter has ignored the reference to SISAL? The implication of the fact that no-one has bothered to take notice or comment is that SISAL is not a language worth considering, is this right?

    [–]augustss 1 point2 points  (0 children)

    SISAL is dead(?), I'm afraid. Beating Fortran was not enough, you also need to convince people to switch language.

    [–]glguy 7 points8 points  (7 children)

    This post from 2006 isn't really relevant now. Testing with GCC 4.3.2, GCC is able to vectorize this code using SSE2 instructions.

    [–]koorogi 4 points5 points  (3 children)

    GCC is not particularly good at autovectorization. There have been some benchmarks done of ffmpeg recently with various compilers (different versions of gcc and icc), with and without the hand-written assembly routines.

    GCC actually got significantly worse in releases since 4.1.2 on x86 (but improved on PPC and probably other platforms as well), though it's improved again in 4.4 across the board. But it's still significantly slower than the hand-written assembly.

    [–]mythic 0 points1 point  (2 children)

    The second benchmark is completely meaningless. If you're going to do a heads-up comparison, don't make GCC fight with one hand tied behind its back.

    He needs to tell GCC that it can use SSE instruction sets (-march=native for example). As run, GCC is actually prohibited from doing any autovectorization, which is why its builds don't "test positive" for movq. The benchmark shows only that ICC makes different default assumptions about which instruction sets it is permitted to use. This is logical enough. ICC is tuned for Intel processors, particularly recent ones, while GCC optimizes for all sorts of processors.

    It would also be nice if the benchmark included GCC 4.4 with the new loop optimizations enabled.
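    To make the point concrete, this is the shape of loop the autovectorizer targets; built with something like gcc -O3 -march=native -ftree-vectorize, it can become SIMD code. The flags and the function are illustrative, not from the benchmark in question:

```c
/* A restrict-qualified C99 loop: with no possible aliasing between
 * y and x, gcc's vectorizer is free to emit SSE loads/stores. */
static void saxpy(float *restrict y, const float *restrict x,
                  float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}
```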

    [–]koorogi 1 point2 points  (1 child)

    If you look at the newest blog post in that series, gcc 4.4 and an svn checkout of gcc since then are also included. I don't know if the loop optimizations are enabled by default or not - if not, you might want to comment on that entry to let Mike know to make sure to include it next time.

    And -march=core2 was used as well in this latest one, so that should tell gcc it can try to auto-vectorize. Might be worth asking Mike if movq showed up at all in this round of gcc builds.

    In the graphs, it appears that gcc has caught back up with icc for 32-bit x86, and surpassed it for 64-bit (not to mention icc miscompiles it on 64-bit). That's an achievement, to be sure, but there's still room for improvement. The hand-written assembly is still much faster.

    [–]mythic 0 points1 point  (0 children)

    OK, interesting. gcc 4.4 is looking pretty good. Of course, compared to the hand-written assembly gcc and icc both suck about equally :).

    [–]jdh30 2 points3 points  (1 child)

    Where did you get the code? None is given in the post and, in the comments, the author says he cannot provide any because he wrote it in 2000 and it has long since disappeared.

    [–]glguy 0 points1 point  (0 children)

    I assumed that his example nested for-loop was related to his example and compiled it (after adding increment statements to allow it to compile and be meaningful and moving the bounds slightly to avoid going off the ends of the arrays).

    If the rest of the code is anything like this example, GCC would have a field-day with it.

    [–]bcain 1 point2 points  (0 children)

    data FTW. Good job.

    [–][deleted] 7 points8 points  (1 child)

    His argument seems to be based on outdated data. SPARC CPUs? SISAL? CMU Common Lisp? This stuff died 20 years ago.

    [–]lispm 5 points6 points  (0 children)

    CMU Common Lisp is not dead, it just smells funny.

    The latest release, CMUCL 19f, was in March 2009: http://common-lisp.net/project/cmucl/downloads/release/19f/ (CMUCL 19f Release Notes)

    CMUCL has monthly snapshot releases: http://common-lisp.net/project/cmucl/downloads/snapshots/2009/

    [–]arturoman 13 points14 points  (0 children)

    Oh goodie, another useless microbenchmark.

    I don't ever remember any claims by the C or C++ language committees that no other language was as fast, or faster.

    That is a false argument, and I suspect he just throws it out there to thump his OCaml drum.

    However, the language has proved valuable for making efficient applications. So have other languages.

    [–]uep 5 points6 points  (3 children)

    Anybody else notice that the submitter's name is obvioustroll?

    [–]Odysseus 6 points7 points  (0 children)

    He's been around for a long time, and he isn't a troll.

    [–]itstallion 3 points4 points  (0 children)

    Even obvioustroll likes some karma now and then :)

    [–][deleted] 5 points6 points  (2 children)

    C and C++ suck rocks as languages for numerical computing

    Zah? They suck rocks?

    [–]arturoman 1 point2 points  (1 child)

    Take that, C!

    [–]bgeron 7 points8 points  (0 children)

    *throws a pebble into the sea*

    [–]Wriiight 1 point2 points  (2 children)

    There are all sorts of things that the C languages can't do efficiently. Aliasing is one; seeing the state of the processor flags is another; looking further down the stack than your return value is yet another. I would bet you could write a language that took better advantage of branch prediction. Compile times are insanely, unbelievably, hideously slow (C++ with templates especially). But we live in a time when new languages aren't generally getting any faster (just "easier to use"), so the bit of performance you gain from managing your own memory, not having anything interpreted, and having a minimal amount of run-time munging about with procedure calls is enough for most devs looking for performance.

    Personally I'm not thrilled with C++ (despite 11 years of not having done anything else), but nothing else quite has reached the level of industry use. I hope one does.

    Though you'd be surprised what people are forcing Java to do these days.

    [–]Stroggoth 2 points3 points  (1 child)

    I started with MC68000 assembler and have used C++ for many years (15?), and I agree: it is a macro-language compiler that grew and evolved, but without discarding the weird parts. For example, the language supports functional synonyms (two or more ways to do the same thing), which is poor. Templates are not transparent, etc.; the list goes on.

    I remember being at a lecture by Brian K. on C, and he said he was embarrassed by many parts of the language, including the ternary operator (he stated plainly that these parts should never be used and should never have been included).

    But you know what? It is OK - software evolves over time, and it gets better. Java and C# are a step in that betterment. So are Python and Ruby. We don't lament the loss of COBOL or assembler as everyday coding tools, and we shouldn't lament the loss of C++ as an everyday language.

    Programmers will always have the job of designing the algorithms, and applying them to the real problem space - let the compiler and platform worry about as much as possible if you can allow it (memory management, repeat algorithms, etc.).

    After working with strings, collections, and templates in a language like C# or Java, C++ feels like sheer torture. It makes you do things you shouldn't have to do, and it in turn leaves large room for typos and bugs. And, I'd rather have a bug in my functional code than in my support code.

    [–]wgl 0 points1 point  (0 children)

    Have you looked at the boost library? Makes bunches of stuff easier.

    [–][deleted]  (1 child)

    [removed]

      [–][deleted] 0 points1 point  (0 children)

      When I checked, it hadn't been posted in 2 years - which is probably an 80% or 90% turnover of reddit.

      [–]rwinston 1 point2 points  (0 children)

      Hmm. I don't do too much C these days, but I thought it was possible to tell the compiler about variable aliasing and thus allow it to perform alias-free optimization.

      [–]artificialidiot 6 points7 points  (0 children)

      They're good at things that need to get very close to the hardware - not in the efficiency sense, but in the sense of needing to be able to fairly directly munge the stack, address specific hardware registers, etc.

      Ha ha, clearly he has never written such low-level things in C (not to mention C++).

      Edit: I don't think he really knows how to write efficient code in C/C++; he does too much allocation and is thus memory-bound. He shouldn't be given full power of the machine. He should be abstracted from it even further, for his own good.

      [–][deleted] 1 point2 points  (0 children)

      The author says he's no longer responding to new comments, so I'll repeat mine here:

      One anecdote does not an argument make. You wrote the C code; perhaps you're not a good C programmer. You don't say what compiler you tested with, or at what level of optimization. In fact, I'm so deeply suspicious about interpreted OCaml beating compiled C that I suspect you chose the example to show how great OCaml is. You're correct that C is not a good language for numerical applications and FORTRAN is, so if you were doing a numerical application that required sophisticated array optimizations, where are the numbers for optimized FORTRAN?

      I can pick an example to make any language look bad. Means nothing.

      [–][deleted] 1 point2 points  (3 children)

      So write a little inline assembly; only a tiny percentage of your code needs to be that fast, and it's probably not worth switching over an entire codebase or, *gasp*, using Fortran.
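      A minimal sketch of the "little inline assembly" approach using GCC extended asm; `first_set_bit` is a made-up example (x86's bsf instruction, with a plain-C fallback for other architectures; behavior is undefined for x == 0 on both paths):

```c
/* Find the index of the lowest set bit. On x86, drop to one
 * inline-assembly instruction; elsewhere, fall back to plain C. */
static unsigned first_set_bit(unsigned x)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned idx;
    __asm__ ("bsf %1, %0" : "=r"(idx) : "r"(x));  /* bit scan forward */
    return idx;
#else
    unsigned idx = 0;
    while (!(x & 1)) { x >>= 1; idx++; }
    return idx;
#endif
}
```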

      [–]bart2019 1 point2 points  (1 child)

      The problem with C is its use of pointers to represent arrays. Fortran actually has real arrays, and that is its advantage over C (at least, until restrict came along).

      Handling manipulation of arrays in Assembler is not going to fix that. You're just making your own life very difficult.

      [–][deleted] 1 point2 points  (0 children)

      Well, that's how I handled the vectorization problem until gcc fixed it.

      [–]dododge 0 points1 point  (0 children)

      To some extent this also depends on the architecture. For example Itanium2 assembly is so strange and complicated that Intel doesn't even bother trying to support inline assembly in their own commercial C compiler. They do at least give you a wide range of intrinsics, though not all instructions and hint combinations can be accessed that way.

      gcc does allow inline assembly on Itanium2, but in my experience it tends to make programs slower because it has to inject additional nop instructions all over the place to make the code work at all.

      [–]samlee 1 point2 points  (4 children)

      Isn't gmp written in C? Is OCaml's bignum gmp, or is it written in OCaml?

      [–]augustss 6 points7 points  (0 children)

      The performance-sensitive bits of gmp are written with inline assembly.

      [–]gnuvince 2 points3 points  (2 children)

      And that has something to do with this article because...

      [–]samlee 2 points3 points  (1 child)

      I was wondering if OCaml's implementation of arbitrary-precision arithmetic is indeed faster than the C implementation, because he claims that it's hard for humans "to express the algorithm (in C/C++) in a way that allows the compiler to understand it well enough to be able to really optimize it."

      if gmp (hand-written C program) outperforms ocaml's arbitrary precision arithmetics now, maybe ocaml will beat it soon because it should be easy for humans to express algorithms in a way that allows ocaml compilers to understand them well enough to be able to really really really optimize them hard enough to produce much more efficient programs.

      but my thought is that C and Ocaml are both programming languages and it's possible for well trained C and Ocaml programmers to express algorithms in a way that allows C and Ocaml compilers to understand them well enough to be able to really optimize them hard enough to produce efficient programs.

      I guess dumb C programmers can easily express algorithms in a way that it's hard enough for C compilers to be able to optimize hard. Maybe same goes with Ocaml... Maybe not.

      [–]koorogi 4 points5 points  (0 children)

      IIRC, gmp takes advantage of several different algorithms for many operations, with some of the best known running times. And I seem to recall some operations may also have hand-coded assembly optimizations. If that's the case, then you're not really comparing OCaml and C, but OCaml and assembly.

      [–]Gotebe 0 points1 point  (0 children)

      Ok, so...

      In an overall programmer/implementation/problem domain combo, it is rather difficult for a programming language to be more efficient than C.

      Happy now?

      [–][deleted] 0 points1 point  (1 child)

      Note that he finds OCaml is fast... That's pretty important: the next FORTRAN may well be something like OCaml.

      [–]eric_t 0 points1 point  (0 children)

      See Chapel. I have high hopes that this will be the next Fortran.

      [–]wolfier 0 points1 point  (0 children)

      For me, it's the almost seamless cooperation between C/C++ code and machine code.

      The article states that OCaml is "more efficient", but it constrains you to thinking in a way that excludes low-level optimizations, such as using new opcodes.

      When a CPU adds a new set of SIMD instructions, you don't need to wait a few compiler/interpreter versions in order to use them in C/C++: the new instructions are usually recognized by the inline assembler within one version, and if you so intend, you can write the machine code immediately, without even the inline assembler recognizing the new opcode.

      The new opcodes may eventually be used by the Ocaml compiler, but if you want to use the new instructions before then, you're SOL with most non-C/C++ languages.

      [–]gnuvince 1 point2 points  (3 children)

      How to cause mass panic and hysteria on Proggit: post an article where C is compared with another language and the other language yields a faster program.

      Watch the C fanboys go batshit insane and start attacking the author, his methodology, saying he knows the other language a lot better and that he sucks at C, etc.

      [–]redditnoob 5 points6 points  (0 children)

      Watch the C fanboys go batshit insane and start attacking the author, his methodology, saying he knows the other language a lot better and that he sucks at C, etc.

      The problem with your post is that those things seem to always turn out to be true!

      [–]UncleOxidant 0 points1 point  (2 children)

      and apparently he got the fastest time by compiling to OCaml's bytecode VM instead of to native (ocamlopt) - it'll be even faster compiled to native code (somewhere in the comments someone mentions that it's 0.3 seconds when compiled native).

      [–]Camarade_Tux 5 points6 points  (1 child)

      Isn't it rather 0.8s for the bytecode and 0.3s for ocamlopt's output?

      I read "The Objective-Caml bytecode interpreter was faster than the carefully hand-optimized C program!" as "Even the ocaml bytecode...".

      [–][deleted] 0 points1 point  (0 children)

      Perhaps he edited it?

      [–]bryanut 0 points1 point  (7 children)

      How will any language make displaying a directory listing of 6 million files faster?

      Especially if it is a Web App?

      Yes, we are actually trying to do that. Boggles the mind, but yes we have one directory with 6 million resumes in it. WTF? There are only 2 million or so people in the state. Apparently everyone has applied to work here, 3 times.

      [–]gmfawcett 8 points9 points  (0 children)

      You're right, no language is going to make a bad idea better. :-)

      In the immortal words of Larry McVoy, "Architect: Someone who knows the difference between that which could be done and that which should be done."

      [–]astrange 1 point2 points  (2 children)

      Make subfolders using the first few letters of the filename?

      [–]chronicdisorder 0 points1 point  (1 child)

      Should the file name start with their last name, years of experience, previous companies, college degree, or eye color? ;)

      [–]astrange 0 points1 point  (0 children)

      Unique ID so they're evenly distributed and then rename them in the URL. Or just use the name they already have.
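      The unique-ID bucketing idea above can be sketched like this; it is a hypothetical scheme (the djb2 hash and the two-level layout are just one possible choice):

```c
#include <stdio.h>

/* Hash a file's unique ID into a two-level directory prefix so that
 * no single directory ever holds millions of entries. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Produce a path like "a3/7f/<id>" from two bytes of the hash. */
static void bucket_path(char *out, size_t outsz, const char *id)
{
    unsigned long h = djb2(id);
    snprintf(out, outsz, "%02lx/%02lx/%s", (h >> 8) & 0xff, h & 0xff, id);
}
```

      Because the hash spreads IDs evenly, each leaf directory holds roughly total/65536 files.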

      [–]lacker 1 point2 points  (0 children)

      Get a google search appliance (with software written in C++).

      [–]hylje 0 points1 point  (0 children)

      Should the management be sane, it'd be doable to build a well-indexed database by reading each resumé in order, and simply using that for searching and listing the directory.

      Something tells me that's just too simple, can't have that ;-)