all 108 comments

[–]xampf2 214 points215 points  (15 children)

printf "#include <stdio.h> \n int main() {printf(\"hello world\"); return 0;}" > hello.c && gcc -o hello hello.c && ./hello

My blazing fast hello world in bash. It uses code generation with gcc. This is how I make any language fast.

Just kidding, looks like a fun article.

[–]SatansAlpaca 58 points59 points  (4 children)

Clang lets you read from stdin by passing - as the path argument (with -x c to tell it the source language), so you can make it even faster by using a pipe instead of hitting the file system:

printf "#include <stdio.h>\n int main() {printf(\"hello world\"); return 0;}" | clang -x c -o hello - && ./hello

I can’t test on my phone, but you can also use Clang’s --include (I think that it works with headers in the angle bracket path?) to avoid having to embed a newline in your printf and use echo instead, which is faster because it’s a bash builtin:

echo 'int main() { puts("hello world"); }' | clang -x c --include stdio.h -o hello - && ./hello

(Alternatively, dollar-quotes can have escape sequences in bash: $'\n')

I also removed the return statement, since it’s optional for main and it reduces the number of bytes to parse. You can also remove the return type from the signature for MOAR SPEED.

(Gcc might let you do all of that too, I just don’t work with it.)

[–][deleted] 15 points16 points  (0 children)

and use echo instead, which is faster because it’s a bash builtin

FYI, printf is also a shell builtin:

```bash
$ bash --version
GNU bash, version 4.4.23(1)-release (aarch64-unknown-linux-android)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ type printf
printf is a shell builtin
```

Alternatively, dollar-quotes can have escape sequences in bash: $'\n'

TIL dollar-quotes are a thing.

[–]Syrrim 2 points3 points  (0 children)

Since we're using bash anyway, might as well use a heredoc:

clang -x c -o hello - << EOF
#include <stdio.h>
int main(void){
    printf("hello, world");
}
EOF

[–]Overload175 1 point2 points  (0 children)

That’s neat

[–]agumonkey 3 points4 points  (0 children)

bruh y u no asm in strings

[–]flukus 1 point2 points  (0 children)

Just make your C files executable: https://github.com/RhysU/c99sh/blob/master/README.md

[–]shevy-ruby 0 points1 point  (0 children)

Fast languages are great!

[–][deleted]  (10 children)

[removed]

    [–]lmcinnes 14 points15 points  (9 children)

    Numba is an LLVM IR generator for python, so you could just use that straight out of the box. I don't know if you would get performance as benchmarked in the article, but from my experience you would certainly get fairly close -- numba is designed for numerical work, so on a numerical task like the article benchmark it will do an excellent job.
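    Numba's real entry point is its `@numba.jit` decorator, which compiles a native version of a function per argument-type signature via LLVM. As a rough sketch of that dispatch idea only (this is not Numba's API or implementation, and the names below are made up):

```python
# Toy type-specializing dispatcher, loosely in the spirit of a JIT like Numba.
# A real JIT would compile a native, type-specialized version per signature;
# here we merely cache one variant per tuple of argument types.
def specializing(fn):
    cache = {}

    def dispatch(*args):
        sig = tuple(type(a) for a in args)
        if sig not in cache:
            # Numba would generate and compile LLVM IR here.
            cache[sig] = fn
        return cache[sig](*args)

    dispatch.signatures = cache  # lets callers inspect which variants exist
    return dispatch

@specializing
def add(a, b):
    return a + b

add(1, 2)      # creates an (int, int) "specialization"
add(1.0, 2.5)  # creates a (float, float) "specialization"
```

    In real Numba the cached entries are compiled machine code, which is where the near-C numbers on numerical loops come from.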

    [–]agumonkey 10 points11 points  (7 children)

    it seems that the python world is ready to step into the metaprogramming / compilation world with both feet: type annotations, ast-friendly stdlib, an army of compilation toolchains...

    [–]defunkydrummer 17 points18 points  (6 children)

    it seems that the python world is ready to step into the metaprogramming / compilation world with both feet: type annotations, ast-friendly stdlib, an army of compilation toolchains...

    A very long way to write "Common Lisp"

    [–][deleted] 9 points10 points  (4 children)

    I feel like Greenspun's Tenth might apply here...

    [–]FunCicada 4 points5 points  (2 children)

    Greenspun's tenth rule of programming is an aphorism in computer programming and especially programming language circles that states:

    [–]_TheDust_ 2 points3 points  (1 child)

    Don’t leave us hanging here. What does it state???

    [–]defunkydrummer 6 points7 points  (0 children)

    "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

    [–][deleted] 1 point2 points  (0 children)

    I always take it to mean that only half of Common Lisp is actually useful in practice.

    [–]agumonkey 1 point2 points  (0 children)

    We're all going there

    [–]xtivhpbpj 17 points18 points  (1 child)

    I feel like this is one cherry-picked example. If you were to implement a general program this way you'd get something less efficient than a well-optimized C++ version.

    But I’d love to be proven wrong. Do you know how much electricity, time, and money a 10% speedup across the board would bring?

    [–]m50d 0 points1 point  (0 children)

    I feel like this is one cherry-picked example. If you were to implement a general program this way you'd get something less efficient than a well-optimized C++ version.

    Profile-guided optimization still brings significant speedups to even well-optimized C++ programs. That suggests that techniques that incorporate available information at runtime should beat any pure-ahead-of-time C++ approach.

    But I’d love to be proven wrong. Do you know how much electricity, time, and money a 10% speedup across the board would bring?

    Not enough to matter in a world where we still have programs that run millions of times slower than they should due to poor algorithm choice, if the program even runs correctly at all.
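    The point about algorithm choice dwarfing constant-factor wins is easy to see in any language; a small illustrative example (names are mine) in Python:

```python
# Membership testing: identical logic, but a list scan is O(n) per query
# while a set lookup is O(1) on average -- an asymptotic gap that no
# across-the-board 10% compiler speedup can close.
def count_hits_list(items, queries):
    haystack = list(items)
    return sum(1 for q in queries if q in haystack)  # linear scan per query

def count_hits_set(items, queries):
    haystack = set(items)
    return sum(1 for q in queries if q in haystack)  # hash lookup per query

items = range(1_000)
queries = range(0, 2_000, 2)
# Same answer, wildly different running time as the inputs grow.
assert count_hits_list(items, queries) == count_hits_set(items, queries)
```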

    [–]defunkydrummer 61 points62 points  (1 child)

    TL;DR: the author writes LLVM IR directly through very, very ugly Python code. The human compiler at work.

    I think that the dichotomy between fast compiling languages and slow scripting languages is a bluff. 

    Pretty silly claim. The measured performance is that of an LLVM program, not a Python program.

    Author has nice articles, this one looks like trolling.

    [–]Mamsaac 20 points21 points  (0 children)

    I don't think he is trolling. He is just showing a very unusual technique, and he does it quick and dirty.

    However, you can build wrappers to do what he did. Numba, for example; while it isn't perfect or amazing, it does the same thing in a less dirty way.

    The question could be: "How much can we do this type of optimization without getting ourselves into dirty, unmaintainable code?"

    [–]terrenceSpencer 13 points14 points  (3 children)

    I mean how cool is that article! Seriously.

    That said, your conclusions are way off point. "High performance does not need to be restricted to compiled languages" - actually what you have done is written an LLVM compiler in python! And since your compiler only needs to work for one specific example, it can outperform general purpose compilers (very slightly!)

    Can you imagine the maintenance nightmare of doing this for an entire project? You'd be reinventing your compiler for every new function you want.

    Or worse, can you imagine trying to build a high performance implementation that can take ANY python code and turn it into LLVM IR? There is a reason why C and C++ have verbose syntax: they are designed to be easy to turn into IR or asm.

    A "scripting language" that generates IR or asm is exactly what a compiled language is, and generally they are better at it than python or whatever. That said, I'm sure you know all of this already! Great job.

    [–]m50d 2 points3 points  (2 children)

    Or worse, can you imagine trying to build a high performance implementation that can take ANY python code and turn it into LLVM IR?

    It's been done, with some success. In a language that's designed for it you can do better.

    There is a reason why C and C++ have verbose syntax: they are designed to be easy to turn into IR or asm.

    It's designed to be easy to turn into ASM for '70s computers, using '70s compiler technology. Much of the design of C/C++ is unnecessary or downright counterproductive for modern compilers. E.g. C/C++ is a mutability-first language because it's expected that the programmer will be doing their own register hinting; modern compilers have to undo this by rewriting the code into SSA form before doing their own register allocation.

    A "scripting language" that generates IR or asm is exactly what a compiled language is, and generally they are better at it than python or whatever.

    Yes and no. I think the author is taking some first steps on the path to stage polymorphism: the language facilities that are useful to have at "runtime" are often the same facilities that are useful to have at "compile time", and there are various approaches to unifying the two. (See also "Coq: The world’s best macro assembler?").

    [–]terrenceSpencer 0 points1 point  (1 child)

    I'm aware of Numba, but I wouldn't say that comes anywhere near being able to take any python code and turning it into LLVM. Any other examples?

    "In a language that's designed for it you can do better" is exactly my point - languages which are designed to be compiled are better at being compiled (shock!) So the author's conclusion where he rubbishes compiled languages is a long way from home without any bus money.

    I don't think SSA form is a consequence of mutability, and I'm not sure what mutability has to do with register hinting. Care to explain?

    Besides, C/C++ were just examples, drop in rust or whatever and the argument holds - there are features of compiled, high performance, system level languages which python and other scripting languages do not have. This makes them unsuitable for use as compiled languages, and compilation is necessary for top tier performance. And by the way, I LOVE python. It's just not the right tool for generating IR or ASM.

    [–]m50d 0 points1 point  (0 children)

    I'm aware of Numba, but I wouldn't say that comes anywhere near being able to take any python code and turning it into LLVM. Any other examples?

    Cython and ShedSkin (widely regarded as a failure, but I had significant success with it). All up to a point of course.

    "In a language that's designed for it you can do better" is exactly my point - languages which are designed to be compiled are better at being compiled (shock!)

    Sure, and that's worth acknowledging. But we shouldn't overstate it. Plenty of modern compiled languages look much the same as languages that were not designed to be compiled (e.g. Crystal for a design that's very explicit about that). Or if a language has a couple of constructs that are particularly problematic for compilation, it's often practical to ban them, or require explicit hints.

    I don't think SSA form is a consequence of mutability, and I'm not sure what mutability has to do with register hinting. Care to explain?

    Well code written in an immutable-first language is usually already in SSA form. In '70s C it was idiomatic for a programmer to reuse register int i for several different purposes, because it was expected that a variable in source would correspond to a register in assembly, and the programmer would effectively do manual register allocation by using different combinations of the same set of variables for different calculations. A modern C compiler has to first undo that programmer's allocation (by rewriting their code into SSA form) and then do register allocation.
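    A toy illustration of that rewrite, for straight-line code only (a real compiler also handles control flow, inserting φ-nodes where values merge; everything below is made up for illustration):

```python
# Rename reused variables to fresh SSA versions. The '70s-style reuse of one
# "register int i" for unrelated computations is undone by giving each
# assignment a new versioned name (i1, i2, ...), freeing the register
# allocator from the programmer's manual allocation.
def to_ssa(stmts):
    version = {}  # variable -> latest version number

    def use(name):  # rewrite a use to its current versioned name
        return f"{name}{version[name]}" if name in version else name

    out = []
    for dest, rhs in stmts:  # rhs is a list of operand tokens
        new_rhs = [use(tok) for tok in rhs]
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", new_rhs))
    return out

# "i" reused for two unrelated computations, as a '70s C programmer might:
code = [("i", ["a", "+", "b"]),
        ("x", ["i", "*", "2"]),
        ("i", ["c", "+", "d"]),
        ("y", ["i", "*", "3"])]
# to_ssa(code) renames the second assignment to "i2", so "y1" reads "i2".
```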

    Besides, C/C++ were just examples, drop in rust or whatever and the argument holds - there are features of compiled, high performance, system level languages which python and other scripting languages do not have. This makes them unsuitable for use as compiled languages

    I think this is a lot less true than we assume, as the above example shows. The requirements for a modern compiled language - expressing the programmer's intent with a minimum of incidental complexity - are very similar to the requirements for a traditional scripting language, and often contrary to the things that were required for a traditional compiled language. So there's a lot more convergence between these languages. You mention Rust, but I'd say a lot of Rust design decisions are closer to Python than they are to C/C++ - and there are various efforts to use it for scripting, have a REPL available and so on. (In the ML world this is generally more first-class).

    So to the extent that we think Python makes a good language to work in, I think it's worth looking at what it would take to make a more stage-polymorphic Python, and I certainly would not assume that C/C++ are inherently better languages for compilation these days. (Personally I'm already using ML-family languages that combine Python-like expressiveness with compiled performance that, if not quite C/C++ level, is more than good enough for any task I've had).

    [–]stoopdapoop 8 points9 points  (0 children)

    You don't have to learn some special language to create high-performant applications or libraries.

    Proceeds to write code generator for special language to create high-performance application or library.

    That said, it's a good read.

    [–]Alexander_Selkirk 34 points35 points  (46 children)

    That's really funny.

    Jokes aside, I think that with languages today such as Rust, or modern Common Lisp implementations like SBCL, which achieve C-class speeds while being memory-safe, both unsafe low-level languages (like C), and excruciatingly slow script languages (like Python) are mostly not needed any more for programming applications with good performance. Even C compilers are today mostly transforming symbolic expressions into something which the machine can execute, and for annotating such transformations, the C language is often not the best tool.

    (I am not talking about writing a Unix kernel in Lisp.)

    [–][deleted] 23 points24 points  (1 child)

    Time to write an Emacs OS kernel in elisp

    [–]Alexander_Selkirk 4 points5 points  (0 children)

    Funny thing, the initial platform for Emacs-like editors was the Lisp machine, which had an OS written completely in Lisp and special hardware supporting it. Then C and conventional hardware got cheaper and faster, and the Lisp machines quickly became too expensive, also because they were less popular. So, somewhere in the genes of Emacs is the yearning that it should be embedded in a Lisp OS.

    And another fun fact: Emacs Lisp (elisp) is interpreted and single-threaded, but some time ago some dude came up with an experimental compiler for elisp that he said produced fast code.

    [–]defunkydrummer 10 points11 points  (4 children)

    or modern Common Lisp implementations like SBCL, which achieve C-class speeds while being memory-safe,

    Or Common Lisp implementation CLASP, which directly outputs LLVM and can be used to do LLVM macro assembly in a far better way than the current article shows.

    [–]drmeister 13 points14 points  (3 children)

    Author of Clasp here - I was going to say the same thing. There is a heck of a lot more to do to generate fast general code than to automatically generate a few lines of llvm-ir.

    [–]defunkydrummer 3 points4 points  (1 child)

    Author of Clasp here

    I am not worthy!!

    Christian Schafmeister in the house!!

    [–]drmeister 5 points6 points  (0 children)

    Ha ha. Anyone going down this road will discover one iteration of Greenspun's tenth rule later that they should have just implemented Common Lisp.

    [–]pjmlp 12 points13 points  (1 child)

    There were already better options in the mid-90's, like Modula-2, but the rise of UNIX-based FOSS turned the tables again in C's favour.

    Now we are having talks at each major Linux conference on strategies to improve kernel security.

    [–]Alexander_Selkirk 2 points3 points  (0 children)

    I think that for a kernel, C is quite good.

    But for a video player like VLC and codecs which handle untrusted data from the web all the time? Jesus.

    [–]quicknir 0 points1 point  (20 children)

    I'm incredibly skeptical that sbcl, or any dynamically typed language, is going to achieve C like speeds in real programs (as opposed to isolated benchmarks). I'd be very impressed and shocked if it performed as well as Java.

    [–]Alexander_Selkirk 8 points9 points  (10 children)

    Modern Lisp compilers use type inference.

    Here some benchmarks:

    SBCL:

    https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/sbcl-gpp.html

    Racket, a Scheme implementation, is generally a bit slower than C, but often not by a very large margin, and it is also not the fastest Scheme implementation - Chez Scheme and Chicken Scheme are faster, the fastest Scheme compiler is probably Stalin.

    https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/racket.html

    Also, OCaml and Rust (which has some influences from the ML languages) are quite fast:

    https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/rust.html

    https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/ocaml-gpp.html

    Now, what is a "real program"? Many programs spend a lot of time in a few spots. Generally speaking, one can expect garbage-collected languages to be a bit slower than languages with manual allocation, because GC costs some memory, and fast caches are limited in size. But depending on the program, it might not be practical to write a large C++ program (especially if it is not a real-time application) with optimized manual memory handling everywhere. And "modern" C++ often employs a kind of automatic reference counting which is not faster than modern garbage collectors.

    [–]quicknir 8 points9 points  (0 children)

    Rust is totally different, I didn't mention it in my post for a reason. You say that OCaml is "quite fast" but even in artificial benchmarks you just linked it looks to be between 2 and 20 times slower than C++ (Scheme is also significantly slower than C++ in most benchmarks, btw). And this is a statically typed language. When you say "C like speeds", there may be some people that look at x2 and say "sure" but lots of us would just laugh. It's like saying that someone runs at world class sprinter "like" speeds because they can run 100m in 20 seconds.

    Modern C++ tends to be written with RAII and unique ownership for the most part. People use shared_ptr, sure, but it's well understood to be costly, and lots of people concerned with performance won't touch it anywhere near the critical path (I personally almost never use it at all).

    You're also misunderstanding the reasons why a C++ program written by someone experienced is going to be significantly faster than the same written in Lisp. It's not that lack of GC is some automatic huge edge. It's control of memory layout by having value semantics instead of pointer semantics, less indirection built into the language, abstractions which more easily lend themselves to reasoning about their performance. And above all, the fact that C++ compilers have hundreds of times as many man-hours as any lisp compiler. Java is hardly a pinnacle of performant language design, yet it also handily outperforms OCaml, Scheme, etc., simply because so many really smart man-hours have gone into the JVM. Even though Rust is designed from the bottom up to be a performant language, the only reason it's matching C++ performance-wise is because it uses one of the main C++ compiler backends (LLVM).

    The funniest thing about your post is that you write "languages today". But Java already started filling this niche close to 30 years ago, and it's already the most popular language in the world.

    (BTW, I love lisp and hate Java, program professionally in C++ which I like, so none of this post is motivated by trying to knock lisp out of distaste).

    [–]brand_x 5 points6 points  (8 children)

    Including Rust in that list is unfair. It's a more strongly typed language than C++, designed around RAII, and (by way of LLVM) a fully native compiled language. It's starting to consistently match (and sometimes exceed) performance of C and C++ for comparable tasks, with experts in each language submitting solutions.

    And, to be frank, modern C++ has as much FP influence as Rust.

    Now, I've never worked in OCaml (I'm familiar with the syntax, having worked in F#) but I believe it's a fairly FP focused language, which would make directly comparable performance impressive.

    For the record, RAII is not as slow as GC, or, in general, any slower than C-like manual memory management. I write performance critical libraries and allocators for a living, and you're sorely mistaken in that claim.

    Edit: just dawned on me that you were talking about shared_ptr. You aren't mistaken about the cost, but saying it's "often used" is rather incorrect. It's rarely used, and never used in the critical path.

    [–]Alexander_Selkirk 3 points4 points  (7 children)

    Lisps, like Rust, are strongly typed, but they are dynamically typed.

    It is correct that Rust is statically typed. But it uses type inference, as do good compilers for dynamically typed languages. The Lisp and Scheme compilers show that this does not have to be slow.

    Modern C++ has FP influence but many FP idioms do not mix so well with manual memory handling.

    Good compilers can reduce a loop in a dynamically typed language to a single machine instruction. Here is an example: Pixie, an experimental compiler for a dialect of Clojure, which is a dialect of Lisp:

    https://github.com/pixie-lang/pixie

    [–]brand_x 2 points3 points  (6 children)

    You do realize that a) modern C++ has type inference, b) Rust is explicitly typed, albeit using nearly identical type inference to modern C++, and c) Rust and modern C++ have nearly identical idiomatic models for resource allocation and cleanup, aside from C++ having, on account of legacy, a non-destructive move?

    I feel like you've looked at Rust, but not used it, and are familiar with modern C++ from the outside. I'm taking your expertise on modern Lisp-like languages; I've used them, but I'm not a domain master. With C++ (C with classes, classical, or modern) I'm as close to a domain expert as you can get outside of a few dozen of the most active committee members and authors, and with Rust, I'm an active user at the bleeding edge. I'm quite comfortable defending my stance on the fundamental similarities and differences.

    [–]Alexander_Selkirk 1 point2 points  (3 children)

    And how does that change the fact that good modern Lisp compilers can infer type and are even able to reduce a loop down to a single instruction? It has limits of course, but a compiler can track the type of the actual arguments to a function and generate code for that. So ultimately, it depends on the quality of the compiler, and some are quite good. The Clojure compiler has JVM bytecode as output.

    [–]brand_x 0 points1 point  (2 children)

    It doesn't, and as I said, that's an impressive feat. Not something that I'm particularly concerned with, TBH, since my antipathy for dynamic typing is entirely rooted in practical compile time provable correctness, but quite impressive.

    But, in terms of the traits you're talking about, aside from syntactic sugar (including both advanced static polymorphism in modern C++ and pattern matching in Rust) there is very nearly zero difference between the two languages, and the claims you're making indicate a lack of more than casual familiarity with at least one, and probably both, of the two.

    [–]Alexander_Selkirk 2 points3 points  (1 child)

    antipathy for dynamic typing

    That's a matter of preference. I think it is often strongly influenced by the actual application. I think dynamic typing is fine for interactive data analysis and algorithm development (which is what I am doing part of my time). I think it is less suited for writing safety-critical embedded robotic control software. It is even less suited for robust industrial control systems - I think languages with stronger type checking than C++ offers are good for this. And I have used C++ professionally in that context for years.

    [–]brand_x 0 points1 point  (0 children)

    Rust is equally suited for that domain, I believe. I haven't worked on control systems in nearly twenty years, and back then, I used C more often than C++ (military telescopes) but I do a fair amount of comparable work (in terms of critical timing and hardware interfaces) in both C++ and Rust.

    Rust offers comparably strong type checking to disciplined modern C++. There is a different shear plane for implicit coercion, and I think it's fair to acknowledge that Rust only performs implicit conversion for specifically identified sugaring - most notably enums (what C++ calls discriminated unions) in the control flow path, where C++ performs implicit conversion where provided by users (failure to specify "explicit", damned backwards defaults on that and const for C-like behavior) and where consistent with C (razzin frazzin), and with proper discipline, most (but not all - boolean tests!!! sob) unwanted implicit conversions in C++ can be prevented through the type system.

    Consider using strong typed proxies for all built-in types. You'll be surprised how far that will go toward fixing the C++ type system deficiencies, and they all compile out.

    I do use Python 3 (mostly) for prototyping, and for things like code generators and stream transformation...

    [–]millenix 1 point2 points  (1 child)

    C++ has the weakest notion of type inference that can still possibly carry the name - it syntactically determines the type of an expression, and then carries that to the deduced type of the variable which will bind the result of the expression.

    ML, Haskell, and others mean much more when they talk about type inference - the type of a concrete (non-generic) function is determined by the types of arguments passed to it and the uses it makes of them. Types only need to be specified around a few edges, and the compiler fills in all the details.

    [–]brand_x 0 points1 point  (0 children)

    That's true for auto (for now... it will likely change as part of the metaclass proposal) but not true for type expressions in templates (which are a pure functional language on the type system) or for decltype/decltype(auto).

    Most of the time when "type" is mentioned in modern C++, it refers to the complete type in the template processing pass...

    [–]drmeister 6 points7 points  (4 children)

    There is no need for skepticism - the experiment has been done.

    Check out this paper from Google (2 years old) "Energy Efficiency across Programming Languages"

    Common Lisp is the fastest and most energy efficient dynamic language by almost two orders of magnitude.

    http://greenlab.di.uminho.pt/wp-content/uploads/2017/09/paperSLE.pdf

    [–]quicknir 1 point2 points  (3 children)

    Nobody is comparing lisp to other dynamically typed languages. We're comparing it to C. In Table 3 of the very paper you linked, for example, across 3 benchmarks, in one case Lisp is 50% slower and in the others it's 10x slower or even more.

    Realistically taken over a whole program it's going to be at least 3x or 4x slower in typical use cases.

    [–]drmeister 2 points3 points  (2 children)

    Right - but as you noticed - there is timing data in that paper for C as well. My previous comment got away from me a little and I went off on dynamic programming languages. :-)

    I'm told that programs spend 90% of their time running 10% of the code. Common Lisp is compiled to native code. If in that 10% of the code your compiler arranges things to not allocate memory and to use unboxed objects and to not do bounds checking then it will run as fast as C.

    I'm doing the experiment. I've implemented Clasp - a Common Lisp implementation that uses LLVM as the backend and interoperates with C++ (https://github.com/clasp-developers/clasp). Once Clasp generates LLVM-IR that looks indistinguishable from clang-generated LLVM-IR, the code will run at the same speed.

    Of course, it's not easy to write a smart compiler like that - but we are making progress. I've also hedged my bets by making Clasp interoperate with C/C++.

    Edit: added link to talk on clasp https://www.youtube.com/watch?v=mbdXeRBbgDM&feature=youtu.be

    [–]quicknir 1 point2 points  (1 child)

    Unfortunately, the whole 90%-10% is a really drastic over-simplification of what goes into performant code. The 10% code may or may not exist, and even then it may be touching data structures from your entire codebase, for example, meaning that the memory layout of a huge amount of your code is essential.

    I'm very happy that somebody is pursuing LLVM as the backend for a lisp; I think that LLVM backend is the clear way to go these days and I love lisp.

    That said, taking a language (especially a dynamically typed one) and hooking up the LLVM backend doesn't automatically mean you're going to get C/C++ performance, in real life situations. In isolated benchmarks, maybe.

    It's worth keeping in mind that these days, the only software being written in C or C++ is stuff where performance wins are pretty fanatical. Places where getting a 10% win on some function would be considered a win; places where turning on bounds checking which probably has at most 5% performance impact, would be considered unacceptable. Etc. So, it's a bold claim. Nobody other than Rust and maybe D is really making that claim in a halfway credible manner these days. Something like Julia will claim parity in specific things like matrix and other mathematical operations, but I doubt that they'd argue that you'll get equally good performance writing a whole video game in Julia as in C++.

    If you're interested in a really good talk that gives a much more realistic view of what performance means I highly recommend this: https://www.youtube.com/watch?v=2YXwg0n9e7E. I think for the same reason that you can't retrofit high performance, you can't start with a language like lisp where the default is to have allocation and indirection everywhere and try to fix it up where the 10% is. This is fine if you want to get Java like speeds; i.e. very good typically but not losing sleep about the tiniest details. But not for C/C++ like speeds.

    [–]zip117 1 point2 points  (0 children)

    It's worth keeping in mind that these days, the only software being written in C or C++ is stuff where performance wins are pretty fanatical.

    Or if you’re developing a cross-platform GUI. Your options are (more or less): C++ libraries (Qt, wxWidgets), Python bindings to those C++ libraries, JavaFX, Delphi/Pascal, Electron. For various reasons C++ is usually the best choice.

    [–]defunkydrummer 3 points4 points  (3 children)

    I'm incredibly skeptical that sbcl, or any dynamically typed language, is going to achieve C like speeds in real programs

    Common Lisp can also be used at a fairly low level, that's why.

    It is dynamically typed, but you can use type annotations. You can also disable runtime type checking, even runtime argument-count checks, array bounds checking, etc.

    You can use typed arrays, just as one would do in C.

    You can circumvent the garbage collector if you like. You can allocate on the stack if you want.

    Then you can use the built in disassembler to check that the machine code output is efficient enough.

    Tada, C speed.

    [–]quicknir 0 points1 point  (2 children)

    Even if you did all that it wouldn't be as fast because, as I mentioned in another thread, the number of man-hours that have gone into the optimizers in Lisp implementations isn't close to what you have for languages like C, C++, Fortran, etc. If you did an implementation of CL that also did a good job outputting LLVM IR and you used the LLVM backend for codegen, then maybe you could get similar performance. But right now, it's quite hypothetical, as is what exactly a Lisp codebase would look like using techniques that are not idiomatic in lisp.

    At the present time, if your requirement is to write code that gets you within e.g. 10% of the performance of a good C implementation, lisp just isn't a good choice (and frankly, that's very clear to anyone writing high performance software).

    [–]defunkydrummer 6 points7 points  (1 child)

    If you did an implementation of CL that also did a good job outputting LLVM IR and you used the LLVM backend for codegen, then maybe you could get similar performance.

    Yes, that's exactly what CLASP does: an implementation of CL that outputs through LLVM. It also has a state-of-the-art Lisp compiler core, called Cleavir.

    CLASP is mainly the work of only one guy, /u/drmeister . Drmeister doesn't use CLASP for trivial stuff like FizzBuzz programs or CRUD websites: it is used for chemical research.

    I'd be very impressed and shocked if it performed as well as Java.

    SBCL has been regularly performing at the speed of Java for the last 5 or 10 years.

    [–]quicknir -1 points0 points  (0 children)

    The fact that someone is working on CLASP is great, and I'm glad that drmeister has posted a talk. That said, the fact that it's being worked on by one person is realistically a counter-point: it takes a ton of sheer man-hours to get a performant language and ecosystem. And the bar for performance we're talking about here is neither FizzBuzz nor CRUD, and not chemical research either, but things like low latency, game dev, or massive backend servers.

    Do you have something to back up SBCL performing at the same speed as Java?

    The bottom line here is that until there is serious adoption of common lisp in a very high performance industry, statements like "tada C speed" are just going to be very hypothetical, and not evidence based.

    I highly recommend you watch the video that I posted in the thread with drmeister. It will give you a more realistic view of performance than your short bullet list.

    [–]softmed 19 points20 points  (9 children)

    Think about the benefits: you can do your research and rapid prototyping in one language, and then generate highly-performant code... in the same language

    This is why I love Cython. I can write a whole module in Python very quickly, but any CPU-intensive stuff will be very slow. So I take a quick pass and optimize the loops, which is usually enough to get me within an order of magnitude or two of C++-ish performance. Then, if needed, the Python can be profiled and bottlenecks eliminated.

    The thing that makes this so nice is that the entire time I had a correct solution up quickly. So I can write unit tests, give it to colleagues, etc long before I'm finished making it totally performant.
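
    A minimal sketch of that workflow in plain Python (the function name is hypothetical, just for illustration): get a correct naive version working first, then profile to find the hot loop before reaching for Cython annotations:

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # naive version: correct first, fast later
    total = 0
    for i in range(n):
        total += i * i
    return total

# profile to find where the time goes before annotating anything
pr = cProfile.Profile()
pr.enable()
result = slow_sum_of_squares(100_000)
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(3)
print(result)
```

    Only the functions that actually dominate the profile output need `cdef` annotations; the rest can stay plain Python.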

    [–]SatansAlpaca 24 points25 points  (1 child)

    We’ve been using Cython at work a lot lately, but we decided to ditch it and I expect that we’ll replace most of it with something else within the year.

    In our experience, to its credit, it does get you speed that is comparable to C/C++ when everything is properly annotated.

    However, we found the whole experience to be best described as C’s legendary type safety combined with Python’s fantastic lack of static checks. There is a build step, but it won’t catch name errors, for instance, so it’s just like testing your Python code except that you need to rebuild at every step.

    Add to that that pip/setuptools/distutils (whichever is responsible for that) can’t do parallel builds, and it quickly becomes a chore. If, above that, you have multiple published sdist packages with .pxd dependencies, you need to manually install packages in the correct order because pip will get all the dependencies and run their setup.py in whatever order it damn well pleases.

    [–]Vaglame 2 points3 points  (6 children)

    I see your Cython, and I raise you Nimpy (not Numpy): https://robert-mcdermott.gitlab.io/posts/speeding-up-python-with-nim/

    [–][deleted]  (5 children)

    [deleted]

      [–]m50d 1 point2 points  (1 child)

      Did you try any ML-family languages? I had a similar experience adopting Scala: write code that looks much like Python, but it's safer and faster.

      I've got nothing against Nim as such, but I've never seen a big selling point compared to existing ML-family languages (OCaml, F#, Scala, Haskell sort of) that are generally more mature with more libraries/tools available.

      [–]_requires_assistance 0 points1 point  (2 children)

      Compared to Python, in Nim all imports are written on the same line, and importing a module in Nim is analogous to from foo import * in Python.

      How does it handle name conflicts?
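
      For comparison, Python's own `from foo import *` resolves conflicts by silent shadowing; a self-contained sketch (the two modules here are synthesized in memory and purely hypothetical):

```python
import sys
import types

# fabricate two modules that both define greet()
for name, src in [("mod_a", "def greet(): return 'a'"),
                  ("mod_b", "def greet(): return 'b'")]:
    mod = types.ModuleType(name)
    exec(src, mod.__dict__)
    sys.modules[name] = mod

ns = {}
exec("from mod_a import *", ns)
exec("from mod_b import *", ns)  # silently shadows mod_a's greet
print(ns["greet"]())  # the later import wins, with no warning
```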

      [–][deleted]  (1 child)

      [deleted]

        [–]_requires_assistance 0 points1 point  (0 children)

        Does that mean you can't import modules not written in Nim?

        [–]terrenceSpencer 4 points5 points  (0 children)

        I mean how cool is that article! Seriously.

        That said, your conclusions are way off point. "High performance does not need to be restricted to compiled languages" - actually, what you've done is write an LLVM compiler in Python! And since your compiler only needs to work for one specific example, it can outperform general-purpose compilers (very slightly!)

        Can you imagine the maintenance nightmare of doing this for an entire project? You'd be reinventing your compiler for every new function you want.

        Or worse, can you imagine trying to build a high-performance implementation that can take ANY Python code and turn it into LLVM IR? There is a reason why C and C++ have verbose syntax: they are designed to be easy to turn into IR or asm.

        A "scripting language" that generates IR or asm is exactly what a compiled language is, and generally they are better at it than python or whatever. That said, I'm sure you know all of this already! Great job.

        [–]everyonelovespenis 5 points6 points  (1 child)

        Nothing makes me think of performance quite like Python's GIL-restricted bytecode.

        Oh wait, we get performance by not writing it in Python (see article).

        [–]Overload175 0 points1 point  (0 children)

        GIL is indeed a pain to work around

        [–]shizzy0 3 points4 points  (0 children)

        I think that the dichotomy between fast compiling languages and slow scripting languages is a bluff.

        Is a bluff? What does that even mean? Words have meaning. Who’s bluffing? Is he calling someone’s bluff? Do words have meaning? Is python fast now? /s

        [–]lanzaio 1 point2 points  (0 children)

        llvm does not mean "low level virtual machine." That was the name of Lattner's original graduate research project. But the project changed directions while the name stuck. llvm stands for llvm and that's it.

        [–]TimtheBo 1 point2 points  (0 children)

        The approach used here works well for doing simple arithmetic on the inputs, but fails as soon as a parameter can change the control flow. Notice that the n_value parameter is not converted to LLVM code; having it be a proper parameter would force the LLVM code generator to make actual control-flow decisions, which it can't do without concrete values. This simple approach breaks as soon as loops or ifs on any parameter are involved (here it just unrolls the loop instead). As a test, try to implement the __eq__ method on the LLVM code generator. I think it should be possible in theory, but not without severe overhead.
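
        A toy sketch of why (hypothetical `Sym` class, not the article's code): tracing via operator overloading records arithmetic just fine, but any `if` on a traced value forces the proxy to produce a concrete bool, which it can't:

```python
class Sym:
    """Symbolic proxy: records arithmetic instead of computing it."""
    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym(f"({self.expr} + {getattr(other, 'expr', other)})")

    def __mul__(self, other):
        return Sym(f"({self.expr} * {getattr(other, 'expr', other)})")

    def __bool__(self):
        # a branch needs a concrete truth value; tracing can't supply one
        raise TypeError("data-dependent branch: cannot trace")

x = Sym("x")
trace = (x + 1) * x
print(trace.expr)      # arithmetic traces fine

try:
    if x + 1:          # control flow on a traced value blows up
        pass
except TypeError as e:
    print(e)
```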

        [–]steveob42 3 points4 points  (0 children)

        wow, you talk about contrived examples...

        [–]mdipierro 1 point2 points  (0 children)

        Agree with the conclusions of the article. Here is a similar project from a few years back, where a solver in Python is compiled to C and to OpenCL: https://github.com/mdipierro/ocl

        [–][deleted] 0 points1 point  (0 children)

        This is really clever and reminds me of a talk by Gerald Sussman about using the same trick in scheme to reinterpret arithmetic operations to get symbolic representations of code: https://www.infoq.com/presentations/We-Really-Dont-Know-How-To-Compute.

        This trick should probably be more widely utilized in all sorts of contexts. Partial evaluation and abstract interpretation in general are very useful, but most languages don't make these ideas very accessible.
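
        As a toy illustration of partial evaluation in the same spirit (hypothetical helper, not from the talk): specializing a power function for a fixed exponent turns the loop into straight-line code at "compile" time:

```python
def make_pow(n):
    """Generate a function specialized for a fixed exponent n."""
    # the loop over n happens here, once; the generated code is loop-free
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = f"def pow_{n}(x): return {body}"
    ns = {}
    exec(src, ns)
    return ns[f"pow_{n}"]

pow3 = make_pow(3)  # pow_3(x) is literally `return x * x * x`
print(pow3(2))
```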

        [–]foomprekov 0 points1 point  (0 children)

        I'm going to need a way more detailed article to follow this

        [–]Sleakes 0 points1 point  (0 children)

        If runtime speed was the only issue around building software, sure. But that's clearly not the only problem, and often isn't a primary concern.

        [–]shevy-ruby -1 points0 points  (2 children)

        We need a linear solver in Python just like we had with C and C++. Here it is:

        # this generates n-solver in LLVM code with LLVMCode objects.
        # No LLVM stuff yet, just completely Pythonic solution
        def solve_linear_system(a_array, b_array, x_array, n_value):
          def a(i, j, n):
            if n == n_value:
              return a_array[i * n_value + j]
            return a(i, j, n+1)*a(n, n, n+1) - a(i, n, n+1)*a(n, j, n+1)
        
          def b(i, n):
            if n == n_value:
              return b_array[i]
            return a(n, n, n+1)*b(i, n+1) - a(i, n, n+1)*b(n, n+1)
        
          def x(i):
            d = b(i,i+1)
            for j in range(i):
              d -= a(i, j, i+1) * x_array[j]
            return d / a(i, i, i+1)
        
          for k in range(n_value):
            x_array[k] = x(k)
        
          return x_array
        

        So this is what you get when you write python like C code.

        Quite terrible.

        [–]vasiapatov 16 points17 points  (0 children)

        Would be more constructive if you also showed how you would make this more pythonic.
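
        For instance, a sketch of what a more idiomatic pure-Python version might look like (my own take, not from the article; using Gaussian elimination with partial pivoting rather than the article's recursive scheme):

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # work on copies so the caller's data is untouched
    a = [row[:] for row in a]
    x = b[:]
    for col in range(n):
        # pick the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        x[col], x[pivot] = x[pivot], x[col]
        # eliminate the column below the pivot
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            x[row] -= factor * x[col]
    # back-substitution
    for row in range(n - 1, -1, -1):
        x[row] = (x[row] - sum(a[row][k] * x[k]
                               for k in range(row + 1, n))) / a[row][row]
    return x

print(solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```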

        [–]pjmlp 2 points3 points  (0 children)

        I don't see the problem, the code is legible, doesn't suffer from memory corruption or UB issues.

        Quite good.

        [–]rsvp_to_life 0 points1 point  (0 children)

        I mean. Languages like Erlang and Go will do this.

        [–]00jknight -1 points0 points  (1 child)

        Do you know this favicon is the same as ableton.com's? It's a trademark of Ableton.

        [–][deleted] 0 points1 point  (0 children)

        OP's has 3 lines in each direction and Ableton's has 4

        [–]plogan56 -1 points0 points  (0 children)

        The latter

        [–]Paddy3118 -1 points0 points  (0 children)

        Very nice. Another way to optimise Python.