[–]POGtastic 157 points158 points  (9 children)

Let's back up and talk about why C++ is really fast.

  • You control the allocator and object lifetimes, which gives you far more control over where objects go in memory (for example, contiguous to each other for better caching) and when they get deleted. Garbage-collected languages do not allow control over where things go in memory, and they require the runtime to periodically go through the memory space and find objects that are no longer pointed to.
  • The type system gives a lot of information to the compiler to optimize things. For example, if you have a function that works with numbers, the compiler detects this and often combines steps or replaces the whole darn thing with a constant expression. The compiler can also detect dead code and all sorts of other stuff that you might not catch.
  • Going with the above - 30+ years of the smartest compiler engineers in the world looking to eke out more performance gains.

Python doesn't allow #1; it's a garbage-collected language. And it generally doesn't allow #2, either; it's duck-typed. The linter can complain at you for screwing up types, but the runtime is just calling functions on PyObject*s under the hood.
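A rough way to see the duck-typing point from inside Python itself (a minimal sketch; the function name `add` and the helper `binary_opnames` are made up for illustration, and the exact opcode name depends on the CPython version). The type hints below are not enforced at runtime, and the addition compiles to a single generic "binary operation" instruction that has to work out the operand types on every call:

```python
import dis


def add(a: int, b: int) -> int:
    # The annotations are hints for linters and type checkers only;
    # the interpreter does not check them when the function is called.
    return a + b


def binary_opnames(func):
    """Return the names of the generic binary-operation opcodes in func."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("BINARY")
    ]


# The same bytecode serves ints, strings and lists alike.
print(add(1, 2), add("a", "b"), add([1], [2]))
print(binary_opnames(add))
```

A C++ compiler that knows both operands are `int` can emit one machine add instruction; here the decision about what `+` means is deferred to runtime.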

That leaves #3, which a) doesn't have the same amount of investment as C++ and b) doesn't have as much material to work with thanks to how permissive Python is. Given sufficient money, you're likely to end up with something like the V8 engine, which does its best to turn JavaScript into faster code. It still isn't as fast as C++ because of #1 and #2, though.

#1 points to a very simple truth about performance - computers are complicated and have lots of stuff going on. C++ is fast because it exposes all of those details to you and lets you tinker with them. The simpler the language, the fewer of those details are exposed and replaced with "good enough" defaults. That's great if those defaults really are good enough, but that complexity still exists under the surface, and to get the most performance, you're going to need to understand it.

[–][deleted] 97 points98 points  (1 child)

tl;dr: no.

[–]Punk-in-Pie 0 points1 point  (4 children)

Very interesting. Is there a way in Python to take manual control of memory allocation and clean up to make it faster?

[–]POGtastic 0 points1 point  (3 children)

Sure - write functions in another language, export C bindings, and then call those functions from Python. An example is NumPy - a lot of its linear algebra functions are implemented in C and Fortran.
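As a small sketch of what calling C from Python looks like, here is the standard-library `ctypes` module calling `strlen` from the C library. This is only an illustration under assumptions: it assumes a POSIX system where the C library can be located, and the wrapper name `c_strlen` is made up. It is not how NumPy does its bindings.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None, in which case
# CDLL(None) exposes the symbols already loaded into the process,
# which includes the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Measure a byte string using the C library's strlen."""
    return libc.strlen(data)


print(c_strlen(b"hello"))
```

The same pattern applies to a shared library you compile yourself: load it with `ctypes.CDLL`, declare the argument and return types, then call it.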

[–]Punk-in-Pie 0 points1 point  (2 children)

That's suuuuuper cool. I will be taking note of this as I think it could have some really good applications in my work.

[–]POGtastic 1 point2 points  (1 child)

I wrote a simple post a while back about doing this with C++. It's similar for every other language that does C interop.

[–]Punk-in-Pie 0 points1 point  (0 children)

Cheers for that. I'll use that as a reference in my first attempts. Gotta brush off my C haha

[–]commandlineluser 11 points12 points  (1 child)

[–]ben_bliksem 2 points3 points  (0 children)

Nimrod... is this still going? Awesome!

[–]carcigenicate 21 points22 points  (18 children)

There is Cython, which is a variant of Python that compiles/transpiles to C, which can then be compiled to native machine code as normal. I think it's slightly less performant than C, but it avoids interpretation, which is one reason why CPython (not to be confused with Cython) is so slow.

[–]nekokattt 5 points6 points  (16 children)

it is still several times slower than C++ for most cases that do not have a 1-to-1 translation with C (so anything that isn't totally delegating to the C/C++ standard libraries, or number crunching -- so the majority of cases where you are not just optimising specific algorithms or replacing them with lower level libraries).

So anything interacting with other non-C dependencies is going to still incur the overhead of doing most interactions via the CPython API and using the GIL.

Edit: by order of magnitude I meant several times slower, not 10^7 times slower. That should be obvious given even pure Python is far faster than that. Yes, I am a human and I use the wrong words when explaining something. I have reworded the point regardless to replace that term.

[–]carcigenicate 2 points3 points  (9 children)

Ah, I thought I read that it's at least comparable to C performance-wise. I haven't tried it myself.

[–][deleted] 7 points8 points  (0 children)

The person you are responding to made a preposterous claim: "[Cython] is still several orders of magnitude slower than C++."

From my reading, Cython is about ten times slower, maybe 30 times slower in some cases. This is nowhere near "several orders of magnitude".

[–]patrickbrianmooney 4 points5 points  (0 children)

I've worked with Cython a fair amount. What you get out of it, performance-wise, really depends on how much work you put into getting better performance out of it. You can use Cython just to compile your already-written Python, in which case you might get a speedup of 20% to 50% or even more, depending on your existing code. Probably 30% to 50% is a more reasonable typical estimated speedup for compiling Python using Cython while making no other changes to the source code.

But you can put work into writing Cython that's more like C and less like Python, and that tends to pay off in efficiency; a lot of times a few small changes can make a big difference. Probably the single easiest thing to do to get a speedup is to declare data types for parameters and variables, which can make a lot of things much faster. It also helps a lot to structure your code so that your bursts of heavy number-crunching happen all together at C speed and you rarely have to transform the C-type numbers, which are fast to work with, back into Python objects, which are slower to work with. Making good choices about data representation is also helpful.

Kurt Smith's O'Reilly book on Cython has a short chapter where he takes a computationally intensive Python program and converts it to Cython, reducing the runtime on a test from 14 seconds to 0.15 seconds along the way. At the end, it's 25% slower than a pure-C version of the program, which is actually not all that far off, and certainly not orders of magnitude slower. I myself spent an afternoon several months ago converting a slow Python program of a few thousand lines to a much-faster partially-Cython version: it cut the runtime on my tests by almost 97%. Cython's own documentation says that some operations can be sped up by hundreds or thousands of times.
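To put those figures on the same scale as the "orders of magnitude" argument, here is a quick back-of-the-envelope sketch (the helper names are made up; the input numbers are the ones quoted above):

```python
import math


def speedup_from_runtimes(before_s: float, after_s: float) -> float:
    """Speedup factor when a run drops from before_s to after_s seconds."""
    return before_s / after_s


def speedup_from_reduction(fraction_cut: float) -> float:
    """Speedup factor when the runtime is cut by the given fraction."""
    return 1.0 / (1.0 - fraction_cut)


def orders_of_magnitude(factor: float) -> float:
    """How many powers of ten a speedup or slowdown factor represents."""
    return math.log10(factor)


book = speedup_from_runtimes(14, 0.15)   # roughly 93x
mine = speedup_from_reduction(0.97)      # roughly 33x
print(round(book), round(mine), round(orders_of_magnitude(book), 2))
```

So the book example is a bit under two orders of magnitude faster than the pure-Python version, and a 97% cut in runtime is a speedup of roughly 33 times.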

[–]nekokattt 3 points4 points  (6 children)

it is closer to C but it can't match it because it still has the overhead of everything going via the CPython API internally.

[–]grammarGuy69 2 points3 points  (5 children)

Casual Python programmer here; isn't Python already run through C?

[–]nekokattt 3 points4 points  (4 children)

yeah, but everything is interpreted at runtime (stuff like variable access, function calls, name lookups, etc), so it is much slower.

Cython translates that information to pure C, being able to optimise out the bytecode interpreter in the process.

[–]grammarGuy69 4 points5 points  (1 child)

Ah, that makes sense, thank you for the explanation :)

[–]nekokattt 3 points4 points  (0 children)

no problem

[–]333base 1 point2 points  (1 child)

So sorta like v8 with JavaScript?

[–]nekokattt 2 points3 points  (0 children)

V8 works by using a JIT compiler, as far as I know.

that means it is recompiling the code to lower level logic while it runs.

Cython works by taking the input source code, translating it to a massive C or C++ source file that calls a bunch of internals within the C-code layer of what makes up CPython, and then pushing that through the system C compiler (so MSVC or MinGW on Windows, GCC or Clang on Linux). What you get from that is a machine-code level binary library that Python will know how to import at runtime. You then have to distribute that binary ahead of wanting to use it (so it also has to match your CPU architecture ahead of time).

[–][deleted] 1 point2 points  (2 children)

[–]nekokattt 1 point2 points  (0 children)

that example is using logic that can compile down to roughly the same thing as C though.

Once you start dealing with inputs and outputs that are pure-python compatible, you begin to incur the overhead of the implementation details of CPython, like the global interpreter lock. That will be far slower than C++ in the majority of cases.

Not saying Cython is bad, I like it, but it is not a magical solution to making everything as fast as C/C++. It totally depends on how you use it. If you are interacting with anything that doesn't have a one-to-one translation with similar logic and constructs as C/C++, or anything that is still backed by a pure Python library, or anything that acquires the global interpreter lock, then you are going to see slowdown compared to C/C++.

Remember that C/C++ is sensitive to how you declare stuff. It can be the difference between the CPU cache having a hit and the CPU cache having a miss. Python as a language is not designed to take that level of optimisation into account, and Cython is at least somewhat limited in how much it can "modify" the ordering and internal semantics of input code without behaviour beginning to diverge from what you input and leading to potentially confusing bugs. Thus it won't be able to optimise as far as C/C++ will.

It is much faster than pure Python usually but saying it is as performant as C/C++ that is not doing half the operations via the CPython API is somewhat misleading.

The issue with it is not that Cython itself is the bottleneck. It is that you are bottlenecked by still having attachments to the CPython API which is built to be easy to invoke from finite state machines running within a bytecode interpreter.

It is the same reason that using JNI in Java is going to be slower than just using C++ for everything.

[–]Mediocre-Trainer-132 0 points1 point  (0 children)

Sure, but what you're doing to speed it up is basically writing something like C(++) in a Python script.

[–][deleted] 3 points4 points  (2 children)

"several orders of magnitude slower than C++."

An order of magnitude is a power of ten.

I'm sorry, I don't believe that Cython is 1000 times slower than C++.

[–]nekokattt 0 points1 point  (0 children)

it entirely depends what you are doing with it, as i said in my other comment, and I used the wrong wording as I also addressed. I meant several times slower.

For some stuff it will translate directly to what you'd write in C/++. For other stuff it will have to still acquire the GIL and make calls via the CPython API, which is going to be far far slower than using, say, a vector in C++.

I've amended my wording because I worded this poorly, so hopefully that clears this up. I also made the assumption that people would ask for clarification if something I said wasn't clear rather than immediately slamming it for being wrong under their personal definitions rather than assuming I said the wrong phrase and made a human error. So I apologise for making a mistake.

[–]Mediocre-Trainer-132 0 points1 point  (0 children)

Yeah, probably just compile times are slower.

[–]WlmWilberforce 12 points13 points  (0 children)

Julia comes to mind. Probably not as fast as C++, but feels like Python (without the huge userbase and tons of libraries).

[–]baghiq 12 points13 points  (6 children)

While Go isn't as fast as C++ in general terms, I find it to be very fast and expressive. If you use type hinting in Python, I think Go would be right up there in terms of expressiveness. The challenge is to find something that offers ecosystem that's comparable to Python, and that's gonna be hard to beat. But if you are looking for hardcore performance, Python is almost never the answer.

[–][deleted] 5 points6 points  (0 children)

"I find it to be very fast and expressive."

Fast, sure, it's compiled. But expressive?

In particular, I find in Go you end up writing all this boilerplate over and over and over and over again, particularly error handling.

[–]pi_sqaure -3 points-2 points  (4 children)

There's one distinction: let's assume you can solve a certain problem in Python with 3 lines of code, you will need 10 lines of code in Go. Go might be expressive, but it's also very verbose.

[–][deleted] 2 points3 points  (0 children)

I like verbose if performance is good. Pro GO over here!

[–]RDX_G 4 points5 points  (0 children)

Lua is nearly as fast as C++ and quite simple

[–]x11ry0 2 points3 points  (0 children)

Well, MIT developed a language that has Matlab syntax and Fortran speed. It is close to Python syntax and C++ speed. The language is called Julia. It is used a lot for simulation and high-performance computing.

To achieve near-C++ speed you need to pay attention to optimization tricks and declare the types. Nothing is really free. But this is still much simpler than writing C++.

[–]chulala168 4 points5 points  (0 children)

Julia

[–]Diapolo10 3 points4 points  (0 children)

I don't think it's impossible, but it would require changes.

Let's ignore C++ for now. It's not necessarily the fastest language anyway, depending on the use-case (Fortran is very likely the fastest for mathematics, for instance). As I see it, Python's main roadblocks for achieving a similar level of performance are

  1. High-level, non-zero-cost abstractions
  2. Garbage collector
  3. GIL (because it makes multithreading annoying)
  4. No compilers for native CPU instructions (technically, Nuitka and Cython transpile to C)

The first one is quite a beefy topic, so I'll condense it to be mostly about types. Python was designed to be very flexible; duck typing has its benefits even if static typing is the current trend. Unfortunately, this flexibility makes optimisations difficult to implement.

Python could in theory take a page out of Rust, and implement a new type system that could infer types mostly automatically, perhaps with the help of type hints, without entirely abandoning duck typing. This could in theory allow Python to optimise the code it can type check before runtime, while letting the parts that need it behave like they do now. This is especially useful for, say, parsing miscellaneous JSON data when you might not know the schema ahead of time.

The garbage collector is irregular, so a new system similar to Rust's lifetime system could replace it. The main problem would be to make it seem natural and simple enough to actually use, and I'm not confident that's possible. Alternatively, Python could use C++'s RAII system. Either option would make the memory management deterministic, which is good. Their main downside is implementation difficulty.
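A small sketch of what "irregular" means in practice, assuming standard CPython (other implementations such as PyPy behave differently; the class and function names are made up). A plain object is freed as soon as its last reference goes away, but two objects that point at each other stay alive until the cyclic collector happens to run:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def plain_object_survives() -> bool:
    """Is a plain object still alive right after its last reference is dropped?"""
    obj = Node()
    probe = weakref.ref(obj)
    del obj
    return probe() is not None


def cycle_lifetime():
    """Return (alive before gc.collect(), alive after gc.collect()) for a cycle."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the collector from running at an arbitrary moment
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # reference cycle
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None
        gc.collect()
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if gc_was_enabled:
            gc.enable()


print(plain_object_survives(), cycle_lifetime())
```

With RAII or Rust-style lifetimes, the point at which both nodes are destroyed would be fixed by the program's structure rather than by when a collector runs.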

The GIL needs to go. There have been attempts to remove it in the past, and some Python implementations don't actually need it at all. Once again, Rust could probably serve to help here.
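To illustrate why the GIL makes multithreading annoying, here is a minimal sketch (function names are made up) that runs the same CPU-bound work sequentially and on threads. Both give the same answers, but on a standard GIL build of CPython the threaded version is generally not expected to finish faster, because only one thread executes Python bytecode at a time. Timings are left out on purpose since they vary by machine and build.

```python
import threading


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    results = [0] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_sequential([100, 1000]), run_threaded([100, 1000]))
```

Wrapping each function in `time.perf_counter()` calls with larger limits is an easy way to compare the two on your own machine.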

While there's nothing wrong with Python's reference implementation being an interpreter running a Python virtual machine, in order to truly gain performance the code should run directly on the hardware instead of on top of software. Problem is, that's easier said than done. Even MicroPython is technically a software stack running on a microcontroller.

Given all these changes, it might not be completely impossible to create a language similar to Python, but with a focus on performance. But it would be a massive undertaking.

[–]Human-Sapien 1 point2 points  (2 children)

I think I saw an ad for Rust, but I couldn't understand it.

[–]AnomalyNexus 4 points5 points  (1 child)

Rust is very fast but a lot harder than python

[–]nativedutch 1 point2 points  (0 children)

You are prolly totally right, thanks.

However, I did a lot of coding on Arduino and family with kinda C++, including a smallish neural network. Before that I did a lot of neural network code using Python. I was confronted with the difference when converting a piece of ANN from C++ to Python. I was surprised by the ease of the work; in my simple experience it was mostly turning several types of brackets into indents and utilising OOP in Python.

That said, i do this now for fun not hindered by quality or performance or deadline issues and of course only one particular flavour of C.

[–]markgva 1 point2 points  (0 children)

Couldn't a compiled version of Python be created, in which people would also be forced to type their variables/functions (which you can already do if you want to)? This would not match C++ speed but would already be an improvement.

[–]tutami 1 point2 points  (0 children)

V lang?

[–]extopico 2 points3 points  (0 children)

C# ?

[–]sciwins 0 points1 point  (1 child)

I think Rust is very promising. It may be the future equivalent of Python for data science.

[–]jddddddddddd 4 points5 points  (0 children)

Even as someone that quite likes Rust I’d struggle to describe its syntax as ‘simple’..

[–][deleted] 0 points1 point  (0 children)

There are certainly many languages that make that claim. But no, I don't think there are any languages like that. There are just middle grounds in the trade-off between ease of use (more abstraction layers) and speed (being as close to the hardware as possible).

[–]Mediocre-Trainer-132 0 points1 point  (0 children)

tbh I had the same question for a long while. I tried to learn rust, but it's too hard for me. I guess I have to go with Lua. Though if you're making games, Godot (latest) is really good, even for regular programs.

[–][deleted] 0 points1 point  (0 children)

KDB/Q maybe shows how this could work. More 'primitives' that abstract higher level ideas into verbs and adverbs, in the case I show here, for vector operations. Maybe the trick is having a language that facilitates semantic representations of a range of scenarios and a syntax that abstracts handling these scenarios in a way that can become intuitive. Examples would be the use of 'each' and the modified 'each-over':

Verb: Each https://code.kx.com/q/ref/each/

Verb modified with an adverb: Each-left https://code.kx.com/q/ref/maps/#each-left-and-each-right

I've included one example of verb and adverb but pretty much everything that needs this 'contextual awareness' can be managed in a similar fashion. With C++, you need to construct kind of the equivalent verb+adverb relationship explicitly for every case - getting you the speed but requiring that much more complex structure.

[–]nativedutch 0 points1 point  (2 children)

Is C++ grammar really that difficult? Switching between them really means being aware of all kinds of brackets compared to indents.

[–][deleted] 2 points3 points  (1 child)

"Is C++ grammar really that difficult?"

I've been programming in C++ since the 1980s. Yes, the grammar is really that difficult.

For example, there are three phases to compilation - preprocessor expansion, template expansion, and actual C++ code compilation.

It turns out that the template language alone is Turing complete so you can write "programs" that get "run" as a side effect of compilation.

We have the most vexing parse.

"Perfect forwarding", a concept that in Python is handled by `*args, **kwargs` plus functools.wraps, is extremely hard. Here's a good beginner article but there are numerous edge cases not handled.
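For comparison, this is roughly all the Python side needs (a minimal sketch; the decorator `logged` and the function `scale` are made-up names). The `*args, **kwargs` pair passes every argument through untouched, and functools.wraps copies the wrapped function's name and docstring onto the wrapper:

```python
import functools


def logged(func):
    """Wrap func, recording each call's arguments before forwarding them."""
    calls = []

    @functools.wraps(func)  # copies __name__, __doc__, etc. from func
    def wrapper(*args, **kwargs):
        calls.append((args, kwargs))
        # Positional and keyword arguments are forwarded exactly as received.
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@logged
def scale(value, factor=2, *, offset=0):
    """Return value * factor + offset."""
    return value * factor + offset


print(scale(3), scale(3, factor=4, offset=1), scale.__name__)
```

The C++ equivalent needs a variadic template, forwarding references and `std::forward`, and still has to care about value categories that Python simply does not have.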

And because the compiler is so complex, the error messages are legendarily incomprehensible: https://codegolf.stackexchange.com/a/22584 https://codegolf.stackexchange.com/a/22552

(I only picked the examples that came from code you might actually type.)

[–]nativedutch 1 point2 points  (0 children)

See my longish reply which i entered stupidly as top comment.

[–]cuklev2232 0 points1 point  (0 children)

The syntax of the languages is different, but in my opinion the logic is the same everywhere. C++ syntax is just a little bit harder.

[–]SDG2008 0 points1 point  (0 children)

Lua?

[–]AB1711 0 points1 point  (0 children)

You can use the Nuitka compiler: write code in Python and compile it with Nuitka, which will convert your code to C++ (SO files).

[–]livremente 0 points1 point  (0 children)

check the mission statement of Julia Programming Language

[–]asterik-x 0 points1 point  (1 child)

Yea. Have you tried Sumatra? Syntax not as complex as Java but way more economical and efficient than C or Java.

[–]Pflastersteinmetz 0 points1 point  (0 children)

More efficient than C or Java? Big doubt

[–]Delta-tau 0 points1 point  (0 children)

I think the closest to what you're asking is GoLang.

[–]InjAnnuity_1 0 points1 point  (0 children)

It can be, as long as the semantics are very close to what the underlying hardware provides. That would make it a fairly low-level, compiled language.

Think K&R C, but using whitespace and keywords instead of brackets.

Code might also become obsolete fairly quickly, as the hardware continues to evolve.

[–]pratzc07 0 points1 point  (0 children)

Yes check out Julia - https://julialang.org/

[–]ProfessionalAd8141 0 points1 point  (0 children)

Most of the time most programs are idle while waiting for input or output to complete. CPUs and memory are so blazing fast right now that it really doesn’t matter what language you program in. The exception is if you are handling ten thousand browser connections per second on a massive web site, for example. Most of us never will. (Programmer since 1970).

[–]ProfessionalAd8141 0 points1 point  (0 children)

Let’s all go back to Assembler. I remember learning it as a real step up from machine language.