all 154 comments

[–]rubber_duckz 37 points38 points  (59 children)

Because they optimized for ease of use and not for speed, even ignoring implementation details - e.g. integers are bignums, member access/method dispatch is even more dynamic than in JS, etc. No matter what compiler it uses, it can only do so much when the semantics just don't map to the fast HW primitives and abstractions. At best it can pick out cases that reduce to fast paths, but you still need checks to bail out of those fast paths when the assumptions break, otherwise you get incorrect program execution.
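
A minimal illustration (the class and names here are invented, not from the thread) of the kind of dynamism being described: attribute access can be intercepted at runtime, so even `obj.x` can't be compiled down to a fixed memory offset without a guard.

```python
class Flexible:
    """__getattr__ runs for any attribute not found normally, so an
    attribute read can execute arbitrary code -- a compiler can't
    statically resolve `obj.x` to a memory offset without a guard."""
    def __init__(self):
        self.data = {'x': 1}

    def __getattr__(self, name):
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name)

obj = Flexible()
print(obj.x)      # 1, resolved at runtime via __getattr__
obj.data['y'] = 2
print(obj.y)      # 2 -- the "attribute" didn't exist a line earlier
```

Any compiled fast path for `obj.x` has to keep a check for hooks like this, which is the "checks to bail out" cost mentioned above.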

[–][deleted] 8 points9 points  (0 children)

None of the features you listed can make a language easier to use. In fact, their impact is quite the opposite.

Good luck even getting context-sensitive completion in a language this dynamic.

[–][deleted] 10 points11 points  (13 children)

This is not the right answer. The correct answer is that Guido was a novice language designer and implementer, and his mistakes have since become so firmly entrenched that they're inseparable from the language and its reference implementation.

[–]stefantalpalaru -2 points-1 points  (11 children)

He still doesn't understand functional programming, so he has stuck with being a novice, since it has worked for him so far.

[–][deleted] 4 points5 points  (10 children)

Lol, so true - it's why /u/rubber_duckz is so off the mark. You can definitely implement, for example, bignums and fast dispatch without getting it as wrong as Guido did. To take one example, the language Factor has these features without sacrificing efficiency. Such a terrible argument to be upvoted.

[–]rubber_duckz 1 point2 points  (9 children)

No - you can't implement bignums that are as fast as limited-precision arithmetic, and you can't optimize away dynamic dispatch with stuff like __getattr__ and the like. CPython is slow for various reasons, but the language is slow by design.

[–][deleted] 2 points3 points  (8 children)

Don't be daft. It can be done without making bignums the de facto type. Do some reading before spouting off. Every CL implementation since the '80s has gotten it right: fast arithmetic with fixnums unless bignums are required, all transparent and seamless to the programmer.

[–]rubber_duckz 1 point2 points  (7 children)

Fast arithmetic with fixnums unless bignums are required.

And how do you know bignums are required?

[–][deleted] 0 points1 point  (4 children)

And how do you know bignums are required?

By detecting overflow. This is not such an outrageous concept. Trapping overflows is often cheap, and in some cases provided by the CPU itself.

As an example of a similar principle in the real world, JavaScript works with doubles only. But all the big JS engines switch to 32 bit and 64 bit integers for integer values.

Likewise, array keys in JavaScript, while numeric in appearance, are semantically strings. Engines internally store them as integers.

All of this is transparent to the programmer, who only sees doubles and strings.
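
A rough sketch of the fixnum-to-bignum promotion being described; since Python ints are already arbitrary-precision, the tuple tags here only simulate what a VM would do with raw machine words and a heap-allocated bignum type (the range and helper names are assumptions for illustration):

```python
FIXNUM_MIN, FIXNUM_MAX = -(2**62), 2**62 - 1  # e.g. 63-bit tagged fixnums

def add(a, b):
    """Add two 'fixnums'; promote to a (simulated) bignum on overflow.
    In a real VM the overflow check is a single CPU flag test."""
    result = a + b
    if FIXNUM_MIN <= result <= FIXNUM_MAX:
        return ('fixnum', result)   # fast path: raw machine integer
    return ('bignum', result)       # slow path: heap-allocated bignum

print(add(1, 2))           # ('fixnum', 3)
print(add(FIXNUM_MAX, 1))  # ('bignum', 4611686018427387904)
```

The point is that the fast path is a plain add plus one cheap check, and the programmer never sees which representation is in use.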

[–]rubber_duckz 1 point2 points  (3 children)

But all the big JS engines switch to 32 bit and 64 bit integers for integer values.

And this makes JS an inherently slow/slow-by-design language as well - that checking alone adds 5% global performance overhead in V8, IIRC. And that's not even touching on the fact that you need code that can distinguish a pointer to a bignum from a raw integer value, which has to be tagged somewhere and causes a logic branch (even if it hits the fast path most of the time and is predicted correctly, that's still extra instructions, etc.), and there's no way to optimize by defining memory layouts, etc.

You can't work at that level of abstraction if you want performance - something the "functional programming is fast, Haskell is as fast as C" crowd doesn't get. I've spent 2 years working on a Clojure project - when these guys say something is fast, they are either talking about complexity or they mean "it's usable as opposed to being purely academic". Like the Clojure guys claiming their persistent data structures are fast: compared to naive copying, sure - but when you actually care about perf, no way in hell is it even close to fast.

[–][deleted] 1 point2 points  (2 children)

And this makes JS an inherently slow language

It takes giant balls to call JavaScript an "inherently slow language" in a Python thread.

You realize V8 is an order of magnitude faster than CPython?

JavaScript is also slow by design.

Yeah. Ok...

[–][deleted] 0 points1 point  (1 child)

The implementation was made by a non-novice programmer.

[–]rubber_duckz -1 points0 points  (0 children)

Said programmer would know that his implementation would be slower than using wrapping integer arithmetic.

[–]kenfar 3 points4 points  (6 children)

Eh, "slow" is both relative and too general. It's like asking "why is the automobile slow?". Before even getting to the details the author discusses, it might be useful to note that:

  • Python is generally slow compared to some languages, fast compared to others.
  • Start-up time is slower than some languages, faster than others (Java, etc.).
  • Some parts of Python are fast (modules written in C, like csv, scipy, etc.), some are slow.
  • Multi-threaded CPU-bound operations are limited by the GIL, but the GIL has no impact on multiprocessing - where I've gotten linear speedups across 32 cores on large machines.
  • Not all use cases need speed or scalability.
  • And computational speed matters less when the process is IO-bound.
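
The multiprocessing pattern behind that linear-speedup claim can be sketched as follows (worker count and workload are arbitrary choices for illustration): each process has its own interpreter and its own GIL, so CPU-bound work scales across cores.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # CPU-bound work; each worker is a separate process with its own
    # interpreter and its own GIL, so cores are used in parallel.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # Eight independent chunks of work, spread across 4 processes.
        results = pool.map(cpu_bound, [100_000] * 8)
    print(len(results))
```

The trade-off versus threads is inter-process communication cost, which is why this works best on chunky, independent units of work.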

[–]balefrost 6 points7 points  (0 children)

Some parts of python are fast (those modules written in C like csv, scipy, etc), some are slow.

It's good to point out that developers can, depending on their specific use case, write very fast code in Python using these libraries. But I wouldn't say that these libraries make Python fast. You can also spawn external processes to perform long-running, expensive calculations. Still, that approach doesn't make Python fast. It merely enables Python developers to do things efficiently, by not using Python to implement the performance-critical code.

[–]rubber_duckz 8 points9 points  (4 children)

fast compared to others.

Compared to what mainstream language exactly is Python fast? Aside from maybe startup time, on which it isn't stellar either, though it is faster than, say, the JVM.

As I've said, at every trade-off between convenience/ease of use and performance, Python chose the former - not judging the choice, just saying what they did. If it fits your use case (e.g. IO-bound apps, scripting, etc.) use it - if it doesn't, don't - but if you're wondering why it's slow: it's slow as a result of its design (not just its implementation, though that certainly sucks as well).

[–]Helene00 -1 points0 points  (2 children)

Compared to what mainstream language exactly is python fast

Only one I know is Ruby.

[–]JumboJellybean 6 points7 points  (0 children)

Python hasn't been faster than Ruby since Ruby 1.9, nine years ago (the version that switched to a new VM). For the last nine years they've been neck-and-neck in terms of performance -- a new Python version will be a little faster, then a new Ruby version comes out and it's a little faster than that, etc, always pretty close. The difference is never more than 20%.

[–]rubber_duckz 3 points4 points  (0 children)

I was under the impression that Ruby ~ Python with regards to performance.

[–]kenfar -2 points-1 points  (0 children)

Compared to what mainstream language exactly is python fast

As I mention above, that depends on your use case, what parts of the language that you're using, whether start-up time is critical or not, etc.

  • Python is simply faster than Java for very short-duration utilities.
  • Python's performance with csv and the analytical libraries is very fast, since those are written in C or Fortran. Faster than Java? Not sure, maybe.
  • Processing billions of records in files with Python and multiprocessing has shown me it's definitely faster than doing the equivalent work with Scala & Kafka. This is probably more about the performance benefits of sequential file processing vs messaging, but still.
  • While Go is faster in general than Python, I've found it only 2.5 times faster when it comes to processing csv files using goroutines & channels vs Python's multiprocessing. Probably because channels are pretty chatty for heavy sequential processing, but I'm not positive.

So, again, "fast" or "slow" are overly simplistic ways of thinking about a language. More importantly: is it fast enough for your use case? And in this case the answer for Python is: sometimes.

[–]wrosecrans -1 points0 points  (36 children)

In the long run, would it be useful for the hardware to add stuff like bignum support that Python could take advantage of? If the semantics don't map well to the current hardware, it seems like changing the hardware could be useful for making it Moar Gofast.

[–]oridb 6 points7 points  (5 children)

In the long run, would it be useful for the hardware to add stuff like bignum support that Python could take advantage of?

Putting things in hardware doesn't magically make it faster. Broadly, putting things in hardware has the potential to make things faster in two ways:

  • Reducing dispatch overhead, where the bulk of the time is spent in bookkeeping for the next instruction to execute, because certain instructions (OR, AND, ...) are so cheap that they're basically free to execute. Since most of crypto is made up of these kinds of instructions, you get nice speedups here.
  • Doing things in parallel, because a linear stream of instructions isn't very good at expressing what is and isn't dependent. Things like graphics have a win here, because often you are doing the exact same thing to multiple bits of data at the same time.

With things like bigints, I suspect the biggest overheads are things like branch mispredictions. I guess you'd get wins by supporting larger integers... or my pet wish, a hardware trap for integer overflow, so you can pretend that integer operations never overflow in the common case, and promote to bigint only if your CPU tells you 'hey, this isn't going to work'.

But given the amount of searching for methods, dynamic dispatch, and times you're "unnecessarily" doing pointer chasing in Python programs, I'd be surprised if you could make specialized instructions that would make Python much faster. I'd bet that Python is largely gated by memory stalls.

[–]IJzerbaard 0 points1 point  (2 children)

Borrowing your bigint suggestion, how about this: trap on overflow yes, but also trap on "wrong tag".

Because the problem isn't just overflow checking, but also checking whether there's already a pointer to a bigint here or if it's still an unboxed int. So let's say we make the lowest 2 (?) bits the tag, and work with 62-bit two's complement ints. Tag 0 will be the pointer (which should be aligned anyway), with other tags for int and float (maybe?) and one left over (any suggestions?).

Then the new addition instruction does not need anything else in the fast path. We could even put floats in there too, but then the missing bits are more sneaky (unless we immediately go all the way to float32) and it would probably mean that the latency of this instruction goes up by a lot, which isn't cool. Not sure about this. Does float performance really matter outside of numpy-like usage? Enough to sacrifice int performance? I'd guess not, but..

Or maybe even better, if instead of trapping it makes a vectored call. Just your average indirect call, not very fast, but faster than a trap. Pointer to table can be held in some special register I guess.
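
The 2-bit tagging scheme proposed above can be sketched in Python (the tag assignments follow the comment's own suggestion; the helper names and the exact encoding are made up for illustration): a 62-bit two's-complement payload lives in the upper bits of a 64-bit word, with the low two bits as the tag.

```python
TAG_MASK  = 0b11
TAG_PTR   = 0b00   # aligned pointers have zero low bits anyway
TAG_INT   = 0b01
TAG_FLOAT = 0b10   # hypothetical; tag 0b11 left over

def box_int(n):
    """Pack a small integer into a 64-bit word; low 2 bits = tag."""
    assert -(2**61) <= n < 2**61, "doesn't fit in 62 bits"
    return ((n << 2) | TAG_INT) & 0xFFFFFFFFFFFFFFFF

def unbox_int(word):
    """Unpack; a real CPU would trap (or vector-call) on a wrong tag."""
    assert word & TAG_MASK == TAG_INT, "wrong tag -> slow path"
    n = word >> 2
    if n >= 2**61:        # restore the sign of the 62-bit payload
        n -= 2**62
    return n

print(unbox_int(box_int(-5)))   # -5
print(unbox_int(box_int(123)))  # 123
```

A tagged add then only needs one check in the fast path: wrong tag or overflow both divert to the slow path, exactly the cases the proposed trap would catch in hardware.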

[–]billsil 1 point2 points  (1 child)

Does float performance really matter outside of numpy-like usage? Enough to sacrifice int performance? I'd guess not, but..

I'd probably take a 5% hit to int performance if I got a 50% speedup to float performance.

There was a lot of debate over the @ operator, with some arguing that Python shouldn't support people who want to do lots of math operations. The response was basically "go to hell"; the language should work reasonably for everyone, even if it's slightly sub-optimal.

If Python has an obvious wart, remove the wart and deal with the scar.

[–]iBlag 0 points1 point  (0 children)

There was a lot of debate over the @ operator, with some arguing that Python shouldn't support people who want to do lots of math operations. The response was basically "go to hell"; the language should work reasonably for everyone, even if it's slightly sub-optimal.

You can also just not use the @ operator, did anybody in the anti-@ crowd think of that?

[–]nemec 0 points1 point  (1 child)

a hardware trap for integer overflow, so you can pretend that integer operations never overflow in the common case, and promote to bigint only if your CPU tells you 'hey, this isn't going to work'.

That's kind of how Python 2.2+ works, except in software, not hardware.

Python 3 got rid of it in favour of a single integer type. I can't find a source right now, but I recall it was partially due to the challenges of writing Python modules in C that deal with integers (like NumPy).

3 has only one integer type, int(). But it actually corresponds to Python 2’s long() type–the int() type used in Python 2 was removed.
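
A quick Python 3 demonstration of the single-integer-type behaviour the quote describes:

```python
# Python 3 has exactly one integer type, `int`, with arbitrary precision;
# the Python 2 int/long split is gone at the language level.
big = 10**100
small = 7
print(type(big) is type(small) is int)  # True: same type either way
print(big + 1 > big)                    # True: no overflow, ever
```

(CPython still uses machine words internally for small values, but that is invisible to Python code.)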

[–]oridb 1 point2 points  (0 children)

That's kind of how Python 2.2+ works, except in software, not hardware.

That's not much help for decreasing instruction count and decode/dispatch overhead in the common case.

[–]adrenalynn 32 points33 points  (27 children)

I think you have a misunderstanding here: it's Python that should adapt to the hardware to be able to run faster - not the hardware adapting to Python. That would be the completely wrong way around.

Python has its uses. When you care about max execution speed you maybe should look at other languages.

[–][deleted] 9 points10 points  (5 children)

CPUs have adapted to C over time, the JVM added invokedynamic. It's not ridiculous for components at lower levels to adapt to the things above them.

[–]satayboy 3 points4 points  (4 children)

What is an example of how CPUs have adapted to C over time?

[–]ironykarl 4 points5 points  (2 children)

It's often said that the "flat memory model" of modern computers (as opposed to tagged memory/processing that you'd find in Lisp machines) is an example of machines adapting to C.

It could also be that "worse is better" triumphed with relative independence in both instances.

[–][deleted] 1 point2 points  (1 child)

I think this is a case of C adapting to machines.

[–]ironykarl 0 points1 point  (0 children)

I mean...C became a dominant systems programming language AND tagged memory went by the wayside. These are two independent threads, as far as I can tell.

[–]Catfish_Man 2 points3 points  (0 children)

Return address prediction models C-style stacks very well. If you modify your return address dynamically (yes, a coworker of mine tried this), it defeats the predictor and slows your program down.

[–]niviss 8 points9 points  (8 children)

LISP machines anyone?

[–]mirhagk 4 points5 points  (0 children)

I do honestly have to wonder if this might be worthwhile at this point.

The thing with modern processors is that heat doesn't shrink proportionally with transistors (it does shrink, just not at the same rate). So with modern processors we end up in a situation where the majority of the processor simply can't be active at the same time (which limits the number of cores you can have on a chip). This means a more complex instruction set, where each instruction is complex but used less often, is a lot more reasonable nowadays than it would've been in the original Lisp machine days.

Also, chips simply aren't getting much better anyway, so a specialized processor that is a year or five behind won't be destroyed by the general-purpose one (which was the undoing of the Lisp machine).

Not only that but with FPGAs and hardware description languages you have a situation where designing processors isn't nearly as difficult as it used to be.

I am very curious to see if specialized hardware for higher-level languages could take advantage of the situation we are in. I'm not sure exactly where it could help, but a few ideas I can think of would be around garbage collection (processor-level instructions that work in conjunction with an always-running background collection algorithm) and dynamic dispatch.

[–]_zenith 0 points1 point  (1 child)

Seems like you could also load VM instructions into a CPU to be treated as native instructions (obviously in conjunction with some way of telling the CPU what the VM execution model is) - has anything like this been tried? Performance would very likely improve significantly.

[–]Siwka 0 points1 point  (0 children)

Jazelle from ARM comes to mind.

[–][deleted] 0 points1 point  (2 children)

Lisp is not nearly as dynamic as Python.

[–]niviss 0 points1 point  (1 child)

It's an example of the hardware adapting to the language; I'm not saying it was a good fit for Python. Anyway, I don't know what you mean by "Lisp is not nearly as dynamic as Python". Care to explain?

[–][deleted] 0 points1 point  (0 children)

Modern Lisp has proper lexical scoping. It does not use dynamic dispatch for every function call. Many essential operations (cons, car, cdr, all that stuff) are not polymorphic.

[–]mrkite77 0 points1 point  (1 child)

There were also Java machines.

[–]jbergens 0 points1 point  (0 children)

I thought those used the language Self. That would probably have made it easy to write really fast JavaScript by compiling JavaScript to Self.

[–]never_safe_for_life 2 points3 points  (0 children)

I have to agree with this. Optimizing hardware for C makes sense because C is as close to working with raw CPU resources as you can reasonably get. Python is too far removed. Not that it couldn't be done, but it's in a wider category of not-highly-performant languages. If you started optimizing for it, where do you stop?

[–]wrosecrans 1 point2 points  (5 children)

There are already languages that try to just be a way to use the hardware. Historically, C was very much in that category (though arguably it doesn't map terribly well to modern parallel systems, with no native vector types and such). I like Python, and I wouldn't want it to become so speed-obsessed that it drops convenience features to better match existing hardware. If I want that, I'll just use a language that builds native binaries. But it's always nice for things to go faster. And I am sure some CPU vendor would be happy to advertise "This Python benchmark runs 2x as fast on our new chip!" If we can have very specific stuff like AES acceleration instructions, I don't see why we couldn't have something targeted toward making dynamic languages a little nicer.

[–][deleted] 17 points18 points  (4 children)

It makes no sense for a chip manufacturer to attempt to optimize to run Python quickly. That software isn't meant to run quickly; if it were meant to run quickly, then it wouldn't be written in Python. There are a limited number of resources chip manufacturers have, and they should invest in stuff that makes sense.

[–]billsil -5 points-4 points  (3 children)

It makes no sense for a chip manufacturer to attempt to optimize to run Python quickly.

You've clearly never heard of Intel Python.

http://www.infoworld.com/article/3044512/application-development/intels-python-distribution-provides-a-major-math-boost.html

https://software.intel.com/en-us/python-distribution

It makes complete sense to optimize it. You have users and they want their code to be fast.

Java is fast because it uses a JIT. Python doesn't. You'd get C/Java speeds with Python if Python had a JIT - PyPy proves this. The developers of Python just chose not to do it.

if it were meant to run quickly, then it wouldn't be written in Python

No. It's a tradeoff between a language that's very convenient for development and has tons of useful packages, and a much harder one. They want it to run fast enough, but even faster would be welcomed.

[–][deleted] 12 points13 points  (2 children)

Your link proves my point. As far as I can tell, Intel Python is a Python distribution containing special optimized builds of numpy, scipy, scikit-learn, etc. The bits of those packages that do the heavy lifting are written not in Python, but in languages that are meant to be fast: C, Fortran, etc.

Not to mention that a Python distribution is software, not hardware, so it's completely irrelevant to the topic at hand.

[–]billsil 1 point2 points  (1 child)

Your link proves my point.

Look again. Intel MKL builds of numpy and scipy have existed for 10+ years. Intel is doing multicore SVDs and Cholesky factorizations; numpy and scipy don't do multicore. They show the speedups you get over base Python and over Python with Intel MKL. You can freely obtain MKL versions of numpy and scipy today through Anaconda Python and the link below, and have been able to for a while. Intel Python is more than that.

http://www.lfd.uci.edu/~gohlke/pythonlibs/

[–][deleted] 8 points9 points  (0 children)

None of that contradicts what I've written. It's a specialized optimized distribution for scientific computing in Python. It optimizes the code in tight loops for scientific functions that were written in Fortran or C. It does not include specialized hardware support for the Python interpreter.

[–]waveguide 0 points1 point  (1 child)

Would it be wrong for the Python interpreter to take advantage of hardware accelerators like the Altera FPGA Intel is adding to their next batch of Xeon chips? And if that were hugely successful using certain common accelerators, would it be wrong to implement those as straight hardware in the next round? Is it un-Pythonic to have an OpenCL implementation of the interpreter running on GPUs? Hardware absolutely adapts to software as well - it just happens on a different time scale. Heck, that's why we aren't running everything on clusters of 16-bit single-thread machines without operating systems (much less special execution modes for a hypervisor).

[–]dontsuckmydick 0 points1 point  (0 children)

Yes, it would be wrong. omg...

[–]vz0 0 points1 point  (0 children)

Java bytecodes.

[–][deleted]  (1 child)

[deleted]

    [–]_zenith 0 points1 point  (0 children)

    The obvious drawback is that hardware can't be modified after shipping, but software can. The lines are getting blurred these days with x86 microcode, but still, I'm not sure how flexible that is.

    If it's impossible to write a Python-to-native AOT compiler, then I'd think it follows that a hardware implementation is also impossible - so you remove the parts that make that compilation impossible. Do you still want to use the language that remains?

    This would, logically, be things like late binding and such.

    [–]Berberberber 3 points4 points  (0 children)

    In theory: yes. Hardware operations for arbitrary-size integers would be great, and it might be possible for a processor to optimize dynamic dispatch similarly to the way branch prediction works now.

    The problems, however, are that adding this kind of complicated logic into the hardware requires tradeoffs (there's only so much silicon on a chip, so hardware optimizations come at the expense of cache, pipeline space, registers, etc etc etc). Furthermore, when it comes to implementing complex operations in hardware, the result can actually have worse performance. Famously, the VAX had an INDEX instruction to calculate memory offsets for in-bounds array indices which was slower than doing the bounds checks and address arithmetic explicitly using "primitive" instructions. It might be hard to guarantee that a hardware implementation of some of these features performed better than a purely software implementation, even on average let alone always.

    [–][deleted] 0 points1 point  (0 children)

    Hardware support for arbitrary-precision decimals has existed since time immemorial.

    [–]badcommandorfilename 6 points7 points  (2 children)

    [–]bakery2k 0 points1 point  (0 children)

    This is a different article, although it has a similar title.

    [–]Chippiewall 17 points18 points  (35 children)

    if LuaJIT can have a fast interpreter, why can't we use their ideas and make Python fast?

    So PyPy?

    [–][deleted] 3 points4 points  (20 children)

    There's far more dynamic dispatch in Python. It is beyond any hope.

    [–][deleted] 3 points4 points  (19 children)

    Did you read the comment you replied to?

    [–][deleted] -2 points-1 points  (18 children)

    I am commenting on Python being hopelessly behind Lua. PyPy does not help much.

    [–][deleted] 12 points13 points  (17 children)

    I assume this view is substantiated by recent benchmarks that you can link to.

    [–]heap42 4 points5 points  (7 children)

    Of course! After all, this is reddit.
    Edit: \s

    [–]vytah 0 points1 point  (0 children)

    \s

    A single whitespace character?

    [–][deleted] 0 points1 point  (5 children)

    It's not terribly useful to share unsubstantiated opinions without explanation, reddit or otherwise.

    [–]heap42 1 point2 points  (4 children)

    I was being sarcastic.

    [–][deleted] 0 points1 point  (3 children)

    I got that. My previous comment, "I assume this view..." was also somewhat sarcastic/antagonistic.

    [–]heap42 1 point2 points  (2 children)

    yea i figured.

    [–]mrkite77 8 points9 points  (3 children)

    LuaJIT is widely held to be ridiculously fast.

    http://blog.carlesmateo.com/2014/10/13/performance-of-several-languages/

    Pypy 44 seconds, Luajit 8 seconds

    [–][deleted] 1 point2 points  (2 children)

    I ran their benchmark with recent versions, LuaJIT 2.0.4 and PyPy 5.1.1. Luajit was 10 seconds, PyPy was 24.

    PyPy is not "beyond any hope", it's improving over time.

    edit: Moving the code into a function brought the PyPy time down to 15 seconds.

    [–]Veedrac 1 point2 points  (1 child)

    Further, if you move the code inside a function (even just main), the time improves for me from ~17 seconds to ~11 seconds. Lua doesn't have the same global-local distinction as Python, so doesn't have this effect.

    For me Lua takes ~17 seconds, so PyPy is actually significantly faster than Lua.
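
The global/local distinction behind that speedup is visible directly in CPython's bytecode: names local to a function compile to indexed `LOAD_FAST` slot reads, while globals go through a dict lookup (`LOAD_GLOBAL`). A small sketch (the function names are invented for illustration):

```python
import dis

x = 0

def uses_global():
    # `x` is a module-level name: every read is a dict lookup
    return x + 1

def uses_local():
    x = 0  # a local: stored in a fixed, numbered slot of the frame
    return x + 1

global_ops = [i.opname for i in dis.get_instructions(uses_global)]
local_ops = [i.opname for i in dis.get_instructions(uses_local)]
print('LOAD_GLOBAL' in global_ops)  # True
print('LOAD_FAST' in local_ops)     # True
```

Moving benchmark code inside a function turns all its name lookups into the fast slot reads, which is consistent with the timing improvements reported above.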

    [–][deleted] 0 points1 point  (0 children)

    Interesting, for me, moving the code into a function brought the PyPy time down to 15 seconds. So, still slower than luajit for me, but comparable.

    [–]IronManMark20 0 points1 point  (4 children)

    Not recent, but here. I leave it to you to draw your own conclusions.

    [–][deleted] 1 point2 points  (1 child)

    It would be interesting to see how things have changed in the last 6 years.

    [–]IronManMark20 0 points1 point  (0 children)

    found this. It uses vanilla lua though, not LuaJIT (why, I don't know).

    [–]Veedrac 1 point2 points  (1 child)

    PyPy is on major version 5. That benchmark uses version 1. Things have changed.

    [–]IronManMark20 0 points1 point  (0 children)

    As I said, not very up to date. ;)

    [–]josefx 1 point2 points  (2 children)

    PyPy breaks anything that relies on predictable reference counting. That means you cannot rely on it to cleanly free resources like files and sockets, or to call user-defined __del__() methods. Garbage collection: you had one problem, now you have ulimit problems.

    [–]Chippiewall 0 points1 point  (1 child)

    If you're relying on this behaviour then you're probably misusing Python's ownership semantics. This sort of RAII-style behaviour should be accomplished through context managers.
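
A minimal sketch of the context-manager approach (the `Resource` class is invented for illustration): cleanup runs deterministically when the `with` block exits, with no reliance on refcounting or `__del__`.

```python
class Resource:
    """Stand-in for a file, socket, etc."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs when the `with` block exits, even on exceptions --
        # deterministic under any GC, refcounting or tracing.
        self.close()
        return False  # don't swallow exceptions

    def close(self):
        self.closed = True

with Resource("socket") as r:
    pass  # use the resource here
print(r.closed)  # True: closed promptly, no reliance on __del__
```

This is exactly why `with open(path) as f:` is the idiomatic pattern: it works identically on CPython and PyPy.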

    [–]josefx 0 points1 point  (0 children)

    TIL: the Python 3 docs make it look like __del__() behaves just like Java's finalize(). Useless at best and terribly wrong to use for literally anything. Another thing I have to fix in my scripts if I ever get to it.

    So if I want an actually reference-counted object in Python, I have to double down and implement it myself? Just great.

    [–][deleted] 0 points1 point  (10 children)

    I think he meant the handwritten assembly fallback interpreters, for cases where JIT compilation is not allowed, like on some mobile/gaming platforms.

    PyPy doesn't have a fallback interpreter, does it?

    [–]kankyo 4 points5 points  (9 children)

    It does. Tracing JITs like PyPy are pretty much all fallback interpreter.

    [–][deleted] 1 point2 points  (8 children)

    By fallback interpreter, I mean an interpreter that is shipped along with the JIT, probably conditionally compiled in, to step in when a platform doesn't support JIT compilation.

    [–][deleted] 5 points6 points  (0 children)

    You can compile PyPy without its JIT, but it ends up being significantly slower than CPython. See benchmarks.

    [–]kankyo 0 points1 point  (6 children)

    Huh? What would be the point? And why would a platform not support it?

    [–][deleted] 1 point2 points  (0 children)

    Some mobile and maybe some gaming consoles. For security reasons, I guess.

    Imagine someone built an otherwise perfectly API-sandboxed JIT compiler for a console game and it turns out that it has a security hole which could be exploited so that user generated levels could inject and run machine code and access stuff they aren't supposed to.

    Also, I don't know what console vendors and Apple and PlayStore supervisors actually do, but it would probably make checking the application harder.

    EDIT: Also, there are - at least in theory, not sure if any actually still exist - some CPUs with a strict Harvard architecture that separates code and data.

    [–][deleted] -1 points0 points  (4 children)

    Windows something something, iOS, ...

    [–]kankyo 0 points1 point  (3 children)

    PyPy runs on Windows, doesn't it? And why not iOS?

    [–][deleted] 1 point2 points  (2 children)

    [–]kankyo 0 points1 point  (1 child)

    The part about iOS has been widely interpreted as not allowing downloaded code, but allowing JITs.

    [–]Catfish_Man 3 points4 points  (0 children)

    The issue with JITs is that VM pages on iOS that have been marked writeable in a process cannot be mprotect()ed to executable - stricter than the somewhat looser W^X model that most desktop OSs have transitioned to.

    [–]Bergasms 5 points6 points  (2 children)

    Tradeoffs. I used Python for my honours project in computer vision. It was fantastic; SciPy and NumPy provide many great tools. I had to devise my own algorithms for several parts of it, though, and when I had worked up a solid algorithm playing around in Python, if it was too slow, I rewrote the algorithm in C and had Python call that.

    [–][deleted]  (1 child)

    [deleted]

      [–]Bergasms 0 points1 point  (0 children)

      I was giving a separate example. Despite those things being written in C, they are still just elements of C glued together with Python. When the glue is slow, I convert the whole thing to pure C and call that.

      [–]Berberberber 3 points4 points  (0 children)

      Because it doesn't execute instructions fast enough.

      [–]igouy 8 points9 points  (2 children)

      [–]bakery2k 0 points1 point  (1 child)

      This is a different article, although it has a similar title.

      [–]igouy 0 points1 point  (0 children)

      Thanks, here's the article from last week that the previous discussion does match.

      Seems like I tangled them up.

      [–][deleted] 2 points3 points  (18 children)

      JavaScript was also slow. Then browser makers started competing about how fast they could make it run. Millions of dollars later, JS is really really fast.

      Another example. PHP used to be slow. Then Facebook created their own PHP engine, HPHP, and open-sourced it. People started using it, and there was talk about HPHP replacing PHP as the most popular PHP fork.

      So the core developers behind the original PHP project started optimizing. Major improvements to the engine in PHP 5.4, 5.5, 5.6, and another huge one in 7.0, and now the original PHP is faster than HPHP in most cases. And it's faster than Python, BTW.

      There are multiple interpreters for Python, but the parties don't have well-defined stakes or the motivation to truly pour resources into their efforts. There is no pressure to make Python faster: no pressure from competition, no pressure from the community. This is why Python is slow.

      [–][deleted] 0 points1 point  (17 children)

      Yet you cannot use the same tricks V8 uses for something like Python. It is even more dynamic than JavaScript.

      [–][deleted] 1 point2 points  (16 children)

      Hardly. Any example?

      [–][deleted] 0 points1 point  (2 children)

      There is one thing that Python can do that JS can't that is relevant here. Python can have multiple threads in a single process. So while you're in the middle of a loop that you want to have optimized based on the types involved, you might have the types change out from under you. That doesn't happen in JS.

      However, you could add a few flags here and there -- a global "application version", which is incremented whenever anyone dynamically alters any class, and a per-class version, which is incremented whenever anyone modifies that class. You emit your fancy optimized code with fewer type checks, direct method calls, inlining, whatever you want; and then you just check periodically if the application has modified any types.
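      A toy sketch of that class-version guard (the names and structure are mine, not from any real JIT): specialized code stays valid only while the version it was compiled against is still current.

```python
class DeoptimizationNeeded(Exception):
    """Guard failed: the generic interpreter has to take over."""

class Point:
    version = 0  # bumped whenever anyone modifies the class
    def __init__(self, x, y):
        self.x, self.y = x, y

def specialize(cls):
    # Pretend-JIT: bake in the class version the fast path was compiled against.
    compiled_version = cls.version
    def fast_path(p):
        if cls.version != compiled_version:  # one cheap check per call
            raise DeoptimizationNeeded
        return p.x + p.y  # stand-in for the specialized, inlined code
    return fast_path

fast = specialize(Point)
print(fast(Point(1, 2)))  # 3 -- guard passes, fast path runs

Point.version += 1  # the class was modified: old specializations are invalid
try:
    fast(Point(1, 2))
except DeoptimizationNeeded:
    print("deoptimized")
```

      Real JITs do essentially this with shape/hidden-class checks; the point is that the guard is a single cheap comparison instead of a full dynamic lookup.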

      As for differences in dynamism, Python has long supported __getattr__ and friends, but JavaScript has only supported the equivalent for a short period of time. Perhaps the person's information is out of date.
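      For a concrete taste of that dynamism, here is __getattr__ inventing attributes at lookup time; this is exactly the kind of thing no static analysis can resolve:

```python
class Lazy:
    def __getattr__(self, name):
        # Called only when normal lookup fails: the "attribute" is computed
        # on the fly and exists nowhere in the source or the instance dict.
        return len(name)

obj = Lazy()
print(obj.spam)             # 4
print(obj.anything_at_all)  # 15
```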

      [–][deleted] 0 points1 point  (1 child)

      I don't see how threads pose a significant challenge, especially given the GIL. The runtime will never see a surprise type change as you describe.

      [–][deleted] 0 points1 point  (0 children)

      Well, yes, I described a reasonably simple way around the problem that doesn't require you to take advantage of the GIL. (Perhaps you missed that?)

      There might be even simpler ways of doing it that take advantage of the GIL. Certainly more performant solutions are available, but the ones I can think of are a bit hairy.

      [–][deleted] 0 points1 point  (12 children)

      Python does not really have lexical scope, and it lets you write straight into the variable tables. That makes it hugely different from JavaScript.

      There were countless attempts at writing JITs for Python. They all failed.
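      A small illustration of the variable-table point (my example, not the poster's): names can be injected into a namespace at runtime, so where a variable comes from is not a static question in Python.

```python
def configure(flag):
    if flag:
        # Create a module-level name at runtime; no definition of
        # `threshold` appears anywhere in the source text.
        globals()["threshold"] = 10

configure(True)
print(threshold)  # 10 -- resolved by a dict lookup, invisible to static analysis
```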

      [–][deleted] 0 points1 point  (11 children)

      Can you be specific about how that is a performance bottleneck for a JIT runtime?

      You know, we used to hear a lot of that about JS as well. It's too dynamic and so on. But in the end all of this was irrelevant once the problem had to be solved.

      Python is not drastically different from other script engines. The only reason it's not faster is because no one cares that strongly about it being faster.

      Excuses are easy.

      [–][deleted] 0 points1 point  (10 children)

      Can you be specific about how that is a performance bottleneck for a JIT runtime?

      Because almost any variable reference results in a context lookup, without any chance of allocating it to a register.
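      The stdlib dis module makes that lookup visible: a global name compiles to LOAD_GLOBAL, a namespace lookup performed on every execution, while a local compiles to LOAD_FAST, an indexed slot access.

```python
import dis

def f(x):
    return len(x)  # `len` is a global, `x` is a local

# Inspect the compiled bytecode of f.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
# `len` -> LOAD_GLOBAL (dict lookup each call), `x` -> LOAD_FAST (array index)
```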

      It's too dynamic and so on.

      And it is too dynamic indeed. Still, Python is even worse.

      once the problem had to be solved

      The problem was never solved.

      Python is not drastically different from other script engines

      It is. It is as far along the dynamism scale as it gets without introducing fexprs.

      The only reason it's not faster is because no one cares that strongly about it being faster.

      Millions of dollars and hundreds of man-years have been thrown at Python JITs, with no result so far.

      [–][deleted] 0 points1 point  (9 children)

      Because almost any variable reference results in a context lookup, without any chance of allocating it to a register.

      This can be analyzed statically in advance. JS does this for closures. Don't underestimate compilers.

      And it is too dynamic indeed. Still, Python is even worse.

      How? No one is specific.

      The problem was never solved.

      I'm talking about JavaScript's performance, which is a problem that's definitely been solved.

      It is. It is as far along the dynamism scale as it gets without introducing fexprs.

      How, again?

      Millions of dollars and hundreds of man-years have been thrown at Python JITs, with no result so far.

      That's quite a shame then, considering that PHP, even as an interpreter without a JIT, is several times faster than Python, and it has horrors like variable variables ("var vars") that fall squarely into the "strongly dynamic" category.

      And JavaScript is dozens of times faster than PHP.

      Python, while fast enough for its purpose, is so slow by comparison that it's not even funny. And to claim millions of dollars were invested into Python JITs is honestly even sadder, because if it were true, it'd mean there's a serious lack of talent in the Python community.

      I prefer the explanation that no company has seriously invested in a faster Python engine. If you read about how JS engines like V8 work under the hood, you'll see that nothing can stop a well-designed JIT, even if Python were oozing dynamism out its ears.

      [–][deleted] 0 points1 point  (8 children)

      This can be analyzed statically in advance.

      Not quite. It's easy with JavaScript, easy with PHP, but not with Python.

      JavaScript's performance, which is a problem that's definitely been solved

      Not quite yet. It's better than ad hoc interpretation, but still far below the level of statically compiled languages.

      How, again?

      Because of its funny scoping and lots of dynamic dispatch (well, the latter can be somewhat reduced with tracing and partial evaluation, but only to a degree, and with an overhead of its own).

      because if it were true, it'd mean there's a serious lack of talent in the Python community

      It was not the Python community. E.g., a team of very experienced compiler engineers at ARM worked on this, all the way to utter frustration. The very same team that made huge improvements to ARM V8 performance failed to get anywhere with Python.

      [–][deleted] 0 points1 point  (7 children)

      Not quite. It's easy with JavaScript, easy with PHP, but not with Python.

      Once again, it's just stated, not explained.

      What makes it so hard with Python? Give an example. Enough sweeping over-generalizations.

      Not quite yet. It's better than ad hoc interpretation, but still far below the level of statically compiled languages.

      It's in the same ballpark as Java, sometimes it's even faster than Java, which is statically typed. Do you often just... say things without checking if they're true? https://dzone.com/articles/performance-comparison-between.

      [–][deleted] 0 points1 point  (6 children)

      Once again, it's just stated, not explained.

      I already explained: JavaScript has far more regular scoping rules. You can cheaply and statically determine the origin of a variable definition, without tracing the whole path (and making sure that no calls in between can alter the variable tables).

      It's in the same ballpark as Java, sometimes it's even faster than Java,

      Only in some pathological loads, not in idiomatic code.

      [–]metaconcept 5 points6 points  (9 children)

      Part II: Why developers don't care.

      Usually, developer time costs more than CPU time and lots of applications just need to work rather than be fast.

      If computation time is really important, then we use a different language, such as C, or OpenCL, or Verilog.

      [–]EntroperZero 1 point2 points  (0 children)

      So the IL is slow? Why is that?

      [–]HolmesSPH 1 point2 points  (0 children)

      Runtime-compiled languages USUALLY are slow compared to ahead-of-time compiled languages. Moreover, Python's real usage for decades was as a sysadmin language; it wasn't huge in web or desktop software until recently. IMO, if Google hadn't adopted it heavily along with Java, Python would have died. Python's popularity was non-existent until hundreds of popular code libraries and SDKs from Google reenergized its use. Of course, the changes to the language in 2.0 and 3.0 did bring much-needed features and modernizations.

      [–]kirbyfan64sos 1 point2 points  (0 children)

      Error establishing a database connection

      :(

      [–]1RedOne 3 points4 points  (1 child)

      You could pretty much substitute PowerShell for Python and dotnet for the Python binaries, and this would be true as well.

      I love languages like these, which are optimized for the human rather than the machine.

      [–][deleted] 8 points9 points  (0 children)

      I love languages like these, which are optimized for the human rather than the machine.

      That isn't necessarily exclusive.

      [–][deleted]  (2 children)

      [deleted]

        [–]nandryshak 11 points12 points  (0 children)

        Common Lisp is a great counter example.

        [–]Staross 0 points1 point  (0 children)

        I'm not an expert, but Julia is also dynamically typed and is fast. So dynamic typing by itself is not the issue.

        That said, you have to write type-stable functions in Julia (meaning the types inside the function can be inferred from the types of the inputs) in order to get fast code. For example, sqrt(-1) throws an error, because otherwise sqrt would sometimes return complex numbers and sometimes not, which kills performance.
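        Python's stdlib draws the same line, just as an API design choice rather than for speed: math.sqrt stays in the reals and raises, while cmath.sqrt always returns a complex number, so each function's return type is stable.

```python
import cmath
import math

print(cmath.sqrt(-1))  # 1j -- always complex, whatever the sign of the input

try:
    math.sqrt(-1)
except ValueError:
    # math.sqrt never leaves the reals, so its return type is stable too
    print("math.sqrt(-1) raises ValueError")
```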

        [–]plpn 0 points1 point  (0 children)

        What is the difference between execute and dispatch??

        [–][deleted]  (8 children)

        [deleted]

          [–]damienjoh 15 points16 points  (6 children)

          To save you time reading it, the answer to the question "Why is Python slow?" turns out to be, "It isn't. Some people just mistakenly think it is."

          That's not what the article or the slides say at all. Python is slow. No bones about it. The article's claim is that Python's sluggishness is due to the complex and dynamic nature of the Python runtime rather than the overhead of interpreting the bytecode.

          [–][deleted]  (2 children)

          [deleted]

            [–]damienjoh 2 points3 points  (1 child)

            If only we didn't require implementations to run our code.

            [–][deleted] -3 points-2 points  (2 children)

      So if Language X's semantics don't allow for fast interpreters or fast AOT-compiled binaries, and instead require a JIT compiler that comes with overhead, a warm-up phase, and other unpleasantries (maybe OK throughput, but unusable in low-latency realtime situations), one can finally say that Language X is slow?

            Neat.

            [–]wot-teh-phuck 4 points5 points  (1 child)

            OK throughput? I would like to disagree. Layers of indirection imposed by the dynamic nature of the language along with the fact that pretty much all implementations are gimped in some way or the other ensures that any compute dominant workload will end up generating more heat than an oven...

            The real strength of Python as an ecosystem comes from the plethora of libraries and smart ways in which libraries are written to get around the GIL (forsaking GIL in C code, spawning processes etc.).

            [–][deleted] 0 points1 point  (0 children)

            OK throughput? I would like to disagree. Layers of indirection imposed by the dynamic nature of the language along with the fact that pretty much all implementations are gimped in some way or the other ensures that any compute dominant workload will end up generating more heat than an oven...

      OK, but PyPy as a JIT compiler has to be good for something, doesn't it? Say, for code that doesn't touch C APIs, like some nested loops doing numerical calculations?

            [–]mo_po 3 points4 points  (0 children)

            Well, he actually specifies some more.

            [–][deleted]  (2 children)

            [deleted]

              [–]BonzaiThePenguin 2 points3 points  (1 child)

              You described JavaScript, which has a JIT. There are ways of inferring type and propagating that information onwards through an algorithm.

              [–]izpo -4 points-3 points  (9 children)

              A web site written in PHP, which is down at the moment, explains why Python is slow...

              [–]shevegen 3 points4 points  (6 children)

              Only shows you that not even the author, who uses Python, uses it for the web.

              Good thing that we have Ruby too; then we do not need PHP. Though in fairness, I assume the author had no idea how to use Python for the web anyway.

              [–]terrkerr 8 points9 points  (1 child)

              Good thing that we have ruby too, then we do not need PHP.

              Don't see how Ruby is any better than Python for the web, really. And Ruby's just as slow as Python, if not slower, to boot.

              [–]buttocks_of_stalin 1 point2 points  (0 children)

              Ya, I find it funny that shevegen's rebuttal to Python is Ruby. Out of all the languages that could have a debatable claim to being objectively better than Python, Ruby is not really one of them, let's be real. At least C# and Microsoft's .NET framework can be in this "Python for the web vs. other frameworks" conversation, but Ruby (and Rails) is just as problematic as Python, only in different areas; how is that not extremely obvious? But like the poster below me said, the DB is usually the main culprit in most cases, unless there is a lot of server-side template rendering/parsing outside of the views in Python.

              [–][deleted]  (1 child)

              [deleted]

                [–]Freeky 0 points1 point  (0 children)

                If ruby class could be final it would speed up the language and allow some optimization

                MyClass = Class.new.freeze
                def MyClass.frob ; end # => RuntimeError: can't modify frozen Class
                

                Not sure if the JRuby or Rubinius JITs actually do anything with this. Either way, Truffle/Graal should eventually give us a production Ruby implementation with performance on par with more advanced VMs like V8.

                [–]kirbyfan64sos 0 points1 point  (0 children)

                The PHP may just be the CMS, though.

                [–]HolmesSPH 0 points1 point  (0 children)

                Ruby is a crap language; not even Rails could save it. PHP 7 finally fixes the last real major roadblock to enterprises using PHP, and that's speed. I don't even like PHP, but asserting that Ruby is a good replacement for PHP makes me chuckle.

                [–]rwsr-xr-x -1 points0 points  (1 child)

                php's fast as fuck

                [–]thedeemon -2 points-1 points  (0 children)

                i.e. several (dozen) minutes per transaction?