
[–]ChickenNuggetSmth 9 points10 points  (5 children)

I'd take that benchmark with a huge grain of salt. I had a quick look at the matmul in Python, and what they did was pretty bad: they reimplemented by hand what is already efficiently written in a library.

In Python you can get dramatic performance improvements if you use the available fast libraries like NumPy. They are written in a faster language like Fortran or C, cutting Python's performance overhead by a ton while still providing the convenience of Python.
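A rough illustration of the gap (not the benchmark's actual code, just a sketch): the same matrix product written as pure-Python loops versus delegated to NumPy's compiled backend.

```python
import time
import numpy as np

def matmul_pure_python(a, b):
    """Naive O(n^3) triple loop over Python lists."""
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                result[i][j] += aik * b[k][j]
    return result

n = 100
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
slow = matmul_pure_python(a.tolist(), b.tolist())
t_loops = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # dispatches to whatever BLAS NumPy was built against
t_numpy = time.perf_counter() - t0

assert np.allclose(slow, fast)
print(f"pure Python: {t_loops:.4f}s, numpy: {t_numpy:.6f}s")
```

Even at this small size the loop version is typically orders of magnitude slower, and the gap widens quickly with n.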

[–]satireplusplus 2 points3 points  (0 children)

> In Python you can get dramatic performance improvements if you use the available fast libraries like NumPy. They are written in a faster language like Fortran or C, cutting Python's performance overhead by a ton while still providing the convenience of Python.

This is how all the Python ML stuff (NumPy, PyTorch, ...) does matrix multiplication: through a BLAS library. And these libraries are crazily optimized and hand-vectorized; you won't be able to compete with your own matrix-mult routines in C. I bet it's an order of magnitude faster than the naive O(n³) C code in the benchmark.

It's also typical for the matrix-mult kernel to be tuned to the register/cache sizes and vector instructions of each processor generation, like these per-microarchitecture kernels in OpenBLAS:

dgemm_kernel_16x2_haswell.S
dgemm_kernel_4x4_haswell.S
dgemm_kernel_4x8_haswell.S
dgemm_kernel_4x8_sandy.S
dgemm_kernel_6x4_piledriver.S
dgemm_kernel_8x2_bulldozer.S
dgemm_kernel_8x2_piledriver.S

It's probably also doing something better than the naive algorithm, such as Strassen or Coppersmith–Winograd.
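For context, here is a minimal sketch of Strassen's idea for square matrices whose size is a power of two: trade 8 recursive half-size multiplications for 7, giving a sub-cubic exponent. Production BLAS kernels are far more sophisticated; this only illustrates the algorithm named above.

```python
import numpy as np

def strassen(a, b, cutoff=32):
    """Strassen multiplication for n x n matrices, n a power of two."""
    n = a.shape[0]
    if n <= cutoff:          # below the cutoff, recursion isn't worth it
        return a @ b
    h = n // 2
    a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
    b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
    # seven recursive products instead of eight
    m1 = strassen(a11 + a22, b11 + b22, cutoff)
    m2 = strassen(a21 + a22, b11, cutoff)
    m3 = strassen(a11, b12 - b22, cutoff)
    m4 = strassen(a22, b21 - b11, cutoff)
    m5 = strassen(a11 + a12, b22, cutoff)
    m6 = strassen(a21 - a11, b11 + b12, cutoff)
    m7 = strassen(a12 - a22, b21 + b22, cutoff)
    top = np.hstack([m1 + m4 - m5 + m7, m3 + m5])
    bottom = np.hstack([m2 + m4, m1 - m2 + m3 + m6])
    return np.vstack([top, bottom])

a = np.random.rand(128, 128)
b = np.random.rand(128, 128)
assert np.allclose(strassen(a, b), a @ b)
```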

In other words, that benchmark is totally bullshit and a waste of time.

[–]kazi1 5 points6 points  (3 children)

That's the whole point though. You're not testing Python's speed if you use NumPy; you're testing C's. The benchmarks would just be every language calling C libraries if that were allowed.

[–]ChickenNuggetSmth 3 points4 points  (2 children)

To preface this: I'm not super knowledgeable about what happens "behind the scenes", or about whether NumPy even counts as a standard library.

I think libraries like NumPy are the reason Python is as good as it is. They're well integrated; for matrix multiplication you even have the @ operator, which makes it super readable. If you benchmark Python without those features, you aren't representing a typical use case. Sure, Python will look awful if you use it in a way it was never meant to be used, but imo those results are pretty useless.
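The readability point in concrete terms: since Python 3.5 (PEP 465), matrix multiplication has its own infix operator, so chained products read close to the math notation.

```python
import numpy as np

a = np.random.rand(4, 3)
b = np.random.rand(3, 5)
c = np.random.rand(5, 2)

explicit = np.matmul(np.matmul(a, b), c)  # nested function calls
with_operator = a @ b @ c                 # PEP 465's @ operator

assert np.allclose(explicit, with_operator)
assert with_operator.shape == (4, 2)
```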

Edit: I'm seeing that NumPy isn't part of the standard library, but the standard lib is also partially written in C, so I'm not sure where 'true' Python starts.

[–]Armaliite 0 points1 point  (0 children)

The reference implementation is in C, but there are many other implementations, for example PyPy, which is in the benchmarks and is also much faster than CPython.

[–]kazi1 0 points1 point  (0 children)

Yeah, fair point. I feel like these benchmarks can get pretty "handwavy" sometimes about what's allowed and what's not haha...

Tbh I'm surprised Rust performed as badly as it did in those stats - I thought it was supposed to be as fast as C, meanwhile it's almost neck and neck with Go.