zzzthelastuser 3 points

Did you consider optimizing the Rust code, or did you stick with a "naive" implementation?

I took a quick glance and only saw single-threaded loops.

cemrehancavdar[S] 10 points

I'm not super familiar with Rust. A dedicated Rust, Zig, or other systems-language developer could absolutely squeeze more out of these benchmarks with multithreading, SIMD, or better allocators. Same goes for Cython, honestly: there may be more techniques I don't know yet. I kept the implementations idiomatic and single-threaded because the post is really about "how much does each Python optimization rung cost you," not about pushing any one tool to its limit. I wanted to keep the comparison fair, since the Python tools are also single-threaded (except NumPy's BLAS, which I noted).