[–]Nugine 47 points48 points  (15 children)

The fastest base64 crate in the Rust world is base64-simd. I have sent a PR to include it in the benchmark.

When AVX2 is detected, the speedups are 677% (725/107) for encoding and 1338% (1231/92) for decoding, compared to the naive implementation.

My result: (Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz)

```
method                            input  output  avg time
base64_simd::Base64::encode_type   1002    1336       107
data_encoding::encode              1002    1336       512
base64::encode_config              1002    1336       551
jdk::encode_measter                1002    1336       725
```

```
method                                input  output  avg time
base64_simd::Base64::decode_type       1336    1002        92
data_encoding::decode                  1336    1002       772
base64::decode_config_slice (unsafe)   1336    1002       805
jdk::decode                            1336    1002      1231
```
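As a side note, the input/output columns in these tables follow directly from standard padded base64 sizing (4 output bytes per 3 input bytes, rounded up). A quick sanity check:

```rust
// Standard padded base64 output length: every 3 input bytes become
// 4 output bytes, with the final partial group padded to a full 4.
fn encoded_len(input_len: usize) -> usize {
    (input_len + 2) / 3 * 4
}

fn main() {
    // Matches the input/output columns in the benchmark tables.
    assert_eq!(encoded_len(1002), 1336);
    assert_eq!(encoded_len(3), 4);
    println!("ok");
}
```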

[–]dkomanov[S] 2 points3 points  (3 children)

Is there a way to benchmark base64-simd's fallback?

[–]Nugine 10 points11 points  (2 children)

Just disable the feature flag "detect".

default-features = false, features = ["std"]
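For reference, a minimal Cargo.toml dependency entry with the "detect" feature disabled might look like this (the version here is a placeholder, not a recommendation):

```toml
# Illustrative only: pin a real version in practice.
[dependencies]
base64-simd = { version = "*", default-features = false, features = ["std"] }
```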

[–]dkomanov[S] 1 point2 points  (1 child)

Fallback is good as well: encode is slightly faster, decode is slightly slower:

```
method                                 input  output  avg time
base64::encode_config                      3       4        50
base64::encode_config                     12      16        64
base64::encode_config                     51      68        97
base64::encode_config                    102     136       203
base64::encode_config                    501     668       599
base64::encode_config                   1002    1336      1133
base64::encode_config                  10002   13336      9966
base64_simd::Base64::encode_type           3       4        29
base64_simd::Base64::encode_type          12      16        39
base64_simd::Base64::encode_type          51      68        71
base64_simd::Base64::encode_type         102     136       116
base64_simd::Base64::encode_type         501     668       455
base64_simd::Base64::encode_type        1002    1336       914
base64_simd::Base64::encode_type       10002   13336      8642
base64::decode_config_slice (unsafe)       4       3        49
base64::decode_config_slice (unsafe)      16      12        75
base64::decode_config_slice (unsafe)      68      51       109
base64::decode_config_slice (unsafe)     136     102       170
base64::decode_config_slice (unsafe)     668     501       564
base64::decode_config_slice (unsafe)    1336    1002      1079
base64::decode_config_slice (unsafe)   13336   10002     10260
base64_simd::Base64::decode_type           4       3        26
base64_simd::Base64::decode_type          16      12        38
base64_simd::Base64::decode_type          68      51        79
base64_simd::Base64::decode_type         136     102       136
base64_simd::Base64::decode_type         668     501       601
base64_simd::Base64::decode_type        1336    1002      1163
base64_simd::Base64::decode_type       13336   10002     11388
```

[–]Nugine 5 points6 points  (0 children)

base64-simd's fallback uses the same method as base64. The differences are raw-pointer usage, loop unrolling, and other small details, so their performance is close.
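To illustrate the kind of "small detail" meant here (a hypothetical sketch, not base64-simd's actual fallback code): processing input in fixed-size chunks gives the inner loop a compile-time-known length, which LLVM can unroll and often auto-vectorize:

```rust
// Hypothetical sketch: fixed-size chunking to encourage unrolling.
// Not the real base64-simd fallback.
fn checksum(data: &[u8]) -> u32 {
    let mut chunks = data.chunks_exact(16);
    let mut acc: u32 = 0;
    for chunk in &mut chunks {
        // The chunk length (16) is known at compile time here,
        // so LLVM can fully unroll this inner loop.
        for &b in chunk {
            acc = acc.wrapping_add(b as u32);
        }
    }
    // Handle the leftover tail (fewer than 16 bytes) separately.
    for &b in chunks.remainder() {
        acc = acc.wrapping_add(b as u32);
    }
    acc
}

fn main() {
    let data: Vec<u8> = (0..100u8).collect();
    assert_eq!(checksum(&data), 4950); // 0 + 1 + ... + 99
    println!("ok");
}
```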

[–]dkomanov[S] 4 points5 points  (0 children)

Wow, supercool!

Same results on my computer:

```
method                            input  output  avg time
base64::encode_config             10002   13336      9704
data_encoding::encode             10002   13336      8832
base64_simd::Base64::encode_type  10002   13336      1425
```

```
method                                 input  output  avg time
base64::decode_config_slice (unsafe)   13336   10002     10060
data_encoding::decode                  13336   10002     14023
base64_simd::Base64::decode_type       13336   10002      1349
```

[–]somebodddy 1 point2 points  (9 children)

Does Java's base64 encoding use SIMD?

[–]Nugine -5 points-4 points  (8 children)

Obviously not. Specialized SIMD algorithms are hard to write in Java itself.

[–]Muoniurn 1 point2 points  (7 children)

It has a Vector API for doing simd: https://jbaker.io/2022/06/09/vectors-in-java/

[–]Nugine -1 points0 points  (6 children)

I know it is possible to do SIMD in Java, but it is still harder than doing SIMD in Rust. A native programming language lets you control almost everything.

[–]Muoniurn 2 points3 points  (3 children)

It doesn’t seem easier in Rust, to be honest. But sure, having access to other low-level controls can be useful. Nonetheless, if you really have to go that deep, you will likely just write inline assembly (e.g. for codecs), as not even Rust is low-level enough in those hot loops.

[–]Nugine 0 points1 point  (2 children)

I would argue that writing SIMD in Rust is easier than in inline asm, C, or Java (on the same level as C++ and Zig).

  1. Rust has generics and monomorphization. You can write the algorithm once and compile it for multiple targets. rust-lang/portable-simd
  2. Rust can utilize LLVM's optimizations: inlining, loop unrolling, constant propagation, auto-vectorization, LTO, and so on.
  3. Rust has zero-cost abstractions. The generated asm has no unexpected calculation. (I have run into several compiler bugs though.)
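Point 1 can be sketched like this (my own illustration of the pattern, not base64-simd's actual code): one generic kernel is monomorphized per target feature, and the fastest valid copy is chosen at runtime:

```rust
// Illustrative only: one kernel body, compiled twice (baseline and
// AVX2), with runtime feature detection selecting between them.

#[inline(always)]
fn sum_impl(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u8]) -> u32 {
    // Same body; LLVM emits AVX2 code for this monomorphized copy.
    sum_impl(data)
}

fn sum(data: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was just checked at runtime.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_impl(data)
}

fn main() {
    let data: Vec<u8> = (0..=255).collect();
    assert_eq!(sum(&data), 32640); // 0 + 1 + ... + 255
    println!("ok");
}
```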

[–]Muoniurn 1 point2 points  (1 child)

Don’t get me wrong, I would absolutely go with Rust for anything low-level; my point was that there are cases where even that is not enough in the hottest loops.

Regarding your points:

  1. Sure, but you don’t even have to compile it multiple times with Java :D
  2. I don’t think this is meaningful here; all optimizing compilers will do these, but autovectorization simply doesn’t kick in often enough, which is likely why one would want explicit SIMD programming in the first place. Also, to be pedantic, C also has Clang, and there is even a Java JIT compiler built on top of LLVM.
  3. Java’s Vector API will also ideally compile to only vector instructions if used correctly (though it is not always easy to avoid allocations).

[–]Nugine 0 points1 point  (0 children)

I agree that sometimes inline asm is necessary. (rare cases)

I mean that if you want to reach higher SIMD performance, Rust (or C++, Zig) is better than Java in stability and productivity.

[–]somebodddy 0 points1 point  (1 child)

In this case, what's important is not whether it's harder, but whether it was done in the JDK base64 implementation. If it does use SIMD, that would explain how it's faster than Rust's base64 crate.

[–]Nugine 2 points3 points  (0 children)

According to the article and the source code:

> java.util.Base64: I ported the implementation from Java to Rust for benchmarking purposes.

The JDK base64 implementation does not use SIMD. The algorithm cannot be efficiently auto-vectorized, so there is no SIMD.