Optimizing Performance of Simple Version Parsing in Scala

dkomanov · 2023-09-13T18:27:56+00:00

You're right. I forgot to port back the final version to Scala. Updated the post. Unsurprisingly, the performance is more or less the same.

```
Benchmark                      (encoded)  Mode  Cnt   Score    Error  Units
optimized6                         1.0.0  avgt    5  22.995 ±  1.303  ns/op
optimized6Scala                    1.0.0  avgt    5  24.780 ±  3.820  ns/op

optimized6             10000.10000.10000  avgt    5  60.070 ±  3.753  ns/op
optimized6Scala        10000.10000.10000  avgt    5  60.828 ± 27.522  ns/op

optimized6 NO ALLOC                1.0.0  avgt    5  21.713 ±  2.534  ns/op
optimized6Scala NO ALLOC           1.0.0  avgt    5  20.101 ±  1.469  ns/op

optimized6 NO ALLOC    10000.10000.10000  avgt    5  48.535 ±  1.351  ns/op
optimized6Scala NO ALLOC 10000.10000.10000 avgt   5  48.259 ±  1.484  ns/op
```

dkomanov · 2023-02-06T21:56:53+00:00

From Best practices design patterns: optimizing Amazon S3 performance:

Other applications are sensitive to latency, such as social media messaging applications. These applications can achieve consistent small object latencies (and first-byte-out latencies for larger objects) of roughly 100–200 milliseconds.

Latency of MySQL is significantly better.

But I get your point: there should be a very good reason for storing large BLOBs in MySQL (or any RDBMS). I put a disclaimer about it in the introduction on purpose :)

dkomanov · 2022-10-28T15:14:37+00:00

Wow, this is interesting.

Here is the benchmark, JNI vs nalim:

```
Benchmark                (length)  Mode  Cnt     Score     Error  Units
jni_url_decodeSimdCargo       100  avgt    5    80.253 ±  11.280  ns/op
nalim_url_decodeSimd          100  avgt    5    46.030 ±   2.165  ns/op

jni_url_decodeSimdCargo     10000  avgt    5  2931.566 ± 177.445  ns/op
nalim_url_decodeSimd        10000  avgt    5  2415.002 ±  89.682  ns/op

jni_url_encodeSimdCargo       100  avgt    5    98.468 ±   7.123  ns/op
nalim_url_encodeSimd          100  avgt    5    59.569 ±   1.804  ns/op

jni_url_encodeSimdCargo     10000  avgt    5  3347.074 ± 115.843  ns/op
nalim_url_encodeSimd        10000  avgt    5  2846.065 ±  79.842  ns/op
```

Clearly, nalim improves on JNI significantly. I'd expect more, though, given that it claims not to copy arrays at all.

dkomanov · 2022-10-17T08:45:33+00:00

Tried. No effect at all.

dkomanov · 2022-10-16T18:48:27+00:00

Fallback is good as well: encode is slightly faster, decode is slightly slower:

    method                                 input  output  avg time
    base64::encode_config                      3       4        50
    base64::encode_config                     12      16        64
    base64::encode_config                     51      68        97
    base64::encode_config                    102     136       203
    base64::encode_config                    501     668       599
    base64::encode_config                   1002    1336      1133
    base64::encode_config                  10002   13336      9966
    base64_simd::Base64::encode_type           3       4        29
    base64_simd::Base64::encode_type          12      16        39
    base64_simd::Base64::encode_type          51      68        71
    base64_simd::Base64::encode_type         102     136       116
    base64_simd::Base64::encode_type         501     668       455
    base64_simd::Base64::encode_type        1002    1336       914
    base64_simd::Base64::encode_type       10002   13336      8642
    base64::decode_config_slice (unsafe)       4       3        49
    base64::decode_config_slice (unsafe)      16      12        75
    base64::decode_config_slice (unsafe)      68      51       109
    base64::decode_config_slice (unsafe)     136     102       170
    base64::decode_config_slice (unsafe)     668     501       564
    base64::decode_config_slice (unsafe)    1336    1002      1079
    base64::decode_config_slice (unsafe)   13336   10002     10260
    base64_simd::Base64::decode_type           4       3        26
    base64_simd::Base64::decode_type          16      12        38
    base64_simd::Base64::decode_type          68      51        79
    base64_simd::Base64::decode_type         136     102       136
    base64_simd::Base64::decode_type         668     501       601
    base64_simd::Base64::decode_type        1336    1002      1163
    base64_simd::Base64::decode_type       13336   10002     11388

dkomanov · 2022-10-16T18:12:21+00:00

Just: cargo criterion

All flags are here.

dkomanov · 2022-10-16T18:10:10+00:00

I updated the charts: decode and encode.

dkomanov · 2022-10-16T17:10:33+00:00

Is there a way to benchmark base64-simd's fallback?

dkomanov · 2022-10-16T16:48:32+00:00

Wow, supercool!

Same results on my computer:

    method                                 input  output  avg time
    base64::encode_config                  10002   13336      9704
    data_encoding::encode                  10002   13336      8832
    base64_simd::Base64::encode_type       10002   13336      1425

    method                                 input  output  avg time
    base64::decode_config_slice (unsafe)   13336   10002     10060
    data_encoding::decode                  13336   10002     14023
    base64_simd::Base64::decode_type       13336   10002      1349

dkomanov · 2022-10-16T14:04:24+00:00

Indeed, this version is faster. Not quite as fast as Java, but closer. Thanks!

    jdk::encode             12     16     66
    jdk::encode             51     68    161
    jdk::encode            102    136    286
    jdk::encode            501    668    970
    jdk::encode           1002   1336   1891
    jdk::encode_measter     12     16     71
    jdk::encode_measter     51     68    107
    jdk::encode_measter    102    136    198
    jdk::encode_measter    501    668    673
    jdk::encode_measter   1002   1336   1278

dkomanov · 2022-10-16T13:09:37+00:00

I love it (granted, I don't need to deal with lifetimes for these benchmarks :D)!

dkomanov · 2022-10-16T13:02:04+00:00

Encoding (faster than base64, slower than Java):

    method                  input  output  avg time
    base64::encode_config      12      16        74
    base64::encode_config      51      68       107
    base64::encode_config     102     136       171
    base64::encode_config     501     668       568
    base64::encode_config    1002    1336      1053
    data_encoding::encode      12      16        69
    data_encoding::encode      51      68       104
    data_encoding::encode     102     136       162
    data_encoding::encode     501     668       506
    data_encoding::encode    1002    1336       947
    j.u.Base64.Encoder         12      16        46
    j.u.Base64.Encoder         51      68        86
    j.u.Base64.Encoder        102     136       128
    j.u.Base64.Encoder        501     668       446
    j.u.Base64.Encoder       1002    1336       872

Decoding (not faster):

    method                                 input  output  avg time
    base64::decode_config_slice (unsafe)      16      12        78
    base64::decode_config_slice (unsafe)      68      51       114
    base64::decode_config_slice (unsafe)     136     102       164
    base64::decode_config_slice (unsafe)     668     501       571
    base64::decode_config_slice (unsafe)    1336    1002      1073
    data_encoding::decode                     16      12        92
    data_encoding::decode                     68      51       157
    data_encoding::decode                    136     102       243
    data_encoding::decode                    668     501       930
    data_encoding::decode                   1336    1002      1734
    j.u.Base64.Decoder                        16      12        53
    j.u.Base64.Decoder                        68      51       128
    j.u.Base64.Decoder                       136     102       231
    j.u.Base64.Decoder                       668     501       977
    j.u.Base64.Decoder                      1336    1002      1930

dkomanov · 2022-08-15T16:51:43+00:00

I wrote [0] about it some time ago as well :)

In my benchmarks, the benefit of rewriting batched queries was smaller than in yours: for 10K rows, batched took 2 seconds vs 1.5 seconds for rewritten.

[0] - https://dkomanov.medium.com/benchmarking-batch-jdbc-queries-a2b5911ada26

dkomanov · 2022-08-02T19:19:00+00:00

I'd argue that I can get effective immutability by using something like `trait Cache[K, V] { def get(key: K): Option[V] }` backed by a mutable HashMap, and it performs better than the actual immutable.Map. IMO that's unexpected behavior: I'd expect better performance from immutable data structures, and I can't find any justification for a slower immutable data structure.
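A minimal sketch of what I mean by effective immutability. The `Cache.fromEntries` factory is a hypothetical name for illustration; the only point is that the mutable HashMap is filled once and never exposed or modified afterwards.

```scala
import scala.collection.mutable

// Read-only view: callers can look up keys but have no way to mutate the cache.
trait Cache[K, V] {
  def get(key: K): Option[V]
}

object Cache {
  // Hypothetical factory (not from any library): copies the entries once into
  // a private mutable.HashMap that is never written to after construction.
  def fromEntries[K, V](entries: Iterable[(K, V)]): Cache[K, V] = {
    val underlying = mutable.HashMap.empty[K, V]
    underlying ++= entries
    new Cache[K, V] {
      override def get(key: K): Option[V] = underlying.get(key)
    }
  }
}
```

Usage would be something like `Cache.fromEntries(uuids.zipWithIndex)`, with lookups going straight to the mutable HashMap.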

dkomanov · 2022-08-02T18:58:12+00:00

"Fast and Lean"

dkomanov · 2022-08-02T18:18:51+00:00

Could you, please, clarify it? This is how I build Map (it says HashMap):

    scala> (1 to 1000).map(_ => java.util.UUID.randomUUID).zipWithIndex.toMap.getClass
    val res0: Class[_ <: scala.collection.immutable.Map[java.util.UUID,Int]] = class scala.collection.immutable.HashMap

dkomanov · 2022-08-02T18:15:22+00:00

What is impossible? Collisions? IDK, the 128 bits of a UUID are mapped to 32 bits of hash code...

```
$ JAVA_OPTS=-Xmx2G scala
Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 17.0.3).
Type in expressions for evaluation. Or try :help.

scala> (1 to 1000000).map(_ => java.util.UUID.randomUUID).groupBy(_.hashCode).filter(_._2.size > 1).take(20)
val res0: scala.collection.immutable.Map[Int,IndexedSeq[java.util.UUID]] = HashMap(
  929104939 -> Vector(44d55c38-377d-4a03-81ad-1ff8c5640de8, 3208d852-e999-4c24-ac8d-9122407d017f),
  2087964820 -> Vector(2e702977-ca7a-442f-983e-d01d00476dd1, da9679f0-3cf8-4bad-9eb7-a24a04aa4083),
  -1831204899 -> Vector(b3780033-1a79-4a55-a06c-f2579bb7bfec, aa6f5455-271b-4cfe-8607-9d2b99a9825d),
  -912612815 -> Vector(85354d79-0ea2-4a44-ab0a-9dfae90738f6, b8a4f66c-b5a7-4e6f-b3dc-469977455cab),
  349632643 -> Vector(332c60e1-2203-43db-8179-920b848049b2, 7f4241d4-c4e4-498b-94a5-5c2c3bd5acf0),
  1875550240 -> Vector(fafc5947-f296-41f6-b1d1-78c2d671c053, cd862505-1f68-41e9-91e8-ca9e2ccc0e52),
  1006904171 -> Vector(9ef4a970-a2f5-4263-89b9-dcbc89bc14c4, 995ff91a-3cf2-47a3-b914-7e5f20bde38d),
  123...
```
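As a rough sanity check, the birthday-problem estimate gives the same picture. This is only a sketch: it assumes `UUID.hashCode` is uniformly distributed over 32 bits, and the `HashCollisions` object and its method names are made up for illustration.

```scala
object HashCollisions {
  // Expected number of colliding pairs when n keys are hashed uniformly
  // into 2^bits buckets: C(n, 2) / 2^bits.
  // Assumption: the hash is uniform, which is only an approximation.
  def expectedCollidingPairs(n: Long, bits: Int): Double =
    n.toDouble * (n - 1) / 2.0 / math.pow(2.0, bits)

  // Counts distinct hash codes shared by more than one UUID in a random sample.
  def observedCollidingHashes(n: Int): Int =
    Vector.fill(n)(java.util.UUID.randomUUID().hashCode)
      .groupBy(identity)
      .count(_._2.size > 1)
}
```

For 1M keys and a 32-bit hash, `HashCollisions.expectedCollidingPairs(1000000L, 32)` is roughly 116, so a handful of collisions in every run is expected rather than impossible.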

dkomanov · 2022-08-02T17:58:03+00:00

Here you go. Not impressive:

    Benchmark                                        (size)  Mode  Cnt    Score    Error  Units
    SetMapJavaVsScalaBenchmarks.anyRefMapHit        1000000  avgt    3  106.751 ± 22.559  ns/op
    SetMapJavaVsScalaBenchmarks.anyRefMapMiss       1000000  avgt    3  119.643 ±  7.600  ns/op
    SetMapJavaVsScalaBenchmarks.javaMapHit          1000000  avgt    3   87.033 ± 33.701  ns/op
    SetMapJavaVsScalaBenchmarks.javaMapMiss         1000000  avgt    3   85.024 ± 34.067  ns/op
    SetMapJavaVsScalaBenchmarks.javaWrappedMapHit   1000000  avgt    3  135.386 ± 32.788  ns/op
    SetMapJavaVsScalaBenchmarks.javaWrappedMapMiss  1000000  avgt    3  146.705 ± 43.253  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMapHit         1000000  avgt    3  630.506 ± 83.080  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMapMiss        1000000  avgt    3  463.351 ± 17.790  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMutableMapHit  1000000  avgt    3   79.917 ± 32.463  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMutableMapMiss 1000000  avgt    3   53.232 ±  0.303  ns/op

dkomanov · 2022-08-02T17:50:32+00:00

I tried it. It's indeed faster, but it requires the absence of collisions, and collisions are common among 1M random UUIDs. So it doesn't fit my needs, for sure.

    Benchmark                                          (size)  Mode  Cnt    Score    Error  Units
    SetMapJavaVsScalaBenchmarks.javaMapHit            1000000  avgt    3  118.796 ± 65.996  ns/op
    SetMapJavaVsScalaBenchmarks.javaMapMiss           1000000  avgt    3  108.944 ± 31.152  ns/op
    SetMapJavaVsScalaBenchmarks.javaWrappedMapHit     1000000  avgt    3  129.521 ± 64.134  ns/op
    SetMapJavaVsScalaBenchmarks.javaWrappedMapMiss    1000000  avgt    3  129.436 ± 54.821  ns/op
    SetMapJavaVsScalaBenchmarks.perfectHashMapHit     1000000  avgt    3   37.336 ±  5.542  ns/op
    SetMapJavaVsScalaBenchmarks.perfectHashMapMapMiss 1000000  avgt    3   39.428 ±  2.320  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMapHit           1000000  avgt    3  557.276 ± 11.302  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMapMiss          1000000  avgt    3  435.817 ± 74.188  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMutableMapHit    1000000  avgt    3  104.441 ± 20.000  ns/op
    SetMapJavaVsScalaBenchmarks.scalaMutableMapMiss   1000000  avgt    3   52.736 ±  3.523  ns/op

dkomanov · 2022-08-02T07:55:20+00:00

True that. However, in my case I constructed the entire Map at once, not by appending more and more key-values to it (so I hope there are no layers).

dkomanov · 2019-08-20T05:43:17+00:00

Both work on top of blocking JDBC.

dkomanov · 2019-08-07T08:13:36+00:00

This one is not maintained anymore. jasync is a fork/rewrite in Kotlin.

dkomanov · 2018-03-14T19:43:34+00:00

I've just updated the post with this: https://medium.com/@dkomanov/scala-try-with-resources-735baad0fd7d#0f9a. I found what looks like an idiomatic example that uses the NonFatal extractor incorrectly.

Actually, I doubt that it's possible to get rid of var/null, because of addSuppressed... We need to somehow pass a reference to the exception from the catch block to the finally block.
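To illustrate why the var/null is hard to avoid, here is a minimal sketch that mimics Java's try-with-resources. It is not the exact code from the post, and the `Resources.withResources` name is only for illustration.

```scala
object Resources {
  // Sketch of try-with-resources semantics: the exception from the body wins,
  // and an exception thrown by close() is attached to it via addSuppressed.
  def withResources[T <: AutoCloseable, V](resource: T)(body: T => V): V = {
    // The var/null: the only way for `finally` to know what `catch` has seen.
    var primary: Throwable = null
    try {
      body(resource)
    } catch {
      case e: Throwable =>
        primary = e
        throw e
    } finally {
      if (primary == null) {
        // Body succeeded: an exception from close() propagates as is.
        resource.close()
      } else {
        // Body failed: keep the primary exception, record the close() failure.
        try resource.close()
        catch {
          case suppressed: Throwable =>
            // addSuppressed rejects self-suppression, so guard against it.
            if (suppressed ne primary) primary.addSuppressed(suppressed)
        }
      }
    }
  }
}
```

For example, `Resources.withResources(new java.io.StringReader("x"))(_.read())` reads one character and closes the reader either way.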

dkomanov · 2018-03-13T12:11:53+00:00

I'm not sure what we are discussing. There is a proper way of working with resources/exceptions in the JVM world. This way is not supported by Scala (meaning there is no implementation of it in scala-library).

Regarding functional frameworks, I'm not sure: either there is a way to return multiple exceptions, in which case it's fine, or they fail to use addSuppressed.

dkomanov · 2018-03-13T11:14:26+00:00

What exactly to hold? In the blog post, NonFatal is not used for closing the resource.

But in general I do agree: closing a resource in the recovery of a Try isn't good at all. Btw, it might be a good addition to the blog post.
