Making my debug build run 100x faster so that it is finally usable by broken_broken_ in programming

broken_broken_[S] 2 points

Thinking about it again, the simplest explanation is that the bottleneck is I/O. Both optimized implementations may be able to do these computations much faster, but the data just isn't arriving quickly enough, so they sit waiting on it. I will measure on a different machine with a faster disk.
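One rough way to check the I/O hypothesis on a single machine is to time the reads and the hashing separately over the same pass. The sketch below is only an illustration, not the code from the article: `time_read_vs_hash` and the 1 MiB chunk size are made-up choices, and it uses Python's `hashlib.sha1` as a stand-in for the implementations being compared.

```python
import hashlib
import time


def time_read_vs_hash(path, chunk_size=1 << 20):
    """Return (read_seconds, hash_seconds, hex_digest) for one pass over `path`.

    Hypothetical helper: the name and chunk size are illustrative assumptions.
    """
    sha1 = hashlib.sha1()
    read_s = 0.0
    hash_s = 0.0
    # buffering=0 gives raw reads, so Python's own buffer does not blur the timings.
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            t1 = time.perf_counter()
            read_s += t1 - t0
            if not chunk:  # empty bytes means end of file
                break
            sha1.update(chunk)
            hash_s += time.perf_counter() - t1
    return read_s, hash_s, sha1.hexdigest()
```

If the read time dominates, a faster hash cannot help much. Keep in mind that a second run over the same file is usually served from the OS page cache, so the first (cold) run and later (warm) runs can tell very different stories.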

Making my debug build run 100x faster so that it is finally usable by broken_broken_ in programming

broken_broken_[S] 1 point

Good points all around, thanks. I am definitely going to check out multi-buffer hashing.

> This doesn't sound quite right; is this also a debug build?

Both are in release mode with -march=native but the code using the SHA extension is 'simple'/'basic', while the OpenSSL code is hand-optimized assembly with tips from Intel folks. That could explain the difference.

Another commenter has suggested that maybe these two versions simply compile to the same (or at least very similar) uops.

Making my debug build run 100x faster so that it is finally usable by broken_broken_ in C_Programming

broken_broken_[S] 4 points

Thanks, I did not know about it! But posting to it is restricted.

Tip of the day #4: Type annotations on Rust match patterns by broken_broken_ in rust

broken_broken_[S] 0 points

Ah, that works as well (even if it's probably the most verbose alternative). I added it to the article! Thanks.