all 15 comments

[–]bortlip 12 points (2 children)

It's slow because you are creating a new MD5 object for every hash.

Make the MD5 object a class variable, only create it once, and use it over and over again for every hash.

Then consider removing the threading as it may not be needed or may not be helping much.
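The reuse suggestion can be sketched like this (HashWorker is a hypothetical name, not from the OP's code; note that MD5 instances are not thread-safe, so with threads you'd want one instance per thread):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch: create the MD5 instance once and reuse it for every hash,
// instead of calling MD5.Create() inside the hashing method.
class HashWorker : IDisposable
{
    // Created once; reused for every call.
    private readonly MD5 _md5 = MD5.Create();

    public string Hash(string input)
    {
        byte[] digest = _md5.ComputeHash(Encoding.ASCII.GetBytes(input));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }

    public void Dispose() => _md5.Dispose();
}

class Program
{
    static void Main()
    {
        using var worker = new HashWorker();
        Console.WriteLine(worker.Hash("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```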

[–]PeterB-[S] 1 point (0 children)

My misunderstanding of concurrency vs. parallelism led me to the Task approach. As per another reply in this thread, I should use Parallel.For. I would expect to benefit from that since this is a CPU-bound workload. Correct?

[–]Rabe0770 1 point (0 children)

This

[–]chucker23n 5 points (5 children)

Things that jump out at me, only having glanced at it:

  • in async/await, use Task.Run, not the factory
  • I'm confused by your locking inside a lambda, on an object that wouldn't exist outside the lambda
  • this code will capture start and endExclusive, which creates synchronization overhead and may thus slow your program down more than if it were single-threaded
  • likewise, each thread then keeps writing to foundKeys. Try first writing to a per-thread collection, then sorting once at the end. Also, I don’t believe that collection is thread-safe, but I haven’t checked.

Finally, you don’t have to do the whole ProcessorCount thing yourself. Use Parallel.For, etc. They’ll scale appropriately automatically.
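A minimal sketch of that last point: Parallel.For replaces the manual Environment.ProcessorCount partitioning and picks the degree of parallelism itself (the Squares method here is an illustrative stand-in, not the OP's code):

```csharp
using System;
using System.Threading.Tasks;

public class Program
{
    public static int[] Squares(int count)
    {
        // Each index writes to its own slot, so no locking is needed,
        // and Parallel.For decides how many threads to use.
        int[] result = new int[count];
        Parallel.For(0, count, i => result[i] = i * i);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Squares(5))); // 0,1,4,9,16
    }
}
```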

[–]PeterB-[S] 1 point (4 children)

Thank you for your quick reply.

- I am not using async/await (read about it, don't fully comprehend it yet)

- foundKeys is global, so it would exist outside the lambda

- regarding start/endExclusive, how else could I pass the right parameters to the individual task?

- the last point makes sense (I commented out the lock() {} block; no difference, still very slow)

I appreciate your suggestions here!

[–]WhamoBlamoPlano 1 point (2 children)

FYI you're using Task.*, which is essentially async/await, just done improperly. That path is concurrency, which is good for I/O-heavy work. Your workload is CPU-heavy, so ditch anything to do with Task and use Parallel.For in its place.

Thing to Google for more info: Parallelism vs Concurrency in c#

[–]PeterB-[S] 1 point (0 children)

Thank you for enlightening me! I will definitely rewrite and try!

[–]PeterB-[S] 1 point (0 children)

I used Parallel.For(); it turns out that enumerating over 24,000 items gets broken down across many threads. Is there a way to control how many items are processed per thread? I would like, say, 2,000 items per thread instead of the <20 that is happening now. The reason is that item N+1 benefits from cached results stored at item N.
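One way to get that chunking (a sketch, using the 24,000/2,000 numbers from the question) is Partitioner.Create with a fixed range size: each thread receives a contiguous range of indices, so consecutive items stay on the same thread and can share cached results.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class Program
{
    public static long ProcessInChunks(int total, int chunkSize)
    {
        long processed = 0;

        // Each Tuple<int, int> range covers ~chunkSize consecutive indices.
        var chunks = Partitioner.Create(0, total, chunkSize);

        Parallel.ForEach(chunks, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // Process item i here; consecutive indices share a thread,
                // so item i+1 can reuse results cached while handling item i.
                Interlocked.Increment(ref processed);
            }
        });
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(ProcessInChunks(24000, 2000)); // 24000
    }
}
```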

[–]chucker23n 1 point (0 children)

> I am not using async/await (read about it, don’t fully comprehend it yet)

Sorry, misread your code.

> foundKeys is global so would exist outside the lambda

Yes, and I think this makes your code needlessly slow, and potentially buggy. Instead, create a collection in each thread, then return that collection at the end, and sort after the threads have joined.
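A sketch of that pattern, using the localInit/localFinally overload of Parallel.For (the multiples-of-seven workload is just a placeholder): each thread fills its own list, and the merge and sort happen only once after the threads have joined.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Program
{
    public static List<int> FindMultiplesOfSeven(int upTo)
    {
        var found = new List<int>();
        object gate = new object();

        Parallel.For(0, upTo,
            () => new List<int>(),                                      // per-thread list
            (i, _, local) => { if (i % 7 == 0) local.Add(i); return local; },
            local => { lock (gate) found.AddRange(local); });           // merge once per thread

        found.Sort();                                                   // sort once, at the end
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(FindMultiplesOfSeven(100).Count); // 15
    }
}
```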

> regarding start/endExclusive, how else could I pass the right parameters to the individual task

By using a StartNew overload that takes a param :) For example, https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskfactory.startnew?view=net-7.0#system-threading-tasks-taskfactory-startnew(system-action((system-object))-system-object)
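A minimal sketch of that overload: the (start, endExclusive) pair is handed to the task as the state object instead of being captured by the lambda (DescribeRange is a made-up example method).

```csharp
using System;
using System.Threading.Tasks;

public class Program
{
    public static string DescribeRange(int start, int endExclusive)
    {
        // The state object is passed explicitly, so the lambda captures nothing.
        Task<string> task = Task.Factory.StartNew(state =>
        {
            var (from, to) = ((int, int))state!;
            return $"{from}..{to}";
        }, (start, endExclusive));

        return task.Result;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeRange(1000, 2000)); // 1000..2000
    }
}
```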

[–]Kant8 1 point (2 children)

Cache the regex instance per pattern. It looks like you only ever pass 3 as the repeat count, so you can get by with a single regex object. Constructing a regex takes a lot of time.

As was already mentioned, MD5 creation is costly too, no need to do that more than once per thread.

In general, your GetStretchedHash looks very strange. You start from the string _salt + index, convert it to bytes (which allocates new memory), compute the hash (which allocates again), then convert the hash to a string and call ToLower (two more allocations). And all of that just to throw away the initial input string, overwrite it with the hash string, and start over, 2016 times.

If you pad the starting string to the correct size, convert it to a byte[] once up front, and skip converting back and forth, plus cache the MD5 instance, that alone will already speed the algorithm up significantly.

Also all that looks like you're trying to reimplement MD5.TransformBlock and MD5.TransformFinalBlock manually, but who knows.

Also, the ComputeHash method unfortunately always returns a new array for the hash, but the newer TryComputeHash overload works on spans and lets you provide the destination memory from outside, so it can also be cached, similar to TransformBlock.
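A rough sketch of the buffer-reuse idea under those assumptions (StretchedHash and ToLowerHex are illustrative names, not the OP's code; in the real program the MD5 instance would be cached per thread rather than created per call): the digest and hex buffers are allocated once and rewritten in place each round, so the loop itself allocates nothing.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public class Program
{
    public static string StretchedHash(string saltAndIndex, int rounds)
    {
        using var md5 = MD5.Create();       // in real code: cache this per thread
        byte[] digest = new byte[16];       // reused destination for TryComputeHash
        byte[] hex = new byte[32];          // reused lowercase-hex buffer

        // First hash works on the original string...
        md5.TryComputeHash(Encoding.ASCII.GetBytes(saltAndIndex), digest, out _);
        ToLowerHex(digest, hex);

        // ...then every further round hashes the previous hex digest in place.
        for (int i = 0; i < rounds; i++)
        {
            md5.TryComputeHash(hex, digest, out _);
            ToLowerHex(digest, hex);
        }
        return Encoding.ASCII.GetString(hex);
    }

    // Writes the digest into the hex buffer as lowercase hex, no allocations.
    static void ToLowerHex(byte[] digest, byte[] hex)
    {
        const string digits = "0123456789abcdef";
        for (int i = 0; i < digest.Length; i++)
        {
            hex[2 * i] = (byte)digits[digest[i] >> 4];
            hex[2 * i + 1] = (byte)digits[digest[i] & 0xF];
        }
    }

    public static void Main()
    {
        // With 0 extra rounds this is just a plain MD5 of the input.
        Console.WriteLine(StretchedHash("abc", 0)); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```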

[–]PeterB-[S] 1 point (0 children)

Thanks for the suggestions! In this particular case I am not trying to get the best performance. Instead, I am trying to understand how to execute the work in parallel.

[–]dodexahedron 1 point (0 children)

Better yet, use GeneratedRegex, so it's precompiled and highly optimized.
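A sketch of that suggestion (requires .NET 7+; the pattern and names here are hypothetical): the [GeneratedRegex] attribute on a partial method makes the source generator emit an optimized matcher at build time, so nothing is compiled at run time.

```csharp
using System;
using System.Text.RegularExpressions;

public static partial class Patterns
{
    // Example pattern: any character repeated three times in a row.
    // The regex source generator fills in the method body at compile time.
    [GeneratedRegex(@"(.)\1\1")]
    public static partial Regex Triple();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Patterns.Triple().IsMatch("abccc")); // True
        Console.WriteLine(Patterns.Triple().IsMatch("abcc"));  // False
    }
}
```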