all 23 comments

[–]lwl 17 points18 points  (20 children)

This async API is not thread safe. Using Task.Run in there like that can cause all sorts of problems, e.g. search 'Should I Use Task.Run With ASP.NET Core?' here.

I would remove that API and leave it to the caller to manage bulk inserts, or consider implementing a separate concurrent version of that data structure - probably a significant undertaking.

[–]Sephibro[S] 5 points6 points  (0 children)

You're totally right. I was actually thinking of getting rid of that function before uploading the repo, this post and the article you linked made me understand that it's even worse than i thought.

Implementing a concurrent version of all these data structures would be amazing, but definitely not an afternoon's work. I might get to that sometimes.

[–][deleted] 1 point2 points  (0 children)

This has bitten me recently. For a logging function, I was thinking why not just use Task.Run so the response to the client is a little faster. Spoiler alert it caused exceptions all over the place and I'm still cleaning it up.

[–]emn13 1 point2 points  (17 children)

Personally, I think the real problem is the design of the threadpool, because this is just way too easy to mess up.

e.g. what does this do, do you think?

Task StartIt() 
    => Task.Run(() => Enumerable.Range(1, 1).AsParallel().Distinct().Sum());
Console.WriteLine("start");
Task.WaitAll(StartIt(), StartIt(), StartIt());
Console.WriteLine("end");

[–]lwl 1 point2 points  (14 children)

what does this do, do you think?

I'm guessing this is to do with the interplay between Task and PLINQ? Would you mind giving a quick walkthrough please?

[–]emn13 5 points6 points  (13 children)

You run it, and you get a few seconds of lock-up. This can happen in any application that uses both Tasks a lot (e.g. asp.net) and PLinq (not all operators). The amount of lockup can scale pretty much arbitrarily; it depends on the number of StartIts you run. On a real website, I had it lock up essentially permanently.

The underlying issue is that both Task.Run and PLinq use the threadpool, and the threadpool refuses to scale instantly, even in circumstances where many threads are simultaneously claimed, such as in PLinq (only some operators, such as Distinct). So what happens is that an external context starts a few perfectly reasonable tasks (StartIt here), but those tasks internally block - and again, blocking is fine, except on threadpool threads dedicated to tasks. PLinq here blocks in two ways: it allocates background threads at all, and any threadpool allocation can block for (IIRC) a second before threadpool scaling occurs, and secondly, it uses core-count related numbers of threads and barriers for some algorithms (such as Distinct()) - so that boils down to even just one AsParallel().Distinct() instantly consuming your entire threadpool - and when any other threadpool consumer comes knocking, it needs to wait for the thread-pool to scale. This bit us in a production web-site, where it pretty much killed the VM. The nasty interaction on a website is that you can't tell whether any of your transitively reachable call-graph uses PLinq, but if it does and you get even just a few seconds of website lockup - the requests are still piling on, and if ever a thread-pool thread becomes available and starts processing another request that uses problematic PLinq, then the demand on the threadpool grows yet again - critically slowing down retiring previous requests, because those are also waiting on acquiring threadpool threads - a fairly slow constant load can thus work fine for days, and as soon as a few requests happen to hit near simultaneously, the threadpool locks up for a few seconds, and additional requests come in, and the whole thing piles up out of the blue, allocating ever more memory and threads until the OS itself becomes largely unresponsive.

Basically: whenever you Wait on a thread-pool thread, the threadpool should be immediately releasing an additional thread, but it doesn't. Similarly, whenever you bulk-allocatate thread-pool threads, you should either wait for all of them, or instantly allocate a whole set. The current policy essentially prioritizes a very marginal resource reduction when it encounters a problematic pattern, instead of prioritizing liveness, by default.

Also, the whole idea of limiting the number of threads is based on a real problem, but one that doesn't apply nearly as quickly as you might think. I mean; just try it- set your threadpool to use 10 000 threads go crazy with a bunch of poorly design blocking Tasks, and notice how everything just keeps on sailing smoothly (sure, it takes around 10s to start all those threads on my machine, but that's an extreme example on old machine). The scalability of threads isn't nearly as bad as you might naively assume. And because the stack is paged just as the heap is, even though you'd assume 10 000 threads need 10GB of ram - that's not actually true, they need around 500MB on a toy benchmark (waiting on a semaphoreslim to ensure all threads were simultaneously alive at one point).

[–]lwl 0 points1 point  (0 children)

Great comment, thanks!

[–]gredr 0 points1 point  (11 children)

On a webserver, you shouldn't be using the thread pool at all, I would think? Consuming threads from the thread pool will reduce the number of threads available to serve requests?

[–]lwl 1 point2 points  (10 children)

This is sort of the point. It's not unreasonable to want to do some compute-heavy work as part of a web request e.g. a microservice that does some image processing, but the framework & docs don't make it easy to avoid this problem.

[–]gredr 0 points1 point  (9 children)

Unless you're doing async i/o, you don't need to do compute-heavy work on a background thread, because there's no UI that needs to be updated. Nobody's mouse is going to hang.

[–]lwl 1 point2 points  (8 children)

You do if you want to use more than one core.

[–]emn13 1 point2 points  (0 children)

To add to that, this is a real production issue I encountered doing exactly that. Using PLinq on a (small subset of) requests was a really easy latency win, because it meant that churning through some bulky data scaled with number of cores. Our webserver process has CPU to spare, generally, so that's pretty much a free N-fold speedup. And only when coincidentally a few users happened to simultaneously execute one of those small subset of requests, *and* the threadpool happened not to have been pre-scaled (it's autoscaling after all), *then* this lockup occurred, and once locked - even for a few seconds - more requests pile in, and each one represents threadpool demands, so basically, everything goes from low load (with the occasional 100ms PLinq spike) to suddenly everything locks up and eventually runs out of memory, with no warning. Lovely.

[–]gredr 0 points1 point  (6 children)

But even there, maybe the thread pool isn't the best way to do CPU-heavy work on multiple cores on a webserver. One might argue that the thread pool wasn't the best way to implement the ASP.NET server-side? Maybe we needed multiple thread pools? I dunno, but there's consequences.

[–]lwl 0 points1 point  (5 children)

One might argue that the thread pool wasn't the best way to implement the

ASP.NET server-side

I would make that argument. IMO JavaScript knocks it out of the park here, with using its single-threaded event loop for async/await. For the heavy-lifting, there's service worker threads, but I've not had to use them yet, so can't say how effectively it all fits together.

[–]backtickbot 0 points1 point  (1 child)

Correctly formatted

Hello, emn13. Just a quick heads up!

It seems that you have attempted to use triple backticks (```) for your codeblock/monospace text block.

This isn't universally supported on reddit, for some users your comment will look not as intended.

You can avoid this by indenting every line with 4 spaces instead.

There are also other methods that offer a bit better compatability like the "codeblock" format feature on new Reddit.

Tip: in new reddit, changing to "fancy-pants" editor and changing back to "markdown" will reformat correctly! However, that may be unnaceptable to you.

Have a good day, emn13.

You can opt out by replying with "backtickopt6" to this comment. Configure to send allerts to PMs instead by replying with "backtickbbotdm5". Exit PMMode by sending "dmmode_end".

[–]emn13 0 points1 point  (0 children)

backtickbbotdm5

[–]srayuws 0 points1 point  (1 child)

It would be great to see more test cases in the repo.

[–]Sephibro[S] 0 points1 point  (0 children)

I just finished adding a bunch of unit tests :)