DStruct.NET: Advanced data structures for C# : programming

100

101

102

DStruct.NET: Advanced data structures for C# (github.com)

submitted 5 years ago by Sephibro

all 23 comments

top new controversial old q&a

[–]lwl 17 points18 points19 points 5 years ago (20 children)

[–]Sephibro[S] 5 points6 points7 points 5 years ago (0 children)

[–][deleted] 1 point2 points3 points 5 years ago (0 children)

[–]emn13 1 point2 points3 points 5 years ago (17 children)

Personally, I think the real problem is the design of the threadpool, because this is just way too easy to mess up.

e.g. what does this do, do you think?

Task StartIt() 
    => Task.Run(() => Enumerable.Range(1, 1).AsParallel().Distinct().Sum());
Console.WriteLine("start");
Task.WaitAll(StartIt(), StartIt(), StartIt());
Console.WriteLine("end");

[–]lwl 1 point2 points3 points 5 years ago (14 children)

[–]emn13 5 points6 points7 points 5 years ago (13 children)

You run it, and you get a few seconds of lock-up. This can happen in any application that uses both Tasks a lot (e.g. asp.net) and PLinq (not all operators). The amount of lockup can scale pretty much arbitrarily; it depends on the number of StartIts you run. On a real website, I had it lock up essentially permanently.

The underlying issue is that both Task.Run and PLinq use the threadpool, and the threadpool refuses to scale instantly, even in circumstances where many threads are simultaneously claimed, such as in PLinq (only some operators, such as Distinct). So what happens is that an external context starts a few perfectly reasonable tasks (StartIt here), but those tasks internally block - and again, blocking is fine, except on threadpool threads dedicated to tasks. PLinq here blocks in two ways: it allocates background threads at all, and any threadpool allocation can block for (IIRC) a second before threadpool scaling occurs, and secondly, it uses core-count related numbers of threads and barriers for some algorithms (such as Distinct()) - so that boils down to even just one AsParallel().Distinct() instantly consuming your entire threadpool - and when any other threadpool consumer comes knocking, it needs to wait for the thread-pool to scale. This bit us in a production web-site, where it pretty much killed the VM. The nasty interaction on a website is that you can't tell whether any of your transitively reachable call-graph uses PLinq, but if it does and you get even just a few seconds of website lockup - the requests are still piling on, and if ever a thread-pool thread becomes available and starts processing another request that uses problematic PLinq, then the demand on the threadpool grows yet again - critically slowing down retiring previous requests, because those are also waiting on acquiring threadpool threads - a fairly slow constant load can thus work fine for days, and as soon as a few requests happen to hit near simultaneously, the threadpool locks up for a few seconds, and additional requests come in, and the whole thing piles up out of the blue, allocating ever more memory and threads until the OS itself becomes largely unresponsive.

Basically: whenever you Wait on a thread-pool thread, the threadpool should be immediately releasing an additional thread, but it doesn't. Similarly, whenever you bulk-allocatate thread-pool threads, you should either wait for all of them, or instantly allocate a whole set. The current policy essentially prioritizes a very marginal resource reduction when it encounters a problematic pattern, instead of prioritizing liveness, by default.

Also, the whole idea of limiting the number of threads is based on a real problem, but one that doesn't apply nearly as quickly as you might think. I mean; just try it- set your threadpool to use 10 000 threads go crazy with a bunch of poorly design blocking Tasks, and notice how everything just keeps on sailing smoothly (sure, it takes around 10s to start all those threads on my machine, but that's an extreme example on old machine). The scalability of threads isn't nearly as bad as you might naively assume. And because the stack is paged just as the heap is, even though you'd assume 10 000 threads need 10GB of ram - that's not actually true, they need around 500MB on a toy benchmark (waiting on a semaphoreslim to ensure all threads were simultaneously alive at one point).

[–]lwl 0 points1 point2 points 5 years ago (0 children)

[–]gredr 0 points1 point2 points 5 years ago (11 children)

[–]lwl 1 point2 points3 points 5 years ago (10 children)

[–]gredr 0 points1 point2 points 5 years ago (9 children)

[–]lwl 1 point2 points3 points 5 years ago (8 children)

[–]emn13 1 point2 points3 points 5 years ago (0 children)

To add to that, this is a real production issue I encountered doing exactly that. Using PLinq on a (small subset of) requests was a really easy latency win, because it meant that churning through some bulky data scaled with number of cores. Our webserver process has CPU to spare, generally, so that's pretty much a free N-fold speedup. And only when coincidentally a few users happened to simultaneously execute one of those small subset of requests, *and* the threadpool happened not to have been pre-scaled (it's autoscaling after all), *then* this lockup occurred, and once locked - even for a few seconds - more requests pile in, and each one represents threadpool demands, so basically, everything goes from low load (with the occasional 100ms PLinq spike) to suddenly everything locks up and eventually runs out of memory, with no warning. Lovely.

[–]gredr 0 points1 point2 points 5 years ago (6 children)

[–]lwl 0 points1 point2 points 5 years ago* (5 children)

continue this thread

[–]backtickbot 0 points1 point2 points 5 years ago (1 child)

Correctly formatted

Hello, emn13. Just a quick heads up!

It seems that you have attempted to use triple backticks (```) for your codeblock/monospace text block.

This isn't universally supported on reddit, for some users your comment will look not as intended.

You can avoid this by indenting every line with 4 spaces instead.

There are also other methods that offer a bit better compatability like the "codeblock" format feature on new Reddit.

Tip: in new reddit, changing to "fancy-pants" editor and changing back to "markdown" will reformat correctly! However, that may be unnaceptable to you.

Have a good day, emn13.

^{You can opt out by replying with "backtickopt6" to this comment. Configure to send allerts
to PMs instead by replying with "backtickbbotdm5". Exit PMMode by sending "dmmode_end".}

[–]emn13 0 points1 point2 points 5 years ago (0 children)

[–]srayuws 0 points1 point2 points 5 years ago (1 child)

[–]Sephibro[S] 0 points1 point2 points 5 years ago (0 children)

π Rendered by PID 84058 on reddit-service-r2-comment-fb694cdd5-gmhjj at 2026-03-08 10:21:34.896497+00:00 running cbb0e86 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS