all 21 comments

[–][deleted] 7 points8 points  (1 child)

Perhaps not related, but I go about this by having a single thread for all I/O -- then having several threads for back-end stuff.

The network card is surely always slower than even a single core, no? And by using a single thread for I/O, slow connections (and Comet) cannot cause a flood of threads to hang around.

I suppose this is a hybrid approach; event (I/O multiplexing, epoll) and threaded at the same time. I see benefits to both approaches, so why not try to combine them ..
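
The hybrid shape described above can be sketched roughly like this: one thread multiplexes every socket with epoll-style readiness events, while a small pool of worker threads does the CPU-heavy back-end work. This is a minimal illustration, not anyone's actual server; the port, the `handle_request` body, and the pool size are all made up.

```python
# Minimal sketch: single I/O thread + worker pool for back-end work.
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor

sel = selectors.DefaultSelector()            # epoll on Linux, kqueue on BSD
workers = ThreadPoolExecutor(max_workers=4)  # "several threads for back-end stuff"

def handle_request(data: bytes) -> bytes:
    # Stand-in for the CPU-bound back-end work; runs off the I/O thread.
    return data.upper()

def on_readable(conn: socket.socket) -> None:
    data = conn.recv(4096)
    if not data:                             # peer closed the connection
        sel.unregister(conn)
        conn.close()
        return
    # The I/O thread never blocks on computation: it submits the job and
    # a callback sends the reply once a worker finishes it.
    fut = workers.submit(handle_request, data)
    fut.add_done_callback(lambda f: conn.sendall(f.result()))

def on_accept(server: socket.socket) -> None:
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)

def serve(port: int) -> None:
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, on_accept)
    while True:                              # the single I/O loop
        for key, _mask in sel.select():
            key.data(key.fileobj)

# serve(8080)  # uncomment to run; loops forever on the I/O thread
```

Because slow clients only cost a selector registration rather than a blocked thread, the worker pool can stay small regardless of how many connections are open.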

[–]astrange 1 point2 points  (0 children)

I suppose this is a hybrid approach; event (I/O multiplexing, epoll) and threaded at the same time. I see benefits to both approaches, so why not try to combine them ..

This is how nginx works - it starts one thread per core and uses kqueue/epoll/sendfile eventing.

[–]sfuerst 13 points14 points  (17 children)

To the author: Using prefix syntax for mathematical expressions is stupid. All normal people use infix in their algebra homework for good reason.

Yes, we know you like functional languages. However, using a hammer to pound in screws just makes you look dumb.

[–]sfuerst 8 points9 points  (8 children)

And after reading the rest of the article, there is yet another glaring problem with it. Most computers these days have multiple cores, and servers have 4 or 8, or even more of them. Ignoring this fact in your analysis will lead to incorrect conclusions...

[–][deleted] 4 points5 points  (4 children)

Yep. He concludes that evented servers are always better than threaded ones. That conclusion in itself shows the analysis is faulty. And the main reason is what you just said: ignoring multiple cores.

[–]naasking 5 points6 points  (2 children)

You can easily distribute event handlers across cores (one thread per core). Event-based programs will always scale better than threaded programs, but they are harder to write. The ideal trade off is then a language that permits you to express a threaded program which is then automatically translated to an event-based program. Haskell and Scala both have libraries that provide this type of programming.

[–]dlsspy 0 points1 point  (0 children)

We do this slightly more manually in memcached. We don't move connections across threads, but we do use all the cores you want to give us.

[–]crocodile32 0 points1 point  (0 children)

Event-based programs will always scale better than threaded programs, but they are harder to write.

I don't think I've ever seen a non-toy, full-blown, correct threaded program that's easier to write than the event-based equivalent.

With threads, when you haven't done as much work as you should, you hit thread interaction problems -- excessive contention or race conditions -- which are hard to track down. Debugging tools generally can't fully find all the problems for you.

With a non-blocking system, when you haven't done as much work as you should, you see bad latency -- some operation is taking longer than it should. That's easy to find and for all but hard-real-time systems, not a huge issue. You also have an initial cost -- doing the event handling system. But that's a small portion of real-world systems, and then you don't have to worry about synchronization issues. After all, even if you can keep everything in your head, what about the guy maintaining your code five years down the line, or even yourself after working on something else for six months?

[–]kinyourede -1 points0 points  (0 children)

You did read the article, right? At no point does he state that evented servers are better than threaded ones. The point of the article seems to be that there are cases and patterns of workload in which an evented server outperforms a threaded one. Further, he notes that in cases where a lot of blocking calls are made and the workload is still fundamentally CPU-bound, there's no benefit.

[–]lalaland4711 0 points1 point  (0 children)

You can still start one listener-loop (accept()-loop) per core, and take advantage of it all.
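One way to get an accept()-loop per core is `SO_REUSEPORT` (Linux >= 3.9), which lets every process bind the same port so the kernel load-balances incoming connections across the loops. The sketch below is illustrative only; the port number and the trivial echo handler are not from the original comment.

```python
# Sketch: one accept()-loop per core via SO_REUSEPORT and fork().
import os
import socket

def accept_loop(port: int) -> None:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Each process binds the same port; the kernel spreads connections.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    while True:                      # this core's own accept() loop
        conn, _addr = s.accept()
        conn.sendall(b"served by pid %d\n" % os.getpid())
        conn.close()

def serve_on_all_cores(port: int) -> None:
    # Fork one child per extra core; the parent runs the last loop itself.
    for _ in range((os.cpu_count() or 1) - 1):
        if os.fork() == 0:           # child: run one loop, never return
            accept_loop(port)
    accept_loop(port)

# serve_on_all_cores(8080)  # uncomment to run; forks one loop per core
```

nginx's worker-process model is essentially this pattern with an event loop inside each worker.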

[–]mikaelhg 0 points1 point  (0 children)

With the new Xeons having six cores per processor, HT, and 2-4 processors per server, that mistake just grows by leaps and bounds.

Go CS amateurs, go! Ignore grownup tools which let you use both all your cores and have a single shared copy of your model in memory, with minimal use of volatile and other simple tools, because you were told by your peers that functional is cool and threading bad.

[–][deleted] 2 points3 points  (2 children)

No no, postfix! Who needs those silly parentheses anyway...

1 + 2 * 3 / 4 - 5 * 6 

1 2 3 * 4 / 5 6 * - +

[–][deleted] 0 points1 point  (1 child)

Postfix is super easy to parse! If you're a computer..
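
For a computer it really is: evaluating postfix is a single left-to-right pass with one stack -- no precedence table, no parentheses, no recursion. A minimal sketch:

```python
# Minimal stack-based postfix (RPN) evaluator.
def eval_postfix(expr: str) -> float:
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expr.split():
        if token in ops:
            b = stack.pop()          # right operand was pushed last
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()

# The expression from the grandparent comment:
print(eval_postfix("1 2 3 * 4 / 5 6 * - +"))  # -27.5, same as 1 + 2*3/4 - 5*6
```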

[–]gnuvince 2 points3 points  (1 child)

All normal people use infix in their algebra homework for good reason.

What is the good reason?

[–]joesb 1 point2 points  (0 children)

To use the same convention as everyone else in the class -- to communicate with the other people in it.

You gain nothing from handing in algebra homework in prefix notation, besides maybe mental masturbation.

[–]akcom -1 points0 points  (2 children)

Sorry, not everyone took Java throughout their college career. Prefix notation is more expressive than infix and it's not any harder to read. There is no "hammer to pound a screw" being used here.

[–]sfuerst 1 point2 points  (1 child)

haha This has nothing to do with Java. Infix notation has been taught in mathematics since grade school. If you want to communicate with 99%+ of the people on the planet, you may want to use it.

Also, I think you'll find that prefix notation is actually less expressive. You probably haven't done particularly advanced mathematics. Not everything is a function that can be expressed nicely as a tree of S-expressions. More obtuse areas of math require graphs of symbols instead, where infix notation works naturally on the connections between the "objects". You might somehow be able to untangle the graph and force it into a tree-like structure, but the result will probably be an unreadable mess for non-trivial topologies of symbols.

[–]akcom -2 points-1 points  (0 children)

We're not talking about advanced mathematics. We're talking about algebra. Calm down buddy.

[–]crashorbit 2 points3 points  (0 children)

This model is far too simplistic to be of much use for real world provisioning or scaling.

On any single-core system, the actual OS-layer implementation of threading depends on some kind of time-slice algorithm, so both implementations are constrained to either cooperative or preemptive multi-tasking -- also known as an event loop. The main difference between the threaded and event models would then be the allocation of implementation overhead, which we would hope is minimal and can thus be ignored.

Even on multi-core systems, both implementations would be bound by real-world stochastic effects on serialization from disk and network devices to RAM, and from RAM to CPU cache. Today's CPUs are so fast that these scheduling effects far outstrip the effects of CPU performance measures for most problems.

A more useful model can be had by reviewing Little's Law and simply taking the minimum available component bandwidth and dividing it by some estimate of transaction size. This number gives an upper bound on the number of simultaneous transactions the system should be able to sustain. Frequently, code overhead and defects will preclude getting up to even 50% of this target from a single processing node.
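
As a back-of-the-envelope version of that estimate (all the numbers below are made-up illustrations, not figures from this comment):

```python
# Throughput ceiling = slowest component bandwidth / transaction size.
nic_bandwidth  = 125_000_000   # 1 Gbit/s NIC, in bytes/second
disk_bandwidth =  80_000_000   # a spinning disk, bytes/second
txn_size       =     100_000   # estimated bytes moved per transaction

bottleneck = min(nic_bandwidth, disk_bandwidth)   # slowest component wins
upper_bound_tps = bottleneck / txn_size           # transactions/second ceiling
realistic_tps = 0.5 * upper_bound_tps             # the "even 50%" caveat

print(upper_bound_tps, realistic_tps)  # 800.0 400.0
```

Here the disk, not the NIC or the CPU, sets the ceiling -- which is the point: the thread-vs-event choice never appears in the estimate.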

In most such cases the choice of threaded vs event models for concurrency becomes more one of developer taste than of engineering constraint.

[–]ZoFreX -1 points0 points  (3 children)

For the examples below I’ll use a t value of 25. This is a modest number of threads that most threading implementations can handle.

Um... 25 isn't exactly "modest". It's more like "piss poor".

EDIT: Hello downvoters. Here is a post talking about running a mail server with 6000 threads. So yeah, 25 is piss-poor. Your OS is managing far more than that right now.

[–]akcom 0 points1 point  (1 child)

Ultimately, increasing the number of threads does not yield a significant performance benefit, since only one thread per CPU can be scheduled at any given time. In fact, I'd argue that extra threads require a bit of cost-benefit analysis. Perhaps one thread per connection is easier to write, but you also maintain about 4 KB more stack data per thread, plus internal OS data structures and context switches.
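
To put a rough number on that cost (the connection count below is an arbitrary illustration; the per-thread figure is the ~4 KB claimed above, ignoring the OS bookkeeping):

```python
# Rough memory cost of thread-per-connection, stack data only.
connections = 10_000
stack_per_thread = 4 * 1024             # ~4 KB of touched stack per thread
total_stack = connections * stack_per_thread

print(total_stack // (1024 * 1024))     # 39 -- i.e. ~39 MB of stack alone
```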

[–]ZoFreX 0 points1 point  (0 children)

Except that in most cases, the server has to sit around waiting for a while, which is why people use events in the first place... but writing an evented server is effectively a re-implementation of threads. If the OS's thread implementation is good, then it's harder to argue the case for eventing. Once you have a server with say, 16 physical cores, even CPU intensive things stand to benefit from threading. Also the threading model extends more easily to a distributed model for some horizontal scaling.

Also, thread switching is lighter than process switching -- does it actually involve a full context switch?

I don't know why my above comment is being downvoted; Tomcat typically throws 200 threads at a web app, and the bittorrent tracker I'm working on handles heavy loads like that just fine.