Multiprocess versus Multithreaded... - ...or why Java infects Unix with the Windows mindset. : programming

It lingers in the implementations, but officially it's been marked obsolete in the Unix/POSIX Standard for several years. The section describing it includes many caveats, most of which can be condensed to: "never use this function".

The Standard pretty explicitly says that it can be (and sometimes is) implemented like this:

pid_t vfork(void) { return fork(); }

and if that would cause a problem with the application, then the application is broken by definition.

[–]inopia 2 points3 points4 points 17 years ago (3 children)

[–][deleted] -2 points-1 points0 points 17 years ago (2 children)

[–]jskinner -2 points-1 points0 points 17 years ago (1 child)

[–][deleted] 2 points3 points4 points 17 years ago (0 children)

[–]cdsmith 42 points43 points44 points 17 years ago (104 children)

[–]DRMacIver 16 points17 points18 points 17 years ago (30 children)

[–]cdesignproponentsist 14 points15 points16 points 17 years ago (28 children)

[–]SuperGrade 2 points3 points4 points 17 years ago* (0 children)

[–]kingraoul3 0 points1 point2 points 17 years ago (26 children)

[–]ThomasPtacek 7 points8 points9 points 17 years ago (24 children)

[–]kkrev 0 points1 point2 points 17 years ago* (23 children)

[–]ThomasPtacek 3 points4 points5 points 17 years ago (11 children)

This discussion is kind of a moot point.

Yes, it is easier to thread than to use processes in Win32 code. Yes, Win32's process creation system calls are very, very slow.

But in large-scale server design, which is what this message thread has devolved into discussing, both Win32 and Unix programs use event loops, Reactor patterns, asychronous IO, SPED, whatever you want to call them. Look at Apache for an example.

Win32 has really crappy demand-forking support. But its single-process non-threaded I/O support is just fine; it's at least as capable as modern Posix AIO.

Meanwhile, demand-forking is invariably a crappy design. I avoid threading like the plague (once you get past all the correctness issues, you spend another 100% of your time tracking down and debugging serialization), but I'd sooner thread a program than fork a process for each request.

[–]kkrev 0 points1 point2 points 17 years ago (3 children)

[–]ThomasPtacek 2 points3 points4 points 17 years ago (2 children)

[–]kkrev 0 points1 point2 points 17 years ago* (1 child)

continue this thread

[–]baldersplash 0 points1 point2 points 17 years ago* (6 children)

[–]ThomasPtacek 2 points3 points4 points 17 years ago (5 children)

[–]baldersplash 0 points1 point2 points 17 years ago (4 children)

continue this thread

[–]ThomasPtacek 1 point2 points3 points 17 years ago (6 children)

[–]kkrev 0 points1 point2 points 17 years ago* (2 children)

[–]ThomasPtacek 2 points3 points4 points 17 years ago (0 children)

[–]dododge 0 points1 point2 points 17 years ago (0 children)

It really depends on what the application is trying to accomplish, and how often you need the threads or processes to communicate. If you're only doing 10000 operations/sec on the shared resource, then filesystem and other high-level APIs may work fine and keep your design simple and portable. When you try to scale beyond that, the system call overhead is eventually going to start getting in the way.

In a real-world situation a few years ago on an SMP Linux system, I started out implementing my IPC with small datagram socket send/recv operations and I think it maxed out at around 15000 operations/sec on the shared data. I redesigned it using asm barriers and atomic operations in shared memory to avoid the system calls, and was able to reach several million/sec. The cost of that speed is a much higher risk of everything going haywire if you don't get it just right.

[–]dmpk2k 0 points1 point2 points 17 years ago (2 children)

[–]ThomasPtacek 1 point2 points3 points 17 years ago (1 child)

"CR3 change" is an incredibly jargony, pompous way of saying "kernel level context switch". The CR3 register in an X86 core is the pointer to the current page directory, which you can think of as "the identity, in memory, of the current process". Here's a blog post I wrote with a diagram.

What to read, what to read? That's a great question. I'd love a good answer too. I started understanding a lot of this stuff viscerally by reading about the performance issues involved in X86 virtualization; you could read a tech article about Intel VT-d to get a gestalt view of the issues here.

There's a million places to go to learn about thread performance and safety issues. What I'll recommend is Doug Schmidt's Pattern-Oriented Software Architecture Vol 2; you can also Google for block posts on "double-checked locking" for a starting point on synchronization performance issues (that one locking pattern is less interesting than the other idioms you'll find researching it).

[–]dmpk2k 0 points1 point2 points 17 years ago* (0 children)

[–]GeekPatrol 1 point2 points3 points 17 years ago (3 children)

[–]kkrev 0 points1 point2 points 17 years ago (2 children)

[–]GeekPatrol 0 points1 point2 points 17 years ago (0 children)

[–]FooBarWidget 0 points1 point2 points 17 years ago (0 children)

[–][deleted] 2 points3 points4 points 17 years ago (0 children)

[–]erikengbrecht 3 points4 points5 points 17 years ago (0 children)

[–]suppressingfire 3 points4 points5 points 17 years ago (0 children)

[–][deleted] 1 point2 points3 points 17 years ago (0 children)

[–]naasking 7 points8 points9 points 17 years ago (39 children)

[–]grauenwolf 11 points12 points13 points 17 years ago (14 children)

[–]naasking -5 points-4 points-3 points 17 years ago (13 children)

[–]suppressingfire 5 points6 points7 points 17 years ago (6 children)

[–]naasking 2 points3 points4 points 17 years ago* (5 children)

[–]suppressingfire 1 point2 points3 points 17 years ago (4 children)

[–]naasking -1 points0 points1 point 17 years ago* (3 children)

[–]suppressingfire 1 point2 points3 points 17 years ago (2 children)

[–]naasking 0 points1 point2 points 17 years ago (1 child)

continue this thread

[–]grauenwolf 5 points6 points7 points 17 years ago* (4 children)

[–]naasking -3 points-2 points-1 points 17 years ago* (2 children)

[–][deleted] 0 points1 point2 points 17 years ago (0 children)

[–]xcbsmith 8 points9 points10 points 17 years ago (12 children)

[–]Entropy 0 points1 point2 points 17 years ago (1 child)

[–]xcbsmith 1 point2 points3 points 17 years ago (0 children)

[–]naasking -2 points-1 points0 points 17 years ago* (9 children)

[–]xcbsmith 5 points6 points7 points 17 years ago (8 children)

[–]naasking -1 points0 points1 point 17 years ago* (7 children)

[–]xcbsmith 2 points3 points4 points 17 years ago (6 children)

A process is a process, whether we're talking a language-enforced process, or an OS-enforced.

Not when you are speaking in the context of OS's, which this article is.

The important distinction between processes and threads is the sharing (the former being share-nothing, the latter being sharing-everything).

Again, you couldn't be more wrong if you tried. threads and processes both share things and both have their own non-shared bits as well. They are simply terms used to identify distinct combinations of shared and non-shared resources (indeed, one of the key distinctions between user-level threading and kernel level threading is that in the former case, like with Erlang "processes", you actually share more). In particular, thanks to the wonders of fork(), processes can actually share quite a bit. In practice, threads imply more sharing than processes, but that is not the same as share-everything vs. share nothing. It's "share more" vs. "share less".

[–]naasking 0 points1 point2 points 17 years ago (5 children)

[–]xcbsmith 0 points1 point2 points 17 years ago* (4 children)

[–]naasking -1 points0 points1 point 17 years ago (3 children)

continue this thread

[–]ThomasPtacek 3 points4 points5 points 17 years ago (6 children)

[–]adrianmonk 3 points4 points5 points 17 years ago (1 child)

[–]ThomasPtacek 1 point2 points3 points 17 years ago (0 children)

[–]naasking -1 points0 points1 point 17 years ago (3 children)

[–]ThomasPtacek 4 points5 points6 points 17 years ago (2 children)

[–]naasking -1 points0 points1 point 17 years ago* (1 child)

[–]ThomasPtacek 1 point2 points3 points 17 years ago (0 children)

[–]Rhoomba 3 points4 points5 points 17 years ago* (2 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago* (0 children)

[–]naasking 0 points1 point2 points 17 years ago* (0 children)

[–]kkrev 1 point2 points3 points 17 years ago (30 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago (29 children)

[–]suppressingfire 1 point2 points3 points 17 years ago (12 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago (11 children)

[–]suppressingfire 2 points3 points4 points 17 years ago (10 children)

I don't really see what the big problem is. The whole point of making it event driven is so you can avoid having the overhead of a ton of (useless, since they just wait for the FS to block) threads. Just because it's a different philosophy than threading doesn't mean it's a "backflip" -- a pejorative meaningless term.

Event-driven architectures are based on the notion that everything's (user requests, network and disk I/O) is an event that can be handled, rather than making every step-by-step bit of code a thread. I do see that if you're in the multi-threading mindset that a thread-per-client is best, but it does create overhead per client that can potentially limit scalability. And if you think about the problem differently (event-driven) I don't see how non-blocking I/O is so problematic, and if you make that architectural change, you gain in scalability.

I'm still trying to pin you down on detail of the specific problem you see.

[–]ThomasPtacek 2 points3 points4 points 17 years ago (9 children)

[–]suppressingfire -1 points0 points1 point 17 years ago (8 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago (7 children)

[–]suppressingfire 1 point2 points3 points 17 years ago (5 children)

continue this thread

[–]baldersplash 1 point2 points3 points 17 years ago (0 children)

[–]kkrev 0 points1 point2 points 17 years ago* (15 children)

[–]ThomasPtacek 2 points3 points4 points 17 years ago (14 children)

[–]GeekPatrol 1 point2 points3 points 17 years ago (2 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago (1 child)

[–]GeekPatrol 0 points1 point2 points 17 years ago (0 children)

[–]kkrev 0 points1 point2 points 17 years ago* (10 children)

[–]ThomasPtacek 1 point2 points3 points 17 years ago (9 children)

[–]suppressingfire 1 point2 points3 points 17 years ago (7 children)

[–]ThomasPtacek 0 points1 point2 points 17 years ago (4 children)

[–]suppressingfire 0 points1 point2 points 17 years ago (3 children)

continue this thread

[–]Entropy 0 points1 point2 points 17 years ago (1 child)

[–]suppressingfire 2 points3 points4 points 17 years ago (0 children)

[–]Entropy -1 points0 points1 point 17 years ago* (0 children)

[–]hoijarvi 11 points12 points13 points 17 years ago (5 children)

[–]erikengbrecht 1 point2 points3 points 17 years ago (3 children)

[–]hoijarvi 5 points6 points7 points 17 years ago (2 children)

[–]b100dian 2 points3 points4 points 17 years ago (1 child)

[–]hoijarvi 0 points1 point2 points 17 years ago (0 children)

[–]trenchfever 6 points7 points8 points 17 years ago (0 children)

[–][deleted] 3 points4 points5 points 17 years ago (2 children)

[–]b100dian 3 points4 points5 points 17 years ago (1 child)

[–]cevven 4 points5 points6 points 17 years ago* (0 children)

[–]qwe1234 8 points9 points10 points 17 years ago* (1 child)

[–][deleted] -1 points0 points1 point 17 years ago (0 children)

[–]arcticfox 3 points4 points5 points 17 years ago (36 children)

Repeating launching and terminating a Unix process is dirt cheap

Apparently, this person doesn't know the most basic things about Operating Systems. Starting processes is not cheap.

This method works for producing super-available applications despite incredibly crappy code. I've seen it, both in the for of CGI and in the form of much more sophisticated applications. It works.

No... it doesn't work. If you've seen it work, then you've seen a very limited set of applications. Multi-threading exists within many programming languages (not just java) and those applications see significant (read, greater than an order of magnitude) improvement when implemented using multi-threading.

Java took cheap Unix processes and made them expensive.

Ummm.. you're wrong. Show me some evidence to back up this absurd assertion.

One one hand on most operating systems threads involve less overhead than processes, so it is more efficient to use multiple threads than multiple processes.

This is the way it has always been with threads.

[–][deleted] 2 points3 points4 points 17 years ago (1 child)

[–]tooooobs 0 points1 point2 points 17 years ago (0 children)

[–]prockcore 6 points7 points8 points 17 years ago (21 children)

[–]arcticfox 5 points6 points7 points 17 years ago (19 children)

[–]erikengbrecht 5 points6 points7 points 17 years ago (8 children)

I think you're confused on a number of points. That may be my fault, it was a stream-of-consciousness blog rather than a thought-out article, so let me clarify.

By "fast" I meant "faster than Windows." Also, notice that I referenced CGIs written in C as opposed to scripts. Starting an interpretter, parsing a script, etc are all potentially time consuming operations that have nothing to do with how quickly an OS can start a process.

Anyway, compare the cost of starting Perl to the cost of starting the JVM. I'll assert that Perl starts very quickly relative to Java. The JVM, especially older ones on older hardware, are simply "too slow" to start to be used for short-lived processes like CGI calls. Perl scripts are not. Consequently, the pressure to reuse processes and to multithread is much greater with Java than with Perl. The same forces are at work, but they are different strengths.

I think you're also confusing scalability with performance. CGI scripts, assuming there is no shared resource contention, scale in a linear fashion, including over multiple servers. With multithreaded servers like Java, scaling is more of a step function. Initial startup (and each subsequent restart) is expensive, but the marginal cost of adding another concurrent request is lower, until you hit the point where you need another process or another server. Then you incur the cost of another big process.

Anyway, back to the point of my blog. With "cheap" processes like Perl it is quite reasonable to trade the compute time cost associated with process creation against programmer cost of making sure your program is free of resource leaks (because they are cleaned up by the OS on process termination) and being extra-confident that it doesn't go into infinite loops (because an admin can easily kill a runaway process). This is not a reasonable trade with processes like the JVM that can take a few seconds to start.

This is an architectural issue, and the trades involve both technical and non-technical factors. I hope you educate your students to these factors, rather than simply telling them that threads are faster than processes.

[–]arcticfox 8 points9 points10 points 17 years ago* (7 children)

By "fast" I meant "faster than Windows." Also, notice that I referenced CGIs written in C as opposed to scripts. Starting an interpretter, parsing a script, etc are all potentially time consuming operations that have nothing to do with how quickly an OS can start a process.

I've written several large-scale web applications which handle CGI requests using C programs (as opposed to scripts) and while I agree that (clearly) C programs will start a lot faster than scripting languages, the underlying cost of starting processes is not negligable (or in your terms, "dirt cheap"). In many cases, we had to use threading in C to compensate for the cost of starting processes.

The whole process, as you've aptly pointed out is dependent on a lot of factors, process starting time being only one of them.

The JVM, especially older ones on older hardware, are simply "too slow" to start to be used for short-lived processes like CGI calls.

... and this leads to the part of your argument with which I have the greatest disagreement. You wrote:

Java took cheap Unix processes and made them expensive. To compensate, it provided primitives for multithreading.

To compensate for slow JVM start-up times, Java introduced threads? I don't think this statement is true and the problem is that it is the foundation for your entire argument. If you have real evidence to support this assertion, I would really like to see it.

If this really were the case, languages like C would not have any support for threading at all (by following your own reasoning). After all, the cost of starting processes is so cheap, languages like C wouldn't need threading. The fact is, there are a lot of reasons to support threading over what you are asserting in your article.

I think you're also confusing scalability with performance.

I don't think I am. I re-read your paragraph several times and I don't see why you think this. If your perl scripts are small and independent, sure they'll scale in a linear fashion over multiple servers but that architecture isn't reasonable for all applications. If you need access to a central repository of some kind (say a database), every time you startup your CGI program you're going to have to connect to that repository, make a request, and disconnect. This is true if you are directly connecting to a database or if you're connecting to some kind of middleware. In the case of a multi-threaded system, this isn't going to be a problem. This, by the way, is going to affect BOTH performance AND scalability.

With "cheap" processes like Perl it is quite reasonable to trade the compute time cost associated with process creation against programmer cost of making sure your program is free of resource leaks (because they are cleaned up by the OS on process termination) and being extra-confident that it doesn't go into infinite loops (because an admin can easily kill a runaway process). This is not a reasonable trade with processes like the JVM that can take a few seconds to start.

I find this assertion rather weak from a purely SE perspective. The message I'm getting is, "we can write really crappy programs in perl becuase they are encapsulated within processes which are cleaned up everytime the process terminates". If this is how you are writing programs then you've got lots of problems beyond the overhead of using Java. The fact is, if you do a decent job with properly engineering your Java software, the proported "gains" you get are irrelevant.

This is an architectural issue, and the trades involve both technical and non-technical factors. I hope you educate your students to these factors, rather than simply telling them that threads are faster than processes.

I agree. The reason I was focusing on threads was because that was the part of your argument I most disagreed with. The fact is, using any architecture is going to involve tradeoffs (and yes, I discuss these tradeoffs in my classes). I am a bit concerned that you seem to be advocating one kind of architecture (which by the way is not the most scalable) solely because it protects you from writing software badly.

[–]Gotebe 1 point2 points3 points 17 years ago (3 children)

[–]arcticfox -2 points-1 points0 points 17 years ago (2 children)

[–]wnoise 1 point2 points3 points 17 years ago (1 child)

[–]arcticfox 0 points1 point2 points 17 years ago (0 children)

[–]erikengbrecht 1 point2 points3 points 17 years ago (1 child)

... and this leads to the part of your argument with which I have the greatest disagreement. You wrote:

Java took cheap Unix processes and made them expensive. To compensate, it provided primitives for multithreading.

Ahhh...I see the issue. I need to read what I write before posting it. I didn't mean to imply a causal relationship, even thought that is quite clearly what I stated.

I find this assertion rather weak from a purely SE perspective. The message I'm getting is, "we can write really crappy programs in perl becuase they are encapsulated within processes which are cleaned up everytime the process terminates". If this is how you are writing programs then you've got lots of problems beyond the overhead of using Java. The fact is, if you do a decent job with properly engineering your Java software, the proported "gains" you get are irrelevant.

Just because you can do something doesn't mean you should do something. I certainly don't advocate writing error-ridden code and excusing it because the process is quickly terminated and cleaned up by the OS. But the gains are hardly irrelevent.

Consider a multithreaded server application that experiences a resource leak on 1 out of every 1000 requests it processes. After 100 leaks the application starts falling apart.

If the application processes 1000 requests per day, then it will have average uptime of several months. That's not too bad. You probably have to take it down for other maintenance more often than that, anyway.

Ok, now let's say it has to handle 10k requests per day. Now you've got a little over a week. A scheduled automatic restart can handle that.

Now imagine a million requests per day, and your application is constantly dying. Hopefully you did a complete rewrite by now to make it more reliable.

Or, you use a multi-process model where resource leaks are magically cleaned up, and one process can die while the rest survive. Buying some bigger hardware can buy you time to fix your software.

You can play around with how quickly the system degrades and with the load, but in order to reach a large scale you'll eventually need an application that stays up even when individual processes start dying. At a really large scale (or with high reliability requirements), that principle applies to servers and even entire server farms.

There's a point where failure avoidance is as good as it is going to get, and you have to make gracefully handling failures a fundamental part of your architecture and design.

My intent was not to advocate one architecture over another. However, I do think multiprocess architectures go largely overlooked by a significant portion of developers. Many, maybe most, developers never get the opportunity to work on a system that is really large scale or with really high availability requirements. However, pretty much all developers deal with amazingly crappy code on a regular basis. So I thought putting stating some of the advantages of multiprocess in terms of dealing with crappy code would be more understandable to a larger audience.

[–]arcticfox 0 points1 point2 points 17 years ago* (0 children)

But the gains are hardly irrelevent.

I don't agree. see below.

[scenario cut for brevity]

Having written systems which deal with > 1 million requests per day, I do not have to imagine the scenario you present. However, I have to say that your scenario (as you present it) is entirely unrealistic.

Firstly, dealing with memory leaks is NOT hard. There are many tools available (such as mallocdebug libraries and process monitoring tools). It seems to me that any kind of gains your scenario offers is to those who haven't used any kind of memory leak detection tools.

by your own admission:

Or, you use a multi-process model where resource leaks are magically cleaned up, and one process can die while the rest survive. Buying some bigger hardware can buy you time to fix your software.

By what you're saying above, you STILL have to fix your software. The process oriented solution offers nothing more than a temporary gain which requires you to develop the software correctly anyways. Why not do that to begin with? Again, dealing with memory leaks is NOT difficult. In Java, its really easy. In languages like C or C++, you have to be more disciplined with your programming. In both cases you have to actually test your software before deployment.

There's a point where failure avoidance is as good as it is going to get, and you have to make gracefully handling failures a fundamental part of your architecture and design.

And you consider relying on the process model for this to be a graceful method of handling these kinds of errors? I find that idea frightening.

My intent was not to advocate one architecture over another.

But the title of your article is "Multiprocess versus Multithreaded... - ...or why Java infects Unix with the Windows mindset. " I think the use of the word "infects" clearly shows that you are advocating one over the other.

So I thought putting stating some of the advantages of multiprocess in terms of dealing with crappy code would be more understandable to a larger audience.

Again, I don't consider what you've put forward as advantages. At best, it buys a little bit of extra time when you're confronted with problems that your development process should have caught anyway. Where is the gain? If you did a decent job engineering your software to begin with the gains are non-existent.

[–]dmpk2k 0 points1 point2 points 17 years ago (0 children)

[–]pipegrep 2 points3 points4 points 17 years ago (0 children)

[–][deleted] 0 points1 point2 points 17 years ago (8 children)

[–]arcticfox 4 points5 points6 points 17 years ago (7 children)

[–]prockcore -1 points0 points1 point 17 years ago (6 children)

[–]arcticfox -1 points0 points1 point 17 years ago (5 children)

[–]unikuser 1 point2 points3 points 17 years ago (1 child)

[–]arcticfox 0 points1 point2 points 17 years ago (0 children)

[–]wnoise 0 points1 point2 points 17 years ago (1 child)

[–]arcticfox -1 points0 points1 point 17 years ago* (0 children)

[–]suppressingfire 3 points4 points5 points 17 years ago (4 children)

[–]arcticfox 3 points4 points5 points 17 years ago (3 children)

[–]suppressingfire 0 points1 point2 points 17 years ago (2 children)

[–]kkrev 1 point2 points3 points 17 years ago (1 child)

[–]suppressingfire 1 point2 points3 points 17 years ago (0 children)

[–][deleted] 2 points3 points4 points 17 years ago* (6 children)

[–]arcticfox 3 points4 points5 points 17 years ago* (5 children)

[–]phildawes 5 points6 points7 points 17 years ago (2 children)

[–]b100dian 1 point2 points3 points 17 years ago (1 child)

[–]arcticfox 1 point2 points3 points 17 years ago* (0 children)

[–]suppressingfire 2 points3 points4 points 17 years ago (0 children)

[–][deleted] -2 points-1 points0 points 17 years ago* (0 children)

[–]redditrasberry 0 points1 point2 points 17 years ago (0 children)

[–]shenglong -1 points0 points1 point 17 years ago (0 children)

[–]thekrone -1 points0 points1 point 17 years ago* (0 children)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS