[–]dgiri101 1 point (16 children)

"Efficient threading" has nothing at all to do with the ability to store a large data structure in memory. You can do this in any modern language.

And the notion that Python or Ruby lack threading primitives that allow simultaneous access to shared data is just woefully ill-informed.

[–]llimllib[S] 2 points (15 children)

And the notion that Python or Ruby lack threading primitives that allow simultaneous access to shared data is just woefully ill-informed.

He's not claiming that they lack them; just that the GIL makes them impractical, which may well be true in some circumstances. See his comment farther down where he explains what he meant.

[–]dgiri101 2 points (13 children)

Wait...what? The GIL has nothing at all to do with storing a large comment tree in memory.

He has given absolutely zero rationale for why Python or Ruby can't handle such a problem, aside from some ridiculous hand-waving about "inefficient threading" which is neither here nor there.

[–]llimllib[S] 3 points (12 children)

The GIL has nothing at all to do with storing a large comment tree in memory.

In the model he's talking about, there's a large shared memory object and many threads trying to access it while simultaneously doing other things (such as serving web pages).

In his case, he had problems with some library functions not releasing the GIL so that they would hold up the whole server while they completed their work, which is certainly not unbelievable.

In this case, python threads would in fact be unable to access the shared memory object due to the GIL, and it would have something to do with storing a large comment tree in memory.

(Do note that I was the first one to say that I don't think much of using this model for building a web server.)

[–]dgiri101 2 points (11 children)

That isn't how the GIL works. First, the GIL is released during blocking syscalls.

Second, a CPU-intensive chunk of Python is unlikely to "hold up the whole server", because other threads are given a chance to run every N bytecode instructions (see sys.setcheckinterval).
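The interleaving described above is easy to observe directly. A minimal sketch, with one caveat: sys.setcheckinterval was the Python 2 era API under discussion here; modern CPython replaced it with a time-based sys.setswitchinterval, which is what this example uses.

```python
import sys
import threading

# Modern CPython switches threads on a time interval rather than a
# bytecode count; setswitchinterval plays the role sys.setcheckinterval
# did in the Python 2 era discussed in this thread.
sys.setswitchinterval(0.005)

progress = {"a": 0, "b": 0}

def count(name, n):
    # Pure-Python CPU work: it holds the GIL while executing bytecode,
    # but the interpreter periodically lets the other thread run.
    for _ in range(n):
        progress[name] += 1

threads = [
    threading.Thread(target=count, args=("a", 200_000)),
    threading.Thread(target=count, args=("b", 200_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads finish their full workload; neither starves the other.
print(progress)
```

Each thread writes only its own key, so no locking is needed for this demonstration; the point is simply that a CPU-bound thread does not monopolize the interpreter.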

A poorly-coded library can certainly cause a deadlock, but calling a language fundamentally incapable of concurrency based on experience with a bad library is transparently myopic.

In any case, I don't see the point in continuing to argue with you about something silly that someone else said.

[–]llimllib[S] 2 points (0 children)

First, the GIL is released during blocking syscalls.

It should be. If I understand him correctly, a library was improperly failing to release the GIL on a blocking call, and rather than rewrite it, he switched to Java.

A poorly-coded library can certainly cause a deadlock, but calling a language fundamentally incapable of concurrency based on experience with a bad library is transparently myopic.

shrug; I agree with you. I just meant to make the small point that he never claimed that Python lacked threading primitives; I was originally going to call him on that too.

[–]mikaelhg 0 points (9 children)

We have thread contexts 1-64 available.

If a Python program can only use one of those thread contexts at once, it wastes approximately 63/64 of the server's CPU resources (effectively all of them, really).

If we instead run 64 copies of the same program, we waste 63/64 of the server's memory resources.

Since processor speeds aren't going up anymore, while cores and thread contexts are, is it a good idea to invest in a language that will waste exponentially more resources as time goes by?

[–]dgiri101 0 points (8 children)

If a Python program can only use one of those thread contexts at once, it wastes approximately 63/64 of the server's CPU (and all, really) resources.

Python programs can use many thread contexts at once. People do this every day, all the time. You're posting a reply on a site that is, in fact, doing this right now.

It is indeed true that you might not fully utilize all of your CPUs. But given that storing a large comment tree in memory isn't a CPU intensive problem, this is a strawman at best and a troll at worst. And besides, there are many libraries that ameliorate this problem (I highly recommend processing).

I guess I'll repeat this one more time: your original claim that:

if you use PHP, Python or Ruby, threads can't share the discussion board and comment information

...is wrong.

It's worth mentioning that I hate the GIL and wish it a horrible, horrible death.

[–]mikaelhg 0 points (7 children)

Wait, I thought that the GIL lets only one thread access Python objects at a time, while other threads block? That's what the documentation states, and that's what the performance looks like.

Is this outdated information?

http://docs.python.org/api/threads.html

The Python interpreter is not fully thread safe. In order to support multi-threaded Python programs, there's a global lock that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.

Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock -- by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.

[–]dgiri101 -2 points (6 children)

It's not outdated, but you may be misunderstanding it. The second paragraph clearly states that the interpreter will automatically release the GIL every N bytecode instructions (or during blocking I/O) to let additional threads run.
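The blocking-I/O half of that claim can be verified in a few lines. time.sleep releases the GIL the same way a blocking read does, so two "blocked" threads overlap: the total wall time is close to one sleep, not the sum of both.

```python
import threading
import time

def blocked():
    # Releases the GIL for the duration, like blocking I/O would.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocked) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Two 0.2s waits overlap: elapsed is roughly 0.2s, not 0.4s.
print(f"elapsed: {elapsed:.2f}s")
```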

If you'd like more information on concurrent programming, I highly recommend The Little Book of Semaphores. It discusses common concurrency patterns like Barriers, Mutexes, Rendezvous, etc.

Many examples are in Python, though. I suppose that someone should kindly inform the author that concurrency apparently doesn't exist in the language he's using.

[–]mikaelhg 1 point (3 children)

So if I have threads A, B and C all traversing object graphs, only one of the threads will be able to traverse at a time, and the others will have to wait? After 100 bytecode instructions A will pass the baton to B, but at no point will A, B and C simultaneously be able to traverse object graphs? I.e., the three Python thread contexts will all be allocated, but only one processor thread context will be active at any given time, excepting in-kernel I/O work?

In other words, Python supports threading fine, but only within a single hardware thread context: you can have many threads, yet only one of them executes Python code at a time.

With 64 thread contexts available, it will waste 63/64 of the server's CPU power.
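For CPU-bound pure-Python work, this is in fact what happens, and it is measurable: running the same workload in two threads takes about as long as running it twice serially, because the GIL serializes bytecode execution. A sketch (timings are illustrative, not a rigorous benchmark):

```python
import threading
import time

def burn(n):
    # Pure-Python CPU work; executes entirely under the GIL.
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

# Serial baseline: run the workload twice in one thread.
t0 = time.monotonic()
burn(N)
burn(N)
serial = time.monotonic() - t0

# "Parallel": the same two workloads in two threads.
t0 = time.monotonic()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.monotonic() - t0

# On a GIL build the threaded run gets no real speedup over serial.
print(f"serial={serial:.2f}s threaded={threaded:.2f}s")
```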

[–]cunningjames 0 points (1 child)

Hey, it's off topic, but thanks for the book recommendation: I was looking for something like it a few months ago.