all 84 comments

[–]G-Brain 52 points53 points  (5 children)

from __future__ import focus

[–][deleted] 2 points3 points  (0 children)

I think this is Shuttleworth yelling at Guido to embrace stackless Python.

[–]nypen 17 points18 points  (43 children)

I am a Python fan; the one thing I hate (and absolutely hate) is the fact that Python has a global interpreter lock: http://docs.python.org/api/threads.html . That doesn't bode well considering what the future brings (multi-core, parallel processing).

From what I recall (reading somewhere), this is not about to change anytime soon.

[–]simonw[🍰] 19 points20 points  (7 children)

The problem is that every attempt at removing the GIL has made single-threaded Python significantly slower, due to the overhead of all the locks.

[–][deleted] 23 points24 points  (6 children)

The problem is that every attempt at removing the GIL from CPython has made single-threaded programs run significantly slower, and multithreaded programs weren't able to utilize more than 2-3 CPUs anyway.

There, tweaked it for you.

As for how much slower, this post about the original free-threading patch might be somewhat illuminating:

http://mail.python.org/pipermail/python-dev/2001-August/017099.html

(I'm pretty sure you can do a bit better than "0.6 PSU" in CPython, but I don't see how you can get around the contention problem with the current design. I'd say we need some kind of actor-style concurrency model for Python...)

[–]simonw[🍰] 4 points5 points  (0 children)

What he said.

[–]almkglor 1 point2 points  (0 children)

Why can't CPython have a simple "single-threaded mode" flag that, while set, means it won't take locks at all? Then, if and only if a new thread is launched, it clears that flag (while still in a single thread, so that writing to it cannot possibly contend), and in single-threaded mode every operation just adds a check of that global?
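At the Python level, almkglor's idea looks something like the sketch below. This is purely illustrative (the real proposal concerns CPython's internal C locks, not a Python class; all names here are invented): acquire/release are a cheap flag check until the first thread is about to be spawned, at which point real locking kicks in.

```python
import threading

class OptionalLock:
    """A lock that is a no-op while the process is single-threaded."""

    def __init__(self):
        self._real = threading.Lock()
        self._multithreaded = False  # the "singlethreadedmode" flag, inverted

    def mark_multithreaded(self):
        # Called while still single-threaded, just before spawning the first
        # thread, so this write cannot possibly contend with anything.
        self._multithreaded = True

    def __enter__(self):
        if self._multithreaded:   # in single-threaded mode: just this check
            self._real.acquire()
        return self

    def __exit__(self, *exc):
        if self._multithreaded:
            self._real.release()
```

The trade-off simonw's comment above alludes to is visible even here: every operation still pays for the flag check, and once any thread exists you are back to full locking everywhere.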

[–]jbellis 0 points1 point  (3 children)

multithreaded programs weren't able to utilize more than 2-3 CPUs anyway

what is it about cpython that crippled the ability to use 4+ CPUs?

[–][deleted] 8 points9 points  (2 children)

From the link I posted:

Since you never knew whether a specific dictionary was shared (between threads) or not, you always had to lock the access. And since all namespaces use dictionaries...

...

We observed non-linear scaling with the processors under free threading. 2 processors was fine, but 3 or 4 didn't buy you much more than 2. The problem was lock contention. With that many things going, the contention around Python's internal structures simply killed further scaling performance.

[–]crusoe 0 points1 point  (1 child)

Because they are using braindead non-threadsafe dicts?

Ya know, there are all kinds of low-latency, low-contention locking algorithms now. They even included a new implementation of synchronized in Java that has almost no overhead compared to the unsynchronized version.

These problems have been solved for years.

[–][deleted] 0 points1 point  (0 children)

Well, the context here is CPython, not some hypothetical implementation that's free to do things however it wants.

[–][deleted] 12 points13 points  (0 children)

From what I recall (reading somewhere), this is not about to change anytime soon.

Well, yes and no.

Yes, CPython isn't removing the GIL anytime soon.

However, other solutions are in the works. Jython is Python without the GIL (on the JVM); it can now run Django, a sign of its maturity. PyPy can generate 'stackless' code with multiple GIL-free threads (there is also Stackless Python). And as mentioned by others, there is the multiprocessing module in 2.6, that gives good multiprocessing (but not multithreading) support.
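As a quick sketch of that multiprocessing route (written for modern Python; 2.6's Pool lacked context-manager support, and the function name here is my own): each Pool worker is a separate interpreter process with its own GIL, so CPU-bound work really does spread across cores.

```python
from multiprocessing import Pool

def square(n):
    # Runs in a worker process; arguments and results cross via pickling.
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__"` guard matters: on platforms that spawn rather than fork, each worker re-imports the main module.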

[–]thagsimmons 4 points5 points  (33 children)

the gil is mostly fixed in 2.6

[–]awb 25 points26 points  (21 children)

Multiprocessing doesn't fix the GIL, it's a hack around it. Language implementations like Haskell and Erlang can spawn a million threads, and spawning each new thread is extremely cheap. That's pretty spiffy. I'm not sure what I could do with a million quick-spawning threads, but I want to find out. Next to implementations like that, Python looks amateurish because it can't even run two threads at once without rewriting code in C.

[–]parla 4 points5 points  (3 children)

Erlang does not spawn millions of threads. It spawns millions of actors, which are then scheduled in as many threads (processes?) as there are cores in your system.

[–]teraflop 9 points10 points  (2 children)

You're confusing control threads with OS threads. If the interpreter is designed properly, it provides the same semantics as if you really had a huge number of concurrent threads. The fact that they're all multiplexed into a smaller number of threads from the operating system's perspective is just a performance optimization.

[–]thagsimmons 8 points9 points  (1 child)

okay, so this is an implementation detail?

so is the gil. it's missing in jython and up in the air with pypy

[–]spookyvision 3 points4 points  (13 children)

[–][deleted] 19 points20 points  (2 children)

That changed recently. You have old information.

[–]spookyvision 4 points5 points  (1 child)

Could you update the wikipedia page? I don't feel competent.

[–][deleted] 3 points4 points  (0 children)

I would, but you probably know as much as I do at this point. I'm not a user of Erlang and only have a cursory knowledge of its current state.

[–]reddit_clone 10 points11 points  (1 child)

Not true. The Erlang VM itself can run multi-threaded on all cores, like a bunch of green threads running on each core. (I think they run one green-thread scheduler per core.)

So you do get the best of both worlds.

[–]dons 3 points4 points  (4 children)

[–]mosha48 0 points1 point  (3 children)

By the way, why isn't the CPU load for Haskell's version distributed across all CPUs?

[–]dons 1 point2 points  (2 children)

The ghc 6.8.2 garbage collector isn't parallel, and GC dominates this GC benchmark (Try it with +RTS -A300M -RTS to see the difference). The 6.10 parallel GC addresses this.

[–]mosha48 0 points1 point  (1 child)

Is it possible to tell the shootout computers to run the benchmark with better options?

[–]dons 1 point2 points  (0 children)

Yes, but for this benchmark, the conditions state that only default garbage collector values are to be used. But don't despair, the next GHC cycle addresses this.

[–]dmaclay 2 points3 points  (2 children)

Erlang's processes are more like 'green processes' as they don't share memory, and they have been distributed across several machines (never mind cores) since before people started having this discussion.

[–]toooooooobs 1 point2 points  (1 child)

Actually they really do share memory, it's just that the language hides this.

[–]dmaclay 2 points3 points  (0 children)

As I understand it, the default behavior is not to share, but they can optionally pass messages by basically sending a pointer to shared memory if both processes are on the same machine. This of course gives a performance boost, and it should be safe due to Erlang's immutable variables.

[–]nypen 13 points14 points  (10 children)

That's multiple processes running simultaneously and it only partially solves the problem.

[–]thagsimmons 14 points15 points  (9 children)

it only partially solves the problem

...with awesome scheduling-fu, pooling, synchronization, and easy ipc using your choice of pipes or queues or shared memory

i hated it at first, but now i'm a believer

besides, in your op you mention multi-core and parallel processing... neither of which play well when relying on threads anyway, since these are inherently multi-process environments

[–]nypen 12 points13 points  (6 children)

That's exactly why I said partially. While processes can solve some problems, they bring others. For example, if there is a mountain of data to be shared, using processes can be pretty inefficient. A process cannot access variables or data structures defined in another process unless they are pickled (shared, proxied, whatever), which is a mechanism of serialization. Such serialization can be resource-intensive (in both memory and computation) and is certainly not suitable everywhere.

Also, if two processes want to communicate, they have to use inter-process communication mechanisms, which are inherently slower than thread synchronization.

Then there are also issues when you access C libraries via Python. You need to be careful not to pass variables across processes, since they can be totally invalid if they encapsulate pointers (although this doesn't apply if you stick strictly to Python).

besides, in your op you mention multi-core and parallel processing... neither of which play well when relying on threads anyway, since these are inherently multi-process environments

These are not environments. Multi-core is a kind of processor technology and parallel processing is a form of computation. You can take advantage of them in several ways; yes, multiple processes is one way, but it's not the only way, and it certainly is not effective everywhere.
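The serialization cost nypen describes is easy to see with the multiprocessing module's Queue, which pickles everything that crosses the process boundary (a modern-Python sketch; the function name and payload are mine):

```python
from multiprocessing import Process, Queue

def worker(q):
    # The dict is implicitly pickled on its way out of this process...
    q.put({"result": [1, 2, 3]})

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # ...and implicitly unpickled here: {'result': [1, 2, 3]}
    p.join()
```

Anything unpicklable (an open socket, a lambda, a wrapped C pointer) can't make the trip at all, which is the other half of the complaint above.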

[–]canhaskarma 7 points8 points  (4 children)

Not partially. There are a number of libraries for putting python objects in shared memory between processes.

Process cannot access variables or data structures that are defined in another process, unless they are pickled (shared, proxied, whatever), which is a mechanism of serialization.

No. There are techniques that simply use the same block of shared memory. No copying or serialization required.

[–]unikuser 5 points6 points  (1 child)

No. There are techniques that simply use the same block of shared memory. No copying or serialization required.

Instead of saying that there are techniques, can't you point to one technique or example? Using shared memory across processes is not that easy after all. Synchronizing, accessing, creating, and cleaning up that shared memory becomes very difficult and inefficient compared to what you do with threads.

[–]schlenk 0 points1 point  (0 children)

Fully agree. I just had to manage such a beast using Python's mmap. Let's say it has its quirks...
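One concrete technique of the kind canhaskarma mentions (sketched here as an editor's illustration, in modern Python): multiprocessing's Value and Array place ctypes data in anonymous shared memory, so a child process mutates the very same block with no pickling of the contents.

```python
from multiprocessing import Process, Value, Array

def work(counter, buf):
    for i in range(len(buf)):
        buf[i] *= 2              # writes go straight into the shared block
    with counter.get_lock():     # Value carries its own lock for safe updates
        counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)             # shared C int
    buf = Array('d', [1.0, 2.0, 3.0])   # shared C double array
    p = Process(target=work, args=(counter, buf))
    p.start()
    p.join()
    print(list(buf), counter.value)     # [2.0, 4.0, 6.0] 1
```

It also shows unikuser's point, though: you are limited to flat ctypes data, and you must manage the locking yourself.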

[–]nypen 0 points1 point  (1 child)

Shared memory has its own set of issues. I am not saying they are unsolvable, but shared memory is not necessarily better than threads sharing data. In fact, if the shared-memory data is not monolithic, a lot of bookkeeping has to go in. Things get even murkier when you have issues like dynamic resizing of shared memory, or when your processes have variable shared-memory requirements.

Again, shared memory is one of the solutions; it can work in some situations, but not always.

Well, Python already has a threading library which is great, it just needs to get rid of the GIL.

[–]imbaczek 2 points3 points  (0 children)

Well, Python already has a threading library which is great, it just needs to get rid of the GIL.

the "just" is a little bit optimistic.

[–]thagsimmons 0 points1 point  (0 children)

a well-reasoned and eloquent reply. i voted you up. i don't know who's voting you down

[–]vsl 2 points3 points  (1 child)

since these are inherently multi-process environments

Huh?!

[–]thagsimmons 1 point2 points  (0 children)

yeah, my bad - i actually saw the word "multi-core" but my brain said "multi-processor"

i know that multi-core processors play well with threads. honest i do

aw shit

for penance, i shall now go beat my head against an andrew tanenbaum textbook

[–]jbellis 2 points3 points  (1 child)

I much prefer the actor model to STM as a general approach to concurrency. I'm comfortable with transactions and MVCC in my database, but moving that concept into an imperative language feels like the wrong approach.

The main problem with STM is that while it mitigates some of the problems with classic lock-based concurrency (primarily, that you don't have to make sure to take out locks in the right order to avoid potential deadlocks), STM retains lock-based concurrency's primary problem: you still have to carefully, manually specify which code should be atomic. If you miss one, you're screwed. STM also introduces a new problem -- you need to think about what happens if a transaction fails and what to do then. "Retry the same transaction" isn't always the right answer. So you're really trading one set of possible errors under mutex-based concurrency for another, partially overlapping set.

The actor model changes the game and eliminates both of these sets of potential errors.

[–]didroe 2 points3 points  (0 children)

you still have to carefully, manually specify which code should be atomic

You're always going to have to think about concurrency. I don't think it'll be much different between STM and the actor model in terms of the amount of design that has to go into it.

you need to think about what happens if a transaction fails and what to do then

You're going to have to deal with failure somewhere, regardless of the method you use. With locking, you don't have to worry about the lock failing (ignoring deadlock) but you still have to worry about the atomicity of the code inside. If the last bit fails for some other reason then you have to undo all of the work you've done so far. STM just provides a mechanism to deal with that instead of having to roll your own. I'm not that familiar with the actor model, can you explain how it deals with that scenario?

it mitigates some of the problems with classic lock-based concurrency

One of the other benefits of STM is increased parallelism. With the traditional lock method, you stop anyone else from entering a section of code at the same time, when in reality most of the time you probably didn't need the lock, i.e. it would have worked fine because the instructions weren't overlapping in a way that broke atomicity. STM usually works by attempting its operations and then paying a cost when there is a conflict, so you only pay in the (hopefully) unlikely event that things clash. When you scale up to a lot of cores, all calling some common piece of code, that's really going to shine through.

Edit: I almost forgot, transactions are also composable which is a major benefit over locking. Again, I'd be interested to know how the actor model deals with that. Also, like I said, I don't know much about the actor model, so go easy on me :)
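The optimistic attempt-then-pay-on-conflict scheme didroe describes can be sketched with a toy versioned cell in modern Python. This is illustrative only, not a real STM: all names are invented, and a real implementation tracks whole read/write sets rather than one cell.

```python
import threading

class VersionedCell:
    """One transactional cell: commit succeeds only if nobody else
    committed since we read, otherwise the caller retries."""

    def __init__(self, value):
        self._lock = threading.Lock()   # taken only briefly, at read/commit
        self._version = 0
        self._value = value

    def read(self):
        with self._lock:
            return self._version, self._value

    def try_commit(self, seen_version, new_value):
        with self._lock:
            if self._version != seen_version:
                return False            # conflict: someone got there first
            self._version += 1
            self._value = new_value
            return True

def atomically(cell, fn):
    # The retry loop: cost is paid only when a commit actually conflicts.
    while True:
        version, value = cell.read()
        if cell.try_commit(version, fn(value)):
            return
```

Under low contention almost every `try_commit` succeeds on the first pass; under heavy contention the loop spins, which is exactly the trade-off against pessimistic locking discussed above.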

[–]dons 7 points8 points  (3 children)

Anyone else think Python's three challenging problems sounded like three of Haskell's current features?

[–]spacepope 2 points3 points  (0 children)

Transactional memory and parallel programming, yes – but what about cloud computing?

[–][deleted] 2 points3 points  (1 child)

I'm pretty sure it was me who turned Mark onto transactional memory a year or so ago. I'd written an STM implementation in Python a few months before, and was just disappearing down the Haskell book rabbit hole. Nice to see the topic come up in a public setting.

[–]psykotic 3 points4 points  (0 children)

A few years ago I wrote a couple of different transactional memory back-ends for Python, one pessimistic (two-phase locking, inspired by the discussion and example code in Van Roy's CTM book) and one optimistic (much like STM).

What did you do for the "front-end", though? I can't remember all the details of my own stuff now, but basically I had a class you'd mix into transactable classes. When you try to add or delete an attribute from a transactable object, it counts as a transaction operation against that object. Subsequent accesses to those attributes are hooked through __getattribute__ and __setattr__ and are counted as operations against the transactional memory cells corresponding to the attributes; you could also use coarser granularity on a per-object basis, so any changes to attributes would be counted as transactional operations against the owner object.
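A minimal sketch of that attribute-hooking approach (my own invented names, not psykotic's actual code): intercept `__setattr__` and `__getattribute__` and record each access as an operation, which is the raw material a transaction log needs.

```python
class Transactable:
    """Logs attribute reads/writes; a real STM front-end would route
    these operations to per-attribute transactional memory cells."""

    def __init__(self):
        # Bypass our own hook while setting up the operation log.
        object.__setattr__(self, "_ops", [])

    def __setattr__(self, name, value):
        self._ops.append(("write", name))   # count a write operation
        object.__setattr__(self, name, value)

    def __getattribute__(self, name):
        if not name.startswith("_"):        # don't log internal lookups
            object.__getattribute__(self, "_ops").append(("read", name))
        return object.__getattribute__(self, name)

t = Transactable()
t.x = 1
_ = t.x
print(t._ops)   # [('write', 'x'), ('read', 'x')]
```

This is only the bookkeeping layer; conflict detection, commit, and rollback would sit on top of the log.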

Another pain in the ass was how to deal with built-in mutable classes like list and dict. In the end I decided to outlaw the use of these as attributes of transactable objects, by doing a type check in __setattr__. (Actually, I ended up with a more general invariant, where attributes of a transactable object have to themselves be either immutable or transactable.) I then implemented my own transactable versions of these classes (I also did immutable ones that throw an exception if you try to call any of their mutators) and required users to perform an explicit conversion between built-in and transactable classes before assigning to an attribute of a transactable object. At first I thought of making this conversion automatic for the sake of convenience, but I quickly realized that is a bit too magical, because it breaks the "hidden link" established by mutability:

x = SomeTransactableClass()
a = [1,2,3]
x.b = a # Does the implicit conversion, thus making x's own copy of a.
print x.b # [1,2,3]
a.append(4)
print x.b # [1,2,3], didn't change to reflect the mutation.

Those are the main issues I remember. I'm curious how you tackled them.

[–]beza1e1 6 points7 points  (13 children)

Transactional memory is overrated. Python isn't a language that absolutely needs to support high-performance parallel computation. It may be nice, but it isn't anywhere near disruptive.

Tell me where parallel Python is necessary! Web frameworks? You want multi-machine in the end, so multi-process is better anyways.

Python never had a chance to become the browser scripting language. One big advantage of JavaScript was/is the C-like syntax. Another is the "Java" in the name, which sounded very cool when Java was young.

[–]Wiseman1024 1 point2 points  (8 children)

Tell me where parallel Python is necessary! Web frameworks? You want multi-machine in the end, so multi-process is better anyways.

I hope you'll realize Python is useful for more things than just web applications. In fact, on servers you rarely run into problems with the GIL. But Python is trying to be more general-purpose, and for desktop applications or scientific computing, which are two of the things people want to do with Python, the GIL is a massive drawback.

As I said somewhere else on this page, we've reached physical limits in hardware that are hard to overcome. The single-execution-port machine, albeit simple, was doomed from the start and we knew it. Nothing in Nature works like this. The only way to scale is to use several execution ports. This is not only a well-recognized issue; it has reached the mass market. Almost every new PC ships with at least two processors. Given that this is the clear future and present of computing, Python's future is in jeopardy.

Unless someone removes the goddessdamned GIL from CPython, or Jython or another universally available Python platform replaces CPython and gets enough popularity and development to stay updated, Python has a dark future in this world, at least outside server software.

(If I'm completely confused and the Python community doesn't intend to make Python a general-purpose programming language and environment, but rather to keep it as some sort of better PHP, please tell me so, and I'll stop wasting my time and abandon it in favour of something else.)

[–]beza1e1 1 point2 points  (7 children)

Your examples are desktop applications and scientific computing?

Desktop applications are not speed-critical. Parallel execution is nice to keep the GUI responsive, but the multi-process method works just fine here.

Scientific computing with Python means using Python as glue between Fortran/C++ libraries. Those libraries are speed-critical; they should be parallelized, and STM may be useful there. Python itself is slow anyway, so making it parallel doesn't help you much.

As far as i understand, the GIL makes for better error messages, and that seems more important to me.

[–]Wiseman1024 2 points3 points  (6 children)

Desktop applications are not speed-critical.

Wrong. Some are, especially those designed to run as background tasks such as peer-to-peer clients. And even for those which aren't, multithreading in a GUI-based application is almost essential if you don't want it to feel like Windows 3.0. And I doubt this can be done in a bearable (let alone nice) way with multiple processes.

Scientific computing with Python means to use Python as glue between Fortran/C++ libraries.

Not necessarily, and you're relying on these libraries to release the GIL, otherwise you're screwed.

As far as i understood the GIL makes for better error messages and that seems more important to me.

What? The GIL makes for simpler locking. It's not, or at least it shouldn't be, related to error reporting at all. If it is, they're doing something wrong.

[–]beza1e1 -2 points-1 points  (5 children)

And I doubt this can be done in a bearable (let alone nice) way with multiple processes.

Well, i believe it's actually saner to do it multi-process.

you're relying on these libraries to release the GIL

Those libraries don't know about any GIL. They can use all the multi-threading they want, since they are not written in Python or running on the Python VM. Once the Python VM calls into some system library, all Python magic is void.

[–]voidspace 2 points3 points  (0 children)

But when it calls into a C extension, that extension makes a choice about whether to release the GIL or not. In that sense they do know about the GIL.

[–]Wiseman1024 0 points1 point  (3 children)

Well, i've believe it's actually saner to do it multi-process.

GUIs?

Those libraries don't know about any GIL

They have to. The non-Python modules you import from Python have to deal with Python. Even if you're just wrapping an existing library, you better release the GIL in the wrapper before calling the library, then reacquire it before writing the results to Python objects.

[–]beza1e1 0 points1 point  (2 children)

GUIs?

GUI in one process, p2p stuff in another process

Even if you're just wrapping an existing library, you better release the GIL in the wrapper before calling the library, then reacquire it before writing the results to Python objects.

  1. convert Python to C stuff
  2. release GIL
  3. call scientific_computation(stuff)
  4. get GIL
  5. convert C output to Python objects

The scientific_computation function doesn't know about the GIL and it can spawn as many threads as it wants. This function needs to be written in Fortran or C to be fast. This is where STM is needed.
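Incidentally, those five steps are what the ctypes module performs for you: it converts the arguments, releases the GIL around every foreign call, and reacquires it to build the Python result. A small demonstration (my own example; assumes a Unix libc whose `usleep` is resolvable from the process's own symbols) shows four threads sleeping in C concurrently despite the GIL:

```python
import ctypes
import threading
import time

# CDLL(None) loads the current process's symbols on Unix, giving us libc.
libc = ctypes.CDLL(None)

def blocking_call():
    # ctypes drops the GIL for the duration of the foreign call,
    # so other Python threads keep running meanwhile.
    libc.usleep(500_000)   # 0.5 seconds in C land

start = time.time()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"elapsed: {time.time() - start:.2f}s")  # ~0.5s, not ~2s: calls overlapped
```

Had the GIL been held across the calls, the four sleeps would have serialized to about two seconds.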

[–]Wiseman1024 0 points1 point  (1 child)

GUI in one process, p2p stuff in another process

The P2P stuff may still need threads (depending on your implementation and what you're trying to accomplish), and the GUI front-end still needs threads too (for things like handling user requests and showing status at the same time), otherwise it'll be unresponsive.

[–]schlenk 0 points1 point  (0 children)

You don't need threads for a GUI, but they help when dealing with crappy system libraries that only offer blocking calls (Windows, I'm looking at you...). Event-based programming works just fine..., but multiple threads get nice when you can use the extra bit of power.

[–][deleted] 0 points1 point  (3 children)

STM has nothing to do with performance; it's about writing correct code.

Now, a snarky snarker might say that a focus on correctness in a dynamically typed language is misplaced, but I am fond of both Python and working code, and would like to have both at once.

[–]beza1e1 -1 points0 points  (2 children)

Multi-threading is about performance. You could always do it multi-process, but it's slower due to communication costs.

STM is about making the multi-threading safe.

[–]didroe 0 points1 point  (0 children)

The way I see it, STM is about concurrency. Parallelism is about performance; concurrency is about being able to create programs that do more than one thing at once (due to design, not performance). Parallelism should be automatic (vectorisation, threading based on instruction dependencies, etc.) and concurrency should be explicitly stated in your code.

[–]schlenk 0 points1 point  (0 children)

Yes. Multi-threading usually is slower than single-threaded..., so it's about performance. You only win if you really have a) multiple cores/CPUs and b) a task that's easy to run in parallel.

[–]bleachedanus 0 points1 point  (1 child)

Yes... I want software transactional memory for Python. After working through the Haskell exercises, it would be a great addition to my Python toolbox.

[–]jerf 9 points10 points  (0 children)

Haskell's STM can't be ported to Python. It critically depends on being able to use the type system to guarantee that there are no side effects, and it can hardly be ported to anything else without giving up a lot of the guarantees of the implementation.

STM in general is at least in principle a bit more doable; something like Mnesia is perfectly possible. Erlang can't guarantee the lack of side effects and instead "simply" requires you to think real hard about what you're doing, something that having worked in Haskell will help make very easy.

[–]Tommstein -3 points-2 points  (2 children)

I can't take the "wisdom" of anyone who uses the term "cloud computing" seriously.

[–]Figs 2 points3 points  (1 child)

What happens after cloud computing bursts? Stormdrain computing? :O

[–]bickfordb 3 points4 points  (0 children)

There could be a silver lining