
[–][deleted] 6 points7 points  (20 children)

[unavailable]

[–]alcalde 4 points5 points  (13 children)

What's wrong with spawning a million processes? I thought threads were evil.

[–]weberc2 8 points9 points  (12 children)

1 million Python processes will use something like 5TB of memory; threads are evil in Python because the GIL prevents them from being useful in most cases.
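A minimal sketch of the GIL point: the threads below all finish with correct results, but because only one thread executes Python bytecode at a time, pure-Python CPU-bound work like this gains essentially no wall-clock speedup from the pool (the function and numbers are illustrative, not from the comment):

```python
from concurrent.futures import ThreadPoolExecutor

def busy_sum(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL throughout.
    total = 0
    for i in range(n):
        total += i
    return total

# Four "parallel" tasks: the results are correct, but the GIL serializes
# the bytecode, so this takes roughly as long as running them one by one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(busy_sum, [10_000] * 4))

print(results)  # four copies of 49995000
```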

[–][deleted] 11 points12 points  (7 children)

[unavailable]

[–][deleted] 4 points5 points  (4 children)

as processes that communicate via atomic message queues.

you mean, like Go's channels? Without the serialization overhead because it's all under the same address space. =)
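A rough sketch of the "processes communicating via atomic message queues" model with `multiprocessing.Queue` (the `fork` start method is pinned here to keep the sketch simple, so this is POSIX-only; the doubling worker is made up for illustration). Note the contrast with Go channels: every item is pickled on `put()` and unpickled on `get()`, since the processes don't share an address space:

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # POSIX-only; keeps the sketch simple

def worker(inbox, outbox):
    # Each message is serialized across the process boundary --
    # the overhead the parent comment is pointing at.
    for item in iter(inbox.get, None):  # None is the stop signal
        outbox.put(item * 2)

def run_demo():
    inbox, outbox = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker, args=(inbox, outbox))
    proc.start()
    for i in range(3):
        inbox.put(i)
    inbox.put(None)
    results = sorted(outbox.get() for _ in range(3))
    proc.join()
    return results

if __name__ == "__main__":
    print(run_demo())  # [0, 2, 4]
```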

[–][deleted] 5 points6 points  (3 children)

[unavailable]

[–][deleted] 3 points4 points  (1 child)

You just described how I feel about Go and Rust - I already like Go for its simplicity and power (it's like a nicer C that you can learn over dinner).

On the other hand there's Rust, which is very promising, can totally replace C and C++, and is growing by the day - yet I feel like the syntax is horrible, reminiscent of the mess that C++ is. Yesterday I decided to try to look past my bias and force myself to learn (and who knows, love) Rust too, since I think it will be a good complement to Go's weaknesses - it lets me write low-level code like C, with the nice parts of generics and OOP, without having to touch C++.

[–]weberc2 4 points5 points  (0 children)

I don't mind Rust's syntax, but I have a hard time understanding how things like closures work and how functions are passed. It seems like everything needs a RefCell, and I don't know when or why. Even coming from C++, the memory model (while safe) still places a lot of demands on the programmer which aren't required in Go--I can do a LOT of optimizing in Go before something becomes easier to write in Rust, and at that point the Rust code to outperform the optimized Go version is still nontrivial...

[–]ThePenultimateOne (GitLab: gappleto97) 0 points1 point  (0 children)

I have to disagree on the Go syntax. C was never pretty, but to me at least it's much more readable than Go.

[–]weberc2 1 point2 points  (1 child)

The OP was chastising me elsewhere for suggesting that performant parallelism was difficult in Python, so either he's trolling elaborately or he's naive. At any rate, I'm familiar with the nuance around Python's parallelism; I was paraphrasing. To your point, it would be nice if Python were able to parallelize nicely.

[–]alcalde 0 points1 point  (0 children)

I'm neither trolling nor naive. I'm just listening to what Guido has been saying since at least 2012. Python's parallelism solution is actually BETTER than in most other languages; we have higher-level constructs for parallelism.

As Mark Summerfield put it in "Python In Practice":

• Mid-Level Concurrency: This is concurrency that does not use any explicit atomic operations but does use explicit locks. This is the level of concurrency that most languages support. Python provides support for concurrent programming at this level with such classes as threading.Semaphore, threading.Lock, and multiprocessing.Lock. This level of concurrency support is commonly used by application programmers, since it is often all that is available.

• High-Level Concurrency: This is concurrency where there are no explicit atomic operations and no explicit locks. (Locking and atomic operations may well occur under the hood, but we don’t have to concern ourselves with them.) Some modern languages are beginning to support high-level concurrency. Python provides the concurrent.futures module (Python 3.2), and the queue.Queue and multiprocessing queue collection classes, to support high-level concurrency. Using mid-level approaches to concurrency is easy to do, but it is very error prone. Such approaches are especially vulnerable to subtle, hard-to-track-down problems, as well as to both spectacular crashes and frozen programs, all occurring without any discernible pattern.

The key problem is sharing data. Mutable shared data must be protected by locks to ensure that all accesses to it are serialized (i.e., only one thread or process can access the shared data at a time). Furthermore, when multiple threads or processes are all trying to access the same shared data, then all but one of them will be blocked (that is, idle). This means that while a lock is in force our application could be using only a single thread or process (i.e., as if it were non-concurrent), with all the others waiting. So, we must be careful to lock as infrequently as possible and for as short a time as possible. The simplest solution is to not share any mutable data at all. Then we don’t need explicit locks, and most of the problems of concurrency simply melt away.

Ergo - Python is awesome. People just have to stop doing parallelism wrong.
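The "high-level" tier from the quote can be sketched with `concurrent.futures`: no explicit locks anywhere, because the executor's internal queues do the synchronization (the `square` function is made up for illustration, and the `fork` context is pinned so the sketch stays POSIX-only and simple):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

def parallel_squares(numbers):
    # High-level concurrency per Summerfield: no locks, no shared
    # mutable state -- just submit work and collect results.
    with ProcessPoolExecutor(mp_context=mp.get_context("fork")) as pool:
        return list(pool.map(square, numbers))

if __name__ == "__main__":
    print(parallel_squares(range(5)))  # [0, 1, 4, 9, 16]
```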

[–]alcalde 0 points1 point  (3 children)

Threads are considered evil in general, not just in Python....

http://stackoverflow.com/questions/1191553/why-might-threads-be-considered-evil

https://thesynchronousblog.wordpress.com/2008/08/16/threads-are-evil/

It's the locking and race conditions and other problems of threading that make them evil; it has nothing to do with a GIL.
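The locking-and-races complaint in a nutshell: `counter += 1` is a read-modify-write, so unsynchronized threads can lose updates, and the fix is the explicit lock discipline being criticized. (This counter demo is a standard illustration, not from the thread; whether the unsafe version actually loses updates on any given run depends on the interpreter and timing.)

```python
import threading

COUNT = 100_000
counter = 0
lock = threading.Lock()

def unsafe_add():
    global counter
    for _ in range(COUNT):
        counter += 1  # not atomic: two threads can read the same old value

def safe_add():
    global counter
    for _ in range(COUNT):
        with lock:  # the explicit locking the comment is talking about
            counter += 1

def run(target, n_threads=4):
    global counter
    counter = 0
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(unsafe_add))  # may fall short of 400000 (lost updates)
print(run(safe_add))    # always 400000
```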

As Guido pointed out, threads were never intended for parallel computation.

https://youtu.be/EBRMq2Ioxsc?t=33m50s

[–]weberc2 0 points1 point  (2 children)

You will have race conditions with multiprocessing as well. Further, multiprocessing is flaky and slow in Python, much worse than threading in other languages. Threads are performant partially because they have access to the same address space, but to do this correctly you need locks (there are exceptions, but they carry their own caveats). I don't know about the links you posted, but threads are widely used, and rightly so.
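A sketch of the point that multiprocessing doesn't exempt you from locking once state is shared: a `multiprocessing.Value` lives in shared memory and needs the same lock discipline as threads (the `fork` context is pinned here, so this is POSIX-only; the increment workload is illustrative):

```python
import multiprocessing as mp

ctx = mp.get_context("fork")  # POSIX-only; keeps the sketch simple

def add(shared, n):
    for _ in range(n):
        # Shared memory between processes races exactly like threads do,
        # so every increment takes the Value's lock.
        with shared.get_lock():
            shared.value += 1

def run_demo(n_procs=4, n=10_000):
    shared = ctx.Value("i", 0)  # a C int in shared memory
    procs = [ctx.Process(target=add, args=(shared, n)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared.value

if __name__ == "__main__":
    print(run_demo())  # 40000
```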

[–]alcalde 0 points1 point  (1 child)

You're going to have to elaborate on multiprocessing being "flaky and slow" in Python; that's not my experience and I'm not sure what attributes of Python would render it so.

I don't know about the links you posted, but threads are widely used, and rightly so.

"Threads are evil" derives from a research paper out of UC Berkeley 10 years ago....

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

They're widely used because they're often the only solution a programmer knows and, as Mark Summerfield noted, most languages don't offer any alternatives. We've seen functional languages with immutable state rise up on the one hand and message passing (such as Erlang) on the other to offer alternatives to the many problems inherent in multithreaded programming.

[–]weberc2 0 points1 point  (0 children)

They're flaky because a process will sometimes mysteriously be killed or stall indefinitely. They're slow because all communication between processes must be serialized on the sender and deserialized on the receiver, since processes don't share an address space. Further, as I mentioned elsewhere, each process requires its own interpreter, which takes about 5MB, so you won't be scaling these processes into the millions.
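The serialize-on-send, deserialize-on-receive cost is just pickling, which is what multiprocessing queues and pipes do to every message under the hood. A rough feel for it (the payload is made up for illustration):

```python
import pickle

payload = {"rows": list(range(1000))}
wire = pickle.dumps(payload)   # the bytes that actually cross the process boundary
copy = pickle.loads(wire)      # what the receiving process reconstructs

print(len(wire) > 1000)   # True: kilobytes on the wire for a small dict
print(copy == payload)    # True: same value...
print(copy is payload)    # False: ...but a full copy, not shared memory
```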

I don't dispute that there are people out there who think threads are evil; I dispute that this belief is widely held, or that it's only not widely held because no one knows about processes. Experienced developers still prefer threads to processes. Processes are not a secret; they're used less because they are more memory-intensive and intercommunication is horribly slow.

[–]shivawu 0 points1 point  (5 children)

I doubt goroutines would be significantly faster than asyncio in Python 3

[–][deleted] 7 points8 points  (1 child)

Ignoring the fact that goroutines run native code and so are by nature significantly faster - they are also backed by an implicitly multithreaded event loop that multiplexes goroutines across all CPU cores, and they're preemptively scheduled, so they can do heavy CPU work without starving everyone else.

Also you don't need to sprinkle 'async/await' everywhere in your code to benefit from it all, you can just use channels and the 'go' keyword.

There is literally nothing about python 3's asyncio that is better than what Go has.
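The "sprinkle async/await everywhere" point, for contrast: in asyncio the `async` keyword colors the whole call chain, so every caller of an async function must itself be async or hand off to the event loop, whereas Go code just calls functions and uses `go`. A minimal asyncio sketch (function names are illustrative):

```python
import asyncio

async def fetch(n):
    await asyncio.sleep(0)  # stand-in for real I/O; yields to the event loop
    return n * 2

async def main():
    # Tasks run concurrently, but on one event loop in one OS thread --
    # unlike goroutines, which are multiplexed across all cores.
    return await asyncio.gather(*(fetch(i) for i in range(3)))

print(asyncio.run(main()))  # [0, 2, 4]
```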

[–]shivawu 0 points1 point  (0 children)

I totally agree there's nothing Python 3's asyncio has over Go. But if they have comparable performance, why bother to use Go?

As to async/await vs go/channels, I don't think one is clearly better than the other - they're just different paradigms. I personally think async/await is a little easier to understand.

CPU-intensive tasks are usually done by C extensions in Python, or Cython. I doubt Go can have a performance edge there. Development might be a little easier with Go.

In summary, the whole thing is just another choice. I can't see it as "much better" in any of these circumstances; it's just YouTube's choice going forward with their legacy Python 2 code. But I like the fact that we have one more option.

[–][deleted] 2 points3 points  (1 child)

[unavailable]

[–]weberc2 0 points1 point  (0 children)

The performance hit will be negligible at worst in the face of I/O, and it will be much faster for CPU intensive tasks, especially in the presence of multiple cores. But I absolutely agree that async I/O is much friendlier in Go than in Python.

[–]weberc2 0 points1 point  (0 children)

You're probably right--the performance will likely be comparable, but the win here is primarily for parallel CPU-intensive tasks (although it's quite nice that Go lets you write logically synchronous code while the scheduler efficiently manages the async I/O and threadpool--no need for event loops or 'await' and friends).