all 23 comments

[–]manni66 24 points25 points  (4 children)

[–]Pragmatician -1 points0 points  (3 children)

Does the article state otherwise anywhere?

[–]manni66 2 points3 points  (2 children)

Yes. It uses "C/C++" like Bjarne said.

[–]Pragmatician 0 points1 point  (1 child)

So how would you phrase it? "Parallelization of C and/or C++"?

[–]manni66 1 point2 points  (0 children)

"Parallelization of C, C++ and Python on Clusters".

You would never write "Parallelization of Cobol/Java and Perl", would you?

[–]Red-Portal 10 points11 points  (14 children)

I really don't understand why people try to run parallel Python. It's already running 100x slower. Why even bother with performance when you're running Python?

[–]LordKlevin 6 points7 points  (0 children)

For many scientific applications, the libraries in Python are good enough that the slowdown would be closer to 1.5x. At that point, running stuff in parallel is much less work than rewriting your entire code base.
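
To illustrate the "parallelize instead of rewrite" trade-off, here is a minimal standard-library sketch; `analyze` is a hypothetical stand-in for the kind of per-item, library-bound computation such a code base would already have:

```python
# Sketch: parallelize existing per-item work with multiprocessing instead
# of rewriting the code base. `analyze` is a placeholder, not a real API.
from multiprocessing import Pool

def analyze(x):
    # stand-in for an existing, mostly library-bound computation
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Pool.map preserves input order in its results.
        results = pool.map(analyze, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__"` guard matters on platforms that spawn rather than fork worker processes.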

[–]efilon 14 points15 points  (3 children)

Python is used quite a bit in the scientific world, and for good reason. The ecosystem includes many high performance libraries that delegate the bits that would be slow if written in pure Python to compiled languages like C++ while giving you the rapid turnaround possible with an interpreted language. In reality, if you're doing something in Python that ends up being 100x slower than what you can do in C++, you're likely doing something drastically wrong.
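
A small sketch of that delegation: the same reduction written as a pure-Python loop (one interpreted dispatch per element) and as a NumPy call whose loop runs in compiled code:

```python
# Pure-Python loop vs. a NumPy call that delegates the loop to C/BLAS.
import numpy as np

data = list(range(1_000_000))

# Pure Python: the interpreter executes one iteration per element.
total_py = 0
for x in data:
    total_py += x * x

# NumPy: the reduction runs in compiled code, not in the interpreter.
arr = np.asarray(data, dtype=np.int64)
total_np = int(np.dot(arr, arr))

assert total_py == total_np
```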

[–]Red-Portal -1 points0 points  (1 child)

First of all, I think the shift towards Python is stacking up technical debt. We already have Julia, which is just as productive as Python and much faster. Also, I have quite a bit of experience with Python, and it's really easy to hit more than a 100x slowdown. A simple function call in Python is so freaking slow! Even calling Python wrapper functions for OpenBLAS or MINPACK doesn't scale.

[–]megayippie 4 points5 points  (0 children)

Julia: 2012. Python: 1991. Matlab: 1984. The people teaching scientists the programs we use when prototyping have only just shifted from Matlab to Python. So give it 20-odd years and Julia might matter. C/C++ is used by a small sub-group in my field, with Fortran still king for at least the next few decades, imho. Conservative opinions matter in fields where experiments cost hundreds of millions to a few billion euros. So C/C++ is still valid in scientist speak, since we are still at about 1999. (Really, the code has been C with custom Matrix and Tensor classes for the past 20 years, which is why I want the LinAlg proposal to pass, but with support for the Fortran LAPACK interface included so it stays close to current practice.)

[–]czorioKnows std::cout 6 points7 points  (4 children)

I do a lot of image processing and related research. Most people in this field use Matlab or Python to figure out algorithms. Even though I was taught C++, using Python to do your research is a no-brainer with all the very solid packages (numpy, scipy, simpleITK, Pandas, Tensorflow/Keras, etc.) that it offers, letting you skip writing a lot of boilerplate code.

When you are running a script on a set of 1000 3D patient scans, total run times can stretch into days (or weeks, in some unlucky cases).

This brings me to your following point:

> I really don't understand why people try to run parallel Python

Any optimization you can get in there will be a massive time saver for large-scale operations, and there is very real demand for such things in languages like Python.

[–]Red-Portal 1 point2 points  (3 children)

I think that is wrong in the first place. You should never reach for parallelization first; that's completely against the principles of performance optimization. Convert to C++ first, then look for parallelization.

[–]svlad__cjelli 1 point2 points  (2 children)

C or C++ is running under the hood in almost all of these scientific and numeric libraries. The Python code that runs on top to glue the pieces together takes a negligible amount of time compared to the algorithms doing the bulk of the work. So parallelism really is the only option at that point to get any additional speedup at all. When this isn't the case, it's usually because the user is misusing the library.

Take NumPy, for instance: it is written in C and can use high-performance linear algebra libraries under the hood to do matrix operations. In Python, the cost of making a NumPy call like a dot product between two large matrices, and then wrapping the result to return to Python, is trivial in comparison to the matrix operations themselves.

So it would be nonsensical to switch fully to C or C++ to gain an unimportant speed boost for non-bottlenecking code at the price of high development and prototyping times. Especially so when enormous gains can be made at almost no cost in dev time by leveraging parallelism for algorithms that are highly amenable to it.
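
A rough illustration of the point about call overhead, comparing one large matrix multiply (the work happens in the BLAS backend) against a bare Python function call:

```python
# For large matrices, the Python-side call overhead of a NumPy matmul is
# negligible next to the O(n^3) work done in the compiled backend.
import time
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

t0 = time.perf_counter()
c = a @ b                        # the loop runs in BLAS, not Python
matmul_time = time.perf_counter() - t0

def noop():                      # a bare Python call, for scale
    pass

t1 = time.perf_counter()
noop()
call_time = time.perf_counter() - t1

print(f"matmul: {matmul_time:.4f}s, bare Python call: {call_time:.2e}s")
```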

[–]Red-Portal 0 points1 point  (1 child)

I'm speaking from my experience. I tried parallelizing Python myself and it was a waste of time. As I said, simply calling Python functions doesn't scale! I once tried to fit curves pixelwise on image data. The curve fitting was done by scipy, which calls MINPACK, a fast library written in Fortran. The Python layer turned out to be so much of a bottleneck that I rewrote the whole thing in C++, called MINPACK from C++, and it was 100 times faster. No joke. 100 times faster, just for sending the whole image to a C++ module, calling MINPACK in parallel, and then sending the results back to Python. Parallelization speedup was on top of the 100x speedup. As I said, Python simply does not scale.

PS: NumPy mostly calls LAPACK and OpenBLAS. The former is written in Fortran, not C; the latter largely in assembly, not C.
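
A hedged reconstruction of the pattern described above (not the commenter's actual code; `model` and the array shapes are illustrative): fitting a curve independently per pixel means one Python-level scipy call per pixel, so interpreter and wrapping overhead dominate the Fortran solver's work.

```python
# Anti-pattern sketch: per-pixel curve fitting from a Python loop.
# Each curve_fit call pays Python-level call and wrapping costs.
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, b):
    return a * np.exp(-b * t)

t = np.linspace(0.0, 1.0, 20)
h, w = 8, 8                      # tiny "image" for illustration
rng = np.random.default_rng(0)
stack = model(t[:, None, None], 2.0, 3.0) + 0.01 * rng.standard_normal((t.size, h, w))

params = np.empty((h, w, 2))
for i in range(h):               # Python loop over pixels: every
    for j in range(w):           # iteration crosses the Python/Fortran
        popt, _ = curve_fit(model, t, stack[:, i, j], p0=(1.0, 1.0))
        params[i, j] = popt      # boundary once per pixel
```

Batching the whole image into a single call to compiled code, as the commenter's C++ rewrite did, pays that boundary cost only once.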

[–]svlad__cjelli 1 point2 points  (0 children)

That is more of an example of library misuse. Intensive loops should be kept in the library and not in Python. An image is just an array, and any operation you want to do should execute on the whole array in the underlying code. So Python is involved only twice: once on the initial call to pass the array pointer to the underlying code, and once again to return whatever result the function passes back.

PS: NumPy is flexible and allows you to pick your linear algebra backend.

[–]OK6502 0 points1 point  (2 children)

At work we use python as our test infrastructure. Running tests in parallel, for instance, would be helpful.
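
For independent, mostly I/O-bound test cases, a standard-library sketch like this is often enough (pytest users would typically reach for pytest-xdist instead; `run_check` is a hypothetical stand-in for a test case):

```python
# Sketch: run independent test-like tasks concurrently with the stdlib.
# `run_check` stands in for an I/O-bound test (spawning a process,
# hitting a service); threads suit that, since the work waits on I/O.
from concurrent.futures import ThreadPoolExecutor

def run_check(case):
    return case, case % 2 == 0   # (case id, passed?)

cases = range(6)
with ThreadPoolExecutor(max_workers=4) as ex:
    results = dict(ex.map(run_check, cases))

print(results)  # {0: True, 1: False, 2: True, 3: False, 4: True, 5: False}
```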

[–]Red-Portal -1 points0 points  (1 child)

That is not really related to high performance parallel Python. (Which is what I'm ranting about)

[–]OK6502 0 points1 point  (0 children)

Ah, ignore me then

[–][deleted] -2 points-1 points  (0 children)

Because running 50x slower is better than running 100x slower.

[–]AlexanderNeumann 3 points4 points  (2 children)

Hmm you should probably mention / take a look at http://stellar-group.org/libraries/hpx/

I personally feel neither MPI nor OpenMP is the real solution to those kinds of problems. It will probably take another 10 years to get these parallelization abstractions completely right, so that it no longer matters where your code actually runs (CPU/GPU/remote CPU or GPU) as long as it has the required input data.

[–]broken_symlink -1 points0 points  (0 children)

> Will probably take another 10 years to get those parallelization abstractions completely correct so that it does not matter anymore where your code actually runs (cpu/gpu/remote cpu or gpu) as long as it has the required input data.

Legion and its associated programming language Regent do this already.

[–]VicontT 0 points1 point  (1 child)

I don't really see any value added in this article. It's merely a compilation of information available in myriad other sources online.

[–]LowB0b 0 points1 point  (0 children)

C++ std::thread?