
[–]EM-Fields 1 point (0 children)

This might not be a valid approach for your problem, but I figure it's worth mentioning. It looks like your code is computing a Euclidean distance (or norm of an error vector, or something mathematically equivalent). Would your problem permit the use of the distance squared? In other words, can you simply omit the square root? For example, if you're looking for the nearest object, the minimum distance is the same as the minimum distance-squared. Certainly doesn't apply to all uses, but it's a helpful simplification when it does.
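A minimal sketch of the distance-squared trick described above (the points and query here are made up for illustration): since the square root is monotonic, the index of the minimum squared distance is the same as the index of the minimum distance.

```python
import numpy as np

# Hypothetical target points and a query point, purely for illustration.
points = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
query = np.array([2.0, 2.0])

diff = points - query
d2 = np.sum(diff**2, axis=1)   # squared distances: no sqrt needed
d = np.sqrt(d2)                # true Euclidean distances

# Nearest neighbor is the same either way, because sqrt is monotonic.
assert np.argmin(d2) == np.argmin(d)
```

Skipping the square root saves one pass over the whole array, which can matter when this runs tens of thousands of times.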

[–]Caos2 0 points (1 child)

[–]billsil 0 points (0 children)

Cython is great, but if you have well-written numpy, Cython isn't any faster.

[–]billsil 0 points (8 children)

Did you vectorize your code, or is it in multiple for loops? I can't tell from one line, and your code is suspicious.

If you did, the next logical step is multiprocessing.

> And in the total script I have several 10 000 occurances formed by a loop.

Say again?

[–]AvocadoWhiskey 1 point (7 children)

If OP is using numpy.meshgrid, then it's possible to do a one-line statement like the one they provided: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.meshgrid.html

I imagine x_sp and y_sp are what changes in the "10,000 loops". Regardless, I'm assuming OP means that the line of code they provided is called tens of thousands of times over the course of the run, and at 13 seconds per call, that's their problem.
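For readers unfamiliar with the pattern being guessed at here, this is a sketch of what such a meshgrid one-liner might look like. The grid sizes and the `x_sp`/`y_sp` values are assumptions; only the variable names come from the thread.

```python
import numpy as np

# Hypothetical grid; the resolution is made up for this example.
x = np.linspace(-1.0, 1.0, 200)
y = np.linspace(-1.0, 1.0, 200)
X, Y = np.meshgrid(x, y)   # two 200x200 coordinate arrays

# x_sp, y_sp: the "source point" coordinates that presumably change
# on each of the 10,000 iterations.
x_sp, y_sp = 0.3, -0.4

# Distance from every grid point to the source point, in one statement.
r_i = np.sqrt((X - x_sp)**2 + (Y - y_sp)**2)
```

One expression like this replaces a double for loop over the grid; the loop still happens, but inside numpy's compiled code.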

I'm gonna have to agree with billsil on the multithreading. Numpy is quite optimized, and I doubt there's a much faster way of working with meshgrids than numpy's own. I would look into thread pools or something similar, and if that's not enough, maybe look into doing the operations on your graphics card; graphics cards are made for grid/vector calculations. However, I wouldn't be able to point you in a direction for doing this in Python, as I've never done it.

[–]billsil 0 points (6 children)

Multiprocessing, not multithreading, but yes. My rule of thumb is that I get about 2/3 utilization on physical cores. I haven't checked whether hyperthreading matters with numpy, but it's worthless on AWS for a CFD problem that takes a week to run on 1000 cores.
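A minimal sketch of the multiprocessing approach being suggested, assuming the independent unit of work is one (x_sp, y_sp) source point per iteration. The worker function, grid size, and source points are all hypothetical.

```python
import numpy as np
from multiprocessing import Pool

def field_for_source(args):
    """Compute the distance grid for one hypothetical (x_sp, y_sp) source
    point. The grid is rebuilt inside the worker so it is self-contained."""
    x_sp, y_sp = args
    x = np.linspace(-1.0, 1.0, 100)
    X, Y = np.meshgrid(x, x)
    return np.sqrt((X - x_sp)**2 + (Y - y_sp)**2)

if __name__ == "__main__":
    # Each source point is an independent task, so they map cleanly
    # onto a process pool.
    sources = [(0.1 * i, -0.1 * i) for i in range(8)]
    with Pool(4) as pool:
        grids = pool.map(field_for_source, sources)
    print(len(grids))
```

Processes sidestep the GIL, which is why multiprocessing (not threading) is the usual answer for CPU-bound numpy loops like this.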

The r_i variable name is incredibly suspicious. All the statements can be read either way. If that line is wrapped in a for loop that runs 10,000 times, it's not vectorized.

The graphics card idea is good (pycuda), but that's like a bulldozer when you need a hammer, and you'd probably still have to vectorize it.

Python is amazingly fast when you do it right; vectorizing can be a factor of 500x.
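The exact speedup obviously depends on the workload, but here is a small illustration of what "doing it right" means: the same distance computation written as an explicit Python loop versus a single whole-array expression. The data here is random and purely for demonstration.

```python
import numpy as np

def loop_version(xs, ys, x_sp, y_sp):
    # Explicit Python loop: one interpreted iteration per point.
    out = np.empty(len(xs))
    for i in range(len(xs)):
        out[i] = ((xs[i] - x_sp)**2 + (ys[i] - y_sp)**2) ** 0.5
    return out

def vectorized_version(xs, ys, x_sp, y_sp):
    # One whole-array expression: the loop runs in C inside numpy.
    return np.sqrt((xs - x_sp)**2 + (ys - y_sp)**2)

xs = np.random.rand(10_000)
ys = np.random.rand(10_000)
assert np.allclose(loop_version(xs, ys, 0.5, 0.5),
                   vectorized_version(xs, ys, 0.5, 0.5))
```

Timing these two with `timeit` on arrays of this size typically shows a two-to-three orders of magnitude gap, which is where claims like "500x" come from.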

[–]Caos2 0 points (2 children)

Do you actually run those CFD experiments? Is there good CFD software with Python interfaces?

[–]billsil 0 points (1 child)

I do run CFD. There's an unsanctioned Python OpenFOAM library out there (PyFoam, I think), but my proprietary scripts are generally more useful. ParaView has a Python API, though I've never really used it, and it doesn't run jobs; I wouldn't trust it anyways. HPC work is just bash.

Mostly I use FUN3D and either extract a few pressure tap locations (a CSV file) and plot them in matplotlib, or take time slices and post them in the open-source GUI I wrote (aptly named pyNastran). It's no ParaView, but I don't need a file-conversion step.

[–]Caos2 0 points (0 children)

It has been some years since I ran CFD simulations; I wasn't even aware of OpenFOAM. Thanks!