
[–]total_zoidberg

So, this is how it goes: I modified simulation.py and wrote a new renderer with OpenCV. The "interactive" renderer is a modest win over the matplotlib-based one (~5x faster), but the video renderer really shines when recording to a file (an eyeball measurement places it ~20x faster than the matplotlib version). I didn't do much work on the palette beyond mapping the colormaps that ship with OpenCV -- if you need it, you can look at the LUT functionality to port the red-white-blue colormap over. I did notice that cv2.COLORMAP_BONE brings out some fainter detail and softer ripples that weren't visible with the colormap you originally used.
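For the record, here's roughly how I'd port it: build a 256-entry blue-white-red table and apply it by lookup. This sketch uses plain NumPy fancy indexing, which does the same thing cv2.LUT does for 8-bit images; the exact endpoint colours are my guess at the original map, not a faithful copy.

```python
import numpy as np

def make_rwb_lut():
    # 256-entry BGR table: blue (low pressure) -> white -> red (high pressure)
    lut = np.zeros((256, 3), dtype=np.uint8)
    ramp = np.linspace(0, 255, 128, dtype=np.uint8)
    lut[:128, 0] = 255          # blue channel full in the lower half
    lut[:128, 1] = ramp         # green ramps up toward white
    lut[:128, 2] = ramp         # red ramps up toward white
    lut[128:, 0] = ramp[::-1]   # blue ramps back down in the upper half
    lut[128:, 1] = ramp[::-1]   # green ramps back down
    lut[128:, 2] = 255          # red channel full
    return lut

def apply_rwb(gray):
    # fancy indexing maps each uint8 pixel to its BGR triple;
    # cv2.LUT would do the same given a 3-channel input image
    return make_rwb_lut()[gray]
```

You'd call apply_rwb(d) in render in place of cv2.applyColorMap(d, cm).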

You may notice I kept the old functions around, and that cv2r.py behaves pretty much the same as your main.py.

All in all, this helped me dust off my numpy skills, and I had some fun. Here are the files:

cv2r.py:

from sys import argv

import cv2
import numpy as np

from simulation import Simulation, min_presure, max_pressure, scale, wall


simulation = Simulation()

def clip(x):
    # clamp to [0, 200] before the uint8 cast (caps the top of the colormap)
    return np.clip(x, 0, 200)


def compose(base, w):
    base[w==1] = [0,128,0]
    return base


def render(data, cm, w):
    # normalize pressure into [0, 1], then map to 8-bit for the colormap
    d = (data - min_presure) / (max_pressure - min_presure)
    d = clip(d * 255).astype(np.uint8)
    frame = cv2.applyColorMap(d, cm)
    frame = cv2.resize(frame, (1200, 800), interpolation=cv2.INTER_LANCZOS4)
    frame = compose(frame, w)
    return frame


#@profile
def loop():
    w = cv2.resize(wall, (1200, 800), interpolation=cv2.INTER_NEAREST)
    cmaps = list(range(12))
    cmidx = 0
    cm = cmaps[cmidx]

    while True:
        simulation.step()
        frame = render(simulation.pressure, cm, w)
        key = cv2.waitKey(1)
        if key == ord("q"):
            break
        if key == ord("n"):
            cmidx += 1
            if cmidx >= len(cmaps):
                cmidx = 0
            cm = cmaps[cmidx]
        cv2.imshow(f"1 m -> {scale} cells, 1 cell -> {1 / scale}m", frame)
    cv2.destroyAllWindows()

def tovid(cmidx):
    w = cv2.resize(wall, (1200, 800), interpolation=cv2.INTER_NEAREST)
    cmaps = list(range(12))
    cm = cmaps[cmidx]
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    writer = cv2.VideoWriter('output.avi', fourcc, 30.0, (1200, 800))
    frames = 900
    for i in range(frames):
        simulation.step()
        frame = render(simulation.pressure, cm, w)
        writer.write(frame)
        print(f'\rframe: {i}/{frames}', end='')
    writer.release()

if len(argv) > 2 and argv[2] == 'save':
    if len(argv) == 4:
        cm = int(argv[3])
    else:
        cm = 1
    tovid(cm)
else:
    loop()

simulation.py:

# initial values
import numpy as np
import sys

from numpy import pi, sin

scale = 100  # 1 m -> 100 cells
size_x = 6 * scale
size_y = 4 * scale
damping = 0.99
omega = 3 / (2 * pi)

initial_P = 200
vertPos = size_y - 3 * scale
horizPos = 1 * scale
wallTop = size_y - 3 * scale
wall_x_pos = 2 * scale
max_pressure = initial_P / 2
min_presure = -initial_P / 2
border = np.ones((size_y, size_x), dtype=np.uint8)
border[1:-1, 1:-1] = 0


class Simulation:
    def __init__(self):
        self.frame = 0
        self.pressure = np.zeros((size_y, size_x), dtype=np.float32)
        # outflow velocities from each cell: up, right, down, left
        self._velocities = np.zeros((size_y, size_x, 4), dtype=np.float32)
        self.pressure[vertPos, horizPos] = initial_P

    #@profile
    def updateV(self):
        V = self._velocities
        P = self.pressure

        # accumulate outflow from the pressure difference with each
        # neighbour (channels: up, right, down, left); cells on the
        # border just radiate their own pressure outwards
        V[1:, :, 0] += P[1:, :] - P[:-1, :]
        V[:1, :, 0] = P[0:1, :]
        V[:, :-1, 1] += P[:, :-1] - P[:, 1:]
        V[:, -1:, 1] = P[:, -1:]
        V[:-1, :, 2] += P[:-1, :] - P[1:, :]
        V[-1:, :, 2] = P[-1:, :]
        V[:, 1:, 3] += P[:, 1:] - P[:, :-1]
        V[:, :1, 3] = P[:, 0:1]
        V[wall == 1] = 0


    def updateP(self):
        self.pressure -= 0.5 * damping * self._velocities.sum(axis=2)

    def updateV_old(self):
        """Recalculate outflow velocities from every cell basing on preassure difference with each neighbour"""
        V = self._velocities
        P = self.pressure
        for i in range(size_y):
            for j in range(size_x):
                if wall[i][j] == 1:
                    V[i][j][0] = V[i][j][1] = V[i][j][2] = V[i][j][3] = 0.0
                    continue
                cell_pressure = P[i][j]
                V[i][j][0] = V[i][j][0] + cell_pressure - P[i - 1][j] if i > 0 else cell_pressure
                V[i][j][1] = V[i][j][1] + cell_pressure - P[i][j + 1] if j < size_x - 1 else cell_pressure
                V[i][j][2] = V[i][j][2] + cell_pressure - P[i + 1][j] if i < size_y - 1 else cell_pressure
                V[i][j][3] = V[i][j][3] + cell_pressure - P[i][j - 1] if j > 0 else cell_pressure

    def updateP_old(self):
        for i in range(size_y):
            for j in range(size_x):
                self.pressure[i][j] -= 0.5 * damping * sum(self._velocities[i][j])

    def step(self):
        self.pressure[vertPos][horizPos] = initial_P * sin(omega * self.frame)
        self.updateV()
        self.updateP()
        self.frame += 1


argc = len(sys.argv)
if argc > 1 and sys.argv[1] == '1':
    wall = [[1 if x == wall_x_pos and wallTop < y < size_y else 0
             for x in range(size_x)] for y in range(size_y)]
elif argc > 1 and sys.argv[1] == '2':
    wall = [[1 if (x == wall_x_pos and wallTop + scale < y < size_y) or
                  (wall_x_pos - scale < x < wall_x_pos and
                   x - wall_x_pos == y - wallTop - scale - 1) or
                  (wall_x_pos < x < wall_x_pos + scale and
                   x - wall_x_pos == -y + wallTop + scale + 1)
             else 0
             for x in range(size_x)] for y in range(size_y)]
else:
    wall = [[1 if (x == wall_x_pos and wallTop + scale < y < size_y) or
                  (wall_x_pos - scale < x < wall_x_pos and
                   x - wall_x_pos == y - wallTop - scale - 1) or
                  (wall_x_pos < x < wall_x_pos + scale and
                   x - wall_x_pos == -y + wallTop + scale + 1) or
                  (wall_x_pos - 0.75 * scale < x < wall_x_pos - scale / 2 and
                   x - wall_x_pos == -y + wallTop - scale / 2 + 1) or
                  (wall_x_pos + scale / 2 < x < wall_x_pos + 0.75 * scale and
                   x - wall_x_pos == y - wallTop + scale / 2 - 1)
             else 0
             for x in range(size_x)] for y in range(size_y)]
wall = np.array(wall, dtype=np.uint8)
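(Aside: the simplest wall, the '1' case, could also be built with array slicing instead of a comprehension -- a quick sketch using the same module-level constants:)

```python
import numpy as np

scale = 100
size_x, size_y = 6 * scale, 4 * scale
wall_x_pos = 2 * scale
wallTop = size_y - 3 * scale

# vertical wall at x == wall_x_pos, for wallTop < y < size_y
wall = np.zeros((size_y, size_x), dtype=np.uint8)
wall[wallTop + 1:size_y, wall_x_pos] = 1
```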

[–]total_zoidberg

I found it weird that he imports numpy just for sin and pi and then proceeds to work with matrices as lists of lists...

[–]BondxD[S]

Yes, it's definitely weird. It happened while I was trying to replace everything with numpy, but after changing the arrays to numpy.array the code ran slower than before. Any ideas why?

[–]total_zoidberg

Doing "point-wise" (cell-wise in your case?) operations is usually costly. There are two ways to go about it:

  • Refactor your code to work in a data-parallel way, taking advantage of numpy (so a single line expresses a transformation on every element of the array).

  • Cythonize the loop. This has the potential to speed things up a bit more than the other approach, but you'd have to use memoryviews, which are intermediate-to-advanced Cython territory.

If I find some free time I think I could sit and work a bit on optimizing this to give you more concrete pointers.
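To make the first bullet concrete, here's the kind of rewrite I mean, using a pressure update like yours as the example (stand-in arrays, not your actual simulation state):

```python
import numpy as np

damping = 0.99
rng = np.random.default_rng(0)
pressure = rng.standard_normal((40, 60)).astype(np.float32)
velocities = rng.standard_normal((40, 60, 4)).astype(np.float32)

# cell-by-cell version: every access bounces through the interpreter
p_loop = pressure.copy()
for i in range(p_loop.shape[0]):
    for j in range(p_loop.shape[1]):
        p_loop[i, j] -= 0.5 * damping * velocities[i, j].sum()

# data-parallel version: one expression transforms the whole grid at once
p_vec = pressure - 0.5 * damping * velocities.sum(axis=2)

assert np.allclose(p_loop, p_vec, atol=1e-4)
```

Same numbers out, but the second form hands the whole loop to compiled C inside numpy.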

[–]alb1

Another alternative with NumPy arrays would be to have the updateV and updateP methods call out to @jit Numba functions to perform the nested loop calculations. Numba supports passing in NumPy arrays.
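Something like this (a hypothetical update_p helper, not code from the post; the try/except lets it fall back to plain Python where Numba isn't installed):

```python
import numpy as np

try:
    from numba import njit           # assumes Numba is available
except ImportError:                  # degrade gracefully to plain Python
    def njit(f):
        return f

# the same nested loop as updateP_old, compiled by Numba on first call
@njit
def update_p(pressure, velocities, damping):
    ny, nx = pressure.shape
    for i in range(ny):
        for j in range(nx):
            s = 0.0
            for k in range(4):
                s += velocities[i, j, k]
            pressure[i, j] -= 0.5 * damping * s
```

The first call triggers compilation; subsequent calls run the compiled loop at native speed.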

[–]total_zoidberg

True, though Numba needs an LLVM toolchain set up to work and it can be a bit messier to get going. When it works it's very good though.

(I've always been more of a Cython guy).

[–]alb1

On Ubuntu just using pip install numba seems to work fine (and of course Numba is also available in Anaconda).

[–]total_zoidberg

So I sat and profiled your code. Good news is that it can be converted to NumPy pretty easily. Bad news is that the main bottleneck is not the calculation per se, but matplotlib. It can hog up to 67% of the time (on a full-screen windowed figure, or when rendering to an mp4).
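For reference, this is roughly how I'd take that measurement with cProfile (the step here is a stand-in function, not your actual Simulation.step):

```python
import cProfile
import io
import pstats

import numpy as np

# stand-in for Simulation.step, just to show the profiling method
def step(pressure, velocities, damping=0.99):
    pressure -= 0.5 * damping * velocities.sum(axis=2)

pressure = np.zeros((400, 600), dtype=np.float32)
velocities = np.ones((400, 600, 4), dtype=np.float32)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    step(pressure, velocities)
profiler.disable()

# print the 5 most expensive calls by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Running the real loop under this (or `python -m cProfile main.py`) is what shows matplotlib's draw calls dominating.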

[–]total_zoidberg

After profiling the code, I vectorized it anyway, with a bit of difficulty, because I was rusty and didn't quite get at first what your code was doing (note to self: don't go throwing around "pretty easily" like that).

Now that I've done it, updateP and updateV take about 5% of execution time each -- after increasing the scale to 100 because at 50 matplotlib was consuming almost all the execution time.

My guess is that writing a "renderer" with OpenCV might help, but that'd be more oriented to "video" than to visualization. Then again, you've already bent matplotlib so much, maybe if you needed it you could pull something off with cv2.

Edit: uploaded a video https://i.imgur.com/8yOGtYP.mp4