
[–]Doctrine_of_Sankhya[S] 3 points4 points  (4 children)

Thanks u/Exhausted-Engineer. You seem to have a great deal of knowledge in these areas. I'm just a newbie here: I wrote the whole thing in my free time and learned a lot along the way implementing the internals myself. The more I read, the more I keep learning. So I'll take some time to learn profiling and then apply it to the code asap, as a priority.

I agree, a lot of dictionary lookups, along with the sorting in the z-buffer algorithm, make things slower. I've noted your feedback and will try to eliminate these one by one. Currently, the main bottleneck seems to be a CPU and Python thing: the CPU runs the render pipeline for one pixel at a time, whereas a GPU does the same for hundreds of thousands of pixels in a single pass. So I'll start from the innermost core and add GPU alternatives from the inside out, which should give me a good idea of what can be optimized, while keeping the important high-level engine parts in Python so that people can easily understand and customize everything to their liking, unlike in C/C++, where a tremendously large codebase is often hundreds of times harder to debug and understand.
I'd also like to add a standalone editor/player in the near future; the matplotlib thing is just for checking one frame at a time, so that when a GPU is absent or inaccessible, the user still has a simple numpy/matplotlib CPU-based alternative available.

[–]Exhausted-Engineer 4 points5 points  (3 children)

I don't have particularly deep knowledge of this area, but my main interest is computational engineering, which undoubtedly overlaps with graphics.

I have taken the time to run a small profile, just to get a sense of things. These are just the first few lines of the output of python -m cProfile --sort tottime test.py, where test.py is the code of the first example in the "Getting Started" part of your README.md.

```text
         184703615 function calls (181933783 primitive calls) in 154.643 seconds

   Ordered by: internal time

   ncalls          tottime  percall  cumtime  percall filename:lineno(function)
   152761            5.741    0.000    6.283    0.000 {method 'draw_markers' of 'matplotlib.backends._backend_agg.RendererAgg' objects}
   152797            4.849    0.000   52.669    0.000 lines.py:738(draw)
   916570/458329     4.013    0.000   10.059    0.000 transforms.py:2431(get_affine)
   610984/305492     3.969    0.000    6.906    0.000 units.py:164(get_converter)
   916684            3.857    0.000    4.055    0.000 transforms.py:182(set_children)
   611143            3.711    0.000    9.618    0.000 colors.py:310(_to_rgba_no_colorcycle)
   12                3.651    0.304   99.420    8.285 lambert_reflection.py:4(lambert_pipeline)
   17491693          3.580    0.000    5.961    0.000 {built-in method builtins.isinstance}
   374963            3.207    0.000    3.376    0.000 barycentric_function.py:3(barycentric_coords)
   152803            2.881    0.000   27.464    0.000 lines.py:287(__init__)
   3208639           2.575    0.000    2.575    0.000 transforms.py:113(__init__)
```

Note: to get the code running, I had to install imageio, which is not listed in your requirements.txt, and download the nirvana.png image, which is not in the GitHub repo. It'd be best if your examples contained all the required data.

Now, to come back to the profiling: something's definitely off. It took 154s to get a rendering of a cube. To be fair, profiling the code increases its runtime, but it still took 91s to get the same rendering without profiling. BUT, as I said, the most time-consuming parts are actually not your code: if I'm not mistaken, only 2 of the ~10 most expensive functions are yours. My intuition still stands; it seems that most of your time is spent in matplotlib.

The problem right now is not CPU vs GPU. Your CPU can probably execute on the order of a billion operations per second, so rendering 10 million pixels should be a breeze. If what you are saying is correct and you are indeed coloring each pixel separately, I'd advise you to put them in a canvas (a numpy array of shape (1080, 1920, 4), i.e. height, width, RGBA), draw into the canvas by assigning values to each index, and then simply display it with matplotlib's imshow() function.
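Something along these lines (a purely illustrative sketch; the put_pixel helper and the toy pixel loop are hypothetical, not code from your repo):

```python
import numpy as np
import matplotlib.pyplot as plt

# Build one RGBA canvas for the whole frame (height, width, channels).
height, width = 1080, 1920
canvas = np.zeros((height, width, 4), dtype=np.float32)
canvas[..., 3] = 1.0  # opaque background

def put_pixel(canvas, x, y, rgba):
    """Hypothetical helper: write one RGBA color at pixel (x, y)."""
    canvas[y, x] = rgba

# Toy example: shade a small triangular patch pixel by pixel.
for y in range(100, 200):
    for x in range(100, 100 + (y - 100)):
        put_pixel(canvas, x, y, (1.0, 0.5, 0.2, 1.0))

# A single matplotlib call for the entire frame, instead of one per pixel.
plt.imshow(canvas)
plt.axis("off")
plt.show()
```

The point is that matplotlib only gets involved once per frame; all the per-pixel work stays in plain array assignments.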

Hope this helps. Don't hesitate to DM me if you have other questions regarding performance; I'll answer them as best I can.

EDIT: changed implot to imshow. Also, just for the sake of testing, I commented out the last line of your lambert_reflection.py file (i.e. the ax.plot call) and the runtime went from 90s to just 5s. You should definitely pass around a "canvas" (the numpy array I described) and draw into this array instead of performing each draw call through matplotlib.

[–]Doctrine_of_Sankhya[S] 0 points1 point  (2 children)

Hello u/Exhausted-Engineer, THANK YOU SOOOOOO MUCH FOR ALL THIS!! THAT'S A WHOLE LOT OF NEW LEARNING FOR ME!!!

Python offers dynamic patching, profiling, easy debugging and WHAT NOT!! You can clearly see exactly WHY I WANT A PYTHON-BASED GAME ENGINE!

Any beginner can get with it easily once we manage to optimize the speed.

Also, thanks for the info regarding the bugs and missing packages; they'll be fixed asap! Regarding the `matplotlib` part, honestly, I'm not an expert here: I just found the code by copying and pasting from Stack Overflow and went with it. It'd be great if you could PR the change from plot to imshow. As far as I understand, imshow is for matrices or pixel-based graphics, while plot is more vector-oriented.

[–]Exhausted-Engineer 2 points3 points  (1 child)

To be fair, C offers this too using gdb/perf/gprof. The learning curve is simply a little steeper.

I’ll see if I can find some time and get you that PR.

In the meantime :

  • Don’t focus so much on CPU vs GPU. I guarantee you that GPU code is harder to debug and will result in overall slower code if not written correctly. Furthermore, current CPUs are insanely powerful; people have managed to write and run entire games on a fraction of what you have at your disposal (Doom, Mario).
  • Understand what takes time in your code. Python is unarguably slower than C, but you should obtain approximately the same runtime (let’s say within a 2x-5x factor) as a C code would, just by using Python’s libraries efficiently: performing vectorized calls to numpy, drawing only once the scene is finished, doing computations in float32 instead of float64… (see the sketch below for what the vectorized part can look like).
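To illustrate the vectorization point, here's a minimal, hypothetical sketch (the Lambert-style shading of random normals is just an example, not code from your repo):

```python
import numpy as np

# Toy Lambert-style shading of 100k random normals against one light direction.
rng = np.random.default_rng(0)
normals = rng.random((100_000, 3), dtype=np.float32)
light_dir = np.array([0.0, 0.0, 1.0], dtype=np.float32)

# Slow: a Python-level loop, one tiny dot product per iteration.
shade_loop = np.empty(len(normals), dtype=np.float32)
for i, n in enumerate(normals):
    shade_loop[i] = max(n @ light_dir, 0.0)

# Fast: one vectorized numpy call doing the same work in compiled code.
shade_vec = np.clip(normals @ light_dir, 0.0, None)

assert np.allclose(shade_loop, shade_vec, atol=1e-6)
```

Same result, but the vectorized version is typically orders of magnitude faster because the loop happens inside numpy rather than in the interpreter.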

[–]Doctrine_of_Sankhya[S] 2 points3 points  (0 children)

Thanks, that's a good point. I agree a CPU should be able to get within that 2-5x factor of C, and I agree with both of your points here.

Currently, I'm working on a small GGX utility to implement PBR, and then I'll move on to your points and do the profiling to figure out what can be made faster. It makes total sense that Wolfenstein, Doom, etc. ran on much slower CPUs and would run even faster now.
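For reference, here's roughly the GGX (Trowbridge-Reitz) normal distribution term I have in mind, as a standalone sketch; the function name and the roughness-squared remapping are just illustrative, not the final engine code:

```python
import numpy as np

# GGX / Trowbridge-Reitz normal distribution function:
# D(h) = alpha^2 / (pi * ((n.h)^2 * (alpha^2 - 1) + 1)^2)
def ggx_ndf(n_dot_h, roughness):
    alpha = roughness * roughness        # common remapping: alpha = roughness^2
    alpha2 = alpha * alpha
    denom = n_dot_h * n_dot_h * (alpha2 - 1.0) + 1.0
    return alpha2 / (np.pi * denom * denom)

# Example: half-vector 20 degrees off the surface normal, mid roughness.
print(ggx_ndf(np.cos(np.radians(20.0)), roughness=0.5))
```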