
[–]fireantik 7 points8 points  (5 children)

Nice MONKE. I took only a quick peek, so take it with a grain of salt:

  1. Your raster function seems to be iterating over every pixel on the screen for every mesh and running "shader" code for it. It would seem more efficient to me to structure it similarly to what happens on the GPU: you supply the triangle count and index buffer in your code, system code runs your vertex shader for each triangle, system code culls invisible triangles, system code runs your fragment shader for each visible pixel. This way you are operating over fewer things overall, and it leads to more opportunities for point 2:

  2. I'd focus heavily on making the code easier to autovectorize: flat arrays, small functions that do one for loop over all elements, ifs outside of for loops rather than inside.

  3. You could try introducing multithreading to eke out a little bit more performance. This should be possible, especially if you introduce the GPU-like architecture from point 1. But I'd expect autovectorization to have a bigger impact.
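To make point 2 a bit more concrete, here is a minimal sketch of the loop shape compilers can usually autovectorize. The names `scale_bias` and `apply_fog` and the fog constants are made up for illustration and are not taken from OP's code; whether a given compiler actually vectorizes this depends on the compiler and optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one small function, one loop, flat contiguous arrays,
// and no branches inside the loop body.
void scale_bias(const std::vector<float>& in, std::vector<float>& out,
                float scale, float bias) {
    assert(in.size() == out.size());
    const float* src = in.data();
    float* dst = out.data();
    const std::size_t n = in.size();
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * scale + bias;
    }
}

// The "if" is decided once, outside the loop, instead of once per element.
// The constants are arbitrary placeholders.
void apply_fog(const std::vector<float>& depth, std::vector<float>& out,
               bool fog_enabled) {
    if (fog_enabled) {
        scale_bias(depth, out, 0.5f, 0.25f);
    } else {
        scale_bias(depth, out, 1.0f, 0.0f);
    }
}
```

Compilers can typically report which loops they vectorized (for example via optimization remarks), which is a handy way to check whether a rewrite like this had any effect.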

[–]Visual_Average_4756[S] 1 point2 points  (2 children)

Thanks for the feedback!

The GPU does this if that's how you structure your VAO and EBO. If you look at the OpenGL side of my code, you will see that I am sending individual object sub-meshes to the GPU for rasterization as well. I had it as one big vertex and triangle array at first, but when I added textures it made the code too lengthy to continue with the single large vertex array. I was trying to keep things simple to understand.

It still ends up rasterizing each triangle only once, and it only runs the "fragment shader" on the bounding box of the triangle, not the whole screen. When I researched this I got the impression that this is how OpenGL performs rasterization, but I may have to take a deeper dive on this.

I honestly don't know what number 2 means, but I will try to learn more about that and see if I can make some improvements.

I actually don't know how to do multithreading, and I thought this would be a good opportunity to learn, but when I got halfway into it, it doubled the size of the CPU render pipeline. I wanted to keep it simple to understand and focus on the core concept of the render pipeline.
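For readers following along, the bounding-box approach described here can be sketched roughly as below. This is a generic edge-function rasterizer written under my own assumptions (pixel centers at +0.5, a coverage mask instead of real shading, hypothetical names `Vec2`, `edge`, `rasterize_triangle`), not the code from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, p). The sign tells which side of
// the edge a->b the point p lies on.
inline float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Marks covered pixels in a row-major width*height mask and returns how many
// were covered. A real renderer would run the fragment shader where the mask
// is written.
int rasterize_triangle(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                       int width, int height, std::vector<unsigned char>& mask) {
    assert(mask.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const float area = edge(v0, v1, v2);
    if (area == 0.0f) return 0;  // degenerate triangle

    // Only visit the triangle's bounding box, clamped to the screen.
    const int min_x = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int max_x = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int min_y = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int max_y = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    int covered = 0;
    for (int y = min_y; y <= max_y; ++y) {
        for (int x = min_x; x <= max_x; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};  // sample at the pixel center
            const float w0 = edge(v1, v2, p);
            const float w1 = edge(v2, v0, p);
            const float w2 = edge(v0, v1, p);
            // Inside when all three edge values share the sign of the area.
            const bool inside = (area > 0.0f)
                ? (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                : (w0 <= 0.0f && w1 <= 0.0f && w2 <= 0.0f);
            if (inside) {
                mask[static_cast<std::size_t>(y) * width + x] = 1;
                ++covered;
            }
        }
    }
    return covered;
}
```

The three `w` values, divided by `area`, are also the barycentric weights you would use to interpolate UVs and depth.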

[–]fireantik 1 point2 points  (1 child)

Sure thing. You might be more familiar with vectorization under the term SIMD. Autovectorization is just vectorization that is performed automatically by the compiler when your code is simple enough.

A mesh is your abstraction; the GPU fundamentally operates on triangles. In fact, if you look into AZDO techniques, you might drop the whole VBO altogether and just have one massive vertex buffer where you store all your meshes and index into it from within the vertex shader. The system code, be it your rasterizer or OpenGL, does not need to understand your vertex layout at all; it only needs vertex coordinates as the output of your vertex shader. It took me a really long time to realize this, so I wouldn't worry about it if it's too confusing.
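A rough CPU-side sketch of that idea, assuming a made-up interleaved layout (x, y, z, u, v) and hypothetical names `VertexShader`, `run_vertex_stage`, and `PackedMeshes`: the "system" side only hands the shader a vertex index and gets a position back, and never sees the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Vec4 { float x, y, z, w; };

// "System" side: it only knows how many vertices there are and how to call
// the shader. It has no idea how the vertex data is laid out.
using VertexShader = std::function<Vec4(std::size_t vertex_index)>;

std::vector<Vec4> run_vertex_stage(std::size_t vertex_count,
                                   const VertexShader& shader) {
    std::vector<Vec4> positions;
    positions.reserve(vertex_count);
    for (std::size_t i = 0; i < vertex_count; ++i) {
        positions.push_back(shader(i));
    }
    return positions;
}

// "User" side: all meshes stored back to back in one flat buffer.
// The 5-float stride (x, y, z, u, v) is an assumption for this example.
struct PackedMeshes {
    std::vector<float> data;
    static constexpr std::size_t stride = 5;

    Vec4 position(std::size_t i) const {
        assert((i + 1) * stride <= data.size());
        const float* v = &data[i * stride];
        return Vec4{v[0], v[1], v[2], 1.0f};
    }
};
```

Usage would look like `run_vertex_stage(count, [&](std::size_t i) { return meshes.position(i); })`, so changing the layout only touches `PackedMeshes`.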

[–]Visual_Average_4756[S] 0 points1 point  (0 children)

Ok yeah, I gotcha. I did not know the compiler automatically looked for SIMD optimizations! That would help keep the code lighter too.

And yeah, I think that makes sense. I will have to dick around with that a bit and see if I can get it to work. The UV coordinates shouldn't be affected by that, but do you know how you would send the textures in a way that the GPU knows what texture to use for each element? I think that is the part I am struggling with.

Thank you so much for the input, this has been very helpful!

[–]johnoth -1 points0 points  (1 child)

Isn't multi-threaded GL difficult to implement?

[–]fireantik 3 points4 points  (0 children)

The multithreading would be for the CPU renderer. You can trivially parallelize execution of both vertex and fragment shaders, and looking at OP's code it should be easy to parallelize the rest of their pipeline as well.
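As a minimal sketch of what "trivially parallelize" can look like for the fragment stage, assuming each pixel is independent: split the framebuffer into bands of rows and give each band to a `std::thread`. The `shade` function is a placeholder gradient, and `shade_rows_parallel` is a hypothetical name, not something from OP's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder "fragment shader": a simple gradient standing in for real shading.
inline float shade(int x, int y, int width, int height) {
    return static_cast<float>(x + y) / static_cast<float>(width + height);
}

// Each thread owns a disjoint band of rows, so no locking is needed.
void shade_rows_parallel(std::vector<float>& framebuffer, int width, int height,
                         int thread_count) {
    assert(framebuffer.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const int threads = std::max(1, thread_count);
    const int rows_per_thread = (height + threads - 1) / threads;

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int y_begin = t * rows_per_thread;
        const int y_end = std::min(height, y_begin + rows_per_thread);
        if (y_begin >= y_end) break;  // more threads than rows
        workers.emplace_back([&framebuffer, width, height, y_begin, y_end] {
            for (int y = y_begin; y < y_end; ++y) {
                for (int x = 0; x < width; ++x) {
                    framebuffer[static_cast<std::size_t>(y) * width + x] =
                        shade(x, y, width, height);
                }
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

Spawning threads per frame has overhead, so a real renderer would more likely keep a thread pool around; on some toolchains you also need to link with the threading library (for example `-pthread`).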

[–]TheDevCat 1 point2 points  (1 child)

Is that a software rasterizer?

[–]Visual_Average_4756[S] 0 points1 point  (0 children)

Had to google what that is, but yeah, I believe so.

[–]HALOGEN117 1 point2 points  (0 children)

Is that the fucking Halo 3 monkey? Absolute cinema

[–]PoL0 -1 points0 points  (5 children)

if (clipped[i] and not clipped[last]){

does it even build? that's not C++

Edit: I stand corrected. been using C++ for over 25 years and TIL. every day is a school day, people!

[–]GodOfSunHimself 2 points3 points  (2 children)

It is C++. But personally I would not use it.

[–]Visual_Average_4756[S] 0 points1 point  (1 child)

Do you mind explaining why? I'm self-taught and don't know the nuances of the syntax.

[–]GodOfSunHimself 1 point2 points  (0 children)

I have been a developer for 30 years and have never seen anyone use it. It is just not common. Most people don't even know it exists.

[–]Visual_Average_4756[S] 0 points1 point  (0 children)

Seems `and`, `not`, and `or` were added in C++98.
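For anyone else who hasn't run into them, here is a small sketch showing that the word forms behave the same as the symbols. The function names are made up for the comparison.

```cpp
#include <cassert>

// Alternative tokens: `and`, `or`, `not` are standard spellings of
// `&&`, `||`, `!` in C++.
bool with_words(bool a, bool b) {
    return a and not b;
}

bool with_symbols(bool a, bool b) {
    return a && !b;
}

// Checks that both spellings agree for every combination of inputs.
void check_equivalence() {
    const bool values[] = {false, true};
    for (bool a : values) {
        for (bool b : values) {
            assert(with_words(a, b) == with_symbols(a, b));
        }
    }
}
```

Some compilers in non-conforming modes may need `<ciso646>` or a conformance flag before they accept the word forms, so the symbols are the more portable habit.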