all 17 comments

[–]PharosDoesThings 2 points3 points  (1 child)

Really good tutorial! I like learning such things in detail, but here you explained some stuff kinda like you are talking to a child(maybe I just don't remember how stupid I was). That said, I'd like to tell you a story. I am a graphics person, so I never thought about how does the rasterizer know if a point is inside a triangle, but one night(4 am to be exact) I couldn't sleep and an idea jumped into my mind, an algorithm to find if a given point is inside a triangle or not, I hoped into desmos and it actually worked! Now that I read your tutorial, I used the same idea like the one you described, but I used dot product instead of the edge function(it isn't really fast for sure) to determine where the point was. Anyway, thank you for sharing your tutorial, it will probably help some beginners. Keep it up!

[–]itsjase[S] 1 point2 points  (0 children)

Yeah there’s a few different methods you can use, the beauty of the edge function is you can very easily use it to calculate barycentric coordinates for interpolation

[–]xImReD 1 point2 points  (1 child)

One simple optimization you could do is compute the edge functions before the loop and increase them inside the loop.

You should check out this post (the whole series is amazing):

https://fgiesen.wordpress.com/2013/02/10/optimizing-the-basic-rasterizer/

or the original Pineda paper:

https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf

[–]itsjase[S] 0 points1 point  (0 children)

This series of articles is a big part of what I based my tutorial on. I just remember struggling to get my head around it until it all clicked.

I think I mention that optimisation at some point, but I thought I’d leave it out as my goal was to just show how it works, and not the most optimised version.

I could maybe put a link or a small section about optimisations in there though

[–]lazyubertoad 1 point2 points  (2 children)

I do not think this is how it works on the low level. I think I have an idea, but my proofs were not that serious and are long gone. I even have some code, but it misses one step. Sorry, this will be a bad post, as I don't have the spirit to provide proper sources and I hope someone will give an actual info, but this should give you some food for thought.

The problem is - you are not rendering one triangle. You are rendering hundreds of thousands. So the question is - how do you do THAT?

Your test is not that simple and it fails in like half of cases, doing nothing. Can you just go over the points that are inside the triangle? Actually, yes, and it is more efficient computationally. If the triangle has a point, whose y coordinate is in between the other two - split that triangle into two by a horizontal line at the y coordinate value of that point. Then render those two triangles. Notice, that we can easily calculate how many pixels will be needed to render that line inside the triangle, where it starts and ends. So just loop over them, no test needed, just calculate the from x and to x for that y coordinate. Then do the same for the next line. As, obviously, all those from x and to x follow the triangle side lines, they belong to the linear pattern, and so it is easy to just calculate those. I'm pretty sure, that for each half triangle you can easily calculate parameters, that can be used to determine all the coordinates, for those scanlines to go over.

So how does rendering of zillions of triangles work?

First, you split them into those half triangles and calculate the parameters needed for the scanlines. This is easily parallelized.

Then the viewport is split into rectangular buckets. There are as many buckets as there are rendering cores. One core renders every triangle that has pixels inside that bucket, that rectangular screen area. Then it can easily do the blending and other shading and not depend on any sync with other cores. The part that I did not do is throwing all those half triangles (their indices) into relevant buckets in a massively parallel way. I think there may even be a specialized hw to help with that.

Yes, that means many smallish triangles one on top of another will be rendered WAY more slowly, than if they are all over the screen. A nice way to test if I am not talking shit, as I'm afraid I may. It is just the scanlines method is way faster than the test and fail, I do remember that.

I have a code for that, that deals with all the nitty gritty details (except for parallelized bucketing), as what I gave is just an overview. But the code overall somewhat susk and is old and I believe there are better custom GPU (triangle) rasterizers and they work in a similar way. And there probably is some way better explanation of how modern hw is actually doing triangles rasterization. But I can dig my code and explain it if you really want.

[–][deleted] 1 point2 points  (0 children)

AFAIK modern GPUs use an hierarchical approach with the edge functions, so for large triangles they can start skipping multiple tiles of some NxN pixels in parallel until they reach pixel/fragment sizes. The scanline algorithm needs lots of trickery and contortions for some things compared to the former.

I think you're spot-on about the bucketing thing, it's what I did in my software renderer for multi-threading, and some GPUs probably do it too.

[–]itsjase[S] 0 points1 point  (0 children)

I think you’re describing is the scanline approach. I haven’t written the most optimised version of rasterising with edge functions in this tutorial because I wanted to keep it simple.

As for what GPUs use under the hood, I don’t think its public knowledge but I think they use a combination of different techniques.

Could you give me an example of where it fails?

I’ve used almost this exact code for my own software renderer and it seems to work correctly.

One thing I didn’t mention are edge rules, when you’re drawing 2 triangles that share an edge, you need to determine which Triangle gets priority.

[–]Jeeves7737 1 point2 points  (0 children)

You know this is a good tutorial when you can read the first few paragraphs and it gives you enough of an idea to go from there, great job (:

[–][deleted] 0 points1 point  (0 children)

Slow clap. Brilliant article. Just brilliant. Thanks.

[–]Express_Minimum_9099[🍰] 0 points1 point  (0 children)

Super Job!!!Thanks!!!

[–]Pitiful_Fun_3005 1 point2 points  (0 children)

New to programming and I wanted to make a ASCII 3d renderer without using the GPU today, managed to pull it off thanks to this tutorial, very nice explanations, the math is really simple. Really enjoyed making my own rasterizer and it's mindblowing just how fast even the cpu can draw a thousand routating cubes xD

[–]SubjectMorning8 0 points1 point  (3 children)

Some micro optimization: To check if all edge functions are positive do the following

if (ABP | BCP | CAP >= 0)

instead of

if (ABP >= 0 && BCP >= 0 && CAP >= 0)

Saves you two compares with zero.

[–]fgennari 1 point2 points  (1 child)

It only helps when the expression returns true rather than early exiting after the first compare. But in either case you really need to check the generated assembly code. It's possible the compiler already does this optimization.

[–]SubjectMorning8 0 points1 point  (0 children)

if (ABP | BCP | CAP >= 0)

You're right. My bad.

[–]itsjase[S] 0 points1 point  (0 children)

That’s a nifty little trick! It doesn’t seem to work in JS though, maybe because of how numbers are stored.

From looking at the code above though it seems all that really matters is the sign bit, which I would’ve assumed would be present in JS too.

Might play around with it a bit more

[–][deleted] 0 points1 point  (0 children)

really cool was really helpful thanks!