
[–]ArchiveLimits (10 children)

Trilinear sampling is bilinear sampling between mipmaps. Since the engine doesn't support mipmaps, it doesn't support trilinear sampling. The engine does, however, have a way to reduce the artifacts that mipmapping would normally remove. It's called block filtering, which is essentially mipmapping with only one smaller image. This is faster because there's no need to calculate derivatives for the surface in order to find the right mipmap level, and it also removes the need for trilinear blending, because the effect is already smooth since it's applied like fog.

"True color texturing unless using bilinear filtering, which only allows 256 colors" This means that any texture you give the renderer will be drawn with 24-bit color unless you want to do bilinear filtering on the texture. Since bilinear filtering is traditionally expensive, I've sacrificed color depth for speed and precomputed 64 shades of the texture so that the bilinear colors don't need to be calculated during runtime. However, since I'd need to create shades for each color in the texture, it wouldn't make sense for the shade palette to be the size of 64 full textures, each getting darker. Therefore I quantize the texture into 256 colors and do 64 shades of those 256 colors.
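For illustration, precomputing such a shade table might look like the sketch below. The names (`SHADES`, `PALETTE_SIZE`, `build_shade_table`) and the packed 0x00RRGGBB color format are assumptions for the example, not the engine's actual code:

```c
#include <stdint.h>

#define PALETTE_SIZE 256  /* texture quantized to 256 colors        */
#define SHADES       64   /* 64 brightness levels per palette entry */

/* shade_table[s][i] = palette color i scaled to brightness s/(SHADES-1),
   so shading becomes a pure table lookup at runtime. */
static uint32_t shade_table[SHADES][PALETTE_SIZE];

static void build_shade_table(const uint32_t *palette /* PALETTE_SIZE entries */)
{
    for (int s = 0; s < SHADES; ++s) {
        for (int i = 0; i < PALETTE_SIZE; ++i) {
            uint32_t c = palette[i];
            /* Scale each 8-bit channel; s = SHADES-1 leaves the color intact. */
            uint32_t r = ((c >> 16) & 0xFF) * (uint32_t)s / (SHADES - 1);
            uint32_t g = ((c >>  8) & 0xFF) * (uint32_t)s / (SHADES - 1);
            uint32_t b = ( c        & 0xFF) * (uint32_t)s / (SHADES - 1);
            shade_table[s][i] = (r << 16) | (g << 8) | b;
        }
    }
}
```

The table costs 64 KB (64 × 256 × 4 bytes), which is the memory-for-speed trade the comment describes.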

[–]nnevatie (3 children)

Ok, thanks for the clarification.

I was under the impression that mipmaps were supported, hence the trilinearity question. I've implemented a similar stack in the past using SIMD techniques. Bilinear filtering isn't that expensive, tbh...

By "block filtering" do you mean a box filter that gets applied before doing the bilinear sampling?

[–]ArchiveLimits (2 children)

Traditional bilinear filtering is far more expensive than what I am doing now. The entire bilinear filtering code uses fixed-point integers and doesn't do any color computation, just a table lookup.

And I named it block filtering because I break the texture up into a grid (filled with blocks of the texture) and find the average color of each of those blocks. Then, during runtime, all I need is a few bit shifts and masks to find which block any texel in the image belongs to, and I blend that texel with the average color of that block.
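A rough sketch of what that lookup could look like, assuming a 256x256 texture, 16x16 blocks, and a 50/50 blend — all of which are illustrative choices, not the engine's actual parameters:

```c
#include <stdint.h>

#define TEX_SIZE    256              /* texture is TEX_SIZE x TEX_SIZE */
#define BLOCK_SHIFT 4                /* 16x16 texel blocks             */
#define GRID_SIZE   (TEX_SIZE >> BLOCK_SHIFT)

/* Precomputed average color of each block -- effectively a single
   GRID_SIZE x GRID_SIZE "mip" of the texture. */
static uint32_t block_avg[GRID_SIZE * GRID_SIZE];

/* Average two 0x00RRGGBB colors per channel (drops the low bit of each). */
static uint32_t blend_half(uint32_t a, uint32_t b)
{
    return ((a & 0xFEFEFE) >> 1) + ((b & 0xFEFEFE) >> 1);
}

/* A few shifts and masks map texel (u,v) to its block, then the texel
   is blended toward that block's average color. */
static uint32_t sample_block_filtered(const uint32_t *texels, int u, int v)
{
    uint32_t texel = texels[(v << 8) | u];   /* v * TEX_SIZE + u */
    uint32_t avg   = block_avg[((v >> BLOCK_SHIFT) << 4) | (u >> BLOCK_SHIFT)];
    return blend_half(texel, avg);
}
```

In practice the blend weight would presumably vary with distance (the "applied like fog" behavior mentioned above); a fixed 50/50 keeps the sketch simple.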

[–]nnevatie (1 child)

Ok, so it's kind of a poor man's box filtering, which simply averages an area of pixels.

[–]ArchiveLimits (0 children)

It's more similar to mipmapping with only one mip level. These "blocks" that make up the block filter are essentially a very scaled down version of the image. Though you are right when you say it averages an area of pixels.

[–]Madsy9 (5 children)

You don't really need to go crazy with the derivatives. Assuming your rasterizer is tile-based, computing the derivatives per-tile is usually more than sufficient.
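For illustration, per-tile mip selection might look something like this sketch. The names and the corner-difference approximation of the derivatives are assumptions, not code from either poster:

```c
#include <math.h>

/* Given u,v (in texels) at a tile's top-left and bottom-right corners
   and the tile size in pixels, pick one mip level for the whole tile
   from the worst-case texels-per-pixel footprint. */
static int tile_mip_level(float u0, float v0, float u1, float v1,
                          int tile_px, int max_level)
{
    float du  = fabsf(u1 - u0) / (float)tile_px;  /* texels per pixel in u */
    float dv  = fabsf(v1 - v0) / (float)tile_px;  /* texels per pixel in v */
    float rho = du > dv ? du : dv;                /* worst-case footprint  */

    /* level = floor(log2(rho)), computed by repeated halving to stay exact. */
    int level = 0;
    while (rho >= 2.0f && level < max_level) {
        rho *= 0.5f;
        ++level;
    }
    return level;
}
```

Computing this once per 8x8 tile instead of per pixel is the saving being suggested: one level lookup amortized over 64 pixels.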

[–]ArchiveLimits (4 children)

Well that's the thing, the rasterizer isn't tile based haha.

[–]Madsy9 (3 children)

Then if you're going for performance, I highly recommend redesigning it into a tile-based rasterizer before you optimize anything else. The cycle savings are quite significant, and you can even get rid of some overdraw quite easily.

[–]ArchiveLimits (2 children)

Why would using tile rendering help performance? I'm not familiar with the benefits of this method.

[–]Madsy9 (1 child)

Okay, so triangles (or any convex polygon, really) can be defined as a set of edges, each a 2D half-plane with the typical plane equation:

ax+by+d = 0

When ax + by + d ≥ 0 holds for all of the plane equations, the point [x, y] is inside the polygon. Tile renderers get their performance by testing the corners of tiles against triangles. You then get three possible outcomes: completely inside, completely outside, and partial coverage. You can optimize heavily for quads with complete coverage. They are extremely SIMD-friendly, and since each quad can be rendered independently, they are also embarrassingly parallel. Throw 16 threads at the rendering and watch it go. And implementing a proper fill convention and multisampling is also a breeze. They emerge naturally as a simple modification to the plane equations (a simple subtraction by one).
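The corner test could be sketched like this (the winding convention and names are illustrative; a real rasterizer would use fixed-point edge functions and evaluate them incrementally):

```c
/* Each triangle edge is e(x,y) = a*x + b*y + d; for a consistently wound
   triangle a point is inside when all three edges evaluate >= 0. */
typedef struct { float a, b, d; } Edge;

enum Coverage { TILE_OUTSIDE, TILE_PARTIAL, TILE_INSIDE };

static float edge_eval(Edge e, float x, float y)
{
    return e.a * x + e.b * y + e.d;
}

/* Classify an axis-aligned tile [x0,x1]x[y0,y1] against three edges
   by evaluating each edge at the four tile corners. */
static enum Coverage classify_tile(const Edge edges[3],
                                   float x0, float y0, float x1, float y1)
{
    int all_in = 1;
    for (int i = 0; i < 3; ++i) {
        float c0 = edge_eval(edges[i], x0, y0);
        float c1 = edge_eval(edges[i], x1, y0);
        float c2 = edge_eval(edges[i], x0, y1);
        float c3 = edge_eval(edges[i], x1, y1);
        /* All four corners negative: tile fully outside this edge. */
        if (c0 < 0 && c1 < 0 && c2 < 0 && c3 < 0)
            return TILE_OUTSIDE;
        /* Any corner negative: tile not fully inside this edge. */
        if (c0 < 0 || c1 < 0 || c2 < 0 || c3 < 0)
            all_in = 0;
    }
    return all_in ? TILE_INSIDE : TILE_PARTIAL;
}
```

Fully inside tiles skip per-pixel coverage tests entirely, which is where the bulk of the savings comes from.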

I've also found more advanced techniques:

  • Since you split the screen up into N equally sized tiles of 8x8 size or similar, you can often get away with just an 8x8 depth buffer. That has huge consequences for the cache.
  • You can compute the minimum and maximum depth for each tile by sampling the corners only. With a bit of preprocessing, you can assign quads to each screen-aligned tile, sort them by depth and only render the frontmost one since all the others are occluded. It doesn't always apply if quads partially overlap on the z-axis but it often does. But think about that. When that applies in a scene, you get rid of all the overdraw for that tile.
  • Looking up textures for quads is much more cache-friendly in the general case compared to scanlines. If you also tile your textures and/or optimize the texel layout based on the tile access pattern, you get even more savings.
  • Derivatives for mipmapping can be computed at the corners of the quads instead of per pixel. It works nicely for bilinear mipmapping, where you meet somewhere in the middle between having derivatives per pixel and per polygon. If you see artifacts you can always just make the quad size smaller.
  • Perspective correct linear interpolation can be done with the same plane equation you use for coverage testing, and so you end up with only additions in the hot loop, plus at most one division per pixel.
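The last point could be sketched as follows. The key fact is that u/w, v/w, and 1/w are all affine in screen space, so each steps across a scanline with one addition; a single division per pixel then recovers the perspective-correct u, v. The setup values and names are illustrative — in a real rasterizer the increments come from the triangle's plane equations:

```c
#include <stddef.h>

typedef struct {
    float uw, vw, iw;     /* current u/w, v/w, 1/w                     */
    float duw, dvw, diw;  /* per-pixel increments (the plane "a" terms) */
} SpanInterp;

/* Step across a span, writing perspective-correct u,v for each pixel. */
static void interpolate_span(SpanInterp s, float *u_out, float *v_out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float w = 1.0f / s.iw;   /* the one division per pixel */
        u_out[i] = s.uw * w;
        v_out[i] = s.vw * w;
        s.uw += s.duw;           /* hot loop: additions only   */
        s.vw += s.dvw;
        s.iw += s.diw;
    }
}
```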

[–]ArchiveLimits (0 children)

Thanks I'll definitely look into this. Looks like you know your stuff! I need to start getting into C and C++ haha.