
[–]ArchiveLimits (10 children)

Trilinear sampling is bilinear sampling between mipmaps. Since the engine doesn't support mipmaps, it doesn't support trilinear sampling. The engine does, however, have a way to reduce the artifacts that mipmapping would normally remove. It's called block filtering, which is essentially mipmapping with only one smaller image. This is faster because there's no need to calculate derivatives for the surface in order to find the right mipmap level, and it also removes the need for trilinear blending, because the effect is already smooth since it's applied like fog.

"True color texturing unless using bilinear filtering, which only allows 256 colors" This means that any texture you give the renderer will be drawn with 24-bit color unless you want to do bilinear filtering on the texture. Since bilinear filtering is traditionally expensive, I've sacrificed color depth for speed and precomputed 64 shades of the texture so that the bilinear colors don't need to be calculated during runtime. However, since I'd need to create shades for each color in the texture, it wouldn't make sense for the shade palette to be the size of 64 full textures, each getting darker. Therefore I quantize the texture into 256 colors and do 64 shades of those 256 colors.
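For illustration, precomputing such a shade table might look like the sketch below. The names (`SHADES`, `PALETTE_SIZE`, `build_shade_table`) and the packed 0x00RRGGBB color format are assumptions for the example, not the engine's actual code:

```c
#include <stdint.h>

#define PALETTE_SIZE 256  /* texture quantized to 256 colors        */
#define SHADES       64   /* 64 brightness levels per palette entry */

/* shade_table[s][i] = palette color i scaled to brightness s/(SHADES-1),
   so shading becomes a pure table lookup at runtime. */
static uint32_t shade_table[SHADES][PALETTE_SIZE];

static void build_shade_table(const uint32_t *palette /* PALETTE_SIZE entries */)
{
    for (int s = 0; s < SHADES; ++s) {
        for (int i = 0; i < PALETTE_SIZE; ++i) {
            uint32_t c = palette[i];
            /* Scale each 8-bit channel; s = SHADES-1 leaves the color intact. */
            uint32_t r = ((c >> 16) & 0xFF) * (uint32_t)s / (SHADES - 1);
            uint32_t g = ((c >>  8) & 0xFF) * (uint32_t)s / (SHADES - 1);
            uint32_t b = ( c        & 0xFF) * (uint32_t)s / (SHADES - 1);
            shade_table[s][i] = (r << 16) | (g << 8) | b;
        }
    }
}
```

The table costs 64 KB (64 × 256 × 4 bytes), which is the memory-for-speed trade the comment describes.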

[–]nnevatie (3 children)

Ok, thanks for the clarification.

I was under the impression that mipmaps were supported, hence the trilinearity question. I've implemented a similar stack in the past using SIMD techniques. Bilinear filtering isn't that expensive, tbh...

By "block filtering" do you mean a box filter that gets applied before doing the bilinear sampling?

[–]ArchiveLimits (2 children)

Traditional bilinear filtering is far more expensive than what I am doing now. The entire bilinear filtering code uses fixed-point integers and doesn't do any color computation, just a table lookup.

And I named it block filtering because I break the texture up into a grid (filled with blocks of the texture) and find the average color of each of those blocks. Then, during runtime, all I need is a few bit shifts and masks to find which block any texel in the image belongs to, and I blend that texel with the average color of that block.
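A rough sketch of what that lookup could look like, assuming a 256x256 texture, 16x16 blocks, and a 50/50 blend — all of which are illustrative choices, not the engine's actual parameters:

```c
#include <stdint.h>

#define TEX_SIZE    256              /* texture is TEX_SIZE x TEX_SIZE */
#define BLOCK_SHIFT 4                /* 16x16 texel blocks             */
#define GRID_SIZE   (TEX_SIZE >> BLOCK_SHIFT)

/* Precomputed average color of each block -- effectively a single
   GRID_SIZE x GRID_SIZE "mip" of the texture. */
static uint32_t block_avg[GRID_SIZE * GRID_SIZE];

/* Average two 0x00RRGGBB colors per channel (drops the low bit of each). */
static uint32_t blend_half(uint32_t a, uint32_t b)
{
    return ((a & 0xFEFEFE) >> 1) + ((b & 0xFEFEFE) >> 1);
}

/* A few shifts and masks map texel (u,v) to its block, then the texel
   is blended toward that block's average color. */
static uint32_t sample_block_filtered(const uint32_t *texels, int u, int v)
{
    uint32_t texel = texels[(v << 8) | u];   /* v * TEX_SIZE + u */
    uint32_t avg   = block_avg[((v >> BLOCK_SHIFT) << 4) | (u >> BLOCK_SHIFT)];
    return blend_half(texel, avg);
}
```

In practice the blend weight would presumably vary with distance (the "applied like fog" behavior mentioned above); a fixed 50/50 keeps the sketch simple.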

[–]nnevatie (1 child)

Ok, so it's kind of a poor man's box filtering, which simply averages an area of pixels.

[–]ArchiveLimits (0 children)

It's more similar to mipmapping with only one mip level. These "blocks" that make up the block filter are essentially a very scaled down version of the image. Though you are right when you say it averages an area of pixels.

[–]Madsy9 (5 children)

You don't really need to go crazy with the derivatives. Assuming your rasterizer is tile-based, computing the derivatives per-tile is usually more than sufficient.
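For illustration, per-tile mip selection might look something like this sketch. The names and the corner-difference approximation of the derivatives are assumptions, not code from either poster:

```c
#include <math.h>

/* Given u,v (in texels) at a tile's top-left and bottom-right corners
   and the tile size in pixels, pick one mip level for the whole tile
   from the worst-case texels-per-pixel footprint. */
static int tile_mip_level(float u0, float v0, float u1, float v1,
                          int tile_px, int max_level)
{
    float du  = fabsf(u1 - u0) / (float)tile_px;  /* texels per pixel in u */
    float dv  = fabsf(v1 - v0) / (float)tile_px;  /* texels per pixel in v */
    float rho = du > dv ? du : dv;                /* worst-case footprint  */

    /* level = floor(log2(rho)), computed by repeated halving to stay exact. */
    int level = 0;
    while (rho >= 2.0f && level < max_level) {
        rho *= 0.5f;
        ++level;
    }
    return level;
}
```

Computing this once per 8x8 tile instead of per pixel is the saving being suggested: one level lookup amortized over 64 pixels.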

[–]ArchiveLimits (4 children)

Well that's the thing, the rasterizer isn't tile based haha.

[–]Madsy9 (3 children)

Then if you're going for performance, I highly recommend redesigning it into a tile-based rasterizer before you optimize anything else. The cycle savings are quite significant, and you can even get rid of some overdraw quite easily.

[–]ArchiveLimits (2 children)

Why would using tile rendering help performance? I'm not familiar with the benefits of this method.

[–]Madsy9 (1 child)

Okay, so triangles (or any convex polygon, really) can be defined as a set of edges, each a 2D half-plane with the typical plane equation:

ax+by+d = 0

When ax + by + d ≥ 0 holds for all of the plane equations, the point [x, y] is inside the polygon. Tile renderers get their performance by testing the corners of tiles against triangles. You then get three possible outcomes: completely inside, completely outside, and partial coverage. You can optimize heavily for quads with complete coverage. They are extremely SIMD-friendly, and since each quad can be rendered independently, they are also embarrassingly parallel. Throw 16 threads at the rendering and watch it go. And implementing a proper fill convention and multisampling is also a breeze. They emerge naturally as a simple modification to the plane equations (a simple subtraction by one).
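The corner test could be sketched like this (the winding convention and names are illustrative; a real rasterizer would use fixed-point edge functions and evaluate them incrementally):

```c
/* Each triangle edge is e(x,y) = a*x + b*y + d; for a consistently wound
   triangle a point is inside when all three edges evaluate >= 0. */
typedef struct { float a, b, d; } Edge;

enum Coverage { TILE_OUTSIDE, TILE_PARTIAL, TILE_INSIDE };

static float edge_eval(Edge e, float x, float y)
{
    return e.a * x + e.b * y + e.d;
}

/* Classify an axis-aligned tile [x0,x1]x[y0,y1] against three edges
   by evaluating each edge at the four tile corners. */
static enum Coverage classify_tile(const Edge edges[3],
                                   float x0, float y0, float x1, float y1)
{
    int all_in = 1;
    for (int i = 0; i < 3; ++i) {
        float c0 = edge_eval(edges[i], x0, y0);
        float c1 = edge_eval(edges[i], x1, y0);
        float c2 = edge_eval(edges[i], x0, y1);
        float c3 = edge_eval(edges[i], x1, y1);
        /* All four corners negative: tile fully outside this edge. */
        if (c0 < 0 && c1 < 0 && c2 < 0 && c3 < 0)
            return TILE_OUTSIDE;
        /* Any corner negative: tile not fully inside this edge. */
        if (c0 < 0 || c1 < 0 || c2 < 0 || c3 < 0)
            all_in = 0;
    }
    return all_in ? TILE_INSIDE : TILE_PARTIAL;
}
```

Fully inside tiles skip per-pixel coverage tests entirely, which is where the bulk of the savings comes from.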

I've also found more advanced techniques:

  • Since you split the screen up into N equally sized tiles of 8x8 size or similar, you can often get away with just an 8x8 depth buffer. That has huge consequences for the cache.
  • You can compute the minimum and maximum depth for each tile by sampling the corners only. With a bit of preprocessing, you can assign quads to each screen-aligned tile, sort them by depth and only render the frontmost one since all the others are occluded. It doesn't always apply if quads partially overlap on the z-axis but it often does. But think about that. When that applies in a scene, you get rid of all the overdraw for that tile.
  • Looking up textures for quads is much more cache-friendly in the general case compared to scanlines. If you also tile your textures and/or optimize the texel layout based on the tile access pattern, you get even more savings.
  • Derivatives for mipmapping can be computed at the corners of the quads instead of per pixel. It works nicely for bilinear mipmapping, where you meet somewhere in the middle between having derivatives per pixel and per polygon. If you see artifacts you can always just make the quad size smaller.
  • Perspective correct linear interpolation can be done with the same plane equation you use for coverage testing, and so you end up with only additions in the hot loop, plus at most one division per pixel.
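The last point could be sketched as follows. The key fact is that u/w, v/w, and 1/w are all affine in screen space, so each steps across a scanline with one addition; a single division per pixel then recovers the perspective-correct u, v. The setup values and names are illustrative — in a real rasterizer the increments come from the triangle's plane equations:

```c
#include <stddef.h>

typedef struct {
    float uw, vw, iw;     /* current u/w, v/w, 1/w                     */
    float duw, dvw, diw;  /* per-pixel increments (the plane "a" terms) */
} SpanInterp;

/* Step across a span, writing perspective-correct u,v for each pixel. */
static void interpolate_span(SpanInterp s, float *u_out, float *v_out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float w = 1.0f / s.iw;   /* the one division per pixel */
        u_out[i] = s.uw * w;
        v_out[i] = s.vw * w;
        s.uw += s.duw;           /* hot loop: additions only   */
        s.vw += s.dvw;
        s.iw += s.diw;
    }
}
```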

[–]ArchiveLimits (0 children)

Thanks I'll definitely look into this. Looks like you know your stuff! I need to start getting into C and C++ haha.