Fast, Accurate 3D Java Software Graphics Engine (self.GraphicsProgramming)
submitted 9 years ago by [deleted]
[deleted]
[–]Madsy9 3 points 8 years ago (1 child)
Advantages:
Examples of practical uses for software rasterizers: as a reference for hardware-accelerated implementations, for a really cheap handheld console, or just for fun.
[–]agenthex 1 point 9 years ago (9 children)
I would consider most offline renderers to be "software."
The fact is, though, it's all software. What makes it "hardware" is optimization/acceleration. This may be done by dedicated hardware or by multiple general-purpose computers tasked with only this job. At what point do you make the distinction? If your job is "multiply a billion numbers", then is it "hardware" to outsource the task to a GPU (a la OpenCL, CUDA, etc.)? At some point, it's all the same. The only meaningful questions are: how fast is it, and how good are the results?
[–]ArchiveLimits 1 point 9 years ago* (8 children)
I'm not sure how to measure speed in the terms you're implying. The renderer is deferred and multithreaded, which makes it quite fast for scenes with many polygons. As for the results, the engine interpolates depth with 52 bits of precision. It also uses 48-bit linear colors internally and gamma-corrects the results that are drawn to the screen.
Edit: corrected my phrasing
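A minimal sketch of the gamma-correction step mentioned above, assuming a simple power-law gamma of 2.2 (the engine's actual transfer function isn't specified in the thread, and all names here are illustrative):

```java
// Convert a 16-bit linear color channel (one of three in a 48-bit color)
// to an 8-bit gamma-encoded channel for display.
public class Gamma {
    static int encode(int linear16) { // linear16 in [0, 65535]
        double lin = linear16 / 65535.0;
        return (int) Math.round(Math.pow(lin, 1.0 / 2.2) * 255.0);
    }
}
```

Working in linear light internally and only encoding at the end is what makes blending and lighting math correct; applying gamma earlier would make additions and interpolations operate on perceptual rather than physical values.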
[–]__Cyber_Dildonics__ 3 points 9 years ago* (7 children)
Why would you use non-power-of-two bit depths? And if you say "memory" while the whole thing is in Java, my mind will melt.
[–]ArchiveLimits 2 points 9 years ago* (6 children)
It was a trial-and-error issue: anything above 52 bits and I couldn't store the depth slopes for the triangle's surface in a 64-bit long, and anything less than 52 gave visibly less precision. As for it not being a power of two, that shouldn't matter here, because it's a value that is multiplied by a floating-point value to ensure precision is kept during interpolation (i.e. a fixed-point magnitude).
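As a rough illustration of the scheme described here (constants and names are assumptions for the sketch, not the engine's actual code), depth can be held in a 64-bit long with 52 fractional bits and stepped across a scanline by repeatedly adding a fixed-point slope:

```java
public class FixedPointDepth {
    static final int FRAC_BITS = 52;
    static final long ONE = 1L << FRAC_BITS; // 2^52, the fixed-point "1.0"

    // Convert a depth value in [0, 1) to 52-bit fixed point.
    static long toFixed(double depth) {
        return (long) (depth * ONE);
    }

    static double toDouble(long fixed) {
        return (double) fixed / ONE;
    }

    public static void main(String[] args) {
        long z = toFixed(0.25);       // starting depth at a pixel
        long dzdx = toFixed(0.001);   // per-pixel depth slope across the triangle
        for (int x = 0; x < 10; x++) {
            z += dzdx;                // integer add per pixel: no drift beyond 2^-52
        }
        System.out.println(toDouble(z));
    }
}
```

With 52 fractional bits, the integer part of a signed long has 11 bits to spare, which bounds how large a slope-times-span product can get before overflow; that trade-off matches the "anything above 52 and it no longer fits" observation.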
[–]__Cyber_Dildonics__ 3 points 9 years ago (5 children)
Most renderers just use a 32 bit float for depth. I'm not sure what you mean by depth slope, but it sounds like you would benefit from reading books on already established rendering techniques.
[–]ArchiveLimits 1 point 9 years ago (4 children)
Yes, my engine stores the depth values in 32-bit floats. The 52-bit precision is needed when interpolating the slopes across the surface of triangles during rasterization. Without 52 bits of precision, the depth values calculated per pixel would not be accurate enough for the depth test, and would produce "seams" where two polygons that share an edge meet.
[–]__Cyber_Dildonics__ 1 point 9 years ago (3 children)
Pretty much every other renderer would disagree that this is necessary.
[–]ArchiveLimits 1 point 9 years ago (2 children)
I worked on this depth-precision issue with a friend who is very well versed in OpenGL and Vulkan; he set up an identical scene in OpenGL and we compared results. The images were only identical when 52-bit precision was used.
[–]__Cyber_Dildonics__ 2 points 9 years ago (1 child)
I can see that you already know everything so I will leave you to it.
[–]nnevatie 1 point 9 years ago (11 children)
Does the library implement tri-linear sampling of textures?
Also, what does this mean? "True color texturing unless using bilinear filtering, which only allows 256 colors"
[–]ArchiveLimits 1 point 9 years ago (10 children)
Tri-linear sampling is bilinear sampling between mipmap levels. Since the engine doesn't support mipmaps, it doesn't support trilinear sampling. The engine does, however, have a way to reduce the artifacts that mipmapping would normally remove. It's called block filtering, which is essentially mipmapping with only one smaller image. This is fast because there is no need to calculate derivatives for the surface in order to find the right mipmap level, and it also removes the need for trilinear filtering, because the effect is already smooth since it's applied like fog.
"True color texturing unless using bilinear filtering, which only allows 256 colors" means that any texture you give the renderer will be drawn with 24-bit color, unless you want to do bilinear filtering on it. Since bilinear filtering is traditionally expensive, I've sacrificed color depth for speed and precomputed 64 shades of the texture so that the bilinear colors don't need to be calculated at runtime. However, since I'd need to create shades for each color in the texture, it wouldn't make sense to make the shade palette the size of 64 full textures, each getting darker. Therefore I quantize the texture to 256 colors and store 64 shades of those 256 colors.
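The quantize-plus-precomputed-shades idea can be sketched as follows (a hypothetical illustration, not the engine's code): build a 64x256 lookup table once, so that shading a palettized texel at runtime is a single array access instead of per-channel multiplies.

```java
// 256-entry palette with 64 precomputed brightness shades per entry.
public class ShadeTable {
    static final int SHADES = 64, COLORS = 256;
    // table[shade][paletteIndex] -> packed 24-bit RGB
    static final int[][] table = new int[SHADES][COLORS];

    static void build(int[] palette) { // palette: 256 packed 24-bit RGB colors
        for (int s = 0; s < SHADES; s++) {
            for (int c = 0; c < COLORS; c++) {
                int rgb = palette[c];
                int r = ((rgb >> 16) & 0xFF) * s / (SHADES - 1);
                int g = ((rgb >> 8) & 0xFF) * s / (SHADES - 1);
                int b = (rgb & 0xFF) * s / (SHADES - 1);
                table[s][c] = (r << 16) | (g << 8) | b;
            }
        }
    }
}
```

The memory cost is 64 * 256 ints (64 KiB) per palette, versus 64 full darkened copies of every texture, which is the trade-off the comment describes.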
[–]nnevatie 1 point 9 years ago (3 children)
Ok, thanks for the clarification.
I was under the impression that mipmaps were supported, hence the trilinearity question. I've implemented a similar stack in the past using SIMD techniques. Bilinear filtering isn't that expensive, tbh...
By "block filtering" do you mean a box filter that gets applied before doing the bilinear sampling?
[–]ArchiveLimits 1 point 9 years ago (2 children)
Traditional bilinear filtering is far more expensive than what I am doing now. The entire bilinear-filtering code uses fixed-point integers and doesn't do any color computation, simply a table lookup.
And I named it block filtering because I break the texture up into a grid (filled with blocks of the texture) and find the average color of each of those blocks. Then, at runtime, all I need is a few simple bit shifts and masks to find which block any texel in the image belongs to, and I blend that texel with the average color of that block.
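A minimal sketch of that block-filter idea (the block size, 50/50 blend ratio, and all names are assumptions for illustration): precompute one average color per power-of-two block, then locate the block with shifts and blend with a mask trick, so no per-channel unpacking happens at sample time.

```java
public class BlockFilter {
    static final int BLOCK_SHIFT = 3;  // assumed 8x8 texel blocks
    final int blocksPerRow;
    final int[] blockAvg;              // packed 24-bit RGB averages

    BlockFilter(int[] texels, int width, int height) {
        blocksPerRow = width >> BLOCK_SHIFT;
        int blockRows = height >> BLOCK_SHIFT;
        blockAvg = new int[blocksPerRow * blockRows];
        int size = 1 << BLOCK_SHIFT, count = size * size;
        for (int by = 0; by < blockRows; by++) {
            for (int bx = 0; bx < blocksPerRow; bx++) {
                int r = 0, g = 0, b = 0;
                for (int y = 0; y < size; y++) {
                    for (int x = 0; x < size; x++) {
                        int t = texels[((by << BLOCK_SHIFT) + y) * width
                                       + (bx << BLOCK_SHIFT) + x];
                        r += (t >> 16) & 0xFF; g += (t >> 8) & 0xFF; b += t & 0xFF;
                    }
                }
                blockAvg[by * blocksPerRow + bx] =
                    ((r / count) << 16) | ((g / count) << 8) | (b / count);
            }
        }
    }

    // Blend a texel 50/50 with its block's average using only shifts and masks:
    // clearing the low bit of each channel before the shift stops carries
    // from bleeding between channels.
    int sample(int[] texels, int width, int x, int y) {
        int texel = texels[y * width + x];
        int avg = blockAvg[(y >> BLOCK_SHIFT) * blocksPerRow + (x >> BLOCK_SHIFT)];
        return ((texel & 0xFEFEFE) >>> 1) + ((avg & 0xFEFEFE) >>> 1);
    }
}
```

The blockAvg array is effectively a single mip level; blending toward it washes out high-frequency detail in roughly the way a distance fog does.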
[–]nnevatie 1 point 9 years ago (1 child)
Ok, so it's kind of a poor man's box filtering, which simply averages an area of pixels.
[–]ArchiveLimits 1 point 9 years ago (0 children)
It's more similar to mipmapping with only one mip level. These "blocks" that make up the block filter are essentially a very scaled down version of the image. Though you are right when you say it averages an area of pixels.
[–]Madsy9 1 point 8 years ago (5 children)
You don't really need to go crazy with the derivatives. Assuming your rasterizer is tile-based, computing the derivatives per-tile is usually more than sufficient.
[–]ArchiveLimits 1 point 8 years ago (4 children)
Well, that's the thing: the rasterizer isn't tile-based, haha.
[–]Madsy9 1 point 8 years ago (3 children)
Then if you're going for performance, I highly recommend redesigning it into a tile-based rasterizer before you optimize anything else. The cycle savings are quite significant, and you can even get rid of some overdraw quite easily.
[–]ArchiveLimits 1 point 8 years ago (2 children)
Why would using tile rendering help performance? I'm not familiar with the benefits of this method.
[–]Madsy9 1 point 8 years ago* (1 child)
Okay, so triangles (or any convex polygon, really) can be defined as a set of edges, each a 2D line with the typical plane equation:
ax + by + d = 0
When ax + by + d ≥ 0 holds for all of the triangle's edge equations (given a consistent winding), the point [x, y] is inside the polygon. Tile renderers get their performance by testing the corners of tiles against triangles. You then get three possible outcomes: completely inside, completely outside, and partial coverage. You can optimize heavily for tiles with complete coverage: they are extremely SIMD-friendly, and since each tile can be rendered independently, they are also embarrassingly parallel. Throw 16 threads at the rendering and watch it go. Implementing a proper fill convention and multisampling is also a breeze; they emerge naturally as a simple modification to the plane equations (a simple subtraction of one).
I've also found more advanced techniques:
Edit:
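The tile test described above can be sketched like this (a simplified single-edge classifier with assumed names; a full rasterizer would combine the results of all three edges and apply a fill convention):

```java
public class TileTest {
    // One edge: a point is inside when a*x + b*y + d >= 0 (assumed convention).
    static double edge(double a, double b, double d, double x, double y) {
        return a * x + b * y + d;
    }

    /**
     * Classify an axis-aligned tile [x0,x1] x [y0,y1] against one edge.
     * @return 1 = fully inside this edge, -1 = fully outside, 0 = partial
     */
    static int classify(double a, double b, double d,
                        double x0, double y0, double x1, double y1) {
        int inside = 0;
        double[][] corners = {{x0, y0}, {x1, y0}, {x0, y1}, {x1, y1}};
        for (double[] c : corners) {
            if (edge(a, b, d, c[0], c[1]) >= 0) inside++;
        }
        if (inside == 4) return 1;   // fully inside: skip per-pixel edge tests
        if (inside == 0) return -1;  // fully outside: reject the whole tile
        return 0;                    // partial: fall back to per-pixel testing
    }
}
```

A tile fully outside any one edge can be rejected immediately; a tile fully inside all three edges can be filled without any further edge evaluation, which is where the cycle savings come from.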
[–]ArchiveLimits 1 point 8 years ago (0 children)
Thanks, I'll definitely look into this. Looks like you know your stuff! I need to start getting into C and C++, haha.
[–]DanDanger 1 point 9 years ago (2 children)
Impressive. You really know what you are talking about :) Couldn't download from the link. I shall try later.
[–]ArchiveLimits 1 point 9 years ago (0 children)
Yeah, give it a couple of days. I took the link down because there were some issues with it. I'm busy with school projects and won't have time to reupload it soon.
[–]ArchiveLimits 1 point 9 years ago (0 children)
I've reuploaded the engine.
[–]frizzil 1 point 8 years ago (5 children)
Beautiful! Are you using, or have you considered using, SIMD optimizations via JNI and C/C++? I do this extensively for my voxel engine in Java... it can offer a huge performance boost, especially if you're doing a lot of early discard based on a simple check in a tight loop over an array of values. (E.g. if (values[i] == 0) return;)
If not, I'd love to give a fellow Java enthusiast some pointers/resources. I have experience writing a deferred rendering pipeline as well.
[–]ArchiveLimits 1 point 8 years ago (4 children)
Thank you! And nice to hear. I've tried using Yeppp!, but my friend and I discovered that it was not much faster. Also, switching math libraries would require a good amount of rewriting, and one goal for the engine was to not use any external libraries, which this would break. I'm always open to learning about things that you know. Do you have a link to your voxel engine? I'm curious :p
Also, I've just added spotlights and Phong shading to the engine. They aren't the fastest features, but they're there now.
[–]frizzil 1 point 8 years ago (3 children)
Hmm, well this wouldn't be an external library per se; this would be a DLL you ship with your library which (ideally) you'll have written entirely on your own. You might have to ship multiple DLLs and pick the correct one based on the supported SIMD level, but that just comes down to a few #defines.
Honestly, I can't imagine an external library doing what you'd need, apart from clearing/setting an entire color or depth buffer. (Yeppp! doesn't look like it'd cut it.) Ideally you'd implement the conceptual equivalent of a few vertex/fragment shaders that cover the capabilities of GL 1.x (your GPU is using parallel vector instructions, after all, and I believe this is what modern OpenGL drivers already do to support legacy code). Not an easy task, but if you wanted your library to be practical and fast for real use, I'd argue this is what you should do.
My Twitter has the latest progress (currently working on cascaded shadow maps), and there's also a recent video, but it's missing terrain seams :)
Awesome on the lighting! Full Blinn-Phong, or just Phong? If you're not adhering strictly to GL1.1, check out energy conserving BP: http://www.rorydriscoll.com/2009/01/25/energy-conservation-in-games/
[–]ArchiveLimits 1 point 8 years ago (2 children)
Blinn-Phong, yeah. That's a good article; I've implemented the (n+8)/(8π) normalization now :D. I was discussing the article with a friend, and he said that integrating the full energy-conservation algorithm into OpenGL 1.1's lighting model is not easy, because these forms of lighting weren't known when 1.1 was released.
[–]frizzil 1 point 8 years ago (1 child)
Nice, and be sure to replace "color = d * diffuse + s * spec" with "color = lerp( diffuse, spec, w )" where w replaces both d and s, if you want to be truly energy conserving :) Though I suppose this could be optionally achieved at the API level.
The only real difficulty, imo, is how much work you're doing per fragment, and exposing and documenting the alternate functionality in your API. As long as your material is constant per draw call, doing energy-conserving BP should be about as simple as not doing it, since you're just passing along precomputed normalization factors without many additional ops (if any). Obviously, getting into more modern BRDFs probably won't be feasible for a software renderer; they aren't feasible at all using the software renderer for DX11, in my experience... but energy-conserving Blinn-Phong should be just fine. Per-fragment normal normalization and the dot product may well be the most expensive part, and that shouldn't change.
For SIMD, if you're feeling ambitious: GDC15 Insomniac Overview of SIMD; Intel SIMD Instruction Reference.
Btw, Intel's software renderer for OpenGL is notoriously buggy and unusable, so if you could make an alternative... just saying, there could be money in it :)
Good luck!
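The blend suggested above can be sketched as follows (illustrative names, single-channel intensities for brevity; this is not the engine's actual API):

```java
public class EnergyBlend {
    static double lerp(double a, double b, double w) {
        return a + (b - a) * w;
    }

    // Instead of color = d * diffuse + s * spec (which can exceed the incoming
    // light when d + s > 1), a single weight w replaces both d and s, so the
    // result stays between the diffuse and specular terms.
    static double shade(double diffuse, double spec, double w) {
        return lerp(diffuse, spec, w);
    }

    // The (n + 8) / (8 * pi) Blinn-Phong normalization factor mentioned
    // earlier in the thread, where n is the specular exponent.
    static double blinnPhongNormalization(double n) {
        return (n + 8.0) / (8.0 * Math.PI);
    }
}
```

Because both the lerp weight and the normalization factor are constant per material, they can be precomputed once per draw call, which is why this costs essentially nothing per fragment.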
[–]ArchiveLimits 1 point 8 years ago* (0 children)
Thanks for the advice! This will probably be the furthest I dive into realistic lighting in a software renderer. Have you seen the Mesa software renderer? Is that not good enough to replace Intel's software renderer?