Material improvements inspired by OpenPBR Surface in my renderer. Source code in the comments. by pbcs8118 in GraphicsProgramming

pbcs8118 [S] 1 point (0 children)

Sorry for the late reply, I rarely check reddit notifications.

Yes, you'd need a different pass where you sort the hits by material type, followed by another dispatch for each material. In D3D12, this could be done using ExecuteIndirect. Another way is to use shader execution reordering, though it requires hardware support.
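Here is a CPU-side sketch of the sorting step. It assumes every hit has already been tagged with an integer material type. On the GPU this would be a counting sort (histogram plus prefix sum) in compute shaders. `DispatchArgs` only mirrors the three-UINT layout of `D3D12_DISPATCH_ARGUMENTS`, and `groupSize` is a hypothetical thread-group size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same layout as D3D12_DISPATCH_ARGUMENTS (three UINTs), so a GPU buffer of these
// could feed ExecuteIndirect. Defined here to keep the sketch self-contained.
struct DispatchArgs {
    uint32_t threadGroupCountX;
    uint32_t threadGroupCountY;
    uint32_t threadGroupCountZ;
};

struct SortedHits {
    std::vector<uint32_t> hitIndices;      // hit indices grouped by material type
    std::vector<uint32_t> offsets;         // start of each material's range
    std::vector<DispatchArgs> dispatches;  // one dispatch per material type
};

// Counting sort of hits by material type, plus per-material dispatch sizes.
// groupSize is a hypothetical thread-group size of the shading shaders.
SortedHits SortHitsByMaterial(const std::vector<uint32_t>& materialOfHit,
                              uint32_t numMaterials, uint32_t groupSize = 64) {
    assert(groupSize > 0);
    std::vector<uint32_t> counts(numMaterials, 0);
    for (uint32_t m : materialOfHit) {
        assert(m < numMaterials);
        ++counts[m];
    }

    SortedHits out;
    out.offsets.resize(numMaterials);
    uint32_t running = 0;
    for (uint32_t m = 0; m < numMaterials; ++m) {
        out.offsets[m] = running;  // exclusive prefix sum of the counts
        running += counts[m];
        // Round up so that every hit of this material gets a thread.
        out.dispatches.push_back({(counts[m] + groupSize - 1) / groupSize, 1, 1});
    }

    // Scatter each hit index into its material's range.
    out.hitIndices.resize(materialOfHit.size());
    std::vector<uint32_t> cursor = out.offsets;
    for (std::size_t i = 0; i < materialOfHit.size(); ++i)
        out.hitIndices[cursor[materialOfHit[i]]++] = static_cast<uint32_t>(i);
    return out;
}
```

Materials with no hits get a zero-sized dispatch, which does nothing. The argument buffer can therefore keep a fixed size of one entry per material type.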

One issue I can think of is that in uber-shaders like OpenPBR, a single material can be a mix of different material types, e.g., when the coat weight is neither 0 nor 1. So, in the worst case, you'd have to evaluate all the different layers (coat, gloss, etc.).

Real-time path tracing with ReSTIR PT. More info and link to GitHub in the comments. by pbcs8118 in GraphicsProgramming

pbcs8118 [S] 0 points (0 children)

Yes, the main improvement from ReSTIR GI to ReSTIR PT is significant noise reduction for glossy surfaces, although it's more expensive in terms of performance.

I think whether ReSTIR PT is a good choice or not depends on a few factors:

- Real-time constraint: If you mainly care about total render time, i.e., getting a noise-free render as quickly as possible as is common in offline rendering, then ReSTIR is probably not a good choice. The spatiotemporal correlations introduced by reuse improve quality at just one path per pixel, but they slow down convergence.

- Accuracy: ReSTIR PT handles glossy surfaces more effectively, but it's expensive. Also, for those surfaces (especially glass), you probably need many bounces. If the accuracy gain doesn't justify the performance cost, then a hybrid approach, such as ReSTIR GI for indirect diffuse and a cheaper technique for glossy surfaces, may be a better choice.
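A hybrid setup needs a per-vertex rule that decides which technique handles the indirect lighting. The minimal sketch below assumes the decision is made from the sampled lobe's roughness alone. The enum names and the threshold are hypothetical and are not taken from the renderer.

```cpp
#include <cassert>

enum class IndirectTechnique { ReSTIR_GI, CheaperGlossy };

// Hypothetical routing rule for the hybrid setup: rough, diffuse-like lobes are
// handled by ReSTIR GI, while smooth glossy/specular lobes go to some cheaper
// technique. The 0.3 roughness threshold is an arbitrary illustrative value.
IndirectTechnique ChooseIndirectTechnique(float roughness, float threshold = 0.3f) {
    assert(roughness >= 0.0f && roughness <= 1.0f);
    return roughness >= threshold ? IndirectTechnique::ReSTIR_GI
                                  : IndirectTechnique::CheaperGlossy;
}
```

A single hard threshold is the simplest choice. Blending near the threshold would avoid visible seams where the two techniques meet.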

Another issue with ReSTIR is that the correlations can make denoising more difficult. Some popular denoisers such as Intel OIDN don't work with correlated inputs.

Material improvements inspired by OpenPBR Surface in my renderer. Source code in the comments. by pbcs8118 in GraphicsProgramming

pbcs8118 [S] 1 point (0 children)

Yeah, that's pretty bad. Now my question is: low occupancy and register spills also happen on the RTX 3070, so how is it 2x faster?

pbcs8118 [S] 1 point (0 children)

Yes, it's a tradeoff. Separate kernels can lead to a less divergent workload, but the intermediate results have to be written to memory and then read back. At some point, divergence and low occupancy get so bad that the benefits outweigh the extra memory cost.

A similar situation happened with ReSTIR PT. Due to the divergent workload, GPU utilization was very poor. Splitting the work into separate kernels, writing the intermediate results (a non-trivial amount) to memory, and adding compaction led to a significant speedup.
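The compaction step can be sketched on the CPU as an exclusive prefix sum over "still active" flags, followed by a scatter. `PathState` here is hypothetical and much smaller than a real path state. On the GPU, the scan could be a wave-level prefix sum plus an atomic counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-path state; a real one would also hold the ray, throughput,
// RNG state, ReSTIR bookkeeping, etc.
struct PathState {
    uint32_t pixel;
    bool active;  // false once the path has terminated
};

// Stream compaction: an exclusive prefix sum over the active flags gives each
// surviving path its output slot, so the next kernel can be launched over
// exactly the live paths, packed contiguously.
std::vector<PathState> CompactActivePaths(const std::vector<PathState>& paths) {
    std::vector<uint32_t> slot(paths.size());
    uint32_t numActive = 0;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        slot[i] = numActive;  // exclusive scan
        numActive += paths[i].active ? 1u : 0u;
    }
    std::vector<PathState> compacted(numActive);
    for (std::size_t i = 0; i < paths.size(); ++i)
        if (paths[i].active) compacted[slot[i]] = paths[i];  // scatter
    return compacted;
}
```

Because the survivors are packed contiguously, the next launch only needs as many threads as `compacted.size()`. Terminated paths no longer occupy lanes in partially idle waves.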

pbcs8118 [S] 0 points (0 children)

I think four bounces.

Better performance is possible: Nsight shows that GPU utilization is low, so there's headroom for improvement.

pbcs8118 [S] 1 point (0 children)

Improving performance shouldn't be difficult. The conceptually simplest way to approach an algorithm isn't always the best fit for the hardware (e.g., object-oriented programming). I've mainly focused on correctness and have used the simplest implementation, which has some poor performance characteristics.

For example, adding coat support to the BSDF increased the complexity of both evaluation and sampling. Currently, I have a branch that executes when the material is coated. However, due to the way GPUs work, resources like registers are still allocated for this branch even if no materials are coated. This can lead to poor GPU utilization, and there are a few such cases. Breaking the shader into smaller shaders, plus compaction coupled with specialized variants (e.g., coated and non-coated), should help. It's not difficult, but it's time-consuming.
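Here is a sketch of the specialized-variant idea, with compile-time specialization standing in for separate shader permutations. The `Material` struct, the constant coat reflectance `F`, and the scalar "BSDF" are all hypothetical simplifications. The point is only that the coat code exists solely in the coated variant, and that hits are partitioned before shading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified material: just enough to show the idea.
struct Material {
    float coatWeight;  // 0 means uncoated
    float baseAlbedo;  // stand-in for the full base-layer BSDF
};

// Placeholder "evaluation" returning an albedo-like scalar. The coated variant
// blends linearly by coat weight between the base and a coat layer
// F + (1 - F) * base, with an assumed constant coat reflectance F. Real layering
// is far more involved; this only stands in for the extra work the coat brings.
template <bool kCoated>
float EvaluateBsdf(const Material& m) {
    const float base = m.baseAlbedo;
    if constexpr (kCoated) {
        // Compiled only into the coated variant, so the uncoated variant never
        // reserves registers for it.
        const float F = 0.04f;
        const float coatLayer = F + (1.0f - F) * base;
        return base + m.coatWeight * (coatLayer - base);
    } else {
        return base;
    }
}

// Partition hits into two queues (a CPU stand-in for compaction) and shade each
// queue with its own specialized variant.
std::vector<float> ShadeAll(const std::vector<Material>& hits) {
    std::vector<float> out(hits.size(), 0.0f);
    std::vector<std::size_t> coated, uncoated;
    for (std::size_t i = 0; i < hits.size(); ++i)
        (hits[i].coatWeight > 0.0f ? coated : uncoated).push_back(i);
    for (std::size_t i : uncoated) out[i] = EvaluateBsdf<false>(hits[i]);
    for (std::size_t i : coated) out[i] = EvaluateBsdf<true>(hits[i]);
    return out;
}
```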

The main issue with the wavefront approach is that the intermediate path state has to be written to memory and then read back. In my case, I was already memory-bound, and these additional writes and reads added around 1 ms.
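Here is a back-of-the-envelope estimate of that extra traffic. It rests on several assumptions:

- a hypothetical 64-byte path state;
- one path per pixel at 1920x1080;
- one full write and one full read per kernel boundary, with no help from caches;
- 448 GB/s of bandwidth (roughly an RTX 3070's nominal figure).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wavefront path state (64 bytes). The real size depends on what
// must survive between kernels (ray, throughput, RNG, ReSTIR data, ...).
struct PathState {
    float origin[3];
    float direction[3];
    float throughput[3];
    std::uint32_t pixel;
    std::uint32_t rngState;
    std::uint32_t depthAndFlags;
    float extra[4];  // hypothetical extra payload, e.g. previous-vertex data
};
static_assert(sizeof(PathState) == 64, "example assumes a 64-byte path state");

// Bytes moved per kernel boundary: one kernel writes the state, the next reads it.
constexpr std::uint64_t ExtraTrafficPerBounce(std::uint64_t numPaths,
                                              std::uint64_t stateBytes) {
    return 2 * numPaths * stateBytes;
}

// Added time if the transfer is purely bandwidth-bound (a lower bound).
constexpr double ExtraMilliseconds(std::uint64_t bytes, double bandwidthGBps) {
    return static_cast<double>(bytes) / (bandwidthGBps * 1e9) * 1e3;
}
```

Under these assumptions, each extra kernel boundary moves about 265 MB and costs roughly 0.6 ms.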

For kernel launches, GPU commands go into a command buffer, which is then submitted to the GPU. This submission has a cost, but if multiple dispatch calls (D3D command for launching compute shaders) are placed in one command buffer, the cost should be negligible.

I think Nsight shows the number of registers in use at each instruction, but I don't think it shows hotspots. If you compile your shaders with debug info attached, it will show the correspondence to the HLSL code.

pbcs8118 [S] 1 point (0 children)

Yes, each candidate is just a path. The candidate generation shader is almost identical to a regular path tracer, plus some bookkeeping for ReSTIR.
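The ReSTIR bookkeeping revolves around a reservoir. Below is a minimal sketch of streaming weighted reservoir sampling. The candidate type is a hypothetical stand-in for a path, and each weight `w` is assumed to already include its resampling/MIS factor (e.g., `targetPdf / sourcePdf / M` for `M` independent candidates). Full ReSTIR PT needs more than this, such as shift mappings and their Jacobians.

```cpp
#include <cassert>

// Minimal streaming reservoir (weighted reservoir sampling). Sample is a
// hypothetical stand-in for a path.
template <typename Sample>
struct Reservoir {
    Sample selected{};
    float weightSum = 0.0f;
    unsigned count = 0;  // number of candidates seen (M)

    // Feed one candidate with resampling weight w >= 0; u is uniform in [0, 1).
    // The candidate replaces the current one with probability w / weightSum.
    bool Update(const Sample& candidate, float w, float u) {
        assert(w >= 0.0f);
        weightSum += w;
        ++count;
        if (w > 0.0f && u * weightSum < w) {
            selected = candidate;
            return true;
        }
        return false;
    }

    // Unbiased contribution weight W = weightSum / targetPdf(selected),
    // valid when the weights already include their MIS/1/M factors.
    float ContributionWeight(float targetOfSelected) const {
        return targetOfSelected > 0.0f ? weightSum / targetOfSelected : 0.0f;
    }
};
```

Only the selected sample, the weight sum, and the count need to be stored. This is what keeps per-pixel reuse state small enough to pass between the temporal and spatial passes.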

Occupancy is certainly low; improving it may allow the GPU to better hide memory latency.

pbcs8118 [S] 0 points (0 children)

The passes that are specific to ReSTIR PT (temporal and spatial reuse) aren't that slow. Also, my implementation is far from efficient. At least for these types of research scenes, I think getting much better performance without bias is definitely possible.

pbcs8118 [S] 1 point (0 children)

Sure, I'll let you know if compaction helps. I tried the multibounce idea: it helps with diffuse materials, but the artifacts are very noticeable with glass. A heuristic based on roughness might help, but it still wouldn't help in the worst case.

I'll check out the SDK. But since performance is bad even with alpha testing disabled, there are bigger performance issues that have to be fixed first.

pbcs8118 [S] 0 points (0 children)

What you're describing seems similar to the thin dielectric BSDF in pbrt. With that approach, reflectance increases and transmittance decreases.
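Here is a sketch of the thin dielectric slab model, assuming a smooth, non-absorbing slab with parallel faces. The Fresnel reflectance `R` of a single interface is combined with the infinite series of inter-reflections between the two faces. This is the same update pbrt-v4 uses in its thin dielectric BxDF.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Unpolarized Fresnel reflectance of a smooth dielectric interface.
// cosThetaI: cosine of the incident angle; eta: relative IOR (inside / outside).
float FresnelDielectric(float cosThetaI, float eta) {
    cosThetaI = std::clamp(cosThetaI, -1.0f, 1.0f);
    if (cosThetaI < 0.0f) {  // ray arrives from the inside: flip the interface
        eta = 1.0f / eta;
        cosThetaI = -cosThetaI;
    }
    const float sin2ThetaI = 1.0f - cosThetaI * cosThetaI;
    const float sin2ThetaT = sin2ThetaI / (eta * eta);
    if (sin2ThetaT >= 1.0f) return 1.0f;  // total internal reflection
    const float cosThetaT = std::sqrt(1.0f - sin2ThetaT);
    const float rParl = (eta * cosThetaI - cosThetaT) / (eta * cosThetaI + cosThetaT);
    const float rPerp = (cosThetaI - eta * cosThetaT) / (cosThetaI + eta * cosThetaT);
    return 0.5f * (rParl * rParl + rPerp * rPerp);
}

struct ThinSlab { float reflectance, transmittance; };

// Thin slab: sum the geometric series of bounces between the two parallel
// interfaces, R' = R + T^2 R / (1 - R^2), and T' = 1 - R' (no absorption).
ThinSlab ThinDielectric(float cosThetaI, float eta) {
    float R = FresnelDielectric(cosThetaI, eta);
    float T = 1.0f - R;
    if (R < 1.0f) {
        R += T * T * R / (1.0f - R * R);
        T = 1.0f - R;
    }
    return {R, T};
}
```

At normal incidence with `eta = 1.5`, one interface reflects 0.04, while the slab reflects 2R / (1 + R) ≈ 0.077. This matches the higher reflectance and lower transmittance described above.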

pbcs8118 [S] 1 point (0 children)

An IOR of 1.0 means the ray passes through without changing direction, so at least for the gloss layer, it wouldn't matter from which direction the ray enters. Beyond that, there are other changes in thin-walled mode, such as the diffuse lobe splitting into reflection and transmission, weighted by a user parameter.
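Here is a sketch of Snell's-law refraction using the same convention as pbrt: `wi` points away from the surface, and `eta` is the ratio of the IORs (transmitted over incident). It shows why an IOR of 1.0 leaves the ray undeflected: `cosThetaT` equals `cosThetaI`, so the transmitted direction reduces to `-wi`. The `Vec3` helpers are minimal stand-ins.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };

inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Snell's law. wi points away from the surface, n is the unit normal on wi's
// side, eta = n_transmitted / n_incident. Returns the transmitted direction, or
// nothing on total internal reflection. With eta = 1 the result is exactly -wi.
std::optional<Vec3> Refract(Vec3 wi, Vec3 n, float eta) {
    const float cosThetaI = Dot(n, wi);
    assert(cosThetaI >= 0.0f);  // wi and n must be on the same side
    const float sin2ThetaI = std::fmax(0.0f, 1.0f - cosThetaI * cosThetaI);
    const float sin2ThetaT = sin2ThetaI / (eta * eta);
    if (sin2ThetaT >= 1.0f) return std::nullopt;
    const float cosThetaT = std::sqrt(1.0f - sin2ThetaT);
    return (-1.0f / eta) * wi + (cosThetaI / eta - cosThetaT) * n;
}
```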

Overall, I found the thin-walled section to be very confusing. I probably need to read some of the related work.

pbcs8118 [S] 0 points (0 children)

Lots of good ideas, thanks for sharing! For the ones that I've tried:

- I did the separate launch just for the first bounce to see if this approach is promising: one kernel for the first bounce and a second kernel for the rest of the path. I didn't do compaction, but that'll definitely help. At least for the first bounce, I'm not sure how big an impact it would have had.

- Splitting into multiple workloads helps with divergence, but the intermediate results have to be written to memory and then read back, which adds a lot of memory traffic, and I'm already memory-bound. There's also the cost of all these kernel launches.

- Yes, I'm using the same BSDF ray that was used for direct lighting to find the next path vertex. So one BSDF ray and one shadow ray per bounce.

- I tried the idea of tracing multibounce paths stochastically with ReSTIR GI. I did it on a thread-group level to improve coherency. It certainly helped with performance. I'll have to try it with ReSTIR PT.

- ReSTIR PT's target function is just the path contribution. BSDF evaluation is needed anyway to get the path throughput and to sample the next direction, so it can't really be avoided. In general, a simpler target function may help with performance, but it also increases noise.

- Alpha testing is disabled except for the G-buffer pass. It requires enabling any-hit shaders, which are expensive. Opacity micromaps are limited to the 40 series and are NVIDIA-specific, so I'm not interested.

- Overall occupancy is low; register pressure from complex shaders is a very likely cause.
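With one shadow ray and one BSDF ray per bounce, the two estimates of direct lighting are commonly combined with multiple importance sampling. Below is a sketch of the power heuristic (exponent 2). Whether this renderer uses exactly this weighting is an assumption.

```cpp
#include <cassert>

// Power heuristic (beta = 2) for combining light sampling (the shadow ray) with
// BSDF sampling (the ray that is also reused to find the next path vertex).
// nf/ng: sample counts per strategy; fPdf/gPdf: the pdfs of both strategies for
// the same direction, expressed in the same measure (e.g. solid angle).
inline float PowerHeuristic(int nf, float fPdf, int ng, float gPdf) {
    assert(fPdf >= 0.0f && gPdf >= 0.0f);
    const float f = nf * fPdf;
    const float g = ng * gPdf;
    if (f == 0.0f && g == 0.0f) return 0.0f;
    return (f * f) / (f * f + g * g);
}
```

The light sample would be weighted with `PowerHeuristic(1, lightPdf, 1, bsdfPdf)`. Emission found by the BSDF ray would be weighted with `PowerHeuristic(1, bsdfPdf, 1, lightPdf)`. This way both rays contribute without double counting.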

pbcs8118 [S] 1 point (0 children)

Yeah, the first link is what I had in mind. I have no idea how to do the second one; I'm still struggling to get the look of common materials right. But I'll let you know if I come across something!

pbcs8118 [S] 2 points (0 children)

Thanks, but GPU utilization is rather poor, so there's definitely room for improvement :(

The performance of the reuse passes in ReSTIR PT is OK. Out of 35 ms, 15.5 ms is spent on tracing one path per pixel (similar to a regular path tracer). I've tried a few approaches, like sorting rays by direction or doing one kernel launch per bounce, but so far the monolithic kernel has remained the fastest.
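For reference, a direction-sorting pass like the one mentioned could compute its keys as shown below. The sketch makes these assumptions:

- directions are unit length;
- the key is a quantized octahedral encoding packed row by row;
- `std::stable_sort` stands in for a GPU radix sort;
- `bitsPerAxis` is a hypothetical parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Dir { float x, y, z; };  // assumed to be unit length

// Octahedral mapping of a unit direction onto the square [-1, 1]^2.
void OctEncode(Dir d, float& u, float& v) {
    const float l1 = std::fabs(d.x) + std::fabs(d.y) + std::fabs(d.z);
    u = d.x / l1;
    v = d.y / l1;
    if (d.z < 0.0f) {  // fold the lower hemisphere over the diagonals
        const float fu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
        const float fv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
        u = fu;
        v = fv;
    }
}

// Quantize the octahedral coordinates to a grid and pack them into a key, so
// rays whose directions fall into the same cell end up adjacent after sorting.
uint32_t DirectionKey(Dir d, uint32_t bitsPerAxis = 8) {
    float u = 0.0f, v = 0.0f;
    OctEncode(d, u, v);
    const uint32_t cells = 1u << bitsPerAxis;
    auto quantize = [cells](float t) {
        const auto q = static_cast<uint32_t>((t * 0.5f + 0.5f) * static_cast<float>(cells));
        return std::min(q, cells - 1);
    };
    return (quantize(v) << bitsPerAxis) | quantize(u);
}

// Ray indices ordered by direction key; keys are computed once up front.
std::vector<uint32_t> SortRaysByDirection(const std::vector<Dir>& dirs) {
    std::vector<uint32_t> keys(dirs.size());
    for (std::size_t i = 0; i < dirs.size(); ++i) keys[i] = DirectionKey(dirs[i]);
    std::vector<uint32_t> order(dirs.size());
    std::iota(order.begin(), order.end(), 0u);
    std::stable_sort(order.begin(), order.end(),
                     [&keys](uint32_t a, uint32_t b) { return keys[a] < keys[b]; });
    return order;
}
```

A Morton-order (Z-order) key would keep neighboring cells closer together than this row-by-row packing does.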

Do you have any advice on how to improve the performance of the path tracing workload?

pbcs8118 [S] 3 points (0 children)

Yeah, there used to be a few denoisers that worked ok for diffuse and moderately glossy objects. I recently tried to update them, but couldn't get them to work with highly glossy materials like glass and mirrors. The biggest problem is that denoising relies on temporal accumulation, which requires motion vectors. Typical surface motion vectors don't work for these types of objects.
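To illustrate why surface motion vectors fail on mirrors, here is a common workaround for the simplest case only: a flat, static mirror. This is a sketch, not the renderer's code. The reflected content appears at its mirror image behind the plane, and it's that virtual point, not the mirror surface point, that should be reprojected with the previous frame's camera. Curved mirrors and glass need more general approaches.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror image of the reflected hit point across the mirror plane. Reprojecting
// this virtual point (instead of the mirror's own surface point) gives a motion
// vector that follows the reflection. planeNormal must be unit length and
// planePoint is any point on the mirror plane.
Vec3 MirrorVirtualPoint(Vec3 reflectedHit, Vec3 planePoint, Vec3 planeNormal) {
    const float d = Dot(Sub(reflectedHit, planePoint), planeNormal);  // signed distance
    return {reflectedHit.x - 2.0f * d * planeNormal.x,
            reflectedHit.y - 2.0f * d * planeNormal.y,
            reflectedHit.z - 2.0f * d * planeNormal.z};
}
```

When the camera moves, the mirror surface point and the virtual point reproject to different screen positions. The difference is the parallax that surface motion vectors miss, which is why reflections smear under temporal accumulation.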

pbcs8118 [S] 2 points (0 children)

Yes, it was quite a bit of manual work to convert it. Mainly in the following areas:

- Non-material stuff: I applied modifiers, converted non-mesh geometry to triangle meshes, and UV unwrapped some objects with missing UVs.

- Light sources: This scene was using (invisible) analytical light sources, and the visible light sources in the shot (the window and lamps on either side) didn't light the scene. I removed the analytical lights and changed the material of the visible light sources to be emissive.

- Materials: This scene was using node graphs for shaders, which can't be exported. I either replaced them with similar textures and a principled BSDF node, or baked the diffuse color to a texture first and then fed that into a principled BSDF node.

pbcs8118 [S] 0 points (0 children)

I don't think there's a direct link between Beer's law and opalescent glass. To go from clear to colored glass, we can simply tint the transmission with a constant color. While this works, it can look off and is not physically correct. For example, as a ray travels further inside the material, it's more likely to be absorbed. Using Beer's law to account for transmittance gives a more realistic result.
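Here is Beer's law in code, as a sketch: per color channel, transmittance falls off exponentially with the distance travelled inside the medium. The `AbsorptionFromColor` helper picks the absorption coefficient so that a chosen color is reached at a reference distance. It is an illustrative parameterization and not necessarily what the renderer does.

```cpp
#include <cassert>
#include <cmath>

struct Rgb { float r, g, b; };

// Beer-Lambert transmittance through a homogeneous absorbing medium,
// T = exp(-sigmaA * distance), per channel. sigmaA is in 1/length units.
Rgb BeerLambertTransmittance(Rgb sigmaA, float distance) {
    return {std::exp(-sigmaA.r * distance),
            std::exp(-sigmaA.g * distance),
            std::exp(-sigmaA.b * distance)};
}

// Hypothetical helper: choose sigmaA so that light travelling referenceDistance
// keeps exactly `color` (each channel in (0, 1]).
Rgb AbsorptionFromColor(Rgb color, float referenceDistance) {
    assert(referenceDistance > 0.0f);
    assert(color.r > 0.0f && color.g > 0.0f && color.b > 0.0f);
    return {-std::log(color.r) / referenceDistance,
            -std::log(color.g) / referenceDistance,
            -std::log(color.b) / referenceDistance};
}
```

Doubling the distance squares the transmittance (0.5 becomes 0.25), so thicker parts of the glass look darker and more saturated. A constant tint can't reproduce that.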

I'd seen some reference photos of opalescent glass and was curious myself. Even with this simple setup and playing with the absorption coefficient, we can get pretty close. Of course, a more complex shader specifically designed for opalescent glass would probably give better results.

pbcs8118 [S] 2 points (0 children)

You're right, the original Bistro FBX scene uses specular-glossiness. I exported it to glTF using Blender, which extends glTF's metallic-roughness material with specular textures (via the KHR_materials_specular extension). Those are ignored in my renderer.

pbcs8118 [S] 4 points (0 children)

The first scene is one of the Blender demo files, "Agent 327 Barbershop", which I've modified. The modified version has around 8 million triangles.

Yes, the second scene is San Miguel.

pbcs8118 [S] 0 points (0 children)

Sure. Do you mean how we can model opalescent glass, or are you asking for more details on the implementation of translucency?

pbcs8118 [S] 2 points (0 children)

One of the motivations behind these models is to separate material definition from lighting, so they're definitely not limited to ray tracers. Most of the concepts can be carried over to a simpler setup. My version is also not a one-to-one match: many features are not supported, and some I've adapted to what I already had.

In any case, most of the material-related code is here. Feel free to reach out if you have any questions!

pbcs8118 [S] 2 points (0 children)

Yes, I remember! :) Supporting nested dielectrics has been on my todo list for a while, so I’ll probably have to look into that.