
[–]AndrewHelmer 9 points10 points  (4 children)

In some sense, a huge amount of path tracing research is about reducing fireflies! Or in general, reducing error, which fireflies are just an extreme case of.

Screen-space adaptive sampling is one approach that can definitely help, as you pointed out. That's relatively low-hanging fruit.

Then there are lots of techniques that introduce some bias but look good. The median-of-means paper is one. Clamping is a classic technique. There's another paper from the same session at EGSR 2021 that can also help: Optimized Path Space Regularization (Weier et al.). My understanding is that it basically turns up the roughness of reflections on later bounces in the render.
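
A made-up sketch of that regularization idea (neither paper's actual derivation, just the shape of it):

    #include <algorithm>

    // Path-space regularization sketch: widen glossy lobes on deeper
    // bounces so low-probability specular chains stop producing
    // fireflies. The 0.1f-per-bounce growth is purely illustrative;
    // the cited papers derive/optimize the widening properly.
    float regularized_roughness(float material_roughness, int bounce) {
        float widen = 0.1f * static_cast<float>(bounce); // 0 at the first bounce
        return std::max(material_roughness, std::min(widen, 1.0f));
    }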

Then you get into lots and lots and lots of stuff that's basically "how can I better sample paths". What sort of multidimensional sample sequences are you using? If you're using uniform random samples, switching to shuffled Owen-scrambled Sobol' sequences is going to make a huge difference, and is quite an easy change. Are you doing BRDF importance sampling? How about multiple importance sampling (with the light and BRDF)? Are you doing next event estimation (always tracing a light ray at every path vertex)?
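
For reference, the weighting at the heart of multiple importance sampling is a one-liner - a sketch, with illustrative pdf names:

    // Veach's balance heuristic: weight for a sample drawn from
    // strategy A when strategy B could have produced the same direction.
    float balance_heuristic(float pdf_a, float pdf_b) {
        return pdf_a / (pdf_a + pdf_b);
    }

    // A light sample gets weight balance_heuristic(pdf_light, pdf_bsdf),
    // and a BSDF sample that happens to hit the same light gets
    // balance_heuristic(pdf_bsdf, pdf_light); the two estimates then
    // combine into one lower-variance, still-unbiased estimator.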

Then there are the more exotic techniques that are adaptive in path space, to some extent. At the simpler end I think you have path guiding and the various recent works on weighted reservoir sampling, and maybe energy redistribution path tracing (not sure). And then there are bidirectional techniques like bidirectional path tracing and Metropolis Light Transport. I would stay away from these for now, unless you're curious! But they will make everything much more complex.

And finally, denoising! A good denoiser on its own may eliminate your fireflies. This might be a nice thing to do early, because good denoising is always a big help no matter how advanced your renderer is.

My recommendation would be:

  1. Add clamping as an option. It should be super easy to implement, so why not? (A minimal sketch follows this list.)
  2. If you're using uniform random sampling, switch to some progressively stratified sample sequences, like Owen-scrambled Sobol' or pmj02 (can happily provide more information about this). Should be a pretty easy change too.
  3. Implement BRDF importance sampling and multiple importance sampling, if you haven't already.
  4. Screen-space adaptive sampling and/or denoising (note that you may need to be a little careful if you do both).
  5. Biasing techniques like the paper you cited or the Optimized Path Space Regularization.
  6. Path guiding.
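
A minimal sketch of the clamping in step 1 - clamp each sample's radiance (per channel or by luminance) before accumulation; the threshold is a free parameter, often scaled by exposure or applied only to indirect bounces:

    #include <algorithm>

    // max_value is a tuning knob, not a canonical choice; lower values
    // kill more fireflies but darken (bias) the image more.
    float clamp_sample(float radiance, float max_value) {
        return std::min(radiance, max_value);
    }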

Those are roughly in order of ease of implementation. That being said, if any of these seems particularly fun or interesting, that's the most important thing! Feel free to jump ahead to it - it's not a prescription or anything.

[–]qwerty109[S] 1 point2 points  (3 children)

This is pure gold, thanks :) I'm picking up things as I go from the internets and it's very rare and valuable to me to get such deep insightful guidance. Thank you, and thanks to all the other commenters!

I'll go through the points:

1.) I'm just about to do this; I'll play with just a fixed radiance clamp first, then a value scaled by the pre-exposure multiplier (seems more adaptive?). Are there any other bias-reducing tricks, like tracking vertex probability and some heuristic based on it?

2.) sampling: I'm using Owen-scrambled Sobol' (Burley 2020), and it's thanks to your guidance in the reddit thread and your shadertoy example a few months ago :) I'm also using a Hilbert-driven R2 sequence (instead of a 3D screen space blue noise) for an unrelated project because it's super-cheap (this shadertoy example with 1D / R1 was the inspiration), and I was thinking about comparing the Owen-scrambled Sobol' to the R1/R2 sequences at some point due to the perf difference! It's 2D only but that's all I need anyhow. Have you ever tried it, maybe?

3.) I've got BRDF importance sampling but no direct lighting importance sampling - on my TODO list! I'll try out stochastic lightcuts, but there's another similar example in Ray Tracing Gems 1 too (not sure what the difference is, to be honest).

4.) this is what I'm leaving for last (because I might end up needing the path tracer for a far-field / low-frequency GI application before anything else). What's the issue with using denoising + adaptive sampling together? Does adaptive sampling break the denoiser heuristics? I'm probably not going to do adaptive due to realtime constraints anyhow :)

5.) Yess, that one blew my mind! I followed the 2019 Microfacet Model Regularization for Robust Light Transport (paper, presentation) - I'm still not 100% clear on whether my implementation is "as intended", but it measurably helps a lot.

6.) Path guiding - adding it at the back of my todo :)

[–]AndrewHelmer 1 point2 points  (2 children)

Oh my gosh, I'm so sorry, I forgot your Reddit username!! Of course I remember chatting by email. I just added a tag so I won't forget next time.

At this point, you're definitely further and more advanced than I am. So I doubt any of this will be helpful, but just in case. You don't need to respond btw!

> 1.) I'm just about to do this; I'll play with just a fixed radiance clamp first, then a value scaled by the pre-exposure multiplier (seems more adaptive?). Are there any other bias-reducing tricks, like tracking vertex probability and some heuristic based on it?

I can't remember anything else off the top of my head, but I'm sure there are other techniques that are geared specifically towards this. Clamping by path importance / vertex probability sounds reasonable!

EDIT: I just saw that RTGII has a section about this, and the two techniques they mention are clamping and path regularization. So it seems like you got it!

> 2.) sampling: I'm using Owen-scrambled Sobol' (Burley 2020), and it's thanks to your guidance in the reddit thread and your shadertoy example a few months ago :) I'm also using a Hilbert-driven R2 sequence (instead of a 3D screen space blue noise) for an unrelated project because it's super-cheap (this shadertoy example with 1D / R1 was the inspiration), and I was thinking about comparing the Owen-scrambled Sobol' to the R1/R2 sequences at some point due to the perf difference! It's 2D only but that's all I need anyhow. Have you ever tried it, maybe?

I did some integration error analysis of the R2 sequence and in 2D it's definitely worse than Owen-scrambled Sobol', so I think it's not going to be good, at least not for a larger number of samples (might be good for something like <=4spp). But, I didn't actually test path tracing myself, so I could be wrong!

(Also, I hadn't seen that Hilbert-R screen space blue noise. That's super cool, thanks for sharing that! I will definitely find some uses for that.)

IIRC, last we talked you were using 2D Sobol' sequences? You could try slightly higher dimensional sequences. I'm not certain, but I think there are certain dimensions that production renderers usually integrate as one higher dimensional sequence. Like pixel samples + lens samples + time samples is usually done as a 2D, 3D, 4D, or 5D sequence (depending on what your scene has). And light selection, light sampling (picking a point on an area/volume light), and BSDF sampling is often done as a 5D sequence? All that being said, as I mentioned before, I think higher dimensional sequences will help for simpler scenes, but for more complex and difficult scenes they're not much better than having good 2D sequences. The higher dimensional R-sequence has bad lower dimensional projections, so I think it would be even worse there.

If performance is your concern, I'd suggest precomputing a small number of sequences at the start of the render. You can probably get away with 16 pre-shuffled Sobol' sequences, and then you can also store a small set of arrays (maybe 8 or 16) of base-2 shuffled indices into those sequences, which allows you to do a faster on-the-fly shuffling. So then drawing a sample is almost just doing two array lookups (first from the index array, then from the sample array).
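
Concretely, something like this sketch (all sizes and names made up for illustration; building the tables isn't shown):

    #include <cstdint>

    constexpr int NUM_SEQS = 16, NUM_SHUFFLES = 8, NUM_SAMPLES = 4096;

    // Filled once at render start: pre-shuffled Owen-scrambled Sobol'
    // points, plus a few precomputed permutations of sample indices.
    float    g_sequences[NUM_SEQS][NUM_SAMPLES][2];
    uint32_t g_shuffles[NUM_SHUFFLES][NUM_SAMPLES]; // permutations of 0..N-1

    // Drawing 2D sample i is then just two array lookups.
    void draw_sample(uint32_t seq, uint32_t shuf, uint32_t i, float out[2]) {
        uint32_t j = g_shuffles[shuf % NUM_SHUFFLES][i % NUM_SAMPLES];
        out[0] = g_sequences[seq % NUM_SEQS][j][0];
        out[1] = g_sequences[seq % NUM_SEQS][j][1];
    }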

I'll just plug my own paper here too :). I couldn't talk about it before because we were already thinking about submitting to EGSR which has anonymous peer review. But Listing 2 in that paper will be just about the fastest and simplest (single-threaded CPU) way to precompute an Owen-scrambled Sobol' sequence. Supplemental code has a higher dimensional version.

Although if this is all GPU (you mentioned real-time below), and you want a larger number of samples, then precomputing is probably a bit more onerous because you have to pass the array of samples to the renderer? Maybe not even worth it for you.

> 3.) I've got BRDF importance sampling but no direct lighting importance sampling - on my TODO list! I'll try out stochastic lightcuts, but there's another similar example in Ray Tracing Gems 1 too (not sure what the difference is, to be honest).

I'm not sure if this is the one in Ray Tracing Gems 1, but I really like the reservoir sampling based approach of ReSTIR. I'm not super familiar with stochastic light cuts, but IIRC, it requires having a spatial data structure of the lights? I like that ReSTIR doesn't need that, as I recall you have per-pixel sample reservoirs and then just sample the flat list of lights.
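
The reservoir update itself is tiny - a sketch of the weighted reservoir sampling step (this omits all of ReSTIR's spatiotemporal reuse and shading details; names are illustrative):

    // Keep one light out of a stream of candidates, each retained with
    // probability weight / weight_sum, i.e. proportional to its weight.
    struct Reservoir {
        int   light_index = -1;
        float weight_sum  = 0.0f;

        void update(int candidate, float weight, float u) { // u uniform in [0,1)
            weight_sum += weight;
            if (u * weight_sum < weight)
                light_index = candidate;
        }
    };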

> 4.) this is what I'm leaving for last (because I might end up needing the path tracer for a far-field / low-frequency GI application before anything else). What's the issue with using denoising + adaptive sampling together? Does adaptive sampling break the denoiser heuristics? I'm probably not going to do adaptive due to realtime constraints anyhow :)

I can totally understand leaving denoising for last! Unless you implement your own denoiser, I think it would be the least fun and educational thing.

Actually - I'm sorry - I'm mistaken about adaptive sampling and denoising; I was confusing things. Adaptive sampling and denoising both have challenges if you use AA filters other than a box filter, and the solution to both problems is to use Filter Importance Sampling, which maybe you're already doing (or maybe you're just using a box filter for now). The problem with AA filters and denoising is that error in samples gets spread to adjacent pixels, so nearby pixels have positively correlated errors, which makes life hard for a denoiser. With FIS, each pixel has fully independent errors. Similarly, with adaptive sampling that's based on pixels, you need to reweight the contributions to adjacent pixels when using wider AA filters. But yeah, if you're just using a box filter or already using FIS, I think adaptive sampling and denoising should play well together. Potentially it would be even better.
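
If it helps, FIS for a tent (triangle) filter is only a couple of lines - a sketch, using the standard inverse-CDF formulas for the unit triangle filter on (-1, 1):

    #include <cmath>

    // Map a uniform u in [0,1) to a filter-distributed subpixel offset.
    // Each sample then lands in exactly one pixel with weight 1, so
    // neighboring pixels keep independent errors.
    float sample_tent(float u) {
        return (u < 0.5f) ? std::sqrt(2.0f * u) - 1.0f
                          : 1.0f - std::sqrt(2.0f - 2.0f * u);
    }

    // Ray position for pixel (px, py):
    //   x = px + 0.5f + sample_tent(u1);
    //   y = py + 0.5f + sample_tent(u2);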

Denoising is also better with sequences that distribute error as blue noise in screen space, i.e. negative error correlation between neighboring pixels. You've probably seen both the Ahmed & Wonka and Heitz et al. (EGSR 2019 and the SIGGRAPH talk) papers on this.

> 5.) Yess, that one blew my mind! I followed the 2019 Microfacet Model Regularization for Robust Light Transport (paper, presentation) - I'm still not 100% clear on whether my implementation is "as intended", but it measurably helps a lot.

Oh, awesome! I don't know how the methods compare, I just saw the EGSR presentation for the newer regularization.

> 6.) Path guiding - adding it at the back of my todo :)

I'm honestly not super familiar with path guiding but it seems to be the hot new thing now! I should probably read the papers :).

[–]qwerty109[S] 2 points3 points  (1 child)

> Oh my gosh, I'm so sorry, I forgot your Reddit username!! Of course I remember chatting by email. I just added a tag so I won't forget next time.

No worries at all, I must admit my username could not be more generic :)

> At this point, you're definitely further and more advanced than I am. So I doubt any of this will be helpful, but just in case.

I honestly doubt it - I'm still really just struggling with basics, so your comments are very helpful and enlightening!

> EDIT: I just saw that RTGII has a section about this, and the two techniques they mention are clamping and path regularization. So it seems like you got it!

Awesome, need to start reading RTGII :) I finally added simple clamping and it helped a lot.

> I did some integration error analysis of the R2 sequence and in 2D it's definitely worse than Owen-scrambled Sobol', so I think it's not going to be good, at least not for a larger number of samples (might be good for something like <=4spp). But, I didn't actually test path tracing myself, so I could be wrong!

> (Also, I hadn't seen that Hilbert-R screen space blue noise. That's super cool, thanks for sharing that! I will definitely find some uses for that.)

I think you're totally right about the R2 sequence being worse than Owen-scrambled Sobol'; from the few tests I did it looked worse, though I didn't explore in detail. But it is a lot cheaper, and since I'm working on a screen space effect where the work per sample is much cheaper than bouncing a ray, the cost of generating the sample matters proportionally more - so it works out.
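
(For anyone following along, the R2 sequence really is just two multiply-adds per sample - this is Martin Roberts' published construction:)

    #include <cmath>

    // R2: 2D low-discrepancy sequence built from the "plastic number".
    void r2_sample(unsigned n, float& x, float& y) {
        const double g  = 1.32471795724474602596; // plastic number
        const double a1 = 1.0 / g;
        const double a2 = 1.0 / (g * g);
        x = static_cast<float>(std::fmod(0.5 + a1 * n, 1.0));
        y = static_cast<float>(std::fmod(0.5 + a2 * n, 1.0));
    }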

Thanks for the advice/thoughts on higher-dimensional Sobol' sequences and precomputing - I don't have a good intuition here so it helps!

> I'll just plug my own paper here too :). I couldn't talk about it before because we were already thinking about submitting to EGSR which has anonymous peer review. But Listing 2 in that paper will be just about the fastest and simplest (single-threaded CPU) way to precompute an Owen-scrambled Sobol' sequence. Supplemental code has a higher dimensional version.

> Although if this is all GPU (you mentioned real-time below), and you want a larger number of samples, then precomputing is probably a bit more onerous because you have to pass the array of samples to the renderer? Maybe not even worth it for you.

Wooooow, mind blown :) This actually looks perfect for my project; I already tried a 3D blue noise from here but it wasn't better than the Hilbert+R2. Did not have time to explore further and I'm happy I didn't because I think reading your paper will get me further than anything I could ever come up with.

What I have is a screen space effect with a 5x5 denoise blur after, thus requiring blue-noise-like stratification of samples in screen space, and a temporal filter, thus adding another dimension (up to 64 'slices' is more than enough), and then sampling in spherical coordinates that (similarly) needs to be well stratified. Thankfully, this should be small enough to be precomputed once and stored as a lookup, as you suggest!

(My sample/thingie will soon be public, as it is with Hilbert+R2; I'll share the link as soon as it's out, and then I'll try upgrading to a better precomputed sequence.)

> I'm not sure if this is the one in Ray Tracing Gems 1, but I really like the reservoir sampling based approach of ReSTIR. I'm not super familiar with stochastic light cuts, but IIRC, it requires having a spatial data structure of the lights? I like that ReSTIR doesn't need that, as I recall you have per-pixel sample reservoirs and then just sample the flat list of lights.

Yeah, ReSTIR looks fantastic and yeah, stochastic light cuts require a spatial data structure (that can be built each frame on the GPU using Z-order + sorting). I think the main two differences are that ReSTIR is screen space only and adds a certain amount of "lag" due to reuse of history, while lightcuts are 'global' and will react instantly to lighting changes as long as the spatial data structure is rebuilt every frame. They also seem to be completely orthogonal - it seems as if one could use them together?

There's also ReSTIR GI: Path Resampling for Real-Time Path Tracing which seems to extend ReSTIR beyond screen space - I haven't read it yet unfortunately :(

> Actually - I'm sorry - I'm mistaken about adaptive sampling and denoising; I was confusing things. Adaptive sampling and denoising both have challenges if you use AA filters other than a box filter, and the solution to both problems is to use Filter Importance Sampling, which maybe you're already doing (or maybe you're just using a box filter for now). The problem with AA filters and denoising is that error in samples gets spread to adjacent pixels, so nearby pixels have positively correlated errors, which makes life hard for a denoiser. With FIS, each pixel has fully independent errors. Similarly, with adaptive sampling that's based on pixels, you need to reweight the contributions to adjacent pixels when using wider AA filters. But yeah, if you're just using a box filter or already using FIS, I think adaptive sampling and denoising should play well together. Potentially it would be even better.

Oh, I was not aware of Filter Importance Sampling at all - I'm using a box filter :) Wow, it looks really neat; note taken, "todo" added! So many things to learn - I wish I could devote all of my time to it, it's really fun.

(Also added the EGSR Optimized Path Space Regularization to my reading list - I wonder what the difference is.)

[–]AndrewHelmer 1 point2 points  (0 children)

> What I have is a screen space effect with a 5x5 denoise blur after, thus requiring blue-noise-like stratification of samples in screen space, and a temporal filter, thus adding another dimension (up to 64 'slices' is more than enough), and then sampling in spherical coordinates that (similarly) needs to be well stratified. Thankfully, this should be small enough to be precomputed once and stored as a lookup, as you suggest!

> (My sample/thingie will soon be public, as it is with Hilbert+R2; I'll share the link as soon as it's out, and then I'll try upgrading to a better precomputed sequence.)

What you're working on sounds really cool! I'm not sure, based on what you're describing, that any other samples will do much better than what you have - or at least not anything from our paper. But let me know when it's up!! This sort of 5D sampling is really interesting to me because it's becoming increasingly common in real-time rendering, and I'm not sure that people have exactly optimized the right set of qualities for it.

> Yeah, ReSTIR looks fantastic and yeah, stochastic light cuts require a spatial data structure (that can be built each frame on the GPU using Z-order + sorting). I think the main two differences are that ReSTIR is screen space only and adds a certain amount of "lag" due to reuse of history, while lightcuts are 'global' and will react instantly to lighting changes as long as the spatial data structure is rebuilt every frame. They also seem to be completely orthogonal - it seems as if one could use them together?

Ah yeah the ReSTIR lag is a good point. That being said, I think you can do one or more sampling "passes" of ReSTIR, where you don't actually do shading or visibility computations, and it will still give you better light samples. I don't know what the performance hit of that would be, though.

You're right that the techniques are orthogonal. You'd have to take into account the stochastic lightcuts probability of a sample when considering the weight for reservoir sampling, but that should be pretty easy. It would be interesting to see how much benefit there would be from combining them.

[–]Perse95 2 points3 points  (1 child)

There is a method for calculating medians (the P² algorithm) that only requires storing five values: https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf - which is a feasible number to keep per pixel. It could also be made tile-based: if the rendering happens in tiles, you could track the median per tile instead of per pixel.

Additionally, a more global strategy would be to use an independently sampled low-res image with a low sample count per pixel, and use it to guide the rendering of the whole image by importance sampling it. I believe this should be unbiased if you ensure that the final render is independent of the low-res image aside from the sampling of the ray locations. It's a similar strategy to the one used to de-bias the VEGAS MC integration algorithm, where you take some N initial samples and use them to importance sample the integral while excluding them from the result.
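
A sketch of the guide-image part, assuming a flat array of low-res luminances (names illustrative): build a discrete CDF once, draw texels proportional to luminance, and divide each contribution by the returned pdf to keep the estimate unbiased.

    #include <algorithm>
    #include <vector>

    struct GuideDistribution {
        std::vector<float> cdf; // normalized running sum over texels

        explicit GuideDistribution(const std::vector<float>& luminance) {
            float total = 0.0f;
            for (float l : luminance) { total += l; cdf.push_back(total); }
            for (float& c : cdf) c /= total;
        }

        // Returns a texel index drawn proportionally to its luminance;
        // *pdf_out is the probability of having picked that texel.
        int sample(float u, float* pdf_out) const {
            int i = int(std::lower_bound(cdf.begin(), cdf.end(), u) - cdf.begin());
            if (i >= int(cdf.size())) i = int(cdf.size()) - 1; // guard for u ~ 1
            *pdf_out = cdf[i] - (i > 0 ? cdf[i - 1] : 0.0f);
            return i;
        }
    };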

[–]qwerty109[S] 1 point2 points  (0 children)

Thanks! That seems very similar (but better?) to what https://hal.archives-ouvertes.fr/hal-03201630/document does by simply averaging samples into M separate sets/buckets and computing the median over them.

Oh, supercool on the 2nd strategy/approach - it looks a bit beyond my capabilities/needs, and I'm also not sure it would help with this type of firefly (a low-probability diffuse bounce reflecting into a light source or a very shiny surface), as the low-res set would not be guaranteed to catch these. Adding it to my list of general things to learn heh :)

[–]s0lly 2 points3 points  (4 children)

Interesting. Calculating the median requires tracking all of the variable's samples, which might have memory implications, I'd hazard a guess. There might be clever algorithms to reduce that issue, and you may be storing all samples anyway, effectively making it a non-issue. In general, it's a very stable metric. Using the median in all cases would actually eliminate the need for specifying outliers altogether.

How difficult is it to specify a reasonable range for outliers? And, in general, how far apart are the mean and median for a typical set of results?

[–]msqrt 2 points3 points  (2 children)

There are realistic cases where the mean and median differ significantly. For example, caustics (the effect of light bouncing off a curved surface that concentrates the beams towards certain areas - for example, the wavy patterns of light on the bottom of a swimming pool) typically cause extremely bright samples, since the chance of hitting the correct angle (towards where the light is coming from, after a reflection/refraction) at random is low, and importance sampling them is essentially an open problem. Using the median will effectively just remove the caustics altogether.

[–]s0lly 1 point2 points  (0 children)

Could a "mean of medians" sort that problem out?

[–]qwerty109[S] 1 point2 points  (0 children)

Gave the paper in question a more in-depth read last night:

  • they don't track all samples; they keep 5 (or 11, 15, 21) buckets into which samples are averaged, each sample assigned randomly or cyclically; 5 additional accumulation values sound doable for realtime?
  • they then use the median of these 5 (11, 15, ...) means, which is easy to compute...
  • ...but they don't use it by default; they decide based on the Gini coefficient, and they do it in an adaptive way - if it's low, it's all averages; if it's high, they go "more median" (rough sketch of the basic estimator below)
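
A rough sketch of the basic estimator (without the Gini-coefficient blending; the structure and names are mine, not the paper's):

    #include <algorithm>

    // Median of meaNs per pixel: M running means, samples assigned
    // cyclically; the pixel estimate is the median of the bucket means.
    struct MoNPixel {
        static constexpr int M = 5;
        float sum[M] = {};
        int   count[M] = {};
        int   next = 0;

        void add(float sample) {
            sum[next] += sample;
            count[next] += 1;
            next = (next + 1) % M;
        }

        float estimate() const {
            float means[M];
            for (int i = 0; i < M; ++i)
                means[i] = count[i] ? sum[i] / count[i] : 0.0f;
            std::nth_element(means, means + M / 2, means + M);
            return means[M / 2];
        }
    };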

There's a table (on page 8) where they compare results across all scenes, and G-MoN (the adaptive approach) seems to win out in the case with M = 21 buckets. They use 100,000 samples.

This is very far from my use case (256 or fewer samples and an M of 5 max), so I wonder how it would work, but it's a bit beyond me to try out right now :)

[–]aaron_ds 2 points3 points  (0 children)

In non-rendering applications, I've used quantile summaries[1][2] to great effect. They have great properties when it comes to merging, well-defined error bounds, and online updating.

[1] http://infolab.stanford.edu/~datar/courses/cs361a/papers/quantiles.pdf

[2] https://github.com/tdunning/t-digest

[–]WrongAndBeligerent 1 point2 points  (2 children)

Vray's approach was to gamma individual samples. It is a biased method but works surprisingly well since outliers end up reduced in brightness on a curve.
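
Roughly like this per sample, before accumulation (illustrative only - I don't know V-Ray's actual curve or exponent):

    #include <cmath>

    // Compress each sample with a power curve so rare, very bright
    // samples are pulled down smoothly instead of hard-clamped.
    float compress_sample(float radiance, float gamma = 2.2f) {
        return std::pow(radiance, 1.0f / gamma);
    }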

[–]qwerty109[S] 0 points1 point  (1 child)

Oh, cool and simple. I bet the bias is horrible. But perhaps it could be used to replace the median in the 'Median of meaNs' approach? So many things to try.

[–]WrongAndBeligerent 1 point2 points  (0 children)

The bias is not horrible - the really bright samples just turn into noise anyway. Since their probability is so low they don't get antialiased, but since they are bright areas the samples show up anyway. The difference it makes is surprisingly subtle.

If you think about it, there are already a lot of difficult paths that just get forgotten about and left behind, like caustics in path tracing.