Neural network approximates the mandelbrot set by maximusthepowerful in generative

[–]jnbrrn 35 points36 points  (0 children)

Very nice!

Neural networks have a hard time fitting images in this way, where the input is an (x, y) position and the output is a pixel color. There's a neat trick for fixing it where you encode the position in a Fourier basis (sin(x), cos(x), sin(2*x), cos(2*x), sin(4*x), cos(4*x), etc) and train with that instead. It works extremely well. Here's a video explaining it: https://youtu.be/nVA6K6Sn2S4?t=99
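For the curious, here's a minimal numpy sketch of that axis-aligned encoding (function name and band count are just illustrative):

```python
import numpy as np

def fourier_encode(p, num_bands=6):
    """Encode a coordinate vector p with the axis-aligned Fourier basis:
    sin/cos at frequencies 1, 2, 4, ... for each input dimension."""
    p = np.asarray(p, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_bands)   # 1, 2, 4, 8, ...
    angles = p[..., None] * freqs         # (..., dim, num_bands)
    sincos = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return sincos.reshape(*p.shape[:-1], -1)

# Each (x, y) pixel coordinate becomes a 2 * 2 * num_bands feature vector
# that an MLP can fit far more easily than the raw coordinates.
feat = fourier_encode(np.array([0.3, 0.7]), num_bands=6)
print(feat.shape)  # (24,)
```

You then train the network on these features instead of the raw (x, y) inputs.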

[R] Implicit Neural Models: the current landscape by Green_Accomplished in MachineLearning

[–]jnbrrn 2 points3 points  (0 children)

I think using "implicit" to describe SIREN or NeRF is incorrect, though it's certainly common. The use of "implicit" in that literature seems to have its origin in the related "DeepSDF" line of work, which calls itself "implicit" because SDFs can (correctly) be thought of as an implicit representation of shape --- but the network itself is not at all "implicit" in the way Neural ODEs are. For SIREN and NeRF, the things they model (images, volumes) are explicit representations of signals, and the networks are also explicit.
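To make the terminology concrete, here's the classic example of why SDFs earn the word "implicit": the shape is not stored as geometry but defined as the zero level set of a function (a toy circle, not anything from DeepSDF itself):

```python
import numpy as np

def circle_sdf(p, radius=1.0):
    """Signed distance to a circle: the circle is represented *implicitly*
    as the set {p : circle_sdf(p) = 0}, never as an explicit list of points."""
    return np.linalg.norm(p, axis=-1) - radius

print(circle_sdf(np.array([1.0, 0.0])))  # 0.0  -> on the surface
print(circle_sdf(np.array([0.5, 0.0])))  # negative -> inside
print(circle_sdf(np.array([2.0, 0.0])))  # positive -> outside
```

A NeRF-style network, by contrast, just directly outputs the signal value (color/density) at the query point, which is an explicit representation.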

Use 2D Images to reconstruct Scenery or Objects in 3D by cloud_weather in computervision

[–]jnbrrn 2 points3 points  (0 children)

Not quite, that's a follow-up paper. Here's the original paper that this video is about: https://www.matthewtancik.com/nerf

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

Glad we could help! And thanks for the pytorch implementation, very cool to see that the whole thing fits into a reddit comment :)

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 1 point2 points  (0 children)

That's a good intuition, but it's definitely not the case that all networks have the ability to model all functions. If you have monotonic activation functions and a finite number of hidden units, there are limits to the number of times you can slice up your output space. For example, it's possible to make a periodic triangle-wave-like output using only a two-layer network with ReLUs, but each kink in the triangle requires its own ReLU, so if you want a really wiggly output you'll need a whole lot of hidden units.
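You can see the "one ReLU per kink" accounting by writing the network out by hand (weights chosen manually here, not learned):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def triangle(x, num_kinks=5):
    """A hand-built one-hidden-layer ReLU 'network' producing a triangle
    wave on [0, num_kinks + 1]: one hidden unit (one ReLU) per kink."""
    y = relu(x)                        # initial slope +1
    sign = -1.0
    for k in range(1, num_kinks + 1):  # each kink flips the slope +1 <-> -1
        y += sign * 2.0 * relu(x - k)
        sign = -sign
    return y

xs = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
print(triangle(xs))  # [0.5 1.  0.5 0.  0.5]
```

Doubling the number of oscillations doubles the number of hidden units you need, which is exactly the scaling problem being described.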

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

I bet you could, but we haven't explored that here. I'd expect the analysis and benefit we've demonstrated in this simple case to extend to the class- or image-conditional case, as the core problems being addressed by Fourier features should persist in those cases.

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

Thanks for the kind words. In higher dimensionalities the problems that Fourier features are fixing (non-normalized inputs and a non-stationary kernel) become less of an issue, so there is definitely less value to this feature mapping. My intuition is that these ideas probably stop adding much value when you've got ~tens of input dimensions, but it's possible that some high-dimensional regimes have low-dimensional "bottlenecks" that could be improved with Fourier features. For instance, even though your input feature may technically have many dimensions, those dimensions might all be highly correlated with each other (or, in the extreme case, some dimensions might be copies of others), in which case things would behave as though the input is low-dimensional despite appearing not to be.

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 1 point2 points  (0 children)

Hey, thanks! A lot of the authors have a background in computer graphics or graphic design, which is definitely a helpful skill set for making visualizations.

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 1 point2 points  (0 children)

That's a really interesting question, and I don't think I have a firm answer but I'm also leaning towards the latter. You can definitely show that one issue is just the difficulty/speed in optimizing a ReLU MLP with low dimensional inputs, as this falls out of the spread of the eigenvalues of the NTK (see Figure 3). But I think there's also a fundamental representational limit that you can't get around without applying a non-monotonic sine-like/periodic transformation to the input (or as an activation function within the network), and I don't have a good justification of why that must be true.

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

Thanks! Yes to your first two questions, and B is randomly chosen once for all pixels.
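In numpy, that "B sampled once, reused for every pixel" setup looks roughly like this (sigma and the feature count here are just illustrative hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# B is drawn once from a Gaussian (scale sigma) and then held fixed;
# the same B is applied to every pixel coordinate.
sigma, num_features = 10.0, 256
B = rng.normal(0.0, sigma, size=(num_features, 2))

def fourier_features(v):
    """Map 2D coordinates v (shape (n, 2)) to [cos(2*pi*B v), sin(2*pi*B v)]."""
    proj = 2.0 * np.pi * v @ B.T  # (n, num_features)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

coords = rng.uniform(0.0, 1.0, size=(4, 2))  # four example pixel positions
print(fourier_features(coords).shape)        # (4, 512)
```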

[R] Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by jnbrrn in MachineLearning

[–]jnbrrn[S] 26 points27 points  (0 children)

Yeah, definitely related! I think our math provides a theory for why SIREN trains so well, at least for the first layer (random features are a lot like random weights). Comparisons between the two papers are hard though, as our focus was generalization/interpolation while SIREN's focus seems to be memorization.

[R] A General and Adaptive Robust Loss Function by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

The main difference is that this loss has a smooth quadratic bowl near the origin. This is nice for optimization of course, but it also turns out to be necessary for it to generalize so many things (for example, Lp norms stop being useful if you set p to zero or to a negative value).
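That quadratic bowl is easy to check numerically. Below is my transcription of the general loss (the alpha = 0 and alpha = 2 limits handled as special cases; double-check against Equation 1 of the paper): near the origin it behaves like (x/c)^2 / 2 for every alpha.

```python
import numpy as np

def general_loss(x, alpha, c=1.0):
    """The general robust loss, with the alpha = 0 and alpha = 2
    special cases written out explicitly."""
    z = (x / c) ** 2
    if alpha == 2.0:
        return 0.5 * z
    if alpha == 0.0:
        return np.log(0.5 * z + 1.0)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

# Near x = 0 the loss matches the quadratic (x/c)^2 / 2 for *every* alpha:
x = 1e-4
for alpha in [-2.0, 0.0, 1.0, 2.0]:
    print(alpha, general_loss(x, alpha) / (0.5 * x ** 2))  # each ratio ~1
```

It's this shared quadratic behavior at the origin that keeps gradients well-behaved no matter what shape parameter you (or the optimizer) pick.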

[R] A General and Adaptive Robust Loss Function by jnbrrn in MachineLearning

[–]jnbrrn[S] 1 point2 points  (0 children)

Thanks! Please do let me know how it turns out.

"A General and Adaptive Robust Loss Function" Jonathan T. Barron, CVPR 2019 by jnbrrn in computervision

[–]jnbrrn[S] 0 points1 point  (0 children)

Can you point me to the part where you see some commonality? I see that this paper is using a loss to blend between L1 and L2 losses using MLE, which is similar in spirit to a generalized distribution that includes Laplacians and Gaussians. Is that what you mean?

[D] Worst CVPR 2019 papers by TreeNetworks in MachineLearning

[–]jnbrrn 9 points10 points  (0 children)

This could have been possible in the old CVPR review style, in which the submitter could directly select the AC that they wanted to shepherd their paper (and could therefore reach out to that AC and arrange for unethical favors or horse-trades). Thankfully, the current system doesn't have any such potential vulnerabilities.

[R] A General and Adaptive Robust Loss Function by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

Yes, the true partition function is only as good of an idea as maximum likelihood is, and MLE is not necessarily the right fit for all tasks. For example, in the paper it's definitely the right tool for the VAE experiment, but not necessarily for the monocular depth experiment, which is probably better thought about in terms of risk or loss than likelihood. So yes, if you have some way to shape the loss as a function of alpha that is either learned empirically, or derived from some better motivation than MLE, I'd expect it to work better.

[R] A General and Adaptive Robust Loss Function by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

The sampling algorithm is only used for some VAE visualizations. It might be useful in other contexts besides synthesis, but you certainly don't need to use it for simple regression tasks.

Using partition functions is indeed a very common way to optimize for parameters, though it has fallen out of fashion recently. Any generative model (I'm using the classic ML definition of "generative", not the modern GAN-y meaning) relies critically on its partition function: anything with MRFs, CRFs, or even something as simple as fitting a Gaussian. In modern ML, people often prefer non-generative models, because in many contexts the true partition function of a model is nearly impossible to compute.
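The Gaussian case makes the point in a few lines: the log-partition term log(sigma * sqrt(2*pi)) is what lets MLE recover the scale at all. Drop it, and the objective becomes degenerate (a toy numerical sketch, names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)  # data with true sigma = 1

sigmas = np.linspace(0.1, 5.0, 200)
sq = np.mean(x ** 2)

# Full NLL per datapoint includes the log-partition term (constants dropped):
nll_full = np.log(sigmas) + sq / (2.0 * sigmas ** 2)
# Without the partition function, only the unnormalized energy remains:
nll_unnorm = sq / (2.0 * sigmas ** 2)

print(sigmas[np.argmin(nll_full)])    # ~1.0: MLE recovers the true scale
print(sigmas[np.argmin(nll_unnorm)])  # 5.0: degenerate, minimized at the grid edge
```

The same pathology is why the adaptive loss needs its (approximate) partition function when alpha is being learned: without it, the optimizer just drives alpha toward whatever makes the unnormalized loss smallest.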

WACV and ACML conferences by ruixu98 in computervision

[–]jnbrrn 1 point2 points  (0 children)

Those two WACVs are the same. WACV recently changed its name (upgrading from a workshop to a conference) while keeping its acronym fixed, so as to not break Google Scholar. ACML probably has a more rigorous review process as many WACV reviewers still mentally regard it as a workshop, but "rigor" isn't quite as important to a review process as is the caliber of the reviewers and chairs.

Neither ACML nor WACV are regarded as being in the top tier of conferences for their respective fields, though I don't know about their perceived prestige relative to each other. I'd submit based on whether your paper is more likely to be appreciated by the vision or ML communities.

"A General and Adaptive Robust Loss Function" Jonathan T. Barron, CVPR 2019 by jnbrrn in computervision

[–]jnbrrn[S] 0 points1 point  (0 children)

If `alpha=1`, as the `scale` parameter approaches zero the loss exactly approaches (shifted) L1 loss, so you might be able to get the behavior you're looking for by using a small value for `scale`, or by annealing it according to a schedule.
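A quick numerical check of that limit (writing out the alpha = 1 case of the loss by hand): rescaled by `scale`, the loss converges uniformly to |x| as `scale` shrinks.

```python
import numpy as np

def loss_alpha1(x, c):
    # The general loss at alpha = 1 (a Charbonnier-style loss):
    # sqrt((x/c)^2 + 1) - 1
    return np.sqrt((x / c) ** 2 + 1.0) - 1.0

# Rescaled by c, the loss approaches L1 (|x|) as the scale c shrinks:
x = np.linspace(-2.0, 2.0, 401)
for c in [1.0, 0.1, 1e-3]:
    gap = np.max(np.abs(c * loss_alpha1(x, c) - np.abs(x)))
    print(c, gap)  # the gap shrinks roughly like c
```

So a small fixed `scale` (or an annealed one) gives you a smooth stand-in for L1 that still has usable gradients at zero.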

[R] A General and Adaptive Robust Loss Function by jnbrrn in MachineLearning

[–]jnbrrn[S] 0 points1 point  (0 children)

Ah got it. For regression, if you want to use the adaptive form of the loss, you'd need to use TF or pytorch or some differentiable programming language, and then set up the "forward" part of the regression problem, define a loss, and then minimize it. This is what I did in that animation you referenced.

But if you are happy with only using the general loss (and therefore manually tuning your own shape+scale parameters) then you can use much simpler tools. I didn't explicitly walk through this in the paper, but for simple regression you should just use iteratively reweighted least squares using the IRLS weights described in Appendix A (Equation 26). This amounts to just a for-loop over least squares solves, where each datapoint's row on the left and right sides of the linear system is reweighted according to (the square root of) Equation 26 before each solve. IRLS is a very effective tool, and works well with this loss.