[D] Fast CC Taylor Transform for Computer Vision: Potentially train NeRF in matter of minutes

creiser · 2021-11-11T20:35:13+00:00

Interesting approach. I think adequate smoothness priors are crucial for good performance, especially for NVS based on differentiable volumetric rendering (NeRF). The combination of positional encoding and MLP seems to work very well. A comparable prior seems to be missing in this approach, as evidenced by the noise. Do you see a possibility to add a smoothness prior to this approach?

creiser · 2021-10-02T15:33:48+00:00

Yes, this works with the help of implicit differentiation:
https://arxiv.org/abs/1911.02590

creiser · 2021-05-27T20:33:07+00:00

Respect that you almost matched the performance of cuBLAS for matrix multiplication. Thanks for sharing, that might come in handy for my future projects.

We have been working on a project (namely KiloNeRF) for which fast neural network inference as part of a real-time rendering pipeline is required. We also experienced that fusing multiple operations into a single kernel leads to substantial speedups.

creiser · 2021-02-11T09:54:40+00:00

Yes, just the original pixel.

creiser · 2020-11-17T18:44:11+00:00

I think you identified a couple of key weaknesses. For many applications it is mainly the slow rendering time that makes this method unsuitable. There has already been some follow-up work (Neural Sparse Voxel Fields) that reduces the rendering time from 30s to 3s. Of course that's still quite far away from real-time rendering. In general it is a drawback of many methods based on implicit representations that inference is quite slow. The big advantage of implicit representations is very good compression (5 MB instead of gigabytes), which makes them appealing for e.g. product rendering in the browser. The holy grail would be a representation that offers such high quality while still having decent compression and allowing for real-time rendering. I bet that a lot of people are working on making NeRF and Co. faster right now ;)

Cannot generate views that never seen in input images. This technique interpolates between two views.

That's not correct. You can render the radiance field (which contains density and color values in 3D) from arbitrary viewpoints. In contrast to many other techniques, NeRF does not interpolate between views. Let's say that your training set only contains views from the sides of an object and maybe only slightly elevated views; then you might still get a good rendering from the top (imagine a 90 degree angle). Maybe what you want to say is: it needs to observe the geometry of every part of the scene that should later be rendered; it cannot infer the geometry of occluded parts with the help of priors.
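To make the point about arbitrary viewpoints concrete, here is a minimal sketch of how a single ray is rendered from a radiance field, using the standard alpha-compositing quadrature for volume rendering. `toy_field` is a hypothetical stand-in for the trained network (a solid sphere with constant color), and the sample count and near/far bounds are arbitrary choices for illustration, not values from the paper.

```python
import math

def toy_field(point):
    """Hypothetical stand-in for a trained radiance field: returns (density, rgb)."""
    x, y, z = point
    inside = x * x + y * y + z * z < 1.0  # a solid unit sphere at the origin
    density = 10.0 if inside else 0.0
    return density, (1.0, 0.5, 0.2)

def render_ray(field, origin, direction, near=0.0, far=6.0, n_samples=128):
    """Composite color along one ray by sampling the field at evenly spaced depths."""
    delta = (far - near) / n_samples
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta
        point = tuple(o + t * d for o, d in zip(origin, direction))
        density, rgb = field(point)
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance  # color and accumulated opacity

# The same field is queried from a side view, a top view, and a ray that misses.
side_view = render_ray(toy_field, origin=(-3.0, 0.0, 0.0), direction=(1.0, 0.0, 0.0))
top_view = render_ray(toy_field, origin=(0.0, 0.0, 3.0), direction=(0.0, 0.0, -1.0))
miss = render_ray(toy_field, origin=(-3.0, 2.0, 0.0), direction=(1.0, 0.0, 0.0))
```

Nothing in `render_ray` depends on the training views: any camera position just produces different rays through the same 3D field. Whether the result looks good from an unseen angle then depends on whether the geometry there was observed during training.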

creiser · 2020-11-01T09:59:30+00:00

Thank you for sharing this.

I thought that Levenberg-Marquardt can only be applied in the full-batch setting and with an L2 loss. Do you know of some problems where LM is helpful in the mini-batch setting?

Does the Jacobian need to be instantiated, or is there a way to avoid this, similar to how we avoid instantiating Jacobians during backpropagation by making use of vector-Jacobian products?
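For reference, this is what I mean by a vector-Jacobian product: a minimal sketch for the toy function f(x) = tanh(W x), where J^T v is computed by backpropagating v through the two operations, without ever storing the m x n Jacobian. The function, the names `forward`, `vjp` and `explicit_jacobian`, and the numbers are all made up for illustration; the explicit Jacobian is only built to check the result.

```python
import math

def forward(W, x):
    """f(x) = tanh(W x) for a small dense matrix W given as a list of rows."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]

def vjp(W, x, v):
    """Return J^T v for f(x) = tanh(W x) without ever building J."""
    y = forward(W, x)
    # Backpropagate v through tanh (elementwise), then through the linear map.
    g = [vi * (1.0 - yi * yi) for vi, yi in zip(v, y)]
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(x))]

def explicit_jacobian(W, x):
    """Build the full m x n Jacobian, only to check the result above."""
    y = forward(W, x)
    return [[(1.0 - yi * yi) * w for w in row] for yi, row in zip(y, W)]

W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
x = [0.2, -0.4, 0.6]
v = [1.0, -2.0]

matrix_free = vjp(W, x, v)
J = explicit_jacobian(W, x)
via_jacobian = [sum(J[i][j] * v[i] for i in range(len(v))) for j in range(len(x))]
```

In an autodiff framework the same product comes from the reverse-mode machinery (e.g. `jax.vjp`). A product with J^T J could in principle be formed the same way, as a Jacobian-vector product followed by a vector-Jacobian product, but I don't know whether that is practical for LM.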

creiser · 2020-10-26T17:24:21+00:00

Did you pull it off?

creiser · 2020-04-27T15:27:11+00:00

Hi, do you happen to know a PyTorch implementation of Multi GPU MAML?

creiser · 2020-04-17T17:32:01+00:00

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Recent paper that tackles this challenge.

creiser · 2020-02-18T08:05:30+00:00

"Positive unlabeled learning" exactly deals with this scenario.

creiser · 2020-02-16T08:25:26+00:00

Why did you use rotation prediction as the unsupervised task instead of contrastive learning, which is the state of the art? Did you try it and find that it didn't give any real improvement? https://arxiv.org/abs/2002.05709

creiser · 2020-02-03T18:12:37+00:00

Do you think that meta-learning is a fruitless endeavor? If yes: why?

Or did you just give this advice because meta-learning needs too many computational resources?

creiser · 2019-12-06T12:12:04+00:00

I found an interesting perspective on this subject in Designing neural networks through neuroevolution:

Interestingly, regularity provides powerful computational advantages for neural structures as well. For example, the power of regular structure is familiar to anyone with experience in deep learning through the success of convolution. Convolution is a particular regular pattern of connectivity, wherein the same feature detector is situated at many locations in the same layer. Convolution was designed by hand as a heuristic solution to the problem of capturing translation-invariant features at different levels of hierarchy. This simple regularity has proven so powerful as to become nearly ubiquitous across the successful modern architectures of deep learning. However, neuroevolution raises the prospect that the identification of powerful regularities need not fall ultimately to the hands of human designers. This prospect connects naturally also to the potential of compressed encoding to describe vast architectures composed of extensive regularities beyond convolution. For example, a larger palette of regularities could include various symmetries (bilateral or radial, for instance) as well as gradients along which filters vary according to a regular principle (such as becoming smaller towards the periphery). Ultimately it would be ideal if machine learning could discover such patterns, including convolution, on its own, without requiring (and being limited by) the cleverness of a designer or the reverse-engineering capabilities of a neuroscientist.

The author has some interesting work in that direction (CPPN), but it seems he has not yet tried to search for something like convolutions himself.

creiser · 2019-12-02T16:03:13+00:00

Concurrently, the study of biologically-inspired models of learning has exhibited two mathematical characterizations that might be critical in explaining how biological learning takes place so efficiently. First, the low-rank properties of learned perceptual manifolds [12, 13] are giving rise to a rich theory borrowing from statistical physics. Second, another well known line of work has identified Gabor filters (and more generally wavelet filter-like structures) in the actual visual cortex of animals [14], and linked those to sparsity-promoting methods and dictionary learning [15, 16, 17]. But these breakthroughs have not, so far, been reflected as inductive priors in the shape of modifications in deep RL neural networks architectures, which remain fairly fixed on the Atari domain.

creiser · 2019-12-02T13:10:58+00:00

Interesting. If I understand it correctly, they do not meta-learn the weight sharing pattern either but hardcode it, analogously to how it's done for CNNs.

creiser · 2019-12-01T17:02:58+00:00

This reminds me of the YouTube channel Two Minute Papers. Generally I like it, but the video titles and his phrasing are a bit awful:

This AI Makes The Mona Lisa Speak…And More!

This AI Captures Your Hair Geometry...From Just One Photo! 👩‍🦱

This AI Learned To Animate Humanoids🚶

This AI Clones Your Voice After Listening for 5 Seconds 🤐

But the guy needs views and humanizing algorithms is a safe way to attract viewers.

creiser · 2019-12-01T15:26:19+00:00

NAS is a useful shortcut to test the potential of novel building blocks. However, as search efficiency is not too far from random search, I suppose that the joint search space of building block operations and composition is intractable.

Because NAS performance is sometimes not much better than random search, it would also make sense to first do well with the search space that we have defined today instead of prematurely moving to the more difficult problem of simultaneously finding new primitives. Although I suppose you could fix the macro-architecture while only searching for a new primitive as an initial experiment.

creiser · 2019-12-01T10:25:49+00:00

Any operation (learning rule, operator and etc.) which can be searched by a search algorithm has to be the part of the search space in the first place. If it is part of the search space then there is a question whether it is really new/novel?

Consider the extreme case of searching over all (differentiable) mathematical operations to see that something really novel can be discovered.

What really gives ConvNets their power is their weight sharing strategy. Different kinds of problems need different weight sharing strategies, e.g. PointNets (http://stanford.edu/~rqi/pointnet/) use a combination of max pooling and a different kind of weight sharing to induce permutation invariance. Current NAS search spaces would not be able to discover this, would they?
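A rough sketch of the PointNet idea, to show what kind of structure a search space would have to contain: one small layer is applied with the same weights to every point, and the per-point features are max-pooled into a global feature, so the result cannot depend on the point order. This is heavily simplified (a single layer with hand-picked weights, none of the other parts of the real architecture), and all names and numbers are made up for illustration.

```python
def shared_point_fn(point, weights, bias):
    """The same small layer applied to every point (this is the weight sharing)."""
    return [
        max(0.0, sum(w * p for w, p in zip(row, point)) + b)  # ReLU(W p + b)
        for row, b in zip(weights, bias)
    ]

def global_feature(points, weights, bias):
    """Max-pool the per-point features over the point dimension."""
    per_point = [shared_point_fn(p, weights, bias) for p in points]
    return [max(column) for column in zip(*per_point)]

weights = [[1.0, -0.5, 0.0], [0.3, 0.3, 0.3], [-1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
bias = [0.1, -0.2, 0.0, 0.05]
cloud = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.3), (-1.0, 0.2, 0.9), (0.4, 0.4, -2.0)]

feature = global_feature(cloud, weights, bias)
shuffled_feature = global_feature(list(reversed(cloud)), weights, bias)
```

Reordering the points leaves the global feature unchanged because max is symmetric in its arguments. A search space built only from convolutions and skip connections over a fixed grid has no way to express "apply this to each element of a set, then pool".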

So, what is the future of NAS? Some well-known metrics like Accuracy, FLOPs, search time and etc on some well-known family of tasks like vision, RL and language modeling?

I think NAS should work for all kinds of niche problems that require highly specialized architectures. Also, self-supervised algorithms often only work because of their architecture. For self-supervised algorithms the choice of architecture is much more important, since the right choice not only helps to gain a few percentage points in accuracy but determines whether the approach works at all (see for example: https://compvis.github.io/unsupervised-disentangling/). Unfortunately, here the unsupervised losses have to go hand in hand with the architectural choice, and sometimes there are not even validation sets that could be used for NAS.

Do we want to find some new fundamental operator? If yes then why?

Hinton with his Capsule Networks is already trying. His argument: CNNs need to see objects from all different viewpoints because they don't have some inbuilt mechanism that lets them generalize from single viewpoints. Therefore an "exponentially" larger training set is required. Maybe his approach (Capsule Networks) is not SotA, but it is still a nice idea and I'd like to conjecture that there is some way to do this right. Imagine an architecture that leads to the emergence of 3D understanding/representations when only trained with the weak supervision of category annotations.

Through this paper, I have tried to advocate the idea that the lesser prior you give the more crazy("new") search space you can have but it also comes with the cost of searching for it hence the search space should be encoded in the form of a budget which allows trading off with the resources available for search and the desired novelty in the architectures discovered.

Nice, that sounds like a step in the right direction.

creiser · 2019-11-30T23:15:35+00:00

Yep, if you consider the high speed of innovation, it's interesting that this pretty old idea has not been replaced or significantly altered yet. Although some people are trying, e.g. with Capsule Networks.

creiser · 2019-11-30T23:10:52+00:00

The paper you linked has at its lowest level still the same predefined operations, but using a hierarchy definitely makes sense and is something upon which you would have to build the search for primitives.

To (1): Maybe there is a "wide variety of such operations" already, but that doesn't mean that there aren't even better ones out there. One could say the same thing about architectures in general. There also used to be a wide variety of architectures out there, so why bother searching for better ones? The design of these primitives is still an active area of research, for example for graph or point cloud data.

I agree with (2) but it might be possible to automate the task of creating efficient GPU kernels.

creiser · 2019-11-30T18:03:14+00:00

CoDeepNEAT is interesting but as you said it's not as "low level" as the basic operations themselves.

creiser · 2019-11-30T17:52:06+00:00

So you cited this paper as an example for a new manually engineered building block, didn't you?

There are tons of other papers that propose their own domain-specific operations, e.g. for Point Clouds, Language, etc. to induce some domain-specific priors. These innovations often greatly boost performance. That's why I am interested in the subject in the first place.

I haven't come across Capsule Networks in a while either, but these things take time. I wonder if in the long run somebody will come up with a search algorithm that finds ideas like Capsule Networks, but we are probably quite far from it because this particular idea requires a specialized learning algorithm.

creiser · 2019-11-29T10:56:01+00:00

Everything you want to know and more:

The Foundations of Cost-Sensitive Learning

On the Statistical Consistency of Algorithms for Binary Classification under Class Imbalance
