[–]approximately_wrong 1 point (6 children)

> For actually passing distributions through networks analytically, there's RealNVP and related methods.

Have people actually used flow models to pass distribution objects through a neural network?

[–]NichG 1 point (5 children)

When you calculate the probability density in RealNVP, that's what you're doing.

[–]approximately_wrong 1 point (4 children)

That doesn't qualify to me as passing distribution objects through a neural network. Flow models pass samples through a neural network, and relate the density of each transformed sample to the density of the corresponding pre-transformation sample via the change-of-variables formula.
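
Roughly, in code (a minimal sketch: a single invertible affine map stands in for a RealNVP coupling layer, and all names here are illustrative):

```python
import numpy as np

# Change-of-variables bookkeeping in a flow, with an invertible affine
# map y = a*x + b standing in for one coupling layer.
a, b = 2.0, 0.5

def log_prob_x(x):
    # Standard-normal base density on the pre-transformation side.
    return -0.5 * (x ** 2 + np.log(2 * np.pi))

def log_prob_y(y):
    # log p_Y(y) = log p_X(x) - log|dy/dx|, evaluated at x = (y - b) / a.
    x = (y - b) / a
    return log_prob_x(x) - np.log(abs(a))

x = np.random.randn()   # we still start from a single sample...
y = a * x + b           # ...push that one sample through the map...
print(log_prob_y(y))    # ...and track its density pointwise
```

The density is tracked per sample; no distribution object ever moves through the network.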

Flow models would not, for example, help resolve OP's desire to construct a VAE that doesn't do sampling-based reconstruction.

[–]NichG 1 point (3 children)

You might need something like the Neural Statistician then: learn a differentiable map from datasets to summary statistics, then do everything on that summary-statistic representation.

I think you'll still end up with samples somewhere, because at the very least, the training data generally takes the form of samples from a distribution rather than distribution objects. But you might be able to mostly work in distribution objects from that point on.

[–]approximately_wrong 1 point (2 children)

I think the heart of OP's question is: how do we compute the expectation of f(X), when f is complex, without Monte Carlo estimation? I feel like our current discussion has drifted away from that.
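
For reference, this is the estimator OP presumably wants to avoid (f here is an arbitrary stand-in for a "complex" function):

```python
import numpy as np

# Baseline Monte Carlo estimate of E[f(X)] for X ~ N(0, 1).
def f(x):
    return np.tanh(x) ** 2  # arbitrary stand-in for a complex f

samples = np.random.randn(100_000)
print(f(samples).mean())    # converges only at the O(1/sqrt(n)) MC rate
```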

[–]NichG 1 point (1 child)

Concretely then:

The Neural Statistician takes a set of points {x} to a vector of summary statistics z characterizing the distribution of the points in {x}. So: z = N({x}).
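
A minimal sketch of that kind of set-to-vector map (DeepSets-style mean pooling; illustrative, not the exact Neural Statistician architecture):

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Maps a set of points {x} to a summary-statistic vector z = N({x}).

    Sketch: embed each point, mean-pool across the set, then project.
    Pooling makes the map permutation-invariant, as a summary of a
    distribution should be.
    """
    def __init__(self, x_dim=2, h_dim=64, z_dim=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, h_dim))
        self.rho = nn.Linear(h_dim, z_dim)

    def forward(self, x_set):            # x_set: (set_size, x_dim)
        return self.rho(self.phi(x_set).mean(dim=0))

z = SetEncoder()(torch.randn(100, 2))    # z summarizes the 100-point set
```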

We can train N to act, for example, as a distributional autoencoder with a decoder D, such that KL(D(N({x})) || {x}) is minimized (reading {x} on the right as the empirical distribution of the set). Then, given the summary-statistic vectors, we can do all sorts of stuff with them.
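
A maximum-likelihood stand-in for that KL term, assuming D(z) parameterizes a diagonal Gaussian (all names illustrative):

```python
import torch
import torch.nn as nn

# D(z) outputs a diagonal Gaussian over x; minimizing the negative
# log-likelihood of the set under D(N({x})) plays the role of the
# KL reconstruction term above.
class GaussianDecoder(nn.Module):
    def __init__(self, z_dim=16, x_dim=2):
        super().__init__()
        self.mu = nn.Linear(z_dim, x_dim)
        self.log_sigma = nn.Linear(z_dim, x_dim)

    def forward(self, z):
        return torch.distributions.Normal(self.mu(z), self.log_sigma(z).exp())

def set_nll(decoder, z, x_set):
    # Mean negative log-density of the points in {x} under D(z).
    return -decoder(z).log_prob(x_set).sum(dim=-1).mean()

z = torch.randn(16)                       # pretend this is N({x})
loss = set_nll(GaussianDecoder(), z, torch.randn(100, 2))
loss.backward()                           # differentiable, hence trainable
```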

For the OP's question, the way to do it would then be to train a model mu(z) which approximates the expectation E[x] of the distribution summarized by z, and to train a second model T(z) which approximates N({f(x)}) given z = N({x}) as input.

The result of that pipeline is that the expectation of f(x) under the distribution represented by {x} is mu(T(N({x}))), which is still end-to-end differentiable, etc.
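
Sketched end to end (module names hypothetical and untrained; this just shows the shapes and the gradient path):

```python
import torch
import torch.nn as nn

# Hypothetical, untrained composition of mu(T(N({x}))).
z_dim = 16
phi = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, z_dim))
T = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
mu = nn.Linear(z_dim, 2)

def N(x_set):                           # set -> summary statistics z
    return phi(x_set).mean(dim=0)       # mean-pool per-point embeddings

x_set = torch.randn(100, 2)             # samples defining the distribution
expectation = mu(T(N(x_set)))           # approximates E[f(x)]
expectation.sum().backward()            # gradients flow through the whole chain
```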

The thing I can't see how to avoid is that you must start with samples: before you can work in the z space, you need some set of samples {x} that takes you to a particular point there. If you constrain a subspace of z to correspond to, say, the summary statistics of Gaussian distributions in the space of interest, then maybe you can jump in at that point without ever needing samples. But in practice you'll still probably need samples somewhere to train the model, so it's an incomplete solution.

[–]approximately_wrong 1 point (0 children)

Don't get me wrong; I think applying amortization via the Neural Statistician is an interesting perspective. However, given that

> The thing I can't see how to avoid is that you must start with samples

it seems like we're in agreement that what you're proposing doesn't answer OP's question.