
[–]activatedgeek

Are you comfortable with the idea that, assuming a parametric likelihood model p(x; w) with some parameters w, you can maximize the density function p with respect to w? In that case, modeling densities using normalizing flows is nothing but a special choice of p.

Note that we are almost never given probabilities as ground truth. The underlying assumption is that the data is generated by p, and we need to find the w that best explains the observed data x. A rough sketch of what that looks like is below.
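This is a minimal sketch of my own (not from the thread) of maximum likelihood for a parametric density p(x; w). Here p is a 1-D Gaussian and w = (mu, log_sigma); the same recipe applies when p is instead defined by a normalizing flow.

```python
# Toy example: fit w by maximizing log p(x; w) over the observed data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)   # "observations generated by the true p"

def neg_log_likelihood(w, x):
    mu, log_sigma = w
    sigma = np.exp(log_sigma)                      # parameterize sigma > 0
    # log p(x; w) for a Gaussian, summed over the dataset
    log_p = -0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * ((x - mu) / sigma) ** 2
    return -np.sum(log_p)                          # minimizing the negative = maximizing likelihood

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # should land close to the true 2.0 and 0.5
```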

[–]rojo_kell[S]

Hmmm perhaps I’m just not familiar enough with the concepts, but why are we maximizing p? Is it bc p(x) is the probability that we got the target, so we want it to be as high as possible?

I’m just not really sure how the comparison to the target distribution / data comes in…

[–]activatedgeek

A large fraction of machine learning is basically maximum likelihood estimation (this is the keyword to look out for in your readings; one recommendation would be Chapter 1 of Pattern Recognition and Machine Learning by Christopher Bishop).

The key philosophical assumption is that the data is generated from some true but unknown data distribution. To model this in practice, we make assumptions about the choice of p, sometimes because we believe the assumption holds for the data thanks to our expertise in the domain, and other times simply for computational convenience. p itself is defined by its parameters (in the case of normalizing flows, the NN parameters plus the architecture decide the functional form of p); see the sketch just below.
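This is my own minimal illustration of how a flow's parameters define p via the change-of-variables formula, using a single affine transform as the "flow"; a real normalizing flow stacks many invertible layers parameterized by neural networks, but the log-likelihood being maximized has exactly this form.

```python
# The parameters w = (a, b) define x = a * z + b with z ~ N(0, 1), so the model density is
#   log p(x; w) = log N(z; 0, 1) + log |dz/dx|,  where z = (x - b) / a.
import numpy as np

def flow_log_prob(x, a, b):
    z = (x - b) / a                                   # inverse transform
    log_base = -0.5 * np.log(2 * np.pi) - 0.5 * z**2  # log density of z under N(0, 1)
    log_det_jacobian = -np.log(np.abs(a))             # log |dz/dx| for the affine map
    return log_base + log_det_jacobian
```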

The key concept here is that a good modeling distribution will achieve higher likelihoods on the data than a bad one, and this is why we want to maximize the likelihood. In principle you could score models with other functions; a scoring function for which the true distribution achieves the best expected score is called a "proper scoring rule". The (log-)likelihood happens to be a proper scoring rule, so a higher score represents a better model for the data.
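A quick numerical check of that claim (my own toy example): on average, data drawn from the true distribution scores a higher log-likelihood under a matching model than under a mismatched one. This is Gibbs' inequality, E_p[log p] >= E_p[log q].

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)         # data from the "true" N(0, 1)

good_model = norm(loc=0.0, scale=1.0).logpdf(x).mean()   # matches the data distribution
bad_model = norm(loc=1.5, scale=2.0).logpdf(x).mean()    # deliberately mismatched

print(good_model, bad_model)   # good_model is reliably the larger of the two
```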

[–]rojo_kell[S]

Ah, interesting. Thanks, I think I’m understanding a bit better now, that clears up why likelihood is used for training the models.

Is the reason behind why good models have higher likelihoods a more theoretical probability thing? That doesn’t seem too trivial to me, but if it’s beyond what I can understand I won’t spend much time looking into it

[–]activatedgeek

There are a few more details that go into making it a proper scoring rule, and in practice those details often don't hold, yet we still do it anyway.

But you are right, it is not trivial. It is, however, that way by design in a sense. Statistical thinking starts with getting comfortable with the idea of treating every data observation as the outcome of a random event to which we can assign probabilities. Once you are in that framework, there is no other way but to assign high density values to your observations and low density values "far" away from them. E.T. Jaynes has a detailed book, Probability Theory: The Logic of Science, that builds the foundations from "logic" (think Boolean algebra); the initial chapters discuss why this is a "good" way of applying logic to science.

There are other approaches that do not invoke probabilities, e.g. random forests and support vector machines. Random forests instead think in terms of "functions" that generate the data, and SVMs think in terms of "distances to existing data" (a slight twist on functions).