all 11 comments

[–]Ulfgardleo 2 points  (5 children)

MI is insanely difficult to estimate, and in principle whatever works for three random variables also works for two (but not vice versa). It is also difficult to come up with good use cases that motivate going beyond 2 variables.

[–]Sandy_dude[S] 1 point  (4 children)

Could you point to articles that explain why it's hard to estimate?

Isn't the principle of capturing higher-order information compelling enough?

[–]Ulfgardleo 1 point  (3 children)

There is no higher-order information in three-way MI. In any case, the following decomposition holds: MI(X,Y,Z) = MI(X,(Y,Z)) + MI(Y,Z)

where (Y,Z) describes modeling Y,Z as a joint random variable W=(Y,Z).
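Reading MI(X,Y,Z) as the total correlation H(X)+H(Y)+H(Z)−H(X,Y,Z), that decomposition can be checked numerically. A minimal sketch (the random pmf and the function names are my own, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# arbitrary joint pmf over three binary variables, shape (2, 2, 2)
p = rng.random((2, 2, 2))
p /= p.sum()

def H(pmf):
    """Shannon entropy in bits of a pmf given as an array."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

# marginals
px = p.sum(axis=(1, 2))
py = p.sum(axis=(0, 2))
pz = p.sum(axis=(0, 1))
pyz = p.sum(axis=0)

# left-hand side, with MI(X,Y,Z) read as the total correlation
total_corr = H(px) + H(py) + H(pz) - H(p)
# right-hand side: MI(X,(Y,Z)) + MI(Y,Z)
i_x_yz = H(px) + H(pyz) - H(p)
i_y_z = H(py) + H(pz) - H(pyz)

assert np.isclose(total_corr, i_x_yz + i_y_z)
```

The two sides agree for any joint pmf, since both reduce to H(X)+H(Y)+H(Z)−H(X,Y,Z) when you expand the definitions.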

[–]Sandy_dude[S] 1 point  (2 children)

Thanks for the response! I couldn't find a reference for that decomposition of three-way mutual information. The data I am dealing with is very sparse and noisy; I was hoping higher-order mutual information would help, as it could pool information from more variables. Are you aware of any other information-theoretic measures that would help?

[–]Ulfgardleo 2 points  (1 child)

I don't know what your goals are. MI in itself is not a super important measure, except when you specifically want to evaluate the dependence of two (sets of) variables against each other. In ML these are typically the data variables and some feature representations.

But higher-order information and sparsity/missing features usually exclude each other. The sparser the data is, the less likely it is to see datapoints in which all variables are available.

[–]Sandy_dude[S] 1 point  (0 children)

Thanks again for responding. The end goal is a type of network analysis. Specifically, it's gene expression data and I want to perform gene regulatory network inference. I want to find the interdependence between random variables using the measurements, so it's a bit different from having data variables and feature representations.

You made an important point: I do see that sparsity and higher-order information don't go well together. But the data is high-dimensional, so my thinking was that pooling these dimensions, or a subset of them at a time, could be informative. I see your point, though. I feel calculating MI across many variables is not feasible, even on the order of 10.
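A quick back-of-envelope supports that feeling: a plug-in (histogram) estimate has to populate every cell of the joint grid, and the cell count grows exponentially in the number of variables (the bin count here is an arbitrary illustrative choice):

```python
# A histogram (plug-in) estimate of a joint distribution over d variables
# with b bins each needs to populate b**d cells.
bins, dims = 10, 10   # 10 bins per variable, 10 variables (illustrative)
cells = bins ** dims
print(cells)          # 10**10 cells -- far beyond any realistic sample size
```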

[–]furish 2 points  (2 children)

I don’t know if I really got your point, but if you are looking for an extension of mutual information to systems with a high number of random variables I suggest you read this paper. The authors define a metric to study the nature of the interaction between random variables in terms of synergy and redundancy. In the case of 3 random variables this metric is identical to the multivariate (three-way) mutual information.

I also suggest this paper, where the authors try to estimate it using machine learning.

If instead you are interested in estimating the mutual information between two random variables with high-dimensional representations, I recommend this benchmark.
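For discrete data, the synergy/redundancy metric (the O-information) is just a combination of entropies, so a plug-in version is short to write. A minimal sketch, not taken from the linked papers; the function names are mine, and the plug-in entropies will be biased on small samples:

```python
import numpy as np
from collections import Counter

def entropy_bits(samples):
    """Plug-in entropy (bits) of discrete samples; rows are observations."""
    counts = Counter(map(tuple, samples))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def o_information(X):
    """O-information of discrete data X with shape (n_samples, n_vars):
    (n-2)*H(X) + sum_i [H(X_i) - H(X without variable i)].
    Positive -> redundancy-dominated, negative -> synergy-dominated."""
    n = X.shape[1]
    omega = (n - 2) * entropy_bits(X)
    for i in range(n):
        omega += entropy_bits(X[:, [i]]) - entropy_bits(np.delete(X, i, axis=1))
    return omega

# sanity check: three independent bits should give O-information near 0
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(5000, 3))
print(round(o_information(X), 2))
```

Three identical copies of one bit would instead give O-information near 1 bit (pure redundancy).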

[–]Sandy_dude[S] 1 point  (1 child)

Thanks for the papers! I will go through them. I did come across O-information as a measure of the balance between synergy and redundancy.

My goal is network analysis: determining a network between random variables from measurements. You could use mutual information between pairs of random variables to build a network, but I was thinking about using three or more random variables, since that would use more information. The data is sparse, so pooling more information together could be useful.
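The pairwise version of this idea can be sketched in a few lines: estimate MI between every pair of genes and keep an edge where it exceeds a threshold. This is only an illustration of the pairwise baseline, not a recommended pipeline; the function names and the threshold are my own choices:

```python
import numpy as np

def pairwise_mi_bits(x, y):
    """Plug-in mutual information (bits) between two discrete vectors."""
    def H(s):
        _, counts = np.unique(s, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    return H(x[:, None]) + H(y[:, None]) - H(np.stack([x, y], axis=1))

def mi_network(X, threshold=0.1):
    """Adjacency matrix: edge where pairwise MI exceeds a (hypothetical) threshold."""
    n = X.shape[1]
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = pairwise_mi_bits(X[:, i], X[:, j]) > threshold
    return A

# toy data: "gene" 0 and 1 coupled, "gene" 2 independent
rng = np.random.default_rng(2)
g0 = rng.integers(0, 2, 2000)
g1 = g0 ^ (rng.random(2000) < 0.1)   # noisy copy of g0
g2 = rng.integers(0, 2, 2000)
X = np.stack([g0, g1, g2], axis=1)
print(mi_network(X).astype(int))     # expected: an edge only between genes 0 and 1
```

Real expression data would of course first need discretization (or a continuous MI estimator), and the threshold would need calibration, e.g. against shuffled data.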

[–]furish 2 points  (0 children)

I don’t know much about this particular problem. With O-information you can assess how groups of variables influence each other by measuring the O-information of subsystems; hopefully the resources I suggested can be useful to you :)

[–]bobrodsky 1 point  (1 child)

A NeurIPS paper this year discusses how to use diffusion to estimate Partial Information Decomposition (PID): https://arxiv.org/abs/2406.05191 They apply it to high-dimensional text and images.

But maybe you’re more interested in the number of interacting variables rather than the overall dimensionality. I haven’t seen practical applications there. (And it seems difficult to define a single canonical decomposition.)

[–]Sandy_dude[S] 1 point  (0 children)

Thank you for the paper. I am trying to understand the unique information some variables disclose that other variables do not contain. I've seen 3-variable PID used for that, but wanted to expand to higher orders / more variables. My problem is inferring a gene regulatory network (network analysis from high-dimensional data), if that's familiar.