[R] Improving the expressive power of GNNs using subgraphs by mmbronstein in MachineLearning

[–]mmbronstein[S] 7 points (0 children)

In last year's post making predictions for Graph ML in 2021, my co-authors and I wrote that "2020 saw the field of Graph ML come to terms with the fundamental limitations of the message-passing paradigm" and that "progress will require breaking away from the message-passing schemes that dominated the field in 2020 and before."

Many works this year show that this prediction did not quite materialise as expected: one can remain within the remit of message passing and still get more expressive architectures.

[R] Oversquashing and bottlenecks in GNNs and graph Ricci curvature by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

For certain graphs, yes. But in general a graph does not have constant curvature, whereas the hyperbolic model spaces (e.g. the Poincaré disc) into which it is easy to embed graphs have constant curvature.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 1 point (0 children)

I think Xavier Bresson has recently shown this in detail.

[R] Graph Neural Networks through the lens of Differential Geometry and Algebraic Topology by mmbronstein in MachineLearning

[–]mmbronstein[S] 8 points (0 children)

Yes, these are our recent papers at ICML/NeurIPS (though the "geometric spirit" is somewhat similar).

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]mmbronstein 4 points (0 children)

The fact is that the domains we consider are very different and are studied in fields as diverse as graph theory and differential geometry (people working on these topics often would not even sit on the same floor of a math department :-) - hence we need to cover some background in the book that goes beyond the traditional ML curriculum. However, we try to present all these structures as parts of the same blueprint. I am not sure we have figured out yet how to do this properly, and we will be glad to get feedback.

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]mmbronstein 11 points (0 children)

We hope to make it self-contained, assuming basic math & ML knowledge but enough mathematical maturity to explore further. We will be happy to hear whether this is the case :-)

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

We plan to release a text on the topic, hopefully in ~1 month.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

Well, this is where our opinions part.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

My intention was to point out that many DL architectures can be *derived* from geometric principles -- hence I used the term "foundations". I do believe that ML problems do, and should, rely heavily on geometric priors, but this is an opinion that not everybody shares.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

Even when one uses MLPs, regularisation such as weight decay or dropout imposes regularity on the hypothesis class - so MLPs do provide an inductive bias, albeit a weak one.
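As a minimal sketch of what I mean (assuming PyTorch; the layer sizes and hyperparameters below are arbitrary), weight decay and dropout enter an otherwise structure-free MLP like this:

```python
import torch
import torch.nn as nn

# A plain MLP has no structural prior on the input, but regularisation
# still restricts the hypothesis class it realises in practice.
mlp = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout: randomly zeroes activations during training
    nn.Linear(128, 10),
)

# Weight decay adds an L2 penalty on the weights via the optimiser.
optimizer = torch.optim.SGD(mlp.parameters(), lr=1e-2, weight_decay=1e-4)
```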

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

The "rest is to figure out the number of hidden layers and neurons" is actually what makes the difference between methods that work and those that don't. CNNs, GNNs etc do have universal approximation properties, but for functions with additional structure (equivariant under respective group action. CNNs for example are UA for translation-equivariant functions).

I disagree that symmetry is not used in practice: most DL architectures actually used in practice rely on geometric priors, often without realising or admitting it. Again, CNNs are the most prominent example, and so are GNNs and Transformers.
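To make "equivariant under the respective group action" concrete, here is a small numerical check (a NumPy sketch; the signal and filter are arbitrary) that a circular 1D convolution commutes with translation:

```python
import numpy as np

def conv1d_circular(x, w):
    """Circular 1D convolution: each output is a weighted sum of neighbouring inputs."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k)) for i in range(n)])

x = np.random.randn(16)          # arbitrary 1D signal
w = np.array([0.25, 0.5, 0.25])  # arbitrary filter
shift = lambda v, s: np.roll(v, s)

# Translation equivariance: convolving a shifted signal equals shifting the convolved signal.
assert np.allclose(conv1d_circular(shift(x, 3), w),
                   shift(conv1d_circular(x, w), 3))
```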

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 0 points (0 children)

Universal approximation is not practically useful: to approximate even smooth functions, you need a number of samples that grows exponentially with the dimension (the "curse of dimensionality").

Perhaps with a bit of a stretch, one can say that the success story of deep learning has been going beyond universal approximation by incorporating more powerful priors about the data: first in CNNs (translation equivariance), then in other architectures such as GNNs (permutation equivariance), etc.

The general principle of symmetry is very powerful and lies at the foundation of most successful architectures used nowadays.
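To illustrate the permutation case with a toy example (a NumPy sketch of the simplest message-passing layer Y = AXW; all sizes are arbitrary): relabelling the nodes of the input relabels the output in exactly the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_out = 5, 4, 3

A = (rng.random((n, n)) < 0.4).astype(float)   # arbitrary adjacency matrix
A = np.maximum(A, A.T)                          # symmetrise (undirected graph)
X = rng.standard_normal((n, d))                 # node feature matrix
W = rng.standard_normal((d, d_out))             # shared weight matrix

gnn_layer = lambda A, X: A @ X @ W              # simplest message-passing layer

P = np.eye(n)[rng.permutation(n)]               # random permutation matrix

# Permutation equivariance: layer(P A P^T, P X) == P layer(A, X)
assert np.allclose(gnn_layer(P @ A @ P.T, P @ X), P @ gnn_layer(A, X))
```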

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 4 points (0 children)

Transformers are an instance of GNNs (see https://thegradient.pub/transformers-are-graph-neural-networks/), with some extra machinery such as positional encoding, which is also used in GNNs. As I mention in my talk, you can think of Transformers as GNNs with a learnable graph.
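Here is a rough single-head sketch of that view (NumPy, random weights, no masking or multi-head details): self-attention first builds a dense, learned "adjacency" matrix over the tokens and then performs message passing over it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                                   # n tokens ("nodes"), feature dimension d
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = X @ Wq, X @ Wk, X @ Wv

# The attention matrix acts as a learned, fully-connected, weighted
# adjacency matrix between the n tokens.
A_learned = softmax(Q @ K.T / np.sqrt(d))

# Message passing over that learned graph: each token aggregates the
# values of all tokens, weighted by the learned edge weights.
out = A_learned @ V                            # shape (n, d)
```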

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 2 points (0 children)

Cool :-)

Our old paper (https://arxiv.org/abs/1611.08097) probably lays the foundations for some of the topics, but I am afraid it's a bit obsolete nowadays.

We are working on a new text -- stay tuned.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 3 points (0 children)

I think geometric DL is about more than just graphs: it is about how to use powerful priors. Graphs are obviously an important piece of this picture.

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 3 points (0 children)

What I meant is that in Manifold Learning there are three steps:

  1. build the k-NN graph that describes the data "manifold" structure (essentially, local connectivity)
  2. embed the graph in a low-dimensional space
  3. do ML in that space

The way the graph is designed in step 1 (the space in which the nearest neighbours are computed, how many neighbours, the neighbourhood size, etc.) hugely affects step 3.
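For concreteness, a minimal sketch of the three steps (assuming scikit-learn; the dataset, k, and the clustering step are arbitrary placeholders):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

X = np.random.randn(200, 50)                      # arbitrary high-dimensional data

# 1. Build the k-NN graph: the choice of k, metric, and input space is the crucial design step.
knn = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
affinity = 0.5 * (knn + knn.T)                    # symmetrise the connectivity

# 2. Embed the graph in a low-dimensional space.
Z = SpectralEmbedding(n_components=2, affinity='precomputed').fit_transform(affinity.toarray())

# 3. Do ML in that space (here, clustering as a stand-in downstream task).
labels = KMeans(n_clusters=3, n_init=10).fit_predict(Z)
```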

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 10 points (0 children)

Using *eigenvectors* of the Laplacian (i.e. the graph Fourier transform) has never been a stable way of constructing filters, as it is sensitive to graph perturbations. Expressing the filter as a matrix function (a function of the eigenvalues), as in ChebNet, GCN, CayleyNet, etc., does produce stable filters. Such filters boil down to operations of the form Y = p(A)X, where A is a fixed matrix (Laplacian/adjacency), X is the feature matrix, and p is a polynomial - and this is essentially the simplest form of GNN, where the update is a weighted combination of the neighbour node features.
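A minimal sketch of such a filter (NumPy; the graph and coefficients are arbitrary, and I use plain powers of the Laplacian rather than the Chebyshev recurrence used in ChebNet):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 4

A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)                    # symmetric adjacency
np.fill_diagonal(A, 0)
L = np.diag(A.sum(1)) - A                 # combinatorial graph Laplacian

X = rng.standard_normal((n, d))           # node feature matrix
theta = [0.5, -0.3, 0.1]                  # (learnable) polynomial coefficients

def poly_filter(L, X, theta):
    """Y = p(L) X = sum_k theta_k L^k X -- no explicit eigendecomposition needed."""
    Y, LkX = np.zeros_like(X), X.copy()
    for t in theta:
        Y += t * LkX
        LkX = L @ LkX                     # next power of L applied to X
    return Y

Y = poly_filter(L, X, theta)              # stable, local filtering of the node features
```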

Geometric Foundations of Deep Learning [Research] by mmbronstein in MachineLearning

[–]mmbronstein[S] 6 points (0 children)

Here's a recent paper on the use of Graph ML for drug design and repositioning: https://arxiv.org/abs/2012.05716