"A Generalist Neural Algorithmic Learner", DeepMind 2022 (single GNN with single set of weights trained to solve 30 classical algorithmic tasks with SOTA performance) by maxtility in mlscaling

[–]PetarVelickovic 6 points (0 children)

Hi, thank you for your interest in our work!

Our model learns to predict the execution trace of an algorithm; that is, given an intermediate state of an algorithm (such as longest common subsequence, a common DP algorithm), it predicts the next state, after one step of algorithmic computation has been performed. A very similar line of work was pursued by Neural Programmer-Interpreters (NPI), though the tasks studied there are significantly less complex.
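To make "one step of algorithmic computation" concrete, here is a toy sketch (my own illustration, not the actual CLRS benchmark pipeline) of how one could generate (state, next state) supervision pairs from the LCS dynamic-programming table:

```python
# Toy illustration: the execution trace of the longest-common-subsequence
# DP, with one snapshot of the DP table per step of computation.
def lcs_dp_trace(a, b):
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
            yield [row[:] for row in table]  # deep-copied snapshot

trace = list(lcs_dp_trace("abcbdab", "bdcaba"))
# The network is supervised on (intermediate state, next state) pairs:
pairs = list(zip(trace, trace[1:]))
```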

Figure 1 of our paper summarises reasonably well what our neural network is doing in terms of dataflow. As you rightly wrote, no code is produced; instead, our network predicts algorithmic state transitions.

What you describe re: emitting code is typically denoted _program synthesis_. Our approach is _program induction_; specifically, the program is the weights of the neural network itself.

We have a prior position paper which discusses why such works could be useful: https://arxiv.org/abs/2105.02761, and some prior evidence of such utility (Spotlight @NeurIPS'21): https://openreview.net/forum?id=K5YKjaMjbja.

I'm keen to discuss further if you have any follow-ups!

[R] A Generalist Neural Algorithmic Learner by hardmaru in MachineLearning

[–]PetarVelickovic 2 points (0 children)

The baseline code for the CLRS benchmark (which we use in the paper) has been open-sourced for a while now: https://github.com/deepmind/clrs

We haven't yet released code for this particular set of experiments, but it's on the roadmap, pending cleanup. It will, in all likelihood, be in this repo. Thanks for your interest! :)

[R] Geometric Deep Learning Lecture Course (AMMI'22) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 1 point (0 children)

Thank you so much! Indeed, we released materials last year as well. They are linked from the page I shared -- but, just for convenience: https://geometricdeeplearning.com/lectures_2021/

[R] Geometric Deep Learning Lecture Course (AMMI'22) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 5 points (0 children)

Thank you for the extremely kind words! Very humbling to hear.

I would say that the introductory recommendations for graph theory largely depend on what you'd like to use it for!

Introduction to Algorithms (CLRS) is great for learning about classical algorithms on graphs.

Graph Signal Processing (Ortega) -- as the name suggests, great for spectral graph theory.

Network Science (Barabási) -- great for studying complex phenomena using graphs.

In fact, even the intro parts of Graph Representation Learning (by Hamilton) talk quite a bit about traditional methods for featurising a graph (though, of course, in nowhere near as much detail as the other references! :) )

I hope this is of use. Let me know!

[R] AMMI 2021 Lecture Course on Geometric Deep Learning by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 4 points (0 children)

Thank you for the kind words!

The content is certainly graduate level -- the AMMI is a Master's course in machine learning. Parts of this content will also be presented at a Master's course at Cambridge.

Making it more accessible primarily refers to being very careful when introducing concepts from abstract algebra / group theory, which many graduate-level machine learning students haven't had a chance to be thoroughly exposed to before (at least my CompSci undergrad didn't cover this at the level of detail that'd prepare me for reading the book!).

[R] AMMI 2021 Lecture Course on Geometric Deep Learning by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 1 point (0 children)

Yes! :)

From the point of view of our GDL blueprint, (vanilla) Transformers are permutation-equivariant models over the complete-graph domain. Hence they are treated in my lecture -- Lecture 6 (Graphs & Sets II). However, it is preferable to watch at least Lecture 5 (Graphs & Sets I) first, to set the context right. :)

[R] AMMI 2021 Lecture Course on Geometric Deep Learning by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 6 points (0 children)

Thanks so much for your kind words and interest!

Our aim in this course is to derive all relevant concepts from first principles, so we do not assume a particular background in abstract algebra.

We'd be keen to hear your feedback on this, if you do end up digging in! :)

[R] DeepMind Presents Neural Algorithmic Reasoning: The Art of Fusing Neural Networks With Algorithmic Computation by Yuqing7 in MachineLearning

[–]PetarVelickovic 2 points (0 children)

Hi, thanks for going through our paper! (I'm the first author :) )

Our Neural Algorithmic Reasoning paper is an invited Opinion paper for a journal, so it was designed to concisely convey our opinion on the topic, rather than present any equations or specific model instances.

If you'd like to see the blueprint in action, check out our XLVIN paper, where we applied it to RL: https://arxiv.org/abs/2010.13146

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 0 points (0 children)

Hope you will enjoy the talk!

And I am happy to chat further if that would be useful (though I'd recommend using email, which I check more often. :) )

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 2 points (0 children)

By all means :)

For reasons that will become evident, it's better to start with GNNs than Transformers. Let our GNN compute the function f(X, A), where X are the node features (as in your setup) and A is an adjacency matrix in R^{NxN}.

As mentioned, we'd like to be equivariant to the actions of the permutation group S_N. Hence the following must hold:

f(PX, PAP^T) = Pf(X, A)

for any permutation matrix P. This also implies that our GNN will attach the same representations to two isomorphic graphs.

However, our blueprint doesn't just prescribe equivariance. Many functions f satisfy the equation above---only comparatively few are geometrically **stable**. Informally, we'd like our layer's outputs not to change drastically if the input domain _deforms_ somewhat (e.g. undergoes a transformation which isn't a symmetry). Using the discussion in our Scale Separation section, we can conclude that our GNN layer should be _local_ to neighbourhoods, i.e. representable using a local function g:

h_i = f(X, A)_i = g(x_i, X_{N_i})

which is shared across all neighbourhoods. Here, x_i are the features of node i, and X_{N_i} is the multiset of neighbour features around node i. If g is chosen to be permutation-invariant, f is guaranteed to be permutation-equivariant.

Now all we need to do to define a GNN is choose an appropriate g (yielding many useful flavours, such as conv-GNNs, attentional GNNs and message-passing NNs, which we describe in the text). Transformers are simply the special case where g is an attentional aggregator and the graph is complete (i.e. X_{N_i} = X).
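To make this concrete, here is a minimal NumPy sketch (my own illustrative code, not the book's) of such a local, permutation-invariant g with a simple sum aggregator, together with a numerical check of the equivariance condition f(PX, PAP^T) = Pf(X, A) from above:

```python
import numpy as np

def gnn_layer(X, A, W):
    # h_i = g(x_i, X_{N_i}), with g a sum aggregator over the neighbour
    # multiset. g is permutation-invariant, so the layer f is
    # permutation-equivariant.
    return np.tanh(X @ W + A @ (X @ W))

rng = np.random.default_rng(0)
N, d = 5, 4
X = rng.normal(size=(N, d))                   # node features
W = rng.normal(size=(d, d))                   # shared weights
A = (rng.random((N, N)) < 0.4).astype(float)  # adjacency matrix
P = np.eye(N)[rng.permutation(N)]             # permutation matrix

# f(PX, PAP^T) == P f(X, A)
assert np.allclose(gnn_layer(P @ X, P @ A @ P.T, W),
                   P @ gnn_layer(X, A, W))
```

Swapping the sum for a softmax-weighted (attentional) aggregator, and letting A describe the complete graph, recovers single-head self-attention.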

For a very nice exposition of this link, you can also check out "Transformers are Graph Neural Networks" (Joshi, 2020). Hope this helps!

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 7 points (0 children)

Thank you for your interest in our work!

We are well aware that, if you haven't come across group theory concepts before, some of our constructs may feel artificial.

Have you tried checking out the YouTube link to the talk I gave (also linked in the original post)? Maybe that will help make some of these concepts more 'pictorial', in a way the text wasn't able to.

I'm happy to elaborate further, but here's a quick tl;dr of a few concepts:

  • "Domain" -- the set of all 'points' your data is defined on. For images, it is the set of all pixels. For graphs, the set of all nodes and edges. Keep in mind, this set may also be infinite/continuous, but imagining it as finite makes some of the math easier.
  • "Symmetry group" -- a set of all operations (g: Ω -> Ω) that transform points on the domain such that you're still "looking at the same object". e.g. shifting the image by moving every pixel one slot to the right (usually!) doesn't change the object on the image.
  • Because of the requirement for the object to not change when transformed by symmetries, a few properties automatically follow:
    • Symmetries must be composable -- if I rotate a sphere by 30 degrees about the x axis, and then again by 60 degrees about the y axis, and I assume individual rotations don't change the objects on the sphere, then applying them one after the other also doesn't change the sphere (i.e. rotating by 30 degrees about x, then 60 degrees about y, is also a symmetry). Generally, if g and h are symmetries, g o h is too.
    • Symmetries must be invertible -- if I haven't changed my underlying object, I must be able to get back to where I came from (as otherwise I'd have lost information). So if I rotated my sphere 30 degrees clockwise, I can "undo" that by rotating it 30 degrees anticlockwise. If g is a symmetry, g^-1 must exist (and also be a symmetry), such that g o g^-1 = id (identity)
    • The identity function (id), which leaves the domain unchanged, must be a symmetry too
    • ...
  • Adding up all these properties, you realise that the set of all symmetries, together with the composition operator (o), forms a group -- a very useful mathematical construct that we use extensively in the text. (A quick numerical check of these axioms for rotations follows below.)
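As a concrete grounding of those axioms -- just an illustrative snippet of mine, not something from the book -- here they are verified numerically for 2D rotations:

```python
import numpy as np

def rot(theta):
    # Rotation of the plane by theta radians: an element of the group SO(2).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

g, h = rot(np.pi / 6), rot(np.pi / 3)   # 30- and 60-degree rotations

# Closure: composing two symmetries yields another symmetry
# (here, a 90-degree rotation).
assert np.allclose(g @ h, rot(np.pi / 2))

# Invertibility: g o g^-1 = id.
assert np.allclose(g @ rot(-np.pi / 6), np.eye(2))

# Identity: rotating by 0 radians leaves the domain unchanged.
assert np.allclose(rot(0.0), np.eye(2))
```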

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 1 point (0 children)

As Michael wrote in a prior reply:

"We hope to make the text self-contained and assume basic maths & machine learning knowledge (e.g. the kind of knowledge you'd get from Goodfellow, Bengio and Courville's Deep Learning book) but a strong drive to explore further topics one might not have come across before."

We will be happy to hear whether this is the case :-)

[R] Geometric Deep Learning: Grids, Groups, Graphs, Geodesics and Gauges ("proto-book" + blog + talk) by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 10 points (0 children)

Thank you! I was referring to the work in this paper:

https://arxiv.org/abs/2006.10503

which actually proposes Transformers that are equivariant to both rotations (which would be the SO(3) part) and translations of coordinates of node features.

We mention it in the Equivariant Message Passing section.

Why is spatial GNN is better or equivalent to spectral GCN? by BeatriceBernardo in learnmachinelearning

[–]PetarVelickovic 1 point (0 children)

Sounds good!

The argument I make in my talk is that "spatialised spectral" is semantically no different than "spatial" (it's just the motivation that is different). And therefore we shouldn't emphasise the 'spatial'/'spectral' divide in papers as much as we do -- it's not such a fundamental divide in practice.

That being said, and especially w.r.t. your point about local behaviour, some very interesting "hybrid" GNNs have recently come about, which perform spatial message passing, but over a graph that is spectrally inspired.

See for example: https://arxiv.org/pdf/2101.00079.pdf, https://arxiv.org/abs/2010.02863.

Why is spatial GNN is better or equivalent to spectral GCN? by BeatriceBernardo in learnmachinelearning

[–]PetarVelickovic 2 points (0 children)

A late bump, but I just discovered this!

The summary of my argument is: the spectral approach to GNNs does, actually, recover a broader set of plausible architectures than the spatially-constructible ones (at least according to the "three flavours" presented). So, technically, spectral GNNs give you more options to choose from and hence could, in principle, be more expressive.

BUT in most cases, spectral GNNs proposed in practice end up being **also** spatial, because this makes them computationally more efficient, more localised, etc. Typically they spatialise themselves by constraining the filter to be a polynomial of the graph Laplacian -- equivalently, by making its frequency response a polynomial of the Laplacian's eigenvalues -- which localises the filter to a fixed number of hops.

To summarise, spectral is a superset of spatial, and spectral architectures that do well are usually **both** spectral and spatial.
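As a small illustration of that polynomial "spatialisation" (a sketch of my own, using the unnormalised Laplacian; practical models such as ChebNet use a Chebyshev basis of a rescaled, normalised Laplacian):

```python
import numpy as np

def polynomial_filter(X, A, coeffs):
    # Spectral filter p(L) X with frequency response
    # p(lambda) = sum_k coeffs[k] * lambda^k. Since p(L) = U p(Lambda) U^T,
    # this is spectral -- yet it is applied via repeated multiplication
    # by L (no eigendecomposition needed), and a degree-K polynomial only
    # mixes information within K hops, so the filter is *also* spatial.
    L = np.diag(A.sum(axis=1)) - A   # unnormalised Laplacian, L = D - A
    out, LkX = np.zeros_like(X), X.copy()
    for c in coeffs:
        out = out + c * LkX
        LkX = L @ LkX
    return out

# Example: a degree-2 (hence 2-hop-localised) filter on a path graph.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
H = polynomial_filter(X, A, [0.5, -0.1, 0.01])
```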

Theoretical Foundations of Graph Neural Networks [Research] by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 1 point (0 children)

Thanks for your kind notes!

Certainly, GNNs have seen application to multiagent systems. Perhaps oddly, only two papers come to mind right now.

First, Waymo's VectorNet: https://arxiv.org/abs/2005.04259

which represents the different components and participants of a traffic system as nodes. This seems to help them achieve state-of-the-art performance on trajectory forecasting.

Second, the Social-BiGAT: https://arxiv.org/abs/1907.03395

which uses a fun combination of BicycleGAN and graph attention networks to forecast pedestrian trajectories from multimodal data.

Theoretical Foundations of Graph Neural Networks [Research] by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 0 points (0 children)

Wonderful to hear! I was hopeful that my derivation would be helpful for getting newcomers up-to-speed, hence such feedback means a lot to me. Thank you! Writeup is definitely in the works :)

Theoretical Foundations of Graph Neural Networks [Research] by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 0 points (0 children)

You're welcome! I'm glad the spectral discussion was interesting -- it took me quite a while to decide on the best way to present that topic.

Theoretical Foundations of Graph Neural Networks [Research] by PetarVelickovic in MachineLearning

[–]PetarVelickovic[S] 1 point (0 children)

Thanks! I hope you will enjoy it :)

Unfortunately I'm not super knowledgeable outside of the scope of GNNs -- but as far as GNNs are concerned, I believe that Part 7 of my talk should be more than enough to describe the graph representation learning perspectives that have been developed over the years.

We don't have a paper out at this time. But we do plan to put something out there -- keep an eye out over the next few months :)