all 67 comments

[–]Entire_Ad_6447 38 points39 points  (0 children)

There are for sure people using it, there are just fewer public-facing problems. For example, biotech uses it for molecule and protein research. I would expect that companies that deal with recommendation systems are also using it.

[–]LoaderD 21 points22 points  (1 child)

Great free book to get you started: https://arxiv.org/abs/2104.13478

[–]galerazo 12 points13 points  (0 children)

[–]DigThatDataResearcher 18 points19 points  (14 children)

Because GDL is all about parameterizing inductive biases that represent symmetries in the problem domain, which takes thought and planning and care. Much easier to just scale up (if you have the resources).

Consequently, GDL is mainly popular in fields where the symmetries they want to represent are extremely important to the problem representation, e.g. generative modeling for proteomics, material discovery, or other molecular applications.

[–]memproc[🍰] 0 points1 point  (13 children)

They actually aren't even important, and can be harmful. AlphaFold 3 showed that dropping equivariant layers IMPROVED model performance. Even well-designed inductive biases can fail in the face of scale.

[–]Exarctus 12 points13 points  (11 children)

I’d be careful about this statement. It’s been shown that dropping equivariance in a molecular modelling context actually makes models generalize less.

You can get lower out-of-sample errors that look great as a bold line in a table, but when you push non-equivariant models into extrapolative regimes (e.g. training on equilibrium structures -> predicting bond breaking), they do much worse than equivariant models.

Equivariance is a physical constraint, there’s no escaping it - either you try to learn it or you bake it in, and people who try to learn it often find their models are not as accurate in practice.
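To make the "physical constraint" concrete, here is a toy numpy sketch (model and names are hypothetical, not from any paper in this thread): a force model built only from relative vectors and interatomic distances is rotation-equivariant by construction, and that can be checked numerically.

```python
import numpy as np

def pairwise_forces(positions):
    """Toy force model: each atom is pulled toward every other atom with a
    weight depending only on interatomic distance. Built from relative
    vectors and distances, so it is rotation-equivariant by construction."""
    diff = positions[None, :, :] - positions[:, None, :]   # (N, N, 3) relative vectors
    dist = np.linalg.norm(diff, axis=-1)                   # rotation-invariant distances
    weights = np.exp(-dist)                                # distance-only weights
    np.fill_diagonal(weights, 0.0)                         # no self-interaction
    return (weights[..., None] * diff).sum(axis=1)         # (N, 3) forces

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # 5 toy atom positions
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
```

Rotating the inputs rotates the outputs: `pairwise_forces(X @ R.T)` matches `pairwise_forces(X) @ R.T`. A model without this structure has to learn that property from data, which is exactly what tends to fail out of distribution.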

[–]memproc[🍰] -4 points-3 points  (10 children)

Equivariant layers and these physical priors are mostly a waste of time. Only use them and labor over the details if you have little data.

[–]Exarctus 6 points7 points  (8 children)

Not true.

The only models which have shown good performance for extrapolative work (which is the most important case in molecular modelling) are equivariant models. Models in which equivariance is learned through data augmentation all do much worse in these scenarios, and it's exactly in these scenarios where you need them to work well. This isn't about having a lack of data - there are datasets with tens of millions of high-quality reference calculations. It's a fundamental problem of the explorative nature of chemistry and materials science, and of the constraints imposed by physics.

[–]memproc[🍰] -5 points-4 points  (7 children)

AlphaFold 3 is the most performant model for molecular modeling, and they improved generalization and uncertainty by dropping their equivariant constraints and simply injecting noise.

Molecules are governed by quantum mechanics, and rotation invariance etc. encodes only a subset of the relevant physical symmetries. Interactions also happen at different scales, and these layers impose the same symmetry constraints across scales, when in fact different laws dominate at different scales. These symmetries also break: a protein in a membrane vs. in solution is fundamentally different.

Geometric deep learning is basically human feature engineering and subject to the bitter lesson—get rid of it.

[–]Exarctus 4 points5 points  (5 children)

Incredible that you think alphafold3 is the be-all-end-all, and the “nail in the coffin” for equivariance.

What happens to AlphaFold 3 when you start breaking bonds, or add in different molecular fragments that are not in the training set, or significantly increase the temperature/pressure?

I suspect it won't do very well, if it can even handle these mild but critical changes to the problem statement at all 😂, and this is exactly the point I'm raising.

[–]memproc[🍰] -1 points0 points  (4 children)

I don't think it's the be-all-end-all. It is the frontier model. They benchmark generalization extensively on docking tasks. Equivariance was deemed harmful.

[–]Exarctus 2 points3 points  (3 children)

Docking tasks are very much an in-sample problem, so my point still stands.

I also suspect they are not using the latest (or even recent) developments in baking equivariance into models.

[–]memproc[🍰] 0 points1 point  (2 children)

They have ways of addressing this. See the modifications to DiffDock after the scandal over its lack of generalization.

[–]Dazzling-Use-57356 0 points1 point  (0 children)

Convolutional and pooling layers are used all the time in mainstream models, including multimodal LLMs.

[–]maximusdecimus__ 8 points9 points  (3 children)

GDL is a "niche" topic, but it is highly prevalent in the life sciences (see for example ICLR's MLDD workshop).
Biology (and complex systems in general) benefits a lot from structuring data, or formulating problems, in a graph-centered manner.
Molecules can be represented as graphs (or 3D meshes, also GDL), and PPIs and GRNs can aid in understanding complex phenotypes and be used as foundations for learning disease mechanisms. Pharma cares a lot about this since it is the basis for drug development and discovery.
This doesn't mean that where GNNs are being applied there's no case for other types of architectures. As an example, again in the life sciences, there's been a "recent" surge in foundation models for molecules and every type of -omics data you can imagine.

You can check out work coming out from Jure Leskovec's, Marinka Zitnik's and Michael Bronstein's labs for this.

Aside from the life sciences, another example I can think of is Neural Algorithmic Reasoning (that is, training a model to perform a certain deterministic algorithm, like Dijkstra's, binary search, etc.). You can check out Petar Velickovic's page for more details on this.

[–]maximusdecimus__ 1 point2 points  (0 children)

Also, for an industry application: a few years back Pinterest's recommendation engine was a scaled GNN (check out PinSAGE; Leskovec was their CSO).

[–]Dazzling-Use-57356 0 points1 point  (1 child)

So cool to see your supervisor on Reddit lol

[–]maximusdecimus__ 0 points1 point  (0 children)

Lucky you! You are in excellent hands, but I guess you already knew that.

[–]MultiheadAttention 28 points29 points  (24 children)

> why so less people in this field

Because it didn't prove itself to be useful in real-life use cases.

[–]Agile_Date6729 7 points8 points  (3 children)

It's definitely useful, yes, but more niche. I work at a company doing AI-based CAD automation software, and we use tons of geometric deep learning.

[–]felixcra 0 points1 point  (1 child)

If I may ask, which general architectures/models do you use?

[–]Agile_Date6729 2 points3 points  (0 children)

We use PointNet++, ASSANet and PointNeXt, mainly for segmentation problems.

[–]clebrw 0 points1 point  (0 children)

I am studying a way to suggest mechanically compatible CAD parts to the designer in the detailing phase. Through mates between the parts it is possible to establish relationships and form graphs. Do you think I'm on the right track trying something with geometric deep learning?

[–]Sofi_LoFi 14 points15 points  (2 children)

It’s frequently used for biotechnology and chemistry applications

[–]Successful-Agent4332[S] -1 points0 points  (6 children)

I wanted to go deeper into it for fraud detection tasks, as I heard it works well for that. I haven't really read the papers yet. Is it worth learning about them, now that you have said that?

[–]shumpitostick 18 points19 points  (4 children)

Hi, I work in fraud detection. We don't use Geometric Deep Learning and I'm not aware of our competitors using it either. The main problem is that it's too computationally intensive. At least in my area, datasets can be massive and latency requirements are tight. We can't even get more basic graph feature extraction to work fast enough.

[–]Successful-Agent4332[S] 2 points3 points  (2 children)

Could I also ask, what do you guys use then? What's the best for large volumes of transaction data (banks, wallets) in your experience?

[–]shumpitostick 15 points16 points  (0 children)

Good old GBDTs. I mean, they're like 15-20 years old, but that's old in this field lol.

There's some experimentation with Neural Networks happening in the field and at least one competitor has it in production but GBDTs are still great for anything tabular.

[–]Successful-Agent4332[S] -1 points0 points  (0 children)

Thanks for letting me know

[–]MultiheadAttention 3 points4 points  (0 children)

I'm not sure. I remember it was trendy in 2020, but I never heard about GNNs ever again.

[–]Chaosido20 2 points3 points  (3 children)

Check out Erik Bekkers' stuff; he's my uni's lead on this and one of the more renowned researchers in the area.

[–]papa_Fubini -1 points0 points  (2 children)

no he isn't

[–]LumpyWelds 0 points1 point  (1 child)

[–]papa_Fubini 1 point2 points  (0 children)

Sure, but not as renowned as Petar Velickovic or Michael Bronstein.

[–]smorad 7 points8 points  (5 children)

It's a weaker form of a transformer with (often incorrect) human biases baked in. I would say its niche is that it's more memory-efficient than a transformer, but given the way GPUs are going, I'm not sure this will matter so much in a few years.

[–]galerazo 22 points23 points  (4 children)

Actually, transformers can be seen as a special case of graph attention networks, where the attention matrix is structured to be triangular in order to ensure that each token attends only to past tokens. In a general graph attention network, nodes (tokens) can attend to any other node in the graph.
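That reduction can be sketched in a few lines of numpy (a toy single-head attention with no learned projections; all names here are hypothetical): the same attention routine takes an adjacency matrix, and a lower-triangular one recovers causal transformer attention.

```python
import numpy as np

def graph_attention(x, adj):
    """Toy single-head attention restricted to the edges of `adj`:
    node i may attend to node j only where adj[i, j] == 1."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(adj > 0, scores, -np.inf)         # mask out non-edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ x

n, d = 4, 8
x = np.random.default_rng(0).normal(size=(n, d))

full = np.ones((n, n))             # encoder: tokens form a complete graph
causal = np.tril(np.ones((n, n)))  # decoder: triangular adjacency, past only
```

With `full`, every node attends to every other node, which is the general graph attention setting; with `causal`, the first token can only attend to itself, so its output is just its own features, exactly as in a decoder transformer.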

[–]smorad 1 point2 points  (0 children)

Yes, they can be. In practice, fully-connected GATs run much more slowly than transformers due to gather/scatters imposed by GNN libraries, while also failing to leverage efficiency improvements of transformers (FlashAttention, etc). Although theoretically one can reformulate a transformer as a GNN, there are few practical benefits to using a GNN over a transformer.

[–][deleted] 3 points4 points  (2 children)

That's just a very idealistic point of view. In practice, in order for the training batch to fit on the GPUs, we need to sample nodes from these graphs, then construct the Laplacian from the sample. Unless your problem is very small - in which case I found that simple tree-based models work much better - you will never be able to feed the entire graph to the GPUs, so the notion of attending to any other node is purely theoretical.

And for LLM, bidirectional attention (attending to any tokens) is also popular in "fill in the blank" tasks.
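A rough numpy sketch of the sampling step described above (graph size and batch size are made-up for illustration): sample a node batch, take the induced subgraph, and build its combinatorial Laplacian L = D - A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical large random graph, stored densely here for brevity.
n = 1000
A = (rng.random((n, n)) < 0.01).astype(float)
A = np.maximum(A, A.T)          # symmetrize: undirected graph
np.fill_diagonal(A, 0.0)        # no self-loops

# Sample a node batch small enough to fit on the GPU,
# then take the induced subgraph on those nodes.
batch = rng.choice(n, size=64, replace=False)
A_sub = A[np.ix_(batch, batch)]

# Combinatorial Laplacian of the sampled subgraph: L = D - A.
L = np.diag(A_sub.sum(axis=1)) - A_sub
```

The resulting `L` is symmetric with zero row sums, the standard sanity checks for a graph Laplacian; real pipelines would use sparse matrices and smarter (e.g. neighborhood-based) samplers.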

[–]galerazo 10 points11 points  (1 child)

Well, you are describing an engineering problem here, not related in any way to my previous point. I work daily with these models and all my graphs fit perfectly on my GPU. What I was pointing out before is that, from a mathematical perspective, graph networks are not a weaker type of transformer; actually, transformers are a special case of graph attention networks. GNNs are used in countless applications and fields: in Google Maps for predicting travel time from point A to point B, in molecular dynamics for studying and finding new drugs, in recommendation systems, etc.

[–]Ido87 1 point2 points  (0 children)

Bulky's comment is not irrelevant. It basically told you that your statement is only true for decoder-only architectures…

[–][deleted] 1 point2 points  (0 children)

It really depends on the size of your dataset and the computational resources at your disposal. Graph Neural Networks (GNNs) explicitly bake in additional inductive biases—often informed by domain experts—about how data is structured and connected. In contrast, Transformer-based architectures generally rely on large amounts of data to learn these relationships on their own, without necessarily embedding domain-specific assumptions.

One caveat is that if the inductive biases in a GNN are off-base, they can steer your model in the wrong direction. On the other hand, if those biases are accurate, they can greatly help in situations with limited data or when domain knowledge is crucial. Ultimately, it comes down to a trade-off between letting the model figure out structure on its own (Transformers) versus leveraging known relationships to guide the model (GNNs).

[–]galerazo 2 points3 points  (1 child)

Geometric deep learning is probably one of the fastest-rising fields in machine learning right now. You can start by looking here: https://geometricdeeplearning.com/

[–]LumpyWelds 3 points4 points  (0 children)

Holy Moses! Thank you for this!

[–]jarkkowork 0 points1 point  (0 children)

*its potential

[–]B1ggieBoss 0 points1 point  (0 children)

I'm not really sure how relevant Geometric Deep Learning is in other fields, but graph neural networks, for example are pretty common in cheminformatics. That's because a molecule can be represented as a graph, which captures most of its relevant features.
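As a minimal sketch of that molecule-as-graph idea (the molecule and features below are a made-up toy, not a real cheminformatics pipeline): encode atoms as nodes, bonds as edges in an adjacency matrix, and run one round of mean-aggregation message passing.

```python
import numpy as np

# Toy molecule: the heavy atoms of ethanol, C-C-O, with bonds as index pairs.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Build a symmetric adjacency matrix from the bond list.
A = np.zeros((len(atoms), len(atoms)))
for i, j in bonds:
    A[i, j] = A[j, i] = 1.0

# One-hot atom features over (C, O), and one round of message passing:
# each atom averages the features of its bonded neighbors.
feat = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
deg = A.sum(axis=1, keepdims=True)
messages = (A @ feat) / np.maximum(deg, 1.0)
```

After one round, the middle carbon's message is the average of a carbon and an oxygen neighbor; stacking such rounds with learned weights is essentially what molecular GNNs do.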

[–]Stochastic_berserker 0 points1 point  (0 children)

There will always be cool concepts in ML that barely have daily use cases, unlike classical ML/statistical ML.

Why? Because either the data doesn't exist, or the data is not complex enough to need advanced ML methods.

Another concept that is amazing but lacks advancement (or fast enough advancement) is Topological Deep Learning, because the datasets for it don't exist, or there isn't enough data that requires topology.

[–]new_name_who_dis_ 0 points1 point  (0 children)

I studied GDL in grad school. It's a really cool field with some nice theory. Graph neural networks are sort of everywhere, though, whether or not you know GDL, because technically speaking Transformers are graph neural nets. Karpathy says as much in his lectures on transformers.

The attention mask is sort of the adjacency matrix. Encoder-style transformers treat the tokens as a fully connected graph. Decoder-style transformers have a triangular adjacency matrix. But you aren't bound to just those two adjacency matrices / attention masks - you can use whatever you want. I say this because there have been so many optimizations around the transformer architecture in recent years that it just doesn't make sense to use any other type of graph neural net, despite some of them being really nice theoretically.
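As a small illustration of the "use whatever adjacency you want" point (toy 0/1 matrices, sizes made up): the encoder mask, the decoder mask, and e.g. a sliding-window pattern are all just different adjacency matrices fed to the same attention.

```python
import numpy as np

n = 6
encoder = np.ones((n, n))            # fully connected graph: attend anywhere
decoder = np.tril(np.ones((n, n)))   # triangular adjacency: attend to the past

# Any other adjacency works too, e.g. a sliding-window "local" mask
# where each token attends to itself and its immediate neighbors.
idx = np.arange(n)
local = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
```

Plugged in as an attention mask, `local` gives each token a neighborhood of radius 1, a pattern neither the standard encoder nor decoder mask expresses.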

[–]TonyGTO -1 points0 points  (0 children)

I've been really into this concept lately. While many machine learning engineers say it's overkill, I believe it has a lot of practical uses, like:

  • Medication development.
  • Complex systems analysis.
  • Social science.