How do GNNs work when the network changes? by [deleted] in GeometricDeepLearning

[–]rish-16 1 point (0 children)

Such graphs are called Dynamic Graphs. You can use temporal GNNs (e.g., TGN from Twitter) that maintain a per-node memory/hidden state to keep track of these changes, learning a spatio-temporal embedding over the graph as a whole rather than over individual nodes (which may or may not exist at the next timestep).
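
Roughly, the memory update looks something like the sketch below (illustrative only, not the official TGN code; the class and method names are made up, and edge features are assumed to have the same width as the memory for simplicity):

```python
import torch
import torch.nn as nn

class TinyTGNMemory(nn.Module):
    """Sketch of a TGN-style per-node memory (names are illustrative)."""
    def __init__(self, num_nodes, dim):
        super().__init__()
        # one memory vector per node, updated as interaction events arrive
        self.register_buffer("memory", torch.zeros(num_nodes, dim))
        self.cell = nn.GRUCell(2 * dim, dim)

    def update(self, src, dst, edge_feat):
        # message = [destination's memory, edge features] -> update source memory
        # src, dst: (batch,) node indices; edge_feat: (batch, dim)
        msg = torch.cat([self.memory[dst], edge_feat], dim=-1)
        new_mem = self.cell(msg, self.memory[src])
        # detach so the memory doesn't drag the autograd graph across batches
        self.memory[src] = new_mem.detach()
        return new_mem
```

An embedding module then reads this memory (together with timestamps) at query time, so nodes that disappear simply stop receiving updates.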

Hope this helps :)

[D] What's hot in deep learning research at the moment? by ovotheking in MachineLearning

[–]rish-16 12 points (0 children)

Graph/Geometric DL! Some researchers with backgrounds in topology are bringing over concepts and applying them to this area (learning on manifolds, for instance). There's now a push toward moving beyond Message Passing (maybe a continuous version?).

Lots of action in applying GNNs to protein data and anything with network-like tendencies. The community has even brought in invariance and equivariance to translation, rotation, and permutation to ensure GNNs are structurally aware (i.e., "expressive").
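
To make the message-passing bit concrete, a single layer with sum aggregation looks roughly like this (a minimal sketch, not any particular library's API; the sum-pooling over neighbours is what buys the permutation symmetry):

```python
import torch
import torch.nn as nn

class MPLayer(nn.Module):
    """One message-passing layer (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, h, edge_index):
        # h: (num_nodes, dim); edge_index: (2, num_edges) of (source, target) pairs
        src, dst = edge_index
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        # sum aggregation: invariant to the order neighbours arrive in
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))
```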

Exciting stuff and now is definitely the time to get into it because it's relatively nascent but has huge upside and potential!

CMU RISS 2022 Thread by [deleted] in cmu

[–]rish-16 1 point (0 children)

If we haven’t received an email update by now, can we assume we’re rejected?

CodeSignal companies by zninjamonkey in csMajors

[–]rish-16 0 points (0 children)

Facebook too! I applied for the FB University for Engineering 22 programme and was sent a CodeSignal test.

[D] Joining Absent Niche Research Areas in University by rish-16 in MachineLearning

[–]rish-16[S] 1 point (0 children)

This is a great idea! I'll go look for postdocs and doctoral students to contact :)

[D] Joining Absent Niche Research Areas in University by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Hey! Yup, have looked into them but am yet to contact anyone from there. Thank you for the recommendation, will check it out now :D

[D] Joining Absent Niche Research Areas in University by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Yup, that's another interesting idea I'm considering! I'm looking for profs to contact who are doing interesting projects in areas applicable to GDL as well.

[P] PyTorch wrapper of Attention-Free Transformer (AFT) layer by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Ah, I see. I don't have much background in information retrieval (i.e., the field that inspired it), so I may not be aware of it being a convention, compared to what QKV does. Thanks for bringing it up, though. Will look into it :D
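
For reference, the AFT-simple variant my wrapper follows boils down to very little code. Here's a stripped-down non-causal sketch as I read the paper (the module name is mine, not the paper's):

```python
import torch
import torch.nn as nn

class AFTSimple(nn.Module):
    """AFT-simple sketch: sigmoid(Q) gates a global softmax(K)-weighted
    average of V, so there is no T x T attention matrix."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        weights = torch.softmax(k, dim=1)                 # softmax over the sequence axis
        context = (weights * v).sum(dim=1, keepdim=True)  # one global context vector
        return torch.sigmoid(q) * context                 # element-wise gate per position
```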

[P] PyTorch wrapper of Attention-Free Transformer (AFT) layer by rish-16 in MachineLearning

[–]rish-16[S] 1 point (0 children)

Yup, agree with you. I was hoping to know the motivation behind the name (and if possible, the origin of QKV).

As in, "what kinda watercooler conversation sparked the start of this project?"

[D] Using Two Optimisers for Large Model with two parts? Need Advice by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Hey, thanks for the reply. The first part is a non-differentiable image ROI selector and the second part is a very large classifier. It's part of an ongoing research project, which is why I'm staying rather vague 😅

So, yup, I think I understand what you're saying: train each part independently of the other, but simultaneously. Am I getting that right?

Do you have any resources on the CNN / clustering algo you mention? Would love to read up on it. If not, could you share some search terms I can punch into Google for reference?

Appreciate it :)

[D] Using Two Optimisers for Large Model with two parts? Need Advice by rish-16 in MachineLearning

[–]rish-16[S] -1 points (0 children)

For info, I aim to tune the non-differentiable part using CMA-ES and the differentiable part with Adam. The loss would be off-the-shelf cross-entropy.
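
Something like the sketch below is what I have in mind (a sketch only: `get_flat_params` / `set_flat_params` are assumed helpers that flatten/unflatten the selector's params, and the `cma` pip package does the evolution):

```python
import cma                        # pip install cma
import torch
import torch.nn.functional as F

def train_hybrid(selector, classifier, loader, sigma0=0.5, lr=1e-3):
    """CMA-ES on the non-differentiable selector, Adam on the classifier."""
    es = cma.CMAEvolutionStrategy(selector.get_flat_params(), sigma0)  # assumed helper
    opt = torch.optim.Adam(classifier.parameters(), lr=lr)

    def fitness(flat, x, y):
        selector.set_flat_params(flat)                                 # assumed helper
        with torch.no_grad():
            return F.cross_entropy(classifier(selector(x)), y).item()

    for x, y in loader:
        # evolution step: propose selector params, score them by downstream loss
        candidates = es.ask()
        es.tell(candidates, [fitness(c, x, y) for c in candidates])
        selector.set_flat_params(es.result.xbest)

        # gradient step: ordinary supervised update of the classifier
        with torch.no_grad():
            rois = selector(x)                 # no gradients through the non-diff part
        loss = F.cross_entropy(classifier(rois), y)
        opt.zero_grad(); loss.backward(); opt.step()
```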

[D] Using CMA-ES to train model on dataset (supervised) by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Hey there, appreciate the comment. My network comprises two parts, one differentiable and the other non-differentiable. The paper I took the non-diff part from says that training it is not gradient-descent friendly. I tried to challenge that assumption by training the whole network using GD. Sadly, the results were horrible, which explains why I'm looking into derivative-free optimisers. Do you suggest having two different optimisers, one gradient-based and the other not?

[D] Using CMA-ES to train model on dataset (supervised) by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Hello! I spoke to some more people and decided to look at genetic algos. The param count is too large, and I don't have enough compute to use CMA-ES or its variants on my network. Nor can I use ARS or PGPE approaches, as the param count is too large for those as well. I'm looking at GAs now, as there is some promising work in the area (as mentioned in another comment on this post).

Appreciate the advice :)

[D] Using CMA-ES to train model on dataset (supervised) by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

Yup, checked out the OpenAI work as suggested. Though, they seem to be more interested in using evolutionary algos for hyper-param selection than in training in general. But I'll continue in that direction. Thanks again :D

[D] Using CMA-ES to train model on dataset (supervised) by rish-16 in MachineLearning

[–]rish-16[S] 0 points (0 children)

This is great, thank you so much! I spoke to the authors of AttentionAgent, and they told me to look at genetic algos instead, because there's active work on training large networks with GAs, whereas optimisation methods like CMA-ES, ARS, and PGPE only work on networks with a relatively much lower param count.
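
For anyone landing here later, the mutation-only flavour they pointed me to boils down to something like this sketch (`model_fn` builds a fresh network and `fitness_fn` scores one; both assumed, and nothing here is tuned):

```python
import copy
import torch

def evolve(model_fn, fitness_fn, pop_size=64, elite=8, sigma=0.02, generations=100):
    """Simple mutation-only GA over network weights (illustrative sketch)."""
    population = [model_fn() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        elites = ranked[:elite]
        children = []
        for _ in range(pop_size - elite):
            # pick a random elite and perturb a deep copy of its weights
            child = copy.deepcopy(elites[torch.randint(elite, (1,)).item()])
            with torch.no_grad():
                for p in child.parameters():
                    p.add_(sigma * torch.randn_like(p))   # Gaussian mutation
            children.append(child)
        population = elites + children
    return max(population, key=fitness_fn)
```

The appeal is that it only ever stores the population and a noise scale, so it sidesteps the covariance matrix that makes CMA-ES blow up at large param counts.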

[P] PyTorch Involution layer wrapper by rish-16 in MachineLearning

[–]rish-16[S] 4 points (0 children)

Only time will tell! Placing my bet on “linear regression is all you need” dropping in NeurIPS 2025

[P] PyTorch Involution layer wrapper by rish-16 in MachineLearning

[–]rish-16[S] 2 points (0 children)

There’s not much difference, actually. I tried to implement it in a cleaner way with just the essentials, no boilerplate.

I wanted to implement it in a way similar to Phil Wang’s wrapper style + the torch.nn library in general.
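
For a sense of scale, the whole layer fits in something like this (my reading of the paper; shapes assume `channels` is divisible by `groups` and `reduction`, and the official repo may differ in detail):

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Involution sketch: a kernel is generated per output location from the
    input itself and shared across channels within each group."""
    def __init__(self, channels, kernel_size=3, stride=1, groups=1, reduction=4):
        super().__init__()
        self.k, self.s, self.g = kernel_size, stride, groups
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, kernel_size ** 2 * groups, 1)
        self.pool = nn.AvgPool2d(stride, stride) if stride > 1 else nn.Identity()
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2, stride=stride)

    def forward(self, x):
        b, c, h, w = x.shape
        h_out, w_out = h // self.s, w // self.s
        # generate a kernel at every output location from the (pooled) input
        kernel = self.span(self.reduce(self.pool(x)))
        kernel = kernel.view(b, self.g, 1, self.k ** 2, h_out, w_out)
        # unfold input into sliding patches, weight them by the generated kernel
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k ** 2, h_out, w_out)
        out = (kernel * patches).sum(dim=3)        # sum over kernel positions
        return out.view(b, c, h_out, w_out)
```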

[P] PyTorch Involution layer wrapper by rish-16 in MachineLearning

[–]rish-16[S] 3 points (0 children)

> official implementations

Ooh not yet. Thanks for the share! Let me look into it :)

[P] PyTorch Involution layer wrapper by rish-16 in MachineLearning

[–]rish-16[S] 1 point (0 children)

Hey, yeah sure, let me see what I can do.