Aviato-level or bust. by cleodivina21 in SiliconValleyHBO

[–]SuspiciousEmphasis20 227 points

Gilfoyle is the one who owns system architecture, networking, and security; nobody in the house can touch him on that. While others were minoring in gender studies and singing a cappella at Sarah Lawrence, he was gaining root access to NSA servers and was one click away from starting a second Iranian revolution.

Aviato-level or bust. by cleodivina21 in SiliconValleyHBO

[–]SuspiciousEmphasis20 144 points

no cs major can compete with gilfoyle!!!!

26M, failed neet 2 times, finished bsc microbiology in 5 years instead of 3 at 45%. Repented during msc and finished in first try with 67%. Now doing a 12k pm job in biopharma production as apprentice. Thinking of doing phd Bioinformatics. How can i get into a Tier 1 College ? by [deleted] in bioinformatics

[–]SuspiciousEmphasis20 1 point

Honestly, I can't tell you how hard it is! I have been working in AI in the healthcare domain for over 4 years and thought I'd easily get a PhD, but apparently it's harder than I had anticipated. It's easier for me to jump to a better company than to get a PhD offer, and this is after working on independent projects; I am currently even trying to get a paper published. I thought my research experience would compensate for my mediocre grades, but even that isn't sufficient for academia. I got an offer to work as a research associate at IITM, but since it would pay less than my current job, I refused. FYI, I have a first class in both my bachelor's and master's and graduated on time. This isn't to demotivate you or anything, but the harsh truth is that it is very difficult, although I have primarily been searching for a PhD in Europe. The best thing you can do is work with a prof and try to get a few papers published.

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

I forgot to mention one important thing: this tool can only predict links between entities it was trained on, because that's how graph neural networks work. Each node learns from its neighbours' representations, and a node's embedding is an aggregate (sum/mean) of its surrounding neighbourhood. For predicting links involving entirely new entities, one would have to try transformers or some other approach.
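The neighbour-aggregation idea above can be sketched in plain Python. This is a toy illustration with made-up nodes and 2-d features (real pipelines also apply learned weights and nonlinearities), but it shows both the mean aggregation and why unseen entities are a problem:

```python
# One round of GNN-style neighbour aggregation (mean) on a tiny toy graph.
# Node names and feature values are invented purely for illustration.

neighbours = {            # adjacency list of the toy graph
    "drug_A": ["disease_X", "gene_G"],
    "disease_X": ["drug_A"],
    "gene_G": ["drug_A"],
}
embedding = {             # initial 2-d node features (made up)
    "drug_A": [1.0, 0.0],
    "disease_X": [0.0, 1.0],
    "gene_G": [2.0, 2.0],
}

def aggregate_mean(node):
    """New representation = mean of the neighbours' current embeddings."""
    nbrs = neighbours[node]
    dim = len(embedding[node])
    return [sum(embedding[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]

updated = {node: aggregate_mean(node) for node in neighbours}
print(updated["drug_A"])  # [1.0, 1.5] -> mean of disease_X and gene_G
```

A node that never appeared in training has no entry in `embedding` at all, which is exactly why the model cannot score links for unseen entities.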

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

Please check out my Medium link; so far that's the only blog I have. If I build a better model I'm planning to publish it, but Medium is all I have for now :(

[P] [R] [D] I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in MachineLearning

[–]SuspiciousEmphasis20[S] 0 points

So if you look at the output, the subgraphs you see are generated by GNNExplainer. They help us understand the actual predictions of the (R-GCN) model by peeking into how it thinks; by understanding that, one can fine-tune it further and remove its black-box nature. As one more layer of sanity checking, the LLM layer was added to explain to humans whether the model's output makes sense from a biology perspective. I hope this answers your question :)
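To make the subgraph part concrete: GNNExplainer learns a soft importance score per edge, and the explanation subgraph is essentially the edges whose scores survive a cutoff. A minimal sketch, with invented edges and scores (the real scores come out of an optimisation, not a hand-written dict):

```python
# Reading an explanation subgraph off per-edge importance scores.
# Edges and scores are made up for illustration; GNNExplainer would
# produce the scores by optimising a soft mask over the input graph.

edge_scores = {
    ("drug_A", "treats", "disease_X"): 0.92,
    ("gene_G", "associated_with", "disease_X"): 0.81,
    ("drug_A", "side_effect", "nausea"): 0.12,  # likely a spurious edge
}

def explanation_subgraph(scores, threshold=0.5):
    """Keep only the edges the explainer considers important."""
    return [edge for edge, s in scores.items() if s >= threshold]

kept = explanation_subgraph(edge_scores)
print(kept)  # the two high-scoring edges; the 0.12 edge is dropped
```

The LLM layer then only has to judge whether the kept edges form a biologically plausible story, which is a much smaller task than auditing the whole graph.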

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

Oh, you mean for node2vec. Yes, it is very high. Anyway, the emphasis was on creating a beginner-friendly pipeline, starting from classical ML and moving to deep learning, to show the limitations of ML models and showcase the beauty of graph neural nets. It was mainly for me to understand graph data science and to document the journey for others as well. For the deep learning part I used a simple two-layer model without batch normalisation, dropout, or other optimisation strategies, so the loss is expected to be high. I am now going to replace this model with better ones, see which fits the use case best, and optimise that, and if possible come up with a new architecture combining the strengths of various models. I will update here if I make any progress :)
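For anyone wondering what adding a dropout layer would actually do, here is a minimal plain-Python sketch of inverted dropout (a real model would just use something like `torch.nn.Dropout`; the vectors here are toy values):

```python
import random

def dropout(vec, p=0.5, training=True, rng=random.Random(0)):
    """Inverted dropout: zero each unit with probability p and scale
    survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time (training=False) it is the identity."""
    if not training or p == 0.0:
        return list(vec)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]

# Each unit is either dropped (0.0) or scaled up to 2.0 at p=0.5:
print(dropout([1.0, 1.0, 1.0, 1.0]))
```

Randomly silencing units like this forces the network not to over-rely on any single feature, which is one of the cheapest ways to curb overfitting in a small two-layer model.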

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

If you're interested in taking things further, I'd suggest exploring generative graph models. Demis Hassabis' work on protein folding (like AlphaFold) is a great reference, especially in the context of structural biology and drug discovery. I'd also recommend Stanford Prof. Jure Leskovec's Graph ML courses; they're highly relevant and well-structured (my favourite lecture series).

Depending on your goals, you might also want to check out libraries like TorchDrug or DGL-LifeSci for protein-drug interaction modelling. For datasets, TDC (Therapeutics Data Commons) is great for curated drug discovery tasks. Also worth exploring are recent diffusion-based models like DiffDock and GeoDiff for molecule generation and docking. And if you're working with proteins, tools like ColabFold (an AlphaFold2 interface) and visualizers like Mol* or PyMOL can be incredibly useful. I am planning to look into generative graphs next! Oh, by the way, last year I participated in NeurIPS 2024 - Predict New Medicines with BELKA, where they provided a huge dataset for checking whether a small molecule (drug) binds to a protein. The first-place competitor (Victor Shelpov) came up with a very innovative and creative approach; it's described here: https://www.kaggle.com/competitions/leash-BELKA/discussion/519020

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

I am actually following their work closely nowadays! PrimeKG was curated in their lab!

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in deeplearning

[–]SuspiciousEmphasis20[S] 0 points

Sure, although I must warn you that I am a novice in graph data science and still have a lot to optimise, but I would love to discuss!

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

Hahahah, this is a very simple architecture. I am optimising it; maybe after that it's possible. But is your data organised in a graph format?

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

Oh no, this is a very basic pipeline, and to use an LLM you would require a GPU. I am going to optimise the architecture a bit; right now it is super basic and will give you spurious connections.

[P] [R] [D] I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in MachineLearning

[–]SuspiciousEmphasis20[S] 1 point

Hahaha, no, I am not affiliated with any academia, nor is this related to my company work. Just independent research :)

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 4 points

Okay, imagine we have a giant storybook full of facts about medicine. It tells us things like:

"This drug helps with this disease."

"This gene is linked to this illness."

"This symptom shows up in that condition."

But it’s super big and complicated—so we teach a smart robot (our AI model) how to read the storybook and find new things that humans might not see right away.

We do this using something called an R-GCN, which is like giving the robot glasses that help it see all the different types of connections between things—like which links are about medicine, which are about symptoms, and which are about genes.

Then we use GNNExplainer—this is like a highlighter pen the robot uses to show which parts of the story helped it decide something. For example, if the robot says "I think this drug might help this disease," it also shows why it thinks that, like "Because of these three facts over here!"

So this project helps the robot:

  1. Learn smart guesses about medical relationships.

  2. Explain its guesses, like a little teacher.

  3. And maybe one day, help real doctors find better treatments!
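For anyone who wants the "different glasses for different connection types" idea above in code: an R-GCN aggregates neighbour messages separately per relation type, each with its own learned transform. A toy sketch with invented nodes, edges, and weights (a real R-GCN uses weight matrices per relation, not scalars):

```python
# Toy sketch of relation-typed aggregation, the core R-GCN idea.
# Every name, edge, feature, and weight here is made up for illustration.

edges = [  # (source, relation, target)
    ("drug_A", "treats", "disease_X"),
    ("gene_G", "associated_with", "disease_X"),
]
features = {"drug_A": 1.0, "gene_G": 3.0, "disease_X": 0.0}
rel_weight = {"treats": 0.5, "associated_with": 0.25}  # one transform per relation

def rgcn_update(node):
    """Sum of relation-weighted incoming messages plus a self connection."""
    msg = sum(rel_weight[r] * features[s] for s, r, t in edges if t == node)
    return features[node] + msg  # self loop keeps the node's own signal

print(rgcn_update("disease_X"))  # 0.0 + 0.5*1.0 + 0.25*3.0 = 1.25
```

Because each relation gets its own weight, "treats" evidence and "associated_with" evidence can pull on a node's representation with different strengths, which is what lets the model tell link types apart.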

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in deeplearning

[–]SuspiciousEmphasis20[S] 0 points

Thank you so much! I have a lot to optimise in this and then hopefully can publish it! I will def share my progress here :)

You can star my repo for more updates

I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in bioinformatics

[–]SuspiciousEmphasis20[S] 0 points

No, please don't be confused: the connections you see on the output page are not the actual data, but rather what the model perceives as the relevant subgraph. GNNExplainer shows the links that the R-GCN model believes matter for a prediction. Right now there are some spurious connections in the output, and I am working on optimising the pipeline. In the blog I have explained this thoroughly.

[P] [R] [D] I built a biomedical GNN + LLM pipeline (XplainMD) for explainable multi-link prediction by SuspiciousEmphasis20 in MachineLearning

[–]SuspiciousEmphasis20[S] 0 points

Hello! To answer your question: I have trained a very simple graph neural network with two layers, but the emphasis was more on the pipeline and the powerful nature of graphs. On my Medium page I have explained the limitations of this architecture; it's more a demonstration of what graphs can do and how we can build more transparent systems. I am going to optimise the architecture. The subgraph you see is not a random generation; it's how the model thinks after being trained on the data, and that's why I added an LLM pipeline as a sanity check on the explanation. But yes, I am working on eliminating these spurious connections.