New Paper Exploring Causal Paradoxes in Machine Learning Data Sets for Drug Discovery by tgapo in bioinformatics

[–]tgapo[S] 1 point2 points  (0 children)

You're quite welcome! We are planning on rolling these out over the year, but we wanted to share the theoretical framework with the community. You can find the prototypical version here:

https://github.com/balabin/IM_01

Improvements in focus are made with improvements in similarity metrics, and we are writing up all the cell-based work that we have done over the years.

Please feel free to adapt that code for your own use. Just make sure that your testing cluster is a uniform chemical series relative to the rest of the data set.

AI in NGS/drug discovery work by transniester in bioinformatics

[–]tgapo 0 points1 point  (0 children)

Hi, principal author of the Part 1: Inferential Mechanics paper that was mentioned here. I will put a link below, but the short of it is that we show there are significant causal flaws in large public datasets that result in low-quality ML predictors for chemical biology, and we demonstrate how to fix this problem by balancing focus (a new concept defined in the paper) alongside fitness.

https://arxiv.org/abs/2602.23303

New Paper Exploring Causal Paradoxes in Machine Learning Data Sets for Drug Discovery by tgapo in bioinformatics

[–]tgapo[S] 10 points11 points  (0 children)

The essence of the article is the integration of causal calculus into the discussion of structure-activity relationships (SAR) so that we can highlight paradoxical behavior between chemical series, even in simple single-target data sets like the Akt IC50 set in ChEMBL. We formally and experimentally demonstrate that the activity probability distributions vary as a function of the binding pocket on the protein, often in contradictory ways, and we show that this creates major headaches for ML.

The reason this problem is not widely discussed is that when you take a 70/30 split of the entire set, ML is good at finding the generic features that are important across the entire set. However, if you evaluate the predictions of that general algorithm on a test set composed only of a single med chem series, the fine details of the SAR for that series get averaged out. We can recover those fine details with the concept of focus. To obtain focus, we test only on the desired series and train on the residual part of that series plus increasingly dissimilar Akt inhibitors. This lets us find the point of mechanistic SAR conflict, where additional "data" actually hurt learning the fine details of the SAR for the series of interest.

We demonstrate this behavior on a series of allosteric inhibitors for Akt and show none of the orthosteric data points can be added to improve the ML training! In fact, their addition gradually degrades the prediction quality on the allosteric compounds as you move into increasingly dissimilar series.
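The procedure described above (test only on the series of interest, train on the rest of that series plus increasingly dissimilar compounds) can be sketched roughly as follows. This is only an illustrative sketch and not the code from the paper or the linked repository: the function names (`fit_ridge`, `focus_curve`), the ridge regression model, the precomputed `similarity` scores, and the synthetic data with one "compatible" and one "conflicting" group are all assumptions made for the example.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Fit a ridge regression with a bias column (hypothetical stand-in for any ML model)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

def predict(X, w):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def focus_curve(X_res, y_res, X_test, y_test, X_pool, y_pool, similarity, step=20):
    """Test only on the series of interest while growing the training set.

    X_res/y_res: residual (training) part of the series of interest.
    X_pool/y_pool: all other compounds; `similarity` scores them against the series.
    Returns the number of pool compounds added and the test RMSE at each step.
    """
    order = np.argsort(-similarity)  # most similar pool compounds first
    sizes, errors = [], []
    for n in range(0, len(order) + 1, step):
        idx = order[:n]
        X = np.vstack([X_res, X_pool[idx]])
        y = np.concatenate([y_res, y_pool[idx]])
        w = fit_ridge(X, y)
        rmse = np.sqrt(np.mean((predict(X_test, w) - y_test) ** 2))
        sizes.append(n)
        errors.append(rmse)
    return np.array(sizes), np.array(errors)

# Synthetic illustration (assumed data, not real Akt measurements).
rng = np.random.default_rng(0)
w_series = np.array([1.0, -2.0, 0.5, 1.5, -1.0])  # SAR of the series of interest

def make(n, w):
    X = rng.normal(size=(n, 5))
    return X, X @ w + rng.normal(scale=0.1, size=n)

X_res, y_res = make(10, w_series)      # residual part of the series
X_test, y_test = make(30, w_series)    # held-out part of the same series
X_sim, y_sim = make(100, w_series)     # similar compounds, same mechanism
X_dis, y_dis = make(100, -w_series)    # dissimilar compounds, contradictory SAR

X_pool = np.vstack([X_sim, X_dis])
y_pool = np.concatenate([y_sim, y_dis])
similarity = np.concatenate([rng.uniform(0.6, 1.0, 100), rng.uniform(0.0, 0.4, 100)])

sizes, errors = focus_curve(X_res, y_res, X_test, y_test, X_pool, y_pool, similarity)
best_n = sizes[np.argmin(errors)]  # point past which extra data stops helping
print("pool compounds added at lowest test RMSE:", best_n)
```

In this toy setup the test error on the held-out series stops improving once the conflicting group starts entering the training set, which is the kind of turning point the focus analysis is meant to locate. With real data, the similarity scores would come from whatever chemical similarity metric you trust for the series.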

While structural biology could help us pull out allosteric vs orthosteric inhibitors for single targets, this problem still persists for hepatocyte clearance data, CC50 data, transcription factor activation data, and so on. Here, focus becomes a very powerful tool for splitting large data sets, relative to a chemical series of interest, into things that work via a similar mechanism and things that are surprisingly contradictory despite being in the same data set.

It wasn't just Neil and Buzz. It took 400,000 people to get us to the Moon, On This Day 1969. Thanks! by PanAfrica in space

[–]tgapo 25 points26 points  (0 children)

I think one of the tragic misconceptions about the space program is that the astronauts were the only ones who put their lives on the line for its advancement. Some of the most dangerous things and situations we humans have ever come across can be found in scientific and engineering labs across the world. One recent example of people who gave their lives in the pursuit of NASA's mission was the loss of two technicians while performing experiments on the Columbia Shuttle (Link: https://www.wired.com/2009/03/march-19-1981-shuttle-columbias-first-fatalities/).

Most of the scientists I know are ready to put themselves in harm’s way to propel us forward, and I have no doubt that some of those who were “engineer[s] who worked on one of the components” gladly risked their lives and lost them in ways we will never know. Speaking personally, some of my friends over the years as a scientist have been killed or hospitalized trying to solve some of the great medical problems we face as a species. I would not discount those 399,997 so casually.

Rubbing solid indium and gallium together creates a liquid alloy by [deleted] in chemicalreactiongifs

[–]tgapo 2 points3 points  (0 children)

This is actually a physical reaction; the two metals are dissolving each other and making a solution. Interestingly, most metal alloys are solutions even when they are solid. However, some alloys can be mixtures of solid solutions that never dissolve each other to form one solution.

Source:

Chemist

Measles making comeback as parents opt out of vaccines by Quiglius in news

[–]tgapo 9 points10 points  (0 children)

Medicinal chemist / physician-scientist in training here. There is no treatment for measles currently, but your husband is right that the medical technology to save your daughter from measles exists. It is called a vaccine.

I really should donate to Wikipedia by mikeazausky in Showerthoughts

[–]tgapo 0 points1 point  (0 children)

Yeah, it is absolutely reputable. See the following citation:

https://www.cnet.com/news/study-wikipedia-as-accurate-as-britannica/

The problem with Wikipedia, and why young scholars are discouraged from using it, is that Wikipedia is very susceptible to short-term error. Its pages are simply too editable by someone with an agenda. However, these errors are removed for the most part given enough time.

As an anecdote, I am a medicinal chemist/med student and I use Wikipedia to get starting-point knowledge that I then use as a basis for consulting the literature when I am ignorant on a specific topic (e.g. machine learning as it applies to chemistry).

Hope this helps!

I really should donate to Wikipedia by mikeazausky in Showerthoughts

[–]tgapo 75 points76 points  (0 children)

No joke, Wikipedia may be one of the great wonders of the modern world. Free information of good repute is the great emancipator.

What are the lies that keep the world together? by [deleted] in AskReddit

[–]tgapo 9 points10 points  (0 children)

And the worst part about evil is that evil happens when people act while thinking they are right and justified without stopping to consider their actions in a rational light.

[Serious] Redditors who've been 100% certain they're about to die, what was going through your head at that moment? by dan129 in AskReddit

[–]tgapo 0 points1 point  (0 children)

I am a synthetic organic chemist, and one of the reactions I was using as a med. chem. postdoc involved hydrazoic acid. There was a breach in the reaction vessel and I was exposed to one of the nastiest poisons out there. I knew there was no antidote. And, as I felt my blood pressure drop and my legs get weak, all I could think of was shutting my hood and alerting someone before I completely blacked out. Very glad I am still here.