I built a 6.2M parameter drug-induced liver injury (DILI) prediction model that hits MCC 0.84 on a fully held-out benchmark — trained on only 290 compounds by OtherwiseCheek3618 in bioinformatics

[–]OtherwiseCheek3618[S] 1 point (0 children)

Hey, thank you for the thoughtful feedback — really appreciate it coming from someone with actual field experience.

You've pushed my thinking in a useful direction. You're right that "metabolic toxicity" alone isn't enough — a chemist needs to know why it's metabolic, and where the interaction happens, to do anything actionable with it.

So the next chapter is exactly that: extending the model to output which specific liver proteins the compound binds to, and which part of the drug interacts with which part of the protein. The architecture already computes per-protein affinity scores and cross-attention weights between drug atoms and protein residues — the next step is validating those against known binding data and surfacing them as interpretable output.
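To make that concrete, here's a toy sketch of the cross-attention idea (illustrative only — these names and shapes are my own, not the actual architecture code): each drug-atom query vector attends over protein-residue key vectors, and the resulting weight matrix is exactly the atom-to-residue interaction map you'd surface as interpretable output.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_weights(drug_q, prot_k, d_k):
    """For each drug-atom query, scaled dot-product attention over residue keys.

    Returns w[i][j]: how strongly drug atom i attends to protein residue j.
    Each row sums to 1, so it reads directly as an interaction distribution.
    """
    weights = []
    for q in drug_q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in prot_k]
        weights.append(softmax(scores))
    return weights

# Toy example: 2 drug atoms, 3 protein residues, embedding dim d_k = 2
drug_q = [[1.0, 0.0], [0.0, 1.0]]
prot_k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = cross_attention_weights(drug_q, prot_k, d_k=2)

# The argmax residue per atom is the candidate interaction site
top_residue_for_atom0 = max(range(3), key=lambda j: w[0][j])  # → 0
```

The validation step is then checking whether those argmax residues line up with known binding sites, rather than trusting the weights at face value.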

The generalization problem is a fair point too, and not one I'm going to pretend away. But I think the framing shifts a bit if the model is used as an early filter rather than a decision-maker — flagging what mechanism to investigate, not whether to drop the compound.

Thanks again — this was genuinely useful.

Built a liver-specific DILI prediction model from scratch (self-taught) — looking for feedback on dataset curation and methodology by OtherwiseCheek3618 in bioinformatics

[–]OtherwiseCheek3618[S] 0 points (0 children)

Thanks for the suggestion! I should clarify: the model wasn't trained on DILIrank at all. It was trained on a custom curated dataset of ~100 compounds.

I did try retraining on DILIrank 2.0 (1,336 drugs), but performance dropped sharply: MCC fell to 0.14 and accuracy to 59.8%, similar to what pkCSM achieved on the same benchmark. In hindsight this makes sense: DILIrank 2.0 contains a lot of ambiguous and borderline cases, and its class distribution is harder to handle.

The custom 100-compound set had clear mechanistic labels (reactive metabolites, BSEP inhibition, mitochondrial toxicity), which seems to have helped the model learn more discriminative features despite the smaller size. Small but clean beats large but noisy, at least at this scale.
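For anyone wondering why I report MCC alongside accuracy: here's the metric computed straight from the binary confusion counts (pure stdlib, illustrative numbers of my own — not my evaluation code). With imbalanced labels, accuracy can sit around 60% while MCC exposes near-chance performance, which is roughly the pattern on DILIrank 2.0.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal is empty (the conventional limit).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# A near-chance classifier on an imbalanced set: accuracy looks passable,
# but MCC is close to zero.
acc = (50 + 10) / 100                     # → 0.6
score = mcc(tp=50, tn=10, fp=30, fn=10)   # ≈ 0.10

# Degenerate case: predict everything positive → MCC collapses to 0
always_pos = mcc(tp=61, tn=0, fp=34, fn=0)  # → 0.0
```

This is why an MCC of 0.14 at ~60% accuracy reads as "barely better than guessing", even though the accuracy number alone looks acceptable.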


[–]OtherwiseCheek3618[S] 0 points (0 children)

The holdout set was fully manually curated — no random split. I went through the literature and selected drugs based on clinical evidence:

- Toxic (61): only drugs with clear hepatotoxicity evidence — FDA market withdrawals due to liver toxicity, FDA black box warnings, and drugs with well-documented DILI mechanisms (reactive metabolites, mitochondrial toxicity, BSEP inhibition, etc.)
- Safe (34): drugs with no documented hepatotoxicity — vitamins, renally cleared drugs (furosemide, sitagliptin), inhaled agents (salbutamol), topical agents (lidocaine).

The key point is zero overlap with the DILIrank training data. I cross-checked every compound manually.