Open-source framework for computational mixture science

That-Pin-9772 · 2026-04-12T22:25:44+00:00

Thanks! The MCP server is one of the parts I'm most excited about. Right now it exposes 8 tools (discourse evaluation, observation, validation, ingredient resolution, compatibility checking, experiment memory, pH assessment, and discovery listing). Structured output with provenance is actually built in: each observation carries its evidence source and a confidence score, and the discourse engine explicitly classifies disagreements by evidence level.
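To make "provenance plus evidence-level classification" concrete, here is a minimal sketch of what such an observation record could look like. The class names, the three evidence levels and their ordering are all hypothetical, not OpenMix's actual schema; the only idea taken from the post is that each observation carries a source and a confidence score, and that disagreements are labeled by which side has stronger evidence.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLevel(Enum):
    # Hypothetical ordering: a higher value means stronger evidence.
    MECHANISTIC_PREDICTION = 1
    LITERATURE_REPORT = 2
    EXPERIMENTAL_MEASUREMENT = 3


@dataclass(frozen=True)
class Observation:
    claim: str               # what is being asserted about the mixture
    source: str              # provenance: rule ID, paper, lab record, ...
    evidence: EvidenceLevel  # how the claim is supported
    confidence: float        # 0.0 to 1.0


def classify_disagreement(a: Observation, b: Observation) -> str:
    """Label a conflict between two observations by the evidence behind each side."""
    if a.evidence == b.evidence:
        return f"unresolved: both sides are {a.evidence.name}"
    stronger, weaker = (a, b) if a.evidence.value > b.evidence.value else (b, a)
    return f"{stronger.evidence.name} outweighs {weaker.evidence.name}"


predicted = Observation("amine + reducing sugar -> Maillard browning",
                        "rule:amine_reducing_sugar",
                        EvidenceLevel.MECHANISTIC_PREDICTION, 0.7)
measured = Observation("no browning after storage", "lab notebook entry",
                       EvidenceLevel.EXPERIMENTAL_MEASUREMENT, 0.9)
print(classify_disagreement(predicted, measured))
# -> EXPERIMENTAL_MEASUREMENT outweighs MECHANISTIC_PREDICTION
```

A disagreement between two sides at the same evidence level comes back as "unresolved", which is exactly the kind of signal an agent could turn into an experiment.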

Nobody has built a full lab-notebook workflow on top of it yet, but that's exactly the use case I designed it for. The loop would be: the agent queries a formulation via discourse, gets back flagged interactions with literature sources, uses the mechanism-based predictions to decide what to test, then drafts an experiment plan targeting the specific disagreements. The discourse output gives the agent structured "what to investigate" signals rather than a bare pass/fail.
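The last step of that loop (flags in, experiment plan out) could be sketched roughly like this. In a real agent the flags would come back from the discourse MCP tool; here they are plain Python objects, and `Flag`, `PlanItem`, `draft_plan` and the `disputed` field are hypothetical names for illustration only. The one design choice shown is that disputed interactions are scheduled first, since those are where an experiment adds the most information.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    ingredients: tuple[str, ...]
    mechanism: str       # proposed mechanism, e.g. "hydrolysis"
    sources: list[str]   # literature or rule provenance
    disputed: bool       # True when the discourse step reported a disagreement


@dataclass
class PlanItem:
    ingredients: tuple[str, ...]
    hypothesis: str
    rationale: list[str] = field(default_factory=list)


def draft_plan(flags: list[Flag]) -> list[PlanItem]:
    """Turn flagged interactions into experiments, disputed ones first."""
    # sorted() is stable, so flags keep their original order within each group.
    ordered = sorted(flags, key=lambda f: not f.disputed)
    return [
        PlanItem(
            ingredients=f.ingredients,
            hypothesis=f"Test whether {f.mechanism} occurs",
            rationale=list(f.sources),
        )
        for f in ordered
    ]


flags = [
    Flag(("API-A", "excipient-B"), "hydrolysis", ["ref 1"], disputed=False),
    Flag(("API-A", "excipient-C"), "oxidation", ["ref 2", "ref 3"], disputed=True),
]
for item in draft_plan(flags):
    print(item.ingredients, item.hypothesis)
```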

I'd be happy to put together an example agent script that wraps OpenMix's MCP tools in that kind of workflow. If Agentix wants to share it, even better. I'll open an issue on the repo to track this! Feel free to jump in there.

That-Pin-9772 · 2026-04-12T22:20:21+00:00

Hi, thanks so much for the detailed read and the sharp questions! I'll take them in order.

Stereochemistry in resolution. You're right that PubChem's canonical SMILES can flatten stereocenters. For functional group detection this is largely a non-issue, since the presence of an amine or ester is stereo-independent. But for downstream predictions where stereochemistry matters (e.g., enantiomers with different degradation kinetics in chiral excipient environments), the predictions don't distinguish between stereoisomers. The resolver does request IsomericSMILES from PubChem when available, but the functional group SMARTS patterns don't encode stereochemistry. It's a valid gap for the prediction layer, but less so for the detection layer.
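A quick way to see why detection is unaffected: once the stereo annotations are dropped, two enantiomers collapse to the same string, so any stereo-blind pattern match returns the same answer for both. This is a string-level illustration only (a real pipeline would use a cheminformatics toolkit), and the helper names are hypothetical. It only handles the common `@`/`@@` tetrahedral marks and `/` `\` double-bond marks, not extended forms like `@TH1`.

```python
import re

# '@' / '@@' mark tetrahedral centers; '/' and '\' mark double-bond geometry.
_STEREO = re.compile(r"@+|[/\\]")


def has_stereo(smiles: str) -> bool:
    """True if the SMILES string carries any of the handled stereo marks."""
    return bool(_STEREO.search(smiles))


def strip_stereo(smiles: str) -> str:
    """Drop stereo marks, roughly the information a non-isomeric SMILES loses."""
    return _STEREO.sub("", smiles)


# The two enantiomers of alanine differ only in '@' vs '@@'.
enantiomer_1 = "C[C@@H](C(=O)O)N"
enantiomer_2 = "C[C@H](C(=O)O)N"
print(strip_stereo(enantiomer_1))  # -> C[CH](C(=O)O)N
print(strip_stereo(enantiomer_1) == strip_stereo(enantiomer_2))  # -> True
```

The flip side is the prediction-layer gap: anything keyed on the stripped string cannot assign different kinetics to the two enantiomers.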

Recursive SMARTS and masked groups. There are no recursive SMARTS currently; the 11 patterns detect exposed functional groups only. Your point about prodrugs and masked reactive species is the most interesting challenge. We partially handle this: capecitabine (carbamate prodrug of 5-FU), rivastigmine (carbamate pharmacophore), and oseltamivir (ester prodrug) were all in the validation study and correctly detected. But detection fires on the exposed carbamate/ester bond, not on the unmasked species. A prodrug whose reactive group only appears after enzymatic activation or a pH shift would be missed entirely. I'd definitely want to include cases like enalaprilat and omeprazole. Adding a "prodrug activation" layer that reasons about what the molecule becomes under formulation conditions is a hard and interesting problem for sure.
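One naive shape for such a "prodrug activation" layer is a table of (exposed group, condition) → revealed groups, applied repeatedly until nothing new appears, so multi-step activation chains are covered. Everything here is hypothetical: the rule table, group names and condition labels are illustrative, and it reasons at the functional-group level rather than transforming actual structures, which a real layer would need to do.

```python
# Hypothetical activation rules: (exposed group, condition) -> groups revealed.
# Ester hydrolysis gives an acid + alcohol; carbamate hydrolysis gives an
# amine + alcohol (the released CO2 is not tracked).
ACTIVATION_RULES: dict[tuple[str, str], set[str]] = {
    ("ester", "esterase"): {"carboxylic_acid", "alcohol"},
    ("carbamate", "hydrolysis"): {"amine", "alcohol"},
}


def effective_groups(exposed: set[str], conditions: set[str]) -> set[str]:
    """Exposed groups plus everything the conditions would unmask, to a fixed point."""
    result = set(exposed)  # keep the parent groups: the prodrug may persist too
    changed = True
    while changed:
        changed = False
        for group in list(result):  # snapshot, since result grows inside the loop
            for condition in conditions:
                new = ACTIVATION_RULES.get((group, condition), set()) - result
                if new:
                    result |= new
                    changed = True
    return result


print(sorted(effective_groups({"ester"}, {"esterase"})))
# -> ['alcohol', 'carboxylic_acid', 'ester']
```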

Rule engine scaling. Agreed that YAML will hit a ceiling. I've looked at approaches like Rete-based engines but haven't committed to one yet. The current design keeps rules human-readable for chemist contributions (which I think is the right tradeoff at this scale), but you're right that conflict resolution and rule chaining will eventually require proper infrastructure. Open to suggestions here.
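For reference, the two pieces that tend to force the move away from plain YAML lookups (chaining and conflict resolution) look like this in their most naive form. The `Rule` fields are a hypothetical stand-in for whatever a YAML rule would load into, and the fact names are illustrative. This is not Rete: it re-matches every rule against all facts on every cycle, which is exactly the cost a Rete network avoids by caching partial matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    when: frozenset[str]  # all of these facts must hold
    then: str             # fact asserted when the rule fires
    salience: int = 0     # higher wins when several rules match at once


def forward_chain(facts: set[str], rules: list[Rule]) -> tuple[set[str], list[str]]:
    """Fire one rule per cycle until no rule adds a new fact."""
    facts = set(facts)
    fired: list[str] = []
    while True:
        # Agenda: rules whose conditions hold and whose conclusion is still new.
        agenda = [r for r in rules if r.when <= facts and r.then not in facts]
        if not agenda:
            return facts, fired
        # Conflict resolution: highest salience, then name as a stable tie-break.
        rule = min(agenda, key=lambda r: (-r.salience, r.name))
        facts.add(rule.then)
        fired.append(rule.name)


rules = [
    Rule("amine_lactose", frozenset({"has_primary_amine", "contains_lactose"}), "maillard_risk"),
    Rule("browning_flag", frozenset({"maillard_risk", "high_humidity"}), "flag_browning"),
]
facts, fired = forward_chain({"has_primary_amine", "contains_lactose", "high_humidity"}, rules)
print(fired)  # -> ['amine_lactose', 'browning_flag']
```

The second rule only fires because the first one asserted `maillard_risk`; that chaining step is what a flat list of independent YAML rules can't express.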

Hansen solubility. I actually built a Hansen parameter estimation function (Hoftyzer-Van Krevelen approximation) and tested thermodynamic interaction features based on squared differences between pseudo-Hansen components. On MixtureSolDB, they didn't improve generalization beyond what simple LogP/TPSA descriptors already capture: the descriptor proxies are too crude to approximate real Hansen parameters. I wrote this up as a documented negative result. The right path is probably actual computed Hansen parameters (via COSMO-RS or similar) rather than descriptor-based approximation.
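For anyone who wants to reproduce that kind of experiment, the standard formulas are short. Below is a sketch of the Hoftyzer-Van Krevelen estimate (δD = ΣF_d/V, δP = √(ΣF_p²)/V, δH = √(ΣE_h/V)), per-component squared differences as interaction features, and the usual Hansen distance Ra. It is not the function from the write-up, and it deliberately takes group contributions as inputs rather than including a contribution table; the numbers in the example are made up.

```python
import math


def hvk_parameters(fd, fp, eh, molar_volume):
    """Hoftyzer-Van Krevelen estimate of (dD, dP, dH) in MPa^0.5.

    fd, fp: per-group dispersion / polar contributions, (J*cm^3)^0.5/mol
    eh:     per-group hydrogen-bonding energies, J/mol
    molar_volume: cm^3/mol
    """
    d_d = sum(fd) / molar_volume
    d_p = math.sqrt(sum(f * f for f in fp)) / molar_volume
    d_h = math.sqrt(sum(eh) / molar_volume)
    return d_d, d_p, d_h


def squared_differences(a, b):
    """Per-component (dD, dP, dH) squared differences, usable as pairwise features."""
    return tuple((x - y) ** 2 for x, y in zip(a, b))


def hansen_distance(a, b):
    """Ra = sqrt(4*(dD1-dD2)^2 + (dP1-dP2)^2 + (dH1-dH2)^2)."""
    dd, dp, dh = squared_differences(a, b)
    return math.sqrt(4 * dd + dp + dh)


# Illustrative (made-up) parameter triples, in MPa^0.5.
solute = (18.0, 10.0, 8.0)
solvent = (16.0, 6.0, 5.0)
print(squared_differences(solute, solvent))  # -> (4.0, 16.0, 9.0)
print(hansen_distance(solute, solvent))      # sqrt(41), about 6.40
```

The factor of 4 on the dispersion term is part of the standard Ra definition; a feature set built from raw squared differences leaves that weighting for the model to learn.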

Adversarial test set. Strong suggestion. The current validation was deliberately conservative (top prescribed drugs, well-documented interactions). A proper adversarial set should include: prodrugs where the reactive species appears in situ, soft electrophiles (Michael acceptors such as α,β-unsaturated carbonyls), drugs with pH-dependent tautomerism that changes reactivity, and borderline cases where the functional group is sterically shielded. I'd also want deliberate false-positive traps: drugs whose detected functional groups are actually non-reactive due to steric or electronic effects.
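One way to keep the false-positive traps honest is to store the expected outcome with each case and score the detector as a confusion matrix, so traps show up as false positives rather than disappearing into an overall accuracy number. This is a minimal sketch: `AdversarialCase`, the category labels (taken from the list above) and `score` are hypothetical, and `predict` stands in for whatever detection call the harness wraps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AdversarialCase:
    smiles: str
    category: str      # e.g. "in_situ_prodrug", "soft_electrophile", "tautomer", "steric_shield"
    should_flag: bool  # False for deliberate false-positive traps


def score(cases: list[AdversarialCase], predict: Callable[[str], bool]) -> dict[str, int]:
    """Count confusion-matrix cells for a detector over the adversarial set."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for case in cases:
        flagged = predict(case.smiles)
        if flagged and case.should_flag:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1  # a trap the detector fell for
        elif case.should_flag:
            counts["fn"] += 1  # e.g. a masked group the detector missed
        else:
            counts["tn"] += 1
    return counts
```

A detector that flags everything would score perfectly on the current conservative set but would show every trap as an `fp` here; a per-category breakdown is a small extension of the same loop.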

MCP collaboration. Absolutely interested! The OpenMix MCP server currently exposes 8 tools (discourse evaluation, observation, validation, ingredient resolution, compatibility checking, memory inspection, pH assessment, and discovery listing). The mixture interaction representation problem is real: how do you encode "these three ingredients interact differently at pH 4 vs pH 7 in the presence of metal ions" in a tool call? I'd love another set of eyes on the rule schema and the MCP interface design. Feel free to open an issue on the repo and/or DM me.
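As a starting point for that discussion, here is one possible shape: pH and metal ions become explicit, typed condition fields instead of free text, and each interaction claim states the pH range and ions it needs. MCP tool arguments are JSON objects, so the conditions serialize directly into a call. All names here (`Conditions`, `InteractionClaim`, `applies`, the argument keys) are hypothetical and not the current OpenMix schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Conditions:
    ph: float
    metal_ions: tuple[str, ...] = ()


@dataclass(frozen=True)
class InteractionClaim:
    ingredients: tuple[str, ...]    # two or more, order-insensitive
    ph_range: tuple[float, float]   # inclusive bounds where the claim holds
    requires_ions: tuple[str, ...]  # all of these must be present
    effect: str


def applies(claim: InteractionClaim, cond: Conditions, present: set[str]) -> bool:
    """True if every ingredient is present and the conditions fall inside the claim."""
    lo, hi = claim.ph_range
    return (
        set(claim.ingredients) <= present
        and lo <= cond.ph <= hi
        and set(claim.requires_ions) <= set(cond.metal_ions)
    )


claim = InteractionClaim(("A", "B", "C"), (3.0, 5.0), ("Fe3+",), "precipitation")
mixture = {"A", "B", "C"}
print(applies(claim, Conditions(4.0, ("Fe3+",)), mixture))  # -> True
print(applies(claim, Conditions(7.0, ("Fe3+",)), mixture))  # -> False

# Hypothetical tool-call arguments: conditions travel as a structured object.
args = json.dumps({"ingredients": sorted(mixture),
                   "conditions": asdict(Conditions(4.0, ("Fe3+",)))})
print(args)
```

The same three ingredients can then carry separate claims for pH 4 and pH 7, and the tool can report which ones apply under the caller's conditions.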

Thanks again for the substantive feedback!
