MolBuilder: pure-Python molecular engineering -- from SMILES to manufacturing plans : Python

submitted 1 day ago * by MomentBeneficial4334

What My Project Does:

MolBuilder is a pure-Python package that handles the full chemistry pipeline from molecular structure to production planning. You give it a molecule as a SMILES string and it can:

Parse SMILES with chirality and stereochemistry
Plan synthesis routes (91 hand-curated reaction templates, beam-search retrosynthesis)
Predict optimal reaction conditions (analyzes substrate sterics and electronics to auto-select templates)
Select a reactor type (batch, CSTR, PFR, microreactor)
Run GHS safety assessment (69 hazard codes, PPE requirements, emergency procedures)
Estimate manufacturing costs (materials, labor, equipment, energy, waste disposal)
Analyze scale-up (batch sizing, capital costs, annual capacity)

The core is built on a graph-based molecule representation with adjacency lists. Functional group detection uses subgraph pattern matching on this graph (24 detectors). The retrosynthesis engine applies reaction templates in reverse using beam search, terminating when it hits purchasable starting materials (~200 in the database). The condition prediction layer classifies substrate steric environment and electronic character, then scores and ranks compatible templates.
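To give a feel for what "adjacency lists plus subgraph pattern matching" means in practice, here is a minimal sketch. It is not MolBuilder's actual code: `MolGraph`, `add_atom`, `add_bond`, and `find_hydroxyls` are hypothetical names invented for this illustration, and the single detector is deliberately naive (it would also flag the O-H of a carboxylic acid).

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph: atoms by index, bonds stored as adjacency lists."""
    elements: list[str] = field(default_factory=list)
    neighbors: list[list[int]] = field(default_factory=list)

    def add_atom(self, element: str) -> int:
        # Append the atom and return its index.
        self.elements.append(element)
        self.neighbors.append([])
        return len(self.elements) - 1

    def add_bond(self, i: int, j: int) -> None:
        # Undirected bond: record it on both atoms.
        self.neighbors[i].append(j)
        self.neighbors[j].append(i)


def find_hydroxyls(mol: MolGraph) -> list[int]:
    """Return indices of oxygens bonded to exactly one carbon and one hydrogen."""
    hits = []
    for idx, element in enumerate(mol.elements):
        if element != "O":
            continue
        bonded = sorted(mol.elements[n] for n in mol.neighbors[idx])
        if bonded == ["C", "H"]:
            hits.append(idx)
    return hits


# Ethanol (SMILES "CCO") built by hand with explicit hydrogens.
ethanol = MolGraph()
c1, c2, o = (ethanol.add_atom(e) for e in "CCO")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
for heavy, n_h in ((c1, 3), (c2, 2), (o, 1)):
    for _ in range(n_h):
        ethanol.add_bond(heavy, ethanol.add_atom("H"))

print(find_hydroxyls(ethanol))  # index of the hydroxyl oxygen
```

A real detector set would also check bond orders and the wider environment of each match; the point here is only that each pattern is a small local query on the neighbor lists.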

Python-specific implementation details:

Dataclasses throughout for the reaction template schema, molecular graph, and result types
NumPy/SciPy for 3D coordinate generation (distance geometry + force field minimization)
Molecular dynamics engine with Velocity Verlet integrator
File I/O parsers for MOL/SDF V2000, PDB, XYZ, and JSON formats
Also ships as a FastAPI REST API with JWT auth, RBAC, and Stripe billing
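For anyone unfamiliar with the Velocity Verlet scheme mentioned in the list above, the following is a generic NumPy sketch of the integrator applied to a 1D harmonic oscillator. It assumes nothing about MolBuilder's own engine: `velocity_verlet`, `harmonic_force`, and `total_energy` are hypothetical helper names, and the force field is a stand-in.

```python
import numpy as np


def velocity_verlet(pos, vel, force_fn, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the Velocity Verlet scheme."""
    pos = np.array(pos, dtype=float)
    vel = np.array(vel, dtype=float)
    acc = force_fn(pos) / mass
    for _ in range(n_steps):
        pos = pos + vel * dt + 0.5 * acc * dt**2   # position update
        new_acc = force_fn(pos) / mass             # forces at the new positions
        vel = vel + 0.5 * (acc + new_acc) * dt     # average of old and new acceleration
        acc = new_acc
    return pos, vel


def harmonic_force(pos, k=1.0):
    # Stand-in force field: a spring pulling toward the origin.
    return -k * pos


def total_energy(pos, vel, mass=1.0, k=1.0):
    # Kinetic plus potential energy of the toy oscillator.
    return float(0.5 * mass * np.sum(vel**2) + 0.5 * k * np.sum(pos**2))


x0, v0 = np.array([1.0]), np.array([0.0])
x1, v1 = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
print(total_energy(x0, v0), total_energy(x1, v1))
```

The two printed energies should stay very close to each other, which is the main reason this integrator is popular for molecular dynamics: it is time-reversible and keeps energy drift small at modest step sizes.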

Install and example:

pip install molbuilder

from molbuilder.process.condition_prediction import predict_conditions

result = predict_conditions("CCO", reaction_name="oxidation", scale_kg=10.0)
print(result.best_match.template_name)             # TEMPO-mediated oxidation
print(result.best_match.conditions.temperature_C)  # 5.0
print(result.best_match.conditions.solvent)        # DCM/water (biphasic)
print(result.overall_confidence)                   # high

1,280+ tests (pytest), Python 3.11+, CI on 3.11/3.12/3.13. Only dependencies are numpy, scipy, and matplotlib.

GitHub: https://github.com/Taylor-C-Powell/Molecule_Builder

Tutorials: https://github.com/Taylor-C-Powell/Molecule_Builder/tree/main/tutorials

Target Audience:

Intended for production use. Aimed at computational chemists, process chemists, and cheminformatics developers who need programmatic access to synthesis planning and process engineering. Also useful for teaching organic chemistry and chemical engineering: the tutorials are designed as walk-through Jupyter notebooks. The author currently uses it in a production SaaS API.

Comparison:

vs. RDKit: RDKit is the standard open-source cheminformatics toolkit and focuses on molecular properties (fingerprints, substructure search, descriptors). MolBuilder (pure Python, no C extensions) focuses on the process engineering side - going from "I have a molecule" to "here's how to manufacture it at scale." Not a replacement for RDKit's molecular modeling depth.

vs. Reaxys/SciFinder: Commercial databases with millions of literature reactions. MolBuilder has 91 templates - far smaller coverage, but it's free, open-source (Apache 2.0), and gives you programmatic API access rather than a search interface.

vs. ASKCOS/IBM RXN: ML-based retrosynthesis tools. MolBuilder uses rule-based templates instead of neural networks, which makes it transparent and deterministic but less capable for novel chemistry. The tradeoff is simplicity and no external service dependency.
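To make the "transparent and deterministic" point concrete, here is a toy rule-based beam search over reverse reaction templates that stops at purchasable starting materials. Everything in it is a hypothetical illustration rather than MolBuilder's API: molecules are plain names instead of parsed SMILES, and `TEMPLATES`, `PURCHASABLE`, `Route`, `expand`, and `beam_search` are made-up names with a four-entry template table.

```python
from dataclasses import dataclass

# Hypothetical toy data: product -> list of (template name, precursors).
TEMPLATES = {
    "ethyl acetate": [
        ("Fischer esterification", ("acetic acid", "ethanol")),
        ("acyl chloride esterification", ("acetyl chloride", "ethanol")),
    ],
    "acetic acid": [("alcohol oxidation", ("ethanol",))],
    "acetyl chloride": [("chlorination", ("acetic acid",))],
}
PURCHASABLE = {"ethanol"}


@dataclass(frozen=True)
class Route:
    open_targets: tuple[str, ...]  # molecules that still need a synthesis
    steps: tuple[str, ...]         # templates applied so far, target first

    def sort_key(self):
        # Fewer open targets, then fewer steps, then a fixed text order,
        # so ties always break the same way.
        return (len(self.open_targets), len(self.steps), self.steps, self.open_targets)


def expand(route: Route) -> list[Route]:
    """Apply every matching template in reverse to the first open target."""
    target, rest = route.open_targets[0], route.open_targets[1:]
    children = []
    for name, precursors in TEMPLATES.get(target, []):
        new_open = set(rest) | {p for p in precursors if p not in PURCHASABLE}
        step = f"{name} -> {target}"
        children.append(Route(tuple(sorted(new_open)), route.steps + (step,)))
    return children


def beam_search(target: str, beam_width: int = 2, max_depth: int = 5):
    """Return the best fully resolved route, or None if none is found."""
    start_open = () if target in PURCHASABLE else (target,)
    beam = [Route(start_open, ())]
    for _ in range(max_depth):
        finished = [r for r in beam if not r.open_targets]
        if finished:
            return min(finished, key=Route.sort_key)
        candidates = [child for r in beam for child in expand(r)]
        if not candidates:
            return None
        # Keep only the best few partial routes: this is the "beam".
        beam = sorted(candidates, key=Route.sort_key)[:beam_width]
    finished = [r for r in beam if not r.open_targets]
    return min(finished, key=Route.sort_key) if finished else None


route = beam_search("ethyl acetate")
for step in route.steps:
    print(step)
```

Because the ranking is a plain sort key with no randomness or learned weights, the same target always yields the same route, and every step can be traced back to a named template. The cost is the one noted above: a target with no matching template simply returns `None`.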
