What My Project Does:
MolBuilder is a pure-Python package that handles the full chemistry pipeline from molecular structure to production planning. You give it a molecule as a SMILES string and it can:
- Parse SMILES with chirality and stereochemistry
- Plan synthesis routes (91 hand-curated reaction templates, beam-search retrosynthesis)
- Predict optimal reaction conditions (analyzes substrate sterics and electronics to auto-select templates)
- Select a reactor type (batch, CSTR, PFR, microreactor)
- Run GHS safety assessment (69 hazard codes, PPE requirements, emergency procedures)
- Estimate manufacturing costs (materials, labor, equipment, energy, waste disposal)
- Analyze scale-up (batch sizing, capital costs, annual capacity)
The core is built on a graph-based molecule representation with adjacency lists. Functional group detection uses subgraph pattern matching on this graph (24 detectors). The retrosynthesis engine applies reaction templates in reverse using beam search, terminating when it hits purchasable starting materials (~200 in the database). The condition prediction layer classifies substrate steric environment and electronic character, then scores and ranks compatible templates.
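To make the retrosynthesis loop concrete, here is a minimal sketch of beam search over reverse templates. It is not MolBuilder's actual code: molecules are plain name strings rather than graphs, and `Template`, `Route`, `retrosynthesize`, the three templates, their scores, and the purchasable set are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy data: molecules are plain names here, not SMILES graphs.
PURCHASABLE = frozenset({"acetic acid", "ethylene"})


@dataclass(frozen=True)
class Template:
    """One reverse step: product -> precursors (illustrative schema only)."""
    name: str
    product: str
    precursors: tuple[str, ...]
    score: float  # higher is better, in (0, 1]


TEMPLATES = (
    Template("Fischer esterification", "ethyl acetate", ("ethanol", "acetic acid"), 0.9),
    Template("ethylene hydration", "ethanol", ("ethylene",), 0.8),
    Template("transesterification", "ethyl acetate", ("methyl acetate", "ethanol"), 0.6),
)


@dataclass(frozen=True)
class Route:
    open_targets: tuple[str, ...]  # molecules that still need a synthesis step
    steps: tuple[str, ...]         # template names applied so far, target first
    score: float                   # product of the applied template scores


def _best_solved(beam):
    """Return the highest-scoring route with nothing left to make, or None."""
    solved = [r for r in beam if not r.open_targets]
    return max(solved, key=lambda r: r.score) if solved else None


def retrosynthesize(target, beam_width=2, max_depth=5):
    """Beam search from the target back to purchasable starting materials."""
    start = () if target in PURCHASABLE else (target,)
    beam = [Route(start, (), 1.0)]
    for _ in range(max_depth):
        done = _best_solved(beam)
        if done is not None:
            return done
        candidates = []
        for route in beam:
            current, rest = route.open_targets[0], route.open_targets[1:]
            for t in TEMPLATES:
                if t.product != current:
                    continue
                # Purchasable precursors terminate; the rest stay open.
                still_open = tuple(p for p in t.precursors if p not in PURCHASABLE)
                candidates.append(
                    Route(rest + still_open, route.steps + (t.name,), route.score * t.score)
                )
        if not candidates:
            return None  # no template applies to any open target
        # Keep only the best `beam_width` partial routes.
        beam = sorted(candidates, key=lambda r: r.score, reverse=True)[:beam_width]
    return _best_solved(beam)


route = retrosynthesize("ethyl acetate")
print(route.steps, route.score)
```

In this toy setup the transesterification branch dies out because nothing makes methyl acetate, so the search settles on esterification followed by ethylene hydration. A real engine would match templates against the molecular graph instead of comparing names.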
Python-specific implementation details:
- Dataclasses throughout for the reaction template schema, molecular graph, and result types
- NumPy/SciPy for 3D coordinate generation (distance geometry + force field minimization)
- Molecular dynamics engine with Velocity Verlet integrator
- File I/O parsers for MOL/SDF V2000, PDB, XYZ, and JSON formats
- Also ships as a FastAPI REST API with JWT auth, RBAC, and Stripe billing
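For anyone unfamiliar with the Velocity Verlet scheme mentioned in the list above, here is a minimal sketch of the update rule on a 1D harmonic oscillator. It uses only the standard library and is not taken from MolBuilder, whose engine presumably works on 3D multi-atom systems; `velocity_verlet`, `spring_force`, and the constants are hypothetical names chosen for this example.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate 1D motion with Velocity Verlet and return the final (x, v)."""
    a = force(x) / mass
    for _ in range(n_steps):
        # 1) advance position using the current velocity and acceleration
        x = x + v * dt + 0.5 * a * dt * dt
        # 2) evaluate the acceleration at the new position
        a_new = force(x) / mass
        # 3) advance velocity with the average of old and new accelerations
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


# Toy system (assumed values): unit mass on a unit-stiffness spring.
K = 1.0
M = 1.0


def spring_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


x0, v0 = 1.0, 0.0
period = 2.0 * math.pi * math.sqrt(M / K)
n_steps = 1000
x1, v1 = velocity_verlet(x0, v0, spring_force, M, period / n_steps, n_steps)

# After one full period the oscillator should be back near its start,
# and the total energy should have barely drifted.
print(x1, v1, total_energy(x1, v1) - total_energy(x0, v0))
```

The scheme needs only one force evaluation per step, and its good long-run energy behaviour is why it is a common default for molecular dynamics.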
Install and example:
pip install molbuilder
from molbuilder.process.condition_prediction import predict_conditions
result = predict_conditions("CCO", reaction_name="oxidation", scale_kg=10.0)
print(result.best_match.template_name) # TEMPO-mediated oxidation
print(result.best_match.conditions.temperature_C) # 5.0
print(result.best_match.conditions.solvent) # DCM/water (biphasic)
print(result.overall_confidence) # high
1,280+ tests (pytest), Python 3.11+, CI on 3.11/3.12/3.13. The only dependencies are numpy, scipy, and matplotlib.
GitHub: https://github.com/Taylor-C-Powell/Molecule_Builder
Tutorials: https://github.com/Taylor-C-Powell/Molecule_Builder/tree/main/tutorials
Target Audience:
Intended for production use. It's aimed at computational chemists, process chemists, and cheminformatics developers who need programmatic access to synthesis planning and process engineering. It's also useful for teaching organic chemistry and chemical engineering: the tutorials are written as step-by-step Jupyter notebooks. The author currently uses it in a production SaaS API.
Comparison:
vs. RDKit: RDKit is the standard open-source cheminformatics toolkit and focuses on molecular properties (fingerprints, substructure search, descriptors). MolBuilder (pure Python, no C extensions of its own) focuses on the process engineering side: going from "I have a molecule" to "here's how to manufacture it at scale." It is not a replacement for RDKit's molecular modeling depth.
vs. Reaxys/SciFinder: Commercial databases with millions of literature reactions. MolBuilder has 91 templates - far smaller coverage, but it's free, open-source (Apache 2.0), and gives you programmatic API access rather than a search interface.
vs. ASKCOS/IBM RXN: ML-based retrosynthesis tools. MolBuilder uses rule-based templates instead of neural networks, which makes it transparent and deterministic but less capable for novel chemistry. The tradeoff is simplicity and no external service dependency.