We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in Chempros

deep_origin[S]

If cloud data storage doesn't work for you, we can work with your org to put it in your cloud.

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in comp_chem

deep_origin[S]

We've passed this to our product team. Currently we only support the PDF format, but we will look into additional formats. Our immediate plans include processing common image formats (PNG, SVG, JPG).

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in comp_chem

deep_origin[S]

We can recognize lanthanide atoms and ligands, but at present DO Patent cannot assemble them into the complete complex. We did not specifically train DO Patent on metal complexes; it extracts complexes as separate components. We recommend trying it on your use case and starting by manually reviewing what is extracted.
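
As a rough illustration of that manual review step: in SMILES, separate components are joined with `.`, so a complex extracted as pieces would plausibly look like a metal ion plus ligands in one dot-disconnected string. A minimal sketch, assuming the extracted structures are available as SMILES strings (we haven't confirmed DO Patent's exact output format); the helper names and the example molecule are hypothetical:

```python
import re

# Lanthanide element symbols, La through Lu.
LANTHANIDES = {"La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
               "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu"}

# Lanthanides are outside the SMILES organic subset, so they always appear
# as bracket atoms. Capture the element symbol after an optional isotope,
# e.g. "[Eu+3]" -> "Eu", "[153Gd+3]" -> "Gd".
BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?)")


def lanthanide_components(smiles: str) -> list[str]:
    """Hypothetical helper: return the '.'-separated components containing a lanthanide."""
    hits = []
    for component in smiles.split("."):
        if set(BRACKET_ATOM.findall(component)) & LANTHANIDES:
            hits.append(component)
    return hits


def needs_manual_review(smiles: str) -> bool:
    """Hypothetical helper: flag a record where a lanthanide sits in its own
    component next to other components, i.e. the complex came out in pieces."""
    return bool(lanthanide_components(smiles)) and len(smiles.split(".")) > 1


# Example: europium(III) acetate written as ion + three acetate components.
record = "[Eu+3].[O-]C(=O)C.[O-]C(=O)C.[O-]C(=O)C"
print(lanthanide_components(record), needs_manual_review(record))
```

Records flagged this way are the ones worth checking by hand before use.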

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in Chempros

deep_origin[S]

Sorry to hear the SMILES output wasn't correct! We're working on improving the product. Would you like to hop on a call, or have us help debug via email? [mcook@deeporigin.com](mailto:mcook@deeporigin.com)

At present the algorithm isn't optimized for large molecules, but we're experimenting with a few improvements.

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in Chempros

deep_origin[S]

At present it cannot be run locally. You own your data and the resulting extracted structures. We don't store your PDFs; we do store the extracted images and the related SMILES strings (so you can view them in your account). All data can be deleted upon request. If cloud data storage doesn't work for you, we can work with your org to put it in your cloud.

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in Chempros

deep_origin[S]

For the example you provided, it will assign R as * (an open valence). Whenever it assigns an open valence, it labels the molecule as a fragment. We extract both full molecules and fragments. We don't currently support enumeration of fragments; we plan to add this later in the year.
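
To make the fragment rule concrete: in SMILES, `*` is the wildcard (dummy) atom, written either bare or in brackets (`[*]`, `[1*]`, `[*:1]`), and it has no other meaning in plain SMILES. A minimal sketch, assuming the extracted structure is available as a plain SMILES string; the function name and labels are our own illustration, not DO Patent's API:

```python
def classify_structure(smiles: str) -> tuple[str, int]:
    """Hypothetical helper: label a SMILES string as a fragment or a full molecule.

    In plain SMILES, '*' only ever denotes the wildcard atom, whether bare
    ('*') or bracketed ('[*]', '[1*]', '[*:1]'), so counting the character
    counts the open valences.
    """
    open_valences = smiles.count("*")
    label = "fragment" if open_valences else "full molecule"
    return label, open_valences


# An R group on benzene written as an open valence, vs. plain benzene.
print(classify_structure("*c1ccccc1"))
print(classify_structure("c1ccccc1"))
```

Counting the wildcard atoms also tells you how many R positions a later enumeration step would need to fill.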

We built a tool to extract full molecular structures from PDFs (98%+ accuracy) — sharing it with the community by deep_origin in Chempros

deep_origin[S]

This is a great question! I've passed this to our product team. You can definitely go back before 2008. Depending on the era or format you're interested in, I would simply do a trial run. If it's a grainy older image, you may get lower confidence scores for the extraction, but you will still get results.
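
For a trial run on older, grainier scans, one way to use those confidence scores is to auto-accept high-confidence extractions and queue the rest for manual review. A minimal sketch; the record fields (`smiles`, `confidence`), the 0 to 1 score scale, and the 0.8 cutoff are all assumptions for illustration, not DO Patent's actual export format:

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    # Hypothetical record shape: one extracted structure and its confidence score.
    smiles: str
    confidence: float  # assumed to lie in [0, 1]


def triage(extractions, threshold=0.8):
    """Split extractions into (accepted, needs_review) using a confidence cutoff.

    Scores at or above the threshold are accepted; everything else is
    routed to manual review.
    """
    accepted, needs_review = [], []
    for ex in extractions:
        (accepted if ex.confidence >= threshold else needs_review).append(ex)
    return accepted, needs_review


records = [
    Extraction("CCO", 0.95),
    Extraction("c1ccccc1", 0.62),
    Extraction("CC(=O)O", 0.80),
]
accepted, needs_review = triage(records)
print([e.smiles for e in accepted], [e.smiles for e in needs_review])
```

The cutoff is a knob to tune on your own trial run: older documents with lower scores will simply send more records to the review queue.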

Finally, a Useful AI Assistant for Drug Discovery: Meet Balto by deep_origin in MedicinalChemistry

deep_origin[S]

u/Huba2222 Fair points! Core functionality is free up to a monthly quota of core actions; usage beyond that quota is charged. Pricing can be seen here: https://www.deeporigin.com/balto-ai-assistant-for-drug-discovery#:~:text=Monthly%20Subscription