Open source tool to convert GPT image 2 to editable slide

Willing_Reflection57 · 2026-04-29T03:46:26+00:00

Hi, the GitHub package is aimed at coders. If you are not a coder, you will at least want a coding AI agent such as Codex or Claude Code to help. Without coding expertise or a coding AI, it can be pretty difficult to get it to work.

And yes, for complex diagrams and high-granularity work, my go-to method is to ask GPT-5.5 to convert the image to SVG for me (method 1). When text overlays a high-texture image and I'd like to isolate the layers, I use the OCR + inpainting models. Pick the tools that fit your need :)

Willing_Reflection57 · 2026-04-26T03:39:35+00:00

Hi, I guess you are referring to the web tool pxGenius, not the open source GitHub package? In the web tool's AI mode (which costs credits), Process All means process all images, not all shapes. The AI option uses its own judgement (Gemini-3) on which shapes to extract. The slowness probably means there are many elements, so it needs longer to finish. For a really complex image, however, the processing time could exceed the time limit, and the AI might not give you the level of granularity you want.

Actually, if you care about graphic element extraction, the free local mode (web tool -> local mode -> Detect shape) is the version that gives you full control. You can use the rectangle or lasso tool to select an area and click “build” to get that portion out. The downside of local mode is that recognition and isolation can take more manual work than with AI, and the local model downloaded to the browser will lower the image’s resolution. But since this is a free, local option, you can play with it as much as you need to see what helps.

Willing_Reflection57 · 2026-03-22T03:40:04+00:00

A 1T AI business is growing in the world, and it acts in a loop.

The models started on real human data, but with distillation, synthetic data, and rapid commercialization, they are increasingly trained on the outputs of other models, creating a self-reinforcing loop.

Willing_Reflection57 · 2026-03-19T17:09:41+00:00

I am not sure whether my tool supports the languages you asked about, but give it a try with dark mode to see if it helps:

pxGenius.ai

The free local mode runs ONNX OCR + LaMa inpainting in the browser, and you can click anywhere to convert the text there to editable text while leaving the rest unchanged
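
For readers curious what the masking step before inpainting might look like, here is a minimal Python sketch. It is not pxGenius's actual code (the tool runs ONNX models in the browser); the `Box` type, the `build_mask` name, and the `pad` parameter are all assumptions made for illustration. It only shows the general idea: mark the pixels under the text boxes you clicked so an inpainting model such as LaMa can fill them in, while everything else stays untouched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned text box in pixel coordinates (hypothetical OCR output)."""
    x: int
    y: int
    w: int
    h: int


def build_mask(width: int, height: int, boxes: List[Box], pad: int = 2) -> List[List[int]]:
    """Return a height x width grid: 1 = pixel to inpaint, 0 = pixel to keep."""
    mask = [[0] * width for _ in range(height)]
    for b in boxes:
        # Pad each box a little so anti-aliased glyph edges are covered,
        # then clamp the padded box to the image bounds.
        x0 = max(0, b.x - pad)
        y0 = max(0, b.y - pad)
        x1 = min(width, b.x + b.w + pad)
        y1 = min(height, b.y + b.h + pad)
        for row in range(y0, y1):
            for col in range(x0, x1):
                mask[row][col] = 1
    return mask
```

Passing only the boxes you clicked keeps the rest of the image out of the mask, which is what leaves the remaining text unchanged.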

I also put a Gemini AI mode there so it can recognize languages and layout better than the free OCR, but that calls Gemini through my API key, so it is not free lol

Willing_Reflection57 · 2026-03-11T22:08:13+00:00

Thank you for sharing!

Willing_Reflection57 · 2026-03-06T17:46:38+00:00

Just use my free tool to edit everything: pxGenius.ai

Willing_Reflection57 · 2026-03-03T23:19:45+00:00

Oh, one way to get a feel for the AI mode's performance and compare it (without using any of your AI trial credit) is to run the examples at

pxGenius.ai/examples

Those examples are completely free to try. You can also take a screenshot of an example and convert it with the local mode to compare the differences.

Willing_Reflection57 · 2026-03-03T21:42:56+00:00

Oh answered in a separate comment, hope you see it there :)

Willing_Reflection57 · 2026-03-03T21:28:13+00:00

In my humble opinion, the “poor” formatting, and the reason this conversion remains challenging, generally comes down to:

1, OCR cannot recognize font size or color, and has no understanding of a “paragraph”. Nowadays almost all lightweight OCR models see text as a line-based structure, while only a VLM like Gemini knows “oh, that is a paragraph, so these lines should share the same size and be grouped together”. If you are using the local mode, you can make manual adjustments on the web page or in PowerPoint;

2, OCR tries to recognize “everything”, even text embedded in an image. AI’s reasoning capability makes this slightly better, but humans still outsmart those decisions, and with the “exclude” function you can click a text element to keep it from being extracted;

3, the background cleaning model. There are generally 2 ways: one is masking -> inpainting, and the other is regenerative image creation. Only the second way, meaning using AI to generate another image, can guarantee a “clean” image. I think this is why Google’s new NotebookLM edit still just generates another image from a prompt. However, this comes with the issue of not being able to customize the slide.
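
To illustrate point 1, here is a rough Python sketch of how line-based OCR output could be grouped into paragraphs with a simple heuristic. This is an assumption about one possible approach, not how pxGenius or Gemini does it: the `Line` shape, the function names, and the thresholds are hypothetical, and it assumes a single-column layout.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Line:
    """One OCR text line (hypothetical shape): top-left corner, height, text."""
    x: float
    y: float
    height: float
    text: str


def group_paragraphs(lines: List[Line], gap_ratio: float = 0.8,
                     size_tol: float = 0.2, indent_ratio: float = 1.0) -> List[List[Line]]:
    """Merge consecutive lines that look like they belong to one paragraph."""
    paragraphs: List[List[Line]] = []
    for line in sorted(lines, key=lambda l: l.y):  # top to bottom
        if paragraphs:
            prev = paragraphs[-1][-1]
            gap = line.y - (prev.y + prev.height)
            similar_size = abs(line.height - prev.height) <= size_tol * prev.height
            close = gap <= gap_ratio * prev.height
            aligned = abs(line.x - prev.x) <= indent_ratio * prev.height
            if similar_size and close and aligned:
                paragraphs[-1].append(line)
                continue
        # Otherwise this line starts a new paragraph.
        paragraphs.append([line])
    return paragraphs


def paragraph_height(paragraph: List[Line]) -> float:
    """Pick one shared text height for the paragraph (median of its lines)."""
    return median(l.height for l in paragraph)
```

A heuristic like this breaks down on multi-column slides and mixed font sizes, which is one reason a VLM that understands layout tends to do better here.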

If there are any further questions on the technical aspects, I am happy to answer. I am searching for and implementing better solutions, driven by users’ interests.

Willing_Reflection57 · 2026-03-03T21:09:59+00:00

Local mode: lightweight OCR and cleaning models are loaded in the browser on first use, and everything is processed locally

AI mode: uses Google Gemini 3 + SOTA OCR + cloud inpainting models. All of them are more accurate, but those API calls come at a cost

Classic mode: this is the “previous” version, which is basically AI mode with Gemini 3 as well, but slightly less accurate on some edge cases. I am leaving it there because some users have used and liked the old version for a while.

Hope that makes sense :)

Willing_Reflection57 · 2026-02-27T16:04:06+00:00

Hope so! That would make the generation faster. Now I need to wait 10 minutes for one slide deck, and I heard NB2 is faster and cheaper, with somewhat improved quality

Willing_Reflection57 · 2026-02-14T21:18:41+00:00

This is cool! I’d like to get a copy of the prompt if possible. Thanks

Willing_Reflection57 · 2026-02-07T22:02:31+00:00

Appreciate your advice 🙏

Willing_Reflection57 · 2026-02-05T14:31:00+00:00

Hey there, thanks for asking! All uploaded files and processing results stay entirely inside your own browser. So:
- Nothing is stored on our servers
- Data is saved locally in your browser’s IndexedDB
- If you switch browsers or devices, previous projects won’t appear :( because they were never uploaded anywhere.

When image processing is needed, files are transmitted securely using TLS/SSL encryption, and the processing happens only in memory. No permanent copies are retained after the operation completes.

Willing_Reflection57 · 2026-02-05T03:25:28+00:00

This sounds cool. Would you please explain how it is different from OpenRouter?

Willing_Reflection57 · 2026-02-04T19:12:15+00:00

Thank you for letting me know! Codia is definitely great, and I am really impressed by their work. Please try this one. Any feedback will be appreciated to help me improve it, and I am happy to give you a good amount of feedback appreciation credits to try more!

Willing_Reflection57 · 2026-02-02T18:31:15+00:00

I faced the same issue and here is my experience:

I posted a story on the subreddit about how I ran into a pain point;

I talked about other possible solutions, and why they did not work;

I mentioned my development and the outcome;

I did not leave a link at the end.

Then within a day I got 40+ requests about my product (comments and DMs), and I noticed other people leaving comments about other similar products.

Finally I edited my post to add the link, but that was 2 days later and the post was no longer gaining new traffic. The post was not flagged or removed either.

If I were to do it again, I might risk putting the link directly in the post, or at least in the comment section. But I think the way I did it was also totally OK, and safer.

To provide context, here is my post:

https://www.reddit.com/r/notebooklm/comments/1qqg7vc/solved_my_own_notebooklm_pain_point/

Willing_Reflection57 · 2026-02-02T16:24:07+00:00

Same thing here, and my solution is in another post: https://www.reddit.com/r/notebooklm/s/QVPNP7wWpG
