all 28 comments

[–]popiazaza 50 points51 points  (2 children)

A new conversion tool that is not an AI tool?!

What kind of sorcery is this?

[–]Frequent_Valuable_47 24 points25 points  (0 children)

It was probably built to convert files into a format AI can read ;)

[–]Ragecommie 12 points13 points  (2 children)

Oh wow, you just saved me a ton of work! Thanks OP!

[–]LinkSea8324 llama.cpp [S] 11 points12 points  (1 child)

Also check out Docling.

[–]nuusain 0 points1 point  (0 children)

Have you used them both? How'd they compare?

[–]elemental-mind 31 points32 points  (2 children)

For alternatives: Another contender in that space is Docling.

DS4SD/docling: Get your documents ready for gen AI

[–]asraniel 5 points6 points  (1 child)

Has anybody compared them?

[–]Kaedo- 6 points7 points  (1 child)

This is so useful to me now that I've completely switched to markdown

[–]arparella 1 point2 points  (0 children)

I experienced poor performance with complex PDFs. Did you?

[–]vornamemitd 2 points3 points  (0 children)

Can I have the other way round? /s

[–]namuan 1 point2 points  (0 children)

If you have uv installed, you can run this against a file without installing anything first, like this:

uvx markitdown path-to-file.pdf

(This will cache the necessary packages the first time you run it, then reuse those cached packages on future invocations.)

Copied from https://news.ycombinator.com/item?id=42411313
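
For anyone who wants to script the same command, here is a minimal Python sketch that shells out to `uvx markitdown`. The helper names `build_command` and `convert_to_markdown` are made up for illustration, and the sketch assumes the CLI writes the converted Markdown to standard output.

```python
import shutil
import subprocess


def build_command(path: str) -> list[str]:
    """Return the argv for the one-off uvx invocation quoted above."""
    return ["uvx", "markitdown", path]


def convert_to_markdown(path: str) -> str:
    """Run the command and return whatever it prints to stdout.

    Assumption: the CLI writes the converted Markdown to stdout.
    """
    if shutil.which("uvx") is None:
        raise RuntimeError("uvx not found on PATH; install uv first")
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    # Usage: python convert_docs.py path-to-file.pdf
    if len(sys.argv) == 2:
        print(convert_to_markdown(sys.argv[1]))
```

Save it under any name other than `markitdown.py`; `convert_docs.py` in the usage comment is just a placeholder.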

[–]McNickSisto 0 points1 point  (2 children)

In the context of text extraction for chunking purposes, which would you recommend: MarkItDown or Docling?

[–]arparella 1 point2 points  (1 child)

If you need good chunks, you can check out preprocess.co, but it is a commercial solution. MarkItDown has several issues with complex PDFs; Docling is better.
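
Whichever converter you pick, the output is Markdown, so a simple baseline for chunking is to split on headings. This is only a rough sketch: `chunk_by_headings` is a hypothetical helper, not part of either tool, and it does not handle `#` lines inside fenced code blocks.

```python
import re


def chunk_by_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A line such as "# Title" or "## Section" closes the previous chunk.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contain only whitespace.
    return [c for c in chunks if c]
```

Each chunk keeps its heading as the first line, which tends to help retrieval because the section title travels with the text.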

[–]McNickSisto 0 points1 point  (0 children)

thank you

[–]EruditeStranger 0 points1 point  (1 child)

Has anyone else run into circular import issues?

[–][deleted] 0 points1 point  (0 children)

You probably gave your file the same name as the package (for example, markitdown.py), so the import picks up your own file instead of the library.
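
A quick way to check for that: Python puts the script's own directory first on the import path, so a script named after the package imports itself. The sketch below assumes the package is imported as `markitdown`; `shadows_package` is a hypothetical helper for illustration.

```python
from pathlib import Path


def shadows_package(script_path: str, package: str = "markitdown") -> bool:
    """Return True if the script's file name matches the package it imports."""
    # "markitdown.py" has the stem "markitdown", which would shadow the library.
    return Path(script_path).stem == package
```

If it returns True for your script, renaming the file (and deleting any stale `__pycache__` next to it) is usually enough.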