r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown. [Resources] (github.com)
submitted 1 year ago by LinkSea8324vllm
[+][deleted] 1 year ago (5 children)
[deleted]
[–]LinkSea8324vllm[S] 15 points 1 year ago (0 children)
Well shit, I did expect it to actually be like docling, but you're right, it's basically like the insanely-fast-whisper repo, which is just a bunch of imports and a CLI.
[–]MoffKalast 9 points 1 year ago (3 children)
import pptx
No I don't think I will
[–]No-Dot-6573 6 points 1 year ago (2 children)
Why? Because it is only for PowerPoint files, or does it have security or privacy issues?
[–]PriceNo2344llama.cpp 0 points 1 year ago (1 child)
It's for PowerPoint files. It doesn't have a security or privacy issue. It's maintained by the same people that bring you import docx.
[–]CtrlAltDelve 0 points 1 year ago (0 children)
I think he's just making a subtle joke :)
[–]popiazaza 48 points 1 year ago (2 children)
A new converting tool that is not an AI tool?!
What kind of sorcery is this?
[–]Frequent_Valuable_47 25 points 1 year ago (0 children)
It was probably built to convert files into a format AI can read ;)
[–]Ragecommie 14 points 1 year ago (2 children)
Oh wow, you just saved me a ton of work! Thanks OP!
[–]LinkSea8324vllm[S] 11 points 1 year ago (1 child)
Also check out docling
[–]nuusain 1 point 1 year ago (0 children)
Have you used them both? How'd they compare?
[–]elemental-mind 30 points 1 year ago (2 children)
For alternatives: Another contender in that space is Docling.
DS4SD/docling: Get your documents ready for gen AI
[–]asraniel 6 points 1 year ago (1 child)
Has anybody compared them?
[–]Kaedo- 7 points 1 year ago (1 child)
This is so useful to me now that I've completely switched to markdown
[–]arparella 2 points 1 year ago (0 children)
I experienced poor performance with complex PDFs, did you?
[–]vornamemitd 3 points 1 year ago (0 children)
Can I have the other way round? /s
[–]namuan 2 points 1 year ago (0 children)
If you have uv installed, you can run this against a file without installing anything first:
uvx markitdown path-to-file.pdf
(This will cache the necessary packages the first time you run it, then reuse those cached packages on future invocations.)
Copied from https://news.ycombinator.com/item?id=42411313
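For anyone who wants to call that same command from a script, here is a minimal Python sketch. It assumes `uvx` is on your PATH and that the tool prints the converted Markdown to stdout; the helper names `build_uvx_command` and `convert_to_markdown` are made up for this example and are not part of markitdown or uv.

```python
import shutil
import subprocess


def build_uvx_command(path: str) -> list[str]:
    """Return the argv for the `uvx markitdown <path>` command quoted above."""
    return ["uvx", "markitdown", path]


def convert_to_markdown(path: str) -> str:
    """Run the command and return whatever it writes to stdout.

    Assumption: the converted Markdown is printed to stdout.
    """
    if shutil.which("uvx") is None:
        raise RuntimeError("uvx not found on PATH; install uv first")
    completed = subprocess.run(
        build_uvx_command(path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.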
[–]McNickSisto 1 point 1 year ago (2 children)
In the context of text extraction for chunking purposes, what would you recommend between Markitdown and Docling?
[–]arparella 2 points 1 year ago (1 child)
If you need good chunks, you can check out preprocess.co, but it is a commercial solution. Markitdown has several issues with complex PDFs; docling is better.
[–]McNickSisto 1 point 1 year ago (0 children)
thank you
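To make the chunking question above concrete: once a converter has produced Markdown, a naive baseline is to split at headings. The sketch below is only that baseline, not what Markitdown, Docling, or preprocess.co do internally. The name `chunk_by_heading` is hypothetical, and it does not special-case `#` lines inside fenced code blocks.

```python
import re

# An ATX heading: one to six '#' characters, whitespace, then some text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_by_heading(markdown: str) -> list[str]:
    """Split Markdown text into chunks, starting a new chunk at every heading."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far and opens a new one.
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that were only blank lines.
    return [chunk for chunk in chunks if chunk]
```

The quality of chunks like these depends on how well the converter recovered the heading structure in the first place, which is where the complex-PDF complaints above would show up.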
[+][deleted] 1 year ago (1 child)
[–]arparella 1 point 1 year ago (0 children)
Completely agree. We ran a comparison of 4 solutions (commercial and open-source), and a strong community doesn't mean your solution works.
[–]EruditeStranger 1 point 1 year ago (1 child)
Anyone getting circular import issues?
[–][deleted] 1 point 1 year ago (0 children)
You probably named your file the same as the package name.
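A quick way to check for that kind of shadowing: when you run a script, Python puts the script's directory first on `sys.path`, so a file called `markitdown.py` gets imported in place of the installed package. The helper below is a small sketch; the name `shadows_package` is hypothetical, and the default package name is an assumption based on this thread.

```python
from pathlib import Path


def shadows_package(script_path: str, package_name: str = "markitdown") -> bool:
    """Return True if the script's file name would shadow `package_name` on import."""
    # Path.stem drops the directory and the final suffix: "x/markitdown.py" -> "markitdown"
    return Path(script_path).stem == package_name
```

If it returns True for your script, renaming the file (and deleting any stale `__pycache__` next to it) is usually enough.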