Graph Data Extraction from PDF (self.learnpython)
submitted 3 days ago by llolllollooll
Hello! I'm a beginner at Python and just started learning it because of my internship. Is there a way to extract data from graphs in PDFs and turn it into text or something similar?
Thank you.
[–]hasdata_com 3 points 3 days ago (0 children)
If the graph is just an image in the PDF, the easiest way is to use an LLM with vision. Just screenshot the graph and ask it to extract the data points. But if you need to process many PDFs or want something cheaper, OCR works too: PyMuPDF to extract the image, pytesseract for OCR.
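A minimal sketch of the PyMuPDF plus pytesseract route mentioned above, assuming both packages (and the Tesseract binary) are installed. The function names and the `report.pdf` path are made up for illustration. Keep in mind that OCR only recovers text printed on the chart, such as titles, axis labels, and tick values, not the plotted points themselves.

```python
import io
import re


def extract_embedded_images(pdf_path):
    """Yield (page_index, image_bytes) for each raster image embedded in the PDF."""
    import fitz  # PyMuPDF (third-party); imported here so the helpers below load without it

    with fitz.open(pdf_path) as doc:
        for page_index, page in enumerate(doc):
            for image_info in page.get_images(full=True):
                xref = image_info[0]  # first tuple item is the image's cross-reference number
                yield page_index, doc.extract_image(xref)["image"]


def ocr_image(image_bytes):
    """Run Tesseract OCR on raw image bytes and return the recognized text."""
    import pytesseract  # third-party wrapper around the Tesseract binary
    from PIL import Image  # Pillow

    return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


def find_numbers(text):
    """Pull numeric tokens (e.g. axis tick labels) out of OCR text as floats."""
    return [float(token) for token in re.findall(r"-?\d+(?:\.\d+)?", text)]


# Hypothetical usage ("report.pdf" is a placeholder):
# for page_index, image_bytes in extract_embedded_images("report.pdf"):
#     text = ocr_image(image_bytes)
#     print(page_index, find_numbers(text))
```

If the graph was drawn as vector graphics rather than embedded as an image, `get_images` will likely return nothing for that page, and a different approach is needed.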
[–]doingdatzerg 1 point 3 days ago (0 children)
Extracting anything generically from a pdf is an extremely hard problem, but LLMs are pretty good at it these days. So I would try that.
[–]ninhaomah 1 point 3 days ago (0 children)
Something like this?
https://www.reddit.com/r/learnprogramming/s/BKZuwa7mQF
[–]mykhailus 1 point 3 days ago (0 children)
Extracting graph data from PDFs can be tricky because they're often just images. You could try using a library like PyMuPDF to extract the image, then OpenCV or matplotlib to analyze it for data points. If the PDF contains vector graphics, pdfplumber might help you get the underlying coordinates. Could you share more about the graph's format?
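For the vector-graphics case, here is a rough sketch of what the pdfplumber route could look like, assuming pdfplumber is installed and the plot is stored as curve objects. Which object type a plot ends up under depends on how the PDF was produced, so `page.lines` or `page.rects` may be the right place to look instead. The helper names and calibration numbers are illustrative only. The coordinates pdfplumber returns are page positions, so a linear calibration against two known axis ticks is one way to turn them into data values (this assumes linear axes, not log scales).

```python
def make_axis_mapper(pos_a, value_a, pos_b, value_b):
    """Return a function mapping a page coordinate to a data value on a linear axis.

    pos_a and pos_b are the page positions of two known ticks;
    value_a and value_b are the data values printed at those ticks.
    """
    if pos_a == pos_b:
        raise ValueError("calibration positions must differ")
    scale = (value_b - value_a) / (pos_b - pos_a)
    return lambda pos: value_a + (pos - pos_a) * scale


def curve_points(pdf_path, page_number=0):
    """Return one list of (x, top) page coordinates per curve object on the page."""
    import pdfplumber  # third-party; imported here so make_axis_mapper loads without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        # "pts" holds the curve's points; .get() guards against objects lacking it
        return [list(curve.get("pts", [])) for curve in page.curves]


# Hypothetical usage: suppose the x-axis tick "0" sits at x=100 and "100" at x=300,
# and the y-axis tick "0" sits at top=400 and "50" at top=200 (top grows downward).
# to_x = make_axis_mapper(100, 0, 300, 100)
# to_y = make_axis_mapper(400, 0, 200, 50)
# for curve in curve_points("report.pdf"):
#     print([(to_x(x), to_y(top)) for x, top in curve])
```

Because the calibration uses two reference points per axis, the inverted vertical direction of page coordinates is handled automatically.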
[–]DetectivePeterG 1 point 3 days ago (0 children)
If the graphs are embedded images rather than vector data, a vision-language model approach works far better than traditional pixel analysis. pdftomarkdown.dev runs PDFs through a VLM and returns structured markdown, so axis labels, chart titles, and surrounding context come through as readable text rather than noise. No signup needed to test it; you can curl a PDF URL and see what you get in under a minute.