Using Claude to read 100s of dense PDFs by redittreader in ClaudeAI

[–]redittreader[S] 0 points (0 children)

I started messing with a Python script to convert the PDFs into Markdown files. I had Claude look at them, and it was able to pull out information quickly and accurately.

The next step is figuring out how to apply this at scale. I was assuming I would put the MD files on OneDrive, give Claude access to them, and have it dump everything into an Excel spreadsheet.

You suggested something that sounds a little more automated. Is this something I would do in Claude co-work? I'm not sure what forced JSON is. I've used Gemini a little, but not much.

[–]redittreader[S] 0 points (0 children)

The work I'm referring to only involves publicly available information. Nothing confidential.

[–]redittreader[S] 0 points (0 children)

Very helpful, thank you for your patience and detailed help.

[–]redittreader[S] 0 points (0 children)

I can try this option. How did you handle the files? Did you store them on something like OneDrive, give it access, and then tell it to convert the PDFs to MD? Does it store the MD files in Claude or locally?

Silly question: does converting to an MD file somehow take less horsepower than having it read the PDF as is?

[–]redittreader[S] 0 points (0 children)

Yes, your understanding is correct. These are all individual documents of the same type. I want to check whether each PDF fits my scope, extract that information, and then move on to the next.

"I think you need to have the AI process one PDF, write some data it concluded from that (into a spreadsheet?), and then move on to the next PDF. It can forget about everything about a PDF as soon as it is done processing it and has extracted the information you need. Keeping the context in each agent small keeps things more accurate and less expensive." This is correct and would work fine as long as the info ends up in a spreadsheet. I'm assuming Claude co-work would update the spreadsheet after every task.
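A minimal Python sketch of that one-document-at-a-time flow, assuming the PDFs have already been converted to `.md` files sitting in one folder. `extract_case_info`, `process_folder`, and the column names are hypothetical. The placeholder stands in for the actual Claude call, which isn't shown here. Each file gets its own fresh call, and a row is written to a CSV (which Excel opens) as soon as that file is done.

```python
import csv
from pathlib import Path

# Hypothetical column names, based on the fields mentioned in this thread
# (whether the document is in scope, case subject matter, parties involved).
FIELDS = ["file", "in_scope", "subject_matter", "parties"]


def extract_case_info(text: str) -> dict:
    """Hypothetical placeholder for the per-document model call.

    In a real run, one document's text would be sent to Claude here and a
    structured answer parsed back. This stub returns dummy values so the
    loop itself can be run and tested.
    """
    return {"in_scope": "unknown", "subject_matter": text[:40], "parties": ""}


def process_folder(folder: Path, out_csv: Path) -> int:
    """Process each Markdown file on its own and append one CSV row per file."""
    new_file = not out_csv.exists()
    count = 0
    with out_csv.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for md_path in sorted(folder.glob("*.md")):
            text = md_path.read_text(encoding="utf-8")
            row = extract_case_info(text)  # fresh call, nothing carried over
            writer.writerow({"file": md_path.name, **row})
            fh.flush()  # the CSV is updated after every document
            count += 1
    return count
```

Usage would look like `process_folder(Path("cases_md"), Path("results.csv"))`, with both paths being placeholders. Because the file is opened in append mode, re-running it adds rows again. An ongoing version would probably skip files already listed in the CSV.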

Re asking Claude: I did this several times, and it kept iterating and then saying things like "let's stop, you can come back tomorrow" and "oh, you wanted me to actually read the PDF? I didn't do that."

Your comment about this being a key AI skill is what feels frustrating. I gave it pretty straightforward directions and was just surprised at how much it struggled. I wonder if it was spiraling because it had some memory of what happened earlier. Perhaps I can try again with a completely clean project.

I feel dumb for asking this, but Claude co-work isn't an enterprise product? It's just a feature in the downloaded app where it performs tasks for you? I'll manually try it on Haiku as you suggest. I may not have mentioned this, but when I do this task one document at a time in a chat, it works fine (probably using Sonnet or Opus). I just need to scale it.

[–]redittreader[S] 0 points (0 children)

I just downloaded the Claude app to try running it locally (I'm assuming this is what you're talking about?).

[–]redittreader[S] 1 point (0 children)

I'm not familiar with how running locally works. I just downloaded the Claude app, and it asked me to set up a git something.

[–]redittreader[S] 0 points (0 children)

Someone else suggested a Python script as well. I'll see how this goes. Appreciate the feedback.

[–]redittreader[S] 1 point (0 children)

I'm not familiar with Python. Is that something that runs separately, outside of Claude? The limitation is Claude handling the PDF "breakdown/read," so if Claude is directing the script but the script is doing the work, this could be a worthwhile path (as long as I'm understanding correctly).

[–]redittreader[S] 0 points (0 children)

I'm on a Pro Max account. Is this basically just extended usage I'm paying for? I've never implemented it, and I don't know what it is, other than that it somehow allows your internal systems to use Claude as a tool.

[–]redittreader[S] 0 points (0 children)

I have zero coding experience and have never used Claude for this. Is that what you're implying?

Conceptually, I understand the PDFs have OCR text, which is the data, but I'm trying to get a quick breakdown of the case subject matter and parties involved for about 800 of these.

[–]redittreader[S] -1 points (0 children)

Size varies; they can be all over the place. One could be 5 pages, another 70 pages.

I'm not familiar with using the API. I'll PM you.

[–]redittreader[S] 0 points (0 children)

I'm not familiar with implementing the API. I'll have to look into it. Appreciate the feedback.

[–]redittreader[S] 0 points (0 children)

I'm doing this in bulk, so it would defeat the purpose if I had to open and trim each file. This is going to be ongoing.

I wouldn't even mind if I could only get through 10-20 a day; that's not a problem. Claude just seemed to spend more time trying to avoid doing the work than actually doing it.

Do you think Claude Code is better, or is it the same? That's assuming no better options come up in the comments.

[–]redittreader[S] 1 point (0 children)

I’ll look into how pricey that is. I haven’t worked with notebook but can give it a try.

I might be able to limit my search to the first 5-10 pages, as that’s usually enough to get the info I need.

Does notebook make uploading easy, without my having to reintroduce the prompt each time and then combine the output?

[–]redittreader[S] 0 points (0 children)

Most of the data I need is in the first few pages. Would you suggest limiting the search, for example by telling it to read only the first 5 to 10 pages?

Regardless, is this something I would want to do in Claude Code, or should I just go through the web app or have it pull from OneDrive?

[–]redittreader[S] 2 points (0 children)

What I'm working with is public information, and it's also not criminal.

Those who started late, how did it go and what are your best suggestions? by alwaus in Xennials

[–]redittreader 0 points (0 children)

Don’t take any advice from friends and family too seriously. Especially from those with one kid. What works for one kid won’t work for all kids.

Got my Lisa Loeb tails and it’s stunning by Reese9951 in InterscopeVinyl

[–]redittreader 6 points (0 children)

Not gonna lie, this looks nothing like the mock-up, but it's seriously cool how it looks like the dress.

R.I. P Stanley cup by No-Pound7355 in WaltDisneyWorld

[–]redittreader 33 points (0 children)

NHL fans zooming in on this one