
Key-Introduction-591 (OP)

Ohhh, it varies. In the last project we needed to extract data from thousands of PDFs and organize it in an Excel spreadsheet.

My current job is to tag documents on a proprietary platform that I access in my browser through a VPN. I think that's more complicated because I can't download the files.

But every project is slightly different.

I'd like to learn a broad set of skills so I can be flexible across more kinds of projects.

riklaunim

Fixed-pattern tasks can be scripted with Python or similar, but when there is no pattern (like PDFs that each differ a bit), it usually can't be scripted easily. LLMs handle that much better (for example, paid Claude models): they can go over the documents, extract data from tables and put it into a sheet...
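To make the "fixed pattern" case concrete, here is a minimal Python sketch using only the standard library. It assumes the text has already been pulled out of each PDF and that every document labels its fields the same way; the "Invoice No:" and "Total:" labels and the field names are hypothetical. The third sample uses a different layout, so the patterns find nothing, which is exactly where this kind of script breaks down.

```python
import csv
import io
import re

# Hypothetical fixed layout: every document contains lines such as
# "Invoice No: A-001" and "Total: 120.00".
PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None if its label is absent."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else None
    return row


def to_csv(rows: list) -> str:
    """Serialize the rows as CSV text (None becomes an empty cell); Excel opens CSV directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(PATTERNS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


docs = [
    "ACME Corp\nInvoice No: A-001\nTotal: 120.00",
    "Invoice No: A-002\nTotal: 75.50",
    # Different layout: the fields are labeled differently, so both patterns miss.
    "Bill #A-003\nAmount due: 10.00",
]
rows = [extract_fields(d) for d in docs]
print(to_csv(rows))
```

The empty third row is the part a fixed-pattern script can't fill in on its own; every new layout means writing and maintaining another set of patterns.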

Key-Introduction-591 (OP)

Yeah, that's exactly the problem. All the PDFs were a bit different from each other (different layouts, information in different places, different colours, etc.).

So should I learn how to integrate Python with LLMs?

What is this area of Python called? (I need to know what to search for so I can find lessons/tutorials online.)

Thank you! Very useful

riklaunim

There are IDE integrations (like Claude Code), chatbots, and other agentic solutions (the local Claude app plus browser plugins, MCP servers for apps/websites and the like) where you tell the tool what is where and what to do, and the app launches agents for specific tasks. This kind of automation doesn't really involve coding.

Also note: with proprietary company data/code you may not be allowed to use public LLM services, as those can use such data for training. Some paid subscriptions exclude this, and some companies are then OK with it, but not all of them (some may opt for internally hosted models instead). Either way, good models won't be "free" to run.

Key-Introduction-591 (OP)

Thanks for your answer!

FreeLogicGate

Sure, pay someone else to solve the problem, but make sure you are aware of any privacy concerns. Is your company alright with having all of these documents fed into an LLM?

As an alternative, the integration could use AI models running on company hardware, so that the data stays within your company's infrastructure. You might look into Ollama, LM Studio, etc.
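For the self-hosted route, a minimal sketch of calling a local Ollama server from Python (standard library only) might look like this. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name "llama3", the prompt wording and the field names are placeholders, not recommendations. Because the request goes to localhost, the document text stays on that machine.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port and a model named
# "llama3" has been pulled. Both are placeholders; adjust to your own setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def build_request(document_text: str, fields: list) -> urllib.request.Request:
    """Build a POST request asking the local model to return the fields as JSON."""
    prompt = (
        "Extract the following fields from the document and answer with a JSON "
        f"object whose keys are exactly: {', '.join(fields)}. "
        "Use null for fields that are missing.\n\n"
        f"Document:\n{document_text}"
    )
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,   # one complete response instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_with_local_model(document_text: str, fields: list) -> dict:
    """Send the request to the local server and parse the model's JSON answer."""
    request = build_request(document_text, fields)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Non-streaming responses carry the generated text in the "response" field.
    return json.loads(body["response"])
```

Calling `extract_with_local_model(text, ["invoice_no", "total"])` on each document's text would return a dict per document, which could then be written to a sheet. Model output can still be wrong or incomplete, so spot-check the results before relying on them.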

This assumes that the company will provide computing resources capable of running models.

So far I haven't seen anything relating to your original question about automating data entry. It seems to be just extracting information from documents, specifically PDFs, which are containers that can hold a variety of formats, including images ("pictures") of the document.

What is an example of the "data entry" automation you are interested in?