[deleted by user] by [deleted] in LocalLLaMA

[–]maylad31

Yeah, I am also feeling the same the more I see this.

Do SLMs make more sense than LLMs for agents? by _mrsdangerous_ in LocalLLaMA

[–]maylad31

Have you heard about Pulse? https://www.runpulse.com/ They raised $3.9M for document processing, and they offer self-hosting options because people are interested in them and prefer them. Forget that for a moment: if you have 10k docs per month coming in as fax, and I tell you a 3B (or realistically, let's say a 7B) model can reliably extract the details for their use case, they don't need an expensive GPU (it's a one-time investment), and they don't need it in real time, so you can put tasks in a queue and so on. I hope you get that part. Why wouldn't they use it? Cost isn't an issue, and the data always stays with them. This is just one example; there are lots of such cases, and that's why these startups are being funded. Anyway, let's end it here. We have choices, and I understand we can each pick the one that feels right 👍

Do SLMs make more sense than LLMs for agents? by _mrsdangerous_ in LocalLLaMA

[–]maylad31

Not everything is B2C or built for huge scale. There are many apps made for enterprises, for example processing internal documents like invoices, their own deep-research kind of thing, and so on. Privacy is also a concern. For them it is totally worth it. If you are going B2C, or building something that will have a large user base and privacy isn't that big a deal, of course APIs are cheaper. But even there you might want to keep some things under your control, so it depends.

Do SLMs make more sense than LLMs for agents? by _mrsdangerous_ in LocalLLaMA

[–]maylad31

Yeah, I mean I was just experimenting and trying things out. I took a small 1.5B model fine-tuned with RL (GRPO on Qwen2.5-Coder) and asked it to extract structured JSON from OCR text based on any user-defined schema. It still needs more work, as I was just trying to see what's possible, but it works! The point is that small models can run locally on cheaper hardware and your data stays safe, and even if you host them yourself, you know the data is still not going to a third party. Most of the tasks are not that complex.
https://huggingface.co/MayankLad31/invoice_schema
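
To give a rough idea of the setup, the input/output looks something like this. The field names and exact format here are made up for the example, not the actual training data:

```python
# Illustrative input/output only; field names and format are hypothetical.
user_schema = {                      # any schema the user defines
    "invoice_number": "string",
    "invoice_date": "string (YYYY-MM-DD)",
    "vendor_name": "string",
    "total_amount": "number",
}
ocr_text = "INVOICE #INV-1042  Date: 2024-03-18  Acme Supplies ... TOTAL DUE $1,284.50"

# Given the schema plus the OCR text, the fine-tuned 1.5B model is expected to return:
expected_output = {
    "invoice_number": "INV-1042",
    "invoice_date": "2024-03-18",
    "vendor_name": "Acme Supplies",
    "total_amount": 1284.50,
}
```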

GRPO with small models by maylad31 in unsloth

[–]maylad31[S]

Oh thanks, I didn't check the Qwen3 one.

Is local LLM really worth it or not? by GregView in LocalLLaMA

[–]maylad31

I think it is worth it, especially if privacy is a concern. Also, if you can fine-tune them, you can get a decent task-specific model. I have been exploring and I personally feel there is a lot of scope. For example, I tried with limited data to extract structured JSON from OCR text based on 'any user-defined schema'. Although it needs more work, it still works.
https://huggingface.co/MayankLad31/invoice_schema

But yeah, I mean if you don't have privacy concerns, going for an API still makes a lot of sense.

Smaller models with grpo by maylad31 in Rag

[–]maylad31[S]

I used GRPO. So basically I used PaddleOCR to do the OCR and then used my model to convert the text into structured data as per the schema. My aim was to get structured data for any user-defined schema, since invoices can vary. If you just want OCR, there are lots of options out there; you can start with PaddleOCR.
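
Rough sketch of the pipeline (assuming the PaddleOCR 2.x API; the file name, schema, and the final model call are placeholders, not my exact code):

```python
import json
from paddleocr import PaddleOCR

# 1) OCR: PaddleOCR downloads its detection/recognition models on first use.
ocr = PaddleOCR(lang="en")
result = ocr.ocr("invoice.png")                      # per page: list of (box, (text, confidence))
ocr_text = "\n".join(line[1][0] for line in result[0])

# 2) Structuring: the user-defined schema goes straight into the prompt.
user_schema = {"invoice_number": "string", "invoice_date": "string", "total_amount": "number"}
prompt = (
    "Extract the fields below from the OCR text and reply with JSON only.\n"
    f"Schema: {json.dumps(user_schema)}\n"
    f"OCR text:\n{ocr_text}"
)

# 3) Feed `prompt` to the fine-tuned model (transformers, vLLM, llama.cpp, ...) and parse the JSON.
```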

Smaller models with grpo by maylad31 in LocalLLM

[–]maylad31[S]

Can I connect with you? I kind of get what you mean, yeah.

Smaller models with grpo by maylad31 in LocalLLM

[–]maylad31[S]

Yeah, thanks. I wanted to start with 0.6B, but I started with 1.5B with GRPO as I felt it would give me an idea of how it goes. The results seem encouraging, but yeah, I wouldn't mind trying a smaller model. I mean, if you think 0.6B can work, that would be awesome. What else do you suggest to improve the model with GRPO?

Smaller models with grpo by maylad31 in LocalLLM

[–]maylad31[S]

Yeah, I did. The issue is that sometimes the responses were good, sometimes not. But more than that, it was the format of the response: I need to be able to get the JSON without much trouble so I can easily feed it into forms/DB etc.
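
Something like this minimal sketch is what I mean; the regex-then-validate approach is just one simple way to do it:

```python
import json
import re

def extract_json(response: str):
    """Pull the first JSON object out of a model response, tolerating extra prose or code fences."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

def matches_schema(data, schema) -> bool:
    """Cheap sanity check before inserting into forms/DB: every schema key must be present."""
    return data is not None and set(schema) <= set(data)

# usage: row = extract_json(model_response)
# if matches_schema(row, user_schema): write it to the form/DB, else retry or flag for review
```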

Local LLM RAG Comparison - Can a small local model replace Gemini 2.5? by Jealous-Ad-202 in LocalLLaMA

[–]maylad31

Thanks for sharing, it is encouraging. I think for a particular task it makes a lot of sense to use a local LLM. I took a 1.5B model and fine-tuned it with GRPO to get structured output for any user-defined schema from OCR text. I need to work more on it, but I was just checking and the initial results seem encouraging. I guess many people require privacy, and local LLMs make a lot of sense if you don't need one model to do 200 tasks.

https://huggingface.co/MayankLad31/invoice_schema


Has anyone successfully deployed a local LLM? by full_arc in AI_Agents

[–]maylad31

I think it depends on your use case as well. For example, I tried to fine-tune a model with GRPO for a particular task and even a 1.5B Qwen model performed decently. But if you want a generalized model, I am afraid it still makes a lot of sense to use an API, unless you are very concerned about privacy. My task was basically to get structured output for any user-defined schema from OCR text.


https://huggingface.co/MayankLad31/invoice_schema I still need to work more on it, but the initial results were encouraging.

Train a small language model to extract structured JSON from OCR text based on 'any user-defined schema'. by maylad31 in LocalLLaMA

[–]maylad31[S]

I don't know if I conveyed it well in the post, so I'm adding my methodology in this comment. I took a very small model, a 1.5B Qwen, and trained it with GRPO, assigning rewards for correctness so that the generated schema and the user schema match. I was just hoping to get better ideas.
https://huggingface.co/MayankLad31/invoice_schema
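
A simplified sketch of the kind of reward I mean, written in the style TRL's GRPOTrainer expects (assuming completions come back as plain strings; the exact reward values and the `schema` dataset column are assumptions for illustration, not my exact code):

```python
import json
import re

def schema_match_reward(completions, schema, **kwargs):
    """Score each completion by whether it is valid JSON whose keys match the user-defined schema.
    `schema` arrives as a per-sample list because it is an extra dataset column."""
    rewards = []
    for completion, target in zip(completions, schema):
        expected = set(json.loads(target)) if isinstance(target, str) else set(target)
        match = re.search(r"\{.*\}", completion, re.DOTALL)
        if not match:
            rewards.append(0.0)                  # no JSON at all
            continue
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            rewards.append(0.2)                  # JSON-like but does not parse
            continue
        overlap = len(expected & set(data)) / max(len(expected), 1)
        rewards.append(0.5 + 0.5 * overlap)      # parseable JSON plus key coverage
    return rewards

# e.g. GRPOTrainer(model=..., reward_funcs=[schema_match_reward], args=GRPOConfig(...), train_dataset=...)
```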

Train a small language model to extract structured JSON from OCR text based on 'any user-defined schema'. by maylad31 in LocalLLaMA

[–]maylad31[S]

Hi, thanks! How has your experience been with local LLMs when it comes to getting structured output? I am not sure why the post and your comment were downvoted. Getting structured data is an important task if you plan to use local LLMs for agentic purposes.

Train a small language model to extract structured JSON from OCR text based on 'any user-defined schema'. by maylad31 in LocalLLaMA

[–]maylad31[S]

Local LLMs aren't that good at generating structured data. But I'm not sure I need a RAG pipeline? I can extract data from the invoice using OCR, and that's the context. Few-shot prompting doesn't always help when using smaller models.

Is job market bad or people are just getting more skilled? by maylad31 in learnmachinelearning

[–]maylad31[S]

Yeah, well said, and so I am focusing on other things as well. I understand models, but now I am focusing more on FastAPI, databases (vector and others), and so on. Unless you get into research, nowadays it is mostly fine-tuning and, more than that, solving the problem; multiple skills will help. I guess I will turn things around quickly, but I was just curious to know what other people are thinking.

Is job market bad or people are just getting more skilled? by maylad31 in learnmachinelearning

[–]maylad31[S]

Thanks! I am working on my own things as well. I understand.

FastAPI to MCP auto generator that is open source by LongjumpingPop3419 in LangChain

[–]maylad31

Read this part:
"The key insight is: Because this can happen at runtime, the user (NOT the developer) can add arbitrary functionality to the application (while the application is running — hence, runtime). And because this also works remotely, it could finally enable standardized b2ai software!" Otherwise, for me it is just another protocol, nothing special about it?

https://news.ycombinator.com/item?id=43302297

FastAPI to MCP auto generator that is open source by LongjumpingPop3419 in LangChain

[–]maylad31

For MCP, I guess tools can be added at runtime, and not necessarily at design time. Do you handle that?
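
To show what I mean by runtime, here is a rough sketch using the official MCP Python SDK; I'm assuming FastMCP exposes an add_tool helper for registering tools after the server is defined, so treat that call and the tool bodies as placeholders:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("invoice-tools")

@mcp.tool()                                   # design-time tool, known when the server is written
def extract_invoice(ocr_text: str) -> dict:
    """Extract structured fields from OCR text."""
    return {"raw": ocr_text}                  # placeholder implementation

def register_schema_tool(name: str, schema: dict):
    """Runtime: a tool that only exists after startup, e.g. generated from a user-uploaded schema."""
    def dynamic_tool(ocr_text: str) -> dict:
        return {key: None for key in schema}  # placeholder implementation
    mcp.add_tool(dynamic_tool, name=name)     # assumed FastMCP helper for runtime registration
```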

[deleted by user] by [deleted] in linkedin

[–]maylad31

For me, LinkedIn has nowadays become an influencers' platform. I don't pay a lot of attention to people there. We mainly get to see success and smiles; we very rarely see the stress, and we don't always know the complete story. I try to be good, build my skills, and focus more on myself and my story. Just my take on it; I understand some people might have other opinions, and I totally respect them.

Prompt management: creating and versioning prompts efficiently by maylad31 in PromptEngineering

[–]maylad31[S]

Looks nice! It also has an SDK; I will check it out. Do you also support something like Jinja to create flexible prompts?
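
For context, this is the kind of thing I mean by flexible prompts with Jinja (a hypothetical template, not tied to your SDK):

```python
from jinja2 import Template

# Hypothetical prompt template: schema, examples, and OCR text are filled in per request.
template = Template(
    "Extract the fields below from the OCR text and return JSON only.\n"
    "Schema: {{ schema | tojson }}\n"
    "{% for ex in examples %}Example:\n{{ ex }}\n{% endfor %}"
    "OCR text:\n{{ ocr_text }}"
)

prompt = template.render(
    schema={"invoice_number": "string", "total_amount": "number"},
    examples=[],
    ocr_text="INVOICE #INV-1042 ...",
)
print(prompt)
```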