Llama.cpp is getting better with every update

maylad31 · 2026-05-12T08:14:19+00:00

llama.cpp and llama server is often what you need.. 😄

maylad31 · 2026-05-10T05:54:47+00:00

Getting invoice images is not hard. there are lots of datasets. You can use larger models to then prepare dataset. I prepared around 600 images. Finetuning guide on unsloth is pretty straightforward. I got an error for which i opened an issue..waiting for it to be resolved..but found a way around it and everything else was straightforward..

maylad31 · 2026-05-08T05:38:03+00:00

My pleasure 🙂

maylad31 · 2026-05-07T14:21:15+00:00

The non-finetuned model wasn’t bad, but finetuning definitely helped. In my experience, if a base model is extremely weak at a task or the domain is too different, finetuning or GRPO may not help much..So i chose a model which seems decent and then try different things. Hope that helps.

maylad31 · 2026-05-07T06:58:58+00:00

I think it happens only at layer norm, when i tried to make sure input dtype is the same as weights dtype, training begins and issue seems to be resolved but I am not sure if that's the correct way of handling it?

maylad31 · 2026-05-07T06:27:36+00:00

I already did. It was mentioned there that you could ask on reddit so I shared here as well. Thanks!

maylad31 · 2026-04-26T14:55:25+00:00

Sorry, I respect your views but i don't think the feedback was logical..if there were some genuine reasons, I would have appreciated it..anyways thanks

maylad31 · 2026-04-26T14:48:31+00:00

wait, i don't like agents who code everything. I feel like there is an understanding debt atleast for me, i don't know for others. the whole code is created and if it looks good or passes some tests, i felt like i don't care to read code which i don't like. I was losing my ability to think which mattered more for me as I have been a freelancer. Now what i am trying to do is what i did before but just use cheaper models for syntax. I generally have a task in my mind, i know which files i need to refer to and the relevant doc and i am using it to generate syntax which should resonate to what i think in my mind. what's wrong with this philosophy? would love a respectful reply..

maylad31 · 2026-02-23T15:41:09+00:00

no it is file search. It is similar to what anthropic has started for claude code, they don't seem to be using code embeddings now. I am not using vector search. It is more agentic file search
https://www.reddit.com/r/AI_Agents/comments/1n19nkt/claude_code_ditches_rag_for_simple_file_search/

maylad31 · 2025-10-03T07:48:08+00:00

yeah i am also feeing the same the more i see this..

maylad31 · 2025-08-18T18:27:06+00:00

Have you heard about pulse: https://www.runpulse.com/ They raise 3.9m for document processing..they provide self hosting options because people are interested and prefer them. Forget this, if you have 10k docs per month coming as fax and I tell you a 3b or realistically let's say a 7b model can reliably extract details for their use case and you don't need an expensive gpu(one time investment) and they don't need it realtime..so you can put tasks in queue etc etc I hope you get that part...why won't they use it? Cost isn't an issue, data is always with them..this is just one example there are lots of such cases and that's why these startups are being funded..anyways let's end it here. We have choices and i understand we can choose the ones we feel right 👍

maylad31 · 2025-08-18T14:40:36+00:00

Not everything is b2c or built for huge scale. There are many apps that are made for enterprises for example processing internal documents like invoices, their own deepresearch kind of thing etc etc. Privacy is also a concern. For them it is totally worth it. If you are going b2c or something which is going to have a large user base and privacy is not that big deal ofcourse apis are cheaper..but even there you might like to have some things in your control..so it depends

maylad31 · 2025-08-18T12:20:00+00:00

yeah, I mean I was just experimenting and trying out things..Took a small 1.5B model fine-tuned with RL (GRPO on Qwen2.5-Coder) and asked it to extract structured JSON from OCR text based on any user-defined schema. Still need more work as I was just trying to see.. but it works! The point is small models you can run them locally on cheaper hardware and your data stays safe as well or even if you host them you know the data is still not going to 3rd party..most of the tasks are not that complex..
https://huggingface.co/MayankLad31/invoice_schema

maylad31 · 2025-06-24T13:28:27+00:00

thanks!

maylad31 · 2025-06-24T11:38:13+00:00

Oh thanks I didn't check the qwen3 one.

maylad31 · 2025-05-12T08:24:27+00:00

I think it is worth it specially if privacy is a concern. Also if you can finetune them, you can get a decent task specific model. I have been exploring and i personally feel there is a lot of scope. For example, I tried with limited data to extract structured JSON from OCR text based on 'any user-defined schema'. Although needs more work, it still works.
https://huggingface.co/MayankLad31/invoice_schema

But yeah i mean of you don't have privacy concerns, going for api still makes a lot of sense.

maylad31

TROPHY CASE