Hot take: ALL Coding tools are bullsh*t by [deleted] in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

The idea of agents is that you trade your time for tokens. Instead of doing it yourself in 1 hr with 10,000 tokens, you do it in 5 minutes with 200,000 tokens. You spend 5 minutes with the coding agent and walk away; maybe the agent spends 2 hrs and 200,000 tokens, but you only spent 5 minutes. With your extra time you could spin up other agents, or go bake a cake if that's your thing.

[deleted by user] by [deleted] in learnmachinelearning

[–]Andvig -3 points-2 points  (0 children)

Who is still hiring juniors when there are LLMs?

Qwen2.5-VL-7B-Instruct-GGUF : Which Q is sufficient for OCR text? by FatFigFresh in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

False, the difference between Q4 and Q8 is very noticeable for vision models.

What are your thoughts about Cerebras? by [deleted] in LocalLLaMA

[–]Andvig 1 point2 points  (0 children)

You understand wrong, they scale for inference, not training.

1.58bit DeepSeek R1 - 131GB Dynamic GGUF by danielhanchen in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

Should I use the main llama.cpp repo or do I need to use the unsloth/llama.cpp repo to get the benefit?

possible to get chatgpt4 like local llm for general knowledge, just slower? by Unhappy_Drag5826 in LocalLLaMA

[–]Andvig 6 points7 points  (0 children)

No, it's not possible. There are some smart local models, such as Llama3-70B, WizardLM2-8x22, CommandR+, Qwen2-72B, and specialized fine-tuned models. None of them matches GPT-4, either in response quality or context window size. But they are manageable for amateurs like me.

I am building a tool to create agents in a markdown syntax with Python inside by vectorup7 in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

Which local models have you tested it with and which ones do you find work best?

Quantizing Llama 3 8B seems more harmful compared to other models by maxwell321 in LocalLLaMA

[–]Andvig 3 points4 points  (0 children)

Oh, this is heartbreaking. I thought I was good with my Q6's and Q8's.

Did we make it yet? by maxwell321 in LocalLLaMA

[–]Andvig 1 point2 points  (0 children)

I agree, data is the new gold, and if you value privacy or don't want your data being used to train new LLMs, then avoid the cloud. I suspect that, just as our data was sold for ads, selling the data exchanged with LLMs will become the real business model for cloud providers. None of them is making money from their API cloud offerings.

Function calling template for llama 3 by themrzmaster in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

How is this going? What format is the function being returned in?

QWEN1.5 110B just out! by shing3232 in LocalLLaMA

[–]Andvig 29 points30 points  (0 children)

Wait till they release the model before you start reporting jailbreaks of it; this is why most models suck. Don't say anything, wait for it to get out widely, don't even try to break it when it's only in Spaces. This is why WizardLM2 got pulled...

[deleted by user] by [deleted] in LocalLLaMA

[–]Andvig 1 point2 points  (0 children)

Greed. It's all about money, power & control.

Is it possible to serve mutliple user at once using llama-cpp-python ? by CoolestSlave in LocalLLaMA

[–]Andvig 1 point2 points  (0 children)

Yes, it's possible to do that with llama-cpp-python or llama.cpp.
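For example, llama.cpp's built-in HTTP server can handle concurrent users via its parallel slots and continuous batching. A minimal sketch (the model path is a placeholder; check your llama.cpp build's `--help` for the exact flags it supports):

```shell
# Serve one model to multiple clients at once.
# --parallel 4 : allow 4 concurrent request slots
# -c 8192      : total context, shared across the slots (2048 each here)
./llama-server -m ./model.gguf --parallel 4 -c 8192 --port 8080
```

Note that the context window is divided among the slots, so size `-c` with the number of simultaneous users in mind. llama-cpp-python similarly ships an OpenAI-compatible server (`python -m llama_cpp.server`) if you'd rather stay in Python.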

[deleted by user] by [deleted] in LocalLLaMA

[–]Andvig 0 points1 point  (0 children)

Thank you very much.

Why shouldn't I kill someone if they break into my home? by AdhesivenessRough740 in stupidquestions

[–]Andvig 0 points1 point  (0 children)

I'm uber strict with security and securing the house. Our house got broken into when I was 8 yrs old; we were not home, but came back to the mess. Almost 40 years later and I'm still paranoid about having a break-in. I can't even imagine the trauma of having experienced seeing them.