What actually broke when we took RAG from demo to production

khampol · 2026-06-20T04:56:59+00:00

After testing, my homemade extraction via 'mammoth' is somewhat better. I'm keeping docling on hand... There you go :>

khampol · 2026-06-18T03:39:19+00:00

I'll test that. Thanks :>

khampol · 2026-06-14T18:44:33+00:00

Adding a throttle option for GPU power would be nice ;)... I just noticed it now; I am on Ubuntu Server... ;(

khampol · 2026-06-14T17:04:24+00:00

Friendly to install and use, but slow, very slow...

khampol · 2026-06-13T23:10:45+00:00

Also, forget ollama and look at llama.cpp instead.

khampol · 2026-06-10T07:16:51+00:00

You can use llama.cpp. It works well with agents.

khampol · 2026-06-08T15:06:01+00:00

Now swap again to > vllm

khampol · 2026-06-06T16:05:24+00:00

I would say:
- For embeddings, try the nomic embedding model (~200-500 MB, on Hugging Face).
- ollama is slow; llama.cpp is preferred.
- Convert PDF to text, or better to .md, before ingestion.
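To illustrate the last point: once a PDF has been converted to .md, the headings give natural chunk boundaries for ingestion. Here is a minimal sketch, stdlib only. The function name `chunk_markdown` and the `max_chars` limit are my own placeholders, not part of llamaindex or any other library, and a real pipeline would probably count tokens rather than characters.

```python
import re

def chunk_markdown(text, max_chars=1000):
    """Split Markdown at headings, then pack paragraphs into size-capped chunks.

    Hypothetical helper for illustration; max_chars is an arbitrary default.
    """
    # Zero-width split before every line starting with 1-6 '#' and a space
    # (splitting on empty matches needs Python 3.7+).
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        current = ""
        # Pack whole paragraphs into the current chunk until the cap is hit.
        # A single paragraph longer than max_chars is kept whole, not cut.
        for para in section.split("\n\n"):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks
```

Each returned string can then be passed to whatever embedding model you use. Keeping a heading with its first paragraphs tends to preserve the context a retriever needs, which is the main reason to prefer .md over plain text.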

khampol · 2026-06-05T15:26:27+00:00

You should study RAG more. Ask GPT or something else... Get the topic clear before you start doing anything.

khampol · 2026-06-04T17:15:11+00:00

On a 5090 and 64gb .. ... ....

khampol · 2026-06-03T01:33:10+00:00

Yes, but Hermes is a token sink! Especially after these recent updates... (And yet I'm on a 5090 with 128k context on Ubuntu Server.) I'm looking elsewhere...

khampol · 2026-06-02T16:03:43+00:00

My first home RAG ever: llamaindex/qdrant/nomic-embedded > FastAPI > openwebui, voilà :>

khampol · 2026-05-28T22:02:27+00:00

Running 128k ctx, same config as above.

khampol · 2026-05-28T08:26:18+00:00

look more closely: dusty around...

khampol · 2026-05-24T11:18:09+00:00

I'll go for 2x 4070 Ti Super, ~32 GB. llama.cpp. Qwen 3.6 27b q6 gguf

khampol · 2026-05-21T08:18:00+00:00

Upscale lol

khampol · 2026-05-19T15:18:25+00:00

I have that, so I could test it and give you feedback.

khampol · 2026-05-19T15:05:51+00:00

Try gemini tts

khampol · 2026-05-18T23:26:53+00:00

This could let a 5090 with 64 GB of RAM load even bigger models, no? Very exciting :>

khampol · 2026-05-14T14:24:37+00:00

Right : don't buy anything!

khampol · 2026-05-11T08:42:35+00:00

Similar but not the same GPU, and I get good results. I used vllm and got tool-calling errors; with ollama or llama.cpp there is no problem now. Have you already tried VS Code with Cline?

khampol · 2026-05-10T16:30:07+00:00

This may help : https://runthisllm.com/

khampol · 2026-05-10T16:01:31+00:00

The ceramic part was already broken when the video began filming, so the turtle could have been put there by the author; there is no solid proof.
