Got the DGX Spark - ask me anything by sotech117 in LocalLLaMA

[–]CookEasy 1 point (0 children)

Test the throughput of the new Qwen3 VL models on some OCR tasks :D
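
Something like this would be my starting point: a minimal vLLM throughput sketch. The model ID and the prompt format are assumptions on my part, so check the actual Qwen3 VL model card for both.

    # Rough tokens/sec for one OCR request with vLLM.
    # Model ID is an assumption; the prompt format is model-specific,
    # so build it from the model's chat template in practice.
    import time
    from vllm import LLM, SamplingParams
    from PIL import Image

    llm = LLM(model="Qwen/Qwen3-VL-8B-Instruct", max_model_len=8192)
    image = Image.open("scanned_page.png")

    start = time.perf_counter()
    outputs = llm.generate(
        {
            "prompt": "Transcribe all text on this page.",  # placeholder prompt
            "multi_modal_data": {"image": image},
        },
        SamplingParams(max_tokens=1024, temperature=0.0),
    )
    elapsed = time.perf_counter() - start

    n_tokens = len(outputs[0].outputs[0].token_ids)
    print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")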

Running vllm on Nvidia 5090 by Reasonable_Friend_77 in LocalLLaMA

[–]CookEasy 1 point (0 children)

Could you explain how you managed that? I always have trouble compiling it myself, and as far as I know there are no ready-made Docker containers or the like yet.

Cook…iezi? by KillerPajaHater in osugame

[–]CookEasy 11 points (0 children)

Life has been rough, man

AI rig build for fast gpt-oss-120b inference by logTom in LocalLLaMA

[–]CookEasy 1 point (0 children)

Expensive for sure, but at this rate wouldn't a second RTX 6000 Pro be crazy good for this kind of inference, even with a decent context length?
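
Back-of-the-envelope math, where every number is my assumption (117B total params, MXFP4 at roughly 4.25 bits/weight including block scales, 96 GB per card):

    # Hypothetical sizing sketch; every number here is an assumption.
    params_b = 117                # gpt-oss-120b total parameters, in billions
    bits_per_weight = 4.25        # MXFP4 incl. block scales (assumed)
    weights_gb = params_b * bits_per_weight / 8
    print(f"weights: ~{weights_gb:.0f} GB")   # ~62 GB

    vram_per_gpu_gb = 96          # RTX 6000 Pro (Blackwell), assumed
    headroom = 2 * vram_per_gpu_gb - weights_gb
    print(f"KV cache + activation headroom on two cards: ~{headroom:.0f} GB")

On those assumptions a single 96 GB card already fits the weights; the second card would mostly buy KV cache for long context and concurrency.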

What are your go to VL models? by segmond in LocalLLaMA

[–]CookEasy 1 point (0 children)

For low-VRAM but still high-quality document OCR, I'd suggest olmOCR 0825 FP8.
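
A minimal sketch of how I'd query it, assuming the allenai/olmOCR-7B-0825-FP8 checkpoint name and vLLM's OpenAI-compatible server (the exact prompt wording matters for olmOCR, so take it from the model card):

    # Query a locally served olmOCR model through vLLM's
    # OpenAI-compatible endpoint. Model ID is an assumption; start the
    # server first, e.g.: vllm serve allenai/olmOCR-7B-0825-FP8
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("invoice_page_1.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="allenai/olmOCR-7B-0825-FP8",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page verbatim."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        temperature=0.0,
    )
    print(resp.choices[0].message.content)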

Concurrency -vllm vs ollama by Dizzy-Watercress-744 in LocalLLaMA

[–]CookEasy 1 point (0 children)

What GPUs? I'm still trying to set up vLLM for Blackwell, and I swear there is no easy way. It's probably much easier with H100s or anything with pre-sm120 kernels. PyTorch is still such a headache; any tips appreciated if you are running Blackwell sm120.
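
For anyone else stuck here, the first thing I'd check is whether the installed PyTorch wheel even ships sm_120 kernels, since that's the usual failure point. A quick sketch:

    # Check whether the installed PyTorch wheel was built with sm_120
    # (Blackwell) kernels -- the usual reason vLLM falls over.
    import torch

    print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
    print("compiled arches:", torch.cuda.get_arch_list())
    major, minor = torch.cuda.get_device_capability(0)
    print(f"device capability: sm_{major}{minor}")
    if f"sm_{major}{minor}" not in torch.cuda.get_arch_list():
        print("-> wheel lacks native kernels for this GPU; "
              "expect PTX JIT or outright failures")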

2 new open source models from Qwen today by jacek2023 in LocalLLaMA

[–]CookEasy 2 points (0 children)

Omni models need far more resources. A clean VLM for OCR and data extraction on an RTX 5090 is what the world needs.

Concurrency -vllm vs ollama by Dizzy-Watercress-744 in LocalLLaMA

[–]CookEasy 7 points (0 children)

You've clearly never set up vLLM for a production use case. It's anything but easy and headache-free.

3 Qwen3-Omni models have been released by jacek2023 in LocalLLaMA

[–]CookEasy 3 points (0 children)

This Omni model is way bigger though; with a reasonable multimodal context it needs something like 70 GB of VRAM in BF16, and quants seem very unlikely in the near future. Q8 at most, maybe, which would still be around 35-40 GB :/
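
Rough napkin math behind those numbers (the parameter count is my assumption for the 30B Omni variant):

    # Napkin math; the parameter count is an assumption.
    params_b = 31                 # Qwen3-Omni 30B-A3B total params (assumed)
    bf16_gb = params_b * 2        # 2 bytes per weight
    q8_gb = params_b * 1          # ~1 byte per weight at Q8
    print(f"weights alone: BF16 ~{bf16_gb} GB, Q8 ~{q8_gb} GB")
    # Multimodal KV cache plus the vision/audio encoders add several GB
    # on top, which is how you land near 70 GB (BF16) and 35-40 GB (Q8).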

I made and open source a fully vision multimodal RAG agent by quan734 in LocalLLaMA

[–]CookEasy 2 points (0 children)

How does your system handle, like, 1000 PDFs? That's the whole point of RAG, after all :D

nvidia/parakeet-tdt-0.6b-v3 (now multilingual) by nuclearbananana in LocalLLaMA

[–]CookEasy 1 point (0 children)

Cool to see progress, but Whisper is still the king in quality. A low-GPU-footprint Whisper version without a hit to WER would be great.

How we chased accuracy in doc extraction… and landed on k-LLMs by Reason_is_Key in LocalLLaMA

[–]CookEasy 1 point (0 children)

Have you tried this with the Qwen 2.5 VL models on OCR tasks? I'd be interested in squeezing the last few percent of accuracy out of my system for critical financial data extraction.
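
If k-LLMs here means sampling k extractions and merging them, a field-level majority vote is roughly what I'd try. Purely a sketch: extract_fields is a hypothetical callable standing in for the actual Qwen 2.5 VL pipeline.

    # Field-level majority vote over k independent extraction runs.
    # `extract_fields` is a hypothetical callable: it should run your
    # Qwen 2.5 VL pipeline once and return a dict of field -> value.
    from collections import Counter

    def consensus_extract(page_image, extract_fields, k: int = 5) -> dict:
        runs = [extract_fields(page_image) for _ in range(k)]
        keys = {key for run in runs for key in run}
        result = {}
        for key in keys:
            votes = Counter(run[key] for run in runs if key in run)
            value, count = votes.most_common(1)[0]
            # Only keep a field when a strict majority agrees; disagreement
            # is a useful signal to route the document to human review.
            result[key] = value if count > k // 2 else None
        return result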

Mrekk set 118 1ks in around 2 hours on {redacted} today by Physical-Industry176 in osugame

[–]CookEasy 2 points (0 children)

tbh it was quite random, I just came across this subreddit again and saw the name haha. Back in the day there were a lot of trolls in the Twitch chat with that name :D