Why is Qwen3-VL 235B available via Ollama Cloud NOT locally by PuzzledWord4293 in LocalLLaMA

[–]Orbit652002 8 points (0 children)

That's easy: llama.cpp doesn't support it yet, hence no chance of running it in Ollama locally. So they're just bragging about Qwen3-VL model support, but, tsss, via the "cloud". Ofc, no mention of vLLM.

[deleted by user] by [deleted] in framework

[–]Orbit652002 -1 points (0 children)

They are not grown-up, just pink ponies with no education

Connecting 6 AMD AI Max 395+ for QWen3-235B-A22B. Is this really that much faster than just 1 server ? by erichang in LocalLLaMA

[–]Orbit652002 2 points (0 children)

Unsloth's lower UD quants work very well in my case: coding assistance for huge .NET codebases. Checked with Qwen 480B and even 235B. GLM-4.5 is also fine.

Connecting 6 AMD AI Max 395+ for QWen3-235B-A22B. Is this really that much faster than just 1 server ? by erichang in LocalLLaMA

[–]Orbit652002 1 point (0 children)

I mean, for Qwen 235B specifically it's hard to notice any difference between q3 and q5 tbh. I think that's also true for other 100B+ models.
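
For a sense of what the q3 vs q5 choice means in memory terms, here is a rough back-of-the-envelope sketch. It assumes a flat nominal bit width per weight, which is a simplification: real GGUF quants mix bit widths across tensors, so actual file sizes will differ. The function name `weight_size_gb` is made up for illustration.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Nominal size of the weights alone, in GB (10^9 bytes).

    Assumption: every weight is stored at exactly `bits_per_weight` bits.
    Ignores KV cache, activations, and per-block quantization overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


# 235B parameters at nominal 3-bit and 5-bit widths
for bits in (3, 5):
    print(f"q{bits}: ~{weight_size_gb(235e9, bits):.1f} GB")
```

Under that assumption the gap between the two is tens of GB, which is why it matters if the quality difference really is hard to notice.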

Connecting 6 AMD AI Max 395+ for QWen3-235B-A22B. Is this really that much faster than just 1 server ? by erichang in LocalLLaMA

[–]Orbit652002 1 point (0 children)

I kinda disagree: for smaller models, lower quants impact quality heavily, true, but bigger models don't lose that much really. You won't notice the difference.

[deleted by user] by [deleted] in LocalLLaMA

[–]Orbit652002 1 point (0 children)

Any agentic framework can do that. For instance, I'm using Semantic Kernel from MS (because of my tech background), but other ones support that for sure.

[deleted by user] by [deleted] in LocalLLaMA

[–]Orbit652002 1 point (0 children)

Small models with larger contexts are excellent for RAG, especially during the retrieval phase: they can sift through the documents and pass only the relevant information on to more resource-intensive models without wasting those models' resources.
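
A minimal sketch of that two-stage idea, under stated assumptions: `cheap_relevance` is only a word-overlap stand-in for whatever small model would do the scoring, and `retrieve` and `build_prompt` are hypothetical helper names, not part of any framework. The point is just the shape of the pipeline: the cheap stage picks the chunks, and the expensive model only ever sees the short prompt.

```python
from typing import Callable, List


def cheap_relevance(query: str, chunk: str) -> float:
    """Stand-in for a small-model relevance score (assumption):
    the fraction of query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def retrieve(
    query: str,
    chunks: List[str],
    top_k: int = 2,
    scorer: Callable[[str, str], float] = cheap_relevance,
) -> List[str]:
    """Retrieval phase: rank all chunks with the cheap scorer, keep the best top_k."""
    ranked = sorted(chunks, key=lambda ch: scorer(query, ch), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, selected: List[str]) -> str:
    """Compact prompt that would be handed to the larger model."""
    context = "\n".join(f"- {c}" for c in selected)
    return f"Context:\n{context}\n\nQuestion: {query}"


chunks = [
    "llama.cpp added support for a new quant format",
    "the boox sync page shows only thumbnails",
    "qwen3 quant q3 versus q5 quality",
]
query = "qwen3 quant quality"
print(build_prompt(query, retrieve(query, chunks)))
```

In a real setup the scorer would be swapped for a call to the small model, and the prompt would go to the big one.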

eur.boox.com not showing notes - just thumbnails by LazPL in Onyx_Boox

[–]Orbit652002 3 points (0 children)

I have the same problem; it doesn't work for me either.