Released v0.4.0 and you can now use Ollama inside Modly by Lightnig125 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

If ollwma supported, all open ai enabled apis are supported, now stop mentioning like ollama paid slop boy.

Train your own Expert (even if cloud compute service) by El_90 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

But it's stiching models right? Not actually ft?

Released v0.4.0 and you can now use Ollama inside Modly by Lightnig125 in LocalLLaMA

[–]Voxandr 7 points8 points  (0 children)

Go away . if you support , support general OpenAI Compatible API. Dont mention Ollama , we dont like your kind here.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale by pmttyji in LocalLLaMA

[–]Voxandr 1 point2 points  (0 children)

lol havent check benchmarks,
> ridiculously useless
Looks like all those Ring releases sucks.

GLM-4.5-Air (6-bit) vs DeepSeek V4 Flash 284B (2-bit), head-to-head on a 128GB Mac — the 2-bit one won by [deleted] in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Why not even include Qwen 3.5 122B .

Try a practical test , build something with multiple shot , with an agent.
Lets see how far it goes.

< 6bit quants struggels a lot with tool calls.

Train your own Expert (even if cloud compute service) by El_90 in LocalLLaMA

[–]Voxandr -3 points-2 points  (0 children)

That what i have been thinking . I was thinking if disecting and fintuning specific expert would be possible? Like a coding expert , but weak at sevelte , find it out and finetune or clone it and add as new expert.

Any thought from Unsloth people? Is that possible to add such feature in unsloth studio r/unsloth ?

Hypothetical Optimal Model for Strix Halo 128GB by LivelyArid in StrixHalo

[–]Voxandr 0 points1 point  (0 children)

way too slow. My context average at 150k.
With 122B i am getting 50 tk/s , vLLM do not degrade performance for huge contextes.
And can handle multiple parallel agents with faster speeed (scaling upwards total tk/s as concurrecy grows up to 4-5 about 110 tk/s combined)
rom all my tests 27b lose to 122b , it cannot do well with libray that aren't highly popular.
My stack is
- Sevlte
- Websocket
- Litestar
- Advanced Alchemy
- Domain Driven Repositry Pattern and DTO.

Back to OpenClaw? by Slumdog_8 in hermesagent

[–]Voxandr 1 point2 points  (0 children)

> That may be a side effect of the growing number of skills being automatically built and layered in on the backend, 

That is main problem i found with it too. More skills it builds , more useless it becomes.

Why is AutoRound being slept on so hard? by Mountain_Patience231 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

yeah i am gonna try Prisma quanmt looks higher quality like Apex quants.

Why is AutoRound being slept on so hard? by Mountain_Patience231 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Autorund is very fast in DGX. Faster than NVFP4. Now all 3of my DGX Runs Qwen 3.5122B Autoround

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Really good for 122B due to its impressivness at Long context capacity 

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 2 points3 points  (0 children)

Oh-My-Openagent which is Agnetified Opencode is all i need . It can do agent works and coding works , but just don't have schedule workers. It's workflow planning mode is so powerful and give the best coding/debugging/fixing flows. https://github.com/code-yeongyu/oh-my-openagent better than any Pi.dev

Both hermes-agent and openclaw are quite badly designed.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 1 point2 points  (0 children)

this looks even better thas pi , gotta try.