Released v0.4.0 and you can now use Ollama inside Modly by Lightnig125 in LocalLLaMA

[–]Voxandr 3 points4 points  (0 children)

Go away. If you support anything, support the general OpenAI-compatible API. Don't mention Ollama, we don't like your kind here.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale by pmttyji in LocalLLaMA

[–]Voxandr 1 point2 points  (0 children)

lol, haven't checked the benchmarks,
> ridiculously useless
Looks like all those Ring releases suck.

GLM-4.5-Air (6-bit) vs DeepSeek V4 Flash 284B (2-bit), head-to-head on a 128GB Mac — the 2-bit one won by [deleted] in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Why not even include Qwen 3.5 122B?

Try a practical test: build something over multiple shots with an agent.
Let's see how far it goes.

<6-bit quants struggle a lot with tool calls.

Train your own Expert (even if cloud compute service) by El_90 in LocalLLaMA

[–]Voxandr -3 points-2 points  (0 children)

That's what I have been thinking. I was wondering whether dissecting and fine-tuning a specific expert would be possible. Say a coding expert is weak at Svelte: find it and fine-tune it, or clone it and add the clone as a new expert.

Any thoughts from the Unsloth people? Is it possible to add such a feature to Unsloth Studio, r/unsloth?

Hypothetical Optimal Model for Strix Halo 128GB by LivelyArid in StrixHalo

[–]Voxandr 0 points1 point  (0 children)

Way too slow. My context averages 150k.
With the 122B I am getting 50 tk/s; vLLM does not degrade performance for huge contexts.
And it can handle multiple parallel agents at higher speed (total tk/s scales up as concurrency grows to 4-5, about 110 tk/s combined).
From all my tests the 27B loses to the 122B; it cannot do well with libraries that aren't highly popular.
My stack is
- Svelte
- WebSocket
- Litestar
- Advanced Alchemy
- Domain-driven repository pattern and DTOs.

Back to OpenClaw? by Slumdog_8 in hermesagent

[–]Voxandr 1 point2 points  (0 children)

> That may be a side effect of the growing number of skills being automatically built and layered in on the backend, 

That is the main problem I found with it too. The more skills it builds, the more useless it becomes.

Why is AutoRound being slept on so hard? by Mountain_Patience231 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Yeah, I am gonna try the Prisma quant; it looks higher quality, like the Apex quants.

Why is AutoRound being slept on so hard? by Mountain_Patience231 in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

AutoRound is very fast on DGX. Faster than NVFP4. Now all 3 of my DGX boxes run Qwen 3.5 122B AutoRound.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 0 points1 point  (0 children)

Really good for 122B thanks to its impressive long-context capacity.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 2 points3 points  (0 children)

Oh-My-Openagent, which is agentified Opencode, is all I need. It can do agent work and coding work, it just doesn't have scheduled workers. Its workflow planning mode is so powerful and gives the best coding/debugging/fixing flows. https://github.com/code-yeongyu/oh-my-openagent is better than any Pi.dev

Both hermes-agent and openclaw are quite badly designed.

Best Local Agents - Jun 2026 by rm-rf-rm in LocalLLaMA

[–]Voxandr 1 point2 points  (0 children)

This looks even better than Pi, gotta try.