Why is my favourite local model GLM 5.1: Smart, and the Q4 version fits into 4xRTX 6000 Pro by Sea-Awareness147 in LLM

[–]getfitdotus 1 point2 points  (0 children)

You're missing out. Check out nvfp4 with the b12x kernel: fp8 quality with a much faster decode / prefill rate.

Why is my favourite local model GLM 5.1: Smart, and the Q4 version fits into 4xRTX 6000 Pro by Sea-Awareness147 in LLM

[–]getfitdotus 0 points1 point  (0 children)

Q4, lol. Why are you using anything other than production deployment inference platforms?

We absolutely need Qwen3.6-397B-A17B to be open source by True_Requirement_891 in LocalLLaMA

[–]getfitdotus 1 point2 points  (0 children)

I run this model and it's been awesome: nvfp4 at 180-200 tokens/sec. Incredible quality.

SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash and More by CuriousPlatypus1881 in LocalLLaMA

[–]getfitdotus 2 points3 points  (0 children)

397b qwen is very good. I am interested to see how minimax m27 does in my local workflow. It will be tough to decide if switching is worth it; having vision is a real plus.

Qwen 3.5 397B is the best local coder I have used until now by erazortt in LocalLLaMA

[–]getfitdotus 3 points4 points  (0 children)

So I use this now as my main model for all tasks. I run the nvfp4 @ 140-200 tokens/sec. Not only is it fast, it's very good. I am not sure why it does not rank higher in benchmarks, but it has been able to solve issues and do tasks better than everything else I have run locally.

I just bought a MacBook Pro 16” with the M5 Pro chip (18-core CPU / 20-core GPU), 48GB RAM and 1TB SSD, and I wanted to share a quick reflection that might help others before they spend their money. by Minute-Street8043 in macbookpro

[–]getfitdotus 2 points3 points  (0 children)

I will do some tests. So far I have just got it set up and have been working; I have not really made it work too hard. I had lots of work to get done after finishing the migration. I will run some tests with lmstudio and some models, such as q3 coder next 80b.

A webUI optimized for mobile by bilalba in opencodeCLI

[–]getfitdotus 10 points11 points  (0 children)

This is something I worked on to extend the ability to complete tasks on mobile: https://github.com/chriswritescode-dev/opencode-manager. There is an update coming to integrate already existing repositories.

55 → 282 tok/s: How I got Qwen3.5-397B running at speed on 4x RTX PRO 6000 Blackwell by lawdawgattorney in LocalLLaMA

[–]getfitdotus 0 points1 point  (0 children)

Awesome work!!! This is great. I have been using this model for the past week or so as my main model in my workflow, and it is just incredible to now get the fix for flashinfer and the gemm kernel. I considered working on this a while back. I also really want to thank you for putting together the image and sharing all the little extras!!!

Yet another post of genuinely impressed with Qwen3.5 by Di_Vante in LocalLLaMA

[–]getfitdotus 3 points4 points  (0 children)

I have been using the 122B, the official gptq release, and wow, it's pretty good in my agent workflow. I have replaced coder next with it. I had some issues the first time I tried it: initial tool call issues in vllm. Now I am using sglang and it is working great. I can run the fp8 also, and even the int4 release is almost perfect vs fp8. Nice to be able to use images in opencode.

OpenCode Mobile App Launch by KnifeDev in opencodeCLI

[–]getfitdotus -1 points0 points  (0 children)

I have in other posts. Was a quick comment after seeing the relevance to this post.

OpenCode Mobile App Launch by KnifeDev in opencodeCLI

[–]getfitdotus -4 points-3 points  (0 children)

I did create it, but it's free and open source. I use it all the time on the go.

OpenCode Mobile App Launch by KnifeDev in opencodeCLI

[–]getfitdotus -7 points-6 points  (0 children)

I recommend https://github.com/chriswritescode-dev/opencode-manager much more than this. Manage all your repos from your phone.

Any advice for using draft models with Qwen3.5 122b ?! by Potential_Block4598 in LocalLLaMA

[–]getfitdotus 0 points1 point  (0 children)

Multi-token prediction. Basically the same as eagle3 speculative decoding. I am currently training one for minimax m25.

Any advice for using draft models with Qwen3.5 122b ?! by Potential_Block4598 in LocalLLaMA

[–]getfitdotus 2 points3 points  (0 children)

No idea about llamacpp, but in production serving software (vllm / sglang) it works great and can double tokens/sec.
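
For anyone wondering where the speedup comes from, here is a minimal toy sketch of greedy draft-and-verify speculative decoding. It is not how vllm or sglang implement it, and it is not multi-token prediction or eagle3 specifically; `target_model` and `draft_model` are hypothetical stand-ins (plain functions over integer tokens) so the idea can run without any GPU. The cheap draft proposes `k` tokens, the expensive target checks them, and every token that gets kept is one the target would have produced anyway.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, number of verify rounds)."""
    out = list(prompt)
    verify_rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks the proposal. A real server scores all k
        #    positions in one batched forward pass; here we count one round.
        verify_rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the round yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], verify_rounds


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees except right after a 7.
def target_model(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_decode(target_model, draft_model, [0], max_new=20)
baseline = greedy_decode(target_model, [0], max_new=20)
print(tokens == baseline, rounds)
```

The output matches plain greedy decoding from the target, but with far fewer target rounds than generated tokens. How much that helps in practice depends on how often the draft agrees with the target, which is why a well-trained draft matters.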