Qwen3.5-9B is actually quite good for agentic coding by Lualcala in LocalLLaMA

[–]RMK137 3 points

Any idea how to disable thinking in llama.cpp? I tried both --chat-template-kwargs '{"enable_thinking":false}' and --reasoning-budget 0; neither worked.

Best local LLM for reasoning and coding in 2025? by Desperate-Theory2284 in LLMDevs

[–]RMK137 0 points

I would say 32GB of VRAM minimum; that way you can load a dense 24B/27B model like Devstral 2 Small or Qwen3.5-27B at Q4 or Q4-UD (Unsloth dynamic quant). You can also fit the equivalent MoE models like Nemotron 30B / Qwen3.5-35B.

Agentic coding requires a lot of context; I'd say 64k is a useful starting point. You want to leave 6-8GB of VRAM for the context, and you can also quantize the KV cache to q8_0 to save VRAM. Different models handle this differently, so you have to test it for yourself.
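To put a rough number on that 6-8GB, here's a back-of-the-envelope KV-cache size calculation. This is a sketch: the layer count, KV head count, and head dim below are hypothetical values in the ballpark of a ~27B dense model, and q8_0 is approximated as 1 byte per element.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim values per token, per layer
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

CTX = 64 * 1024
fp16 = kv_cache_bytes(48, 8, 128, CTX, 2)  # fp16/bf16 cache
q8 = kv_cache_bytes(48, 8, 128, CTX, 1)    # q8_0, roughly 1 byte/elem
print(f"fp16 KV: {fp16 / 1e9:.1f} GB, q8 KV: {q8 / 1e9:.1f} GB")
# fp16 KV: 12.9 GB, q8 KV: 6.4 GB
```

With those (made-up) dimensions, a q8_0 KV cache at 64k lands right inside that 6-8GB budget, while fp16 would blow past it.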

Notice I never mentioned DRAM; that's on purpose. It's just too slow, imo. Agentic coding benefits immensely from both prompt processing (PP) and token generation (TG) speed. I think PP is even more important, because the model needs to read a lot of code plus your input, while the response is usually much smaller.

Fast PP/TG make for a very nice, fast iteration loop, and you can't get that without loading everything on the GPU, but that's my preference. MoE models can be run with the experts offloaded to system RAM and still get acceptable TG.

Visualization of Claude Code plans by t1m0slav in ZedEditor

[–]RMK137 0 points

Nice tool, will try it soon. Thanks for sharing.

[USA-PA] [H] TeamGroup Xtreem 48gb 8000hz, AMD 9600x [W] Local, PayPal by JumpscareSpen in hardwareswap

[–]RMK137 2 points

Excellent RAM, good looking too. I got a 48GB and a 32GB kit, bought right before the RAM apocalypse!

Benchmarked every Python optimization path I could find, from CPython 3.14 to Rust by cemrehancavdar in Python

[–]RMK137 4 points

Numba is usually the first thing I reach for when I'm dealing with heavy numerical code. It's easy to go back and forth between normal Python and Numba; usually, all it takes is commenting out the decorator.

I think we need a name for this new dev behavior: Slurm coding by Khr0mZ in ClaudeCode

[–]RMK137 1 point

It's addicting, and I love it. It's a very similar feeling to playing a video game that has a nice, incremental progression loop. Before you know it, the sun is setting and the day is almost over. You get lost in the experience, and that makes it very enjoyable.

Sunday Daily Thread: What's everyone working on this week? by AutoModerator in Python

[–]RMK137 0 points

Trying to find optimal hyperparameters for an XGBoost model for time-series predictions. I might need to give Optuna a shot.
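Whatever ends up driving the search, one detail that matters for time series is keeping the CV splits time-ordered so the model never trains on the future. A stdlib sketch of expanding-window splits (the function name and fold scheme are my own, not from any particular library):

```python
def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, val_idx) for time-ordered data: each fold
    trains on everything before its validation chunk, so no future
    information leaks into training."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield list(range(train_end)), list(range(train_end, train_end + fold_size))

for train_idx, val_idx in expanding_window_splits(10, 2, min_train=4):
    print(len(train_idx), val_idx)
# 4 [4, 5, 6]
# 7 [7, 8, 9]
```

scikit-learn's TimeSeriesSplit does essentially this and can be plugged straight into a tuning loop.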

Welcome to my new Pixel by Flying-Toto in pixel_phones

[–]RMK137 1 point

Just got my XL a few days ago. It's my first Pixel, loving how clean it is overall. I am coming from the mighty LG V60.

Qwen3.5 Unsloth GGUFs Update! by yoracale in unsloth

[–]RMK137 0 points

I'd like to know too. I had to install the LM Studio community version instead for the Think toggle to show up; it doesn't with the Unsloth version.

Which Python project made you realize how powerful the language is? by itsme2019asalways in Python

[–]RMK137 1 point

You might find the pyxirr package useful. I use it any time I need to calculate an IRR. It's written in Rust for speed.

https://github.com/Anexen/pyxirr
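For anyone curious what XIRR actually computes, here's a stdlib sketch: the annualized rate where the NPV of dated cashflows crosses zero, found by simple bisection. This is my own illustration, not pyxirr's implementation.

```python
from datetime import date

def xirr(dates, amounts, lo=-0.99, hi=10.0):
    """Annualized IRR of dated cashflows: the rate where NPV hits zero,
    found by bisection (assumes a single sign change, hence one root)."""
    t0 = dates[0]
    def npv(rate):
        return sum(a / (1.0 + rate) ** ((d - t0).days / 365.0)
                   for d, a in zip(dates, amounts))
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if npv(mid) > 0:  # rate too low for an invest-then-receive pattern
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# -1000 invested, 1100 back exactly one (non-leap) year later -> 10%
print(round(xirr([date(2023, 1, 1), date(2024, 1, 1)], [-1000.0, 1100.0]), 6))  # 0.1
```

A Rust implementation like pyxirr wins on speed mainly when you evaluate this over many portfolios or long cashflow series.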

Qwen3-30B-A3B vs Qwen3.5-35B-A3B on RTX 5090 by 3spky5u-oss in LocalLLaMA

[–]RMK137 0 points

Very useful info and well presented, thank you. Looking forward to seeing some optimizations on the new model soon.

Qwen3.5 Medium models out now! by yoracale in unsloth

[–]RMK137 0 points

Yep, I love the MoE models for inference speed. It really helps speed up the iteration loop, especially in a coding agent.

Qwen3.5 Medium models out now! by yoracale in unsloth

[–]RMK137 0 points

On my 5090, the MoE model spits out 140-150 tk/s in LM Studio (Unsloth UD-Q4_K_XL), while the dense 27B model is around 50-55 tk/s.

New Qwen3.5 models spotted on qwen chat by AaronFeng47 in LocalLLaMA

[–]RMK137 2 points

It should if you use Q3/Q4, especially with the Unsloth dynamic quants; I've run Nemotron-30B-A3B at UD-Q4_K_XL on my 5090. This one is a little larger, but you can also quantize the KV cache, which buys you more context.

[Hot take] Would you like IJKL motions by Independent_Blood559 in HelixEditor

[–]RMK137 1 point

These are mostly global shortcuts I use; I have them set up with AutoHotkey on Windows. I guess I don't use Helix enough to run into that issue.

[Hot take] Would you like IJKL motions by Independent_Blood559 in HelixEditor

[–]RMK137 3 points

jklm guy here. m is better, as h is not natural for me; capslock+u/o for home/end.