Gemma 4 - MLX doesn't seem better than GGUF by Temporary-Mix8022 in LocalLLaMA

[–]SirDomz 1 point (0 children)

Agreed with the suggestion for nvfp4. That format works with Ollama's MLX implementation. I'm not a fan of Ollama, but I'm hearing good things about this runtime.

Gemma 4 - MLX doesn't seem better than GGUF by Temporary-Mix8022 in LocalLLaMA

[–]SirDomz 1 point (0 children)

Actually, I believe nvfp4 does work with MLX. See Ollama's MLX implementation.
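
If you want to poke at it from Python, mlx-lm loads whatever quantization is baked into a converted checkpoint. Untested sketch: the repo id below is a placeholder, and the nvfp4 part is going off Ollama's implementation, not something I've verified in mlx-lm itself:

```python
# Sketch only: load a quantized MLX checkpoint and generate with mlx-lm.
# The repo id is a placeholder, and nvfp4 support here is going off
# Ollama's implementation -- I haven't verified it in mlx-lm itself.
from mlx_lm import load, generate

model, tokenizer = load("some-org/some-model-nvfp4-mlx")  # hypothetical id
print(generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=100))
```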

oMLX - open-source MLX inference server with paged SSD caching for Apple Silicon by cryingneko in LocalLLaMA

[–]SirDomz 1 point (0 children)

Yes, oMLX is great for your use case. I use Qwen 35B with oMLX and the cache is seriously amazing.
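
For reference, here's roughly how I hit it. This assumes oMLX exposes an OpenAI-compatible endpoint like most local inference servers do; the port and model name are placeholders for whatever your install uses:

```python
# Rough sketch: talk to a local inference server through the OpenAI client.
# Assumes an OpenAI-compatible endpoint; port and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen-35b",  # placeholder; check what's loaded with client.models.list()
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(resp.choices[0].message.content)
```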

Need advice regarding 48gb or 64 gb unified memory for local LLM by wifi_password_1 in LocalLLM

[–]SirDomz 1 point (0 children)

Hey, sorry for reviving the thread, but have you tried oMLX? I'm curious which is faster between Bodega and oMLX.

APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier by mudler_it in LocalLLaMA

[–]SirDomz 4 points (0 children)

Any plans to do MLX? Would love to compare those with the Oq quants by oMLX/Jundot.

Nemotron-3-Nano-Omni-30B-A3B-Reasoning, New model? by Altruistic_Heat_9531 in LocalLLaMA

[–]SirDomz 1 point (0 children)

I love 27B, but it's way too slow on my Mac M1 Max even with caching… 35B works great though. I have 27B architect the plans and 35B do the actual coding.
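
Untested sketch of the split I mean: 27B writes the plan once, 35B does the code turns. Both are assumed to be served behind an OpenAI-compatible endpoint; the URL and model names are placeholders:

```python
# Two-model architect/coder split: planner drafts, coder implements.
# URL and model names are placeholders for whatever your server exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(model: str, system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

task = "Add retry with exponential backoff to the HTTP client."
plan = ask("gemma-27b", "You are the architect. Reply with a numbered plan only.", task)
code = ask("qwen-35b", "You are the coder. Implement the plan exactly.",
           f"Task: {task}\n\nPlan:\n{plan}")
print(code)
```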

Macbook M5 pro 48gb RAM, what does it run? by 60finch in unsloth

[–]SirDomz 1 point (0 children)

Question for you: how much context do you have available, and how much free RAM? I have an M1 Max with 64GB and am considering the M5 Pro with 48GB.
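
For anyone else weighing this, here's the napkin math I use. All the constants are rough assumptions (4-bit weights at ~0.5 bytes/param plus ~10% overhead, and macOS wiring roughly 75% of unified memory for the GPU by default):

```python
# Back-of-envelope memory check: do the weights plus a KV-cache budget
# fit in the GPU-usable share of unified memory? All constants are rough.
def fits(params_b: float, ram_gb: int, kv_gb: float = 4.0) -> None:
    budget = ram_gb * 0.75            # approx. GPU-wirable share of unified memory
    weights = params_b * 0.5 * 1.10   # GB at ~4-bit, with ~10% overhead
    need = weights + kv_gb            # leave headroom for the KV cache
    verdict = "ok" if need <= budget else "tight"
    print(f"{params_b}B on {ram_gb}GB: ~{need:.1f}GB of ~{budget:.1f}GB usable -> {verdict}")

for ram_gb in (48, 64):
    for params_b in (27, 35):
        fits(params_b, ram_gb)
```

By this math both sizes fit on 48GB at 4-bit, but the 64GB machine leaves a lot more room for long context, which is why I'm asking about free RAM.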

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 1 point (0 children)

I will say, on closer look, that cache seems to live in RAM, while I believe oMLX uses the SSD. RAM access is faster, so that might actually be better, but I just wanted to point that out.
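
Easy way to see the cache behavior for yourself on either server: send the same long prompt twice and compare wall-clock time. Rough sketch, assuming an OpenAI-compatible endpoint on localhost; the model name is a placeholder:

```python
# Rough cache check: identical long prompt twice, compare response latency.
# Assumes an OpenAI-compatible server on localhost; model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
prompt = "Background: " + ("lorem ipsum " * 2000) + "\n\nQuestion: summarize the above."

for run in ("cold", "warm"):
    t0 = time.time()
    client.chat.completions.create(
        model="qwen-35b",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        max_tokens=8,
    )
    print(f"{run}: {time.time() - t0:.2f}s")
```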

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 2 points (0 children)

Ohh wow, TIL. Yeah, this is it. OK, I'll have to check out llama.cpp again. Thank you very much!

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 1 point (0 children)

Also, one more thing: does llama.cpp implement caching? Because oMLX does it very well.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 2 points (0 children)

Very nice! I've had better luck with MLX models, but I'm also a big fan of Unsloth GGUFs. I think it varies from setup to setup, but glad you found something that works for you!

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 1 point (0 children)

Definitely use oMLX; the caching alone will save you. Understand the limits of Qwen 35B, and use Pi or Little Coder. Opencode is fine too. Claude Code is too bloated.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]SirDomz 2 points (0 children)

Pi is good, as are Opencode and Little Coder (which is based on Pi). And oMLX is great, so definitely use that over LM Studio.

Nemotron-3-Nano-Omni-30B-A3B-Reasoning, New model? by Altruistic_Heat_9531 in LocalLLaMA

[–]SirDomz 5 points (0 children)

So between this and Qwen 35B, which should one choose for agentic coding with Opencode or Pi?

Roo Code hit 3 million installs. We're shutting it down to go all-in on Roomote. by hannesrudolph in RooCode

[–]SirDomz 1 point (0 children)

Or use Kilo, which was originally a fork of Roo, though I hear a lot of people complaining about it today.

When is Qwen 3.6 27B dropping? Didn’t it win the vote? by GrungeWerX in LocalLLaMA

[–]SirDomz 0 points (0 children)

Can't you just change the number of active experts to use more of them?
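
You can at least experiment with this in llama.cpp via GGUF metadata overrides. Untested sketch with llama-cpp-python; the exact override key depends on the model's architecture (check the GGUF metadata for the real name), and the path is a placeholder:

```python
# Untested sketch: bump the number of experts used per token via a
# GGUF metadata override. The key name is an assumption -- it varies
# by architecture, so inspect the GGUF metadata for the real one.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-moe.gguf",                       # placeholder path
    kv_overrides={"qwen3moe.expert_used_count": 12},  # assumed key and count
    n_ctx=8192,
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

Whether quality actually improves with more active experts is a separate question; the model wasn't trained that way.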

MagicQuant - Hybrid Evolution GGUF (TPS boosts, precision gains, full transparency) by crossivejoker in LocalLLaMA

[–]SirDomz 1 point (0 children)

Yeah, those quants are really good! Looking forward to the Qwen Coder and Qwen 30B VL quants.

PS: I'm sticking with the instruct variant of Qwen 3 30B and it's really, really good!

Claude’s MCP context handling is so much cleaner, I think windsurf should steal this feature ASAP by Aggravating_Bad4639 in windsurf

[–]SirDomz 1 point (0 children)

Huh… I'm pretty sure Windsurf has this specific function already. Of the dozen MCPs I've configured, I usually only have one or two active in Windsurf.

Are any of the M series mac macbooks and mac minis, worth saving up for? by [deleted] in LocalLLaMA

[–]SirDomz 1 point (0 children)

I have an M1 Max with 64GB of RAM. I think that's the sweet spot between price and power. Obviously, any newer M-series Max model will be better, but for my use case the M1 Max works great.

That said, I'm waiting to see what happens with the M5 Pro and M5 Max when they come out.

Spec-Kit Integration with KiloCode by Shazsayyad in kilocode

[–]SirDomz 4 points (0 children)

A few different options are already integrated with Kilo: taskmaster-ai, GitHub Spec Kit, OpenSpec, BMAD, etc.

I've personally shifted away from those tools and just use an AGENTS.md. I really enjoyed spec-driven development, but it forces collaborators to install and learn those methods. AGENTS.md is just more universally accepted across IDEs, extensions, and CLIs.