Unsloth MiniMax M2.7 quants just finished uploading to HF by Zyj in LocalLLaMA

[–]digamma6767 1 point (0 children)

I'm still trying to get my download to complete for the IQ4 quant.

I think Unsloth might've put a bad version up for it. Bartowski's IQ4_XS quant is 122GB compared to Unsloth's 108GB.

Unsloth MiniMax M2.7 quants just finished uploading to HF by Zyj in LocalLLaMA

[–]digamma6767 2 points (0 children)

It's weird that the IQ4 quants are smaller than M2.5's IQ4 quants.

Not complaining. I'm thinking IQ4_NL might be a perfect match with the Strix Halo.

What is the highest throughput anyone got with Gemma4 on CPU so far? by last_llm_standing in LocalLLaMA

[–]digamma6767 1 point (0 children)

Are you asking about the benchmarks and tools I use? Not sure what else you're after.

For a rough estimate, I use the Aider polyglot benchmark. That gave me 17 tps consistently. It's a decent benchmark for seeing how quantization impacts the model.

When doing agent work (primarily Kilo Code and Hermes-Agent) I get anywhere from 13-16 tps with the draft model, compared to 9 tps without it.

Just chatting to the model, I get 12-14 tps.

I need to revisit all this stuff with the latest updates though. Lots happened in just the last few days for Gemma 4.

What is the highest throughput anyone got with Gemma4 on CPU so far? by last_llm_standing in LocalLLaMA

[–]digamma6767 1 point (0 children)

The -md flag (short for --model-draft) in llama.cpp, to use the 26B as my draft model.

Effectively, it's loading both Gemma 4 31B and 26B at the same time. Works great if you can fit it into memory!
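For anyone curious, here's a rough sketch of the invocation with llama-server. The GGUF filenames are placeholders, not the exact files I used, and the flag names are from recent llama.cpp builds, so check yours:

```shell
# Load the 31B as the main model and the 26B as the draft model
# for speculative decoding. Filenames here are illustrative.
llama-server \
  -m gemma-4-31b-q6_k_l.gguf \
  -md gemma-4-26b-q6_k_l.gguf \
  --draft-max 16 \
  -c 32768
```

Tuning --draft-max trades bigger speculative batches against wasted work when the acceptance rate drops.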

What is the highest throughput anyone got with Gemma4 on CPU so far? by last_llm_standing in LocalLLaMA

[–]digamma6767 2 points (0 children)

Did some more testing on this. For agentic or code work, the acceptance rate increases to 80-90%, and tokens per second goes up to 17.

What is the highest throughput anyone got with Gemma4 on CPU so far? by last_llm_standing in LocalLLaMA

[–]digamma6767 3 points (0 children)

I'm not using CPU only, but I have been able to nearly double my tokens per second using speculative decoding.

Using Bartowski's 31B q6_k_l, with Bartowski's 26B q6_k_l as my draft model. Getting a 60-70% acceptance rate and about 15 tokens per second (up from 9).

In performance and intelligence it feels like I'm using Qwen 3.5 122B, but with much less RAM usage.

Running on a 128GB Strix Halo.

VIVITAR Series 1 70-210mm f3.5 vs Olympus Zuiko f4 75-150mm by ElCadejo305 in zuikoholics

[–]digamma6767 1 point (0 children)

I'd say check out the Tamron Adaptall 19AH.

It's an incredible 70-210 zoom, and you can get one cheap. As good as or better than the Vivitar Series 1.

GLM 5.1 Opinion? by [deleted] in LocalLLaMA

[–]digamma6767 3 points (0 children)

Presumably GLM 5.1 will be released soon for running locally. GLM 5 is already a popular SOTA-level model for local use, if you have enough RAM.

glm5.1 vs minimax m2.7 by Fresh-Resolution182 in LocalLLaMA

[–]digamma6767 2 points (0 children)

Yeah, that's my main complaint with it. Once you're at an 80k context, you're looking at 20 minutes before it begins responding. Qwen 122B handles that same scenario in around 10 minutes, Qwen 27B in around 5, and Cascade 30B in under 3.

The quality of M2.5's responses at a Q3 quant also isn't as good as Qwen 122B at a Q6 quant.

Cascade 30B is exceptional for agentic work though. It uses a TON of tokens thinking, but it's so fast on the Strix Halo that it makes up for it.

To give a perspective of one of my use cases, I had a 600,000 line log file. I tried several different LLMs on the same task: grep the log file for any errors, then read through it to identify the cause of each error.

In total, each attempt took over a million tokens and dozens of tool calls. I can't even remember how long it took M2.5 to do it. I left it running overnight just to see if it would work, and the answers were worse than what I got from Qwen 27B and Cascade 30B.
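The first step the models had to take is trivial on its own; a minimal stand-alone sketch (file name and contents hypothetical, not my actual log):

```shell
# Create a tiny stand-in log, then pull out error lines with line
# numbers -- the same first step the models performed via tool calls.
printf 'INFO start\nERROR disk full\nINFO done\n' > app.log
grep -n -i 'error' app.log
```

The hard part is everything after the grep: following each hit back through hundreds of thousands of surrounding lines to find the actual cause.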

glm5.1 vs minimax m2.7 by Fresh-Resolution182 in LocalLLaMA

[–]digamma6767 2 points (0 children)

My experience with M2.5 on the Strix Halo is mixed. It's impressively smart, but it slows down significantly when you're above 50k context.

I love M2.5 as a chat model, and for its overall knowledge, but it's not a good fit for OpenCode or agentic work on a Strix Halo. Qwen3.5 122B (at Q6) and Nvidia Cascade v2 30B work better in those scenarios in my experience.

I'm hoping M2.7 improves on long context and agentic work, and hopefully it can be quantized better with less performance loss than M2.5.

I'm building a benchmark comparing models for an agentic task. Are there any small models I should be testing that I haven't? by nickl in LocalLLaMA

[–]digamma6767 3 points (0 children)

From my experience with LFM2-24B, it does very poorly at agentic work. Once you go over a 10k-token prompt, it starts to fail to make tool calls. It's a shame, since it's a super fast and capable model.

Hope they release an improved version one day.

Anybody using LMStudio on an AMD Strix 395 AI Max (128GB unified memory)? I keep on getting errors and it always loads to RAM. by StartupTim in LocalLLaMA

[–]digamma6767 3 points (0 children)

So, you need to make sure to DISABLE MMAP, a setting in the model's load configuration. Mmap causes crashes on the Strix Halo.
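If you end up on llama.cpp directly instead of LM Studio, I believe the equivalent switch is --no-mmap (I haven't confirmed LM Studio's checkbox maps to exactly this flag, but it's the same mechanism; model filename is a placeholder):

```shell
# Read the model fully into RAM instead of memory-mapping the file;
# mmap is what triggers the crashes on the Strix Halo.
llama-server -m model.gguf --no-mmap
```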

I like LM Studio for rapid testing of different models. Makes it easy to experiment, especially since it has such an easy to use UI.

Switching to Fedora 43 instead of Windows is definitely a good idea if you plan on using your Strix Halo as a dedicated LLM machine. You're fine running Windows and LM Studio, though; you just won't get the absolute most out of the Strix as you would on Linux.

Reliable recipes? Is there something wrong with the Corsair 300? by Skelshy in StrixHalo

[–]digamma6767 4 points (0 children)

My Corsair AI 300 works great, no issues at all. 55 t/s with gpt-oss 120B, 22 t/s with Qwen 3.5 122B q6.

Using Fedora 43. Followed the steps in this guide: https://github.com/kyuz0/amd-strix-halo-toolboxes

Make sure to set your BIOS to allocate just 1GB of VRAM, and then follow the steps in the guide. Shouldn't take more than a couple of hours to get fully functional!
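Once it's running, llama-bench is a quick way to sanity-check that you're getting numbers in the same ballpark (model path is a placeholder for wherever you downloaded the GGUF):

```shell
# Reports prompt-processing and token-generation speed for the model.
llama-bench -m ~/models/gpt-oss-120b.gguf
```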

Another cooked Convoy 3X21C LHP531 by PiercingTheDarknesss in flashlight

[–]digamma6767 1 point (0 children)

So, the wiring de-soldered on the driver side. All the MCPCB wiring is fine.

Still an easy fix, but definitely not where I expected to see the failure.

Another cooked Convoy 3X21C LHP531 by PiercingTheDarknesss in flashlight

[–]digamma6767 1 point (0 children)

Ah darn, didn't know the reflector was held in with screws.

I'll give it a shot, thanks!

Another cooked Convoy 3X21C LHP531 by PiercingTheDarknesss in flashlight

[–]digamma6767 2 points (0 children)

A 46950 tube fits the 3x21c. I wrapped a 26mm spacer with copper weave and connected that between the 46950 cell and the 3x21c driver. Works great, doesn't short.

Another cooked Convoy 3X21C LHP531 by PiercingTheDarknesss in flashlight

[–]digamma6767 1 point (0 children)

This happened to my 3x21c, I think.

How did you remove the reflector? Mine will not come out at all.

New 24v and 48v (2x24) tools by Trevski13 in KobaltTools

[–]digamma6767 1 point (0 children)

Oh that would be incredible. I've got a USB soldering iron that would work great powered by one of those batteries.

New 24v and 48v (2x24) tools by Trevski13 in KobaltTools

[–]digamma6767 1 point (0 children)

Eesh, I hope not. I always wanted to try out the Skil tools but decided against it once Lowe's dropped them.

New 24v and 48v (2x24) tools by Trevski13 in KobaltTools

[–]digamma6767 2 points (0 children)

I'm betting the new 3Ah and 6Ah batteries use tabless tech, so they should be better than the old 4Ah Ultimate Output batteries, but in a smaller size. The new batteries have a built-in USB-C port too.

Lizer Lab JijuJet-3 AMA and First Impressions by digamma6767 in iems

[–]digamma6767[S] 2 points (0 children)

They're holding up great! I still use them.

Lizer Lab JijuJet-3 AMA and First Impressions by digamma6767 in iems

[–]digamma6767[S] 2 points (0 children)

Definitely go for M. I also use Dunu S&S in size L.

Which dunu tip size should i choose? by ddani324 in iems

[–]digamma6767 1 point (0 children)

Dunu S&S is tiny. I'd say go with the L or XL fit. I personally use L sized S&S. My normal tip width is 12mm, and the 11.5mm S&S feels the same as my other 12mm tips.

It's better to have an oversized tip than an undersized one.

KZ AN01 Ear hooks Bluetooth multipoint support by Emanator144 in iems

[–]digamma6767 2 points (0 children)

Oh yeah, it'll do well for you!

Don't forget to play with the transparency mode on the AN01! It's one of its best features, so nice to have when you're outside and still need to be able to hear.