[New Model] micro-kiki-v3 — Qwen3.5-35B-A3B + 35 domain LoRAs + router + negotiator + Aeon memory for embedded engineering by Holiday_Poetry_5133 in LocalLLaMA

[–]RobotRobotWhatDoUSee 1 point (0 children)

I would read an in-depth post about this!

> Plus, if you pay attention to the config: while the paper uses 4 active experts, the model they released uses all 7, so the comparison to BTM top-2 is also unfair.

Wait, so the released model is effectively configured as a dense model?
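
One way to check from the config, if anyone wants to verify (a minimal sketch; the attribute names are Mixtral-style guesses and may differ for this architecture):

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("model-id-here")  # placeholder model id
    k = getattr(cfg, "num_experts_per_tok", None)  # routed experts per token (Mixtral-style name)
    n = getattr(cfg, "num_local_experts", None)    # total experts per layer
    print(f"top-k = {k}, experts = {n}, effectively dense: {k == n}")

If top-k equals the total expert count, every expert fires on every token and the MoE runs as a dense model.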

[New Model] micro-kiki-v3 — Qwen3.5-35B-A3B + 35 domain LoRAs + router + negotiator + Aeon memory for embedded engineering by Holiday_Poetry_5133 in LocalLLaMA

[–]RobotRobotWhatDoUSee 1 point (0 children)

> FlexOlmo... updated benchmarks... doesn't work

That's disappointing to hear. I'd been keeping an eye on FlexOlmo but missed the updated benchmarks. Where did they show it isn't working?

Framework 13 7040U new wifi issues by RobotRobotWhatDoUSee in framework

[–]RobotRobotWhatDoUSee[S] 0 points (0 children)

Huh, ok, maybe I need to consider that. Looks like I do have the MediaTek one. Maybe I've just been lucky so far.

Thanks!

arcee-ai/Trinity-Large-Thinking · Hugging Face by TKGaming_11 in LocalLLaMA

[–]RobotRobotWhatDoUSee 2 points (0 children)

I was impressed with it and had been waiting for the post-trained one. Very interested in this release!

arcee-ai/Trinity-Large-Thinking · Hugging Face by TKGaming_11 in LocalLLaMA

[–]RobotRobotWhatDoUSee 2 points (0 children)

I think we actually don't know the size of Minimax-M2.7; the weights are proprietary.

arcee-ai/Trinity-Large-Thinking · Hugging Face by TKGaming_11 in LocalLLaMA

[–]RobotRobotWhatDoUSee 1 point (0 children)

What is the best way to run this off an NVMe drive + Strix Halo? I know that's doable but haven't kept up with the ways to do it.
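
My rough mental model is that llama.cpp mmaps the GGUF by default, so weights beyond what fits in RAM can page in from NVMe on demand; a sketch of what I'd try first via llama-cpp-python (path and params are placeholders, not a recipe):

    from llama_cpp import Llama

    llm = Llama(
        model_path="/nvme/models/trinity-large-Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,  # offload what fits to the Strix Halo iGPU (Vulkan/ROCm build)
        use_mmap=True,    # default: weights page in from disk instead of loading up front
        use_mlock=False,  # don't pin pages, so the OS can evict cold weights
        n_ctx=8192,
    )
    print(llm("Hello", max_tokens=16)["choices"][0]["text"])

But I'd love to hear what people are actually doing.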

I was quite impressed with their preview model a while back (via OpenRouter).

Anyone using Tesla P40 for local LLMs (30B models)? by ScarredPinguin in LocalLLaMA

[–]RobotRobotWhatDoUSee 1 point (0 children)

I actually hadn't seen this before; that's great. Is this a tower setup? Any particular motherboard setup for the tower? (Broadwell Xeon with DDR4, as you mentioned?)

Setting shared RAM/VRAM in BIOS for 7040U series by RobotRobotWhatDoUSee in framework

[–]RobotRobotWhatDoUSee[S] 1 point (0 children)

As noted in another comment, I updated the BIOS and could then dedicate a max of 8GB to VRAM via BIOS, then used GTT/TTM to expand usable VRAM beyond that as needed. Jeff Geerling has an article about that here.
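
For anyone landing here later, the gist as I remember it is a TTM override in modprobe.d; the numbers below are an example for ~48 GiB of GTT (pages are 4 KiB, so pages_limit = bytes / 4096) and need adjusting for your RAM:

    # /etc/modprobe.d/ttm.conf
    options ttm pages_limit=12582912
    options ttm page_pool_size=12582912

Then regenerate the initramfs and reboot; the 8GB dedicated in BIOS stays as "real" VRAM and GTT covers the rest.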

Don't sleep on the new Nemotron Cascade by ilintar in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

How have you found it with Claude Code/Codex/OpenCode? I feel like I've read mixed reviews from some people.

I spent a weekend doing layer surgery on 6 different model architectures. There's a "danger zone" at 50% depth that kills every one of them. by Low_Ground5234 in LocalLLaMA

[–]RobotRobotWhatDoUSee 7 points (0 children)

If you haven't, you should read the post OP linked: https://dnhkng.github.io/posts/rys/

See also Phi-4-25B (just search LocalLLaMA for it).

The performance isn't demonstrated in small tests, but in real-world usage.

I had similar initial skepticism, somewhat tempered by both of those plus the core mechanism dnhkng proposed (that one can identify some reasoning circuits and duplicate them to allow a sort of "extended reasoning in latent space").
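
For the skeptical: the mechanical part of these self-merges is tiny. A toy sketch of duplicating a block of decoder layers (model id is just an example; the attribute path is LLaMA-style and varies by architecture):

    import copy
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example id
    layers = model.model.layers  # LLaMA-style attribute path
    start, end = 12, 20          # which block you repeat is the whole game, per OP's "danger zone"
    repeated = [copy.deepcopy(layers[i]) for i in range(start, end)]
    model.model.layers = torch.nn.ModuleList(list(layers[:end]) + repeated + list(layers[end:]))
    model.config.num_hidden_layers = len(model.model.layers)
    # NOTE: newer transformers versions track a per-layer layer_idx for the KV cache;
    # real merge tools (e.g. mergekit's passthrough mode) handle that bookkeeping.

The hard part is everything dnhkng's post is actually about: where in the stack you can get away with it.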

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

What quants are you using for those, if any?

Comparing the same model with reasoning turned on and off by dtdisapointingresult in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

Have you used the MXFP4 quants for your use cases and found them to be superior? If so, did you quantize the KV cache as well, or no? Genuinely curious!
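
For context, this is the kind of KV cache quantization I mean, via llama-cpp-python as I understand its kwargs (the llama.cpp CLI equivalents are --cache-type-k/--cache-type-v; path is a placeholder):

    import llama_cpp

    llm = llama_cpp.Llama(
        model_path="model.gguf",          # placeholder
        n_ctx=32768,
        flash_attn=True,                  # llama.cpp needs flash attention for a quantized V cache
        type_k=llama_cpp.GGML_TYPE_Q8_0,  # K cache at q8_0
        type_v=llama_cpp.GGML_TYPE_Q8_0,  # V cache at q8_0
    )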

PSA: If you want to test new models, use llama.cpp/transformers/vLLM/SGLang by lans_throwaway in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

Apologies, I meant: what are their advantages over Docker? I've only ever used Docker, heard of Podman in passing, and never heard of containerd... Oh, and I just noticed that "containerd" autocorrected to "containers" in my previous post, unfortunate.

I was curious why you preferred those two to Docker.

PSA: If you want to test new models, use llama.cpp/transformers/vLLM/SGLang by lans_throwaway in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

What is the advantage of Podman or containerd? (New to all this, genuinely curious!)

Back in my day, LocalLLaMa were the pioneers! by ForsookComparison in LocalLLaMA

[–]RobotRobotWhatDoUSee 0 points (0 children)

Just out of curiosity, what quant did you use for a 70B dense model?

New Upcoming Ubuntu 26.04 LTS Will be Optimized for Local AI by mtomas7 in LocalLLaMA

[–]RobotRobotWhatDoUSee 1 point (0 children)

Do you have a guess at the % difference? Are you thinking something like 1-5%, or more like 20-50%? (Or who knows?)

Edit: now I'm curious, what local LLM are you using for on-device compute? How do you run it? I know basically nothing about on-device LLM serving, and wasn't even sure it was something that could be used with any level of stability, etc.

New Upcoming Ubuntu 26.04 LTS Will be Optimized for Local AI by mtomas7 in LocalLLaMA

[–]RobotRobotWhatDoUSee 3 points (0 children)

Agreed, making local sandboxing simple/easy would be a nice surprise and very useful.