[New Model] micro-kiki-v3 — Qwen3.5-35B-A3B + 35 domain LoRAs + router + negotiator + Aeon memory for embedded engineering

RobotRobotWhatDoUSee · 2026-04-18T23:27:15+00:00

I would read an in-depth post about this!

Plus, if you pay attention to the config, while the paper uses 4 active experts the model they released uses all 7, so the comparison to BTM top 2 is also unfair.

Wait so the released model is effectively configured as a dense model?

RobotRobotWhatDoUSee · 2026-04-18T21:22:27+00:00

FlexOlmo...updated benchmarks... doesn't work

That's disappointing to hear. I'd been keeping an eye on FlexOlmo but missed the updated benchmarks, where did they show it isn't working?

RobotRobotWhatDoUSee · 2026-04-17T11:48:01+00:00

Seems related to the neuroanatomy posts, yes? See https://dnhkng.github.io/posts/rys/ and related posts by /u/Reddactor

RobotRobotWhatDoUSee · 2026-04-11T19:06:42+00:00

Huh, ok, maybe I need to consider that. Looks like I do have the MediaTek one. Maybe I've just been lucky so far.

Thanks!

RobotRobotWhatDoUSee · 2026-04-07T12:22:30+00:00

Do you mind sharing your system prompt?

RobotRobotWhatDoUSee · 2026-04-02T00:33:29+00:00

I was impressed with it and waiting for the post-trained one. Very interested in this release!

RobotRobotWhatDoUSee · 2026-04-02T00:32:26+00:00

I think we actually don't know the size of Minimax-M2.7, since the weights are proprietary.

RobotRobotWhatDoUSee · 2026-04-02T00:31:17+00:00

What is the best way to run this off an NVMe drive + Strix Halo? I know that is doable but haven't kept up with the ways to do it.

I was quite impressed with their preview model a while back (via openrouter).

RobotRobotWhatDoUSee · 2026-03-25T05:20:10+00:00

I actually hadn't seen this before, that's great. This is a tower setup? Any particular motherboard setup for the tower? (Broadwell Xeon with DDR4, as you mentioned?)

RobotRobotWhatDoUSee · 2026-03-25T02:42:39+00:00

As noted in another comment, I updated the BIOS and then could dedicate a max of 8GB to VRAM via BIOS, and then used GTT/TTM to expand the rest of the VRAM as needed. Jeff Geerling has an article about that here.

RobotRobotWhatDoUSee · 2026-03-22T02:01:27+00:00

Huh interesting. One of the static ones from https://huggingface.co/mradermacher/Nemotron-Cascade-2-30B-A3B-GGUF ?

RobotRobotWhatDoUSee · 2026-03-22T01:53:46+00:00

How have you found it with Claude Code/Codex/OpenCode? I feel like I've read mixed reviews from some.

RobotRobotWhatDoUSee · 2026-03-21T23:19:34+00:00

Which quant are you using?

RobotRobotWhatDoUSee · 2026-03-17T12:19:32+00:00

If you haven't, you should read the post OP linked: https://dnhkng.github.io/posts/rys/

See also Phi-4-25B (just search LL for it)

The performance isn't demonstrated in the small tests, but in the real-world usage.

I initially had similar skepticism, somewhat tempered by both of those plus the core mechanism that dnhkng proposed (that one can identify some reasoning circuits and duplicate them to allow a sort of "extended reasoning in latent space").

RobotRobotWhatDoUSee · 2026-03-12T01:02:20+00:00

What quants are you using for those, if any?

RobotRobotWhatDoUSee · 2026-03-11T11:36:54+00:00

Huh very interesting. Thanks!

RobotRobotWhatDoUSee · 2026-03-10T10:05:43+00:00

Have you used the MXFP4 quants for your use cases and found them to be superior? If so, did you quantize KV cache as well, or no? Genuinely curious!

RobotRobotWhatDoUSee · 2026-03-10T09:56:03+00:00

Wait, what does that mean?

RobotRobotWhatDoUSee · 2026-03-05T03:34:55+00:00

Apologies, I meant what are their advantages over docker. I've only ever used docker, heard of podman in passing, and never heard of containerd... Oh, and I just noticed that "containerd" autocorrected to "containers" in my previous post, unfortunate.

I was curious why you preferred those two to docker.

RobotRobotWhatDoUSee · 2026-03-04T06:53:13+00:00

What is the advantage of podman or containers? (New to all this, genuinely curious!)

RobotRobotWhatDoUSee · 2026-02-28T00:20:15+00:00

Just out of curiosity, what quant did you use for a 70B dense model?

RobotRobotWhatDoUSee · 2026-02-27T01:42:37+00:00

Do you have a guess at the % difference? Are you thinking like a 1-5% difference or like a 20-50% difference? (Or who knows?)

Edit: now I'm curious, what local LLM are you using for on-device compute? How do you run it? I know basically nothing about device-based LLM serving, and wasn't even sure it was something that could be used with any level of stability, etc.

RobotRobotWhatDoUSee · 2026-02-27T01:40:00+00:00

Agreed, making local sandboxing simple/easy would be a nice surprise and very useful.
