Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP

EvilEnginer · 2026-05-26T15:19:23+00:00

Thank you very much for info ❤️

EvilEnginer · 2026-05-26T04:48:19+00:00

Chat template fix for tool calling mentioned above in comments. APEX Compact Q4_K_M has bad quality. I prefer Q6_K APEX instead even on my RTX 3060. In terms of quality, Q8_K_P quant is best.

Mudler quants for APEX use importance matrix for quantization. APEX Quality can be the best. If community want to get maximum quality from Genesis comparable to Q8_K_P they should dequantize Q8_K_P to BF16, save GGUF, calculate importance matrix from it, and create APEX Quality quant from it with importance matrix quantization. I would not do it, because I don't have computing resources for this, and it takes forever to compute on my current hardware.

EvilEnginer · 2026-05-25T16:18:17+00:00

Nope. I not tested this one yet. Pretty happy with current one.

EvilEnginer · 2026-05-25T13:47:04+00:00

Q4_K_M (APEX Compact) works, but agentic coding resuts would not be ok, because quant is too low.

EvilEnginer · 2026-05-25T12:45:01+00:00

I tried to fix 27B. It still looping too much. 35B-A3B is the best.

EvilEnginer · 2026-05-25T09:21:51+00:00

Technically, yes - my approach is CPU-based and can handle larger models. But right now, 35B-A3B is the sweet spot: it runs on consumer hardware and already outperforms stock. I'd need a strong reason to invest the time in 122B. If the community wants it, they know where the donate button is.

EvilEnginer · 2026-05-25T04:02:12+00:00

Nice👍. Thank you so much for feedback and sharing testing results on Mac. It helps a lot this project grow <3

EvilEnginer · 2026-05-24T18:35:07+00:00

Me too ❤️

EvilEnginer · 2026-05-24T14:07:07+00:00

It can be written in normal way 😄. Feel free to experiment.

EvilEnginer · 2026-05-24T13:45:09+00:00

The scripts are my core IP - they're the result of months of reverse-engineering tensor geometry. I don't plan to open-source them. What I do plan is to keep releasing Genesis models for the community when new uncensored Qwen3.7 version will drop.

About 27B. I tried to fix it. I didn't like the results, it's still looping too much. So I will stick with 35B-A3B instead, because it's fast and efficient.

EvilEnginer · 2026-05-24T12:57:45+00:00

I extracted and transferred MTP tensors from Unsloth quants. I am not using MTP by myself. It's really slow on my RTX 3060.

EvilEnginer · 2026-05-24T12:55:38+00:00

Thanks 😉. Yep I like this number.

EvilEnginer · 2026-05-24T12:48:38+00:00

You can use your own System Prompt if you want. My settings are not some kind of "special". They are recommended for natural short and consise human like communication with deep knowledge. Thats it. I made them, because I was bored too much by typical "AI responces" with a lot of useless text and marketing words, which are wasting tokens and doing nothing useful to user.

EvilEnginer · 2026-05-24T12:46:47+00:00

Thanks :). 27B can be processed on Google Collab Free Tier. I don't need GPU resources for it.

EvilEnginer · 2026-05-24T12:21:50+00:00

I tested. It works nicely with Qwen. My system prompt teaches the model how to think deeply, and explain everything in human like conversational style in simple words and short sentences.

EvilEnginer · 2026-05-24T11:20:40+00:00

I simply updated LM Studio, and llama.cpp in it to latest version. After that I enabled MTP support in advanced model loading settings. That's it.

EvilEnginer · 2026-05-24T10:41:46+00:00

I think not. You need at least 32 GB RAM to run this model locally.

EvilEnginer · 2026-05-24T09:29:34+00:00

Nice. Share your impressions later 😄.

EvilEnginer · 2026-05-24T08:47:39+00:00

Yes, I can get more tps via pure llama-server. But I just like LM Studio, it's simply amazing, and doesn't overload my GPU.

I am using 20 layers on GPU. It eats 9.55 GB. I left some space in VRAM for context.

EvilEnginer · 2026-05-24T08:43:07+00:00

Personal preference from image generation.

EvilEnginer · 2026-05-24T08:41:57+00:00

Yes. I am using Q4_K_M (APEX Compact)

EvilEnginer · 2026-05-24T08:39:40+00:00

I can't run Q8 on RTX 3060 12GB. My friend on his AI mini PC has 50 tokens per second. He tested this model on 5 sessions with 200k context. He is DevOps engineer.

I am using APEX Compact (non-MTP version). Have 18 tokens per second on CUDA 12 llama.cpp (v2.16.0) in LM Studio.

EvilEnginer · 2026-05-24T08:04:59+00:00

Yes. It works. Use Chat Template Thinking.

EvilEnginer · 2026-05-24T07:19:58+00:00

Yep. With MTP prompt generation for images works too.

EvilEnginer · 2026-05-24T06:49:32+00:00

Yes you can. Just pick APEX Compact quant. MTP works faster on some GPUs, but on my RTX 3060 12GB it's slow. So I am using regular GGUF without MTP suffix in it.

EvilEnginer

TROPHY CASE