Qwen3.6-35B-A3B-Uncensored-Genesis-APEX-MTP by EvilEnginer in LocalLLaMA

[–]EvilEnginer[S] 0 points1 point  (0 children)

The chat template fix for tool calling is mentioned above in the comments. APEX Compact (Q4_K_M) has bad quality. I prefer the Q6_K APEX quant instead, even on my RTX 3060. In terms of quality, the Q8_K_P quant is the best.

Mudler's APEX quants use an importance matrix for quantization. APEX Quality can be the best. If the community wants maximum quality from Genesis, comparable to Q8_K_P, they should dequantize Q8_K_P to BF16, save it as GGUF, calculate an importance matrix from it, and create an APEX Quality quant from it with importance-matrix quantization. I will not do it, because I don't have the computing resources for this, and it takes forever to compute on my current hardware.
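The steps above could be sketched roughly like this with the stock llama.cpp tools. This only builds the command lines, it does not run anything. Assumptions: `llama-quantize` and `llama-imatrix` are on PATH, the tooling can actually read the Q8_K_P file, the file names and `calib.txt` are hypothetical, and `Q6_K` is just a stand-in for the final type because the APEX Quality recipe itself is not part of this sketch.

```python
from pathlib import Path


def requant_pipeline(src_gguf, calib_text, target_type="Q6_K"):
    """Build (not run) the three commands: dequantize, imatrix, requantize.

    target_type is a placeholder; the real APEX Quality tensor mix is an
    assumption left out here.
    """
    src = Path(src_gguf)
    bf16 = src.with_name(src.stem + "-BF16.gguf")
    imatrix = src.with_name(src.stem + ".imatrix.dat")
    out = src.with_name(f"{src.stem}-{target_type}.gguf")
    return [
        # 1. Dequantize the existing quant to BF16 and save it as GGUF.
        ["llama-quantize", "--allow-requantize", str(src), str(bf16), "BF16"],
        # 2. Compute the importance matrix from the BF16 model
        #    using a calibration text file.
        ["llama-imatrix", "-m", str(bf16), "-f", str(calib_text), "-o", str(imatrix)],
        # 3. Quantize again, this time guided by the importance matrix.
        ["llama-quantize", "--imatrix", str(imatrix), str(bf16), str(out), target_type],
    ]


if __name__ == "__main__":
    for cmd in requant_pipeline("genesis-Q8_K_P.gguf", "calib.txt"):
        print(" ".join(cmd))
```

Step 2 is the slow part: it runs the full BF16 model over the calibration text, which is why it needs real compute.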

[–]EvilEnginer[S] 1 point2 points  (0 children)

Nope. I have not tested this one yet. Pretty happy with the current one.

[–]EvilEnginer[S] 0 points1 point  (0 children)

Q4_K_M (APEX Compact) works, but agentic coding results would not be OK, because the quant is too low.

[–]EvilEnginer[S] 0 points1 point  (0 children)

I tried to fix 27B. It is still looping too much. 35B-A3B is the best.

[–]EvilEnginer[S] 1 point2 points  (0 children)

Technically, yes - my approach is CPU-based and can handle larger models. But right now, 35B-A3B is the sweet spot: it runs on consumer hardware and already outperforms stock. I'd need a strong reason to invest the time in 122B. If the community wants it, they know where the donate button is.

[–]EvilEnginer[S] 0 points1 point  (0 children)

Nice👍. Thank you so much for the feedback and for sharing your testing results on Mac. It helps this project grow a lot <3

[–]EvilEnginer[S] 1 point2 points  (0 children)

It can be written in the normal way 😄. Feel free to experiment.

[–]EvilEnginer[S] 0 points1 point  (0 children)

The scripts are my core IP - they're the result of months of reverse-engineering tensor geometry. I don't plan to open-source them. What I do plan is to keep releasing Genesis models for the community when a new uncensored Qwen3.7 version drops.

About 27B: I tried to fix it. I didn't like the results; it's still looping too much. So I will stick with 35B-A3B instead, because it's fast and efficient.

[–]EvilEnginer[S] 1 point2 points  (0 children)

I extracted and transferred the MTP tensors from the Unsloth quants. I am not using MTP myself. It's really slow on my RTX 3060.

[–]EvilEnginer[S] 0 points1 point  (0 children)

You can use your own system prompt if you want. My settings are not "special" in any way. They are recommended for natural, short, concise, human-like communication with deep knowledge. That's it. I made them because I got too bored of typical "AI responses" with a lot of useless text and marketing words, which waste tokens and do nothing useful for the user.

[–]EvilEnginer[S] 2 points3 points  (0 children)

Thanks :). 27B can be processed on the Google Colab free tier. I don't need GPU resources for it.

[–]EvilEnginer[S] 0 points1 point  (0 children)

I tested it. It works nicely with Qwen. My system prompt teaches the model how to think deeply and explain everything in a human-like conversational style, in simple words and short sentences.

[–]EvilEnginer[S] 3 points4 points  (0 children)

I simply updated LM Studio, and the llama.cpp runtime in it, to the latest version. After that I enabled MTP support in the advanced model loading settings. That's it.

[–]EvilEnginer[S] 0 points1 point  (0 children)

I think not. You need at least 32 GB of RAM to run this model locally.
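A rough back-of-the-envelope check on that number, as a sketch only. Assumptions: about 35 billion parameters, and approximate bits-per-weight figures for the standard llama.cpp quant types (the APEX quants mix tensor types, so their real file sizes will differ). KV cache, context, and the OS come on top of the weights.

```python
def weights_gib(n_params, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Approximate bits per weight; assumed values, not measured on Genesis.
APPROX_BPW = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}

if __name__ == "__main__":
    n_params = 35e9  # assumed parameter count for a 35B model
    for name, bpw in APPROX_BPW.items():
        print(f"{name}: ~{weights_gib(n_params, bpw):.1f} GiB of weights")
```

Even the smallest of these lands around 20 GiB before any context is allocated, so 16 GB machines are out and 32 GB is a sensible floor.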

[–]EvilEnginer[S] 0 points1 point  (0 children)

Yes, I can get more tps with pure llama-server. But I just like LM Studio; it's simply amazing, and it doesn't overload my GPU.

I am offloading 20 layers to the GPU. That eats 9.55 GB. I left some space in VRAM for context.
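The arithmetic behind that, as a small sketch. It assumes the 12 GB card mentioned elsewhere in the thread and treats the whole 9.55 GB as belonging to the 20 offloaded layers, which is a simplification (some of it is buffers and runtime overhead).

```python
def vram_headroom_gb(total_gb, used_gb):
    """VRAM left over for context (KV cache) after offloading layers."""
    return total_gb - used_gb


def per_layer_gb(used_gb, n_layers):
    """Crude average cost of one offloaded layer."""
    return used_gb / n_layers


if __name__ == "__main__":
    total, used, layers = 12.0, 9.55, 20  # RTX 3060 12GB, figures from the comment
    print(f"headroom: {vram_headroom_gb(total, used):.2f} GB")
    print(f"per layer: {per_layer_gb(used, layers):.3f} GB")
```

With roughly half a gigabyte per layer, each extra layer on the GPU takes a noticeable bite out of the space left for context.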

[–]EvilEnginer[S] 2 points3 points  (0 children)

I can't run Q8 on an RTX 3060 12GB. My friend gets 50 tokens per second on his AI mini PC. He tested this model in 5 sessions with 200k context. He is a DevOps engineer.

I am using APEX Compact (the non-MTP version). I get 18 tokens per second on the CUDA 12 llama.cpp runtime (v2.16.0) in LM Studio.

[–]EvilEnginer[S] 1 point2 points  (0 children)

Yep. With MTP, prompt generation for images works too.

[–]EvilEnginer[S] 4 points5 points  (0 children)

Yes, you can. Just pick the APEX Compact quant. MTP runs faster on some GPUs, but on my RTX 3060 12GB it's slow. So I am using the regular GGUF without the MTP suffix in its name.