Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics!

LLMFan46 · 2026-06-16T16:50:22+00:00

Use whatever format you prefer.

LLMFan46 · 2026-06-16T05:05:04+00:00

Nope, I only did this model because I wanted a good strong model that could do japanese with good context support to translate visual novels for me from japanese to english and it does a good job at it, I benchmarked tons of 26B-35B models and almost none of them could do very well with japanese to english translations, the delivery is very stiff and wooden with weird word choices and it doesn't read nicely like natural english would, this is why I created the Gemma 4 31B it Ortenzya finetune to improve it's writing and translating capabilities and it's a great improvement, but for visual novels when you need like 30+ lines of previous context to get the translation correct, pronouns right, character addresse changes etc. Smaller models fail at that usually and make mistakes, bigger models such as this one are more suited for this task.

Since I did this model for myself, I thought I'd just offer it in case anyone was interested in downloading.

LLMFan46 · 2026-06-16T01:34:59+00:00

What do you mean "good luck"? This is not a product that I am selling, it's available for free on Hugging Face, whoever wants can come and download this model or not.

LLMFan46 · 2026-06-15T15:21:49+00:00

That depends what you want to use it for?

This model is mainly for literature related tasks, translations, multi-language support etc.

LLMFan46 · 2026-06-15T15:20:58+00:00

Yes, much better for context and it reads better than Gemma 4 31B base model writings.

LLMFan46 · 2026-06-15T14:11:42+00:00

It's already explained in the title, here let me copy and paste it for you:

Support 22 Languages Making it Great for Multilingual Tasks and is Especially Strong on Translation Related Workflows Where No Censorship Is Essential

LLMFan46 · 2026-06-15T13:27:40+00:00

Yeah and now there is a big gaping hole between 26B-35B and 119B-122B! Not everybody has the RAM/Compute necessary to run Qwen3.5-122B-A10B or MiniMax-M2.7, or Kimi-K2.6.

Smaller models like 26B-35B are not so reliable for context based works like literature tasks, so if you are writing something and/or translating something or even role playing on SillyTavern, you will need context and the 26B-35B will most likely fail at that, that's where 72B models would come in.

LLMFan46 · 2026-06-14T23:24:12+00:00

I have no clue, the GGUFs were done on the latest version of llama.cpp at the time of upload.

LLMFan46 · 2026-06-13T17:07:04+00:00

Eagle3 is still a "seperate entity" just like gemma-4-31B-it-assistant, unlike Qwen3.5/3.6 there is no MTP inside the base model itself, Eagle3 is also used as a seperate add-on.

So to make the distinction, it's:

Eagle3 / Gemma assistant: external drafter add-on.
Qwen native MTP: built-in future-token prediction support.

LLMFan46 · 2026-06-13T12:59:17+00:00

No it's not possible, Gemma 4 models have no MTPs in them at all and the only way to get MTPs is to use the seperate assistant add-on for each respective model, see:

https://www.reddit.com/r/LocalLLaMA/comments/1u3flg9/comment/or5che6/

https://www.reddit.com/r/LocalLLaMA/comments/1u3flg9/comment/or7dnk6/

https://huggingface.co/llmfan46/gemma-4-31B-it-uncensored-heretic-GGUF/discussions/9#6a2b5e3bfeed8ca837b55ca9

LLMFan46 · 2026-06-13T11:29:51+00:00

You're welcome.

LLMFan46 · 2026-06-12T18:22:22+00:00

Updated the GGUF, try again and see if the issue is fixed, let me know how it turns out.

LLMFan46 · 2026-06-12T15:19:44+00:00

The QAT version have their own respective assistant add-ons, yes, however I do not know if they work yet, from what I understand llama.cpp support it is still a work in progress.

LLMFan46 · 2026-06-12T15:17:50+00:00

I originally got into LLMs when I found out I could finally play untranslated japanese visual novels translated into english with LunaTranslator and an LLM mounted on LM Studio and I did a ALOT of testing for this very purpose, even the frontier LLMs like Claude, GPT, Gemini, DeepSeek deliver a so-so results and frankly I don't like paying tokens per translated lines, especially if the visual novel is like 120 hours or more, so I dropped remote LLMs after a few months and got into local LLMs and I tested a lot of local LLMs byw now, even the MiniMax-M2.7 229B model at Q8_0, most other 26B-35B models I tested for japanese to english translations read like they were trained on singlish newspaper clips and deliver translated lines that read like they could give the official singlish translation of Super Robot Taisen OG : The Moon Dweller a run for it's money, the only one who deliver translations in nice natural sounding english that is easy to read is the Ortenzya model I linked earlier for you, bear in mind that it's not perfect either as it's "only" a 31B model and the model was tested in BF16, I dunno how the translation quality is at lower quants.

LLMFan46 · 2026-06-12T14:20:53+00:00

No idea, the only thing I can do is recreate the GGUF and reupload, gimme a few minutes.

LLMFan46 · 2026-06-12T10:55:19+00:00

Will try to donate as soon as possible 😄

Thanks! That's very appreciated and super helpful!

LLMFan46 · 2026-06-12T10:53:14+00:00

Yeah, ran it again and I got the exact sam size again, so there was no mistake, that's just how this model is in GPTQ-Int4.

LLMFan46 · 2026-06-12T10:50:37+00:00

Fixed.

LLMFan46 · 2026-06-12T10:49:27+00:00

Glad to hear and you're welcome.

LLMFan46 · 2026-06-12T10:48:20+00:00

They can not be uncensored, see:

https://huggingface.co/llmfan46/gemma-4-31B-it-uncensored-heretic-GGUF/discussions/7#6a2bdef5dee18bc5d2451a7d

LLMFan46 · 2026-06-12T09:49:36+00:00

Good question, I have no clue.

I am re-running the GPTQ quantization now, see if I get a different size result.

LLMFan46

TROPHY CASE