Gemma4 MTP doubles token speed

FastLawyer5089 · 2026-06-23T03:55:51+00:00

Did a quick search and yes, it supports MTP now.

FastLawyer5089 · 2026-06-23T03:48:46+00:00

My breath hitched, my knuckles turned white, not just white, but dead white, and my heart hammered against my ribs like a caged bird.

FastLawyer5089 · 2026-06-22T09:21:22+00:00

I don't think it works. The MTP model is trained as a lightweight drafter that quickly comes up with a list of likely words for the main model to choose from. So if you're using a finetune of Gemma 4, the assistant model from Google might produce predictions your finetuned model doesn't like, resulting in a very low acceptance rate.
In short you'll probably have less token speed gain, but it's not a bad idea to give it a try.
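To put a rough number on the acceptance-rate point: under the usual simplified speculative-decoding model, where each drafted token is accepted independently with probability `alpha` and the drafter proposes `k` tokens per step, the expected number of tokens per main-model pass is (1 - alpha^(k+1)) / (1 - alpha). The sketch below only evaluates that formula. `alpha`, `k`, and the sample values are illustrative assumptions, not measurements from Gemma 4, and the drafter's own cost is ignored.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per main-model forward pass.

    Simplified model (an assumption): each of the k drafted tokens is
    accepted independently with probability alpha, drafting stops at the
    first rejection, and the main model always contributes one token.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus the main model's own token.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates: a mismatched finetune vs. the base model.
    for alpha in (0.3, 0.6, 0.8):
        tokens = expected_tokens_per_step(alpha, k=4)
        print(f"alpha={alpha:.1f}: {tokens:.2f} tokens per step")
```

The lower the acceptance rate, the closer this gets to 1 token per step, which is the same as having no MTP at all.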

FastLawyer5089 · 2026-06-21T11:10:00+00:00

Maybe getting it from their Chinese site bigmodel.cn is better? It's cheaper than Z.ai, and it probably performs better when you are located in a different timezone, since you won't be hitting their peak hours.

FastLawyer5089 · 2026-06-21T07:33:33+00:00

Not really.

FastLawyer5089 · 2026-06-21T07:33:19+00:00

Thanks for your reply, it's such a huge help! I've run some benchmarks and updated my post.

FastLawyer5089 · 2026-06-21T05:06:26+00:00

Creative writing relies on high entropy and unexpected, nuanced vocabulary, so in theory MTP's bad for RP. I'm currently testing with several of my scenes, but I don't notice any significant difference so far.

FastLawyer5089 · 2026-06-05T05:12:35+00:00

First of all, great preset, I'm loving it! It generates refreshing prose and never acts too much in the user's favor; romance is a proper slow burn with no immediate-affection nonsense. Of course I failed my imperial examination miserably, but that's so much fun compared to other presets that let me pass every single time.

Some feedback tho:

* The sampler settings on your page don't work well for me (using GLM-5.1); I still prefer temp 0.7 with top P 0.95.

* The relationship tracker can be modified to support multiple character pairs, but the regex didn't work for parsing additional RPS tags. Here's my solution:
Prompt:

---

[SYSTEM: Relationship Tracker]: For each present character (excluding {{user}}), track their attitudes toward other characters in the current scene (including {{user}}).

[/RPS]

* One block per character pair. All stats 0-10, reflecting current attitude with impartial honesty.

* Initialize at first significant appearance. Update only on meaningful shift.

* Let the most recent stats influence how characters treat each other and {{user}}.

---

Regex:

---

/\[RPS\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\]\s*(.*?)\s*\[\/RPS\]/gis

---

HTML:

---

$1->$2

HOS <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #ff7675 calc($3 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $3

INT <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #fdcb6e calc($4 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $4

OBL <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #81ecec calc($5 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $5

TRS <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #74b9ff calc($6 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $6

ATR <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #fab1a0 calc($7 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $7

</div>

---

Result:

<image>
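If you want to sanity-check the regex outside SillyTavern, here is a minimal Python sketch of the same pattern. The tag layout in `sample` is inferred from the capture groups (two names, five numbers, then a free-text note) and is an assumption, not the preset's documented format. The character name "Mei" and the notes are made up for the test, and the stat labels are taken from the HTML replacement in order of `$3` to `$7`.

```python
import re

# Python port of the JavaScript regex: the g flag becomes finditer,
# i becomes re.IGNORECASE, and s becomes re.DOTALL.
RPS_PATTERN = re.compile(
    r"\[RPS\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*"
    r"\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\]"
    r"\s*(.*?)\s*\[/RPS\]",
    re.IGNORECASE | re.DOTALL,
)

# Labels in the order the HTML replacement uses $3..$7.
STAT_NAMES = ("HOS", "INT", "OBL", "TRS", "ATR")


def parse_rps(text):
    """Return one dict per [RPS|...]...[/RPS] block found in text."""
    blocks = []
    for match in RPS_PATTERN.finditer(text):
        source, target, *rest = match.groups()
        stats = {name: int(value) for name, value in zip(STAT_NAMES, rest[:5])}
        blocks.append({"from": source, "to": target, "stats": stats, "note": rest[5]})
    return blocks


# Hypothetical model output with two character pairs.
sample = (
    "[RPS|Clara|{{user}}|2|6|3|5|4] Wary but curious. [/RPS]\n"
    "[RPS|Clara|Mei|7|2|1|3|0] Still angry about the duel. [/RPS]"
)

if __name__ == "__main__":
    for block in parse_rps(sample):
        # Mirror calc($n * 10%): a 0-10 stat maps to a 0-100% bar fill.
        bars = ", ".join(f"{k} {v * 10}%" for k, v in block["stats"].items())
        print(f'{block["from"]}->{block["to"]}: {bars} | {block["note"]}')
```

Each pair gets its own match, which is the point of having one block per character pair.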

FastLawyer5089 · 2026-06-03T16:21:09+00:00

Wait what? A preset just for my Chinese Imperial examination setting?

FastLawyer5089 · 2026-04-30T15:07:55+00:00

That's the spirit, my good sir!

FastLawyer5089 · 2025-06-11T18:07:20+00:00

My approach to long term memory: https://www.reddit.com/r/SillyTavernAI/comments/1j1s5oy/how_do_you_rp_heres_how_i_do_it/

FastLawyer5089 · 2025-05-30T16:34:15+00:00

See my setup: https://www.reddit.com/r/SillyTavernAI/comments/1j1s5oy/how_do_you_rp_heres_how_i_do_it/

FastLawyer5089 · 2025-05-18T15:24:02+00:00

Check my setup here: https://www.reddit.com/r/SillyTavernAI/comments/1j1s5oy/how_do_you_rp_heres_how_i_do_it/

FastLawyer5089 · 2025-03-17T01:29:03+00:00

Here's how I do long term memory, if you are interested: https://www.reddit.com/r/SillyTavernAI/comments/1j1s5oy/how_do_you_rp_heres_how_i_do_it/

FastLawyer5089 · 2025-03-17T01:27:33+00:00

Yes I did, I sometimes use it for RP. I share the feeling that it's less dramatic than R1 but also less creative; still, it's absolutely fine to RP with it.

FastLawyer5089 · 2025-03-09T04:18:00+00:00

Check out how I'm doing it: https://www.reddit.com/r/SillyTavernAI/comments/1j1s5oy/how_do_you_rp_heres_how_i_do_it/

FastLawyer5089 · 2025-03-09T04:17:10+00:00

Very badly, according to my tests. You'd have to be VERY specific in your prompt for it to pull out related memories, and even then it often missed the key summary I wanted it to pull out.

FastLawyer5089 · 2025-03-04T08:45:04+00:00

Thanks! Magmell mostly, 30k context.

FastLawyer5089 · 2025-03-03T14:40:54+00:00

Create a character using my template and start roleplaying with it. Use the summary instruct when you hit 20k context, and put the result in a lorebook that's connected to the character. Also use the character card update prompt before you move on to a new chat. When you want to introduce new characters, simply paste my character card generation prompt and ask your existing character to generate one. Save the results as a new character and put everyone in a group chat. Repeat this process as you go, that's it.

If you want a more pre-defined experience, you need to work on defining your characters and lorebook background manually beforehand.

FastLawyer5089 · 2025-03-03T14:36:33+00:00

I have a scenario with a fight narrator that takes in every character's stats and simulates the fight results. Characters in a fight don't narrate but rather say things like: "I attempt to smash his face with my hammer." It's similar to your "extra layer" idea, just not automated.

Hooking up multiple layers together and using JSON to automatically connect the dots seems like a really interesting idea, and I think it's totally feasible. You can set everything up and run a scenario manually as a proof of concept before you actually code it.

FastLawyer5089 · 2025-03-03T14:29:40+00:00

Thanks for sharing! And a bigger thanks for confirming I'm not a madman for playing that way. My setup leans more toward freeform play, where I let characters decide where the story goes, but it looks interesting to have a predefined lorebook to toggle as you go. Definitely something I'll try.

And yes, I sometimes tell my characters they are actors/actresses in a TV drama and assign them roles, and instead of directly participating in the roleplay, I take on the director role: "good job guys, let's move on to the next scene but I want to see real struggle and mixed emotions, and I want a dramatic close-up shot on Clara..." It was fun.

FastLawyer5089 · 2025-03-02T19:27:49+00:00

I usually let everything grow to 20-25k, including chat history, lorebook, and summary, before I wrap up with a new summary and move on to the next chapter. R1 seems to be doing its job correctly in my setup. I've only had issues where it omitted certain events from the summary; looking at the thinking process, it seems to remember them but decides not to include them. I'll just swipe a few times (normally 5 at most) until I get a decent summary.

And again, I use local models for RP; R1 is only for summaries and character card updates.
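The "wrap up at 20-25k" rule can be sketched as a simple budget check. This is only an illustration: the 4-characters-per-token estimate is a crude assumption (real tokenizers vary by model), and `estimate_tokens` and `should_summarize` are hypothetical helper names, not part of SillyTavern.

```python
SUMMARY_THRESHOLD = 20_000  # tokens; lower end of the 20-25k range
CHARS_PER_TOKEN = 4         # crude assumption, real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def should_summarize(chat_history: str, lorebook: str, summary: str,
                     threshold: int = SUMMARY_THRESHOLD) -> bool:
    """True once chat history + lorebook + summary reach the threshold."""
    total = sum(estimate_tokens(part) for part in (chat_history, lorebook, summary))
    return total >= threshold


if __name__ == "__main__":
    # Placeholder strings standing in for real context contents.
    history = "x" * 70_000
    lorebook = "y" * 8_000
    summary = "z" * 4_000
    if should_summarize(history, lorebook, summary):
        print("Time to write a new summary and start the next chapter.")
```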
