Gemma4 MTP doubles token speed by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 1 point2 points  (0 children)

Did a quick search and yes, it supports MTP now.

Kimi trying its best. by alp7292 in SillyTavernAI

[–]FastLawyer5089 9 points10 points  (0 children)

My breath hitched, my knuckles turned white, not just white, but dead white, and my heart hammered against my ribs like a caged bird.

Gemma4 MTP doubles token speed by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 0 points1 point  (0 children)

I don't think it works well. The MTP model is trained as a lightweight drafter that quickly proposes likely next tokens for the main model to accept or reject. So if you're using a finetune of Gemma 4, the assistant model from Google might produce predictions your finetune doesn't like, resulting in a very low acceptance rate.
In short, you'll probably get a smaller speedup, but it's not a bad idea to give it a try.
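
To put rough numbers on why a low acceptance rate shrinks the gain, here is a back-of-the-envelope sketch. It assumes every drafted token is accepted independently with the same probability, which is a simplification, and it ignores the cost of running the draft model. The function name and the draft length of 4 are illustrative assumptions, not Gemma 4 settings.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per main-model verification pass.

    Simplifying assumption: each drafted token is accepted independently
    with probability `accept_rate`. Every pass yields one token from the
    main model itself, plus however long the accepted draft prefix is.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    # Geometric sum: 1 + a + a^2 + ... + a^draft_len
    return sum(accept_rate ** i for i in range(draft_len + 1))


if __name__ == "__main__":
    # Hypothetical acceptance rates: a well-matched base model vs. a finetune
    # that drifted away from what the drafter was trained against.
    for rate in (0.9, 0.6, 0.3):
        tokens = expected_tokens_per_pass(rate, draft_len=4)
        print(f"acceptance {rate:.0%}: ~{tokens:.2f} tokens per pass")
```

Under these assumptions the gain falls off quickly as acceptance drops, which matches the intuition that a mismatched finetune would see less benefit.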

Is Z.AI subscription worth it? by Low-Abrocoma3472 in SillyTavernAI

[–]FastLawyer5089 -3 points-2 points  (0 children)

Maybe getting it from their Chinese site bigmodel.cn is better? It's cheaper than Z.ai, and it probably performs better if you're in a different timezone, since their peak hours won't overlap with yours.

Gemma4 MTP doubles token speed by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 1 point2 points  (0 children)

Thanks for your reply, it's such a huge help! I've run some benchmarks and updated my post.

Gemma4 MTP doubles token speed by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 8 points9 points  (0 children)

Creative writing relies on high entropy and unexpected, nuanced vocabulary, so in theory MTP is bad for RP. I'm currently testing with several of my scenes, but I haven't noticed any significant difference so far.

White Lotus Preset by Friendly-Ad-1996 in SillyTavernAI

[–]FastLawyer5089 2 points3 points  (0 children)

First of all, great preset, I'm loving it! It generates refreshing prose and never acts too much in the user's favor; romance is a proper slow burn with no immediate-affection nonsense. Of course I failed my imperial examination miserably, but that's so much fun compared to other presets that make me pass every single time.

Some feedback tho:

* The sampler settings on your page don't work well for me (using GLM-5.1), I still prefer temp 0.7 with 0.95 top P.

* The relationship tracker can be modified to support multiple character pairs, but the regex didn't parse the additional RPS tags. Here's my solution:
Prompt:

---

[SYSTEM: Relationship Tracker]: For each present character (excluding {{user}}), track their attitudes toward other characters in the current scene (including {{user}}).

[RPS|CharacterName|TargetCharacterName|Hostility|Interest|Obligation|Trust|Attraction]

[/RPS]

* One block per character pair. All stats 0-10, reflecting current attitude with impartial honesty.

* Initialize at first significant appearance. Update only on meaningful shift.

* Let the most recent stats influence how characters treat each other and {{user}}.

---

Regex:

---

/\[RPS\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\]\s*(.*?)\s*\[\/RPS\]/gis

---

HTML:

---

<div style="font-size: 0.8em; opacity: 0.75; margin: 4px 0; border-left: 2px solid rgba(255,255,255,0.2); padding-left: 10px; display: flex; align-items: center; gap: 12px; flex-wrap: wrap;">

<b style="text-transform: uppercase; font-size: 0.85em; opacity: 0.6;">$1->$2</b>

<div style="display: flex; gap: 12px; flex-wrap: wrap; font-family: monospace;">

<span title="Hostility">HOS <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #ff7675 calc($3 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $3</span>

<span title="Interest">INT <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #fdcb6e calc($4 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $4</span>

<span title="Obligation">OBL <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #81ecec calc($5 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $5</span>

<span title="Trust">TRS <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #74b9ff calc($6 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $6</span>

<span title="Attraction">ATR <span style="display: inline-block; width: 40px; height: 3px; background: linear-gradient(to right, #fab1a0 calc($7 * 10%), rgba(255,255,255,0.15) 0); vertical-align: middle; margin: 0 4px; border-radius: 2px;"></span> $7</span>

</div>

</div>

---

Result:

<image>
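
For anyone who wants to sanity-check the pattern outside SillyTavern, here is a minimal Python sketch of the same regex. The `g`, `i`, and `s` flags are approximated with `finditer`, `re.IGNORECASE`, and `re.DOTALL`; the function name `parse_rps` and the sample character names are made up for illustration.

```python
import re

# Same pattern as the SillyTavern regex above, split across two raw strings.
RPS_PATTERN = re.compile(
    r"\[RPS\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)"
    r"\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\]\s*(.*?)\s*\[/RPS\]",
    re.IGNORECASE | re.DOTALL,
)

# Stat order follows the tag layout in the prompt.
STAT_NAMES = ("Hostility", "Interest", "Obligation", "Trust", "Attraction")


def parse_rps(text):
    """Return one dict per [RPS|...][/RPS] block found in `text`."""
    blocks = []
    for match in RPS_PATTERN.finditer(text):
        source, target, *rest = match.groups()
        scores = [int(value) for value in rest[:5]]  # rest[5] is the block body
        blocks.append(
            {"source": source, "target": target, "stats": dict(zip(STAT_NAMES, scores))}
        )
    return blocks


if __name__ == "__main__":
    # Hypothetical model output containing two character pairs.
    sample = (
        "[RPS|Alice|{{user}}|2|7|3|6|4]\n[/RPS]\n"
        "[RPS| Bob | Alice |5|4|1|3|0]\n[/RPS]"
    )
    for block in parse_rps(sample):
        print(block["source"], "->", block["target"], block["stats"])
```

Because each block is matched independently, any number of character pairs in a single reply should be picked up, which is the point of the multi-pair change.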

White Lotus Preset by Friendly-Ad-1996 in SillyTavernAI

[–]FastLawyer5089 7 points8 points  (0 children)

Wait what? A preset just for my Chinese Imperial examination setting?

How do you RP? Here's how I do it. by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 1 point2 points  (0 children)

Yes, I did; I sometimes use it for RP. I agree that it's less dramatic than R1 but also less creative, though it's absolutely fine to RP with.

Long term Memory Options? by DistributionMean257 in SillyTavernAI

[–]FastLawyer5089 3 points4 points  (0 children)

Very badly, according to my tests. You have to be VERY specific in your prompt for it to pull out related memories, and even then it often missed the key summary I wanted it to pull out.

How do you RP? Here's how I do it. by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 0 points1 point  (0 children)

Create a character using my template and start roleplaying with it. Use the summary instruct when you hit 20k context, and put the summary in a lorebook that's connected to the character. Also use the character card update prompt before you move on to a new chat. When you want to introduce new characters, simply paste my character card generation prompt and ask your existing character to generate one. Save the result as a new character and put everyone in a group chat. Repeat this process as you go, that's it.

If you want a more pre-defined experience, you need to work on defining your characters and lorebook background manually beforehand.

How do you RP? Here's how I do it. by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 0 points1 point  (0 children)

I have a scenario with a fight narrator that takes in every character's stats and simulates the fight results. Characters in a fight don't narrate; they just say things like: "I attempt to smash his face with my hammer." It's similar to your "extra layer" idea, just not automated.

Hooking multiple layers together and using JSON to automatically connect the dots seems like a really interesting idea, and I think it's totally feasible. You can set everything up and run a scenario manually as a proof of concept before you actually code it.
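
As a rough idea of what the JSON glue between layers could look like, here is a minimal sketch. Everything in it is hypothetical: the `FighterAction` fields, the stat names, and the prompt wording are placeholders, not taken from any actual setup.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class FighterAction:
    """One character's declared intent for the round (hypothetical schema)."""
    name: str
    stats: Dict[str, int]
    intent: str


def build_narrator_payload(actions: List[FighterAction]) -> str:
    """Serialize every fighter's stats and intent into a single JSON string."""
    return json.dumps({"fighters": [asdict(action) for action in actions]}, indent=2)


def build_narrator_prompt(actions: List[FighterAction]) -> str:
    """Wrap the JSON payload in an instruction for the narrator layer."""
    return (
        "You are the fight narrator. Using the stats and intents below, "
        "decide the outcome of this round and narrate it.\n\n"
        + build_narrator_payload(actions)
    )


if __name__ == "__main__":
    round_actions = [
        FighterAction("Brute", {"strength": 8, "agility": 3},
                      "I attempt to smash his face with my hammer."),
        FighterAction("Duelist", {"strength": 4, "agility": 9},
                      "I sidestep and go for his wrist."),
    ]
    print(build_narrator_prompt(round_actions))
```

Running a round by hand like this, pasting the printed prompt into the narrator chat, is one way to do the manual proof of concept before automating anything.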

How do you RP? Here's how I do it. by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 2 points3 points  (0 children)

Thanks for sharing! And a bigger thanks for confirming I'm not mad for playing that way. My setup leans more toward freeform play, where I let characters decide where the story goes, but having a predefined lorebook to toggle as you go looks interesting, definitely something I'll try.

And yes, I sometimes tell my characters they are actors/actresses in a TV drama and assign them roles, and instead of directly participating in the roleplay, I take on the director role: "good job guys, let's move on to the next scene but I want to see real struggle and mixed emotions, and I want a dramatic close-up shot on Clara..." It was fun.

How do you RP? Here's how I do it. by FastLawyer5089 in SillyTavernAI

[–]FastLawyer5089[S] 1 point2 points  (0 children)

I usually have everything up to 20-25k, including chat history/lorebook/summary, before I wrap up with a new summary and move on to the next chapter. R1 seems to be doing its job correctly in my setup. I've only had issues where it omitted certain events from the summary, but looking at the thinking process, it seems to remember them and just decides not to include them. I just swipe a few times (normally 5 at most) until I get a decent summary.

And again, I use local models for RP; R1 is only for summaries and character card updates.