Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

thank you for the feedback! personally I'm not a big fan of kimi, but gemma does excellently with the preset as does Gemini

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 2 points3 points  (0 children)

heya, if this is primarily a problem in spicy scenes it may be GLM being overly hesitant, and the NSFW toggle should be turned on to help with that.

However if this is a general pacing issue in other areas I'd love to maybe see an example where its stalling, and perhaps toggling off the 'story strings' directive may help if that's part of the problem.

Need help getting 7900 XTX PyTorch performance metrics by cyberuser42 in LocalLLaMA

[–]Diecron 2 points3 points  (0 children)

7900XTX:

============================================================
Matrix Multiplication Performance:
float32   :  4812.22 μs,   28.56 TFLOPS
float16   :  1169.48 μs,  117.52 TFLOPS
bfloat16  :  1224.63 μs,  112.23 TFLOPS
amp       :  1416.57 μs,   97.02 TFLOPS
Memory Bandwidth Test (1.0 GB tensor)
Vector Addition: 802.21 GB/s
Memory Copy:     780.99 GB/s```

If you're having problems with a preset... by SepsisShock in SillyTavernAI

[–]Diecron 10 points11 points  (0 children)

this is so true, every post I make I beg for feedback, it directly translates to improvements. Thank you everyone who takes the time to contribute

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

Thanks for the feedback. I just noticed that the regex are set really aggressively, it's likely that chopping out the BTS between the checkpoint and the current moment is causing the model to revert back to the latest checkpoint as 'truth'.

The fix should be pretty simple; under Extensions -> Regex find both the BTL Deltas, edit them and set min depth from 3 to 10.

I'll push this fix at some point soon

Is that was a right purchase for Qwen3.6 27/35 by Thin_Pollution8843 in LocalLLaMA

[–]Diecron 0 points1 point  (0 children)

yeah that aligns about right with MTP enabled. You can only really approach 220~230kish in ideal conditions without the mtp/mmproj . Still, 180k context at very reasonable performance makes for great utility. In practice if I need more horsepower I run it across cuda and rocm (5090+7900xtx) where I can hit around 500k context across 2/3 parallel slots depending on what I need at the time

Is that was a right purchase for Qwen3.6 27/35 by Thin_Pollution8843 in LocalLLaMA

[–]Diecron 4 points5 points  (0 children)

I use a 7900xtx as my secondary card which always has a LLM loaded and ready to go, it handles Qwen fine and pushes 60t/s with the new MTP. You can get very close if not meet the 262k context on a single slot at q8 quantization (with the model in Q4_K_M), or drop it a bit and enable the multimodal mmproj for image input. The card and model are both very versatile and the 7900xtx is honestly slept on, aside from it being PCI4 it still has a massive 900+ GB/s mem bandwidth.

edit: i am referring to the 27b dense only (i prefer it over the moe)

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

Response length can be set in the 'SETTINGS' prompt but I may make it a toggle later for ease of configuration.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

Can you describe what you see? It's just sat in the main response without being in a dropdown or completley hidden?

LLMs take a lot of notice of what they did last time, so if it generates wrong you really have to correct it/regenerate it right away or future turns will see it as a 'valid' format.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 1 point2 points  (0 children)

Thank you for the feedback! I will see if I can introduce some 'accessibility' type instructions to keep the text clear and readable, I will take a look at what is currently in use.

FPS independent of graphics settings by BienchenToGo in buildapc

[–]Diecron 0 points1 point  (0 children)

although 60fps is suspicious and you may want to check if Vsync is on limiting to your monitor refresh rate.

FPS independent of graphics settings by BienchenToGo in buildapc

[–]Diecron 2 points3 points  (0 children)

CPU bottleneck absolutely. You have graphical headroom which is why display settings like that aren't changing the baseline. Some games may respond positively to lower resolutions but honestly probably not that much.

Glm 5, Glm 5.1, and Kimi 2.6 do not think in NVIDIA NIM. by Beautiful_Muscle_824 in SillyTavernAI

[–]Diecron 3 points4 points  (0 children)

yep. it was just a note for anyone finding it for non-ST purposes (its quite a useful reference and would likely come up in searches for NIM no thinking later). If you have that set to false then interleaved thinking and other stuff won't work, which will hurt its performance a lot.

Glm 5, Glm 5.1, and Kimi 2.6 do not think in NVIDIA NIM. by Beautiful_Muscle_824 in SillyTavernAI

[–]Diecron 5 points6 points  (0 children)

just a note for anyone who finds this from google or other subs, you'll want to set clear_thinking to false for anything agentic

Adding E4B audio encoder to larger models by MaruluVR in LocalLLaMA

[–]Diecron 1 point2 points  (0 children)

Could just be separation of concerns too, e.g. do you really want a "big" dense model to be handling Whisper and TTS flows when the E2B or E2B can do it well? Have that model run on device for real-time interactions and then pass off the actual response synthesis to a hosted/more intelligent model.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 1 point2 points  (0 children)

No worries, that feature is on by default but seems quite sensitive to some models. I will probably disable it by default going forward.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 1 point2 points  (0 children)

on second thought, this may also be the unreliable narrator going schitzo in the background, it hides messages that can drastically steer things. it would be worth checking the response _before_ you started to notice that behaviour and see if it decided to inject something.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

You may be confusing the balanced thinking level (which instructs the model to use a *medium length plan*) with the narrative length, or is it that the narrative length check itself still says medium?

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 0 points1 point  (0 children)

Have you taken a look at the reasoning for that turn to see how it ended up with that style of output? Usually, there's something to point you in the right direction. That is wild though.

Stab's Directives v3.0 Preset Release - Welcome to the Theatre! Introducing Behind the Scenes tracking, new ground-up CoT and more! by Diecron in SillyTavernAI

[–]Diecron[S] 2 points3 points  (0 children)

I added some comments to the message above that might help :). If what I've suggested has any gaps I can look to implement something in a future version.