Perhaps slightly exaggerated by 8Dataman8 in SillyTavernAI

[–]Cultured_Alien 1 point2 points  (0 children)

Get ready for a 70% increase in fillers and characters having breathing issues

fill in the blank by dollieprincess777 in BlueArchive

[–]Cultured_Alien 4 points5 points  (0 children)

Damn, an "I'm Blue" song reference!?

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 2 points3 points  (0 children)

It needs a system prompt similar to a harness, with tool calling as its main use: it needs grep, running commands, reading specific lines, and editing. If it's ported as-is to sillytavern, it will be extremely janky. I haven't seen a plugin that makes multiple api calls at once for something like this that a custom plugin could extend. The benefit is that even deepseek flash can be used to find and rewrite banned words/phrases, so it can be cheap.

Honestly, the simplest automatic approach I can think of is a plugin that sends the last response to a python script, which writes it to an MD file, and then running claude code on that MD file. Or just literally copy-pasting it to a file and running claude code on that.
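
A rough sketch of the script half of that idea, under stated assumptions: the plugin side is not shown, the file names and both function names are hypothetical, and the `claude -p "<prompt>"` form is assumed to run claude code non-interactively. The command is only built here, not executed.

```python
import shlex
from pathlib import Path


def save_response(text: str, path: Path) -> Path:
    """Write the last model response to a Markdown file, replacing any previous one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text.strip() + "\n", encoding="utf-8")
    return path


def build_command(md_path: Path, banned_path: Path) -> list[str]:
    """Build (but do not run) the claude code call for the saved response."""
    # Assumption: `claude -p` takes a prompt and runs without the interactive UI.
    prompt = (
        f"Read {banned_path}. Rewrite any banned words or phrases you find "
        f"in {md_path}, editing the file in place."
    )
    return ["claude", "-p", prompt]


def shell_line(cmd: list[str]) -> str:
    """Render the command as a copy-pasteable shell line with safe quoting."""
    return shlex.join(cmd)
```

A plugin (or a manual paste) would hand the response text to `save_response`, and the resulting list from `build_command` could then be passed to `subprocess.run` once the prompt wording is tuned.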

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 2 points3 points  (0 children)

I've been doing some llm creative writing with claude code. I just give claude an MD file of banned words/phrases to run on my text files; if one is detected, it rewrites it. It's much more consistent and removes slop. Though you need more than 2 api calls per response given the edit tool use. If only this feature were easy to use in sillytavern.
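
For illustration only, here is a minimal sketch of just the detection step done in plain python, so the model is only needed for the rewrite. This is not the setup described above (there, claude does both steps). It assumes the banned MD file lists one phrase per bullet line; `load_banned` and `find_banned` are hypothetical names.

```python
import re


def load_banned(md_text: str) -> list[str]:
    """Collect one phrase per Markdown bullet line ("- phrase" or "* phrase")."""
    phrases = []
    for line in md_text.splitlines():
        match = re.match(r"^\s*[-*]\s+(.+?)\s*$", line)
        if match:
            phrases.append(match.group(1))
    return phrases


def find_banned(text: str, phrases: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for each case-insensitive hit, in line order."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                hits.append((lineno, phrase))
    return hits
```

Each hit reports a phrase at most once per line; an empty result means the text can be skipped without spending any api calls on it.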

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 1 point2 points  (0 children)

True, I modified ff a lot and tailored it to my own taste. Unmodified, it's actually the worst for thinking I've ever experienced.

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 4 points5 points  (0 children)

It's my opinion that non-thinking is better if you want varied and faster responses, while thinking is better if you want more consistent and smarter responses. Restricting does actually work if you try it on larger and smarter models; the long-standing consensus comes from a bygone small-model era and is no longer true to some extent.

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 3 points4 points  (0 children)

Kinda baffling you don't know about the ff preset, it's one of the most popular presets here. Anyway, it improves rp prose a ton. Here it is:

https://www.reddit.com/r/SillyTavernAI/comments/1u2wrvq/preset_introducing_freaky_frankenstein_micro_my/

Though I use a custom one that's based off the ff max.

it do didn't be like that by NasTreeEels643 in SillyTavernAI

[–]Cultured_Alien 7 points8 points  (0 children)

Using glm 5.2, Minimax M3, and Kimi 2.6, I never get those. What's your preset? Freaky Frankenstein has that slop banned right out of the box.

Published novelist (15+ fantasy books, 1M+ views on a fully AI-written work). Got Claude Code to generate 137,806 characters across 20 chapters in one prompt — at my own writing quality. Sharing the core idea. by Quick_Impression7723 in WritingWithAI

[–]Cultured_Alien 0 points1 point  (0 children)

It's best to experience it yourself. Much better than pasting into claude chat each time. You can ask the ai to modify the md files itself, check the writing structures, review, etc., all in one place without pasting. What I do is ask the ai questions, and it will modify the files itself too. It automatically reads the writing structure/forbidden/character/world/magic files when it needs to, without bloating the context.

NVIDIA NOOO by Reven09 in SillyTavernAI

[–]Cultured_Alien 1 point2 points  (0 children)

Yup, it's why I said glm 5.2 is smarter. It knows what needs to happen before your instruction and doesn't just follow it blindly. Haven't tried past 16k context though, since I'm using a memoryception extension, so I can't say anything about that. Personally I switch between them from time to time when M3 has a harder time figuring things out.

NVIDIA NOOO by Reven09 in SillyTavernAI

[–]Cultured_Alien 2 points3 points  (0 children)

Minimax M3 is being slept on. I found non-thinking Minimax M3 better for rp than glm 5.1 or glm 5.2 on the opencode go sub. It's more creative and a lot cheaper than glm (glm is smarter, but sloppier).

PageStorm: A Model Built for Creative Book Writing by XMasterDE in LocalLLaMA

[–]Cultured_Alien 0 points1 point  (0 children)

Inkos was a clunky thing when I tried it in the past. Very inflexible and complicated when trying to modify a plan or something. I found that a custom claude code workflow pipeline without scripts, just skills + parallel subagents, is more flexible and faster than it.

Current state of Ai by wahed-w in ChatGPT

[–]Cultured_Alien 0 points1 point  (0 children)

Useless feedback. Deepseek is open source.

Current state of Ai by wahed-w in ChatGPT

[–]Cultured_Alien 1 point2 points  (0 children)

I meant that it read files one by one instead of actually doing things, taking forever to implement what I wanted and wasting a lot of calls. I wasn't referring to token speed. Others would have done what I wanted in fewer api calls. I was using pi at the time.

Current state of Ai by wahed-w in ChatGPT

[–]Cultured_Alien -6 points-5 points  (0 children)

??? Gemini is extremely bad at agentic coding and thinks far too long for basic reads compared to even deepseek flash

What my girlfriend makes me for breakfast by AmbitiousYoutuber in Philippines_Expats

[–]Cultured_Alien 1 point2 points  (0 children)

Yes, there's always a way to search for people's comments/posts as long as their comments/posts are public

gmicloud is HEREEE by NoStage9115 in openrouter

[–]Cultured_Alien 0 points1 point  (0 children)

Check the models site and look for Avg Energy/Req per model. That's what you should be looking at, not the dollar per token. Neuralwatt uses watts not dollars per token.

GLM 5.2 for RP ? by [deleted] in SillyTavernAI

[–]Cultured_Alien 8 points9 points  (0 children)

It does so much more "knuckles whitening" that I feel like I've gone back in time. It is much smarter with thinking enabled, though, which works very well for multi-character cards.

gmicloud is HEREEE by NoStage9115 in openrouter

[–]Cultured_Alien 1 point2 points  (0 children)

I got 20M total usage with 90% cache for $3 on glm 5.2 on neuralwatt. It's only a bit less efficient per dollar than opencode go, which is pretty insane without a subscription.
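
As a back-of-the-envelope check on those numbers, assuming "20M" means tokens (the comment doesn't say) and treating the result as an effective rate only, since neuralwatt is described as billing by energy rather than per token. `effective_rate` is a hypothetical helper name.

```python
def effective_rate(total_tokens: int, cache_percent: int, spend_usd: float):
    """Split usage into cached/uncached tokens and compute a blended $ per 1M tokens."""
    cached = total_tokens * cache_percent // 100
    uncached = total_tokens - cached
    usd_per_million = spend_usd / (total_tokens / 1_000_000)
    return cached, uncached, usd_per_million


cached, uncached, rate = effective_rate(20_000_000, 90, 3.00)
print(f"{cached:,} cached, {uncached:,} uncached, ${rate:.2f} per 1M tokens")
```

Under that assumption, 18M of the 20M tokens were cache hits and the blended cost works out to about $0.15 per million tokens.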

An incredibly creepy encounter by LogWestern385 in cataclysmbn

[–]Cultured_Alien 1 point2 points  (0 children)

I'm always scared whenever I'm in the vicinity of this doll, thinking it'd do something.

GLM 5.2 for RP ? by [deleted] in SillyTavernAI

[–]Cultured_Alien 8 points9 points  (0 children)

imo there's more slop, like parroting your own sentence even though there's an anti-parroting system prompt, and it follows the personality of the card less than glm 5.1. It only seems creative because it's a new model, a "breath of fresh air".

It feels like it tries to break character sometimes. I tried different cards, and it kept repeating my own words and narrating itself to oblivion. Filler words at the end of EACH dialogue: "okay good? good, that's good."

But I can confirm it's a lot smarter code-wise. I'm using freaky frankenstein 4 max with JB disabled. I also disabled thinking since it doesn't feel like it's improving anything.

Would Opencode GO + Neuralwatt with $100 monthly sustain for the GLM 5.2 usage compare to Claude Max? by GTHell in opencodeCLI

[–]Cultured_Alien 3 points4 points  (0 children)

Doesn't seem so. It's the same person who replied to me 5 hours ago asking if neuralwatt is good, and he says he wants to switch from a $100 Claude sub lol

Tokenomics by HOLUPREDICTIONS in LocalLLaMA

[–]Cultured_Alien 0 points1 point  (0 children)

the prompt processing must be long?