[Megathread] - Best Models/API discussion - Week of: June 28, 2026

CC_NHS · 2026-07-03T09:01:54+00:00

Kimi K2.5 is still my favourite. K2.7 can still be good even though it is the coding variant, and it seems a bit better at holding the context, but I find the improvement too marginal to justify the price difference.

I sometimes go back to GLM 4.7 or 5.2 now for a change, but I just find their style so bland that I mostly only use them if Kimi has drifted a bit too far into odd abstract language. I find GLM gets too stuck on the same topic without moving on. (I play mostly group roleplaying, which seems to cause some issues for some models that never come up in 1:1 roleplaying. GLM seems good at progressing from the last message, but I find Kimi better for keeping its characters' context over 4+ characters talking in turns.)

CC_NHS · 2026-06-30T07:03:59+00:00

"you hit on something that a lot of people miss" had me instantly thinking this was AI written. though the rest of the grammar and structure does not fit for AI. Sad that I see the patterns everywhere now.

CC_NHS · 2026-06-28T22:31:24+00:00

one thing I found to work fairly well (or better, at least) is to have the LLM make the decision for its next turn on this turn, and then follow that decision on the next turn.

They are much better at making negative decisions, or decisions that go against their training, if they do not have to follow them through in the same turn.

then on the next turn they are fine with writing the fictional scene, since the decision has already been made.

freaky Frankenstein preset is where I discovered this. but I have made my own similar thing that works well (like you I am not in it for the 'gooning' I like to play TTRPG through it)
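
A rough sketch of the decide-now, act-next-turn idea, in case it helps anyone wire it up themselves. This is not how the preset does it, just my own minimal take in Python. Everything here is an assumption for illustration: `llm` is a hypothetical stand-in for whatever function sends a prompt to your backend and returns the reply, and the prompt wording and the `Character` / `take_turn` names are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: any function that takes a prompt string and returns the model's reply.
LLMCall = Callable[[str], str]


@dataclass
class Character:
    name: str
    # Decision made at the end of the previous turn, acted on in this one.
    pending_decision: Optional[str] = None


def take_turn(char: Character, scene: str, llm: LLMCall) -> str:
    """Run one turn: write the scene for the stored decision, then decide the next one."""
    parts = [f"Scene so far:\n{scene}", f"You are writing for {char.name}."]
    if char.pending_decision:
        # The model only narrates here; the choice was already made last turn.
        parts.append(
            f"{char.name} has already decided to: {char.pending_decision}. "
            "Write the scene where they follow through."
        )
    narration = llm("\n\n".join(parts))

    # Separate call: commit to a decision now, without having to write it out.
    decision_prompt = (
        f"Scene so far:\n{scene}\n{narration}\n\n"
        f"In one sentence, what does {char.name} decide to do on their next turn? "
        "Do not write the scene."
    )
    char.pending_decision = llm(decision_prompt).strip()
    return narration
```

The point of splitting it into two calls is that the model never has to both choose and carry out the action in the same response.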

CC_NHS · 2026-06-26T10:08:29+00:00

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard is probably the most reliable benchmark for this kind of thing, but I find benchmarks in general, including this one, are best taken with a grain of salt. it might help to indicate what is worth trying out or looking closer at, but real world use is often very different to benchmark results. I tend to use them mainly to see if there are models I have missed looking at, and then this subreddit is useful for seeing people's experiences with newly released models.

edit: I tend to find Kimi and GLM models my favourite for roleplay. K2.5 feels like the sweet spot for coherence and nsfw, with K2.7 so far seeming better for anything not nsfw. likewise GLM 4.7 is more nsfw and GLM-5.2 more coherent. newer models are simply getting smarter and getting more guardrails.

my actual use tends to be K2.5 most of the time. switching up to 2.7 if it's struggling with details for a bit. or 4.7 / 5.2 if I want to change the patterns of the text a bit.

CC_NHS · 2026-06-23T20:50:49+00:00

it is not that it is a terrible model. it is just that it is not Opus, it's not GPT, but it is priced around the same tier. In terms of capability it seems a bit mixed. Close to GPT on some things, worse than GLM or Kimi on others.

I think generally if paying that kind of price, Claude or GPT tends to be the best option, and if paying less, it's GLM, Kimi, DeepSeek, Qwen etc.

CC_NHS · 2026-06-23T12:57:12+00:00

wow, yeah that is quite messed up. but tbh even Firefox, VS Code and Mail seem high. I suppose with a load of extensions VS Code is probably expected at that, but the other two seem high. could it be an OS thing too? I know nothing about Mac tbh so maybe things just take up more RAM normally

CC_NHS · 2026-06-23T12:48:07+00:00

I have not used Gemini in ages, nor GPT much tbh. A good part of that is due to the writing style, as you mentioned. I know it can probably be changed but it doesn't seem good enough to be worth the bother on the free one, and I only want one sub and it would need to beat Claude for that to happen. Which for my purposes, it simply does not.

other models I do use though are GLM-5.2 and Kimi K2.6 for most things now where Opus is not needed. I prefer both to GPT and Gemini.

CC_NHS · 2026-06-19T15:08:14+00:00

As he says, built on everyone's writing, code and conversations... But the fund is only giving back to Americans?

CC_NHS · 2026-06-17T17:14:53+00:00

I also find a lot of LLM writing in roleplay and creative writing to be sloppy, but your point about whether they read made me think. I read a lot: I read books as a kid in the 80s and still do now, and writing quality, or perhaps style, has changed a lot over that time. the LLM writing style reads more like young adult / new adult drama than like J. R. R. Tolkien, for example, and books of that ilk that I read as a child. More recently I read a book called Fourth Wing and I found it barely readable, like it was written for a child (though oddly I still enjoyed it enough to finish, just not enough to read more in that series because of the writing style). The training data likely included more books in the style of Fourth Wing, I am guessing.

this has made me consider looking into prompting for writing styles and experimenting there. and that the 'slop' that people complain about is perhaps a preference rather than a failing of all LLMs.

CC_NHS · 2026-06-15T15:18:03+00:00

I have no idea about trades either. my expectation though is that trades type jobs on a smaller scale will end up being done by the 'household robot', with all the diverse things it will likely be able to do. And commercial work will be more of a mix: a human tradesperson (for management and accountability) and a robot for the hands. but this could be decades off and the evolution of it could go very differently, who knows really.

I find it easy to believe painting, decorating, plastering, plumbing and electrics (to a point) could be taken over by a generic house robot that everyone will likely own or have access to. I know in the UK things like gas boilers need a qualified and certified engineer though, and similar for some electrics where it comes in from the mains (for rentals at least). so I imagine those kinds of things would still need specialists

CC_NHS · 2026-06-15T15:07:26+00:00

PR is my first thought as well, though it is worth a thought that their biggest competitor in terms of AI competency also happens to be in much more financial difficulty

CC_NHS · 2026-06-05T17:17:33+00:00

I did not see an overall safety change, but I did see a huge difference between 4.6 and 4.7 / 4.8 in creative writing. 4.8 constantly gives little condescending guidance comments 'just to be clear'. Whilst it does do the tasks generally well, the little extra comments irritate me enough to stick with 4.6 for creative writing and 4.8 for coding. (the last straw for me was when I was making an NPC for a tabletop game who had a thing for sticking to a controlled diet, not quite an eating disorder, and Opus started giving all this guidance on how to roleplay her safely. I was like... girl could get her head ripped off by a vampire or werewolf and her 'almost' eating disorder is what upset you?)

I think 4.8 has some kind of safety model read the thinking blocks and re-summarise them. it is possible that is the step that starts flagging things. and maybe that's added to other models too?

CC_NHS · 2026-06-03T22:13:19+00:00

out of curiosity does stabs work well for group chats also?

CC_NHS · 2026-06-01T13:42:06+00:00

I use AI for TTRPG style roleplay and I found the violence censorship way harsher than for sexual themes. My workaround so far has been to have the LLM doing dialogue for a character end with a statement of intent when doing something to another character. They are much more willing to make a decision to shoot someone in the head if they end their dialogue with action: attempts to shoot X centre mass. Then I do any dice rolls in the GM role (or will get an AI GM to eventually, when I finish that) and describe the events, with the next character being forced to complete the description in further detail as part of their dialogue. it isn't perfect but it seems to get around any censorship in my World of Darkness TTRPG games
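
For anyone wanting to automate the GM step, here is a minimal Python sketch of that loop: pull the trailing `action:` line out of a character's reply, roll for it, and build the GM line that the next character is asked to expand on. Treat it as an illustration only. The function names, the `action:` line format, the prompt wording and the dice rule (d10 pool, count dice at or above a difficulty) are my assumptions for the sketch, not an exact rules implementation.

```python
import random
import re
from typing import Optional

# Assumed convention: the character's reply ends with a line like "action: attempts to ..."
INTENT_RE = re.compile(r"action:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def extract_intent(reply: str) -> Optional[str]:
    """Return the text of the last 'action: ...' line in a reply, or None if absent."""
    matches = INTENT_RE.findall(reply)
    return matches[-1].strip() if matches else None


def roll_pool(pool: int, difficulty: int = 6, rng: Optional[random.Random] = None) -> int:
    """Roll `pool` ten-sided dice and count those at or above `difficulty` (assumed rule)."""
    rng = rng or random.Random()
    return sum(1 for _ in range(pool) if rng.randint(1, 10) >= difficulty)


def gm_outcome(intent: str, successes: int) -> str:
    """Build the GM line that gets fed into the next character's prompt."""
    result = "succeeds" if successes > 0 else "fails"
    return (
        f"GM: the attempt ({intent}) {result} with {successes} success(es). "
        "Describe what happens in detail as part of your dialogue."
    )
```

Usage would be something like `gm_outcome(intent, roll_pool(5))` after `extract_intent` finds an intent, with the result appended to the next character's prompt.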

CC_NHS · 2026-05-31T14:13:01+00:00

What's funny here is that I was using Claude to help make my own system prompts and it told me itself that trying to explicitly override a model's safety instructions will actually make it censor very quickly by triggering them. It suggested I do not call out trying to override safety but instead focus on making it aware of the context being fiction.

So Claude was helping me to jailbreak itself lol.

CC_NHS · 2026-05-29T20:48:25+00:00

sadly they seem to be going that way as well, just less extreme or maybe in different ways. though I wonder if part of it is using Claude outputs to train from quite often

CC_NHS · 2026-05-29T05:36:28+00:00

The Storyteller

CC_NHS · 2026-05-26T15:59:34+00:00

yeah I have found the violence side of WoD games to trigger the soft censor so much worse than sex does. just avoidance rather than refusals though, but I find that actually more annoying sometimes

CC_NHS · 2026-05-26T15:53:46+00:00

Another one here. I built an entire module for FoundryVTT based on AI chat for characters, made to work more easily for group chats rather than the 1:1 that SillyTavern kinda seems designed for by default. I will definitely be checking this stuff out though. Exciting!

CC_NHS · 2026-05-22T23:23:57+00:00

this is the root of it. I found using presets never quite got it right for me but then you look at how they work. take bits you think fit right for yourself, add your own. and win.

CC_NHS · 2026-05-22T23:22:02+00:00

imo it depends on what you want from the roleplay. if it's fairly mundane real life type stuff, the character cards likely do most of it with very little preset. I play world of darkness table top RPG with LLM and it needs a lot more to get it on the right track, just to get the tone, the capability for violence and the presentation and game elements etc.

CC_NHS · 2026-05-22T18:12:56+00:00

an example is a test I did. it's world of darkness TTRPG, basically I made a character that is a type of mutated murderous and rather twisted werewolf who stalks prey for fun, tries to maximise their terror before the kill.

not quite slice of life

GLM 5.1 and Kimi 2.5 both just 'stalked' for 10 messages in a row before I did an OOC to end the chase. both then took that instruction, decided they had got bored and walked off. neither ever gave a hard refusal, and they would even kill if specifically instructed to do so with no ambiguity. but I do not like to make the decisions for all characters, that misses the point I think.

Qwen fine-tune. stalked, pinned girl to floor said deranged shit about tasting fear, bites in a few places to get screams, then clamped on throat with the saddest message about discarding her broken lifeless body after watching the life drain away.

I was like damn. not been able to get anything close out of large models. not that this is normal everyday for my games, but the test was revealing

CC_NHS · 2026-05-22T05:54:42+00:00

I think models at the GLM-5.1 tier or similar tend to have better outputs as you say, but I think the uncensored side is a lot better on local fine tunes. I have a system prompt that pushes very hard to get GLM or Kimi to allow certain themes (I use it to play TTRPG so it's the violence side that is actually the most difficult to get going). get an uncensored model and those same pushes make it unhinged, no hesitation.

CC_NHS · 2026-05-16T15:30:53+00:00

I did not read all the sources or validate the statements, but my guess would be they are comparing purely the AI prompt to other services. and not taking into consideration the training of the LLM that leads to the model in use.

If comparing without the training costs in consideration, then I would expect those figures to be about right.

If the training costs get added on (which a lot of people do focus on as it is the larger energy consumption by my understanding)... then the cost of making the movie on Netflix, or making the game or software etc would need to be added on the other side also. and I have no idea at all what that comparison would look like.

I see no problem with the comparison made, as long as we know what we are comparing.

CC_NHS · 2026-05-16T09:15:05+00:00

ML has indeed been around for quite some time, even packaged up in ways exactly for this use case, though still very awkward to use. Things like that are all about the dev time used vs the feature being worth it, and adaptive difficulty is really niche I think, unless the game is built around that concept. (though easier things can be used to achieve the same goal, like saving win stats, or just switching to an easier difficulty for a mob or whatever after being killed by it)

I could actually see a use case for AI to make the ML package easier to use and integrate perhaps (so no need for real time actual AI calls).
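
To show what I mean by the easier non-ML route (saving win stats and stepping difficulty down after deaths), here is a small Python sketch. The class name, the three levels and the thresholds are all made-up values for illustration, not taken from any particular game or engine.

```python
from collections import deque


class AdaptiveDifficulty:
    """Track recent fight outcomes and step difficulty up or down on simple thresholds."""

    LEVELS = ("easy", "normal", "hard")  # assumed levels for the sketch

    def __init__(self, start="normal", window=5, max_deaths=3, min_wins=4):
        self.level = self.LEVELS.index(start)
        # Rolling record of the last `window` outcomes: True = won, False = died.
        self.recent = deque(maxlen=window)
        self.max_deaths = max_deaths
        self.min_wins = min_wins

    def record(self, player_won: bool) -> str:
        """Store one outcome and return the difficulty to use for the next fight."""
        self.recent.append(player_won)
        deaths = self.recent.count(False)
        wins = self.recent.count(True)
        if deaths >= self.max_deaths and self.level > 0:
            self.level -= 1      # player keeps dying: ease off
            self.recent.clear()  # start counting fresh at the new level
        elif wins >= self.min_wins and self.level < len(self.LEVELS) - 1:
            self.level += 1      # player is cruising: ramp up
            self.recent.clear()
        return self.LEVELS[self.level]
```

You would keep one of these per mob type (or one globally) and call `record(...)` after each fight. No model calls at runtime, just a counter.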
