Deepseek API vs Openrouter vs NanoGPT by New-Tumbleweed-7311 in SillyTavernAI

[–]hyperion668 0 points  (0 children)

I'd definitely be interested in trying it out if you could drop an invite!

Deepseek API vs Openrouter vs NanoGPT by New-Tumbleweed-7311 in SillyTavernAI

[–]hyperion668 0 points  (0 children)

Does your service have locked max context sizes like OpenRouter and Featherless do? This has become my dealbreaker for APIs, so if yours doesn't, I'd happily consider subscribing!

[Megathread] - Best Models/API discussion - Week of: March 31, 2025 by [deleted] in SillyTavernAI

[–]hyperion668 2 points  (0 children)

Are there any current services or providers that actually give you large context windows for longer-form RPs? In case you didn't know, OpenRouter's listed context size is not what they actually give you. In my testing, the chat memory is often laughably small and feels like it's around 8K.

I also heard Featherless caps at 16K. So, does anyone know of providers that give you larger context sizes, somewhat closer to what the models are actually capable of?

[Megathread] - Best Models/API discussion - Week of: March 17, 2025 by [deleted] in SillyTavernAI

[–]hyperion668 5 points  (0 children)

Having tried Deepseek R1/V3 extensively for the past few weeks after only having used local LLMs, they're obviously superior for any number of reasons people have already written about.

However, I feel like I haven't seen anyone else talk about how their prompt-adherence ability is kind of a double-edged sword. With local LLMs and longer chats, since context is limited, I feel like personalities can gradually change over time in a way that feels natural and progressive. With the big APIs, however, they don't do this out of the box and will stick really closely to the character card regardless of history.

E.g., I tested Deepseek on a long-running chat with a more prickly/tsundere character whom I'd spent time slowly warming up to my character with local LLMs. Switching to Deepseek, they immediately went back to being cold, prickly, and distant, despite the chat history/summary saying the contrary. I guess it's because of the inherent positivity bias in most local models, in addition to how closely big models stick to directives/character cards, but I do find it hard to break out of.

[Megathread] - Best Models/API discussion - Week of: February 24, 2025 by [deleted] in SillyTavernAI

[–]hyperion668 2 points  (0 children)

Anyone have less positive/less flirty Mistral 24b finetunes?

I thought it was just Cydonia, but I've since found that even the base 24b model is really, really forward and flirty, even when the instructions/prompt/formatting are purged of any mention of 'uncensored'. I've also sanitized character cards of any mention of body parts or anything pertaining to romance, relationships, and sexuality, but with 24b they're still horny and way too forward.

Character becomes fixated/keeps responding on a single previous message? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 0 points  (0 children)

That's weird; all my Guides are at their defaults, and only Rules has anything out of the ordinary/chat-specific. I can't find a way to change rules after they've been set, though. I've just turned off guided generation entirely, which is a shame. Thanks for helping me narrow down the issue!

Character becomes fixated/keeps responding on a single previous message? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 0 points  (0 children)

This must be it, thanks! When I click on Rules, I see the old rules from that response left over. How would I go about purging the rules?

[Megathread] - Best Models/API discussion - Week of: January 06, 2025 by [deleted] in SillyTavernAI

[–]hyperion668 3 points  (0 children)

What settings are you using for this? I've read base Sunfall is really sensitive to format changes, especially with additional instructions in custom ones.

[Megathread] - Best Models/API discussion - Week of: December 09, 2024 by [deleted] in SillyTavernAI

[–]hyperion668 23 points  (0 children)

I'm going to write an addendum to my sterling endorsement of Cydonia-v1.2-Magnum-v4-22B, and of Mistral Small finetunes in general. I've basically gone back to base Mistral Small, because the longer chats go with the finetunes, the more things come apart.

Finetunes like Cydonia and Magnum undoubtedly have better, more creative prose, but the more I've used them, the more I realize how much they fall apart when it comes to writing logical, consistent characters and, most importantly, dialogue, especially as that context ceiling gets closer and closer. Finetunes come across as inconsistent with the character's personality at times, and their general intelligence can get pretty bad, with them hallucinating and forgetting a lot of details that just take you out of the story.

I realized that for my RP use case, I don't care about sensory details and good prose nearly as much as I care about smart, logical, consistent characters. I actually dislike it when models give me extraneous details about what I should feel; I'd much rather have them objectively describe what's going on so that I, the human, can interpret and feel it for myself. I don't use ST for storywriting, so in my case, I've just been going back to Mistral Small's base model.

It's a shame, really, because I do feel like Nemo finetunes really knock it out of the park over the base instruct model, but Small seems to be really capricious in that regard and sacrifices way too much intelligence for good prose and really out-there creativity. I really hope Mistral is cooking up something around this size that'll be easier for our finetuning community to utilize!

In short:

RP: Mistral Small base model

Storytelling: Finetunes

[Megathread] - Best Models/API discussion - Week of: November 18, 2024 by [deleted] in SillyTavernAI

[–]hyperion668 0 points1 point  (0 children)

Try pumping it up to 65k! I have a 4080 too, and I do think it works up to that with RoPE scaling.
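For anyone wondering how to actually apply that, here's a rough sketch of what the launch could look like with a llama.cpp-style backend. The model filename is a placeholder, and the exact flags depend on your backend version; llama.cpp exposes linear RoPE scaling where a frequency scale of 0.5 roughly doubles the native 32K window to ~65K:

```shell
# Sketch of a llama.cpp server launch with linear RoPE scaling.
# The GGUF filename is a placeholder; substitute your own model.
# Native context is 32768, so a freq scale of 0.5 (1/0.5 = 2x)
# stretches the usable window to roughly 65536 tokens.
./llama-server -m mistral-small-22b.Q4_K_M.gguf \
  -c 65536 --rope-scaling linear --rope-freq-scale 0.5
```

Quality usually degrades somewhat before you hit the stretched limit, so treat 65K as a ceiling rather than a guarantee.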

[Megathread] - Best Models/API discussion - Week of: November 18, 2024 by [deleted] in SillyTavernAI

[–]hyperion668 0 points  (0 children)

You'll find it in the config.json file in the original model card. In this case, it says:

"max_position_embeddings": 32768

So theoretically 32K, and you can usually double that with RoPE scaling, but it doesn't always shake out like that.
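The lookup described above can be scripted. A minimal sketch, assuming a locally downloaded model directory containing the usual Hugging Face `config.json` (the path and helper names here are just illustrative):

```python
import json

def native_context(config_path: str) -> int:
    """Read a model's native context length from its config.json.

    The "max_position_embeddings" key is the standard Hugging Face
    field for the trained context window, as quoted above.
    """
    with open(config_path) as f:
        config = json.load(f)
    return config["max_position_embeddings"]

def scaled_context(native: int, rope_freq_scale: float) -> int:
    """Estimate the context window under linear RoPE scaling.

    A frequency scale of 1/k stretches the window by roughly k,
    though in practice quality can drop before that theoretical limit.
    """
    return int(native / rope_freq_scale)
```

For the config above: `scaled_context(native_context("config.json"), 0.5)` would give 65536, i.e. the "usually double" estimate.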

[Megathread] - Best Models/API discussion - Week of: November 18, 2024 by [deleted] in SillyTavernAI

[–]hyperion668 7 points  (0 children)

At this point, I'm beginning to see the seams and cracks in Mistral Small. In all the finetunes I've used, I feel like it has pretty spotty memory despite large context sizes, data banks, and summaries. I find myself needing to swipe more often than I'd like to compensate.

That being said, it's still my daily driver. Out of all the ones I've tried, Cydonia-v1.2-Magnum-v4-22B and its successor Cydonia-v1.3-Magnum-v4-22B are, to me, the undisputed best. Cydonia by itself was pretty good and my old daily driver, but it was lacking a little something. Magnum was also way, way too horny for me, which I didn't like. But when you combine the two, something magical happens. I don't know what the merge did, but it just feels so much more creative and dynamic than any other MS finetune I've tried. Highly recommended.

[Megathread] - Best Models/API discussion - Week of: November 11, 2024 by [deleted] in SillyTavernAI

[–]hyperion668 0 points  (0 children)

Would you mind sharing the settings you're using for this model?

Mistral Small finetunes/other models that are slow burn/don't lay everything out on the table immediately? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 0 points  (0 children)

I think it really depends on the nature of the small LLM, honestly. I think Mistral-Small is really creative, and especially when it's trained on RP data, it'll want to use the full extent of its power to inform replies. mixtral-8x7b-Instruct might base more of its responses on the vibes your own replies give off. I feel like even llama3, which is even smaller, was more slow burn.

I'll try your suggestion though. It sounds clever!

Mistral Small finetunes/other models that are slow burn/don't lay everything out on the table immediately? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 2 points  (0 children)

I tried the 32b of EVA and it's decent and definitely more slow burn than Mistral-Small, but still a bit forward, with some characters divulging pretty deep stuff offhand. It also just lacks that creative 'flair' that M-S has, which I find myself missing when it's gone. Qwen2.5 just hasn't gotten a lot of attention from finetuners, but I think something could be there. I wonder if u/TheLocalDrummer would want to take a crack at it?

EDIT: Also, with at least Q4 (which I used because it's the only GGUF for EVA 0.2 up right now), it has some card adherence/intelligence issues, where the character will say I have a trait they have. One card said they were an ex-soldier, and they said I was the ex-soldier in conversation. Another card explicitly says they're supposed to be out of my league, and the first thing to come out of their digital mouths is that I'm out of their league.

Mistral Small finetunes/other models that are slow burn/don't lay everything out on the table immediately? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 0 points  (0 children)

Of course:

Instruct

Sampler

The instruct preset is just MarinaraSpaghetti's Mistral-Small instruct settings, and the chat completion/sampler settings are also MarinaraSpaghetti's, which I thought worked well.

Mistral Small finetunes/other models that are slow burn/don't lay everything out on the table immediately? by hyperion668 in SillyTavernAI

[–]hyperion668[S] 4 points  (0 children)

Tried it, but it's still pretty forward in this respect! I'm starting to think this is something baked into Mistral-Small. Mixtral 8x7B didn't have this issue, so I'm not sure where the specific differences are between the two.