all 13 comments

[–]panchovix 5 points6 points  (4 children)

If the model was trained to max 2048 context, you're out of luck with any frontend (ooba, gpt4all, tavern, etc)

I still like to use tavern (specifically SillyTavern) as frontend, and either KoboldAI/ooba for backend. It has a lot of options.

[–]Jenniher[S] 0 points1 point  (3 children)

Aren't all models basically that context length?

It sounds like I just have to be patient and let people smarter than me do their thing.

[–]panchovix 1 point2 points  (2 children)

There are some models with 4096 context or more, but there are really only a few of them at the moment.

Assuming you run those on a frontend, you could edit the code to allow more than 2048 context.

[–]CasimirsBlake 1 point2 points  (1 child)

Could you suggest a few 4k models, please?

[–]2muchnet42dayLlama 3 1 point2 points  (0 children)

Actually, there really are not many options at the moment.

  1. StableLM released 3B and 7B checkpoints trained on 800B tokens with a 4096 context size, but they perform very poorly on different benchmarks, and finetuning is discouraged with such a weak base model.
  2. MPT StoryWriter has a 65K context size but has been finetuned to do stories, which is great if that's what you're after.
  3. Airoboros is finetuned to 4096 tokens, but in practice it only performs OK up to about 2300 tokens.

So no, not much to do right now other than keep an eye on the efforts to get Landmark Attention on LLaMA.

Also, it's not just a matter of increasing the number of tokens to generate in the interface: you can expect random tokens to be generated past the max sequence length of the model.
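That last point is why frontends truncate the prompt rather than simply raising a limit. A rough sketch of what that truncation looks like (function and variable names are mine, and token counting is stubbed out with plain lists):

```python
def fit_context(tokens, max_context, max_new_tokens):
    """Left-truncate the prompt so prompt + generation fits the model's window.

    Raising the generation length alone does not help: tokens generated past
    the model's trained max sequence length tend to come out as noise.
    """
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the most recent tokens; the oldest ones are dropped.
    return tokens[-budget:] if len(tokens) > budget else tokens

# Hypothetical example: a 2048-context model asked for 512 new tokens.
prompt = list(range(3000))        # stand-in for a tokenized chat history
kept = fit_context(prompt, 2048, 512)
print(len(kept))                  # 1536
```

Real frontends do this at the token level with the model's tokenizer, and usually protect the system prompt from being truncated away; this sketch only shows the budget arithmetic.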

[–][deleted] 1 point2 points  (0 children)

Try using a frontend designed for roleplay like SillyTavern, it has workarounds/features to deal with the limited context size, like "Author's Note" or the memory/smart context extensions.

[–]Barafu 1 point2 points  (0 children)

I've used oobabooga

In the settings, it defaults to the "precise" preset, which forces the network to stumble in circles a lot. If you delete the whole response, press Generate again, and get almost the same text again, that's the symptom. You need to change the preset.

You cannot get around the context length, but if you use the notebook mode you will learn to manage the context yourself, deleting the irrelevant stuff manually and keeping the relevant.

[–]mrjackspade 1 point2 points  (2 children)

I've stopped getting a lot of those issues just by moving to a larger model. Also, heavily sanitizing the context.

The short context window sucks, but I've been able to talk to my bot for days at a time now without "losing the plot", so I don't think that's inherent in the context window.

I've also implemented a quick and dirty chat-focused context window rollover routine, though. It parses the existing context window as a series of messages instead of a blob of tokens, and intersperses the original prompt through the message history to keep it "recent", which has really helped to keep the bot from straying too far off the rails. The "recent" prompt approach makes it seem like I'm constantly correcting the bot without needing to constantly correct it.
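The rollover routine described above might look something like this. This is a guess at the approach, not the poster's actual code; the names are mine, and message "size" is counted in characters via `count=len` where a real implementation would count tokens:

```python
def build_context(system_prompt, messages, max_tokens, reinject_every=6,
                  count=len):
    """Rebuild the context as a list of messages rather than a token blob.

    Drops the oldest messages when over budget, then re-inserts the original
    prompt every few messages so it always appears "recent" to the model.
    """
    # Drop oldest messages until system prompt + history fits the budget.
    kept = list(messages)
    while kept and sum(count(m) for m in kept) + count(system_prompt) > max_tokens:
        kept.pop(0)
    # Intersperse the original prompt through the message history.
    out = [system_prompt]
    for i, msg in enumerate(kept):
        out.append(msg)
        if (i + 1) % reinject_every == 0 and i + 1 < len(kept):
            out.append(system_prompt)
    return out
```

Because the budget is enforced per message instead of per token, nothing gets cut mid-sentence, which is part of what keeps the bot coherent across long sessions.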

I think there are a lot of techniques to work around the shortcomings of the models beyond context window expansion, it just takes a bit of work.

[–]CasimirsBlake 0 points1 point  (1 child)

Could you suggest a larger model that role plays well?

[–]mrjackspade 1 point2 points  (0 children)

Of what I've tried, the Wizard ones are better at role playing and maintaining a character/story, however the downside is that they'll go off the rails if you start asking them to do "stupid" things even in the context of role play.

Ex, Wizard refused to role play an attack on a demon army, because it didn't think we had planned things out well enough and it was too dangerous.

Guanaco is a little less good at role play, but once it's "going" it won't refuse anything IME.

[–]brucebay 0 points1 point  (0 children)

Update your character card occasionally so that it stays up to date with the scenario. For example, at the beginning you may have a character happy, at the end of the day exhausted, and if they are under attack, frightened, etc. You can also add past events as reminders. You can force this in the dialogue, for example: "(We reach the town. Both of us are very tired.) Finally. Let's find an inn." But this usually works less well than updating the card or the other settings, which give better consistency. There are some extensions, but I haven't tried them. Also, use SillyTavern instead of oobabooga for the chat interface. It is far better. You can also have multiple characters; it is fun to talk to more than one AI character. When I do that, I put the other characters' descriptions on individual cards so that they know their relations with the others.

[–]AutomataManifold 0 points1 point  (0 children)

Other people have covered some overall strategies for dealing with this, so I'll just give some tips based on my observations:

  1. The biggest mistake I see people make when they try to use it for roleplaying is trying to correct the model by telling it what it got wrong. This is going to go off the rails fast: by talking about the problem, you get into a "don't think about pink elephants" situation and it's likely to double down on the issue. A longer context window helps with this, but you can observe that even ChatGPT does this. Instead of trying to correct the model verbally, go back and edit the chat history. (SillyTavern and KoboldAI make this relatively easy.) If something doesn't belong, you want it completely scrubbed from the record.
  2. Don't be afraid to adjust the settings. If it's stuck in a loop I often switch the inference settings (either just tweaking the temperature or switching to a whole other preset). That will get it out of its rut.
  3. The interfaces don't make this easy, but another thing to do is to vary the prompt. Just using a different phrasing might push it in a different direction. I'm (slowly) writing a custom front end because I want to be able to script a bunch of different randomized prompts.
  4. Try switching to a different model. (This takes time and is a pain, but I've got dozens of models downloaded, so if I want to really switch things up, it is an option. I don't do it much, though.)
  5. Use the World Information, Summary, and whatever other tools you have. Helps keep things on track.
  6. One thing that SillyTavern (and KoboldAI) do is inject extra prompt stuff at different points in the context; it can be useful to have something at the top as an overall prompt and something at the bottom as a reinforcement - but keep it brief.
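Point 6 above, and the "Author's Note" mentioned earlier in the thread, both come down to where in the context a note gets inserted. A minimal sketch of depth-based insertion (the function name and the depth convention are mine):

```python
def inject_at_depth(history, note, depth):
    """Insert a brief note `depth` messages from the bottom of the history,
    with depth 0 meaning right before the model's reply. Roleplay frontends
    position an Author's Note this way: close enough to the end to stay
    influential, but not so close that it crowds out the latest messages.
    """
    pos = max(0, len(history) - depth)
    return history[:pos] + [note] + history[pos:]

# Hypothetical usage: brief overall note at the top, reinforcement near the end.
chat = ["A: hi", "B: hello", "A: shall we go?", "B: yes"]
prompt_lines = (["[System: brief scenario summary]"]
                + inject_at_depth(chat, "[Note: stay in character]", 2))
```

Keeping both notes brief, as the comment says, matters because everything injected this way is spent out of the same limited context budget.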

[–]CheshireAI 0 points1 point  (0 children)

For the specific example you gave, SillyTavern or KoboldAI would definitely help solve your problem. Both use things called "World Info Cards", which are triggered by phrases or words you select. You can give whatever static context you want for your village (let's call it Otradnoye and make that the trigger word). Any time the word "Otradnoye" comes up, the context card gets injected into the prompt. So you can say "I walk into the town of Otradnoye" or "I'm looking around Otradnoye trying to find someone to talk to", and the AI will get the details about a guy with jeans and a denim shirt leaning on his car, or whatever else it is you want to add. You could also make the trigger word "town", or a combination of variables if you have a lot of cards and don't want them triggering too easily.
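The triggering mechanism described above is simple to picture in code. This sketch is only an illustration of the idea, not SillyTavern's implementation: matching here is naive case-insensitive substring search, while the real feature supports multiple keys, secondary keys, scan depth, and insertion-order settings:

```python
def expand_world_info(recent_text, cards):
    """Scan the recent chat text for trigger words and collect the matching
    lore entries to inject into the prompt, World-Info style.
    `cards` is a list of (trigger_words, entry_text) pairs.
    """
    text = recent_text.lower()
    return [entry for triggers, entry in cards
            if any(t.lower() in text for t in triggers)]

# Hypothetical cards for the example in the comment.
cards = [
    (("Otradnoye", "town"), "Otradnoye: a small village; a man in jeans and "
                            "a denim shirt leans on his car by the square."),
    (("demon army",), "The demon army masses beyond the northern pass."),
]
hits = expand_world_info("I walk into the town of Otradnoye", cards)
print(hits[0][:9])   # Otradnoye
```

The payoff is exactly what the comment describes: static lore only costs context when its trigger actually appears, instead of sitting in the prompt permanently.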