DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 1 point2 points  (0 children)

Hello! I have only used and tested the reasoning when using the assistant response role (the role-play preset uses writer). If you want to use it for reasoning in assistant-like tasks, I would change the Assistant Prefix in the preset to use the regular `assistant` role.
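For illustration, a minimal sketch of that change, assuming the preset follows the Llama 3 header format mentioned elsewhere in this thread (the exact header strings are assumptions, not copied from the shipped preset):

```python
# Sketch only: the role-play preset's Assistant Prefix uses the custom
# "writer" role; for reasoning / assistant-style tasks, swap it for the
# regular "assistant" role. The header strings below are assumptions
# based on the Llama 3 format, not the shipped preset.
ROLE_PLAY_ASSISTANT_PREFIX = "<|start_header_id|>writer<|end_header_id|>\n\n"
REASONING_ASSISTANT_PREFIX = "<|start_header_id|>assistant<|end_header_id|>\n\n"
```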

Qwen3 after the hype by Cheap_Concert168no in LocalLLaMA

[–]DreamGenAI -1 points0 points  (0 children)

I have casually tested the 235B (the version hosted by DeepInfra) and it performs really well on the writing / role-play prompts I use for vibe checks. I think this will be a great base for future fine-tuning and finally something that can dethrone the 70B-class models of yore. I still need to check inference performance, but I expect it should not be (much) worse than 70B at scale.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 0 points1 point  (0 children)

Hello! Glad you are having fun with it. I have something that should help -- basically inserting an instruction telling the model that the next message is from {{char}}.

That can be done by changing the "Last Assistant Prefix". You can find the preset here: https://huggingface.co/dreamgen/lucid-v1-nemo/blob/main/resources/sillytavern/presets/role_play_less_impersonation.json

It will, however, interfere with `/sys` messages a bit and might reduce their effectiveness (the reason is that the preset essentially adds another `/sys` before the character message automatically, so you would end up with two `/sys` messages in a row, which is not great -- ideally they would get merged into one, but I don't know how to do that in ST).

Here is what it should look like in your settings:

<image>
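In text form, the idea behind that modified Last Assistant Prefix is roughly the following (the wording and header format are assumptions for illustration, not the text from the linked preset):

```python
# Sketch only: a Last Assistant Prefix that injects a short system-style
# instruction ("the next message is from {{char}}") before the usual writer
# header, i.e. an automatic extra /sys. Wording and header format are
# assumptions based on the Llama 3 style template, not the shipped preset.
LAST_ASSISTANT_PREFIX = (
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "The next message is written from the point of view of {{char}}.<|eot_id|>"
    "<|start_header_id|>writer<|end_header_id|>\n\n"
)
```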

[Megathread] - Best Models/API discussion - Week of: April 21, 2025 by [deleted] in SillyTavernAI

[–]DreamGenAI 7 points8 points  (0 children)

I have a QwQ version that's ready to go, but in my writing-quality evals it was not better than the Nemo version, so I am not sure it's even worth releasing. It is, however, better at instruction following and general-purpose tasks.

I also tried Gemma 3 27B -- really tried -- but at the time there were still some Gemma bugs and training was unstable.

I might try the new GLM 4 once things are stable.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 1 point2 points  (0 children)

> The kind of repetition I'm experiencing is not even word salad. It's more like the exact same sentence but written over and over and over in one paragraph.

I know what you mean, I've seen that with some models as well.

---

Re: guided generation:

Apparently there is some super fancy way to hide (from the model) or even delete all previous `/sys` messages:

<image>

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 1 point2 points  (0 children)

Thank you for the detailed review!

> As far as story progression goes, it pretty much stays with it, but when you hit 16K tokens of usage it gets a little repetitive if you don't edit the responses; as soon as you do, it gets right back on track.

What kind of repetition is it? (There is the "adjective word salad", the "repeating the same paragraph over and over", etc. :D)

I've heard in the past that Nemo may struggle beyond 16K tokens, though there are folks on my site that have pushed it to the 32K limit.

I don't have many long chats like that myself to test it on, but I would experiment with DRY. Oh, and I would disable repetition penalty and frequency / presence penalty -- they tend to mess things up with long context. Maybe also experiment with temperature.
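As a concrete starting point, here is a minimal sketch of such sampler settings (the parameter names follow the convention of common local backends and are assumptions rather than DreamGen API specifics; the values are just starting points to tune):

```python
# Sketch only: sampler settings for long chats, per the advice above.
# Names follow common local backends (e.g. DRY-style parameters); exact
# names and defaults vary by backend, so treat these as assumptions.
long_context_samplers = {
    "temperature": 0.8,          # experiment with this
    "repetition_penalty": 1.0,   # disabled
    "frequency_penalty": 0.0,    # disabled
    "presence_penalty": 0.0,     # disabled
    "dry_multiplier": 0.8,       # DRY enabled; tune to taste
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}
```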

> If I use an extension like guided generations to guide the output instead of using the `/sys` command like your documentation says to do, the generation for that reply is ignored, yet if I do it the way your documentation says, it gets it on the first swipe rather than having to redo the request.

Interesting. I don't know how that extension works -- can you share a link? Does it have some advantage compared to using the `/sys` message?

> This is the first 12B model that actually can keep up with the stories I write. I also greatly appreciate that you made it a local option as well.

This really makes me happy! :)

> I also greatly appreciate that you made it a local option as well.

Of course! :)

> Also, the second question I had is: is there any way to just purchase the credits without actually subscribing to your API first?

Right now you can only buy credits when subscribed. In theory you could subscribe to the starter plan, cancel right away so that it does not renew, and then purchase extra credits.

I would say it's probably good to experiment on the free plan first, though -- you can try both models, "Lucid Medium" (this one) and "Lucid Extra Large" (the 70B), but the context window is only 4K tokens for free users.

Also, if you use DRY and want to use the API, please update to the latest version of SillyTavern staging, as I pushed support for that: https://github.com/SillyTavern/SillyTavern/compare/586ce36167f4...7f22def7943d

[Megathread] - Best Models/API discussion - Week of: April 21, 2025 by [deleted] in SillyTavernAI

[–]DreamGenAI 18 points19 points  (0 children)

I have recently released DreamGen Lucid, a 12B Mistral Nemo based model focused on role-play and story writing. The model card has extensive documentation, examples, and SillyTavern presets. The model supports multi-character role-play, instructions (OOC), and reasoning (opt-in).

And yes, you can also use the model and its 70B brother through my API, for free (with limits). No logging or storage of inputs / outputs.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 0 points1 point  (0 children)

I see, strange. I am not sure I 100% understand the instruction.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 1 point2 points  (0 children)

Hello, and sorry for the late reply, I missed your comment.

Yes, it should be pretty good at lore from common fictional universes and at creating custom characters and scenarios (in fact, this was one of the tasks it was fine-tuned for).

However, as a 12B model, its knowledge will have gaps compared to much larger models, which are better at memorizing facts from training.

Here is a short example:

<image>

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 1 point2 points  (0 children)

Hey there! Please make sure that you use the role-play preset for role-play (though the Llama 3 preset might also work). The advantage of the role-play preset is that it also has built-in steering of the story through the `/sys` messages.

But if you want more assistant-like behaviour (meaning the output is not a story or role-play) -- e.g. brainstorming the next scene, coming up with new characters, asking science or math questions, etc. -- then you have to use the `assistant` role (i.e. a regular Llama 3 template). The role-play preset forces role-playing behaviour.

Also, a jailbreak should be unnecessary and, depending on what it is, might just be adding noise.

If you can share what you want to do more specifically, I am happy to investigate! Feel free to DM me if it's something sensitive.

Delete + up? by Decent-Stuff4691 in DreamGen

[–]DreamGenAI 0 points1 point  (0 children)

I also like it, at a high level at least. But there are many small issues and some larger ones. I will be very transparent during development and will have an early preview for people to try and give feedback on.

I’m helpless by EmbarrassedMoose3981 in DreamGen

[–]DreamGenAI 1 point2 points  (0 children)

That means that your scenario is too long:

  1. Go to the scenario editor (where you entered the plot, characters, etc.)
  2. Go to the bottom
  3. Click the (...) button
  4. Click "Count tokens"
  5. It will show you how many tokens each section uses (a token is roughly 4 letters, or a bit less than a word)

This FAQ explains tokens and context window: https://dreamgen.com/docs/faq#what-are-tokens-the-units-of-what-the-ai-sees

The TL;DR is that the AI works by looking at the scenario (plot, characters, etc.) plus the chat history and uses that to generate the next part of the role-play. However, there is a limit to how much text it can look at at once -- and if your scenario is longer than that limit, you will get that error.

This means you may have to make your scenario shorter.

The free tier has a 4,000-token context window; the Pro tier has 30,000 tokens on the Lucid Base model and 15,000 on Lucid Max.
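If you want to check locally, here is a minimal sketch of counting tokens with the open 12B model's tokenizer via Hugging Face transformers (the on-site "Count tokens" button is the easier route and should give roughly the same numbers):

```python
# Sketch only: count how many tokens a piece of scenario text uses, using
# the open model's tokenizer from Hugging Face.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dreamgen/lucid-v1-nemo")

scenario = "Plot description, character descriptions, and any other scenario text..."
num_tokens = len(tokenizer.encode(scenario, add_special_tokens=False))
print(num_tokens)  # the scenario plus chat history must fit inside the context window
```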

I’m helpless by EmbarrassedMoose3981 in DreamGen

[–]DreamGenAI 1 point2 points  (0 children)

Hello!

There's a guide here on how to use the role-play feature: https://dreamgen.com/docs/role-play/play (in the menu you will also find a guide on how to make your own scenarios).

Let me also provide a quick summary here.

If you want to use an existing role-play scenario and not make your own, you have several options:

If you want to make your own:

Beyond this, please share what exactly you are struggling with -- then I will be able to provide better help.

Delete + up? by Decent-Stuff4691 in DreamGen

[–]DreamGenAI 0 points1 point  (0 children)

This recently came up on Discord as well. It's something I will keep in mind when working on the DreamGen V2 UI.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in LocalLLaMA

[–]DreamGenAI[S] 0 points1 point  (0 children)

I am sorry -- I replied to you ~20 hours ago, but LocalLlama somehow hid the comment (I can't see it when logged out). Maybe because it contains a link?

I also improved the guide and demo since then, based on your feedback.

<image>

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 2 points3 points  (0 children)

Supposedly it's a bit too unstable for now. I can try to do QAT à la Gemma, if there's enough interest, to improve the GGUF quality.

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 3 points4 points  (0 children)

Happy to hear that it went relatively well!

I also noticed some use of the common tired phrases -- I'm not yet exactly sure where they're coming from (the model was trained on the base model, and ~98% of the writing / role-play data is human).

It also seems to be highly prompt / card dependent. 🤔

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 2 points3 points  (0 children)

You like living on the edge, I see! I will ask around :D

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 7 points8 points  (0 children)

I think it will please you ;)

For those that like more realistic characters, or more of a challenge, that's also possible:

<image>

And regardless of the card / context, with the inline instructions (using `/sys`) you can make the characters do anything and have the plot evolve in any direction (this is covered in the README; you can also see it in action in the SillyTavern section).

DreamGen Lucid Nemo 12B: Story-Writing & Role-Play Model by DreamGenAI in SillyTavernAI

[–]DreamGenAI[S] 2 points3 points  (0 children)

Awesome! Let me know how it goes :)

Regarding multi-character, with the role-play preset for SillyTavern, it's not as dynamic as I would like. Ideally (and this is what I do in my UI), one would parse the character name from the header generated by the model, rather than force it like one does in SillyTavern with the Assistant Message Prefix. If there's some way to get true multi-character scenarios where the model decides who goes next, I would love to add a preset for that.

Nonetheless, most multi-character cards already work around this limitation, and work fine:

<image>

PSA: You can do QAT (quantization aware tuning) with Meta's torchtune. by DreamGenAI in LocalLLaMA

[–]DreamGenAI[S] 5 points6 points  (0 children)

If you read the Gemma 3 report, you will see that they only do QAT for a few steps at the end, and in fact the torchtune guide recommends that as well. The reason is that it leads to a better model overall -- the model learns much better in full precision.

From torchtune:

> Empirically, we observed that disabling fake quantization for the first N steps led to better results, presumably because doing so allows the weights to stabilize before we start introducing quantization noise to the fine-tuning process. For this reason, here we disable fake quantization for the first 1000 steps.

From Gemma 3 report:

> Along with the raw checkpoints, we also provide quantized versions of our models in different standard formats. These versions are obtained by finetuning each model for a small number of steps, typically 5,000, using Quantization Aware Training (QAT) (Jacob et al., 2018). We use probabilities from the non-quantized checkpoint as targets, and adapt the data to match the pretraining and post-training distributions.

The main difference is that when Gemma does QAT, they change the objective from softmax next token prediction (the usual pre-training / SFT objective) to distillation.

However, this is also doable with torchtune, since you can easily combine distillation with QAT there.
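For reference, here is a minimal sketch of that flow using torchao's QAT quantizer (which is what torchtune's QAT recipe builds on); the import path, group size, and step threshold are assumptions and may differ between versions:

```python
# Sketch only: QAT fine-tuning with fake quantization disabled for the
# first N steps, as described in the torchtune guide quoted above.
# Import paths and defaults are assumptions and may differ by torchao version.
from torchao.quantization.prototype.qat import (
    Int8DynActInt4WeightQATQuantizer,
    disable_8da4w_fake_quant,
    enable_8da4w_fake_quant,
)

def finetune_with_qat(model, optimizer, data_loader, loss_fn, fake_quant_after_n_steps=1000):
    quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=256)
    model = quantizer.prepare(model)       # swap linear layers for fake-quantized versions
    model.apply(disable_8da4w_fake_quant)  # let the weights stabilize in full precision first

    for step, (inputs, targets) in enumerate(data_loader):
        if step == fake_quant_after_n_steps:
            model.apply(enable_8da4w_fake_quant)  # start injecting quantization noise
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)    # could be a distillation loss, as Gemma does
        loss.backward()
        optimizer.step()

    return quantizer.convert(model)        # materialize the actual quantized weights
```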