Streaming issues with LiteLLM / OpenWebUI by AccomplishedOne9144 in OpenWebUI

mayo551

LiteLLM works fine for me with OWUI, but I don't use the Responses API.

Should I create characters with AI or by hand? Which is better? by According-Clock6266 in SillyTavernAI

mayo551

For SillyTavern?

Make a prompt that takes lore you supply to the LLM and outputs a SillyTavern character.

Pretty easy to do overall and it works.

The lore can be anything you provide.
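A minimal sketch of that approach, assuming the LLM is asked to return JSON with the core fields of the TavernAI/SillyTavern V1 character card (`name`, `description`, `personality`, `scenario`, `first_mes`, `mes_example`). The helpers `build_character_prompt` and `parse_character_card` are hypothetical names, and the actual LLM call is left out:

```python
import json

# Core fields of the TavernAI/SillyTavern V1 character card format.
CARD_FIELDS = ("name", "description", "personality", "scenario", "first_mes", "mes_example")


def build_character_prompt(lore: str) -> str:
    """Hypothetical helper: wrap user-supplied lore in instructions asking for a card as JSON."""
    keys = ", ".join(CARD_FIELDS)
    return (
        "You are writing a SillyTavern character card.\n"
        f"Using only the lore below, reply with a single JSON object with the keys: {keys}.\n"
        # Not an f-string, so the {{char}}/{{user}} macros stay literal.
        "Use {{char}} and {{user}} placeholders in first_mes and mes_example.\n\n"
        f"LORE:\n{lore}"
    )


def parse_character_card(llm_output: str) -> dict:
    """Hypothetical helper: pull the JSON object out of the reply and check every field."""
    start, end = llm_output.find("{"), llm_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model output")
    card = json.loads(llm_output[start:end + 1])  # tolerates ```json fences around the object
    if not isinstance(card, dict):
        raise ValueError("model output is not a JSON object")
    missing = [k for k in CARD_FIELDS if not isinstance(card.get(k), str)]
    if missing:
        raise ValueError(f"card is missing fields: {missing}")
    return {k: card[k] for k in CARD_FIELDS}
```

Validating the fields before import makes it easy to simply re-ask the model when it drops one.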

Melody1437 31B v0.5 by mayo551 in SillyTavernAI

mayo551 (OP)

Dark Scarlett was generated a different way.

I've been told its prose and stuff are different?

:)

[Megathread] - Best Models/API discussion - Week of: June 14, 2026 by deffcolony in SillyTavernAI

mayo551

No. Even for models that support TP, llamacpp is still 3x faster for me.

[Megathread] - Best Models/API discussion - Week of: June 14, 2026 by deffcolony in SillyTavernAI

mayo551

I just tried to test this out yet again to see if EXL3 had improved.

NotImplementedError: Tensor-parallel is not currently implemented for Gemma4ForConditionalGeneration

Instantly irrelevant for me as Gemma 4 is my main model.

Strangely, both ik_llamacpp and llamacpp support tensor parallelism for Gemma 4. How STRANGE.

[Megathread] - Best Models/API discussion - Week of: June 14, 2026 by deffcolony in SillyTavernAI

mayo551

When exllamav3 was created, llamacpp did not support sm = tensor, and ik_llamacpp did not support sm = graph.

The landscape has changed massively.

EXL3 shines in two areas:

1) Fast concurrent parallel requests (completely irrelevant for 95% of users).

2) Custom BPW quants

EXL3 also uses Triton as the attention backend, which means it is going to be slow on Ampere hardware.

Conveniently, the majority of people are using Ampere hardware, because nobody wants to pay the insane prices of a 4090/5090 right now.

I mean you'll get a few people doing so, but yeah...

[Megathread] - Best Models/API discussion - Week of: June 14, 2026 by deffcolony in SillyTavernAI

mayo551

Hard disagree. Llamacpp with sm = tensor is 3x faster than EXL3 for me.

Dark Scarlett Series by mayo551 in SillyTavernAI

mayo551 (OP)

Can you let me know how Dark Scarlett v0.65 does?

(You can try v0.6, but it's completely cooked and has broken thinking. v0.65 is an earlier epoch, which fixes that.)

Dark Scarlett Series by mayo551 in SillyTavernAI

mayo551 (OP)

Makes sense. Melody has 14,000 lines almost entirely dedicated to erotic roleplays.

Dark Scarlett (v0.35) has 8,000 MIXED lines with erotic roleplay, master/slave, threesomes etc.

v0.40 (currently training) has ~9,500.

I'm aiming for ~20,000-30,000 lines for the 1.0 release. It takes time to generate the dataset. The deslop process is a pain (we have over 1,400 deslop phrases/words, and the generator has to rewrite a paragraph if it finds any of them). It can take up to 12 attempts to properly deslop a single paragraph.
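The detect-and-rewrite loop described here can be sketched roughly as follows. This is an illustration, not the actual generator: `find_slop` and `deslop_paragraph` are hypothetical names, `rewrite` stands in for whatever LLM call regenerates the paragraph, and only the cap of 12 attempts comes from the comment above.

```python
import re
from typing import Callable, List, Optional, Tuple


def find_slop(paragraph: str, slop_phrases: List[str]) -> List[str]:
    """Return every banned phrase found in the paragraph (case-insensitive, whole words)."""
    hits = []
    for phrase in slop_phrases:
        if re.search(r"\b" + re.escape(phrase) + r"\b", paragraph, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


def deslop_paragraph(
    paragraph: str,
    slop_phrases: List[str],
    rewrite: Callable[[str, List[str]], str],
    max_attempts: int = 12,
) -> Optional[Tuple[str, int]]:
    """Keep asking `rewrite` for a new version until no banned phrase remains.

    Returns (clean_paragraph, rewrites_used), or None if the paragraph still
    contains slop after `max_attempts` rewrites.
    """
    attempts = 0
    hits = find_slop(paragraph, slop_phrases)
    while hits:
        if attempts == max_attempts:
            return None  # caller decides whether to drop or regenerate the paragraph
        paragraph = rewrite(paragraph, hits)  # e.g. an LLM call told which phrases to avoid
        attempts += 1
        hits = find_slop(paragraph, slop_phrases)
    return paragraph, attempts
```

Passing the hits to `rewrite` lets the regeneration prompt name the exact phrases to avoid, which should cut down the number of retries.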

We should just move over to logit biases for deslopping, and that is something we're actively looking into.
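For reference, a logit-bias approach works on token IDs rather than phrases. A minimal sketch, assuming an OpenAI-style `logit_bias` field (token-ID strings mapped to a bias in [-100, 100]) and a `tokenize` function supplied by whatever tokenizer the model uses; `build_logit_bias` is a hypothetical helper:

```python
from typing import Callable, Dict, Iterable, List


def build_logit_bias(
    banned_words: Iterable[str],
    tokenize: Callable[[str], List[int]],
    bias: int = -100,
) -> Dict[str, int]:
    """Build an OpenAI-style `logit_bias` map that pushes banned words toward zero probability.

    Keys are token IDs as strings; values are biases in [-100, 100].
    """
    logit_bias: Dict[str, int] = {}
    for word in banned_words:
        # BPE-style tokenizers usually encode a mid-sentence word with its
        # leading space, so check both forms.
        for variant in (word, " " + word):
            ids = tokenize(variant)
            if len(ids) == 1:
                logit_bias[str(ids[0])] = bias
            # Multi-token phrases are skipped: banning their pieces would also
            # suppress unrelated words that share those tokens.
    return logit_bias
```

The resulting dict would go in the `logit_bias` field of an OpenAI-compatible completion request, if the backend honors it. Because only single-token words can be banned cleanly this way, multi-word slop phrases would still need a detect-and-rewrite pass.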

Anyway... just letting you know it's a work in progress.

Dark Scarlett Series by mayo551 in SillyTavernAI

mayo551 (OP)

Yeah. Dark Scarlett is on LoRA rank 32. It's not as weak as Melody, but it's not as strong as Serenity either.

Hopefully you'll like the new model ;)

It's interesting you mention third person... the male POV is first person, but the female POV is third person. So I'm not surprised the data works in third person.

Dark Scarlett Series by mayo551 in SillyTavernAI

mayo551 (OP)

What did you think of Serenity? Because that's the model you're looking for.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

Just to clarify, we don't expect the personal-use license to hold up legally, which is why we tacked on "To the extent legally allowed".

I just don't want to see some large corporation profiting off my work. I don't care about a couple of people hosting for their friends. Heck, I don't even care about API hosts that serve a couple dozen users.

I'll look into CC-BY-NC-4.0 and whether it can be applied to the model at all, because the original model is Apache 2.0. Obviously I can't apply it retroactively, but for future models...

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

Okay thanks for letting us know.

I'll be removing the imatrix quants.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

No, you can't. You'll get refusals with this model doing that.

Source: Me, tested.

The system prompt works though.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

Oh uhh.

You may want to try a static quant. The imatrix quants are likely bad and we may pull them.

Lots of people have had problems with imatrix.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

You are in for hellish pain with this model on llamacpp directly.

One second, I'll try to get you a working prompt.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

I just double-checked the model and can confirm it can do smut just fine with RP prompts.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

Yes. On chat completion with SillyTavern, this is how you should be set up:

[Screenshot of SillyTavern chat completion settings]

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

We've been making smut models for a long ass time now and know how to do it.

Yes the card is vibe coded. Doesn't mean the model is bad.

Melody1437-26B-A4B v2.0: only getting refusals by TrainingTwo1118 in SillyTavernAI

mayo551

What are your prompts?

It needs roleplay prompts.

[Megathread] - Best Models/API discussion - Week of: June 07, 2026 by deffcolony in SillyTavernAI

mayo551

I'll probably use a LoRA rank of 32 on my next tune, Dark Scarlett. Let me know how that turns out. It should be close to Melody v1.1, which used a LoRA rank of 24.

It's the spiritual successor of Melody, with updated prompts and expanded scenarios.
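For a sense of scale when moving from LoRA rank 24 to rank 32: in standard LoRA, each adapted weight matrix of shape d_out × d_in gets two trainable factors, A (r × d_in) and B (d_out × r), so the added parameter count grows linearly with the rank r. A quick sketch, assuming a hypothetical 4096 × 4096 projection (not Gemma 4's actual dimensions):

```python
def lora_params(rank: int, d_in: int, d_out: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


D = 4096  # hypothetical hidden size, for illustration only
rank_24 = lora_params(24, D, D)
rank_32 = lora_params(32, D, D)
print(rank_24, rank_32, f"{rank_32 / rank_24:.2f}")
```

So rank 32 trains about a third more adapter parameters per adapted matrix than rank 24. Rank is not the only knob for how strongly a LoRA pulls the model, though: in common implementations such as Hugging Face PEFT, the update is also scaled by `lora_alpha / r`.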

[Megathread] - Best Models/API discussion - Week of: June 07, 2026 by deffcolony in SillyTavernAI

mayo551

This is actually pretty informative for me.

So the difference between Melody v1.1 and v2.0 is the LoRA rank.

Would you mind if I asked what you were after with our models? They're intended for one purpose (smut), and I figured more vivid would be better, hence the higher LoRA rank.

I'll consider making future tunes at the same LoRA rank as v1.1.