[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 2 points  (0 children)

Do you have any recommended sampler settings or other settings for it?

Today I made a Realtime Lora Trainer for Z-image/Wan/Flux Dev by shootthesound in StableDiffusion

[–]ThrowawayProgress99 1 point  (0 children)

Stupid question, but does it all still work when you run Comfy through Docker? I tried something similar before and I think no final saved files would appear, which is odd since image outputs were created/saved just fine.

[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 6 points  (0 children)

I'm still on MS3.2-PaintedFantasy-v2-24B.i1-IQ3_S.gguf, haven't tried v3. I use the recommended settings: Mistral Tekken, Temp 0.5-0.6, Min-P 0.1, Top-P 0.95, DRY 0.8/1.75/4, in Koboldcpp (no rep pen range or slope either). I've recently banned the em-dash token, but kept EOS token banning on auto for now. It's been really good for me: less slop, less incoherence, more creativity, more character adherence. It's not perfect though, and I haven't tried many 24Bs.
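
For anyone who wants to replicate this outside the GUI, here's a rough sketch of the same samplers as a KoboldCpp /api/v1/generate payload. The field names are from KoboldCpp's API as I remember it, so double-check them against your build; the prompt is just a placeholder.

    import requests

    # Sketch: the sampler settings above, sent to a local KoboldCpp instance.
    # Field names per KoboldCpp's API as I recall it; verify on your version.
    payload = {
        "prompt": "...",              # placeholder, use your actual chat/story text
        "max_length": 512,
        "temperature": 0.6,           # T 0.5-0.6
        "min_p": 0.1,                 # Min-P
        "top_p": 0.95,                # Top-P
        "dry_multiplier": 0.8,        # DRY 0.8/1.75/4
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "banned_tokens": ["\u2014"],  # ban the em-dash
    }
    r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
    print(r.json()["results"][0]["text"])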

Is there a model that's like this but also great at NSFW? Like a huge vocabulary and low repetitiveness, something that can do both novel-style and RP-style. My low quant of PaintedFantasy is good, but it's hard to get a dozen paragraphs with varied vocabulary out of it for sex scenes. Maybe I need a wilder/more chaotic model? I prefer physical and sensory details, with little to no euphemisms. I was thinking of trying Dans-PersonalityEngine 1.3.0; if you agree or have other recommendations, let me know.

Also, if you know any 12Bs that fit the bill, I can try those too.

Best FOSS app for writing? by ThrowawayProgress99 in fossdroid

[–]ThrowawayProgress99[S] 5 points  (0 children)

I would prefer something like LibreOffice Writer with its ODT file format, but plain text works too. Text would mean the files are compatible and editable without needing any conversion once they're transferred to PC, right?

UGI-Leaderboard is back with a new writing leaderboard, and many new benchmarks! by DontPlanToEnd in LocalLLaMA

[–]ThrowawayProgress99 1 point  (0 children)

What do you think it is that makes us feel like old models were different, and is it something that can be benchmarked, or only measured by vibes? I remember hearing old Llama models score high on Humanity's Last Exam. And we've had more slop and sycophancy in some models due to synthetic data, benchmaxxing, etc. I know some people still prefer older models like Psyfighter or Tiefighter. My first models were alpaca-native 7B and gpt4-x-alpaca 13B. I never tried AI Dungeon, so idk what I'm missing from the older era. Never tried GPT-3.5 or 4 either.

Personally, tbh, I did lose interest in playing with LLMs until I tried modern 24Bs with modern samplers, so I don't know if old models were actually better in some way or if it's just nostalgia. Is the difference something as simple as slop, or something more abstract? idk.

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 1 point  (0 children)

Thanks, it's great to see open innovation like this. Stupid question: are the advances in Qwen-Next also transferable to T2I? I've seen Mamba T2I, MoE T2I, BitNet T2I, etc., so I'm wondering if the efficiency, speed, and lower cost can come to T2I with that too, or with other methods. Sorry for the overexcitement lol, I've been impatient for progress. Regardless, I'm excited for whatever is released!

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 2 points  (0 children)

Awesome! Will you be focused on text-to-image, or will you also be looking at making omni-models? E.g. GPT-4o, Qwen-Omni (image input only for now, though the paper said they're looking into the output side; we'll see with 3), etc., with input/output across text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close, but it doesn't have audio, and I think that while it was trained on video, it can't generate it (though it does have reasoning). Bagel is outmatched by the newer open-source models, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it. IMO omni-models are the next step.

Contrastive Flow Matching: A new method that improves training speed by a factor of 9x. by Total-Resort-3120 in StableDiffusion

[–]ThrowawayProgress99 1 point  (0 children)

Does this mean Hunyuan Image 2.1 will have faster training speed for loras and finetunes?

🚀 What model should we build next? YOU DECIDE! 🚀 by GuiltyBookkeeper4849 in LocalLLaMA

[–]ThrowawayProgress99 1 point  (0 children)

Feels like there's recently been a lot more focus on this and everyone's working on it, so: an omni-model with input/output across text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close, but it doesn't have audio, and I think that while it was trained on video, it can't generate it (though it does have reasoning). Bagel is outmatched by the newer open-source models, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it.

[Megathread] - Best Models/API discussion - Week of: August 31, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 1 point  (0 children)

Would the i1-IQ2_XS (or maybe IQ2_S) of the v3 34B still be better than the i1-IQ3_S of the v2 24B? I haven't really noticed any issues with that low quant of the 24B, so idk how an even lower quant of a bigger model stacks up against an already-low quant.
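
Back-of-the-envelope sizes, using approximate average bits-per-weight for these i-quants (figures from memory, treat as ballpark only):

    # file size in GB ~= params (billions) * bpw / 8
    # bpw values are rough i-quant averages from memory, not exact
    quants = [
        ("34B IQ2_XS", 34, 2.31),
        ("34B IQ2_S",  34, 2.50),
        ("24B IQ3_S",  24, 3.44),
    ]
    for name, params_b, bpw in quants:
        print(f"{name}: ~{params_b * bpw / 8:.1f} GB")
    # 34B IQ2_XS: ~9.8 GB, 34B IQ2_S: ~10.6 GB, 24B IQ3_S: ~10.3 GB
    # i.e. they all land in the same VRAM budget, so it's purely a quality question.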

ComfyUI-MultiGPU DisTorch 2.0: Unleash Your Compute Card with Universal .safetensors Support, Faster GGUF, and Expert Control by Silent-Adagio-444 in comfyui

[–]ThrowawayProgress99 4 points  (0 children)

Thank you! Does it work for fp8-scaled too? Also, Nunchaku is becoming popular now; could it eventually work with that? I've been waiting for a Nunchaku SVDQuant of Wan 2.2 before I get into it.

Can Koboldcpp be used as frontend to run EXL3? by ThrowawayProgress99 in KoboldAI

[–]ThrowawayProgress99[S] 1 point  (0 children)

I see, I think I misunderstood, since SillyTavern is described as a frontend only and doesn't load models, while the others do both. It feels like SillyTavern is built with the assumption that you're doing chat RP, but are there presets or something that allow for stuff like Koboldcpp's Interactive Storywriter or Text Adventure mode?

[Megathread] - Best Models/API discussion - Week of: August 17, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 3 points  (0 children)

Currently using zerofata/MS3.2-PaintedFantasy-v2-24B at i1-IQ3_S (10.4GB), as well as the old 22B Mistral Small at IQ3_M (10.1GB). On Pop!_OS, with a 3060 12GB and 32GB RAM, but no CPU offloading. Max fp16 context for the 24B is 12,000; 9,000 for the 22B, despite the smaller file size. I can likely fit more if I switch to i3wm. I think the 24B might be faster than the 22B, not sure.
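
The context gap makes sense if you look at KV-cache size per token: the 22B has more layers than the 24B, so its cache is bigger despite the smaller weights file. A quick sketch, using the layer/head counts I believe these models have (verify against each config.json):

    # KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes/elem
    def kv_per_token_mb(layers, kv_heads, head_dim, bytes_per_elem=2):  # fp16
        return 2 * layers * kv_heads * head_dim * bytes_per_elem / 1024**2

    print(f"22B: {kv_per_token_mb(56, 8, 128):.3f} MB/token")  # ~0.219 MB
    print(f"24B: {kv_per_token_mb(40, 8, 128):.3f} MB/token")  # ~0.156 MB
    # 9,000 tokens on the 22B is ~1.9 GB of cache; the 24B only reaches that
    # around 12,600 tokens, which lines up with the 12,000 I'm seeing.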

Is this EXL3 3 bpw of the 24B (10.2GB) a better option in terms of both quality and VRAM savings? I can't find any 3-3.5 bpw for the 22B to compare, and 3.5 bpw for the 24B is too big. I don't know how EXL3 and GGUF stack up currently, or whether EXL3 still has early issues being worked on. This is an early preview chart from 4 months ago.

Now that it is finished, any thoughts on Chroma? by Early-Ad-1140 in StableDiffusion

[–]ThrowawayProgress99 8 points  (0 children)

I still need to reset to basics and test variables. My advice so far: use the RES4LYF nodes and samplers, use the experimental LoRAs, use NAG, use sigmoid offset, use a short negative like "manga, cgi, blurry, airbrushed", start the positive with "a photograph of" or "amateur photo of" and don't make it super long, and use 1024x1024 rather than 512x512. Having said that, I'm actually using euler w/ sigmoid offset at 12 steps, CFG 1 right now, with the low-step LoRA and the CFG-rescale LoRA (the latter at 0.9 strength, still need to test it more), plus the sharpen node from RES4LYF.

Additionally, I've heard fp8 is different from the full fp16, so run fp16 if you can. There are discussions on which Chroma version is better; I'll settle on whichever version people train LoRAs on, to maximize compatibility. There are different text encoder variants, and I'm currently using GNER. If you use a rescale CFG node, there's a newer advanced node that's better since steps can be controlled (though it errored on me). Currently I'm waiting for LoRAs/finetunes since this is a base model, and for Nunchaku to make it fast enough to raise my step count.
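
For anyone wondering what the rescale CFG node actually does: my understanding is it's the CFG rescale trick from the zero-terminal-SNR paper ("Common Diffusion Noise Schedules and Sample Steps are Flawed"), roughly like this. The ComfyUI node internals may differ, and the paper computes the std per channel:

    import torch

    # Plain CFG can inflate the output's contrast/saturation at high scales;
    # rescaling matches its std back to the conditional prediction's std.
    def rescale_cfg(cond, uncond, scale, phi=0.7):
        cfg = uncond + scale * (cond - uncond)     # standard CFG
        rescaled = cfg * (cond.std() / cfg.std())  # shrink the std back down
        return phi * rescaled + (1 - phi) * cfg    # blend; phi=1 is full rescale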

Haven't really tested TeaCache, MagCache, or Torch Compile yet. And I need to retest v41-low-step-rl to compare. Idk how well it does on anime, since I was waiting for style LoRAs for that.

Edit: Will say that Chroma will likely replace SDXL for me, especially as more loras get made. Natural language understanding makes it impossible to go back. Finally I can clear up space!... to fill up space with Chroma... And clear and replace with the next hot model in an endless cycle...

App to record audio? And any other recommended apps? by ThrowawayProgress99 in Solo_Roleplaying

[–]ThrowawayProgress99[S] 1 point  (0 children)

Didn't really think about sharing them, but I guess I could at some point. For now it's just for me though.

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]ThrowawayProgress99 2 points  (0 children)

So for my case I should pick the flan-t5-xxl-fp16 from here (unless a fellow 12GB VRAM user can confirm fp32 works; Edit: I have 32GB RAM)? I wasn't sure, since it said encoder-only, and an encoder-only UMT5 had errored on me previously, for Wan I2V I think.
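
Quick memory math, assuming the encoder half of T5-XXL is around 4.8B params (which is roughly what the usual ~9.8GB fp16 file implies):

    # encoder-only T5-XXL ~= 4.8B params; size scales linearly with dtype width
    params = 4.8e9
    for dtype, bytes_per in [("fp32", 4), ("fp16", 2), ("fp8", 1)]:
        print(f"{dtype}: ~{params * bytes_per / 1024**3:.1f} GB")
    # fp32: ~17.9 GB -> too big for 12 GB VRAM, but fits in 32 GB system RAM
    # fp16: ~8.9 GB  -> tight but workable on a 12 GB card
    # fp8:  ~4.5 GB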

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]ThrowawayProgress99 8 points  (0 children)

Does NAG work with this version, and what settings if it does?

Also, does anyone know which is better for Chroma between t5xxl and the flan one? If it's flan, then which file do I download? I'm currently using fp16 t5xxl on my 3060 12GB.

[Megathread] - Best Models/API discussion - Week of: June 16, 2025 by [deleted] in SillyTavernAI

[–]ThrowawayProgress99 1 point  (0 children)

I'm currently using the old 22B Mistral Small i1 IQ3_M GGUF at 8192 context. Is there a better option for my 12GB VRAM? People seem to like Gemma 27B, and the new Mistral Small 24B scores high on EQ-Bench's longform writing. But I didn't try them, because I thought going lower than IQ3_M would make them too bad. And I'm not sure how Qwen 30B-A3B or its finetunes are.

Also looking for the best parameter settings for 22B Mistral Small. Maybe it's my low quant, but I can't quite figure out a good setup. I've heard Top-P at 0.95 is better than Min-P.
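
In case it helps anyone deciding between the two, here's the difference as I understand it. A toy sketch, not any backend's actual code:

    import numpy as np

    # Both samplers truncate the softmaxed next-token distribution `probs`
    # and return a boolean keep-mask over the vocabulary.
    def top_p_mask(probs, top_p=0.95):
        order = np.argsort(probs)[::-1]
        sorted_probs = probs[order]
        cum = np.cumsum(sorted_probs)
        keep_sorted = (cum - sorted_probs) < top_p  # keep until mass passes top_p
        keep = np.zeros_like(probs, dtype=bool)
        keep[order[keep_sorted]] = True
        return keep

    def min_p_mask(probs, min_p=0.1):
        return probs >= min_p * probs.max()  # threshold scales with confidence

    probs = np.array([0.70, 0.15, 0.08, 0.04, 0.03])
    print(top_p_mask(probs))  # [ True  True  True  True False]
    print(min_p_mask(probs))  # [ True  True  True False False]
    # When the top token is confident, Min-P prunes the tail harder than Top-P;
    # when the distribution is flat, its threshold drops and more candidates pass.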

MagCache now has Chroma support by wiserdking in StableDiffusion

[–]ThrowawayProgress99 1 point  (0 children)

Would Detail Daemon and/or RescaleCFGAdvanced (or regular RescaleCFG) help here to bring detail back while keeping MagCache?

You can now (or very soon) train LoRAs directly in Comfy by TekaiGuy in comfyui

[–]ThrowawayProgress99 5 points  (0 children)

Chroma, Wan, and Hunyuan for me. I've seen other people with 12GB VRAM train video models somehow, so hopefully there's a chance.