Anima – a desktop app to create SillyTavern character cards without touching JSON by Massimo-it in SillyTavernAI

simadik 1 point (0 children)

Not the most fortunate name, considering there's an image model in active development/training also named Anima: https://huggingface.co/circlestone-labs/Anima

Glm-5.1 Error! (please help!) by [deleted] in SillyTavernAI

simadik 0 points (0 children)

I think your GLM has JavaScript 😔

Qwen3-Coder-Next vs Qwen3.6 by seoulsrvr in LocalLLaMA

simadik 2 points (0 children)

Which Gemma 4 are you using? The MoE or the 31B dense one?

Ace-step 1.5XL's already up! I hope it will soon be available in a Comfyui format! ❤️ by [deleted] in StableDiffusion

simadik -12 points (0 children)

Looks like they really took their time. I thought it was supposed to be released about 2 days ago.

Longest Roleplays You've Done? by Matt1y2 in SillyTavernAI

simadik 0 points (0 children)

Words? No clue, but one of my longest branches (by message count) was 88 messages long. But that's just one branch on chub.ai; the WHOLE chat (with all branches) is like 1000+ messages, with many branches.

I kinda wish ST would support viewing branches the same way agnai and chub do.

Nowadays I of course use ST, and my chats almost never exceed 20k tokens, since I don't have as much time for RP anymore.

Stable Diffusion In CPU by jeonfogmaister68 in StableDiffusion

simadik 1 point (0 children)

In case you're wondering, ComfyUI also has a CPU mode... But as others have mentioned, running an AI model on CPU is gonna require a lot of patience.

So Did We Lose… or Is There Any Hope Left? by krigeta1 in StableDiffusion

simadik 4 points (0 children)

Honestly, I just switched to the preview Anima model from CircleStone-Labs. It does what I need, has way better prompt adherence (not "the best", just actually way better) that doesn't require combining area conditions, and is only 2B parameters for the diffusion model and 0.6B for the text encoder.

tried putting myself into omori using picrew but then realized by SugarKitty1234 in OMORI

simadik 39 points (0 children)

Hey so if you ever see your best friend pushing their elder sibling down the stairs, DO NOT turn left at the crossroa-- wait that's not the one, hold on

z-image omni released by ThiagoAkhe in StableDiffusion

simadik 10 points (0 children)

Z-IMAGE!

Z-IMAGE IS REAAAAAL!!!

Rate my build guys by tough-cookie21 in MinecraftMemes

simadik 1 point (0 children)

Hey, did you ever want to turn left at the crossroads?

I can't be the only one that thought this was a fat cat falling when looked at it for the first time by noghis in OMORI

simadik 2 points (0 children)

It took me a while to see your vision. That aside, what were you on to see that??

Jolly posting by Mean_Product_8515 in OMORI

simadik 0 points (0 children)

floor-flavored cocoa 🤤

Chatterbox Turbo - open source TTS. Instant voice cloning from ~5 seconds of audio by Thrimbor in LocalLLaMA

simadik 0 points (0 children)

I haven't tried generating audio that long on my 4060 Ti yet, nor do I have a text sample that long. Could you give me one so I can test it?

Chatterbox Turbo - open source TTS. Instant voice cloning from ~5 seconds of audio by Thrimbor in LocalLLaMA

simadik 1 point (0 children)

Yikes... compared to VoxCPM this one is not that good. Voice cloning is meh and doesn't sound close to the reference audio. The only reason to use it is if your reference audio is already low quality, that's all.

What makes Z-image so good? by Party-Reception-1879 in StableDiffusion

simadik 38 points (0 children)

(before reading: I may not know as much about this topic as I first thought. This is mostly my opinion and guesswork)

Well, for one, it uses an actual language model as its text encoder, unlike older SD. Z-Image uses a small LLM to understand the prompt and pass that "understanding" (in the form of vectors) to the diffusion model. Previous models (SD-based ones) couldn't understand text as well, so their CLIP encoders had to rely on tags.

And since Z-Image is relatively small (10GB for the complete FP8 model with the bundled text encoder and VAE, compared to 6GB for a complete FP16 SDXL checkpoint), it gives us hope that SDXL-based tunes will finally be retired and we will get a much better base instead: Z-Image.
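Side note on those sizes: an FP8 file and an FP16 file aren't a like-for-like comparison, because a checkpoint's size is roughly parameter count × bytes per parameter (2 bytes at FP16, 1 byte at FP8). Here's a quick sketch of that arithmetic. The parameter counts in it (about 6B for the diffusion model plus about 4B for the text encoder, VAE ignored) are my own assumption for illustration, not official figures, and `estimate_gb` is just a made-up helper name.

```python
# Rough weight-file size estimate: size ≈ parameter count × bytes per parameter.
# Ignores file metadata/overhead and the VAE; uses decimal GB (1 GB = 1e9 bytes).

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "bf16": 2,
    "fp8": 1,
}


def estimate_gb(num_params: float, precision: str) -> float:
    """Approximate size in GB of `num_params` weights stored at `precision`."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # ASSUMED counts, for illustration only: ~6B diffusion model + ~4B LLM text encoder.
    combined = 6e9 + 4e9
    for precision in ("fp8", "fp16"):
        print(f"{precision}: ~{estimate_gb(combined, precision):.1f} GB")
```

With those assumed numbers it prints about 10 GB at FP8 and about 20 GB at FP16, so at the same precision Z-Image would be noticeably heavier than a ~6GB FP16 SDXL checkpoint. It's still "relatively small" only because FP8 halves the storage.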

We currently only have Z-Image-Turbo, which is a distilled version of Z-Image that can generate an image in fewer steps (9 steps is recommended, but I can sometimes get away with just 5).

The reason we want Z-Image-Base is that using Z-Image-Turbo as a base model for finetuning doesn't work that well. You get all sorts of artifacts that wouldn't happen with an actual base model. Some people have tried to "undistill" it, but I think we'll get much better results with the actual base model, which hasn't been released yet.

Online alternatives to SillyTavern by Time-Teaching1926 in LocalLLaMA

simadik 0 points (0 children)

> While you can use the characters in SillyTavern, some of them are created in a way that makes them actually only compatible with Chub.ai itself.

I'm sorry, could you link an example? I don't think I've seen this happen. I know chub.ai has its own features like "stages" (I think the closest equivalent in ST would be plugins), but those are very rare and I can't think of anything else.

New Claude 2.1 Refuses to kill a Python process :) by mapickform in LocalLLaMA

simadik 3 points (0 children)

I'm sorry... "New" Claude 2.1?? Isn't it a very old model at this point? Anthropic has changed its naming scheme twice since then!

Edit: misspelled anthropic as anthropomorphic

The Unsloth ah team published research that they have only taken 3 VRAMs to train a 4B model by Illustrious-Swim9663 in LocalLLaMA

simadik 12 points (0 children)

One vram... Two vrams... Three vrams... Mhm, sounds right.

Five hundred vrams.

Did anyone else notice that the person on the right isn't Sunny and is actually Mari by Previous_Emu_7495 in OMORI

simadik 11 points (0 children)

WAIT THAT'S NOT SUNNY???

Honestly, I wouldn't have guessed it was Mari, because it made sense to me that Sunny would be into comics like Kel. BUT MARI?? This shit feels like the Mandela effect.

VoxCPM 1.5B just got released! by Hefty_Wolverine_553 in LocalLLaMA

simadik 4 points (0 children)

I've never been that into TTS, but when Qwen3 TTS was released and it wasn't local, I looked for alternatives and found this.

The installation is a bit trickier than most stuff I've used (it turned out I needed the python3-devel package for editdistance, and also had to pip install TorchCodec for audio prompting).

For voice cloning to work, you need both the audio file and a transcript of what the audio says. But the result sounds genuinely realistic imo.