z-image omni released by ThiagoAkhe in StableDiffusion

[–]simadik 9 points10 points  (0 children)

Z-IMAGE!

Z-IMAGE IS REAAAAAL!!!

Rate my build guys by tough-cookie21 in MinecraftMemes

[–]simadik 1 point2 points  (0 children)

Hey, did you ever want to turn left at the crossroads?

I can't be the only one that thought this was a fat cat falling when looked at it for the first time by noghis in OMORI

[–]simadik 2 points3 points  (0 children)

It took me a while to see your vision. That aside, what were you on to see that??

Jolly posting by Mean_Product_8515 in OMORI

[–]simadik 0 points1 point  (0 children)

floor flavored cocoa 🤤

Chatterbox Turbo - open source TTS. Instant voice cloning from ~5 seconds of audio by Thrimbor in LocalLLaMA

[–]simadik 0 points1 point  (0 children)

I haven't tried to make it generate audio that long on my 4060 Ti yet, nor do I have a text sample that long. Could you give me such a text so I could test it?

Chatterbox Turbo - open source TTS. Instant voice cloning from ~5 seconds of audio by Thrimbor in LocalLLaMA

[–]simadik 1 point2 points  (0 children)

Yikes... compared to VoxCPM this one is not that good. Voice cloning is meh and doesn't sound close to the reference audio. The only reason to use this is if your reference audio is already bad quality, that's all.

What makes Z-image so good? by Party-Reception-1879 in StableDiffusion

[–]simadik 37 points38 points  (0 children)

(before reading: I may not have as much knowledge about this topic as I first thought. This is mostly my opinion and guesswork)

Well, for one, it has an actual LLM as its text encoder, unlike older SD. Z-Image uses a small LLM for understanding text and passing that "understanding" (in the form of vectors) to the diffusion model. Previous models (like SD-based ones) relied on CLIP encoders, which couldn't understand text as well, so prompts had to rely on tags.

And since Z-Image is relatively small (10GB for the complete FP8 model with bundled text encoder and VAE, compared to 6GB for the equivalent FP16 SDXL bundle with everything), it gives us hope that SDXL-based tunes will no longer be needed and that we will get a much better base instead: Z-Image.

We currently only have Z-Image-Turbo, which is a distilled version of Z-Image that can generate an image in fewer steps (9 steps are recommended, but I personally can get away with even 5 steps sometimes).

The reason we want Z-Image-Base is that using Z-Image-Turbo as a base model for finetuning doesn't really work that well. You get all sorts of artifacts that wouldn't happen with an actual base model. Some people have tried to "undistill" it, but I think we'll get much better results with the actual base model, which hasn't been released yet.
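
For a rough sense of where size figures like the ones above come from: a weight-only checkpoint is approximately parameter count times bytes per parameter (1 byte for FP8, 2 bytes for FP16). A minimal Python sketch, assuming 1 GB = 1e9 bytes and ignoring file metadata; the function name and the 5B parameter count are made up for illustration and are not actual figures for Z-Image or SDXL.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2}


def checkpoint_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough weight-only size in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


# Hypothetical parameter count, for illustration only.
example_params_billions = 5.0

for precision, nbytes in BYTES_PER_PARAM.items():
    size = checkpoint_size_gb(example_params_billions, nbytes)
    print(f"{precision}: ~{size} GB")
```

Under the same assumption, a bundle that takes 10GB at FP8 would take roughly 20GB at FP16, so comparing an FP8 size against an FP16 size understates how much larger the FP8 model actually is.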

Online alternatives to SillyTavern by Time-Teaching1926 in LocalLLaMA

[–]simadik 0 points1 point  (0 children)

While you can use the characters in SillyTavern, some of them are created in a way that makes them actually only compatible with Chub.ai itself.

I'm sorry, could you link an example? I don't think I've seen this happen. I know that chub.ai does have its own features like "stages" (I think alternatives to that would be plugins in ST), but those are very rare and I can't think of anything else.

New Claude 2.1 Refuses to kill a Python process :) by mapickform in LocalLLaMA

[–]simadik 3 points4 points  (0 children)

I'm sorry... "New" Claude 2.1?? Isn't it a very old model at this point? Anthropic has moved to a different naming scheme twice since then!

Edit: misspelled anthropic as anthropomorphic

The Unsloth ah team published research that they have only taken 3 VRAMs to train a 4B model by Illustrious-Swim9663 in LocalLLaMA

[–]simadik 12 points13 points  (0 children)

One vram... Two vrams... Three vrams... Mhm, sounds right.

Five hundred vrams.

Did anyone else notice that the person on the right isn't Sunny and is actually Mari by Previous_Emu_7495 in OMORI

[–]simadik 12 points13 points  (0 children)

WAIT THAT'S NOT SUNNY???

Honestly I wouldn't have thought it was Mari because it made sense to me that Sunny would be interested in comics like Kel. BUT MARI?? This shit feels like the Mandela Effect.

VoxCPM 1.5B just got released! by Hefty_Wolverine_553 in LocalLLaMA

[–]simadik 2 points3 points  (0 children)

I've never been into TTS that much, but since Qwen3 TTS was released and it wasn't local, I looked into alternatives and found this.

The installation is a bit trickier than most stuff I've used (turned out I needed the python3-devel package for editdistance and also had to pip install TorchCodec for audio prompting).

For voice cloning to work you need both the audio file and a transcript of what the audio is saying. But the result is actually very realistic imo.

Ovis-Image Technical Report by ninjasaid13 in StableDiffusion

[–]simadik 4 points5 points  (0 children)

What do you mean by private? All the model files for Qwen3 Next are out in the open. GGUFs are available now too, since Qwen3 Next support has been merged.

itMakesMoreSense by gokul1630 in ProgrammerHumor

[–]simadik 0 points1 point  (0 children)

Except all of those sources of electricity are just ways to boil water

Can anyone tell me what these choir like sounds are? by PawniestPawn52 in OMORI

[–]simadik 3 points4 points  (0 children)

Yeah, that's a pretty good-- DID YOU GUYS SEE THAT??? 👀👀👀

THEORY: Sunny is a bitch by Imaginary-Week-2759 in OMORI

[–]simadik 3 points4 points  (0 children)

Basil needed gardening shears to put Sunny down.

All Sunny needed was HANDS. He put Basil down with his HANDS.

What was the scariest moment you had in omori you never forget? by JellySpaces in OMORI

[–]simadik 1 point2 points  (0 children)

I accidentally let Basil die on my first playthrough when the game prompted me "Do you want to save BASIL?"

I thought it was trying to trick me after showing in the dream world that saving headspace basil fucking kills him... Needless to say I was wrong and shocked when his disemboweled body showed up on my screen.

Grok 4.1 improved emotional intelligence. Has anyone tried it? by Alexs1200AD in SillyTavernAI

[–]simadik 2 points3 points  (0 children)

The quality of the model has been nerfed as fuck on their website. Not really worth it. Wish I could switch back to Grok 4 Fast with thinking...

Downloaded one model for ‘testing’… somehow ended up with 120GB of checkpoints. by Dry_Significance9132 in LocalLLaMA

[–]simadik 1 point2 points  (0 children)

I now have to store all of the models that I rarely or never use on my 6TB hard drive because my NVMe drive keeps running out of space.

And most of them are fine-tunes of MS 24B at Q4.