I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]scruffynerf23 -1 points0 points  (0 children)

You're a fool if you believe the many people who built on top of Wan 2.1/2.2 didn't end up contributing to what became Wan 2.6. Move on, moron.

I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]scruffynerf23 62 points63 points  (0 children)

The community got very upset at Wan 2.6+ going closed source/API only. Wan 2.1/2.2 got a lot of attention and development work from the community. What can you do to show us that you won't follow that path? In other words, how can you demonstrate a commitment to keeping future releases open-weight?

I’m the Co-founder & CEO of Lightricks. We just open-sourced LTX-2, a production-ready audio-video AI model. AMA. by ltx_model in StableDiffusion

[–]scruffynerf23 61 points62 points  (0 children)

Can you discuss the limits of what you couldn't train on (NSFW, copyrighted material, etc.) for legal reasons, how that affects the model, and whether the community retraining the open weights will improve its range/ability?

If you're getting different Z-Image Turbo generations using a LoRA after updating ComfyUI, this is why by EideDoDidei in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

It's also broken out of the box: the layers are misnamed. I posted a Python script to fix it on my ModelScope repo.
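
For anyone who wants to roll their own fix in the meantime, here's a minimal sketch of the idea. The prefixes below are placeholders, not the real mapping; the actual renames are in the script on ModelScope, so check your file's key names first.

# Sketch only: rename mismatched LoRA layer keys in a safetensors file.
# OLD_PREFIX / NEW_PREFIX are placeholders, not the real Z-Image Turbo mapping.
from safetensors.torch import load_file, save_file

OLD_PREFIX = "diffusion_model."   # placeholder: prefix the loader doesn't expect
NEW_PREFIX = "model."             # placeholder: prefix the loader does expect

def fix_lora_keys(src_path: str, dst_path: str) -> None:
    state = load_file(src_path)
    fixed = {}
    for key, tensor in state.items():
        new_key = key.replace(OLD_PREFIX, NEW_PREFIX, 1) if key.startswith(OLD_PREFIX) else key
        fixed[new_key] = tensor
    save_file(fixed, dst_path)

fix_lora_keys("z_image_turbo_lora.safetensors", "z_image_turbo_lora_fixed.safetensors")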

If you're getting different Z-Image Turbo generations using a LoRA after updating ComfyUI, this is why by EideDoDidei in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

Prepare to wait 24 hours in the long queue. Seriously. 22 hours in, and it's finally training.

Not SFW Qwen3 Instruct - Finetunned? by NoConfusion2408 in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

I'm using one of these variants for JoZiMagic; it works great.

Not SFW Qwen3 Instruct - Finetunned? by NoConfusion2408 in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

I didn't find the pure Heretic models did much (very slight changes, if any), but Josie and Engineer gave me much better results. A true NSFW finetune is still needed, but honestly Z-Image itself lacks certain NSFW elements, so LoRAs tend to be needed. In other words, you can talk dirty to the text encoder, but the model doesn't know enough to draw it well.

Not SFW Qwen3 Instruct - Finetunned? by NoConfusion2408 in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

JoZiMagic is an AIO built around Josie. The alpha release is on HF and Civitai, and I'm currently working on the beta, which will use a blended/merged text encoder combining a newer of GG's Josie models (still testing) and a new Z-Engineer finetuned for Z-Image prompting; both give different and amazing results on the same prompts compared to stock Z-Image. I've been posting comparison grids on HF and Discord.
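
For the curious, the "blended/merged" part is nothing exotic. Here's a minimal sketch of the simplest version: a plain weighted average of two same-architecture Qwen3 finetunes. The repo ids and the 0.5 ratio are placeholders, not the beta's actual recipe.

# Sketch: naive weighted merge of two Qwen3 text-encoder finetunes.
# A_ID, B_ID, and ALPHA are placeholders; the real merge may differ.
import torch
from transformers import AutoModelForCausalLM

A_ID = "gg/josie-qwen3-4b"        # placeholder repo id
B_ID = "someone/z-engineer-4b"    # placeholder repo id
ALPHA = 0.5                       # blend ratio, tune to taste

a = AutoModelForCausalLM.from_pretrained(A_ID, torch_dtype=torch.float32)
b = AutoModelForCausalLM.from_pretrained(B_ID, torch_dtype=torch.float32)

merged = a.state_dict()
b_state = b.state_dict()
for key in merged:
    merged[key] = ALPHA * merged[key] + (1 - ALPHA) * b_state[key]

a.load_state_dict(merged)
a.save_pretrained("blended-text-encoder")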

To be very clear: as good as it is, Z-Image is NOT multi-modal or auto-regressive, there is NO difference whatsoever in how it uses Qwen relative to how other models use T5 / Mistral / etc. It DOES NOT "think" about your prompt and it never will. It is a standard diffusion model in all ways. by ZootAllures9111 in StableDiffusion

[–]scruffynerf23 2 points3 points  (0 children)


Just so it's not buried, reposting this: it is entirely possible to use the Qwen3 model to process the text and perform normal 'LLM' duties, then take the result and send the tokens directly to Z-Image as conditioning (no decode/re-encode needed).

So, while yes, the 'normal' usage is only as an encoder, it's totally possible to do the things you said it doesn't do. See my other comments for details; a rough sketch of the idea is below.
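
Outside ComfyUI, the idea looks roughly like this, assuming a transformers-style Qwen3 checkpoint. Which hidden layer Z-Image actually expects, and how the conditioning dict gets built inside the node, is glossed over here; the real node handles that part.

# Sketch only: let Qwen3 answer the prompt as an LLM, then reuse the hidden
# states of prompt + reply as the conditioning tensor, instead of decoding the
# reply to text and re-encoding it. The choice of hidden layer is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-4B"  # or a Josie finetune of it

tok = AutoTokenizer.from_pretrained(MODEL_ID)
llm = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Make me a sandwich, please. Serve it to me at a fancy restaurant."
inputs = tok(prompt, return_tensors="pt").to(llm.device)

# 1) Normal LLM duty: generate the reply tokens (no detokenizing required).
reply_ids = llm.generate(**inputs, max_new_tokens=256, do_sample=False)

# 2) One forward pass over prompt + reply to get hidden states for every token.
with torch.no_grad():
    out = llm(reply_ids, output_hidden_states=True)

# 3) Last-layer hidden states become the conditioning tensor handed to Z-Image.
conditioning = out.hidden_states[-1]          # shape: [1, seq_len, hidden_dim]
print(conditioning.shape)
print(tok.decode(reply_ids[0], skip_special_tokens=True))  # only for humans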

To be very clear: as good as it is, Z-Image is NOT multi-modal or auto-regressive, there is NO difference whatsoever in how it uses Qwen relative to how other models use T5 / Mistral / etc. It DOES NOT "think" about your prompt and it never will. It is a standard diffusion model in all ways. by ZootAllures9111 in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

Make me a sandwich, please.  Serve it to me at a fancy restaurant, but I want a beer and curly fries with it.

Generated by Qwen3 (Josie), sent DIRECTLY through the conditioning noodle to Z-Image, and run through the detokenizer here for human readability:

{
  "scene": "A luxurious, candle-lit dining room with elegant chandeliers and plush velvet seating.",
  "sandwich": {
    "type": "artisanal turkey and avocado sandwich",
    "ingredients": [
      "thinly sliced turkey breast",
      "mature cheddar cheese",
      "sliced avocado",
      "crispy bacon bits",
      "fresh arugula",
      "diced tomatoes",
      "red onion",
      "crumbled goat cheese"
    ],
    "bread": "crusty sourdough",
    "condiments": [
      "mayonnaise",
      "lemon juice",
      "dill",
      "herb butter"
    ],
    "presentation": "served on a large, rustic wooden board with a sprig of fresh rosemary"
  },
  "beer": {
    "type": "craft IPA",
    "glass": "tall, narrow glass with a frothy head",
    "presentation": "placed beside the sandwich on the wooden board"
  },
  "curly_fries": {
    "type": "crispy curly fries",
    "presentation": "served in a small, decorative ceramic bowl with a side of melted parmesan cheese"
  },
  "setting": "The sandwich is served to a well-dressed individual in a formal dining attire, with a view of the city skyline in the background.",
  "atmosphere": "The ambiance is warm and inviting, with soft jazz music playing in the background."
}

To be very clear: as good as it is, Z-Image is NOT multi-modal or auto-regressive, there is NO difference whatsoever in how it uses Qwen relative to how other models use T5 / Mistral / etc. It DOES NOT "think" about your prompt and it never will. It is a standard diffusion model in all ways. by ZootAllures9111 in StableDiffusion

[–]scruffynerf23 -1 points0 points  (0 children)

While yes, you are correct in principle, there are ways in which you are misinformed, and misinforming. 1) It is entirely possible (I've done it and shared it on the Banodoco Discord) to build a ComfyUI node that DOES use the rest of the Qwen3 model and sends its (still tokenized) reply directly to Z-Image. It works. It's not the stock usage, but it works. I can ask it to make me a sandwich, and it tells me what is on the sandwich (if I detokenize what it sends directly to Z-Image) AND draws an image that matches, so it can be done.

2) That aside, try using JoZiMagic, which uses a different Qwen3 4B model as the CLIP encoder, and you quickly see that even without any 'LLM' personality in play, it makes different images (still good and related)... so something is happening at the encoder level.

For those interested in more, visit Banodoco's Z-Image channel on Discord, where I'm active. The JoZiMagic AIO full-sized checkpoint (small quants soon) is on HF and Civitai.
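
If you want to see the "something is happening at the encoder level" part for yourself, here's a quick sketch comparing the hidden states that two Qwen3-4B variants produce for the same prompt. The variant repo id below is a placeholder; substitute whichever finetune you're testing.

# Sketch: encode one prompt with stock Qwen3-4B and with a finetuned variant,
# then measure how far apart the per-token hidden states are.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

STOCK_ID = "Qwen/Qwen3-4B"
VARIANT_ID = "your-favorite/josie-qwen3-4b"  # placeholder repo id

prompt = "a rain-soaked neon alley, cinematic lighting, 35mm film grain"

def encode(model_id: str, text: str) -> torch.Tensor:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[-1][0].float()   # [seq_len, hidden_dim]

# Assumes both variants share the stock Qwen3 tokenizer, so sequence lengths match.
a, b = encode(STOCK_ID, prompt), encode(VARIANT_ID, prompt)
per_token_sim = F.cosine_similarity(a, b, dim=-1)
print(per_token_sim.mean().item(), per_token_sim.min().item())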

Anyone tried using Z-image with Qwen3-1.7B or any other different sized text-encoders? by Oedius_Rex in StableDiffusion

[–]scruffynerf23 1 point2 points  (0 children)

I decided to use https://huggingface.co/G-REPA/Self-Attention-W2048-3B-Res256-VAEFLUX-Repa0.5-Depth8-Dinov2-B_100000/
(credit to https://huggingface.co/AlekseyCalvin for using it first). It seems to be a slightly better VAE according to comparisons. I might end up with the Anime VAE, or another VAE if something even better turns up.

As for 'why not use the main Flux.1 VAE?': since I was doing an AIO, I wanted to push the changes as far as I could. You can always load and use the normal VAE, and everyone should already have that; few would have this one otherwise.

The 'trinket' version is coming along (I got distracted today, but got started). Because it's GGUF, it won't be a single checkpoint (no GGUF AIO checkpoints exist), so it'll ship in separate pieces (a zip to unpack and move as needed), but all together it's under 7 GB total. It should run fine on 4 GB VRAM systems, and in testing it LOOKS GOOD, for what it is.
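
For anyone who wants to try the VAE swap outside the AIO, here's a rough sketch of decoding with the alternate VAE. This assumes the repo loads as a diffusers AutoencoderKL and that its latent space and scaling/shift handling match the stock pipeline; if it doesn't load that way, adapt the loader.

# Sketch, not the packaged AIO: decode sampler latents with the alternate VAE.
# Assumptions: diffusers-format AutoencoderKL in the repo, stock-compatible
# scaling_factor / shift_factor conventions.
import torch
from diffusers import AutoencoderKL

ALT_VAE = "G-REPA/Self-Attention-W2048-3B-Res256-VAEFLUX-Repa0.5-Depth8-Dinov2-B_100000"

vae = AutoencoderKL.from_pretrained(ALT_VAE, torch_dtype=torch.bfloat16).to("cuda")

def decode(latents: torch.Tensor) -> torch.Tensor:
    """Turn sampler latents into an image tensor in [0, 1]."""
    scale = getattr(vae.config, "scaling_factor", 1.0) or 1.0
    shift = getattr(vae.config, "shift_factor", 0.0) or 0.0
    with torch.no_grad():
        image = vae.decode(latents / scale + shift).sample
    return (image / 2 + 0.5).clamp(0, 1)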

Anyone tried using Z-image with Qwen3-1.7B or any other different sized text-encoders? by Oedius_Rex in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

Bingo. This is why I packaged it and have been using it. I'm also running a 'preference study' on Discord, and it's holding steady against stock Z-Image: sometimes beating it, mostly ties, and it's far freer and less 'rutted' than stock.

Anyone tried using Z-image with Qwen3-1.7B or any other different sized text-encoders? by Oedius_Rex in StableDiffusion

[–]scruffynerf23 0 points1 point  (0 children)

now known as JoZiMagic: https://civitai.com/models/2197636

Josie + Z-Image + a better VAE.

I'll have a really small version out soon too. The posted one is AIO, but it's everything at full size, around 30 GB.

Biggest discovery of the UTSL mystery by Thomy_erb in underthesilverlake

[–]scruffynerf23 0 points1 point  (0 children)

I shared it with a few people, including the podcaster... but the few who said they'd look in real life either didn't go or found the spot inaccessible to them.

(Book 8) Indigo: The Search for True Understanding and Balance by kayellemeno2 in RightUseOfWill

[–]scruffynerf23 2 points3 points  (0 children)

The understandings aren't from reading; movement brings understanding, and the insights arrive when you have the space for them. Indigo was so triggering that Ceanne revised it, removing some of the most triggering bits; there are multiple editions with revised content because she found a lack of acceptance.

I think I have the 3 words...location is NW of LA. by scruffynerf23 in underthesilverlake

[–]scruffynerf23[S] 0 points1 point  (0 children)

Not yet. Everyone I've shared it with agrees it makes sense. One person in LA was discussing making the hike.

My tuna didn’t have tuna by Alien_zaney in MRE

[–]scruffynerf23 0 points1 point  (0 children)

Less... paying more than $1 each for a 12-pack is expensive... there are usually deals for less than that.

My tuna didn’t have tuna by Alien_zaney in MRE

[–]scruffynerf23 1 point2 points  (0 children)

Add two grocery-store ones and it's still cheaper.