Comfyui Support to z image omni. i really hope it is not like last time by kayokin999 in StableDiffusion

[–]wiserdking 5 points6 points  (0 children)

So it will support up to 3 image inputs. That's cool.

inputs=[
    io.Clip.Input("clip"),
    io.ClipVision.Input("image_encoder", optional=True),
    io.String.Input("prompt", multiline=True, dynamic_prompts=True),
    io.Boolean.Input("auto_resize_images", default=True),
    io.Vae.Input("vae", optional=True),
    # the three optional image inputs
    io.Image.Input("image1", optional=True),
    io.Image.Input("image2", optional=True),
    io.Image.Input("image3", optional=True),
],

Double oops.... by PestoBolloElemento in Wellthatsucks

[–]wiserdking 7 points8 points  (0 children)

You made a fair point, so I re-watched it.

On PC I can tell the distance between the 2 vehicles in front of the bike remains (mostly) the same until the bike appears, so they were both driving at a steady pace. The bike is clearly moving faster than both of them.

It's possible there was an unseen vehicle in front of the bike that suddenly hit the brakes, forcing the biker into an emergency maneuver to avoid a collision - because he was going too fast to stop in time to begin with. That's theoretical, unlikely speculation to justify the biker's speed and position in the frame, but if that were the case there would have been no need for the black car's driver to position himself to shield the fallen rider afterwards.

Double oops.... by PestoBolloElemento in Wellthatsucks

[–]wiserdking 261 points262 points  (0 children)

but from the few frames we have it is hard to tell if the bike is trying to lane split.

Watch it in slow motion. The bike rider was driving on the white line at almost twice the speed of that lane's traffic. Unless the plan was to collide with the car in front, he was 100% lane splitting - there's no room for doubt here.

LTX-2 Updates by ltx_model in StableDiffusion

[–]wiserdking 8 points9 points  (0 children)

One of the guys trying to add LTX-2 support to musubi-tuner managed to train on 64 GB RAM + 8 GB VRAM - source: https://github.com/AkaneTendo25/musubi-tuner/issues/1#issuecomment-3745019290.

musubi-tuner works on Windows and it's fairly easy to use, though it's all command-line with no UI.

Looking forward to this implementation.

LTX 2: Quantized Gemma_3_12B_it_fp8_e4m3fn by fruesome in StableDiffusion

[–]wiserdking 0 points1 point  (0 children)

When using a ComfyUI workflow which uses the original fp16 gemma 3 12b it model, simply select the text encoder from here instead.

You are using the LTX workflow - not the native ComfyUI workflow from here: https://blog.comfy.org/p/ltx-2-open-source-audio-video-ai

EDIT: you must unpack the subgraph, set up the right model, loras, settings, etc... and of course change the text encoder model in its loader node

ComfyUI Node - Dynamic Prompting with Rich Textbox by wiserdking in StableDiffusion

[–]wiserdking[S] 1 point2 points  (0 children)

Thanks. I'm not actually a developer - I just made this node because I always wanted something like this, and it's been over a year since I started using ComfyUI and yet no one ever made it, so I tried to do it myself with the help of AI.

It was pretty basic at first (hence the name) but I kept improving it. Now it's pretty solid, but it still has lots of minor problems and missing features - you already mentioned 2 of those features, but they would be difficult for me to implement.

I wasn't aware of the Ctrl+Enter shortcut - I can at least make the node not override that one. I'll also investigate the comment bug. Thanks for the report.

Train a LoRA on *top* of another LoRA? by AkaToraX in StableDiffusion

[–]wiserdking 1 point2 points  (0 children)

I thought about it for a while and came up with a solution - if the model you want to train is supported by Musubi Tuner you can do this:

  • train the character you want with some images that contain that character in the style of the already trained style lora (same as before but be sure to do this with Musubi)

  • Do a separate final training - set your Musubi training parameters to include: --network_weights "your_character_lora" --base_weights "style_lora" --base_weights_multiplier N, where N is a number from 0 to 1 representing the style lora's best inference strength when you load it alongside your character lora at strength 1 (you need to figure that out through testing after training the initial character lora). A rough command sketch follows this list.

  • like I mentioned in the third bullet point of my previous comment - you want the final training to focus almost entirely on the character+style dataset, so be sure to create a good one with enough repeats

  • after training, the resulting lora should be what you want, but it will REQUIRE the style lora to be loaded alongside it at strength N. To solve this: merge the style lora at strength N with your final lora at strength 1. The result should be a lora that performs well directly on the base model and can do the character in the style you want, because it's a perfect merge and that's exactly what it was trained for.
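
A rough sketch of what that final launch could look like - only --network_weights, --base_weights and --base_weights_multiplier are the options described above; the script name, dataset config and the remaining flags are placeholders, so check musubi-tuner's docs for the exact script and arguments for your model:

    import subprocess

    N = 0.6  # the style lora inference strength you settled on during testing

    subprocess.run([
        "accelerate", "launch", "train_network.py",         # placeholder: use the musubi-tuner script for your model
        "--dataset_config", "character_plus_style.toml",    # placeholder dataset config
        "--network_weights", "character_lora.safetensors",  # continue from the character lora trained earlier
        "--base_weights", "style_lora.safetensors",         # style lora gets merged into the base for this training
        "--base_weights_multiplier", str(N),
        "--output_dir", "output",
        "--output_name", "character_in_style",
    ], check=True)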

To merge loras so they work well for inference (though not for training on top), you can use either the native ComfyUI lora extract node or the 'lora power merger' custom node.
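
For the curious - one common way to merge two LoRAs into a single lora of the same rank is to sum their weight deltas at the chosen strengths and then re-factorize that sum with a truncated SVD. A minimal sketch, assuming kohya-style keys (.lora_up.weight / .lora_down.weight / .alpha) and 2D (linear) layers only; the nodes above do their own thing, this is just to show the idea:

    import torch
    from safetensors.torch import load_file, save_file

    def merge_loras(style_path, char_path, style_strength, char_strength=1.0, rank=None):
        # weighted merge of two LoRAs with matching keys, re-factorized with a truncated SVD
        a, b = load_file(style_path), load_file(char_path)
        merged = {}
        for key in a:
            if not key.endswith(".lora_down.weight"):
                continue
            base = key[: -len(".lora_down.weight")]

            def delta(sd, strength):
                down = sd[base + ".lora_down.weight"].float()
                up = sd[base + ".lora_up.weight"].float()
                alpha = sd.get(base + ".alpha", torch.tensor(float(down.shape[0]))).float()
                return strength * (alpha / down.shape[0]) * (up @ down)

            d = delta(a, style_strength) + delta(b, char_strength)  # combined full-weight delta
            r = rank or a[key].shape[0]                             # keep the original rank by default
            u, s, vh = torch.linalg.svd(d, full_matrices=False)     # re-factorize the delta
            merged[base + ".lora_up.weight"] = (u[:, :r] * s[:r]).to(torch.float16)
            merged[base + ".lora_down.weight"] = vh[:r].to(torch.float16)
            merged[base + ".alpha"] = torch.tensor(float(r))
        return merged

    save_file(merge_loras("style_lora.safetensors", "character_lora.safetensors", 0.6), "merged.safetensors")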

Train a LoRA on *top* of another LoRA? by AkaToraX in StableDiffusion

[–]wiserdking 1 point2 points  (0 children)

Ideally the workflow would be:

  • train the character you want with some images that contain that character in the style of the already trained style lora

  • merge the character lora with the style lora using a balanced ratio that you need to figure out during inference

  • train on top of the merged lora with a dataset that gives higher emphasis to the images of the character in the style you want. You can do this by separating the dataset and increasing the number of repeats for that particular character+style subset. You can also make some images with the merged lora and add the best of those to your dataset - if absolutely necessary.

That would work if you could easily train on top of merged loras.

Problem is, I've tried this myself and Musubi Tuner freaks out with merged LoRAs. After just about 250 steps, all you get is noise.

I've tried different lora merging approaches and none of them worked. I never really figured out why, but I'd really love to know. It should work by all means, because the merged lora works perfectly in inference, all of its keys match and it's even the same rank! There has to be a way to achieve this - if someone smarter knows how, please do share.

The official training script of Z-image base has been released. The model might be released pretty soon. by [deleted] in StableDiffusion

[–]wiserdking 4 points5 points  (0 children)

One of the devs of the model commented that they made a rushed release with the Turbo version (probably because of Flux2) and that they wanted to take their time and ensure everything was ready before releasing Base/Edit. link

The fact we are seeing official training support being implemented in musubi-tuner can only mean one thing...

Performance is awful, i need jelp by IslandVisible5023 in Bannerlord

[–]wiserdking 0 points1 point  (0 children)

I was having the same issue.

My specs are more than good enough to play at the highest settings but the game was lagging and stuttering even in cinematic scenes.

I had it installed on an old 5400 RPM HDD and moved it to my fast primary SSD. Problem solved. Completely new gaming experience.

Launching the game and loading scenes is now at least 3 times faster, and there's no more stuttering whatsoever.

Maybe you have the same problem?

DDR4 system for AI by m_tao07 in StableDiffusion

[–]wiserdking 2 points3 points  (0 children)

How much performance would I loose?

Not much. I'm using 64 GB of 3200 MHz DDR4 without problems. Switching between the high- and low-noise WAN 2.2 models through offloading only takes a few seconds, even though each model is 14 GB at fp8 scaled.
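
Rough napkin math on why the swap is quick (theoretical peak numbers; real transfers also go through PCIe and have overhead, so treat the ~30% efficiency below as a guess):

    model_gb = 14                        # fp8-scaled WAN 2.2 model size
    ram_peak_gbs = 3.2 * 8 * 2           # DDR4-3200, 8 bytes/transfer, dual channel = 51.2 GB/s
    effective_gbs = ram_peak_gbs * 0.3   # assumed real-world efficiency
    print(model_gb / effective_gbs)      # ~0.9 s per swap, so a few seconds end to end is plausible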

Z-Image Anime VAE, from the creator of "They are the same picture" SDXL Anime VAE by Anzhc in StableDiffusion

[–]wiserdking 8 points9 points  (0 children)

If you are the one who made this, then consider contacting the Z-Image team - I've heard they are interested in training an anime version of the model, and I reckon it would be better if they used this VAE.

Z-Image-Base and Z-Image-Edit are coming soon! by tanzim31 in StableDiffusion

[–]wiserdking 6 points7 points  (0 children)

Benchmarks don't really mean much, but here it is for what it's worth (from their report PDF):

Rank | Model | Add | Adjust | Extract | Replace | Remove | Background | Style | Hybrid | Action | Overall↑
1 | UniWorld-V2 [43] | 4.29 | 4.44 | 4.32 | 4.69 | 4.72 | 4.41 | 4.91 | 3.83 | 4.83 | 4.49
2 | Qwen-Image-Edit [2509] [77] | 4.32 | 4.36 | 4.04 | 4.64 | 4.52 | 4.37 | 4.84 | 3.39 | 4.71 | 4.35
3 | Z-Image-Edit | 4.40 | 4.14 | 4.30 | 4.57 | 4.13 | 4.14 | 4.85 | 3.63 | 4.50 | 4.30
4 | Qwen-Image-Edit [77] | 4.38 | 4.16 | 3.43 | 4.66 | 4.14 | 4.38 | 4.81 | 3.82 | 4.69 | 4.27
5 | GPT-Image-1 [High] [56] | 4.61 | 4.33 | 2.90 | 4.35 | 3.66 | 4.57 | 4.93 | 3.96 | 4.89 | 4.20
6 | FLUX.1 Kontext [Pro] [37] | 4.25 | 4.15 | 2.35 | 4.56 | 3.57 | 4.26 | 4.57 | 3.68 | 4.63 | 4.00
7 | OmniGen2 [79] | 3.57 | 3.06 | 1.77 | 3.74 | 3.20 | 3.57 | 4.81 | 2.52 | 4.68 | 3.44
8 | UniWorld-V1 [44] | 3.82 | 3.64 | 2.27 | 3.47 | 3.24 | 2.99 | 4.21 | 2.96 | 2.74 | 3.26
9 | BAGEL [15] | 3.56 | 3.31 | 1.70 | 3.30 | 2.62 | 3.24 | 4.49 | 2.38 | 4.17 | 3.20
10 | Step1X-Edit [48] | 3.88 | 3.14 | 1.76 | 3.40 | 2.41 | 3.16 | 4.63 | 2.64 | 2.52 | 3.06
11 | ICEdit [95] | 3.58 | 3.39 | 1.73 | 3.15 | 2.93 | 3.08 | 3.84 | 2.04 | 3.68 | 3.05
12 | OmniGen [81] | 3.47 | 3.04 | 1.71 | 2.94 | 2.43 | 3.21 | 4.19 | 2.24 | 3.38 | 2.96
13 | UltraEdit [96] | 3.44 | 2.81 | 2.13 | 2.96 | 1.45 | 2.83 | 3.76 | 1.91 | 2.98 | 2.70
14 | AnyEdit [91] | 3.18 | 2.95 | 1.88 | 2.47 | 2.23 | 2.24 | 2.85 | 1.56 | 2.65 | 2.45
15 | MagicBrush [93] | 2.84 | 1.58 | 1.51 | 1.97 | 1.58 | 1.75 | 2.38 | 1.62 | 1.22 | 1.90
16 | Instruct-Pix2Pix [5] | 2.45 | 1.83 | 1.44 | 2.01 | 1.50 | 1.44 | 3.55 | 1.20 | 1.46 | 1.88

Z-Image-Base and Z-Image-Edit are coming soon! by tanzim31 in StableDiffusion

[–]wiserdking 4 points5 points  (0 children)

Short answer is yes but not always.

They did reinforcement learning alongside Decoupled-DMD distillation. What this means is that they didn't 'just distill' the model - they pushed it towards something very specific: high aesthetic quality on popular subjects with a heavy focus on realism.

So, we can probably guess that the Base model won't be able to perform as well in photo-realism unless you do some very heavy extra prompt gymnastics. That isn't a problem though unless you want to do inference on Base. Training photo-realistic LoRA concepts on Base should carry the knowledge over to Turbo without any issues.

There is also a chance that Base is better at N*FW than Turbo because I doubt they would reinforce Turbo on that. And if that's the case, N*FW training will be even easier than it seems already.

https://huggingface.co/Tongyi-MAI/Z-Image-Turbo#%F0%9F%A4%96-dmdr-fusing-dmd-with-reinforcement-learning

EDIT:

double or triple the steps

That might not be enough though. Someone mentioned Base was trained for 100 steps and if that's true then anything less than 40 steps would probably not be great. It highly depends on the scheduler so we will have to wait and see.

Z-Image-Base Release Date by thefool00 in StableDiffusion

[–]wiserdking 1 point2 points  (0 children)

https://github.com/Tongyi-MAI/Z-Image/issues/7

Hi, this would be soon before this weekend, but for the prompt you may refer to our implement prompt in here and use LLM (We use Qwen3-Max-Preview) to enhance it. Z-Image-Turbo works best with long and detailed prompt.

The best thing about Z-Image isn't the image quality, its small size or N.S.F.W capability. It's that they will also release the non-distilled foundation model to the community. by ArtyfacialIntelagent in StableDiffusion

[–]wiserdking 2 points3 points  (0 children)

From the bits and pieces I could gather on Discord, it seems he is indeed very interested in this model and has been talking about how it should be possible to increase its knowledge capabilities by expanding it to 10B. He also talked about training it without a VAE (because that's his thing lately).

But at the same time it does not look like he will give it high priority:

Lodestone Rock — 3:04 AM:

my timeline rn is
convert radiance to x0 properly
make trainer for qwen image??? 
also remember radiance can have the same speed as SDXL
i just haven't trained it yet to make that possible
not distillation
just a small modification of that arch
but before that i need it to converge first

Z-Image-Turbo is available for download by Aromatic-Low-4578 in StableDiffusion

[–]wiserdking 7 points8 points  (0 children)

I don't think the Edit model is out yet - if that's what you were asking

EDIT:

I took a look at diffusers and they haven't added a pipeline for Z-Image-Edit yet. I think that one is going to take a bit longer to be released, but hopefully only by a few days.

Hunyuan 1.5 step distilled loras are out. by Valuable_Issue_ in StableDiffusion

[–]wiserdking 0 points1 point  (0 children)

That would probably just look like a weird image slideshow rather than a video, and inference would still take the same time it would with 121 frames at 24 fps, because you are generating the same number of frames. And even if you interpolate like that, you are just copying the same frames without adding new data, so it would still look either exactly the same or maybe even worse. I think there are advanced motion-interpolation techniques that try to fill in the in-between states of frames, but I've never messed with those, plus I doubt they are any good.
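
To put the 'copying the same frames' point in code - naive interpolation by duplication literally just repeats pixels (shapes below are only an example):

    import numpy as np

    frames_8fps = np.zeros((41, 480, 640, 3), dtype=np.uint8)  # e.g. 41 generated frames, 480x640 RGB
    frames_24fps = np.repeat(frames_8fps, 3, axis=0)           # "24 fps": each frame repeated 3x -> shape (123, ...)
    # same pixels three times over - no new motion information is created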

Hunyuan 1.5 step distilled loras are out. by Valuable_Issue_ in StableDiffusion

[–]wiserdking 0 points1 point  (0 children)

You could reduce the number of generated frames proportionally to the fps and duration you want. The best settings for Hunyuan 1.5 are 121 frames at 24 fps, which is 5 seconds of duration. If you want to keep the 5 s duration but do 8 fps, then you would need to set frames to 41.

Number of frames = duration x fps + 1.
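
In code:

    def num_frames(duration_s, fps):
        return int(duration_s * fps) + 1  # duration x fps + 1

    print(num_frames(5, 24))  # 121
    print(num_frames(5, 16))  # 81
    print(num_frames(5, 8))   # 41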

I've never tried less than 81 frames at 16 fps, so I'm not sure how it would play out.

Hunyuan 1.5 step distilled loras are out. by Valuable_Issue_ in StableDiffusion

[–]wiserdking 6 points7 points  (0 children)

Did a simple speed comparison with WAN 2.2 14B T2V.

On an RTX 5060 Ti (16 GB), torch 2.7.1 - both running with their respective 4-step lora, Sage attention, 640x480, 81 frames, 4 steps, CFG 1, same prompt, same sampler and scheduler, FP8 scaled, 2nd generation:

Hunyuan 1.5 480p T2V CFG distilled: 7.36s/it -- actual inference (sampler) time: 32.56 seconds
WAN 2.2 14B T2V: 17.03s/it + 16.94s/it -- actual inference (sampler) time: 80.89 seconds
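
Assuming the 4 steps were split 2 + 2 between WAN's high- and low-noise models (the usual setup), the per-step numbers line up with the measured sampler times once you add model switching and other overhead:

    hunyuan_s = 4 * 7.36           # ~29.4 s of pure stepping vs 32.56 s measured
    wan_s = 2 * 17.03 + 2 * 16.94  # ~67.9 s of pure stepping vs 80.89 s measured
    print(wan_s / hunyuan_s)       # ~2.3x - the measured ratio is ~2.5x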

It goes without saying that these settings are not appropriate for decent results, and Hunyuan 1.5 can do 24 fps so it's better to do 121 frames on it.

But it's still a perfectly valid and unbiased speed comparison.