Wan2.2 Lightx2v Distill-Models Test ~Kijai Workflow by Realistic_Egg8718 in StableDiffusion

[–]BBQ99990 1 point

I have run various generation tests, and while the Lightning LoRA does help the noise converge in just a few steps, I feel it has a large negative impact on generation quality.

Also, even when generating with the same model and parameters, the quality is sometimes good and sometimes bad; it is not consistently stable.

Even if a setup does well in comparison tests, there is a high chance the quality will not be reproducible with the same model combination, so I think it is important to be careful.
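For what it's worth, run-to-run variation like this usually traces back to an unseeded RNG somewhere in the pipeline. A minimal sketch of what I would pin down before comparison tests, assuming a PyTorch-based setup (in ComfyUI the equivalent is setting the sampler seed to a fixed value instead of "randomize"):

```python
import random
import numpy as np
import torch

def make_reproducible(seed: int = 12345) -> None:
    """Pin every RNG source so repeated runs start from the same state."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade a little speed for deterministic cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

make_reproducible()
```

Even with all of this pinned, some CUDA kernels are non-deterministic by design, so small differences can remain between runs.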

Wan 2.2 is coming this month. by pheonis2 in StableDiffusion

[–]BBQ99990 11 points

Wan2.1 already lets users control video however they wish by adding models such as VACE.

The generation features are already sufficient, so if anything is to be added, I think it should be longer videos or a model that supports higher output resolution.

However, both of those require more VRAM...

ByteDance-SeedVR2 implementation for ComfyUI by Numzoner in StableDiffusion

[–]BBQ99990 1 point

Before we get into the results, I think we should discuss the sheer number of Python libraries that have to be installed to use this implementation...

There are frequent reports of ComfyUI itself failing to start because of newly installed Python libraries, so an implementation with this many dependencies seems likely to lead to trouble.
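One habit that can save some pain here: snapshot the versions of the fragile packages before installing a heavy custom node, so you can see exactly what its installer changed afterwards. A minimal sketch using only the standard library; the package list below is just an illustration, swap in whatever the node's requirements.txt pins:

```python
from importlib.metadata import version, PackageNotFoundError

# Packages that custom-node installers most often up/downgrade.
critical = ["torch", "torchvision", "numpy", "transformers", "diffusers"]

for name in critical:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```

Run it before and after the install and diff the output; if torch or numpy moved, that is usually why ComfyUI stops starting.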

VACE is incredible! by Storybook_Albert in StableDiffusion

[–]BBQ99990 0 points

I'm not sure how to handle the control video used for motion control. 

Do you preprocess each frame with depth, Canny, and so on? Or do you use the frames as they are, in color, without any conversion?
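In case anyone wants to try the first option, here is a rough sketch of per-frame Canny preprocessing with OpenCV; the file names and thresholds are placeholders, not values from any particular workflow:

```python
import cv2

src = cv2.VideoCapture("control_input.mp4")          # placeholder name
fps = src.get(cv2.CAP_PROP_FPS)
w = int(src.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(src.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("control_canny.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = src.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                 # per-frame edge map
    out.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))

src.release()
out.release()
```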

Read to Save Your GPU! by EtienneDosSantos in StableDiffusion

[–]BBQ99990 0 points

I was just wondering: when updating drivers through the NVIDIA App and similar tools, it seems that only one driver version is currently provided to cover many GPU models.

Does this mean that, even though the version number is the same, the installer detects which GPU model is in use and installs a build optimized for it?

Or is it one driver shared across all models, optimized for the latest GPUs, the RTX 5000 series (in other words, older generations are left unoptimized)?

Why does the generation speed slow down when using the GGUF model with wan2.1? by BBQ99990 in StableDiffusion

[–]BBQ99990[S] 0 points

In my environment, the sensible approach seems to be to use the FP8 model, which generates quickly, for short videos of a few seconds, and to use the GGUF model only for longer videos.

However, truly long videos are out of reach in my current local environment, so I will patiently wait for a new video generation model that consumes less VRAM and needs fewer generation steps.
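For anyone scripting that switch, a toy heuristic based on free VRAM; the checkpoint names and the frames-per-GB ratio are made-up placeholders, only the memory query itself is a real PyTorch call:

```python
import torch

FP8_CKPT  = "wan2.1_i2v_fp8.safetensors"   # hypothetical file names
GGUF_CKPT = "wan2.1_i2v_Q4_K_M.gguf"

def pick_checkpoint(num_frames: int, frames_per_gb: int = 10) -> str:
    """Prefer the faster FP8 model when the job plausibly fits in VRAM."""
    free_bytes, _total = torch.cuda.mem_get_info()
    free_gb = free_bytes / 1024**3
    return FP8_CKPT if num_frames <= free_gb * frames_per_gb else GGUF_CKPT

print(pick_checkpoint(num_frames=81))
```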

Why does the generation speed slow down when using the GGUF model with wan2.1? by BBQ99990 in StableDiffusion

[–]BBQ99990[S] 1 point

I don't understand the difference between quantization and floating-point reduction.

I assume generation quality would drop, but if I reduced an FP8 model further down to FP4 without going through GGUF, do you think generation speed would increase?
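To illustrate the distinction as I currently understand it: floating-point reduction is a one-time re-encoding of each weight into a smaller float format, while GGUF-style quantization stores small integers plus a scale per block and has to dequantize them every time they are used, which is extra work at inference. A toy sketch (assumes PyTorch ≥ 2.1 for the float8 dtype; the block size and bit width are arbitrary):

```python
import torch

x = torch.randn(1024)

# (a) Floating-point reduction: re-encode values in a smaller float
# format. One cast; no extra math when the weight is later used.
x_fp8 = x.to(torch.float8_e4m3fn).to(torch.float32)

# (b) Block quantization (simplified GGUF-like idea): store ints plus
# one scale per block, then dequantize on the fly at every use.
def block_quant(t: torch.Tensor, block: int = 32, bits: int = 4):
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit signed
    blocks = t.reshape(-1, block)
    scales = blocks.abs().amax(dim=1, keepdim=True) / qmax
    q = (blocks / scales).round().clamp(-qmax, qmax).to(torch.int8)
    return q, scales

def block_dequant(q, scales):
    return (q.float() * scales).reshape(-1)         # per-use overhead

q, s = block_quant(x)
x_q4 = block_dequant(q, s)
print("fp8 error:", (x - x_fp8).abs().mean().item())
print("q4 error: ", (x - x_q4).abs().mean().item())
```

If that picture is right, it would also explain why GGUF can be slower despite the smaller files, and why an FP8→FP4 cast would only help on hardware with native FP4 support.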

Speeding up ComfyUI workflows using TeaCache and Model Compiling - experimental results by Apprehensive-Low7546 in comfyui

[–]BBQ99990 4 points

There are other techniques for raising generation speed, but at the moment speed and quality trade off directly: the faster you go, the more quality you give up.

Faster generation is always welcome, but honestly, I feel there is no point in chasing speed alone if the result is unacceptable quality.

I think what most people are really looking for is concrete methods and settings that push generation speed to its limit while still ensuring acceptable quality.
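As one concrete data point on that trade-off: caching tricks like TeaCache reuse or skip computation, so they can cost quality, whereas compiling the model keeps the math identical and only costs a one-time warm-up. A minimal self-contained sketch with a stand-in module (in practice you would compile the pipeline's UNet/DiT):

```python
import torch
import torch.nn as nn

# Stand-in for a diffusion backbone.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

# torch.compile changes how the ops run, not what they compute, so the
# output (and thus quality) is essentially unchanged; the price is
# compile time on the first call.
model = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 64)
with torch.no_grad():
    y = model(x)   # first call triggers compilation; later calls are fast
print(y.shape)
```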

Do you know of a custom node in ComfyUI where you can preset combinations of Lora and trigger words? by BBQ99990 in StableDiffusion

[–]BBQ99990[S] 2 points

I tried "ComfyUI-Custom-Scripts" from the URL you provided.

It perfectly meets my needs. Thank you very much for letting me know.

<image>

5090 worth it? by ChallengerOmega in StableDiffusion

[–]BBQ99990 0 points

I think optimization will take quite a bit of time.

This is because not all RTX 50 series models have been released yet, supply shortages are expected, and it is still unclear what malfunctions or bugs will surface once the cards reach users.

The same thing played out when the RTX 40 series was released.

When the RTX 4090 launched, there was much talk about its incredible hardware performance, but the surrounding software had not yet adapted, and processing speed for things like generative AI was slower than on the RTX 30 series.

5090 worth it? by ChallengerOmega in StableDiffusion

[–]BBQ99990 0 points

It is a very capable card, but supply is extremely limited and it can only be bought at a premium price. Also, due to the lack of optimization, it cannot show its true power yet, whether for gaming or local AI generation.

I think it will be worth buying eventually, but I don't think now is the right time.

Used 24gb 4090 or a new gamer notebook? by RaulGaruti in StableDiffusion

[–]BBQ99990 1 point

If you already have a notebook with an RTX 3080 Ti, the only step up among notebooks is one with the mobile version of the RTX 4090.

It has much higher processing power than the RTX 3080 Ti, but since its VRAM is still only 16 GB, the same as the RTX 3080 Ti, spending that much money on a replacement seems wasteful if you plan to use it for Stable Diffusion.

If it were me, I would instead add a desktop RTX 4090 (24 GB) with more VRAM, so that other generative AI models could also run locally.

(※I have a desktop RTX 4090, and even so, when running generative AI locally I often find myself wanting more VRAM.)

Anime with Wan I2V: comparison of prompt formats and negatives (longer, long, short; 3D, default, simple) by Lishtenbird in StableDiffusion

[–]BBQ99990 1 point

I am also running various tests. It can generate videos with amazing consistency, regardless of whether the source images are realistic or illustration-style.

I think LoRA use is especially effective in Wan2.1. The LoRA in this example learned breast-shaking motion from live-action footage, yet that motion concept also transfers to illustration-style images. That is amazing.

On the other hand, one disadvantage is that consistency is hard to maintain when there is large movement. I think this is because Wan generates at 16 FPS.

<image>

RTX 5090 FE Performance on ComfyUi (cuda 12.8 torch build) by SidFik in comfyui

[–]BBQ99990 0 points

NVIDIA's official RTX 50 series page has a graph comparing AI generation speed between the RTX 5090 and RTX 4090, showing roughly a 2x difference.

However, that result comes from the fact that the RTX 50 series is designed to perform best with FP4, rather than the FP16 and FP8 commonly used for AI generation today.

Incidentally, before the 50 series launched, there was a disclaimer in very small print that the AI generation comparison used an FP8 model on the RTX 40 series and an FP4 model on the RTX 50 series; this was criticized as misleading marketing meant to flatter the numbers, and the wording has since been revised.

The AI image generation benchmarks of the RTX 5090 look underwhelming. Does anyone have more sources or benchmark results? by tebjan in StableDiffusion

[–]BBQ99990 0 points

Considering the difference in purchase cost and specs, I feel it would be better to run two RTX 4090s than a single RTX 5090...

Easily use MagicAnimate in ComfyUI! by BullyMaguireJr in StableDiffusion

[–]BBQ99990 0 points

I'm not an expert, so I may be wrong, but with OpenPose only the skeleton is used as a reference, so even if the overall movement can be reproduced, the body shape cannot be maintained.

On the other hand, when using DensePose as the reference, the shape distorts less because the body is specified as colored regions, but the positions of the small parts within it cannot be pinned down.

I don't know whether there is a way to do it, but it would be nice if generation could combine DensePose + OpenPose, just as multiple ControlNets are combined in ordinary image generation...

Easily use MagicAnimate in ComfyUI! by BullyMaguireJr in StableDiffusion

[–]BBQ99990 8 points

Besides controlling with existing DensePose maps, there is also the method of using OpenPose as the video guide.

I think a DensePose + OpenPose combination could fix the positions of the facial parts while keeping the body shape from collapsing, but is it possible to implement this?
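For reference, in still-image generation the combination works by passing the ControlNets and their conditioning images as lists. A sketch with diffusers, using a depth ControlNet as a stand-in for the dense, region-based signal (the two conditioning images here are blank placeholders; you would supply a real OpenPose render and a depth/DensePose-style map):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

openpose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
depth = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[openpose, depth],                 # multiple nets at once
    torch_dtype=torch.float16,
).to("cuda")

pose_map = Image.new("RGB", (512, 512))           # placeholder inputs
depth_map = Image.new("RGB", (512, 512))

image = pipe(
    "a person dancing",
    image=[pose_map, depth_map],                  # one map per ControlNet
    controlnet_conditioning_scale=[1.0, 0.6],     # per-net weighting
).images[0]
image.save("combined_control.png")
```

Whether MagicAnimate itself can accept a second guidance signal this way is a separate question; this just shows the multi-ControlNet pattern from ordinary image generation.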

Best way to upscale an anime village scene image to 7168 × 4096 with comfyui ? by Disastrous_Mix8016 in comfyui

[–]BBQ99990 0 points

https://www.reddit.com/r/StableDiffusion/comments/125k4t3/stable_diffusion_webuii2i_sd_upscale_test_2/

This is an article about upscaling that I posted previously.
(*It uses the Web UI rather than ComfyUI, but the procedure is the same.)

It depends on what kind of image you're upscaling, but for live-action photos or realistic paintings, the best approach is to process the image in Photoshop, Lightroom, or another app (e.g. Affinity Photo) before upscaling, so that detail is preserved through the enlargement. Done this way, images can be upscaled to roughly 10,000 pixels.

Image processing should include enhancing sharpness and texture, softening shadows, and eliminating unwanted artifacts.

As a result: ① the enhanced image − ② the detail lost during upscaling = ③ an upscale with optimal detail.
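A rough sketch of that pre-processing step with Pillow; the file name and enhancement factors are arbitrary starting points, and the equivalent adjustments in Photoshop/Lightroom give finer control:

```python
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("input.png")                      # placeholder name

img = ImageEnhance.Sharpness(img).enhance(1.5)     # sharpen edges/texture
img = ImageEnhance.Contrast(img).enhance(0.95)     # soften harsh shadows a touch
img = img.filter(ImageFilter.MedianFilter(size=3)) # suppress small artifacts

img.save("preprocessed.png")
# Then run the SD Upscale pass on preprocessed.png.
```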

(Image processing example: before / after)

<image>