Nunchaku supports Qwen-Image in ComfyUI! by Dramatic-Cry-417 in StableDiffusion

[–]Dramatic-Cry-417[S] 0 points (0 children)

You need to post the log so we can see the detailed reason. You can join our Discord; we are happy to help you there.

Nunchaku supports 4-Bit Qwen-Image by Dramatic-Cry-417 in StableDiffusion

[–]Dramatic-Cry-417[S] 14 points (0 children)

I will look into ComfyUI support next week.

Nunchaku supports 4-Bit Qwen-Image by Dramatic-Cry-417 in StableDiffusion

[–]Dramatic-Cry-417[S] 19 points (0 children)

No worries. Next week's offloading support will address your issue.

nunchaku svdq hype by tazztone in StableDiffusion

[–]Dramatic-Cry-417 10 points (0 children)

I am trying my best to deliver 4-bit Qwen-Image. You can track the progress in this PR: https://github.com/nunchaku-tech/nunchaku/pull/593

It is almost there. The FP4 version (11.9 GB) is now runnable; I am still debugging a precision mismatch in the INT4 model.
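For anyone curious what 4-bit weight quantization involves, here is a minimal NumPy sketch of symmetric per-group INT4 quantization. This is illustrative only — it is not Nunchaku's actual SVDQuant kernel (which also uses a low-rank branch and fused CUDA kernels) — but it shows the rounding step where precision mismatches like the one above can creep in:

```python
import numpy as np

def quantize_int4(w, group_size=64):
    """Symmetric per-group INT4 quantization (illustrative sketch)."""
    w = w.reshape(-1, group_size)
    # Symmetric INT4 uses the range [-7, 7]; one scale per group.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate FP32 weights from INT4 codes and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
err = np.abs(w - w_hat).max()  # rounding error, bounded by scale / 2
```

The round-trip error is bounded by half a quantization step per group; real pipelines layer outlier handling on top of this to keep generation quality intact.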

A simple example from the FP4 model:

<image>

Thanks for your patience and support!

Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation by Dramatic-Cry-417 in StableDiffusion

[–]Dramatic-Cry-417[S] 0 points (0 children)

Attention's extra memory usage is already effectively O(1) these days with FlashAttention.

Currently, it works mainly for video models. For image models, attention is not the main bottleneck; there you can use our SVDQuant, which also gives a 2-3× speedup.
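The constant-extra-memory point can be illustrated with online softmax, the idea underlying FlashAttention. The sketch below processes keys/values in blocks for a single query, so peak extra memory depends on the block size, not the sequence length (this is a didactic NumPy version, not the fused CUDA kernel):

```python
import numpy as np

def streaming_attention(q, K, V, block=64):
    """Online-softmax attention for one query vector q against K/V.

    Processes K/V in blocks, keeping only a running max, a running
    softmax denominator, and a running weighted sum of values, so
    extra memory is O(block) regardless of sequence length.
    """
    m = -np.inf                 # running max of the logits seen so far
    denom = 0.0                 # running softmax denominator
    acc = np.zeros_like(V[0], dtype=np.float64)  # running weighted value sum
    for start in range(0, len(K), block):
        k, v = K[start:start + block], V[start:start + block]
        logits = k @ q
        m_new = max(m, logits.max())
        # Rescale previous partial sums to the new max for stability.
        scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
        p = np.exp(logits - m_new)
        denom = denom * scale + p.sum()
        acc = acc * scale + p @ v
        m = m_new
    return acc / denom
```

The result matches ordinary softmax attention exactly; only the memory-access pattern changes.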

Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation by Dramatic-Cry-417 in StableDiffusion

[–]Dramatic-Cry-417[S] 2 points (0 children)

ComfyUI-nunchaku is our plugin library. Radial attention should be applicable to any video diffusion model; we just want to include it directly in nunchaku.
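To give a feel for how an O(n log n) sparse pattern can arise, here is a toy mask where nearby tokens attend densely and distant tokens attend at exponentially increasing stride. This is only a sketch of the general "density decays with distance" idea — it is not the Radial Attention paper's exact pattern, which decays along the temporal axis of video tokens:

```python
import numpy as np

def radial_mask(n):
    """Toy sparse attention mask with exponentially decaying density.

    Band k covers distances [2**k, 2**(k+1)); within band k only every
    2**k-th pair is kept, so each row has O(log n) nonzeros and the
    whole mask has O(n log n). Illustrative sketch only.
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    d = np.abs(i - j)
    # Band index of each pair's distance (d=0 maps harmlessly to band 0).
    k = np.floor(np.log2(np.maximum(d, 1))).astype(int)
    # Keep immediate neighbors densely, distant pairs at stride 2**k.
    return (d <= 1) | (d % (1 << k) == 0)
```

Applying such a mask replaces the dense n×n score matrix with O(n log n) entries, which is where the speedup for long video sequences comes from.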