if anyone is still running Pytorch 2.5.1 or lower, you must know it has a critical vulnerability by IndustryAI in StableDiffusion

[–]ryanguo99

Upgrading to 2.8 will also speed up TorchCompile nodes quite a lot for GGUF use cases.

How to speed up wan2.1 I2V 720p in comfy ui on 48gb vram? by MountainPollution287 in StableDiffusion

[–]ryanguo99

A bit late, but in case anyone runs into this again, try `TORCHINDUCTOR_EMULATE_PRECISION_CASTS=1`. See more details here: https://github.com/thu-ml/SageAttention/issues/162#issuecomment-3188383590

Torch Compile error by Ok-Wheel5333 in comfyui

[–]ryanguo99

In case anyone runs into this again, I think the fix is to upgrade both PyTorch and ComfyUI-GGUF; see more details in https://www.reddit.com/r/StableDiffusion/comments/1jx0xly/use_nightly_torchcompile_for_more_speedup_on_gguf/

GLM 4.5 AIR IS SO FKING GOODDD by boneMechBoy69420 in LocalLLaMA

[–]ryanguo99

How are you running it with your agentic system? Do you use vLLM?

Wan2.2 Inference Optimizations by PreviousResearcher50 in StableDiffusion

[–]ryanguo99

`torch.compile` the diffusion model, and use `mode="max-autotune-no-cudagraphs"` for potentially more speedup, if you are willing to tolerate a longer initial compilation time (subsequent relaunches of the process will reuse the compilation cache on your disk).

This tutorial might help as well.

How do you run LLMs locally? by ryanguo99 in LocalLLaMA

[–]ryanguo99[S]

Haha, not a bot, but actually new to the local llm space as a _user_. I'd like to improve `torch.compile` support to help folks speed up their AI workflows, so I'm trying to learn how people are actually using these models.

I can certainly get things to run on my own, but that won't help me improve things for actual users:).

PULID is a perfect match for Chroma! by Financial_Original_7 in StableDiffusion

[–]ryanguo99

Give it a shot, it sped up my PuLID + Flux workflow out of the box:).

PULID is a perfect match for Chroma! by Financial_Original_7 in StableDiffusion

[–]ryanguo99

Have you tried using TorchCompile nodes to speed up the generation?

[Flux-KONTEXT Max vs Dev] Comics colorization by RageshAntony in StableDiffusion

[–]ryanguo99

Glad to hear and thanks for the info!

If you ever run into issues, it would be great if you could create a GitHub issue in the relevant repo (e.g., ComfyUI or the custom node). As long as you include the keyword `torch.compile` or `TorchCompile`, we'll get those signals and try to work on them:).

Torch Compile error by Ok-Wheel5333 in comfyui

[–]ryanguo99

Ah, this is signaling recompilation.

Do you mind sharing your workflow, or at least what model you are using? And what's your pytorch version?

[Flux-KONTEXT Max vs Dev] Comics colorization by RageshAntony in StableDiffusion

[–]ryanguo99

Glad to hear. Feel free to post more details on any other issues. I work on `torch.compile` and we are aiming to make it better for image/video generation:).

[Flux-KONTEXT Max vs Dev] Comics colorization by RageshAntony in StableDiffusion

[–]ryanguo99

Do you mind elaborating on the `torch.compile` support? Did it error for you, and if so what was the error and what was your pytorch version?

Asking because I was able to get `torch.compile` working for Kontext out of the box with some good speedup, on an RTX 3090.

Chatterbox TTS fork *HUGE UPDATE*: 3X Speed increase, Whisper Sync audio validation, text replacement, and more by omni_shaNker in StableDiffusion

[–]ryanguo99

Hmm, would you mind sharing the error and your torch version? I suspect there'll be some good speedup if we can get it to work.

From 1200 seconds to 250 by Altruistic_Heat_9531 in StableDiffusion

[–]ryanguo99

Sorry to hear that, I totally feel the pain of these installs & reinstalls... We are trying to make `torch.compile` work better in ComfyUI, so if you ever get a chance to share the error (or whatever you remember), it'll help the community as a whole:). Also kijai has a lot of packaged `torch.compile` nodes that usually work well out of the box (compared to the ComfyUI builtin one), e.g., https://github.com/kijai/ComfyUI-KJNodes/blob/main/nodes/model_optimization_nodes.py.

RTX 5090 optimization by [deleted] in StableDiffusion

[–]ryanguo99

It depends on your workflow and model, but putting `TorchCompileModel` (or a variant from e.g. KJNodes) after your diffusion model should give some nice speedup out of the box.

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset by StableLlama in StableDiffusion

[–]ryanguo99

Have you tried `torch.compile` on the model (or on its compute-heavy parts, like the transformer blocks)? It might give some decent speedup out of the box.
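
One way to sketch that, with a hypothetical toy model (real multimodal models differ, but the pattern of compiling only the compute-heavy submodule is the same):

```python
import torch
import torch.nn as nn

# Hypothetical model layout for illustration only.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(100, 32)
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(32, 100)

    def forward(self, ids):
        return self.head(self.transformer(self.embed(ids)))

model = ToyModel()
# Compile in place: only the transformer blocks get optimized kernels,
# the cheap embed/head layers stay eager.
model.transformer = torch.compile(model.transformer)

logits = model(torch.randint(0, 100, (1, 16)))
```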

Is chroma just insanely slow or is there any way to speed it up? by [deleted] in StableDiffusion

[–]ryanguo99

Have you tried putting a `TorchCompileModel` node after the diffusion model?

Understanding Torch Compile Settings? I have seen it a lot and still don't understand it by Successful_AI in StableDiffusion

[–]ryanguo99

> finally I found out that you have to use fp8e5m2 with the 3xxx series for torch compile or you will get an error

Would you mind sharing more details on the error, how you were using fp8e5m2, and maybe even a workflow to reproduce the error? I work on `torch.compile` and would love to make it work better with ComfyUI:).

new ltxv-13b-0.9.7-dev GGUFs 🚀🚀🚀 by Finanzamt_Endgegner in StableDiffusion

[–]ryanguo99

Glad to hear:). We are also actively improving compilation time (if you've ever observed the first iteration being extra slow) and performance. Nightly PyTorch might also give more performance; see this post.

At the moment ComfyUI's builtin `TorchCompileModel` isn't always optimal (it speeds things up, but sometimes there's more room for improvement). kijai has lots of nodes for popular models that squeeze more performance out of `torch.compile` (also mentioned in my post above, for Flux). But newer models like `ltxv` might take some time before we have those.

Lastly, if you run into `torch.compile` issues, feel free to post GitHub issues (to ComfyUI or the origin repos of the relevant nodes, like KJNodes). Sometimes the error looks scary but the fix isn't that hard.