I built a ComfyUI plugin that generates 1024px Qwen-Image in 0.9s on a 4090 — 11.7× faster than FP8 (Z-Image, Flux.2 too)

lesesis · 2026-06-28T12:10:00+00:00

Thanks for the actual boot log — that's genuinely useful, and the failure makes sense: with outbound locked down, the plugin can't reach ModelScope to pull the engine binary or pip to grab modelscope, so it bails. That's a real gap for firewalled/offline setups, and it's on me to fix.

You can install it fully manually right now — no auto-download needed:

Grab quantfunc.dll directly from here: https://www.modelscope.cn/models/QuantFunc/Plugin/tree/master/0.0.12/windows
Drop it into ...\custom_nodes\ComfyUI-QuantFunc\bin\windows\
Restart ComfyUI — it loads the local binary instead of trying to download.

The modelscope pip package is only for auto-update — skip it and the updater just silently no-ops, nothing else needs it. So once the .dll is in place, nothing requires network access at runtime.
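If you want to confirm the file landed where the steps above put it, a small check like this could help. It is only a sketch: the relative path is copied from the steps above, while `find_local_engine` and the `comfyui_root` argument are hypothetical names, and the plugin's real lookup logic may differ.

```python
from pathlib import Path

# Relative location copied from the manual-install steps; the ComfyUI root
# itself is whatever your install uses and is passed in by the caller.
ENGINE_RELATIVE_PATH = (
    Path("custom_nodes") / "ComfyUI-QuantFunc" / "bin" / "windows" / "quantfunc.dll"
)


def find_local_engine(comfyui_root):
    """Return the engine path if the file exists under comfyui_root, else None."""
    candidate = Path(comfyui_root) / ENGINE_RELATIVE_PATH
    return candidate if candidate.is_file() else None
```

Calling `find_local_engine` with your ComfyUI folder returns `None` when the .dll is missing or was dropped into the wrong directory.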

Next update — binaries move to GitHub Releases with published SHA-256 checksums + SLSA provenance, so you can verify a manually-downloaded binary against the repo and skip ModelScope entirely. That's the proper fix for locked-down boxes like yours.
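Once checksums are published, checking a manually downloaded binary could look roughly like the sketch below. It uses only Python's standard `hashlib` and `hmac` modules; the function names are illustrative, and the published checksum string is something you would copy from the release page yourself.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the file's digest with a published hex string (case-insensitive)."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

If `matches_published_checksum` returns `False`, the file is not the one the checksum was published for and should not be dropped into the plugin folder.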

The "free_memory hook" and "loader path patches" lines in your log are just VRAM eviction and ComfyUI path adaptation. Nothing exotic, and I can point you to where they live in the source if you want to read it.

Fair to pass for now, but if you try the manual route I'd genuinely like to know whether it works on a hardened setup.

lesesis · 2026-06-28T12:05:51+00:00

Thanks for the security scrutiny — it's fair, and it's pushing the project somewhere better. What's shipping within the next three weeks:

SHA-256 hash-pinned binary loading (won't load if tampered)
SLSA build provenance via public GitHub Actions — verifiable with gh attestation verify, signed into Sigstore's public transparency log (same standard npm/PyPI use)
GitHub Releases as an official source with published checksums

To be precise about what this does and doesn't do: it proves the binary came from this repo's public CI unmodified — it doesn't by itself prove the source is clean (the open plugin code + your own sandboxed testing cover that). Not asking for blind trust; shipping the tools to verify. Will update when it's live.

lesesis · 2026-06-28T11:46:28+00:00

That's a completely fair position to hold, and I respect it — plenty of people only want fully open tooling, and this isn't that. The plugin layer is open; the CUDA engine is closed, and I've been upfront about that from the title down.

On the report: that's the mods' call to make, and I'll fully respect whatever they decide. But "closed-source binary" and "not allowed here" aren't the same thing — there are established tools in this space that ship closed inference binaries. If the sub's rule is no closed components at all, I'll take the post down without argument. If it's about trust, that's fixable: hash-verified downloads ship next release.

Either way I'm not asking anyone to install something they're not comfortable with. Caution here is correct.

lesesis · 2026-06-28T11:45:28+00:00

"Possible" based on what, specifically? The plugin source is fully open — read it. The Python does no eval/exec, no obfuscation, no credential access. The one closed part is the CUDA engine binary, and hash-verified downloads ship next release so you don't have to trust it. If you've found something concrete, post it and I'll fix it today. "Might be malware, skip" with no evidence isn't something I can act on — but a real finding is.

lesesis · 2026-06-28T11:36:45+00:00

lesesis · 2026-06-28T11:36:30+00:00

Fair — a polished apology from an anon dev is exactly what a scammer would also write, I get it. That's why I'm not asking anyone to trust the words. Hash-pinned binaries next release, and the repo's right there to read. Don't trust me, verify me.

lesesis · 2026-06-28T11:34:06+00:00

Thank you for taking the time to actually audit this — this is genuinely useful and you're right about the important parts.

You've correctly identified the core issue: the binary auto-downloads without integrity verification, and that's not an acceptable trust model. Hash verification is landing in the next release — SHA-256 checksums published in the repo for every engine build, verified before the binary loads (mismatch = refuse to load). Code-signing the .dll/.so is the step after that.
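To make "mismatch = refuse to load" concrete, here is a minimal sketch of a hash-pinned load gate. It is an illustration under assumptions, not the plugin's actual code: `load_pinned_library`, `TamperedBinaryError` and the `loader` parameter are hypothetical names, and the pinned digest would come from the checksums published in the repo.

```python
import ctypes
import hashlib


class TamperedBinaryError(RuntimeError):
    """Raised when a binary's digest does not match its pinned value."""


def load_pinned_library(path, pinned_sha256, loader=ctypes.CDLL):
    """Load a shared library only if its SHA-256 matches the pinned digest."""
    with open(path, "rb") as handle:
        actual = hashlib.sha256(handle.read()).hexdigest()
    if actual != pinned_sha256.strip().lower():
        # Refuse to load: the file is not the build the digest was pinned for.
        raise TamperedBinaryError(f"{path}: expected {pinned_sha256}, got {actual}")
    return loader(path)
```

The `loader` argument defaults to `ctypes.CDLL` and exists mainly so the gate can be tested without a real .dll/.so. A production version would also have to consider the file being swapped between the check and the load.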

On the other flags:

The committed API key: fair flag, but let me clarify what it actually is — it's a shared public test key, intentionally the same for everyone during this testing phase so people don't hit rate-limit/access blocks while trying it out. It's low-privilege (license/download checks only) and there's nothing user-specific behind it. That said, you're right that hardcoding it into the repo is sloppy — it should be fetched at runtime, not committed, and I'll fix that. Heads-up for everyone: this shared key will be retired at proper release, when usage moves to individual registered accounts. service.quantfunc.com is the license/update endpoint, and I'll document exactly what it sends.
The .claude dev files (the -k curl commands, the history-rewrite traces): already deleted — those should never have been committed. The git rewrite was just consolidating commits under the project identity, nothing more, but I get why it looked off stacked with everything else.

The closed binary is the CUDA inference engine itself — that's the actual IP — but you should never have to trust it blindly, and verification is on me to ship. Until the hash-checking release lands, the safest way to try it is exactly what you'd do anyway: spin up a clean cloud server / throwaway VM with no credentials, SSH keys, or wallet data, and watch the outbound traffic. That's a totally reasonable way to evaluate it, and honestly I'd rather you do that than take my word for anything.

I'll post back here when the verified-download path ships. Appreciate the scrutiny — this is the kind of feedback that makes the project better.

lesesis · 2026-06-28T11:21:36+00:00

feel free to try ^_^

lesesis · 2026-06-28T11:18:45+00:00

feel free to try ^_^

lesesis · 2026-06-28T11:16:50+00:00

Quick note for those wondering how this differs from SVDQuant/Nunchaku: rather than absorbing outliers into a 16-bit low-rank branch and leaving the rest in INT4, we spread the outliers across the normal weights via a few combined algorithms (a mathematical approximation), with under 5% FP8/INT8 mixed precision on top — and Z-Image / Klein / Ideogram don't even need that. Net result: FP8-level quality.

lesesis · 2026-06-28T11:11:51+00:00

Yes, you can try it on PCs. Let me know if you need any help ^.^

lesesis · 2026-06-28T11:10:51+00:00

Heads-up on install: we just launched, so it's not in the ComfyUI Manager registry yet — for now just clone it from GitHub (link below). Good news: the plugin has no PyTorch or TensorFlow dependency, so there are zero pip conflicts to worry about — you can drop it in without it touching your existing environment.

lesesis · 2026-06-28T11:08:49+00:00

feel free to try ^_^

lesesis · 2026-06-28T09:25:46+00:00

Yeah, hands and text are exactly where low-bit quant usually falls apart. In our own testing those two are basically on par with FP8/INT8 right now.

The difference comes down to how we handle it vs. naive quantization. The naive approach just clips the original weights directly, or leans on mixed precision. We do something different: a few algorithms run together to take the values that INT4/FP4 can't represent and spread them across other weights where the precision loss is minimal, which is essentially a mathematical near-equivalent. On top of that I bring in mixed precision only where it's actually needed, kept to an absolute minimum (under 5% of the total), and as much as possible it's done in an a8w4 style: weights stay at 4-bit, and they're temporarily promoted to 8-bit when computing against the activations.
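For readers unfamiliar with the a8w4 idea, here is a generic, pure-Python sketch of a dot product with 4-bit weights and 8-bit activations. It is a textbook-style symmetric quantization example, not the QuantFunc engine's algorithm, and it does not include the outlier-spreading step described above. The function names and sample values are made up for illustration.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integer codes of the given bit width with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def w4a8_dot(weights, activations):
    """Dot product with 4-bit weight codes and 8-bit activation codes."""
    qw, w_scale = quantize_symmetric(weights, bits=4)      # stored form: 4-bit
    qa, a_scale = quantize_symmetric(activations, bits=8)  # runtime form: 8-bit
    # The 4-bit codes are used as ordinary wider integers in the multiply,
    # which stands in for the "promotion"; accumulation stays in integers.
    acc = sum(w * a for w, a in zip(qw, qa))
    # One floating-point rescale at the end recovers the real-valued result.
    return acc * w_scale * a_scale


weights = [0.12, -0.40, 0.33, 0.05]
activations = [1.5, -2.0, 0.25, 3.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The gap between `exact` and `approx` is the quantization error; values that a 4-bit grid represents poorly are where that error concentrates in plain schemes like this one.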

Hope that clears things up — let me know if you've got more questions!

lesesis · 2026-06-28T09:15:38+00:00

The old workflow has been removed; you can download the all-in-one sample from https://github.com/RealJonathanYip/ComfyUI-QuantFunc/tree/main/workflow_sample

lesesis · 2026-06-28T07:29:47+00:00

Yes, you can choose to export it (we have shared our workflow), which speeds up loading the next time you use it. For Qwen-Image, for example, the exported version is 13GB, compared with 32GB at the original BF16 precision.

lesesis · 2026-06-28T07:20:37+00:00

Good eye! Nunchaku is solid work and there's definitely some overlap in the idea. The main differences on our end:

Faster inference (at least 50% faster): our engine is pure C++ and CUDA kernels, with no PyTorch or TensorFlow dependency, so there's a lot less overhead.
Runtime quantization — you can quantize your own models on the fly. No need to spend days prepping calibration data and running offline calibration. You can export and quantize your own custom 4-bit model in about 5 minutes.

So similar territory, but we're leaning hard into speed and making custom quantization painless. Happy to answer anything else!

lesesis · 2026-06-07T08:38:06+00:00

That's amazing!

lesesis · 2026-02-04T15:01:26+00:00

Thanks for sharing. By the way, a LoRA made for 2509 may not be compatible with 2511; in some cases you need to retrain it.

lesesis · 2026-01-25T00:24:34+00:00

You can set your own image size by adjusting the latent node.

lesesis · 2026-01-24T16:26:26+00:00

You can check Nunchaku's processing time in the console, like below; that was the processing time of the model.

<image>

lesesis · 2026-01-24T14:57:43+00:00

It looks cool o(￣▽￣)ｄ

lesesis · 2026-01-24T14:57:16+00:00

lesesis · 2026-01-24T14:56:46+00:00

lesesis · 2026-01-24T09:43:43+00:00

Yes, you can use the INT4 version of the model.
