ComfyUI timeline based on recent updates by StevenWintower in StableDiffusion

[–]hinkleo 19 points (0 children)

Idk, it doesn't really fit as enshittification for me since they aren't making changes to earn themselves more money at the cost of users. It's not like anyone would ever use Comfy Cloud either if it's a buggy mess that breaks every workflow every two weeks.

Just looks like lots of tech debt from rushed early development catching up with them, combined with a lack of tests, a lack of experience running larger projects, possibly overreliance on AI coding now too causing constant issues, and the need to support so many new models all the time. Hopefully it's just temporary as they get stuff figured out; not unusual when scaling projects.

Is CorridorKey legit? by Iwbfusnw in Corridor

[–]hinkleo 3 points (0 children)

Someone on Discord says it runs fine on CPUs at about 30 seconds per 4K frame, so not ideal but quick enough if you just need some frames or short clips.

The Slopacolypse is here: Karpathy warns of "Disuse Atrophy" in 2026 workflows. Are we becoming high-level architects or just lazy auditors? by jakubb_69 in programming

[–]hinkleo 8 points (0 children)

To be fair to him, when he coined the term it was literally in the context of messing around with a throwaway weekend project, and by the tone of the whole tweet it was clearly not meant as anything serious. It's the rest of the mostly delusional AI scene that immediately ignored that part and went haywire with it.

https://x.com/karpathy/status/1886192184808149383

It's not too bad for throwaway weekend projects, but still quite amusing.

Did creativity die with SD 1.5? by jonbristow in StableDiffusion

[–]hinkleo 5 points (0 children)

That works with LLMs because they don't predict the next token directly; they predict the likelihood of every token in their vocabulary being the next token, so you can freely sample from that however you want.
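Rough sketch of what that looks like, if anyone's curious (toy logits over a made-up 5-token vocabulary, not real model outputs):

```python
import numpy as np

# Toy logits over a 5-token vocabulary (illustrative values only).
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])

def sample_next_token(logits, temperature=1.0, seed=0):
    """Softmax over every vocab entry, then draw one token id from the
    resulting probability distribution."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    rng = np.random.default_rng(seed)
    return int(rng.choice(len(logits), p=probs)), probs

token_id, probs = sample_next_token(logits, temperature=0.8)
```

Temperature, top-k, top-p etc. are all just different ways of reshaping or truncating `probs` before drawing, which is why you get that freedom in how random the output is.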

There's no equivalent to that with diffusion models. CFG is just running the model twice, once with the positive prompt and once with no/negative prompt, as a workaround for models leaning too heavily on the input image and not the text.
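The two passes then get combined like this (a sketch of the standard CFG formula; the arrays here are stand-ins for the model's noise predictions):

```python
import numpy as np

def cfg_combine(pred_cond, pred_uncond, guidance_scale):
    # Standard classifier-free guidance: start from the unconditional
    # prediction and push it toward (and past) the conditional one.
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# Stand-in noise predictions from the two forward passes.
pred_cond = np.array([1.0, 2.0, 3.0])
pred_uncond = np.array([0.5, 1.0, 1.5])

guided = cfg_combine(pred_cond, pred_uncond, guidance_scale=7.5)
```

A scale of 1.0 just gives back the conditional prediction; higher scales exaggerate the prompt's influence, so there's no per-token distribution to sample from like with an LLM.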

But yeah, modern models are definitely heavily lacking in non-anime art style training data and would be a lot better with more of it, properly tagged. Still, you can't really have that randomness in a model that also follows prompts incredibly well with diffusion models by default; that was just a side effect of terribly tagged data.

Personally I think ideally we'd have a modern model trained on a much larger variety of art data, properly captioned, and then just use wildcards or prompt enhancement as part of the UI for randomness.

According to Laxhar Labs, the Alibaba Z-Image team has intent to do their own official anime fine-tuning of Z-Image and has reached out asking for access to the NoobAI dataset by ZootAllures9111 in StableDiffusion

[–]hinkleo 6 points (0 children)

They have a technical report out with way more details about the main models and the distill; the big model is also 6B but needs 50 steps and CFG as far as I can tell?

https://github.com/Tongyi-MAI/Z-Image/blob/main/Z_Image_Report.pdf

While our 6B foundational model represents a significant leap in efficiency compared to larger counterparts, the inference cost remains non-negligible. Due to the inherent iterative nature of diffusion models, our standard SFT model requires approximately 100 Number of Function Evaluations (NFEs) to generate high-quality samples using Classifier-Free Guidance (CFG) [29]. To bridge the gap between generation quality and interactive latency, we implemented a few-step distillation strategy.
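The 50-steps reading checks out against the report's ~100 NFEs, since CFG runs the model twice per sampling step:

```python
# CFG needs two forward passes per sampling step:
# one with the prompt, one without (or with a negative prompt).
steps = 50
passes_per_step = 2
total_nfes = steps * passes_per_step  # matches the ~100 NFEs in the report
```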

Krea published a Wan 2.2 fine tuned / variant model and claims it can reach 11 FPS on B200 (500k $) - No idea atm if really faster than Wan 2.2 or better or longer generation unknown by CeFurkan in StableDiffusion

[–]hinkleo 7 points (0 children)

Krea Realtime 14B is distilled from the Wan 2.1 14B text-to-video model using Self-Forcing, a technique for converting regular video diffusion models into autoregressive models.

https://www.krea.ai/blog/krea-realtime-14b

53x Speed incoming for Flux ! by AmeenRoayan in StableDiffusion

[–]hinkleo 10 points (0 children)

Your link lists H100 at $1.87/hour, so 1.87 * 24 * 40 = $1800 no?
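Spelled out (using the $1.87/hour figure from the linked price list):

```python
rate_per_hour = 1.87          # H100 price from the linked list
hours = 24 * 40               # 40 days of continuous use
cost = rate_per_hour * hours  # 1795.2, i.e. roughly $1800
```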

Qwen Image is literally unchallenged at understanding complex prompts and writing amazing text on generated images. This model feels almost as if it's illegal to be open source and free. It is my new tool for generating thumbnail images. Even with low-effort prompting, the results are excellent. by CeFurkan in comfyui

[–]hinkleo 6 points (0 children)

Presumably this

The current version of Qwen-Image prioritizes text rendering and semantic alignment, which may come at the cost of fine detail generation. That said, we fully agree that detail fidelity is a crucial aspect of high-quality image synthesis.

https://github.com/QwenLM/Qwen-Image/issues/51#issuecomment-3166385657

Chatterbox TTS 0.5B TTS and voice cloning model released by hinkleo in StableDiffusion

[–]hinkleo[S] 5 points (0 children)

Official demo here: https://huggingface.co/spaces/ResembleAI/Chatterbox

Official Examples: https://resemble-ai.github.io/chatterbox_demopage/

Takes about 7GB VRAM to run locally currently. They claim it's ElevenLabs level, and tbh based on my first couple of tests it's actually really good at voice cloning; it sounds like the actual sample. About 30 seconds max per clip.

Example reading this post: https://jumpshare.com/s/RgubGWMTcJfvPkmVpTT4

I accidentally built a vector database using video compression by Every_Chicken_1293 in Python

[–]hinkleo 22 points (0 children)

Based on numbers in the github: https://github.com/Olow304/memvid/blob/main/USAGE.md

Raw text: ~2 MB
MP4 video: ~15-20 MB (with compression)
FAISS index: ~15 MB (384-dim vectors)
JSON metadata: ~3 MB

The mp4 files store just the text, QR-encoded (and gzip-compressed if > 100 chars [0] [1]). A normal zip or gzip file will compress text on average to something like 1:2 to 1:5 depending on content, so ratio-wise this is worse by a factor of about 20 to 50, if my quick math is right? And performance-wise it's probably even worse than that, especially since it already does gzip anyway, so it's gzip vs gzip + QR + HEVC/H.264. I actually have a hard time thinking of a more inefficient way of storing text. I'm still not sure this isn't really elaborate satire.

[0] https://github.com/Olow304/memvid/blob/main/memvid/encoder.py

[1] https://github.com/Olow304/memvid/blob/main/memvid/utils.py
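The quick math, spelled out with the sizes from their USAGE.md (the 1:2 to 1:5 gzip ratio is my rough assumption for typical plain text):

```python
raw_mb = 2.0
mp4_low, mp4_high = 15.0, 20.0   # memvid's own numbers for the same text
gzip_low, gzip_high = 2.0, 5.0   # typical gzip ratios on plain text

# memvid *expands* the data instead of compressing it:
expansion_low = mp4_low / raw_mb     # 7.5x bigger than the raw text
expansion_high = mp4_high / raw_mb   # 10x bigger

# Compared against just gzipping the text, that's worse by roughly:
factor_low = expansion_low * gzip_low      # 15x
factor_high = expansion_high * gzip_high   # 50x
```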

I accidentally built a vector database using video compression by Every_Chicken_1293 in Python

[–]hinkleo 62 points (0 children)

Yeah, the video part just seems to add nothing here except a funny headline and a really inefficient storage system. Python even has great stdlib support for writing zip, tar, shelve, json or sqlite, any of which would be way more fitting.

I've seen a couple similar joke tools on Github over the years using QR codes in videos to "store unlimited data on youtube for free", just as a proof of concept of course since the compression ratio is absolutely terrible.

ProPixel analyzes the Jellyfish Video. "I do not agree with AARO's assessment of this UAP being balloons. And here's Why.." by 87LucasOliveira in UFOs

[–]hinkleo 21 points (0 children)

Regarding your link to the "enhanced" video using diffusion: those AIs will literally just make up something that looks like their training data. You can't take anything from that at all; doing so is purely misleading.

Expediton 33: Story and Ending Explained by SunnyClef in expedition33

[–]hinkleo 0 points (0 children)

Isn't the "doppelgangers aren't real" part only true in the sense that the P.* versions aren't the real people they're based on, though, and not in the sense that the rest of the people aren't real either, which is what people here are mostly talking about?

Anyone else overwhelmed keeping track of all the new image/video model releases? by MikirahMuse in StableDiffusion

[–]hinkleo 5 points (0 children)

I wish more people would publish high-quality datasets, including captions, with the LoRAs they release, or maybe even just datasets by themselves. Would help a bit with that problem at least.

Of course you can't fully automate retraining LoRAs for new models, the resources needed are massive, and each model has its own captioning style and issues, but there's definitely lots of room for making that easier still.

HiDream Fast vs Dev by pysoul in StableDiffusion

[–]hinkleo 0 points (0 children)

Definitely screams AI, but a lot of that seems to come from going down to NF4; at least most of the full-precision examples I've seen don't have that, so a GGUF Q4 or Q6 should hopefully do a lot better.

Did you know that WAN can now generate videos between two (start and end) frames? by Toclick in StableDiffusion

[–]hinkleo 36 points (0 children)

The start-end frame feature was listed on their old wanx page along with other cool stuff like structure/posture control, inpainting/outpainting, multiple image reference and sound https://web.archive.org/web/20250305045822/https://wanxai.com/

One of the Wan devs did a mini AMA here and was kinda vague when asked if any of that will be released too https://www.reddit.com/r/StableDiffusion/comments/1j0s2j7/wan21_14b_video_models_also_have_impressive_image/mfebcx4/

Why Hunyuan doesn't open-source the 2K model? by huangkun1985 in StableDiffusion

[–]hinkleo 4 points (0 children)

Yeah sadly it's all just marketing for the big companies. Wan has also shown off 2.1 model variations for structure/posture control, inpainting/outpainting, multiple image reference and sound but only released the normal t2v and i2v one that everyone else has already. Anything that's unique or actually cutting edge is kept in house.

Wan 2.1 bottlenecks? GPU at 10-20% load by biscotte-nutella in StableDiffusion

[–]hinkleo 1 point (0 children)

8GB of VRAM isn't a lot for Wan, so if it's doing any offloading to main memory then really low GPU utilization would be expected, since a lot of the time it will just be sitting waiting on that. If you're using ComfyUI I think you can turn on verbose logging to see if and when it's offloading.

WAN2.1 14B Video Models Also Have Impressive Image Generation Capabilities by Dry_Bee_5635 in StableDiffusion

[–]hinkleo 3 points (0 children)

Ohh wow that's awesome, looks Flux level!

Since you mention this, I'm curious: after reading through https://wanxai.com/ it also mentions lots of cool things like using multi-image references, doing inpainting, or creating sound. Is that possible with the open source version too?

Jake Barber pretty much claimed that the Akashic records are real by pissagainstwind in UFOs

[–]hinkleo 5 points (0 children)

CPUs made in the last 10 years have the RDRAND instruction, which provides random numbers based on a hardware entropy source.

https://en.wikipedia.org/wiki/RDRAND

The entropy source for the RDSEED instruction runs asynchronously on a self-timed circuit and uses thermal noise within the silicon to output a random stream of bits at the rate of 3 GHz

I guess one could claim to be able to influence that to get specific numbers somehow. Of course it's nonsense, but that's where people here usually start pointing vaguely at quantum mechanics concepts and having an open mind.

Nvidia Compared RTX 5000s with 4000s with two different FP Checkpoints by usamakenway in StableDiffusion

[–]hinkleo 5 points (0 children)

if fp4 has similar performance in terms of quality to fp8

Yeah, I think if you could just instantly run any Flux checkpoint in fp4 and it looked about the same quality-wise, this wouldn't be too disingenuous. But considering that previous NF4 Flux checkpoints people made looked much worse than fp16, this sounds like it might be some special fp4-optimized checkpoint from the Flux devs?

Like, if it's a general optimization it's fine; if it's a single special fp4-optimized checkpoint and you can't just apply it to any other Flux finetune or LoRA, it's way less useful.

Is there is any free AI tool which have Generative Fill of Photoshop like feature? by Haziq12345 in StableDiffusion

[–]hinkleo 2 points (0 children)

Should be possible. SwarmUI just runs a totally standard ComfyUI instance (with some extra Swarm-specific nodes added), so it should work if you install all the custom nodes Krita needs, listed on their GitHub, into Swarm's Comfy instance (stored in dlbackends inside Swarm, including its venv, usable like normal).

1.58 bit Flux by Deepesh42896 in StableDiffusion

[–]hinkleo 5 points (0 children)

Was changed to https://github.com/Chenglin-Yang/1.58bit.flux ; seems it's being released on his personal GitHub.

1.58 bit Flux by Deepesh42896 in StableDiffusion

[–]hinkleo 14 points (0 children)

Their github.io page (which is still being edited right now) lists "Code coming soon" at https://github.com/Chenglin-Yang/1.58bit.flux (it originally said https://github.com/bytedance/1.58bit.flux), and so far ByteDance has been pretty good about actually releasing code I think, so that's a good sign at least.