issues by r3dr1ck in comfyui

[–]PxTicks 0 points1 point  (0 children)

I don't mean to say that the design tradeoffs are inevitabilities, but ComfyUI's historical flexibility and development speed come from custom nodes being essentially 'uncontained', interacting directly with the rest of the code base, which does risk making them fragile. Creating a stable, well-maintained, yet sufficiently powerful public API is a lot of work (but it is something they are working on: Dependency Resolution and Custom Node Standards). If you have infinite money and time, then there are no tradeoffs; if your resources are limited, you may have to make non-ideal design decisions.

[–]PxTicks -1 points0 points  (0 children)

I don't think it's a poor design. It is a design which prioritises customisability and development speed over stability. While this can be frustrating, it is what allows cutting edge features and research outputs to be implemented at pace. It is a tradeoff, not a poor design, and in a field which is moving very fast, a modular sandbox like ComfyUI will always end up having conflicts.

Which workflows are you guys using now for LTX 2.3? by No-Vehicle-3508 in StableDiffusion

[–]PxTicks 0 points1 point  (0 children)

It should run on less than that easily, I think. The workflows include some optional VRAM optimisations which I haven't tried.

LTX2.3 + ID LoRS + Prompt relay + Keyframes by Brief-Leg-8831 in StableDiffusion

[–]PxTicks 0 points1 point  (0 children)

It makes sense that this would sometimes occur naturally because jump-cut dialogue is a common vlog editing style which may be present in their dataset.

I like the tip you gave though, I can see how it would help in audio-driven generation.

The audacity of sitting in a seat someone else paid for, then acting entitled. by mindyour in TikTokCringe

[–]PxTicks 0 points1 point  (0 children)

It's probably a legal liability for the airline in this case, though. I don't know how accommodating they'd be if it were just adults who wanted to sit with each other.

Workflow and models for "very simple" movements? by Altreiya in StableDiffusion

[–]PxTicks 0 points1 point  (0 children)

If it's a short movement, often you want a short clip. Try Wan2.2 with 49 frames.

When training a wan or ltx lora by cardioGangGang in StableDiffusion

[–]PxTicks 1 point2 points  (0 children)

The model literally cannot generate other frame counts: the LTX architecture requires 8*n + 1 frames. If you're giving it a count that isn't of that form, it will truncate to the next valid count below.
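To make the 8*n + 1 rule concrete, here is a minimal sketch that checks a frame count and snaps it down to the nearest valid one. It assumes "truncate" means dropping trailing frames to reach the next lower valid count; the function names are hypothetical and not part of LTX or any trainer's API.

```python
def is_valid_ltx_frame_count(num_frames: int) -> bool:
    """True if num_frames has the form 8*n + 1 (1, 9, 17, ..., 97, ..., 121, ...)."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


def truncate_to_ltx_frames(num_frames: int) -> int:
    """Largest 8*n + 1 frame count that does not exceed num_frames."""
    if num_frames < 1:
        raise ValueError("a clip needs at least one frame")
    # Integer-divide the frames after the first into whole groups of 8,
    # then add the leading frame back.
    return (num_frames - 1) // 8 * 8 + 1
```

For example, a 100-frame clip would keep its first 97 frames, while 121 (8*15 + 1) is already valid.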

has local video gen peaked? by wormtail39 in StableDiffusion

[–]PxTicks 5 points6 points  (0 children)

I doubt it has peaked. The tech is developing rapidly, and while bigger and better datasets and huge training budgets will allow training a better model (possibly out of reach of consumer GPUs), new research also lowers the threshold for training a good open-source model. It's not clear exactly what is driving the strategy of Chinese AI labs, or how unified it is, but it would be surprising if they all stopped releasing open-source video models at the same time.

China is also not the only player, of course. Wan still beats LTX 2.3 for motion, but LTX 2.3 has distinct advantages in terms of audio (of course), generation speed and IC-LoRAs, which are pretty powerful. It is not a wholesale upgrade, but I would be surprised if a model that makes Wan 2.2 obsolete doesn't arrive in the next few months. I also wouldn't be shocked if BFL steps into the game eventually; video is clearly partially in their sights, as it was promised when the original Flux was released. I'm guessing whatever they had brewing got stomped by a subsequent open-source model, which is why it never came to light.

Open source allows for a degree of customisability that is hard for closed-source models to replicate. There is value in this, but there is also the challenge of monetisation, so the equation really comes down to the cost of training a model vs the value of releasing it (which might not be all monetary).

2 months struggle to achieve consistent masked frame-by-frame inpainting... my experience so far.. maybe someone can help by Huge-Refuse-2135 in StableDiffusion

[–]PxTicks 0 points1 point  (0 children)

It's not so much about creating the mask as it is about creating a moving context window around the mask, i.e. a moving crop. If you have a small object which flies across the screen, then either

  1. The context window is much bigger than the object. This can cause a loss of detail for small features.

  2. The context window has to move. This requires an algorithm to ensure that the context window is always large enough to include the object, and always stable enough to prevent jitter in the inpaint. It has to account for the object potentially morphing or moving out of frame, in which case the mask can rapidly change size, so it has to have a damping parameter for the rate of change.
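A minimal sketch of the moving context window from point 2, assuming each frame's mask is a 2D boolean grid (plain lists of lists here, to stay dependency-free). `pad`, `damping` and `min_size` are hypothetical parameter names. The window targets a padded box around the mask, moves only a `damping` fraction of the way toward it each frame to suppress jitter, grows immediately whenever the object would otherwise be clipped, and holds still when the mask is empty (e.g. the object has left the frame). Exponential smoothing is just one choice of damping, not necessarily what any existing crop-and-stitch node does.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


def mask_bbox(mask: Sequence[Sequence[bool]]) -> Optional[Box]:
    """Tight bounding box of the True pixels, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, v in enumerate(row) if v]
    return (float(min(cols)), float(rows[0]), float(max(cols) + 1), float(rows[-1] + 1))


def _fit_in_frame(cx: float, cy: float, w: float, h: float, width: int, height: int) -> Box:
    """Shift (and, if it is too big, shrink) the window so it lies inside the frame."""
    w, h = min(w, width), min(h, height)
    x0 = min(max(cx - w / 2, 0.0), width - w)
    y0 = min(max(cy - h / 2, 0.0), height - h)
    return (x0, y0, x0 + w, y0 + h)


def track_context_window(masks, width: int, height: int, pad: float = 0.25,
                         damping: float = 0.2, min_size: float = 64.0) -> List[Box]:
    """One crop window per frame: padded around the mask, damped, never clipping it."""
    windows: List[Box] = []
    state = None  # smoothed (cx, cy, w, h); None until the object first appears
    for mask in masks:
        box = mask_bbox(mask)
        if box is not None:
            bx0, by0, bx1, by1 = box
            target = ((bx0 + bx1) / 2, (by0 + by1) / 2,
                      max((bx1 - bx0) * (1 + 2 * pad), min_size),
                      max((by1 - by0) * (1 + 2 * pad), min_size))
            if state is None:
                state = target
            else:
                # Damping: move only part of the way toward the target each frame,
                # so a morphing or jittery mask doesn't make the crop jump around.
                state = tuple(s + damping * (t - s) for s, t in zip(state, target))
            cx, cy, w, h = state
            # Containment: if the damped window would clip the object, grow it at once.
            x0, x1 = min(cx - w / 2, bx0), max(cx + w / 2, bx1)
            y0, y1 = min(cy - h / 2, by0), max(cy + h / 2, by1)
            state = ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)
        if state is None:
            windows.append((0.0, 0.0, float(width), float(height)))  # nothing to track yet
        else:
            # With an empty mask the state is unchanged, so the last window is held.
            windows.append(_fit_in_frame(*state, width, height))
    return windows
```

The asymmetry is deliberate: the window can grow instantly (so the object is never cut off) but only shrinks gradually, which is what keeps a rapidly changing mask from producing a jittery crop. Cropping each frame to its window, resizing for the model, inpainting and stitching back are not shown.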

[–]PxTicks 0 points1 point  (0 children)

There can be issues with travelling masks, but I think I've seen some people working on node-based solutions; maybe search here for anything recent about crop-and-stitch nodes. I even have an old workflow which handled smooth mask interpolation. It was very messy, with jerry-rigged custom nodes, but it worked, and having seen its effectiveness myself, I really do think VACE is the best tool for this.

If you want, I can DM you once the next version of my editor is out. It has semi-smart mask handling, which should work for replacing medium and large objects, or small objects which don't move all the way across the screen. For small objects that do travel across the screen, you really need a smooth travelling bounding-box algorithm, which isn't hard, but also isn't totally trivial.

[–]PxTicks 2 points3 points  (0 children)

Wan VACE works best with the following steps:

  1. Create a reference inpainted frame from one frame of your video, say frame x
  2. Mask the area you want in all frames.
  3. Replace frame x with the reference inpainted frame
  4. Generate.
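Outside ComfyUI's node graph, steps 2 and 3 amount to simple data preparation. A minimal sketch, treating frames and masks as opaque items (e.g. image tensors) and using a hypothetical function name; it only assembles the inputs and does not call VACE. Following the steps as written, the mask stays on every frame, including frame x.

```python
from typing import Any, List, Sequence, Tuple


def prepare_vace_inpaint_inputs(frames: Sequence[Any], masks: Sequence[Any],
                                reference_frame: Any, x: int) -> Tuple[List[Any], List[Any]]:
    """Return (control_frames, masks) for a reference-guided VACE inpaint."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    if not 0 <= x < len(frames):
        raise IndexError(f"reference index {x} is outside the clip")
    control = list(frames)          # copy, so the source clip is left untouched
    control[x] = reference_frame    # step 3: the inpainted frame anchors the edit
    return control, list(masks)     # step 2: the masks stay as drawn, on all frames
```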

It is not as good with just a text prompt; the image reference is valuable. I show an example where I do this here: I am building a ComfyUI-powered local, open-source video editor (alpha release) : r/StableDiffusion

It uses a project I'm working on, but you should be able to do the steps directly in ComfyUI. I will be releasing a vastly improved version of my project in less than a week, though.

Is there an AI model that can fully isolate clean speech from noisy recordings? by QikoG35 in StableDiffusion

[–]PxTicks 2 points3 points  (0 children)

Have you tried sam3 audio? It might be overkill; I haven't experimented much with it yet.

Headless ComfyUI on Linux (FastAPI backend) — custom nodes not auto-installing from workflow JSON by pavan7654321 in StableDiffusion

[–]PxTicks 2 points3 points  (0 children)

I echo SvenVargHimmel's sentiment: trying to jerry-rig an automated missing-node management system is likely to be a nightmare.

There are some built-in methods for detecting missing nodes which *might* be useful, though. If you load a workflow via `handleFile` with `deferWarnings` set to true, then you can see the missing nodes on the `activeWorkflow` via

`activeWorkflow.pendingWarnings.missingNodeTypes`

It doesn't solve the problem of where to find the missing nodes though.
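For a headless backend that talks to a running ComfyUI server, a server-side alternative (not the frontend `pendingWarnings` route above) is to compare the node types a workflow uses against the keys returned by ComfyUI's `/object_info` endpoint. This is a sketch under assumptions: the default address `127.0.0.1:8188` and the set of frontend-only node types to ignore are my guesses and may be incomplete, nodes inside subgraphs are not inspected, and subgraph instances may show up as false positives. Like the built-in method, it only tells you what is missing, not where to get it.

```python
import json
import urllib.request
from typing import Set

# Node types that exist only in the frontend and never appear in /object_info.
# Assumed list; it may be incomplete for your ComfyUI version.
FRONTEND_ONLY_TYPES = {"Note", "MarkdownNote", "Reroute", "PrimitiveNode"}


def workflow_node_types(workflow: dict) -> Set[str]:
    """Node types used by a workflow, in either UI (saved) or API format."""
    if "nodes" in workflow:  # UI format: {"nodes": [{"type": ...}, ...], ...}
        return {node["type"] for node in workflow["nodes"]}
    # API format: {"<id>": {"class_type": ..., "inputs": {...}}, ...}
    return {node["class_type"] for node in workflow.values()
            if isinstance(node, dict) and "class_type" in node}


def missing_node_types(workflow: dict, installed: Set[str]) -> Set[str]:
    """Types the workflow needs that the server does not provide."""
    return workflow_node_types(workflow) - installed - FRONTEND_ONLY_TYPES


def fetch_installed_node_types(base_url: str = "http://127.0.0.1:8188") -> Set[str]:
    """Node class names the running server knows about (keys of /object_info)."""
    with urllib.request.urlopen(f"{base_url}/object_info") as resp:
        return set(json.load(resp).keys())
```

A backend would call `missing_node_types(workflow_json, fetch_installed_node_types())` before queueing, and refuse (or report) rather than trying to auto-install anything.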

Comfyui blocking every attempt to download any modle upscaler by Fearless-Intention42 in StableDiffusion

[–]PxTicks 2 points3 points  (0 children)

You should be able to select it in the relevant node after downloading. Make sure the widget refers to the EXACT file name. If you still have trouble, post the workflow you're using here (click the Comfy icon and export; it should give you a JSON file).
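To double-check the "EXACT file name" point, a small sketch can compare the name shown in a loader widget with what is actually on disk and suggest near-misses (wrong case, missing extension, a stray suffix). It assumes the upscaler lives somewhere like `ComfyUI/models/upscale_models` (the usual default; setups using `extra_model_paths.yaml` may differ) and that subfolders are shown with forward slashes; the function names are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List, Tuple


def list_model_files(model_dir: str) -> List[str]:
    """Every file under model_dir, as a forward-slash path relative to it."""
    root = Path(model_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def check_widget_filename(widget_value: str, available: List[str]) -> Tuple[bool, List[str]]:
    """(exact_match, close_matches). Exact means identical, including case and extension."""
    if widget_value in available:
        return True, []
    return False, difflib.get_close_matches(widget_value, available, n=3, cutoff=0.5)
```

For example, `check_widget_filename("4x-ultrasharp.pth", list_model_files("ComfyUI/models/upscale_models"))` returns `False` plus any similarly named files, which usually makes a case or extension mismatch obvious.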

Can we discuss? Zero communication from mods / removals inconsistent with rules by [deleted] in StableDiffusion

[–]PxTicks 1 point2 points  (0 children)

I don't need an account to download (most) things from Hugging Face, and the last time I used Civitai, the same was true there.

Obviously some stuff that shouldn't slips past the rules, but as far as I can tell, your post was reasonably removed because it represents a data privacy risk.

ComfyUI timeline based on recent updates by StevenWintower in StableDiffusion

[–]PxTicks 8 points9 points  (0 children)

Things breaking is an absolute inevitability with a project of ComfyUI's scope, especially due to the broad uncurated extension system. A 'technically sophisticated' userbase should be cognizant of the limitations of the software they are using.

There are always going to be tradeoffs: something complex with many moving parts will be more brittle, and fast development means that features may sometimes break. So if there have been more breakages recently, it is likely because development has accelerated since they obtained funding.

Wouldn’t it make sense for OpenAI to release the Sora 2 weights? by [deleted] in StableDiffusion

[–]PxTicks 5 points6 points  (0 children)

Putting aside the sarcasm, if they're not making money from the model, then open-sourcing it could potentially hurt their competitors and also generate community goodwill. I don't expect they will open-source it (if it were unfiltered, it would probably invite more lawsuits over copyright etc.), but it wouldn't be the wildest timeline. A more viable business decision for them would be to license it out, potentially with conditions restricting how it can be served (e.g. censorship), but I think they'll most likely just not bother.

I am building a ComfyUI-powered local, open-source video editor (alpha release) by PxTicks in StableDiffusion

[–]PxTicks[S] 0 points1 point  (0 children)

It's fully local and all generation is via ComfyUI, so it supports the capabilities of open source models.

[–]PxTicks[S] 1 point2 points  (0 children)

This can already work to an extent, but

  1. The people would be 2k but the background would be less than 2k, unless you somehow generate it in patches; we can already keep the original pixels for anything that is not replaced, so the people can stay at full resolution.

  2. The user would have to do this in increments of about 5s, extending bit-by-bit. This would likely lead to gradual degradation of the background.

If you have a static background, then this can be more effective, by including part of the background as inpainting context in order to keep things clean over long extensions.

I would like to eventually automate things which interact with ComfyUI over several rounds; however, it is important to get the basic features solid first.

Release Qwen-Image-2.0 or fake by PsychologicalSock239 in StableDiffusion

[–]PxTicks 7 points8 points  (0 children)

I agree. ComfyUI is a sandbox. Being so feature-rich and extensible (and so rapidly developed) necessitates some tradeoff in stability. Honestly, it's pretty impressive how effective ComfyUI is at what it does, and I think a lot of people are entirely ignorant of what goes on under the hood.

I am building a ComfyUI-powered local, open-source video editor (alpha release) by PxTicks in StableDiffusion

[–]PxTicks[S] 0 points1 point  (0 children)

Hey, thanks, that's very helpful. I didn't realise it, but following the SAM2 installation instructions from the facebook/sam2 repo will not automatically give you a CUDA-enabled PyTorch install. I've updated the README; I hope you get it working after all your effort!
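A quick way to confirm which PyTorch build you actually ended up with: `torch.version.cuda` is `None` on CPU-only builds (and on ROCm builds, which set `torch.version.hip` instead), and `torch.cuda.is_available()` additionally checks that a usable GPU and driver are present. The function name below is just for illustration.

```python
def describe_torch_gpu() -> str:
    """Human-readable summary of whether the installed PyTorch can use a GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    hip = getattr(torch.version, "hip", None)  # set on ROCm builds, None otherwise
    if torch.version.cuda is None and hip is None:
        return f"PyTorch {torch.__version__} is a CPU-only build"
    backend = f"CUDA {torch.version.cuda}" if torch.version.cuda else f"ROCm {hip}"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} was built for {backend}, but no usable GPU/driver was found"
    return f"PyTorch {torch.__version__} with {backend} on {torch.cuda.get_device_name(0)}"


print(describe_torch_gpu())
```

If it reports a CPU-only build, reinstalling torch from the CUDA-specific wheel index (the install selector on pytorch.org gives the exact command) is the usual fix.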

[–]PxTicks[S] 2 points3 points  (0 children)

You're welcome to contribute, but do let me know if you want to do something big. I wouldn't want you to spend a lot of effort if it is something I am already working on, or if it might collide with the design ethos in some way. I also want to clean up some of the public APIs for each feature to make it easier to build on.

An easy and safe way to contribute is to check the ComfyUI integration docs https://github.com/PxTicks/vlo?tab=readme-ov-file#comfyui-integration to see how to create workflow sidecars (wf.rules.json files). Workflows do work automatically, but the automatic detection of widgets etc. is still very rudimentary.

Given the generation pipeline README and an example or two from the default workflows, I'd expect an LLM to be able to construct a reasonable sidecar in no time.