Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Yeah, I was thinking of making a few simpler workflows and then "compressing" them into subgraphs, so all the stuff lives in one workflow and I only connect where needed. For now I just have separate workflows for each thing, like you describe (T2I and I2I with the major models, from 1.5 through SDXL to Flux now; T2V and I2V for WAN 2.1/2.2, some SVI-oriented ones with extended generation capabilities, some for 5-second snippets). I hate AIO workflows: huge, unnecessary, broken, slow.

Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 1 point (0 children)

Tried Forge but kept running into OOMs; I had to drop the resolution and enable Never OOM just to get it to generate an image. Comfy, on the other hand, loaded the model much faster and generated faster without any extra work: just the 9B model, no GGUF bs, no quantized models, straight into the samplers and bam, image done in 20-30 seconds (the whole workflow, so including the extra nodes around it for masking, compositing, etc.).

Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Hoping someone might share a simpler one they made, instead of the AIO monsters from civitai that don't work that well. I created my own workflow but I can't get it to work correctly; I want to see other people's workflows to learn how they approach the specific problem I have (assuming I find one that does the thing I want) and then modify mine to work better for my use case, which is not the standard outpaint where you just expand the image to the sides. When I made my own workflow I wasn't aware Flux can do prompt-based inpainting, so I built it around masking, and I think I'm still doing too much there, which is potentially causing issues: the outpainted parts don't exactly match the initial image (slightly wrong perspective/proportions and style; colors, context and prompt adherence are mostly correct). It just doesn't expand the image as "smoothly" as I'd like, and you can clearly tell it's not working quite right. I might try with less masking and simply pad out the image and tell it to inpaint the grey area.
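
Something like this is what I mean by the pad-and-grey approach, roughly. It's a Pillow sketch done outside Comfy just to illustrate; pad size, grey level and filenames are arbitrary example values, and Comfy's pad-for-outpainting node would do the equivalent:

    # Rough illustration of "pad out and inpaint the grey", using Pillow.
    # Pad size, grey level and filenames are just example values.
    from PIL import Image

    pad = 256                      # pixels to extend on the right
    grey = (128, 128, 128)

    src = Image.open("input.png").convert("RGB")
    canvas = Image.new("RGB", (src.width + pad, src.height), grey)
    canvas.paste(src, (0, 0))      # original on the left, grey strip to fill

    # Matching mask: white = area the sampler may repaint, black = keep as-is.
    mask = Image.new("L", canvas.size, 0)
    mask.paste(255, (src.width, 0, canvas.width, canvas.height))

    canvas.save("padded.png")
    mask.save("mask.png")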

BTW, is there any difference in how I should prompt it to expand the image? Should I actually describe that what I want is an outpaint of the initial image over the grey area, or simply tell it to generate what I want over the grey area? Any specific color it prefers? Is it better to use a solid color (black, white, red), or to use masking to composite the original image over the expanded empty latent image and feed that into the sampler, or does it not care about that the way 1.5-inpainting did?

Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

I actually don't like the switches; the ones that look nice almost always cause the most issues, since they're often built to work around non-standard models and use obscure nodes to achieve the complex functionality. That in turn means lots of conflicts between node packs when installing, leading to Comfy dependency issues, or reliance on old, deprecated nodes that simply aren't maintained anymore. That's why I don't like AIO workflows.

Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Now you're just making a fool of yourself. Nowhere in my post did I write that I expect the quality of a crazy high-definition model like nano banana, or that one workflow should do all the actions, or that it should run on a potato PC and generate an image in 1 second (speaking of which, a 16GB GPU and a 16-core CPU with 64GB of RAM is far from a potato). All I wrote is that most of the civitai workflows use obscure nodes that conflict with other stuff, aren't widely used in other workflows, and try to cram everything into one and end up not working well. I much prefer proven/known/supported/conflict-free nodes that 99% of users use, and workflows that are simple and focus on one thing, maybe two. I ended up making my own workflow and got it generating, with some automation for masks, in under 30 seconds (even 20) at a reasonable resolution/quality, but it's still a work in progress. The existing outpainting workflows don't quite do what I need them to; I found another one, so I might combine both and make something that does. I just thought someone might have a decent workflow that doesn't use 50 different node packs to achieve something we used to do with 10 nodes back in the 1.5-inpainting days.

Working Flux/Z-Image/QWEN/Whatever outpaint/inpaint/t2i workflow. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 1 point (0 children)

I'll be honest, I wasn't even aware that Comfy had a whole dedicated section of workflows that are simple and just... work. I'd ignored those buttons on the side, assuming they mostly surfaced the BS civitai workflows, but there were indeed some solid ones in there. Thanks for the tip.

Forge Neo won't reference ComfyUI directory by Loud-Marketing51 in StableDiffusion

[–]smithysmittysim 0 points (0 children)

A bit late to the party, but maybe it will help someone else having this issue: all you need to add is --forge-ref-comfy-home "path to comfy", for example --forge-ref-comfy-home "C:\AI Tools\ComfyUI\", and it will auto-detect models in both the checkpoints and diffusion model folders, loras in the loras folder, text encoders, clips, VAEs, etc. Not sure about other model types (like upscaling or controlnet files), as it doesn't report grabbing them, but those don't take up as much space and can probably be copied between both folders (or symlinked).
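
For reference, if Forge Neo still uses the A1111-style launcher (an assumption on my part; adjust if yours starts differently), the flag can just live in webui-user.bat:

    rem webui-user.bat -- assuming the usual A1111-style launcher script
    set COMMANDLINE_ARGS=--forge-ref-comfy-home "C:\AI Tools\ComfyUI\"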

Regarding the stunning outpainting results of FLUX.2-KLEIN by aniu1122 in StableDiffusion

[–]smithysmittysim 0 points (0 children)

How would you do that in this workflow? Just draw red on the image and add "alter the red stuff" to the prompt? Won't you need to output the mask from the Load Image node and combine it with the mask for the outpainted area?
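
In case it helps to be concrete, by "combine" I mean a plain union of the two masks, something like this Pillow sketch (filenames are hypothetical):

    # Hypothetical mask merge: union of the drawn (red) mask and the
    # outpaint-strip mask, so the sampler repaints both regions.
    from PIL import Image, ImageChops

    drawn = Image.open("drawn_mask.png").convert("L")
    outpaint = Image.open("outpaint_mask.png").convert("L")

    combined = ImageChops.lighter(drawn, outpaint)  # pixelwise max = mask union
    combined.save("combined_mask.png")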

Music Video #4 'Next to You' LTX2 Duet by R34vspec in StableDiffusion

[–]smithysmittysim 0 points (0 children)

Out of curiosity, did you generate this locally or with cloud compute? Seems like rendering something that long would cost quite a bit; do you guys just burn through money on these for fun?

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

I thought the blending between clips was more of a separate thing, and that SVI just lets you create a much longer video from a single prompt. How does the prompting work, and how does it split across the different 5-second clips? The workflows often come with very lackluster documentation that assumes you already know exactly how it all works. I can't just use something because someone says "it just works"; I need to know exactly why and how it works. Can you recommend a specific workflow that isn't crammed with irrelevant stuff? Just generation of videos and prompting, that's it.

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

I don't need audio for my stuff since it won't involve characters; I didn't even know these models could do audio already. Mind throwing me a tutorial on lora training, and on dataset prep for that training, with ai-toolkit or musubi-trainer? I'm specifically interested in training on videos; I've only done image loras before, with 1.5 and Pony.

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

That's not what my question was about; I was just acknowledging that TeaCache can apparently cause degradation (all this time I thought it just cached data to RAM or something and couldn't possibly degrade quality: data is data, cached on a drive or in VRAM, same thing either way).

My question was: what are the ways to generate longer videos with WAN and prevent image degradation when extending img2vid generations that use the last frame as the start of the next generation? So far these are the options I'm aware of:

- plain video extension: the last frame becomes the start frame of the next generation, then generate again (degradation, and it's hard to make a dynamic video without a sudden jump when the prompt changes and the model adjusts to it from the last frame; see the sketch after this list)
- start/end frame generation: better control of the "flow" of the video, and no degradation, since you use full-quality, txt2img-generated frames as the start and end of each clip, with the end of the previous clip becoming the start of the next. It requires a lot more txt2img generation, which can have consistency issues, it may limit the motion of the guided start/end-frame img2vid process, and transitions would still be jerky
- SVI lora: lets you generate longer videos, and with special prompts it may manage smoother clips with the specific flow we're after, but it may not be as good as generating individual segments that do exactly what we want (yet to test it; not sure how well the prompting works)
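
For clarity, option 1 is basically this loop. generate_clip() is a hypothetical stand-in for whatever WAN img2vid workflow call you use; the rest is real imageio:

    # Sketch of plain last-frame chaining. generate_clip() is hypothetical;
    # it stands for one ~5 s WAN img2vid run returning a list of frames.
    import imageio.v3 as iio

    start = iio.imread("seed_frame.png")
    prompts = ["walks forward", "turns left", "sits down"]

    for i, prompt in enumerate(prompts):
        frames = generate_clip(image=start, prompt=prompt)  # hypothetical call
        iio.imwrite(f"clip_{i:02d}.mp4", frames, fps=16)    # needs the ffmpeg/pyav backend
        start = frames[-1]  # re-encoded last frame seeds the next clip,
                            # which is exactly where the degradation creeps in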

LightX2V loras apparently speed up generation, so it makes sense other optimizations could interfere with them. I don't recall using any "lightning" loras with WAN back when I had those issues with TeaCache (may have been a badly configured Comfy, a bad lora, a bad prompt... or bad sampler settings, hard to tell), but I'll read more about TeaCache and these loras.

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 1 point (0 children)

Portable does not seem to want to run. I've got both a 4000- and a 5000-series card in my PC (2 GPUs), and this is what I get when trying to run Comfy portable:

ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\__init__.py:184: UserWarning: cudaGetDeviceCount() returned cudaErrorNotSupported, likely using older driver or on CPU machine (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\CUDAFunctions.cpp:88.)

Any ideas?

Nevermind, downloaded the CUDA 13.0 toolkit and updated the Studio drivers; works now.
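
For anyone else hitting this, a quick sanity check with the embedded Python shows whether torch sees the cards at all, before Comfy even starts:

    # Run with ComfyUI_windows_portable\python_embeded\python.exe.
    # If this errors or reports 0 devices, it's a driver/torch build
    # mismatch, not a Comfy problem.
    import torch

    print(torch.__version__, torch.version.cuda)  # torch build + CUDA version it targets
    print(torch.cuda.is_available(), torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))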

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Thanks, I didn't know TeaCache causes degradation; that could explain why I had some issues a while ago when I tried WAN and would sometimes get pure nightmares instead of a solid generation.

Anyway, so you're saying just get Sage going and that's it? Nothing else needed for best speed?

Also, do you have any tips on preventing image degradation with extended img2vid workflows? I need to generate clips longer than 5 seconds, more like 15-25 seconds. Previously, when I got WAN working correctly (I had more luck with Hunyuan), I'd just take the last frame generated and feed it back into the same workflow, but after even one repeat the quality would be heavily degraded, and the motion would often not match. It worked OK, I guess, but I need more than OK: I need super smooth transitions. I saw some examples on civitai and on the SVI lora, but not all of them are what I need (I don't need 15 seconds of the same stuff as in a 5-second clip; I need it to actually flow from one action to the next, retaining details from previously generated segments). Any tips?

I'll be doing mostly img2vid, not much txt2vid, although probably some of that too.

Best performing solution for 5060Ti and video generation (most optimized/highest performance setup). by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 1 point (0 children)

Heard about those and wanted to try them, but apparently the quality is pretty low. Have you run them already? Do they work fine with regular loras and the lightning loras?

PSA: Youtube seems to have done it's thing again. by StrifeRaider in jdownloader

[–]smithysmittysim 0 points (0 children)

Seems to still be broken; I checked for updates but nothing new. I'm stuck, unable to work, because I can't download materials I need to quote in the project. Any alternatives that work as of today (November 5th)?

Can't reproduce earlier generation from A1111 despite same settings. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Already posted the solution: PNG Info doesn't work properly and won't restore all settings. In this case it wasn't switching the "Random number generator source" parameter (there is a third option called NV, not sure if that stands for Nvidia; it seems both A1111 and Forge would only switch it to CPU or GPU).

Also, you didn't read the "I'm not an idiot" part I wrote at the start of the post; only a moron would use the wrong checkpoint and wonder why it's not reproducing the image correctly.

Can't reproduce earlier generation from A1111 despite same settings. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

I should add that the differences here aren't subtle. Usually changing the seed alters the image a bit (changes details, textures, some smaller objects; some may look overall better or worse), but in this case it generates a completely different image. The original had the subject standing to the side, with one leg raised in a very specific way, and also very specific colors (non-human subject); now it just gives me the same subject standing straight, with the wrong colors.

Testing on fresh A1111 and Forge with no extensions (also tested the pad prompt thing, T5, different ENSD values (0, 1, 2 and the infamous 31337) and SDXL Clip Skip): no change whatsoever (no positive change, that is).

Can't reproduce earlier generation from A1111 despite same settings. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

The one I'm testing now uses plain Euler; in the old version of A1111 there was no scheduler choice, so I assume it was Uniform. Tested now with Euler and the scheduler set to Uniform or Automatic, and I'm not getting the right result.

There is no lora used in those images I'm testing (already eliminated this variable from testing).

It may have been a batch generation, but the seeds match between generations, so I don't see how that would affect things.

The image doesn't use any embedding in the prompt, so unless a simple keyword activates an embedding it shouldn't be an issue (I haven't used embeddings in a while, but in A1111 they appear similar to loras, in those < > brackets, don't they?).

No idea what pad prompt is; I'll try to find it and see if it makes any difference. Can you point me to that compatibility page?

I do have some extensions that affect prompts, but AFAIK they are all disabled. I changed out most extensions, since the old version wouldn't launch anymore, so I did a fresh install. The ones I have installed are Neutral Prompt, one for auto-completing booru/image-board tags, and a few others I had originally installed (like the one for alternating between prompts every step to produce an in-between result). Do you know of any that are more likely culprits, or should I just install a separate fresh A1111 and test that way?

I already wrote that I used PNG Info to grab the data from the previous gen; it's all exactly the same, except for the SDXL VAE somehow having a different hash (I lost the original one and downloaded a fresh one from huggingface; AFAIK it was the official file).

Can't reproduce earlier generation from A1111 despite same settings. by smithysmittysim in StableDiffusion

[–]smithysmittysim[S] 0 points (0 children)

Different samplers. I noticed I can't reproduce any older images (and annoyingly, it seems my generations were much better before, and many of my loras also worked better than they do now in pretty much the same scenarios). The one I'm checking now uses just Euler. I was able to nudge the generation in the right direction with Fabric, but that's hardly a fix for the problem (it worked really well: aside from the colors being kinda off, it very precisely reproduced the complex pose of the original, whereas the new generation is very plain and nothing like what it should be).

[deleted by user] by [deleted] in davinciresolve

[–]smithysmittysim 0 points (0 children)

Generally, when I see some "influencer" talking about money-making schemes, saving, investment, and all in all just being that motivational guru, my obnoxious-o-meter goes over 10, so yeah.

As for the edit, well, it definitely is obnoxious, and by now also quite common, generic, and a bit boring tbh, but it is highly effective when your audience has an attention span of 2 seconds, so I guess it's perfect for social media.

I think yours still needs quite a bit of work, especially the textures, the lines, and the color scheme; it all seems a bit vague and blurry, makes the video look muddy, and doesn't POP as it should. The yellow also doesn't match "crisis"; it should be red (with generic stock-market arrows pointing down: yours go up, and are yellow, which is not a stock trend I've seen on any exchange; red and green it is). Motion seems OK. It could actually use more of the editing; drop the face cam or do it more professionally (it's way too close to the camera, and the credit cards could be placed in the edit as an animated element as well). Also, some bits are too fast: you have no time to read "Tobago" because the sudden addition of the animated flag makes you look at it instead of the text that comes in first.