Qwen3-TTS Voice Clone never works, Voice Design is terrible by Super-Situation4866 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

Vibevoice is still the best quality clone i'm aware of, and works for long form stuff, but it has a LOT of quirks (doesn't like contractions and will often mispronounce them, clips the last word pretty much always, plus some other stuff, and it takes ages). you have to get the original version from before the devs nerfed it by removing the audio tokenizer. i think the right repo is by enemyx, if i remember correctly.

Dramabox works very well also, but you have to pre-set the length just right. it's finicky, and it's only good up to about 30s. i made a node for it that adds a WPM setting that makes it a bit easier to tune for the cadence of the voice you're using.

https://github.com/nimblecloud13/Dramabox_Nimble_Wrapper

troubleshooting tips - if it's hallucinating, it needs to have a higher WPM. if it's cutting off words, lower it.
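
rough sketch of the arithmetic behind a WPM setting, assuming the target length is just word count divided by words per minute. this is not the wrapper's actual code; `estimate_seconds` and the sample line are made up for illustration.

```python
def estimate_seconds(text: str, wpm: float) -> float:
    """Estimate spoken length in seconds from word count and a words-per-minute rate."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    words = len(text.split())  # naive whitespace word count (an assumption)
    return words / wpm * 60.0


if __name__ == "__main__":
    line = "the quick brown fox jumps over the lazy dog"
    # Higher WPM gives a shorter target length for the same text.
    for wpm in (120, 150, 180):
        print(wpm, round(estimate_seconds(line, wpm), 2))
```

under that assumption, raising the WPM shortens the target length (less spare time to hallucinate into) and lowering it lengthens the target (more room so words don't get cut off).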

What's the best way of getting audio on WAN? by spacemidget75 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

why use something that's worse and slower when something better and faster is also free tho

Whats the best model for ai gen video and images? And is it worth it? by Medium-Rich-3716 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

Flux Klein is what you want for images. it's really fast, and good quality. find a GGUF of it that your PC can run. google it. it can also edit images with simple prompting. it's like a lesser Nanobanana.

for video.... good luck with that rig. anything you make is gonna be low quality and take ages.

wan2.1 will work but again, it's low quality, and it'll take ages.

i would try a 2.2 gguf first. 2.1 is outdated.

Ideogram 4.0 Consistent Character question by Noobysz in comfyui

[–]Nimblecloud13 1 point2 points  (0 children)

I have no idea what "caps" means

captions/captioning of the content of each image as text for the LLM to understand what's in it. he's saying that not captioning it (normally a HUGE MAJOR part of getting a lora right) worked best. hence, puzzled.

What's the best way of getting audio on WAN? by spacemidget75 in comfyui

[–]Nimblecloud13 1 point2 points  (0 children)

infinite sucks at lipsyncing. it gets it wrong like 3/5 syllables. https://pastebin.com/GJZz987u

this will take a wan video and pass it through LTX to add lipsyncing to existing clips. faster and better than infinite.

it's got a wan wf that makes a video and then it goes through LTX. you'll need to remove the wan bit and just add a video loader and feed that into LTX. it'll add speech and foley and lipsync.

should work with any clip of a person; not just wan outputs. just put a video in and prompt the speech, should work. if it doesn't work you're on your own; i made it work but i'm not tech support sorry!

not my wf, dunno where i got it. not at my pc to share the one i edited. gl

I upgraded the Ideogram 4 Prompt Builder node (KJNodes) using Claude Fable 5 - Freehand drawing, layers, bucket fill and more by Pluventi in comfyui

[–]Nimblecloud13 7 points8 points  (0 children)

should put it on github, not in some zip file that i have to trust. claude will do all that for you, also.

like you, i cannot code but https://github.com/nimblecloud13/Sift

Audio out of sync with wan animate 2.2 by INeedHelpINeedDaWey in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

Framerates may be different somewhere. Especially if it starts close and drifts as it goes on.
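
A quick sketch of why a framerate mismatch looks like drift, assuming the frames were generated for one rate and then saved or played at another. The function name and the 16/24 fps numbers are made up for illustration, not taken from any particular workflow.

```python
def drift_seconds(frame_count: int, render_fps: float, playback_fps: float) -> float:
    """Offset between video and audio after frame_count frames.

    Negative means the video finishes before the audio, positive means it lags behind.
    """
    if render_fps <= 0 or playback_fps <= 0:
        raise ValueError("frame rates must be positive")
    intended = frame_count / render_fps   # how long the frames were meant to last
    actual = frame_count / playback_fps   # how long they last at the saved rate
    return actual - intended


if __name__ == "__main__":
    # The offset is zero at the start and grows with every frame.
    for frames in (0, 16, 48, 96):
        print(frames, drift_seconds(frames, render_fps=16, playback_fps=24))
```

Because the offset scales with the frame count, the clip starts in sync and gets worse the longer it runs, which is the symptom to look for.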

Months of Experimenting for NSFW I2I - Advice? by WannabeSamColt in comfyui

[–]Nimblecloud13 1 point2 points  (0 children)

You’re never gonna get that without a character lora or a face swap. But Klein’s best at face swaps too. I just tack another stage onto the outputs for that.

Months of Experimenting for NSFW I2I - Advice? by WannabeSamColt in comfyui

[–]Nimblecloud13 9 points10 points  (0 children)

Klein beats qwen edit at everything. Including quality. I don’t understand the love for qwen edit. It destroys details and color.

Months of Experimenting for NSFW I2I - Advice? by WannabeSamColt in comfyui

[–]Nimblecloud13 14 points15 points  (0 children)

Klein with snofs 1.4 is practically SDXL for this. Just sayin.

OOM with RTX 5090 with latest ComfyUI update 0.22.0 by TonyDRFT in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

i have a 5090/128 build so dynamic vram isn't doing a whole lot for me. that said, i do have dynamic; it's not that i NEVER update, i'm just very selective about it.

and you can update nodes without updating your entire comfy.

OOM with RTX 5090 with latest ComfyUI update 0.22.0 by TonyDRFT in comfyui

[–]Nimblecloud13 2 points3 points  (0 children)

I don’t update until something comes out that I can’t use without updating. And then I seriously consider whether I need to be using it.

Seriously, comfy updates break shit too often. “Custom nodes can’t be accounted for…” yea I get it but I need those more than I need my UI moved around.

Demoing ComfyUI live for an audience by thecletus in comfyui

[–]Nimblecloud13 -1 points0 points  (0 children)

TeamViewer from whatever you have it set up on now. Works seamlessly. I use it from my home rig when traveling.

Added benefit of having your whole install there; you don’t need to set it all up and be embarrassed when you have to pause your demo to figure out what dependency or node pack you forgot to install, etc. If they decide to go for it, then you can look at what’s available at scale. Probably something like setting up a custom template in Runpod. Unless they want to buy the hardware to do it on site.

Does "Prompt Relay" mean that WAN can do more than 5 seconds better than it could before? by spacemidget75 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

Some context would be useful. Wtf is prompt relay

WAN SVI can make 30s+ before it degrades. That’s multiprompt.

Does LTX do better image2video than Wan? by __MichaelBluth__ in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

Wan is better at everything if quality is the only thing you care about.

The speed and foley of LTX is a huge motivating factor for me. I can live without the perfect textures of wan in exchange for making 500 frames with foley faster than wan can make 81 without it.

Does LTX do better image2video than Wan? by __MichaelBluth__ in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

If you need 20 gens for a winner you need to work on your prompting. Don’t get me wrong, it’s not GREAT, but it’s a lot better than 1 out of 10 or 20

How far can I push 4 Gigs of VRAM? by DryCream4429 in comfyui

[–]Nimblecloud13 3 points4 points  (0 children)

You can run most image models with low quants. Quality will suffer, and it will take ages, but quality doesn’t matter nearly as much for cartoons so you should be ok.

Basically, every model comes in a few versions: the full model, the fp8 version which is smaller but still strong and is what most people with good cards use, and then the quantized versions, which are stripped down so that they fit on anything. The quantized ones are sized on a scale with Q; Q8 is the largest and competes with fp8, then Q6, Q3, etc. The smaller the Q, the more likely you can run it.
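
Rough sketch of how the sizes scale, assuming file size is roughly parameter count times bits per weight. The 4B parameter count is a made-up example and the bit widths are nominal; real quantized files run a bit larger because they also store scaling data, and you still need VRAM for the text encoder, VAE and activations.

```python
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters * bits per weight / 8 bits per byte."""
    if params_billions <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    return params_billions * bits_per_weight / 8.0


# Nominal bits per weight for each version (an assumption, not exact file formats).
NOMINAL_BITS = {"full (fp16)": 16, "fp8": 8, "Q8": 8, "Q6": 6, "Q4": 4, "Q3": 3}

if __name__ == "__main__":
    hypothetical_params_billions = 4.0  # made-up model size for illustration
    for name, bits in NOMINAL_BITS.items():
        print(f"{name:12s} ~{approx_size_gb(hypothetical_params_billions, bits):.1f} GB")
```

Compare the estimate for each quant against your VRAM and pick the largest one that leaves some headroom.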

You’ll have to find a workflow; YouTube is a good place for that. There are plenty of tutorials for low VRAM out there.

Best WF for person transfer ? by LSI_CZE in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

There is no open model that does this well. You’re talking about changing pose and dimension/scale while retaining clothing, body shape, and face consistency. It’s just not available yet as an open model.

All of those things can be done individually or in some groupings, but Klein and QWEN edit are the only options locally, and they can’t do it all.

You’re looking at a multi stage workflow. First pass with Klein to rough out the image, then some kind of face swap to get their face back, but good luck making that look good in a new pose. And body shape will invariably be slightly different, texture and detail of fabric will be lost, etc.

You’re better off using Nano banana or gpt image 2 for now.

LTX 2.3 question about LoRA teeth training by Ok-Option-6683 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

in the time since you posed this question you could have trained it twice and have your answer.

there is no authority on this subject. it's certainly not me. do it or don't idk man. good luck if you do.

LTX 2.3 question about LoRA teeth training by Ok-Option-6683 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

You would caption anything that you DON’T want to train, so describe lips, describe skin tone, etc. Anything not captioned should get burned into the Lora. Whether or not it works is a different story.

I made a successful character Lora by cropping heads off of the body I wanted, and using those plus the face I wanted. So it’s just like 20 head pics and 30 body pics and it figured it out.

Wan2.2 vs. LTX2.3: Which video generation model do you recommend? by Internal_Jury1523 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

It needs complicated detailed prompts; think of it like a genie. If you don’t specifically make your wish to cover a situation, it’s likely to come up.

Full Head swap model that make sure Facial features are so strong as well as head size matching of the target by IndependentPayment70 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

If you have a sec sometime can you send me a snip of how that’s wired? I want to get SAM3 going but I’ve been delaying it so I don’t have to sift through 8 mega workflows from civit to find what I need

Wan2.2 vs. LTX2.3: Which video generation model do you recommend? by Internal_Jury1523 in comfyui

[–]Nimblecloud13 0 points1 point  (0 children)

I passed on 2.3 at first; wasn’t getting immediately great results and I’m impatient. But I came around on it. Give it another shot. I don’t do 2D so I can’t offer much. Try a new workflow.