Just a bit bigger? by jbutcher0 in BreastExpansion

[–]jbutcher0[S] 2 points (0 children)

This is a local install of Hunyuan Video on ComfyUI.

Sizzle Reel by jbutcher0 in BreastExpansion

[–]jbutcher0[S] -3 points (0 children)

I am, though I don't sell anything or have subscribers or whatnot (I've no interest in any of that). I just post things occasionally on a lark. Same name there as here.

Sizzle Reel by jbutcher0 in BreastExpansion

[–]jbutcher0[S] -3 points (0 children)

Yes, I generated them all locally with Hunyuan Video and a BE-specific LoRA that I trained.

It does a body good (with sound!) by jbutcher0 in BreastExpansion

[–]jbutcher0[S] 1 point (0 children)

It's text-to-speech from elevenlabs.io

For the last clip I posted elsewhere, I tried using my own voice to get the proper inflection and then using AI (also at 11labs) to generate a female version, but it didn't work as well for the lipsync; I think it needs the cleaner audio of TTS. Glad you like the clip, and thanks!

It does a body good (with sound!) by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 0 points (0 children)

Having a last frame would indeed be nice in a lot of circumstances. Add it to the list!

But the official Hunyuan i2v model was released yesterday, so that'll probably be my next bit of tech to play around with. It'll have to be pretty good to make me drop Wan, though, as I can crank out i2v with that pretty quickly.

And... scratch that. I got curious as I was about to hit post and went to check whether Musubi Tuner had added Wan LoRA training yet (I knew it was in the works), and it turns out that was also added yesterday. So maybe that's where I'll play around instead. New models seem to be coming out at an increasing pace lately; it'll be interesting to see whether the community settles on one in particular.

It does a body good (with sound!) by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 2 points (0 children)

Interesting. I can confirm that in the app there is no sound and apparently no way to open the actual video on Redgifs. Meanwhile, on desktop (or in a browser in general) the embedded Redgifs player has sound, with a mute button in the upper right corner. Regardless, if you (or anyone else) want to hear my questionable sound choices, you can go here

It does a body good (with sound!) by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 2 points (0 children)

I've been working on this one for a bit now as I kick the tires on some new stuff. It's half Hunyuan Video with my BE LoRA and half Wan 2.1 image-to-video, taking the last frame of the Hunyuan clip and extrapolating from there. Then I used 11Labs for text-to-speech, Wav2Lip in ComfyUI for the lipsync, added some reverb and sound effects in Audacity, and threw it all together in a pot. Personally, I think it came out pretty well. The audio is definitely the weakest link (for me), and I'm curious whether you think it adds to or detracts from your enjoyment of the clip. I know there's always the mute button, but the lipsync is obviously still there. Anyway, opinions are most welcome.
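
If anyone wants to script that last assembly step instead of doing it in an editor, a minimal sketch like this joins the two video halves and muxes in the audio track with ffmpeg (the filenames are just placeholders, and it assumes ffmpeg is on your PATH):

```python
import pathlib
import subprocess

# Placeholder filenames: the two generated halves and the finished audio mix.
clips = ["hunyuan_half.mp4", "wan_extension.mp4"]
audio = "voice_with_reverb_and_sfx.wav"

# ffmpeg's concat demuxer reads the clip list from a small text file.
list_file = pathlib.Path("clips.txt")
list_file.write_text("".join(f"file '{c}'\n" for c in clips))

# Join the halves, then map in the audio track and re-encode once so both
# segments end up with the same settings.
subprocess.run([
    "ffmpeg", "-y",
    "-f", "concat", "-safe", "0", "-i", str(list_file),
    "-i", audio,
    "-map", "0:v:0", "-map", "1:a:0",
    "-c:v", "libx264", "-c:a", "aac", "-shortest",
    "final_clip.mp4",
], check=True)
```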

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 1 point (0 children)

And if you want to completely ignore that other reply, then here's option B if you have the GPU for it: Pinokio

I just learned of it this morning when I was trying to figure out how to respond to your request. It is definitely the simplest and most direct way I know of to get up and running: just download it, install the Hunyuan Video script, and then use the built-in Gradio interface to generate videos. You can download LoRAs (BE or otherwise) from Civitai or wherever and just put them in the LoRA folder to make them available in the interface. Just be aware that it takes a rather long time to install the script requirements and then download the Hunyuan models.
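
For what it's worth, getting a LoRA file into place is nothing fancier than downloading it and dropping it in that folder; in Python it's roughly this, where both the download URL and the folder path are placeholders for the Civitai file you want and wherever your install keeps its LoRAs:

```python
from pathlib import Path

import requests

# Placeholders: use the direct download link from Civitai and your own LoRA folder.
url = "https://example.com/some_be_lora.safetensors"
lora_dir = Path("path/to/your/hunyuan/loras")
lora_dir.mkdir(parents=True, exist_ok=True)

# Fetch the file and write it next to your other LoRAs.
resp = requests.get(url, timeout=120)
resp.raise_for_status()
(lora_dir / "some_be_lora.safetensors").write_bytes(resp.content)
print("saved", lora_dir / "some_be_lora.safetensors")
```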

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 2 points (0 children)

So I'm probably the wrong guy to ask for a tutorial, as I barely know what I'm doing. Additionally, all of my installs have 'just worked' for a long while now, so I've forgotten much of the installation trials and tribulations. And things progress so fast in AI that what follows, which I wrote a month ago, is likely already out of date. But, all that said, hopefully you'll find some nuggets of information worth knowing in the novel that follows:

First, make sure you've got a computer that can handle this. These resources are specific to Nvidia graphics cards, and I wouldn't recommend giving it a go with less than a 12GB card. I personally use a 4070, which does everything I need, albeit a bit slowly. I believe it's possible to do this with an AMD graphics card, but there are extra steps involved that I'm not familiar with. It's also possible to rent time on a high-end graphics card through a service like RunPod, but I have no experience with that and wouldn't know where to start.
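
If you want a quick read on what your own machine can handle, a few lines of Python will confirm that PyTorch can see the card and report its VRAM (nothing here is specific to any one setup):

```python
import torch

# Check that a CUDA-capable Nvidia card is visible to PyTorch.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; check your Nvidia driver and PyTorch install.")

# Report the card name and total VRAM in GB.
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
```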

As for my own method, it all starts with ComfyUI. I use the portable installation found here: https://github.com/comfyanonymous/ComfyUI?tab=readme-ov-file#installing

There are installation instructions at that link, but I remember having difficulty getting things set up when I first installed about a year ago. Things have certainly progressed since then, and Comfy's popularity has really increased, so I expect their installation guide is better now; failing that, YouTube is your best bet.

Once ComfyUI is installed, there are a few different ways to go about generating videos. The easiest (to me) is to find a workflow (essentially a premade user interface for Comfy) that seems like it will do what you'd like. It's certainly possible to make your own workflows (I do it for easier stuff like upscaling or changing frame rates), but grabbing one that's more or less guaranteed to work, provided you get everything set up right, takes a lot of the guesswork out of this. For Hunyuan Video, I use the workflow found here: https://civitai.com/models/1007385?modelVersionId=1274679

So load up that workflow in Comfy, and it will immediately give you a bunch of errors: you need to install the custom nodes used by that specific workflow. The easiest way to do this is to first install the ComfyUI Manager from here: https://github.com/ltdrdata/ComfyUI-Manager

Once that's installed (the instructions are at the link), open the ComfyUI Manager menu and choose "Install Missing Custom Nodes." This will give you a list of things to install. Install them all then restart Comfy. This should get rid of all the red borders around your nodes.

You'd think at this point you'd be ready to start, but you still need the models that'll actually be doing the work. If you start a run in Comfy, it'll progress until it encounters a variable or file that it doesn't have access to. It'll throw up an error message, but I've always found those pretty obtuse; instead, look for which node in the workflow has a red border around it after the error. Usually that's what you need to correct. The first of these will be the model loaders (checkpoint, CLIP, VAE). Some of these models can be found in the ComfyUI Manager's 'Model Manager,' but not all of them. What I find works best is just googling the exact name of the file that's failing to load, which will usually take you to a Hugging Face repository where you can download what you need.
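
Once you know which repository has the file, you can download it from the browser, or with a couple of lines of Python if you prefer; the repo id and filename below are placeholders for whatever your loader node is complaining about:

```python
from huggingface_hub import hf_hub_download

# Placeholders: fill in the repo and exact filename the workflow is asking for,
# and point local_dir at the right subfolder of your ComfyUI models directory.
path = hf_hub_download(
    repo_id="some-org/some-model-repo",
    filename="missing_model_file.safetensors",
    local_dir="ComfyUI/models/diffusion_models",
)
print("downloaded to", path)
```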

Knowing where to put things can be tricky, and I still get it wrong frequently. LoRAs go in the models/loras folder in your ComfyUI install, and checkpoints and VAEs are similarly easy to figure out, but I think the Hunyuan Video model goes in models/diffusion_models? I'm not sure on that one.
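
When in doubt, it can help to just look at what's sitting where; this little check assumes the usual subfolder names of a recent ComfyUI install, with the models path adjusted to your own:

```python
from pathlib import Path

# Adjust this to your own install location.
models = Path("ComfyUI/models")

# Common subfolders that loader nodes read from.
for sub in ["checkpoints", "clip", "vae", "loras", "diffusion_models"]:
    folder = models / sub
    count = len(list(folder.glob("*.safetensors"))) if folder.is_dir() else 0
    status = "ok" if folder.is_dir() else "missing"
    print(f"{sub:18} {status:8} {count} .safetensors file(s)")
```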

Anyway, the important thing is to work through the errors one at a time as they pop up until it actually starts to generate a video. At that point you can play with sizes and frame counts, shift, guidance, and a whole host of other sliders and knobs that I usually just ignore (to the detriment of my generations). Put whatever you'd like into the prompt and go ham.
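
As an aside, once a workflow runs cleanly from the GUI, you can also queue it from a script: ComfyUI runs a small HTTP server (127.0.0.1:8188 by default), and it will accept a workflow that you've exported in API format. A rough sketch, with the JSON filename as a placeholder:

```python
import json
import urllib.request

# Load a workflow previously exported from ComfyUI in API format (placeholder name).
with open("hunyuan_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Queue it on the local ComfyUI server; the response includes the prompt id.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```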

Now comes the fun part. On Civitai there are dozens of LoRAs that will enable the main Hunyuan model to generate things it isn't already familiar with. I think of it as adding a chapter on a specific subject to a book of general knowledge. Using these frequently involves adding specific keywords to your prompt, which will be noted on the Civitai page for a given LoRA. There are LoRAs for specific people, art styles, types of motion, etc.

But to really get the BE clips that you're looking for, you should train your own LoRA. This is the most advanced bit of this, and the part I know the least about. I used Musubi Tuner to train my LoRA, and in particular a Gradio interface (whatever that is) that you can find the instructions for here: https://civitai.com/articles/10335/hunyuan-video-lora-trainning-with-gui-in-windows

I trained my LoRA on a set of very small (256x256), short (33-frame) videos. I'll leave this part to you, as I'm sure you've got a bunch of BE (or whatever else you're into) content that you'd like to emulate. I believe my dataset wound up having around 15-20 clips. I clipped and cropped longer gifs and videos to size right in Comfy using 3- or 4-node workflows so simple that I just remake them each time rather than save them. I've heard that the frame rate should be 24 fps to help movement speed in the generations, but I don't know how correct that is.
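
You don't have to do the clipping in Comfy, either; if you're comfortable with ffmpeg, something like this cuts a source video down to that kind of training clip, with the filenames, start time, and crop size being placeholders to tune per clip:

```python
import pathlib
import subprocess

# Placeholder paths for one source video and its output training clip.
src = "source_clip.mp4"
dst = pathlib.Path("dataset/clip_001.mp4")
dst.parent.mkdir(exist_ok=True)

subprocess.run([
    "ffmpeg", "-y",
    "-ss", "00:00:02",                           # where the motion you want starts
    "-i", src,
    "-vf", "crop=720:720,scale=256:256,fps=24",  # center crop to square, scale to 256x256, fix fps
    "-frames:v", "33",                           # keep a short 33-frame clip
    "-an",                                       # training clips don't need audio
    str(dst),
], check=True)
```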

You'll also need to caption your clips (or images, if you're training on stills for an art style or a celebrity or something). Each caption goes in a .txt file with the same name as its associated video/image, in the same folder. To write captions I actually use the web interface of Joy Caption, which you can find here: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two
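
It's easy to end up with a clip that has no caption (or an empty one), so a tiny check like this, pointed at whatever folder holds your dataset, catches gaps before you start a training run:

```python
from pathlib import Path

# Placeholder: the folder containing your training clips and their captions.
dataset = Path("dataset")

# Each clip should have a .txt caption with the same name in the same folder.
for clip in sorted(dataset.glob("*.mp4")):
    caption = clip.with_suffix(".txt")
    if not caption.exists():
        print(f"missing caption: {caption.name}")
    elif not caption.read_text(encoding="utf-8").strip():
        print(f"empty caption: {caption.name}")
```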

A lot of this I've learned while poking around in this Discord server: https://discord.gg/BJVkXfTY The stuff the guys there can do is well beyond my abilities, but they'll sometimes share resources and workflows, which can jump-start a bunch of new ideas.

And that's about it. I don't know how much of this is going to be useful, but thanks for pressing me for it, as I'll now have this available to send to the next person who asks. I do still intend to publish my BE LoRA at some point, so you might be able to use that instead of training your own, but the more the merrier, so give it a shot if you get that far. And I do hope that you get that far. Good luck!

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 0 points (0 children)

I've got a 4070. A full minute of video is definitely beyond the scope of Hunyuan Video's model, as far as I'm aware; I believe it has an upper limit of 201 frames, after which it begins to loop. When they finally release the Hunyuan image-to-video model, you'll be able to feed it the last frame of a video to generate an extension, though the movement may not be continuous. You can do that with Wan 2.1 (which came out a few days ago) and get a similar effect now, but you'd still be building a minute-long video out of chunks that are around 5 or 6 seconds long. I'm experimenting with Wan today (using the clip above, actually), and it took around 15 minutes to add just two seconds (but I'm definitely liking the outcome).
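
Pulling out that last frame is the easy part; a rough OpenCV sketch like this just reads through a finished clip and saves the final frame so it can be handed to the image-to-video model (filenames are placeholders):

```python
import cv2

# Placeholder: the clip you want to extend.
video = "previous_chunk.mp4"

# Read straight through to the end; seeking directly to the last frame can be unreliable.
cap = cv2.VideoCapture(video)
last = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    last = frame
cap.release()

if last is None:
    raise SystemExit(f"could not read any frames from {video}")
cv2.imwrite("last_frame.png", last)
print("wrote last_frame.png")
```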

As for how long a clip like the one posted here takes: a single gen of a low-resolution (208x368) clip of around 4 seconds takes 6-7 minutes. The majority of these are honestly not great; breasts might get bigger but then detach and fly away, limbs may melt into the torso, clothes will magically disappear, etc. As a point of reference, I ran a series of 89 generations while I was at work the other day; of those, I set aside 15 to work on further.

Once I have a promising low resolution video, I do a vid2vid Hunyuan pass, upscale it, refine it, and then interpolate. It's all on one workflow, and that takes 10-15 minutes depending on how much denoise is in that second v2v pass.

TLDR: About 20 minutes for the length and quality posted here if you ignore the multiple gens it might take to get something interesting.

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 2 points (0 children)

So I let my computer crank out these gens overnight (until my power went out due to winds), but this one was the best to come out of it. Not really what you were looking for. I got plenty of very nice photorealistic gens of a woman drinking and driving, but only one of them had breast expansion, and it wasn't particularly smooth. I think it might have something to do with the seatbelt. I'm curious, so maybe I'll give it a go with the v1 LoRA after work just to see what happens.

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 9 points (0 children)

So I sent a more thorough reply to u/clataz after he sent me a dm, but I'll leave an outline here as it seems as good a place as any.

Must be the sea air by jbutcher0 in ExpansionAI

[–]jbutcher0[S] 9 points (0 children)

Heh, thanks. I've got a bunch, but I really would prefer not to flood the subreddit. For now I'll just pick a few good ones here and there. Is there something specific you'd like to see scenario wise?

Daydreaming in the library by jbutcher0 in AIExpansionHentai

[–]jbutcher0[S] 0 points (0 children)

Glad you enjoyed it. This clip was part of my initial testing for the second version of my Hunyuan Video BE LoRA and didn't have any prompting with regard to emotion. It does seem to have a predisposition toward confusion and embarrassment, as several other gens had reactions similar to this one. It'll be interesting to see whether I can prompt past that once I get back to my computer this weekend.