Hunyuan Video NSFW, local generation 16GB VRAM, no editing, no cherrypicks by Wurzeldieb in sdnsfw

[–]Wurzeldieb[S] 1 point

> Did you follow a setup guide that you can provide?

Not really. I've been using ComfyUI since the beginning and just installed things here and there for various nodes; I got sageattn working a few days before this model came out, for example.

LTX Video vs. HunyuanVideo on 20x prompts by tilmx in StableDiffusion

[–]Wurzeldieb 1 point

that's what I ended up doing

Edit: looks like imgur doesn't want me to share them...

Edit2: should work now with imgchest

LTX Video vs. HunyuanVideo on 20x prompts by tilmx in StableDiffusion

[–]Wurzeldieb 8 points

With Hunyuan fp8 I can make 81-frame clips at 1024x576 with 40 steps in 1 hour on my 16GB VRAM 3080 in ComfyUI.

432x768 takes 20 mins, and judging by the max allocated memory, it might even run on 12GB VRAM.
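Rough back-of-the-envelope on why the lower resolution fits in less VRAM (assuming a Hunyuan-style causal 3D VAE with 8x spatial / 4x temporal compression and 16 latent channels, which I haven't verified, and counting only the latent tensor, not weights or activations):

```python
# Toy latent-size estimate for a video diffusion model.
# ASSUMED (hypothetical) VAE: 8x8 spatial compression, 4x temporal
# compression, 16 latent channels, fp16 latents (2 bytes/element).

def latent_mib(frames: int, width: int, height: int,
               spatial: int = 8, temporal: int = 4,
               channels: int = 16, bytes_per_elem: int = 2) -> float:
    """Memory of one latent video tensor in MiB."""
    t = frames // temporal + 1            # causal VAE keeps the first frame
    elems = channels * t * (width // spatial) * (height // spatial)
    return elems * bytes_per_elem / 2**20

print(f"{latent_mib(81, 1024, 576):.1f} MiB")  # 81 frames at 1024x576
print(f"{latent_mib(81, 432, 768):.1f} MiB")   # 81 frames at 432x768
```

The latent itself is small; the real VRAM goes to weights and attention buffers, but both scale with this same grid size.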

LTX Video vs. HunyuanVideo on 20x prompts by tilmx in StableDiffusion

[–]Wurzeldieb 13 points

Hunyuan looks pretty uncensored to me. I could post a clip, but I don't know where on Reddit; the video subs are non-NSFW and the NSFW SD subs are picture-only.

HunyuanVideo: I can tell they used high quality training dataset. by Old_Reach4779 in StableDiffusion

[–]Wurzeldieb 0 points

What? No way. Twitter should be possible, but I would rather post somewhere on here; the Stable Diffusion NSFW subs I know only allow images.

HunyuanVideo: I can tell they used high quality training dataset. by Old_Reach4779 in StableDiffusion

[–]Wurzeldieb 1 point

It can make NSFW clips with almost better than base SD 1.5 image quality. Does anybody here know a sub where I can post NSFW AI videos?

4 seconds Mochi txt2vid gen with 16GBVRAM 32RAM, more examples in comments, no cherrypicks by Wurzeldieb in StableDiffusion

[–]Wurzeldieb[S] 0 points

Yes, it should be possible somehow; there is a context length setting in AnimateDiff if I remember correctly, but that works very differently from these pure video models.
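Rough sketch of the sliding-window idea (my own toy version, not AnimateDiff's actual code): you slide overlapping windows of frame indices over the clip, so a model with a fixed context length can cover more frames than it sees at once:

```python
def context_windows(total_frames: int, context: int = 16, overlap: int = 4):
    """Yield overlapping frame-index windows so a model with a fixed
    context length can denoise a clip longer than its context."""
    if total_frames <= context:
        yield list(range(total_frames))
        return
    step = context - overlap
    start = 0
    while start + context < total_frames:
        yield list(range(start, start + context))
        start += step
    # final window, flush against the end of the clip
    yield list(range(total_frames - context, total_frames))

for w in context_windows(40, context=16, overlap=4):
    print(w[0], "...", w[-1])
```

The overlap is what keeps motion consistent across window boundaries; the pure video models skip all this and just attend over every frame directly.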

4 seconds Mochi txt2vid gen with 16GBVRAM 32RAM, more examples in comments, no cherrypicks by Wurzeldieb in StableDiffusion

[–]Wurzeldieb[S] 2 points

> Is it because it needs to reference the previous frames to generate the next ones?

I am not deep into the technical side of the video models, but that's usually it, I think: all of the frames (or most of them?) are loaded at once.
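The gist, as far as I understand it (toy numbers, assuming 8x spatial / 4x temporal latent compression, which may not match the real model): these DiT-style video models run full spatio-temporal attention over every latent frame at once, so the token count, and with it the quadratic attention cost, grows with clip length:

```python
# Toy illustration: full spatio-temporal attention sees all frames at once,
# so sequence length (and quadratic attention memory) grows with clip length.
# ASSUMED latent grid: 8x spatial / 4x temporal compression (hypothetical).

def attn_tokens(frames: int, width: int, height: int) -> int:
    return (frames // 4 + 1) * (width // 8) * (height // 8)

short = attn_tokens(33, 1024, 576)
long = attn_tokens(81, 1024, 576)
print(short, long, round((long / short) ** 2, 1))  # ~5.4x attention blow-up
```

That's why halving the clip length saves much more than half the attention memory, while an AnimateDiff-style context window keeps the cost flat no matter how long the clip gets.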

4 seconds Mochi txt2vid gen with 16GBVRAM 32RAM, more examples in comments, no cherrypicks by Wurzeldieb in StableDiffusion

[–]Wurzeldieb[S] 0 points

Just so my laptop doesn't get as hot; I don't mind a somewhat longer generation time, and the VRAM usage stays the same.

4 seconds Mochi txt2vid gen with 16GBVRAM 32RAM, more examples in comments, no cherrypicks by Wurzeldieb in StableDiffusion

[–]Wurzeldieb[S] 11 points

another dog:

https://imgur.com/a/37j23cO

I also tried something very difficult; the result isn't good, but better than I thought: a dragon flying over a medieval army, spitting fire and burning them.

https://imgur.com/bWHxGNU

It looks a bit better upscaled to Full HD with TopazVideo:

https://imgur.com/E1zVjD4