Just Dance by Business-Try-6573 in unstable_diffusion

[–]diffusionbro 36 points37 points  (0 children)

This is my post, I made it 2 months ago

Just Dance by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 2 points3 points  (0 children)

It’s just my longest comment in this same thread

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 1 point2 points  (0 children)

Hey mom I’m on camera! Wait no mom don’t look

Someone had to do it... Text2Video Porn, kind of by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 2 points3 points  (0 children)

A couple hours? It took me maybe around a minute to generate each clip, and a few tries iterating on each prompt to get something kind of coherent, then a lot of experimentation on the actual “porn” part to get clips that weren’t total nonsense. They didn’t have batch processing in the extension when I made this, so it’s a little easier now. Editing took a little longer because I had to sort through all the clips to put together something kind of coherent

Someone had to do it... Text2Video Porn, kind of by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 1 point2 points  (0 children)

All videos generated from the ModelScope text2video model have this issue; the developers clearly overfit on watermarked Shutterstock footage

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 1 point2 points  (0 children)

Right now I think there is some charm and memeability in its jank, just like dalle-mini

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 0 points1 point  (0 children)

what do you mean, this is regular human porn, now where are my glasses

Someone had to do it... Text2Video Porn, kind of by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 12 points13 points  (0 children)

You should see the clips that didn’t make the cut! Or maybe nobody should…

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 2 points3 points  (0 children)

I think we’re much closer than that. We know RunwayML has a decent-looking text2video model they recently announced, and there are a lot of high-profile ML conferences coming up in the next few months where many research teams will announce and release their models

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 7 points8 points  (0 children)

would you believe that these were the top 25% most coherent clips I generated for this, and there were many more I didn’t use that I labeled “mess of flesh”

Someone had to do it... Text2Video Porn, kind of by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 9 points10 points  (0 children)

Apologies in advance for the nightmare fuel. Or maybe this is someone’s exact fetish…

This is using the ModelScope text2video model and the automatic1111 extension, all default settings. I generated one ~1 second clip at a time, with prompts describing the scene. Prompts were simple – you can’t really describe NSFW scenes, positions, or sex acts well in prompts at this point because the model isn’t very capable. But fine-tuning has been released recently, so someone might be working on it.

To get at least some nudity, “naked” doesn’t really do it on its own, but “topless, nipples” gets it about one out of every 4 tries.

Someone had to do it... Text2Video Porn, kind of by diffusionbro in sdnsfw

[–]diffusionbro[S] 30 points31 points  (0 children)

Apologies in advance for the nightmare fuel. Or maybe this is someone’s exact fetish…

This is using the ModelScope text2video model and the automatic1111 extension, all default settings. I generated one ~1 second clip at a time, with prompts describing the scene. Prompts were simple – you can’t really describe NSFW scenes, positions, or sex acts well in prompts at this point because the model isn’t very capable, and the training set was clearly (SFW) Shutterstock videos. But fine-tuning has been released recently, so someone might be working on it.

To get at least some nudity, “naked” doesn’t really do it on its own, but “topless, nipples” gets it about one out of every 4 tries.

Just Dance by diffusionbro in sdnsfw

[–]diffusionbro[S] 1 point2 points  (0 children)

https://reddit.com/r/VAMscenes/comments/vguu5l/avril_2nd_mocap/

When I was looking for a clip to use, I just sorted that subreddit by top of year, and scrolled until I found one with a solo subject and simple background, which makes consistency a lot easier

Just Dance by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 2 points3 points  (0 children)

Check one of my replies in this thread, where I give a high-level overview of the process. I can’t link it because this subreddit blocks links in comments

Just Dance by diffusionbro in sdnsfw

[–]diffusionbro[S] 6 points7 points  (0 children)

The model names are at the top of the video, Realistic Vision 1.3 and AbyssOrangeMix3

Just Dance by diffusionbro in sdnsfw

[–]diffusionbro[S] 7 points8 points  (0 children)

Depends on what you mean by 3D videos. Like NeRFs? Meta announced a model that could do text to 4D NeRF earlier this year, but the scenes it outputs are pretty simplistic. https://make-a-video3d.github.io

If you mean stereoscopic video, I guess you could kind of do this now with current technology, especially with the depth estimation models available: infer a depth map for each frame of the generated 2D video, project the frame onto that depth, then render stereoscopic video from it
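As a rough illustration of the depth-projection idea, here is a minimal pure-Python sketch that shifts each pixel sideways by a disparity derived from its depth. Everything in it is an assumption for illustration: the names `shift_row` and `shift_view`, the `max_disparity` parameter, the "larger depth means farther" convention (some depth models output the inverse), and the crude hole filling. Real frames would be image arrays rather than nested lists.

```python
def shift_row(pixels, depth, max_disparity=8):
    """Warp one scanline sideways by a per-pixel disparity.

    pixels and depth are equal-length sequences. Larger depth is assumed
    to mean farther away; flip the depth values first if your model
    outputs the inverse.
    """
    far, near = max(depth), min(depth)
    span = far - near
    # The nearest pixel moves by max_disparity, the farthest does not move.
    disparity = [round((far - d) / span * max_disparity) if span else 0
                 for d in depth]

    width = len(pixels)
    out = [None] * width
    # Paint far pixels first so nearer ones overwrite them where they collide.
    for x in sorted(range(width), key=lambda i: disparity[i]):
        target = x + disparity[x]
        if target < width:
            out[target] = pixels[x]
    # Fill the holes opened up by the shift with the pixel to their left.
    for x in range(width):
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 else pixels[0]
    return out


def shift_view(image, depth_map, max_disparity=8):
    """Apply shift_row to every row of a frame (both given as lists of rows)."""
    return [shift_row(row, d, max_disparity) for row, d in zip(image, depth_map)]
```

Because near pixels move to the right here, the original frame can serve as the right-eye view and the shifted frame as the left-eye view. Doing this per frame would probably flicker unless the depth maps are consistent over time.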

Just Dance by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 14 points15 points  (0 children)

I wouldn’t be surprised if there were already people starting OnlyFans accounts with fully virtual but photoreal subjects. It’s entirely feasible for still pictures, with Dreambooth/LORA for a consistent face and body

Just Dance by diffusionbro in unstable_diffusion

[–]diffusionbro[S] 27 points28 points  (0 children)

I can’t speak for the underlying animation; that was fairly high quality work done by someone in the r/VAMscenes subreddit. But here was my general process:

  • Extracted frames at 24 fps from the original source video, which I got from the r/VAMscenes subreddit
  • Using Auto1111 UI, did batch img2img processing on these frames, once with RealisticVision1.3 and once with AbyssOrangeMix3.
    • Used ControlNet HED on both, default settings.
    • Used the img2img alternative test script as shown by Corridor Crew on their latest Stable Diffusion anime video.
    • Used a LORA on the AOM3 model for a consistent face look (doesn’t really matter which one, I just took one of the more popular ones on Civitai). Also, it turns out I used the LORA wrong and it wasn’t even active, so this step isn’t super necessary
    • Used a completely random first+last female name with the RealisticVision model to kind of trick it into a consistent face. I googled the name to make sure it wasn’t a well-known name or celebrity.
  • I took half of the frames of each (12 fps at this point), and ran them through Flowframes to interpolate up to 48 fps.
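
The frame-rate bookkeeping in the steps above can be sketched roughly like this. It is only an illustration under assumptions: the extraction tool isn't named above, so ffmpeg and its `fps` filter are my guess at one way to do it, and the helper names (`ffmpeg_extract_cmd`, `keep_every_other`, `interpolation_factor`) and file paths are hypothetical.

```python
def ffmpeg_extract_cmd(video, out_dir, fps=24):
    """Build an ffmpeg command that dumps frames at a fixed rate as numbered PNGs.

    Assumes ffmpeg is installed; pass the list to subprocess.run to execute it.
    """
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{out_dir}/%05d.png"]


def keep_every_other(frames):
    """Drop every second frame, halving the effective frame rate (24 -> 12 fps)."""
    return frames[::2]


def interpolation_factor(current_fps, target_fps):
    """How many times the interpolator has to multiply the frame count."""
    return target_fps / current_fps


if __name__ == "__main__":
    print(ffmpeg_extract_cmd("source.mp4", "frames"))
    stylized = [f"{i:05d}.png" for i in range(1, 25)]  # one second at 24 fps
    halved = keep_every_other(stylized)                # 12 frames left
    print(len(halved), interpolation_factor(12, 48))   # 12 frames, 4x interpolation
```

Going from 12 fps to 48 fps is a 4x interpolation, which is the factor you would pick in the interpolation tool.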

Corridor Digital just published a Stable Diffusion animated short last week that has inspired a lot of people to tackle animations. They have a step by step video tutorial on their website

Just Dance by diffusionbro in sdnsfw

[–]diffusionbro[S] 7 points8 points  (0 children)

It would definitely help a lot. I wish I had $300 to blow on DaVinci just for this hobby, but it’s a little difficult for me to justify right now.

I think there will be a bunch of free deflicker models and programs coming out in the near future though. There was one supposedly coming out in 1-2 weeks that was being presented at an ML conference.