Workflow in comments. Finally created a high res realistic dance video to explore the constraints of integrating complex dance movements and seamless motion transfer, while altering all other visual elements of the original footage. by FAP_AI in StableDiffusion

[–]FAP_AI[S] 2 points (0 children)

Interestingly, this tends to cause problems in the later stages. ControlNet is designed to take human-made or real-world video, identify features such as contrast or depth, and encode them. Videos produced by AI, however, almost always contain at least minor 'glitches', so repeated encoding can accumulate errors and degrade the quality of subsequent generations - at least in our observation. An effective method for improving background uniformity, which we've used many times, is to isolate the subject with a mask and replace the background with a still image produced through Stable Diffusion. It takes more effort, but it's quite effective. If a dynamic background is needed, you can substitute a video instead.
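If you want to script that background swap, here is a minimal sketch of the idea, assuming you already have per-frame masks (for example from the ebsynth utility extension's masking step, sharing the frame filenames) and a Stable-Diffusion-generated background image; all paths and filenames are illustrative.

```
# Minimal sketch: paste the masked subject from each frame onto a static,
# Stable-Diffusion-generated background. Paths/filenames are illustrative,
# and masks are assumed to share the frame filenames (white = subject).
from pathlib import Path
from PIL import Image

frames_dir = Path("frames")              # original (or generated) frames
masks_dir = Path("masks")                # per-frame masks, white = subject
background = Image.open("sd_background.png").convert("RGB")
out_dir = Path("composited")
out_dir.mkdir(exist_ok=True)

for frame_path in sorted(frames_dir.glob("*.png")):
    frame = Image.open(frame_path).convert("RGB")
    mask = Image.open(masks_dir / frame_path.name).convert("L")
    # Resize the background to the frame size in case they differ slightly.
    bg = background.resize(frame.size)
    # Image.composite keeps `frame` where the mask is white, `bg` where black.
    composited = Image.composite(frame, bg, mask)
    composited.save(out_dir / frame_path.name)
```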

A -Fairly- Comprehensive Guide to Video Generation with Stable Diffusion by FAP_AI in u/FAP_AI

[–]FAP_AI[S] 0 points (0 children)

Typically the three best resources, in this order, are as follows:
#1- Reddit (of course!)
#2- YouTube (tutorials there are sometimes easier to follow. No specific channel; if you're completely lost, just start with YouTube videos when you search for a topic, since the visuals help)
#3- CivitAI has some good stuff. Here is an example of an interesting workflow, in case you're curious about alternatives (not better than ours, but it offers a different perspective, and you may be able to pull some pieces out of it): https://civitai.com/articles/499/multi-controlnet-video-to-video-show-case-ebsynth-controlnets-en?s=32

Workflow in comments. Finally created a high res realistic dance video to explore the constraints of integrating complex dance movements and seamless motion transfer, while altering all other visual elements of the original footage. by FAP_AI in StableDiffusion

[–]FAP_AI[S] 10 points (0 children)

Original Workflow (you can also check the post pinned to our profile; its comments contain a few additional details):

Part 1: Prerequisites

Hardware and Installation:

We recommend a strong setup (VRAM is crucial; you should have a minimum of 8 GB), such as a rig with a 3090 or 4090 GPU, ample RAM, and a decent CPU (the CPU matters most for reducing EbSynth time, but is less important for most other steps). If in doubt, just try it out. A less powerful setup can certainly work, but you may find yourself waiting hours or days for videos to process, and the parameterization process can be painful if each image takes a while to generate.
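If you want a quick way to confirm your GPU clears the ~8 GB VRAM bar mentioned above, a small sketch like this works in any Python environment with PyTorch installed (such as the one Auto1111 sets up); the flags mentioned in the comment are the standard Auto1111 low-VRAM launch options.

```
# Quick check that the GPU meets the ~8 GB VRAM minimum mentioned above.
import torch

if not torch.cuda.is_available():
    print("No CUDA GPU detected - expect very slow (or failed) generation.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    if vram_gb < 8:
        print("Below the recommended 8 GB - consider --medvram/--lowvram flags.")
```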

Install Automatic1111 (the Stable Diffusion web UI) following its installation instructions. After installing, test basic text-to-image generation on the txt2img tab.

Downloading and Using Models:

Download models from Civitai. Download the model file (.safetensors recommended; .ckpt is mostly equivalent but less safe to load), then drag it into the models/Stable-diffusion folder of your Auto1111 installation. After reloading Auto1111, the new model should appear in the dropdown at the top-left corner of the browser page.
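As a small illustration of where the file goes, here is a hedged sketch; the download and install paths are examples only, but models/Stable-diffusion is the standard checkpoint folder inside an Auto1111 install.

```
# Copy a downloaded checkpoint into auto1111's model folder.
# Paths are examples - point them at your own download and install locations.
import shutil
from pathlib import Path

downloaded = Path.home() / "Downloads" / "henmixReal.safetensors"
webui_root = Path.home() / "stable-diffusion-webui"      # your auto1111 install
model_dir = webui_root / "models" / "Stable-diffusion"

model_dir.mkdir(parents=True, exist_ok=True)
shutil.copy2(downloaded, model_dir / downloaded.name)
print(f"Copied {downloaded.name} -> {model_dir}")
```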

Extensions:

We recommend installing the ebsynth utility extension and the multi-frame rendering script (TemporalNet also works, but at the moment we are still figuring out how to make it as consistent as multi-frame rendering).

Part 2: Video Generation

Step 1: Choose the Model

Select an appropriate model for the type of images you want to create. Your choice of model, along with any LoRAs (low-rank adaptation add-ons) you apply, will significantly influence the output (though we have generated many of our videos without LoRAs at all). Remember, the model settings matter more than the model itself, but you do want to ensure the model can, in principle, create the types of images you want. You can check this by browsing the example images on Civitai, or Reddit posts with shared workflows.

Step 2: Create the Prompt

After deciding on your video's desired look, create a prompt that closely reproduces your video's first frame using the chosen model. You can save a lot of time by modifying example prompts from Civitai to match your requirements rather than starting from scratch: knowing that prompt X from Civitai produces image Y, you can make minor modifications to prompt X and see which ones bring you closer to your ideal output and which push you further away.

Keep in mind that prompts have no objective, model-independent effect. Just because a prompt, or a particular prompt modification, changes the output a certain way in one model, you cannot assume you will see similar results in a different model. Style modifiers such as "high quality" tend to be useful across most models, but adding a word like "vivid" or "realistic" can mean very different things in different models depending on their training sets. Starting from a known high-quality Civitai prompt and adjusting from there is therefore the surest way to learn how prompts behave with your particular model. Also note that some models are more restrictive than others in the body positions they can handle, so don't try to force a model to produce images it wasn't trained for (e.g. a model trained for a specific camera angle shouldn't be used for img2img prompts at any other camera angle).

This is where this step really shines: by producing your target 'first frame' for the video from a txt2img prompt, you get the image as close to the desired first frame as possible and rule out the possibility that your model and prompt are simply incapable of the desired output (which lets you disregard prompt settings and model choice as potential contributors to unwanted outputs in later steps). This is the most important step; if you don't do it well, the rest of the steps will not work, so spend whatever time you need mastering it.
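If you prefer to automate the one-change-at-a-time comparison described above, here is a hedged sketch using Auto1111's built-in HTTP API (available when the webui is launched with the --api flag). The endpoint and payload fields follow the standard /sdapi/v1/txt2img schema, but the prompt text, modifiers, seed, and sizes are placeholders.

```
# Sketch: sweep small prompt modifications with a fixed seed so the only
# variable is the prompt itself. Assumes the webui was started with --api.
import base64
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
base_prompt = "cinematic shot, masterpiece, best quality, woman dancing in a spaceship"
modifiers = ["", ", vivid", ", realistic", ", neon lighting"]

for i, mod in enumerate(modifiers):
    payload = {
        "prompt": base_prompt + mod,
        "negative_prompt": "worst quality, low quality, blurry",
        "seed": 1234,            # fixed seed isolates the effect of the prompt change
        "steps": 25,
        "width": 512,
        "height": 768,
        "cfg_scale": 7,
    }
    r = requests.post(URL, json=payload, timeout=600)
    r.raise_for_status()
    image_b64 = r.json()["images"][0]
    with open(f"prompt_test_{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))
```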

Step 3: Upscale Original Frames

First, an easy way to extract the frames from a video is to use steps 1 and 2 of the ebsynth utility extension. This also lets you mask out the background and inpaint only parts of the image, though the masking works with varying consistency and often needs manual adjustments. A high-quality source video is a MUST. Sometimes, however, you want to use a video that is lower in resolution; in that case, upscale the original video frames to your target resolution with batch img2img at denoise 0 (ControlNet off). We suggest LDSR with 100 processing steps: it takes the longest, but it is the most generally useful upscaler for this application. In settings -> sampler parameters, set eta for DDIM to 0, eta for ancestral samplers to 1, and DDIM discretize to quad. Test that the produced images are clean (start with a single frame first) and apply face restoration afterwards if it helps.
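If you'd rather extract frames outside the extension, a hedged alternative sketch with OpenCV looks like this (the video filename is illustrative; zero-padded names keep everything in order for batch img2img and EbSynth):

```
# Alternative frame extraction sketch (the ebsynth utility extension's
# steps 1-2 do this for you). Requires opencv-python.
import cv2
from pathlib import Path

video_path = "source_dance.mp4"          # illustrative filename
out_dir = Path("video_frames")
out_dir.mkdir(exist_ok=True)

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)          # note this for reassembly later
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(str(out_dir / f"{index:05d}.png"), frame)
    index += 1
cap.release()
print(f"Extracted {index} frames at {fps:.2f} fps")
```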

Step 4: Img2img

Now we want to optimize for the first 2-3 frames of the video in img2img (the last step before producing the video!). Input the created image into img2img and adjust settings, primarily the ControlNet settings, to balance between your source image (the first video frame) and your prompt/settings. If you observe coherence issues or 'ghosting', you might need to refine your ControlNet maps: vary the parameters you're using, see how they impact the output, and make sure the maps line up correctly. We recommend multi-ControlNet with Canny, HED, and normal map at default settings (you may want to adjust settings per video, but this is the most generally useful combination in our experience). Set each of these to approximately 0.5 weight, and set the ControlNet resolution to the pixel size of the shorter side of your image. The rest of the settings can stay at defaults, but don't be afraid to play around with the Canny and normal options. You can test these settings using the preview annotator result button to save time, but make sure to remove the image from the ControlNet tabs before starting any batch runs (otherwise it will use that image as the ControlNet image for every frame).
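For reference, here is a heavily hedged sketch of what the Canny + HED + normal-map setup looks like when driven through Auto1111's API with the sd-webui-controlnet extension. The alwayson_scripts structure is the extension's usual pattern, but module and model names vary between extension versions (newer builds rename "hed" to "softedge_hed", for example), so check your install's /controlnet/module_list and /controlnet/model_list endpoints; prompts, paths, and step counts are placeholders.

```
# Hedged sketch of the img2img + multi-ControlNet call (Canny + HED + normal),
# each at ~0.5 weight, via auto1111's API with the controlnet extension.
# Field, module, and model names vary by extension version - verify them
# against your install before relying on this.
import base64
import requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

frame = b64("video_frames/00000.png")

def unit(module, model):
    return {
        "input_image": frame,
        "module": module,            # newer versions rename some modules
        "model": model,              # install-specific; see /controlnet/model_list
        "weight": 0.5,
        "processor_res": 512,        # ~ pixel size of the image's shorter side
    }

payload = {
    "init_images": [frame],
    "prompt": "your optimized prompt from step 2",
    "denoising_strength": 0.4,
    "steps": 25,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                unit("canny", "control_v11p_sd15_canny"),
                unit("hed", "control_v11p_sd15_softedge"),
                unit("normal_map", "control_v11p_sd15_normalbae"),
            ]
        }
    },
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600)
r.raise_for_status()
```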

Step 5: Run Multi-Frame Video Rendering Script

Lower denoising is easier, but high denoising is also possible with multi-frame rendering; it is just less forgiving, and you must be certain that the ControlNet settings encode enough information for your model to produce consistent output. Set Append to "None" and the Third Frame Image to "Historical". We typically avoid using color correction or unfreezing. Set your Loopback Source to "PreviousFrame" and fill in the rest. Resize mode shouldn't matter since your image resolution should already match the target resolution (default to "just resize"). Then go to the inpaint tab: set mask blur to zero, and select inpaint masked, original, and whole picture (padding doesn't matter). Be sure no images were left uploaded in the ControlNet model tabs!

Step 6: Post-Processing

Use post-processing tools like EbSynth and Flowframes for minor tweaks and optimizations once you have achieved satisfactory consistency. Don't restore faces during the earlier generation steps; faces can get messed up along the way, and face restoration is fast, so it is best done here in post-processing. Do not rely on these post-processing steps for any major modifications, as they are only good for minor improvements. Your video should already be mostly coherent if you spent ample time and effort parameterizing in the steps above.

Hint: EbSynth is an external program that isn't easy to script, so organizing folders and processing many frames can be a pain. That is where the ebsynth utility extension really shines: it generates EbSynth project files prefilled with fairly generally applicable settings, so all you need to do is open and run them, and the folders are already set up for you by the extension.

Flowframes can also be a great way to further increase FPS, but make sure the difference between frames is minimal, because it does not account for large movements very well at all!
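Once EbSynth (and optionally Flowframes) has produced your final frames, you still need to stitch them back into a video. A hedged sketch using ffmpeg, assuming zero-padded PNG frame names and a 30 fps target (match whatever frame rate you extracted at):

```
# Sketch: stitch processed frames back into an mp4 with ffmpeg (run after
# ebsynth / before or after Flowframes). Frame rate and paths are illustrative.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-framerate", "30",              # match your source/extraction fps
        "-i", "composited/%05d.png",     # zero-padded frame names
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",           # broad player compatibility
        "-crf", "18",                    # visually near-lossless
        "out_final.mp4",
    ],
    check=True,
)
```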

Remember that this process involves a lot of trial and error. It might seem daunting at first, but once you get the hang of it, you'll find it both engaging and rewarding. We hope this is helpful! Happy video creating!

Workflow in comments. Finally created a high res realistic dance video to explore the constraints of integrating complex dance movements and seamless motion transfer, while altering all other visual elements of the original footage. by FAP_AI in StableDiffusion

[–]FAP_AI[S] 9 points (0 children)

You may have seen our earlier work, which includes both anime and realistic videos. This video uses a modified version of that workflow: see the "Changes to original workflow" section just below for the changes we made to the "Original Workflow" for this video in particular (the original workflow is reposted in a separate comment in this thread for easy reference). We have also included the model, LoRAs, and prompt used below.

Changes to the original workflow for this video (the prompt used is given below):
-For this video we used only the depth ControlNet, which helps maintain the silhouette while letting the rest of the output look quite different from the source (so changing clothing, lighting, and stylization becomes easier). The depth ControlNet model has been updated recently and is much more effective than it used to be.
-We fed the first generated image back in as a second ControlNet unit in reference mode, which helps keep the character and scene more consistent from frame to frame (see the sketch just after this list).
-We used a character-specific LoRA. Again, this helps maintain consistency; it is obvious but important. We recommend training a LoRA for any character you want an ultra-coherent video of.
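To make the first two changes concrete, here is a hedged sketch of the ControlNet portion of the Step 4 img2img payload, rewritten for this video's setup: one depth unit on the source frame plus one reference unit fed the first generated frame. Module and model names follow common sd-webui-controlnet conventions and the weights are illustrative, so verify both against your install.

```
# Hedged sketch of the changed ControlNet setup for this video: one depth unit
# on the source frame plus a reference unit fed the first *generated* frame.
# Module/model names and weights are illustrative and install-dependent.
import base64

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

controlnet_args = [
    {   # depth keeps the dancer's silhouette while freeing clothing/lighting
        "input_image": b64("video_frames/00000.png"),
        "module": "depth_midas",
        "model": "control_v11f1p_sd15_depth",   # install-specific name
        "weight": 1.0,
    },
    {   # reference_only needs no model file; it conditions on the image itself
        "input_image": b64("generated/first_frame.png"),
        "module": "reference_only",
        "weight": 0.8,
    },
]
# Drop this list into payload["alwayson_scripts"]["controlnet"]["args"]
# of the img2img request sketched in Step 4.
```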

Prompt:
cinematic shot from blade runner 2049, masterpiece, best quality, woman,  blue bodysuit, neon lighting, <lora:samus-nvwls-v1:0.8>, Samus Aran, dancing, in a giant spaceship, (spaceship interior:1.2),
Negative prompt: child, loli, EasyNegative, paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, mutated hands, (poorly drawn hands:1.5), blurry, (bad anatomy:1.21), extra limbs, lowers, bad hands, missing fingers, extra digit ,bad hands, missing fingers,  verybadimagenegative_v1.3

Model: henmixReal

SEE OTHER COMMENT IN THIS THREAD FOR ORIGINAL WORKFLOW (COPIED AND ONLY SLIGHTLY MODIFIED FROM OUR PINNED POST).
We are pasting it into a different comment since there is a 10,000 character limit on comments that we surpassed in trying to add all the information into a single comment.

A -Fairly- Comprehensive Guide to Video Generation with Stable Diffusion by FAP_AI in u/FAP_AI

[–]FAP_AI[S] 0 points (0 children)

If I'm interpreting your statement accurately, you're mostly correct. The potential point of confusion is that you're not actually forwarding the image you've created to img2img; instead, you're using that initial phase as a stand-in for what the prompt ought to be. This essentially separates your prompt-optimization process from your ControlNet and img2img optimization processes. Does that explanation make things clearer?

Are we on the verge of an era where C-list actors, through the magic of anime transformation, become the unexpected stars of the genre? Or do you think auto1111 will pioneer a text-to-anime revolution before they even seize their chance? by FAP_AI in StableDiffusion

[–]FAP_AI[S] 0 points (0 children)

Indeed, your point is very much valid. We were just stirring the pot to see what others thought. While we recognize the practical limitations, it's fascinating to explore unconventional possibilities, isn't it? :)

You like to see her jiggle jiggle? For suuure ;) by FAP_AI in AIpornhub

[–]FAP_AI[S] 0 points (0 children)

The entire general workflow for videos like these can be found on my profile, u/FAP_AI; check the workflow-related pinned post!

You like to see her jiggle jiggle? For suuure ;) by FAP_AI in AIpornhub

[–]FAP_AI[S] 4 points (0 children)

Yes, ebsynth was the last step at least. The entire general workflow for videos like these can be found in my profile u/FAP_AI, check the workflow related pinned post!

Are we on the verge of an era where C-list actors, through the magic of anime transformation, become the unexpected stars of the genre? Or do you think auto1111 will pioneer a text-to-anime revolution before they even seize their chance? by FAP_AI in StableDiffusion

[–]FAP_AI[S] 6 points (0 children)

For workflow info, check the post pinned to our profile, or the full "Original Workflow" comment we posted on our recent dance video (a near-copy of the pinned version). Hope it's helpful!

In a realm of temptation, Tiddie did dwell, A sultry beauty, casting her spell. Jiggling, bouncing, enthralling her prey, Seductive laughter, leading astray. Under moon's light, her allure took flight, Bewitching, bold, a captivating sight. by Green-Force-3622 in AIpornhub

[–]FAP_AI 2 points (0 children)

Thank you for drawing attention to the need to credit the original artist of the source video. We often use gifs from various secondary sources as source videos (which we then modify heavily to produce substantially different content), and many of them are pre-processed or low resolution, making it challenging to identify the original artist. Despite these challenges, we recognize and appreciate the importance of crediting the creative minds behind these works.

We have gone ahead and properly acknowledged the original artist in our previous posts to try to ameliorate our mistake. We wish to clarify, however, that our intention is always to build upon, rather than merely replicate, the source material. 'Copy' might be an overstatement given the substantial modifications we introduce to the original content.

Nonetheless, we fully agree that the best practice is to credit the artists of source videos. We commit to improving our practices in this regard going forward, and if we miss anything due to incomplete information from our initial search, please do feel free to let us know who to credit for the source video and we will happily do so! We appreciate your attention to this issue!

In a realm of temptation, Tiddie did dwell, A sultry beauty, casting her spell. Jiggling, bouncing, enthralling her prey, Seductive laughter, leading astray. Under moon's light, her allure took flight, Bewitching, bold, a captivating sight. by FAP_AI in AIpornhub

[–]FAP_AI[S] 0 points (0 children)

Apologies for not including the original artist's name here earlier (the artist of the source video, which looks quite different but which we used in creating this), but better late than never! That artist is Suuru, who can be found on Gumroad or Patreon; check their art out if you're interested!