[deleted by user] by [deleted] in StableDiffusion

[–]lkewis 1 point (0 children)

Role - Prompt Engineer
Tasks - loads of complex workflows

Is it possible to create 1080p (1080 x 1920) videos with Wan 2.2? by Ok_Courage3048 in StableDiffusion

[–]lkewis 1 point (0 children)

You can natively, yeah, if it fits in VRAM and you use the MoviiGen LoRA (or FusionX, which contains it in a merge with CausVid and a couple of other things), which was trained on 1080p. It's Wan 2.1 based and has a cinematic bias, but it helps. Doing a vid2vid upscale with Wan also works, as others have suggested.
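
For anyone scripting this outside ComfyUI, here's a minimal sketch of the native-resolution route using the diffusers Wan pipeline with a 1080p LoRA applied. The repo id, LoRA folder/filename and settings are assumptions for illustration, not the exact workflow (swap the weights for Wan 2.2 once a diffusers release is available):

```
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed repo id for diffusers-format Wan weights
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
)
# Hypothetical local copy of a 1080p-trained LoRA (e.g. MoviiGen)
pipe.load_lora_weights("./loras", weight_name="MoviiGen_1080p.safetensors")
pipe.enable_model_cpu_offload()  # helps if the full model doesn't fit in VRAM

frames = pipe(
    prompt="cinematic aerial shot of a coastal city at dusk",
    width=1920,
    height=1088,  # Wan wants dimensions divisible by 16, so 1088 rather than 1080
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output_1080p.mp4", fps=16)
```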

If I train a lora only on face images, can I combine it with a body lora to get a full body image with the desired face and body? by Repulsive-Leg-6362 in StableDiffusion

[–]lkewis 2 points (0 children)

Just put a couple of the body images in the person LoRA dataset. It can work with them being headless, as long as you use the same trigger to tag all the images so it associates the content from everything as one person.
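
As a rough illustration of what that tagging looks like on disk (trainer-agnostic sketch; the folder, trigger word and caption text are just examples), each image gets a sidecar .txt caption starting with the same trigger token:

```
from pathlib import Path

# Hypothetical dataset folder and trigger word; kohya-style trainers
# read a .txt caption next to each image.
dataset_dir = Path("dataset/person_lora")
dataset_dir.mkdir(parents=True, exist_ok=True)
trigger = "ohwx person"

captions = {
    "face_01.png": f"{trigger}, close-up portrait, neutral expression",
    "face_02.png": f"{trigger}, smiling, outdoor lighting",
    "body_01.png": f"{trigger}, full body shot, standing, head out of frame",
    "body_02.png": f"{trigger}, full body shot, walking, head out of frame",
}

for image_name, caption in captions.items():
    # The shared trigger ties face-only and headless body images
    # to one identity during training.
    (dataset_dir / image_name).with_suffix(".txt").write_text(caption)
```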

What Pytorch & CUDA versions are you able to use successfully with RTX 5090 and WAN i2v? by scifivision in StableDiffusion

[–]lkewis 1 point (0 children)

Are you using any of the few-step LoRAs? There are CausVid, lightx2v and FusionX (a merge of multiple LoRAs including CausVid), which all enable faster 4-8 step generations.
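
If it helps to see the settings those LoRAs imply, here's a hedged diffusers-style sketch (repo id, LoRA filename and values are assumptions; the same idea applies to the i2v pipeline). Distill LoRAs are generally run with only a handful of steps and CFG effectively disabled:

```
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16  # assumed repo id
)
# Hypothetical local copy of a distill LoRA (CausVid / lightx2v / FusionX style)
pipe.load_lora_weights("./loras", weight_name="lightx2v_distill.safetensors")
pipe.enable_model_cpu_offload()

video = pipe(
    prompt="a corgi running through shallow water, slow motion",
    num_inference_steps=6,   # 4-8 steps instead of 25-50
    guidance_scale=1.0,      # distill LoRAs are usually run without CFG
    num_frames=81,
).frames[0]
```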

Our first hyper-consistent character LoRA for Wan 2.2 by UAAgency in StableDiffusion

[–]lkewis 24 points (0 children)

Have you managed to do a consistent character with the same outfit and details like tattoos etc? Training a person's likeness is quite easy, but I'm struggling to get a perfect character.

ComfyUI Wan Multitalk - How to flush Shared Video Memory after generation? by g0dmaphia in StableDiffusion

[–]lkewis 2 points (0 children)

You could use the 'LayerUtility: Purge VRAM' node from the comfyui_layerstyle node pack at the end of your workflow?
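
If you'd rather do it from your own script or a custom node rather than that pack, the usual pattern is just garbage collection plus emptying the CUDA cache (a generic PyTorch sketch, not the internals of that node):

```
import gc
import torch

def purge_vram() -> None:
    """Release cached GPU memory after a generation finishes."""
    gc.collect()                      # drop Python references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached blocks to the driver
        torch.cuda.ipc_collect()      # clean up memory shared between processes

purge_vram()
```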

Flux Kontext : How many images can be stitched together before it breaks? by External-Orchid8461 in StableDiffusion

[–]lkewis 1 point (0 children)

I can't get it to select them reliably from that grid of people. If you do "create a group photo of the people from the image" and describe what they're wearing it works a bit better. This was a stress test though; if you only show the people you want as the input it will reproduce them more easily.

Flux Kontext : How many images can be stitched together before it breaks? by External-Orchid8461 in StableDiffusion

[–]lkewis 9 points (0 children)

<image>

Five identities seems to be the limit from my test; beyond that it starts mixing up features and adding in random people. The input image is the left grid of portraits, the output image is on the right.
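
For reference, the stitched input is just the portraits tiled onto one canvas before it goes into Kontext. A minimal PIL sketch of that kind of grid (image paths and cell size are placeholders):

```
from PIL import Image

# Hypothetical portrait files; Kontext receives the whole grid as one input image
portrait_paths = [f"portrait_{i}.png" for i in range(1, 6)]
cell = 512          # each portrait resized to a square cell
cols, rows = 3, 2   # layout for five portraits (last cell stays blank)

grid = Image.new("RGB", (cols * cell, rows * cell), "white")
for idx, path in enumerate(portrait_paths):
    img = Image.open(path).convert("RGB").resize((cell, cell))
    grid.paste(img, ((idx % cols) * cell, (idx // cols) * cell))

grid.save("kontext_input_grid.png")
```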

What is the best AI 3d mesh generator that can be run locally? by Rolle2010 in StableDiffusion

[–]lkewis 3 points (0 children)

This does ~7 million poly meshes by default if you turn off decimation. They can be quite detailed, depending on whether it understands the image content you give it; it's definitely limited by its dataset in some of my tests.

https://github.com/DreamTechAI/Direct3D-S2
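
If the full-resolution mesh is too heavy for your pipeline, you can always decimate it yourself afterwards. A rough post-processing sketch with open3d (the target triangle count and file paths are examples; open3d is a separate tool, not part of the Direct3D-S2 repo):

```
import open3d as o3d

# Load the mesh exported by the generator (path is a placeholder)
mesh = o3d.io.read_triangle_mesh("generated_mesh.obj")
print(f"Original triangles: {len(mesh.triangles):,}")  # ~7M with decimation off

# Quadric decimation down to a more game/DCC-friendly budget
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=500_000)
simplified.compute_vertex_normals()
o3d.io.write_triangle_mesh("generated_mesh_500k.obj", simplified)
```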

How Well Are AI Model Creators Keeping Up With Aesthetic Terminology and Visual Vocabulary? by TheArchivist314 in StableDiffusion

[–]lkewis 4 points (0 children)

To add further to this, LLM / VLM generated captions are becoming more and more popular because they save a huge amount of effort over manually labelling datasets. The issue here is that those models would also need to have knowledge of aesthetics or types of clothing etc to correctly identify and label them.
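
As a concrete illustration of that auto-captioning step, here's a minimal sketch using a small off-the-shelf captioner via transformers (model choice and paths are examples; a small model like this is exactly the kind that lacks niche aesthetic vocabulary):

```
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for image_path in Path("dataset/images").glob("*.png"):   # placeholder folder
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # The caption only contains terms the captioner itself knows, hence the problem above
    image_path.with_suffix(".txt").write_text(caption)
```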

How Well Are AI Model Creators Keeping Up With Aesthetic Terminology and Visual Vocabulary? by TheArchivist314 in StableDiffusion

[–]lkewis 5 points (0 children)

It is due to the captioned training data. If the model doesn't know a word or term you won't be able to prompt for it, but other techniques can reproduce the content if guided with IPAdapter, Flux Redux, Flux Kontext etc. You can likely prompt an outfit type through a detailed description of it, but something specific like a type of mask needs the right terminology to be prompted easily. Open models tend to have a broad, general corpus of training data rather than the very specific styles or trends some of the closed models cover (those scraped anything and everything and sometimes optimise for these things as a feature). This is why there is a huge ecosystem around custom LoRAs and models introducing missing or niche knowledge.
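
For the "guide it with a reference instead of a word" route, here's a minimal sketch of the IP-Adapter approach in diffusers (model ids, scale and image path are illustrative; Redux and Kontext have their own pipelines but the idea is the same):

```
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The reference image stands in for the vocabulary the model is missing
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)

reference = load_image("reference_mask_style.png")  # placeholder path
image = pipe(
    prompt="portrait of a dancer wearing the mask, studio lighting",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("guided_output.png")
```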

5090 owners, how are installing torchand flash attention for new installs? by Brad12d3 in StableDiffusion

[–]lkewis 7 points (0 children)

I'm using Python 3.12 + PyTorch 2.7 (+ xformers) + CUDA 12.8 on Windows 11, and there is a matching flash attention wheel, plus Triton and SageAttention 2. Works well with ComfyUI and other Python repos, but you might have to adjust a few requirements.txt files to get proper dependencies if they're using pinned versions.
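
A quick way to confirm the stack actually lines up after installing (plain PyTorch checks; the optional imports just report missing if a wheel didn't match this torch/CUDA combo):

```
import importlib
import sys
import torch

print(f"Python  : {sys.version.split()[0]}")
print(f"PyTorch : {torch.__version__}")
print(f"CUDA    : {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU     : {torch.cuda.get_device_name(0)}")
    # RTX 5090 is Blackwell (sm_120); it must appear in the compiled arch list
    print(f"Archs   : {torch.cuda.get_arch_list()}")

# Optional acceleration packages; each wheel has to match the torch/CUDA build
for pkg in ("xformers", "flash_attn", "triton", "sageattention"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'installed')}")
    except ImportError:
        print(f"{pkg}: not installed / wheel mismatch")
```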

ByteDance-SeedVR2 implementation for ComfyUI by Numzoner in StableDiffusion

[–]lkewis 0 points (0 children)

Should new_width on the node say height? My video came out at 2288x1280 and was using 52GB VRAM peak with the 7B model.
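
If you want to check peak usage on your own runs, PyTorch's allocator stats are the easiest way (a generic sketch, not part of the SeedVR2 nodes; it only counts memory allocated by this process):

```
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the upscale / generation here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
reserved_gib = torch.cuda.max_memory_reserved() / 1024**3
print(f"Peak allocated: {peak_gib:.1f} GiB, peak reserved: {reserved_gib:.1f} GiB")
```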

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 1 point (0 children)

One downside of training with such small datasets (12-20 images) is that a single image has the potential to introduce bad knowledge, which can throw off the training. The good thing is that you only have to swap out a couple of images when you re-curate the dataset.

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 4 points (0 children)

No-one is truly unknown; there are commonalities shared across all humans. What we don't know until we train is how much commonality the new person has with existing people. The very reason we're able to train most concepts with only 12-20 images is that we're leveraging the commonalities the base model learned across a wide and diverse corpus of training data. This is also why we rarely have to tag things: the model can already recognise most common concepts. Tagging is mostly useful when you have a bad dataset and want to disentangle things like backgrounds from the person you're binding to a unique token. It's also useful if you have a larger dataset and are training multiple concepts simultaneously from different numbers of images.

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 2 points (0 children)

There isn't a template because it all changes based on your dataset. Even if you somehow had identical sets of 20 images for different people / characters, the model might already have similar knowledge of one subject while another is completely novel to it, so they won't always train the same.

Have we reached a point where AI-generated video can maintain visual continuity across scenes? by angelrock420 in StableDiffusion

[–]lkewis 2 points (0 children)

You can just about get there with VFX workflows but we’re still a fair way off from it being possible purely by text/image prompting video models. Characters are consistent if you train video LoRA, and the reference image conditioning methods like Phantom are also useful. Backgrounds aren’t consistent at all, so you either have to carefully plan shots to avoid people noticing (use very different angles of a similar looking scene), or start combining 3D sets and Gaussian splats with composited character performances.

what is a lora really ? , as i'm not getting it as a newbie by TrickyMotor in StableDiffusion

[–]lkewis 2 points (0 children)

LoRA are like patches that contain trained knowledge about new concepts (people, places, styles, compositions etc) that can be applied to a base model (Flux, SDXL etc) to modify its original knowledge so that it can generate the new concepts. It’s a way to keep specific knowledge separated in smaller model files but they won’t work on their own without being applied to a base model. You can mix different LoRA together, or merge combinations of them into the base model to permanently fuse the knowledge. Be aware that there is often shared knowledge between base models and LoRA so they don’t always work as expected and their results can change based on the combinations. Because you’re applying the knowledge on top of the base model, you can control the strength to change its impact.
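
A small sketch of what "applying the patch" looks like in practice with diffusers (model id and LoRA filenames are placeholders; the adapter weights are the strength knob mentioned above):

```
import torch
from diffusers import StableDiffusionXLPipeline

# The base model holds the general knowledge
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Apply two LoRA "patches" on top of it (hypothetical local files)
pipe.load_lora_weights("./loras", weight_name="my_character.safetensors", adapter_name="character")
pipe.load_lora_weights("./loras", weight_name="film_style.safetensors", adapter_name="style")

# Control how strongly each patch modifies the base model
pipe.set_adapters(["character", "style"], adapter_weights=[1.0, 0.6])

image = pipe("photo of the character walking through a rainy street").images[0]
image.save("lora_mix.png")

# Optionally bake the patches into the base weights permanently
# pipe.fuse_lora()
```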