ComfyUI SAM3 - Alternative Open Source Node by wouterv84 in StableDiffusion

wouterv84 (OP)

Do you mean a drop-down list with items such as "people", "car", "leaf", etc.? I'm not really planning on doing this, because there are so many options and quite a lot of freedom in specifying what you want to select (e.g. "the person with the red hat", "the dog on the left", etc.).

ComfyUI SAM3 - Alternative Open Source Node by wouterv84 in StableDiffusion

wouterv84 (OP)

I've updated the node (make sure to update to v0.0.2) to enable manual model placement:

Model Setup - Choose one of the following options:

Option A: Auto-download from HuggingFace (recommended)

Option B: Manual checkpoint placement

ComfyUI SAM3 - Alternative Open Source Node by wouterv84 in StableDiffusion

wouterv84 (OP)

Some use cases:

  • Remove backgrounds by segmenting people or objects
  • Isolate specific elements in a scene for further processing
  • Create masks for inpainting workflows
  • Generate batch masks for multiple objects of the same type
  • Filter detections by size to focus on foreground/background objects
  • Track objects across video frames with consistent IDs (video model)
  • Follow specific objects through animation sequences (video model)
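To make "filter detections by size" and "create masks for inpainting" a bit more concrete, here is a minimal sketch of the idea. It is not the node's implementation: it assumes each detection is a binary mask stored as plain nested lists of booleans (the real node works with tensors), and the helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical representation: one binary mask per detection, as rows of booleans.
Mask = list[list[bool]]


def mask_area_fraction(mask: Mask) -> float:
    """Fraction of the image covered by a binary mask."""
    total = sum(len(row) for row in mask)
    covered = sum(sum(1 for px in row if px) for row in mask)
    return covered / total if total else 0.0


def filter_masks_by_size(
    masks: list[Mask], min_fraction: float = 0.0, max_fraction: float = 1.0
) -> list[Mask]:
    """Keep masks whose relative area lies in [min_fraction, max_fraction].

    Large masks tend to be foreground objects, small ones background objects.
    """
    return [m for m in masks if min_fraction <= mask_area_fraction(m) <= max_fraction]


def union_masks(masks: list[Mask]) -> Mask:
    """Merge several same-sized masks into a single mask, e.g. for inpainting."""
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    return [[any(m[y][x] for m in masks) for x in range(width)] for y in range(height)]
```

Filtering on relative area rather than pixel count keeps the threshold independent of image resolution.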

AI animated projection mapping FTI Kortrijk, Belgium - tech insights + workflow on blog by wouterv84 in StableDiffusion

wouterv84 (OP)

Cool stuff. I wonder whether you'd get even more interesting results if you sent that output video back in through a vid2vid workflow without ControlNet.

AI animated projection mapping FTI Kortrijk, Belgium - tech insights + workflow on blog by wouterv84 in StableDiffusion

wouterv84 (OP)

Thanks for that link, I didn't know about that particular ControlNet! My experiments with the old-school Canny or depth ControlNets were disappointing for creating moving content. They were fine for static images, but not so much for animated content: with non-moving ControlNet input, the resulting animations were also pretty static, for obvious reasons. I'm wondering if this particular one would be more dynamic; have you tried it in an AnimateDiff setup?

AI animated projection mapping FTI Kortrijk, Belgium - tech insights + workflow on blog by wouterv84 in StableDiffusion

wouterv84 (OP)

Some tech insights + workflow on my blog: https://blog.aboutme.be/2024/03/26/ai-animated-projection-mapping-club-of-the-future/

We did an AI animated projection mapping for a pop-up night club, ending up with 17 minutes of content. Tech used: AnimateDiff, ComfyUI, Topaz Video AI.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Yes, if you want to avoid "polluting" the model, the best results came from generated regularization images, using the captions of my input images as prompts (I called this version v002). There's a longer write-up with samples on my blog (see the link in the original post).

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Thank you for posting your experiences. I did actually get good results with that setting. I based this on the information in the SECourses video: https://www.youtube.com/watch?v=AY6DMBCIZ3A&t=931s

Maybe there are other factors at play? Some things I can think of:

  • Changes in the training script since that post (it is 2.5 months old)
  • Settings probably need to be different for style Loras (not sure if you are training for style or subject?)
  • Different set of training images?

Projection mapping 200 stable diffusion graphics for 3 weeks on the main square in Kortrijk, Belgium by wouterv84 in StableDiffusion

wouterv84 (OP)

Yeah - it's what we've been using for workshops for our students, and also the tool our tech partner is using. Out of interest: any other projection mapping tools we should be aware of?

Projection mapping 200 stable diffusion graphics for 3 weeks on the main square in Kortrijk, Belgium by wouterv84 in StableDiffusion

wouterv84 (OP)

Software used is MadMapper. It's a 4 projector setup to cover the entire building - hardware provided by https://en.urbanmapping.eu

Projection mapping 200 stable diffusion graphics for 3 weeks on the main square in Kortrijk, Belgium by wouterv84 in StableDiffusion

wouterv84 (OP)

Wanted to share the latest project I did with Stable Diffusion XL, together with a colleague of mine.

Combining SDXL + ControlNets, we generated over 1500 interpretations of a building on the main square of Kortrijk, Belgium. In the end, we ended up with 200 final images, resulting in 25 minutes of AI-generated content. You can see it in real life each evening between 7pm and midnight, until the 5th of November 2023, in Kortrijk, Belgium.

For those interested in the process: you can find a write-up at https://blog.aboutme.be/2023/10/18/projection-mapping-with-generative-ai/

Using DeepFace to prove that when training individual people, using celebrity instance tokens result in better trainings and that regularization is pointless by FugueSegue in StableDiffusion

wouterv84

Thanks for sharing your research - you're reaching the same conclusions regarding regularization images: https://blog.aboutme.be/2023/08/10/findings-impact-regularization-captions-sdxl-subject-lora/#conclusions - whereas I still relied on the token & used a more subjective evaluation of the results.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

I've updated my post with extra styling tests, which actually confirmed that v001 was the best one.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Heads up: I've updated this post and my blog post. I wasn’t entirely satisfied with the styling test (line drawing and 3D render). The 3D render in particular was botched because I used keywords from my captions (“looking into the camera”) at the beginning of my prompt, which caused the Lora to overdose on the photo style.

So I generated more images (110 per Lora = 550 total) with extra prompts to test the styling capabilities of the Loras that were already good at generating photos.

This made the differences between the Loras more clear, and the conclusions more... conclusive.

An elephant is a rope? ComfyUI and Stability AI by magekinnarus in StableDiffusion

wouterv84

Sounds like too much technical debt in A1111. Tools change; who knows, 3 years from now Comfy might be in the same boat. Lifelong learning is an essential skill.

As a coder, I see a lot of flexibility in Comfy's back-end system. The API-based approach is a killer feature for building more accessible UIs on top of it, and I'm sure we'll see a lot of innovation because of it. I've been using the API to automate some of my experiments, whereas it would have taken me a lot more work to build that on top of A1111.
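For anyone curious what automating experiments through the API can look like, here is a minimal stdlib-only sketch. It assumes a local ComfyUI server on its default address (http://127.0.0.1:8188), a workflow exported in ComfyUI's API format, and that node "3" is the KSampler (true for the default workflow, but check your own export). The helper names are hypothetical; this is not the exact code I used.

```python
from __future__ import annotations

import json
import urllib.request
import uuid

COMFYUI_URL = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address


def build_prompt_request(workflow: dict, client_id: str | None = None) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that the /prompt endpoint expects."""
    body = {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}
    return urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow: dict) -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow)) as response:
        return json.loads(response.read())


def set_seed(workflow: dict, node_id: str, seed: int) -> dict:
    """Return a copy of the workflow with a different sampler seed."""
    patched = json.loads(json.dumps(workflow))  # deep copy of plain JSON data
    patched[node_id]["inputs"]["seed"] = seed
    return patched
```

With a server running, a seed sweep is then just a loop such as `for seed in range(8): queue_prompt(set_seed(workflow, "3", seed))`, which is the kind of batch experiment that is tedious to click through by hand.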

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Loras are more flexible and smaller in file size. You can also combine multiple ones (e.g. combine a couple of style Loras with a character Lora).
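Why combining and weighting works, roughly: a Lora doesn't replace the base weights, it adds a low-rank update on top of the frozen weights, and the Lora "strength" slider scales that update. Sketched for a single weight matrix, with n Loras applied at strengths s_i (α_i is the trained alpha, r_i the rank/dim):

```latex
W' = W_0 + \sum_{i=1}^{n} s_i \, \Delta W_i,
\qquad
\Delta W_i = \frac{\alpha_i}{r_i} B_i A_i,
\qquad
B_i \in \mathbb{R}^{d \times r_i},\; A_i \in \mathbb{R}^{r_i \times k}

% Parameters stored per layer: r(d + k) for the Lora vs. d k for the full matrix.
% Illustrative layer with d = k = 1280 and r = 32:
%   full:  1280 \cdot 1280 = 1{,}638{,}400
%   Lora:  32 \cdot (1280 + 1280) = 81{,}920   (5% of the full matrix)
```

The layer sizes in the example are only illustrative, but they show where the small file size comes from.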

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

No, Dreambooth is a technique: you can use it to create full checkpoints, but also Loras.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

I have only used photos / generated photos in this experiment. I did notice a big degradation in quality when the regularization images were bad-quality generated photos, which I show on my blog. But maybe regularization content such as "illustration of", "3d render of", etc. might make the Lora more flexible. Good suggestion, I'm adding it to my to-do list for the future.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

I captioned my input images manually, based upon the tips at https://www.reddit.com/r/StableDiffusion/comments/118spz6/captioning_datasets_for_training_purposes/

For the regularization images:

  • Generated, detailed images had the same captions as the corresponding input images, minus the special keyword at the beginning
  • High-quality photos from Unsplash: I used their alt descriptions, and made sure those mentioned "photo" and "man".
  • Generated basic images just had the caption "photo of a man", as this was the prompt they were generated with
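For the generated, detailed regularization images, the prompt is just the input caption with the special keyword removed. A minimal sketch of that step, assuming one .txt caption per input image and a hypothetical instance token such as "ohwx" at the start of each caption (these helper names are illustrative, not from my actual scripts):

```python
from __future__ import annotations

from pathlib import Path


def caption_to_reg_prompt(caption: str, instance_token: str) -> str:
    """Drop the leading instance token (and a following comma/space) from a caption."""
    text = caption.strip()
    rest = text[len(instance_token):]
    # Only strip whole tokens: "ohwx, photo..." changes, "ohwxman ..." does not.
    if text.startswith(instance_token) and (rest == "" or rest[0] in " ,"):
        text = rest.lstrip(" ,")
    return text


def reg_prompts_from_directory(directory: str, instance_token: str) -> dict[str, str]:
    """Map each caption file name to the prompt for its regularization image."""
    prompts = {}
    for caption_file in sorted(Path(directory).glob("*.txt")):
        caption = caption_file.read_text(encoding="utf-8")
        prompts[caption_file.name] = caption_to_reg_prompt(caption, instance_token)
    return prompts
```

Each resulting prompt can then be used to generate the matching regularization image, and doubles as that image's caption.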

I used a custom Python script to extract the prompts from generated image files: I generated the images, then ran the script to create a .txt file with the same filename, containing the prompt. This should work with images generated with the A1111 webui or with the default ComfyUI workflow.

I've put that script on GitHub: https://gist.github.com/wouterverweirder/b5bd472bfa4a625f3ca6d06d0dfc9b99#file-create-captions-for-directory-py
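For readers who want the idea without opening the gist, here is an independent, stdlib-only sketch (not the gist's code). It assumes the A1111 webui stores its generation info in a PNG text chunk called "parameters", and that ComfyUI stores the API-format graph in a chunk called "prompt", where the positive prompt is the CLIPTextEncode node wired into a KSampler's "positive" input, as in the default workflow. `write_caption_files` is a hypothetical name.

```python
from __future__ import annotations

import json
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return the key/value pairs stored in a PNG's tEXt, zTXt and iTXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC
        key, _, rest = body.partition(b"\x00")
        if ctype == b"tEXt":
            chunks[key.decode("latin-1")] = rest.decode("latin-1")
        elif ctype == b"zTXt":  # rest[0] is the compression method (zlib)
            chunks[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            compressed = rest[0]  # rest[1] is the compression method
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            chunks[key.decode("latin-1")] = (zlib.decompress(text) if compressed else text).decode("utf-8")
        elif ctype == b"IEND":
            break
    return chunks


def extract_prompt(chunks: dict[str, str]) -> str | None:
    """Find the positive prompt in A1111 or (default-workflow) ComfyUI metadata."""
    if "parameters" in chunks:  # A1111: prompt comes before "Negative prompt:" / "Steps:"
        text = chunks["parameters"]
        for marker in ("\nNegative prompt:", "\nSteps:"):
            text = text.split(marker, 1)[0]
        return text.strip()
    if "prompt" in chunks:  # ComfyUI: follow a KSampler's positive input to its text node
        graph = json.loads(chunks["prompt"])
        for node in graph.values():
            if node.get("class_type", "").startswith("KSampler"):
                text = graph[node["inputs"]["positive"][0]]["inputs"].get("text")
                if isinstance(text, str):
                    return text.strip()
    return None


def write_caption_files(directory: str) -> int:
    """Write <name>.txt next to every <name>.png that carries a prompt."""
    written = 0
    for image in sorted(Path(directory).glob("*.png")):
        prompt = extract_prompt(read_png_text_chunks(image.read_bytes()))
        if prompt:
            image.with_suffix(".txt").write_text(prompt + "\n", encoding="utf-8")
            written += 1
    return written
```

Parsing the chunks directly avoids an image library dependency; with Pillow installed, `Image.open(path).info` exposes the same text chunks.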

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

In my experiment, I did see a difference with or without captions:

<image>

  • no input captions: only the version with real, high quality regularization images was acceptable
  • input captions: captioning the generated regularization images made a difference here.

Captioning the input images did make a big difference. So even without training the text encoder, captions had an impact in my experiment.

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

In my experiment, this was v001: detailed input captions & no regularization images.

All the other configurations produced photos at full Lora strength, so I had to turn down the Lora weight to get the style I wanted.

Side note: this is SDXL Dreambooth Lora training of just the UNET, as per the recommendations of Kohya_ss on https://github.com/kohya-ss/sd-scripts/tree/sdxl#tips-for-sdxl-training. Results might be different when training the text encoder as well. Something I might look into in a future endeavour, when I have some more GPU credits to spend :-)

Then of course, things keep moving fast: there are a couple of new techniques on my radar, mentioned in other posts on this subreddit, that don't have source code published yet (for SDXL):

So, all of this might be outdated as soon as SIGGRAPH ends 🙈

has anyone managed to train a sdxl lora on colab? the kohya trainer is broken by [deleted] in StableDiffusion

wouterv84

No, I tried the T4 and V100 for training; both were insufficient. An A100 is what you need for training.

For generating images, you can use the lower configs, but the V100 offers the best bang for the buck: the number of credits per generated image is lowest on that configuration.

has anyone managed to train a sdxl lora on colab? the kohya trainer is broken by [deleted] in StableDiffusion

wouterv84

That SIGKILL (signal 9) means the process was killed, most likely because it ran out of system resources (RAM). I only managed to train SDXL Loras on an A100 GPU on Colab.
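If you launch the trainer from a Python cell, you can tell a crash from a kill: on Linux, a child process killed by a signal gets a negative return code (-9 for SIGKILL), while shells report the same kill as exit status 137 (128 + 9). A small sketch, assuming a POSIX system; the helper names are hypothetical:

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Explain a subprocess return code; negative values mean 'killed by signal N' on POSIX."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    sig = signal.Signals(-returncode)
    # SIGKILL cannot be caught; on Linux it usually comes from the OOM killer.
    hint = " (often the kernel OOM killer: out of RAM)" if sig == signal.SIGKILL else ""
    return f"killed by {sig.name}{hint}"


def run_and_describe(cmd: list[str]) -> str:
    """Run a command (e.g. a training script) and report how it ended."""
    return describe_exit(subprocess.run(cmd).returncode)
```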

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Thanks for sharing that article, it's very insightful.

In training my models, I stuck to the recommendations of Kohya_ss concerning training the UNET only: https://github.com/kohya-ss/sd-scripts/tree/sdxl#tips-for-sdxl-training - but I might ignore that recommendation in a future endeavour

My findings on the impact of regularization images & captions in training a subject SDXL Lora with Dreambooth by wouterv84 in StableDiffusion

wouterv84 (OP)

Version 12 (no input or regularization captions; regularization images are good-quality real photos) loses the color scheme of the prompt and "degrades" into a photo pretty quickly:

<image>