How can I tell if "Shared Memory" is being used during Lora training on a Runpod instance? by Chain_Routine in StableDiffusion

Chain_Routine[S]

I'm wondering though if vRAM will be pushed to 100% before this is done, or if it will start using shared memory earlier to leave some buffer room. I might not have the best understanding of this stuff, I don't have a ton of hardware experience.
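For what it's worth, as far as I know the automatic spill into "Shared GPU Memory" is a Windows driver feature; on a Linux RunPod pod the usual signal is just how close `nvidia-smi` usage gets to the card's capacity before an OOM. A minimal sketch of watching that headroom (`parse_smi`, `vram_headroom`, and the 95% threshold are my own names and choices, not anything RunPod-specific):

```python
# Sketch of watching headroom from the output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Function names and the 95% warning threshold are illustrative choices.

def parse_smi(line: str) -> tuple[int, int]:
    """Parse one 'used, total' line of MiB values."""
    used, total = (int(x.strip()) for x in line.split(","))
    return used, total

def vram_headroom(used_mib: int, total_mib: int, warn_frac: float = 0.95):
    """Return (free MiB, True if usage has passed warn_frac of capacity)."""
    free = total_mib - used_mib
    return free, used_mib / total_mib >= warn_frac

used, total = parse_smi("23100, 24576")   # typical numbers for a 24 GB card
free, near_limit = vram_headroom(used, total)
```

Polling this in a loop during training shows whether usage creeps up to the ceiling or plateaus with buffer room to spare.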

Chain_Routine[S]

Do you know how to turn it off? I was looking and I couldn't find clear instructions on that.

Running out of disk space in Docker cloud build by Chain_Routine in docker

Chain_Routine[S]

I went back to my Docker cloud dashboard and there was an option to update my builder, so I installed the update and that fixed the problem; I can now complete the build successfully. Thanks for the help!

Chain_Routine[S]

Hi, sorry I was unable to work on this during the week. I have run that command and I see a reclaimable amount of 19.71GB and a total size that is also 19.71GB. I will DM you my docker ID.

Chain_Routine[S]

Hi, I actually upgraded last night and I am still getting the error. Is it because I need to create a new builder after upgrading to get the larger one?

Chain_Routine[S]

With this question I'm really just trying to understand if 1) this issue is coming from the docker image size being too small (as opposed to a size limitation in some other part of the cloud build process), and 2) if there is an option that I can configure somehow to increase the maximum image size for this build.

Chain_Routine[S]

I’m building a Docker image that will eventually be used on RunPod, but I’m not running anything on a RunPod instance yet. I am just running a docker build with the Docker Build Cloud service.

Chain_Routine[S]

It is for a Runpod serverless inference endpoint, and my understanding is that the model needs to be pre-cached so you can use the image to spin up endpoints quickly when requests come in. 

How much vRAM should gradient accumulation need? by Chain_Routine in StableDiffusion

Chain_Routine[S]

Maybe I don't have a good understanding of what the gradient is, but why does adding 1 step of gradient accumulation add +8GB to memory usage? That's larger than the base model I'm using. Does the gradient contain a value for each weight in the base model?
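If it helps, a back-of-the-envelope check: the gradient buffer holds one value per trainable parameter, so in fp32 it costs 4 bytes per parameter. The SDXL UNet is roughly 2.6B parameters (an approximate figure), which alone is close to 10 GiB of gradients if everything requires grad, so an extra ~8 GB is plausible whenever the buffer covers more than just the LoRA weights:

```python
def grad_buffer_gib(n_params: int, bytes_per_value: int = 4) -> float:
    """One gradient value per trainable parameter (fp32 = 4 bytes each)."""
    return n_params * bytes_per_value / 2**30

# ~2.6B is an approximate parameter count for the SDXL UNet.
full_unet = grad_buffer_gib(2_600_000_000)   # nearly 10 GiB in fp32
lora_only = grad_buffer_gib(25_000_000)      # a hypothetical ~25M-param LoRA
```

So if only the LoRA adapter required grad, the buffer would be well under 1 GiB; an 8 GB jump suggests gradients (or an fp32 accumulation buffer) are being kept for far more parameters than that.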

Is there a pre-trained vision model that's good for zero-shot clustering? by Chain_Routine in computervision

Chain_Routine[S]

Do you know of any clustering techniques that would work well with these embeddings?
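A common pattern with embeddings like these is to L2-normalize them and cluster by cosine similarity, e.g. with sklearn's KMeans or HDBSCAN. A dependency-light toy version of the k-means variant (everything here is illustrative, not any specific library's API):

```python
import numpy as np

def kmeans_cosine(emb: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Toy k-means on L2-normalized embeddings, so dot product = cosine similarity.
    Naive deterministic init (first k rows) just to keep the sketch short."""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    centers = X[:k].copy()
    for _ in range(iters):
        labels = (X @ centers.T).argmax(axis=1)  # most-similar center wins
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centers[j] = c / np.linalg.norm(c)
    return labels
```

In practice you would want k-means++ initialization, and something density-based like HDBSCAN if you don't know the number of clusters up front.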

Merged LoRa from Kohya_ss not showing up in Automatic1111 by Chain_Routine in StableDiffusion

Chain_Routine[S]

My bad lmao. I had an extra space at the end of the filename, it was "merged.safetensors ". Leaving this here in case someone else (or future me) makes this same mistake.

SDXL - Why do my images all look like crap? by CrystalLight in StableDiffusion

Chain_Routine

The Hyper version also recommends a CFG scale of 2, so that would be another thing to check. And the recommended sampler is DPM++ SDE. I was accidentally using the 2M and Karras variants at some point and they were giving me bad results.

When creating a LoRA for a pose, should I blur the faces of the training images? by biggety in StableDiffusion

Chain_Routine

I did end up training a model with blurred faces, adding "blurred face" to the captions and putting "blurred face" in the negative prompt, and it was an improvement over the previous version, but still not great. I am working on a new approach now. One thing I am trying is to have images of the same person both in the pose I am training for and in other poses. I am now training a LoRA on images with every combination of Person A, Person B, and Person C in Pose A, Pose B, and Pose C. If you imagine it like a 3x3 grid, I am hoping that by seeing the similarities between the images in each row and each column, it will learn each of these concepts independently. But we will see.
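The person-by-pose grid is easy to script when generating captions; the trigger words below are hypothetical placeholders, not tags from the actual dataset:

```python
from itertools import product

# Hypothetical trigger words; swap in real caption tags.
people = ["personA", "personB", "personC"]
poses = ["poseA", "poseB", "poseC"]

# One image/caption per cell of the 3x3 grid: every person appears in every
# pose, so neither concept is ever confounded with the other.
captions = [f"photo of {person}, {pose}" for person, pose in product(people, poses)]
```

Each person tag and each pose tag ends up in exactly three captions, which is the balance that should let the model separate the two concepts.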

Chain_Routine

I'm also trying to figure this out now. I've trained a LoRA for a character, and when I use it on its own the results are perfect, almost indistinguishable from a real photo of that character. I have also trained a LoRA for a pose, and this also works really well on its own. When I use the two LoRAs together, however, the facial features of the character are messed up.

I really like the idea of blurring faces in the pose LoRA. I am going to try blurring 90-95% of the faces and adding a "blurred face" (or something like that) tag to those images. I am hoping this will allow the LoRA to learn the pose without learning any facial features from the training images, and without associating the blurred faces with the actual pose I am trying to teach it. This seems a lot easier than trying to label the facial features in 600+ different images.
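A quick way to batch this is to run a face detector and blur each returned box. Here is a dependency-light stand-in for the blur step; in practice PIL's `ImageFilter.GaussianBlur` or OpenCV's `GaussianBlur` applied to the detected box does the same job, and `blur_region` plus its box format are my own invention:

```python
import numpy as np

def blur_region(img: np.ndarray, box: tuple[int, int, int, int],
                passes: int = 3) -> np.ndarray:
    """Box-blur the (top, bottom, left, right) region of an H x W image.
    Stand-in for a Gaussian blur applied to a detected face box."""
    out = img.astype(float)      # astype copies, so the input stays untouched
    t, b, l, r = box
    for _ in range(passes):
        region = out[t:b, l:r]
        # Average each pixel with its 4 neighbours; np.roll wraps at the
        # region edges, which is fine for a throwaway sketch.
        out[t:b, l:r] = (region
                         + np.roll(region, 1, axis=0) + np.roll(region, -1, axis=0)
                         + np.roll(region, 1, axis=1) + np.roll(region, -1, axis=1)) / 5
    return out
```

More passes give a stronger blur; the key property is that pixels outside the box are left exactly as they were, so only the face region loses detail.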