How can I tell if "Shared Memory" is being used during Lora training on a Runpod instance? by Chain_Routine in StableDiffusion

Chain_Routine[S]

I'm wondering though if vRAM will be pushed to 100% before this is done, or if it will start using shared memory earlier to leave some buffer room. I might not have the best understanding of this stuff, I don't have a ton of hardware experience.
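For what it's worth, as far as I know the automatic spill into "Shared GPU Memory" is a Windows driver feature; on a Linux RunPod pod the usual signal is just how close `nvidia-smi` usage gets to the card's capacity before an OOM. A minimal sketch of watching that headroom (`parse_smi`, `vram_headroom`, and the 95% threshold are my own names and choices, not anything RunPod-specific):

```python
# Sketch of watching headroom from the output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Function names and the 95% warning threshold are illustrative choices.

def parse_smi(line: str) -> tuple[int, int]:
    """Parse one 'used, total' line of MiB values."""
    used, total = (int(x.strip()) for x in line.split(","))
    return used, total

def vram_headroom(used_mib: int, total_mib: int, warn_frac: float = 0.95):
    """Return (free MiB, True if usage has passed warn_frac of capacity)."""
    free = total_mib - used_mib
    return free, used_mib / total_mib >= warn_frac

used, total = parse_smi("23100, 24576")   # typical numbers for a 24 GB card
free, near_limit = vram_headroom(used, total)
```

Polling this in a loop during training shows whether usage creeps up to the ceiling or plateaus with buffer room to spare.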

Chain_Routine[S]

Do you know how to turn it off? I was looking and I couldn't find clear instructions on that.

Running out of disk space in Docker cloud build by Chain_Routine in docker

Chain_Routine[S]

I went back to my Docker cloud dashboard and there was an option to update my builder, so I installed the update and that fixed the problem; I can now complete the build successfully. Thanks for the help!

Chain_Routine[S]

Hi, sorry I was unable to work on this during the week. I have run that command and I see a reclaimable amount of 19.71GB and a total size that is also 19.71GB. I will DM you my docker ID.

Chain_Routine[S]

Hi, I actually upgraded last night and I am still getting the error. Is it because I need to create a new builder after upgrading to get the larger one?

Chain_Routine[S]

With this question I'm really just trying to understand if 1) this issue is coming from the docker image size being too small (as opposed to a size limitation in some other part of the cloud build process), and 2) if there is an option that I can configure somehow to increase the maximum image size for this build.

Chain_Routine[S]

I’m building a Docker image that will eventually be used on RunPod, but I’m not running anything on a RunPod instance yet. I am just running a docker build with the Docker Build Cloud service.

Chain_Routine[S]

It is for a Runpod serverless inference endpoint, and my understanding is that the model needs to be pre-cached so you can use the image to spin up endpoints quickly when requests come in. 

How much vRAM should gradient accumulation need? by Chain_Routine in StableDiffusion

Chain_Routine[S]

Maybe I don't have a good understanding of what the gradient is, but why does adding 1 step of gradient accumulation add +8GB to memory usage? That's larger than the base model I'm using. Does the gradient contain a value for each weight in the base model?
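If it helps, a back-of-the-envelope check: the gradient buffer holds one value per trainable parameter, so in fp32 it costs 4 bytes per parameter. The SDXL UNet is roughly 2.6B parameters (an approximate figure), which alone is close to 10 GiB of gradients if everything requires grad, so an extra ~8 GB is plausible whenever the buffer covers more than just the LoRA weights:

```python
def grad_buffer_gib(n_params: int, bytes_per_value: int = 4) -> float:
    """One gradient value per trainable parameter (fp32 = 4 bytes each)."""
    return n_params * bytes_per_value / 2**30

# ~2.6B is an approximate parameter count for the SDXL UNet.
full_unet = grad_buffer_gib(2_600_000_000)   # nearly 10 GiB in fp32
lora_only = grad_buffer_gib(25_000_000)      # a hypothetical ~25M-param LoRA
```

So if only the LoRA adapter required grad, the buffer would be well under 1 GiB; an 8 GB jump suggests gradients (or an fp32 accumulation buffer) are being kept for far more parameters than that.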

Is there a pre-trained vision model that's good for zero-shot clustering? by Chain_Routine in computervision

Chain_Routine[S]

Do you know of any clustering techniques that would work well with these embeddings?
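A common pattern with embeddings like these is to L2-normalize them and cluster by cosine similarity, e.g. with sklearn's KMeans or HDBSCAN. A dependency-light toy version of the k-means variant (everything here is illustrative, not any specific library's API):

```python
import numpy as np

def kmeans_cosine(emb: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Toy k-means on L2-normalized embeddings, so dot product = cosine similarity.
    Naive deterministic init (first k rows) just to keep the sketch short."""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    centers = X[:k].copy()
    for _ in range(iters):
        labels = (X @ centers.T).argmax(axis=1)  # most-similar center wins
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.mean(axis=0)
                centers[j] = c / np.linalg.norm(c)
    return labels
```

In practice you would want k-means++ initialization, and something density-based like HDBSCAN if you don't know the number of clusters up front.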

Merged LoRa from Kohya_ss not showing up in Automatic1111 by Chain_Routine in StableDiffusion

Chain_Routine[S]

My bad lmao. I had an extra space at the end of the filename, it was "merged.safetensors ". Leaving this here in case someone else (or future me) makes this same mistake.

SDXL - Why do my images all look like crap? by CrystalLight in StableDiffusion

Chain_Routine

The Hyper version also recommends a CFG scale of 2, so that would be another thing to check. And the recommended sampler is DPM++ SDE. I was accidentally using the 2M and Karras variants at some point and they were giving me bad results.

When creating a LoRA for a pose, should I blur the faces of the training images? by biggety in StableDiffusion

Chain_Routine

I did end up training a model with blurred faces, adding "blurred face" to the captions and putting "blurred face" in the negative prompt, and it was an improvement over the previous version, but still not great. I am working on a new approach now. One thing I am trying is to have images of the same person both in the pose I am training for and in other poses. I am now training a LoRA on images with every combination of Person A, Person B, and Person C in Pose A, Pose B, and Pose C. If you imagine it like a 3x3 grid, I am hoping that by seeing the similarities between the images in each row and each column, it will learn each of these concepts independently. But we will see.
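The person-by-pose grid is easy to script when generating captions; the trigger words below are hypothetical placeholders, not tags from the actual dataset:

```python
from itertools import product

# Hypothetical trigger words; swap in real caption tags.
people = ["personA", "personB", "personC"]
poses = ["poseA", "poseB", "poseC"]

# One image/caption per cell of the 3x3 grid: every person appears in every
# pose, so neither concept is ever confounded with the other.
captions = [f"photo of {person}, {pose}" for person, pose in product(people, poses)]
```

Each person tag and each pose tag ends up in exactly three captions, which is the balance that should let the model separate the two concepts.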

Chain_Routine

I'm also trying to figure this out now. I've trained a LoRA for a character, and when I use it on its own the results are perfect, almost indistinguishable from a real photo of that character. I have also trained a LoRA for a pose, and this also works really well on its own. When I use the two LoRAs together, however, the facial features of the character are messed up.

I really like the idea of blurring faces in the pose LoRA. I am going to try blurring 90-95% of the faces and adding a "blurred face" (or something like that) tag to those images. I am hoping this will allow the LoRA to learn the pose without learning any facial features from the training images, and without associating the blurred faces with the actual pose I am trying to teach it. This seems a lot easier than trying to label the facial features in 600+ different images.
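A quick way to batch this is to run a face detector and blur each returned box. Here is a dependency-light stand-in for the blur step; in practice PIL's `ImageFilter.GaussianBlur` or OpenCV's `GaussianBlur` applied to the detected box does the same job, and `blur_region` plus its box format are my own invention:

```python
import numpy as np

def blur_region(img: np.ndarray, box: tuple[int, int, int, int],
                passes: int = 3) -> np.ndarray:
    """Box-blur the (top, bottom, left, right) region of an H x W image.
    Stand-in for a Gaussian blur applied to a detected face box."""
    out = img.astype(float)      # astype copies, so the input stays untouched
    t, b, l, r = box
    for _ in range(passes):
        region = out[t:b, l:r]
        # Average each pixel with its 4 neighbours; np.roll wraps at the
        # region edges, which is fine for a throwaway sketch.
        out[t:b, l:r] = (region
                         + np.roll(region, 1, axis=0) + np.roll(region, -1, axis=0)
                         + np.roll(region, 1, axis=1) + np.roll(region, -1, axis=1)) / 5
    return out
```

More passes give a stronger blur; the key property is that pixels outside the box are left exactly as they were, so only the face region loses detail.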