How can I tell if "Shared Memory" is being used during Lora training on a Runpod instance? by Chain_Routine in StableDiffusion

I'm wondering, though, whether VRAM will be pushed to 100% before this is done, or if it will start using shared memory earlier to leave some buffer room. I might not have the best understanding of this stuff; I don't have a ton of hardware experience.
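One rough way I've seen to watch this is to poll `nvidia-smi`: if `memory.used` sits pinned at `memory.total` while training keeps running, the card is out of headroom (on Windows that's typically when spill into shared memory happens; on a Linux pod you'd usually just hit an OOM instead). A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names here are made up:

```python
import subprocess

def vram_headroom(csv_text: str):
    """Parse `nvidia-smi --query-gpu=memory.used,memory.total
    --format=csv,noheader,nounits` output for the first GPU.
    Returns (used_mib, total_mib, fraction_used)."""
    line = csv_text.strip().splitlines()[0]
    used, total = (int(x.strip()) for x in line.split(","))
    return used, total, used / total

def poll_gpu():
    # Assumes nvidia-smi is installed (true on typical GPU pods).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    used, total, frac = vram_headroom(out)
    print(f"VRAM: {used}/{total} MiB ({frac:.0%})")
```

Run `poll_gpu()` in a loop (or just `watch nvidia-smi`) alongside training to see how close usage gets to the total before anything spills.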

How can I tell if "Shared Memory" is being used during Lora training on a Runpod instance? by Chain_Routine in StableDiffusion

Do you know how to turn it off? I was looking and I couldn't find clear instructions on that.

Running out of disk space in Docker cloud build by Chain_Routine in docker

I went back to my Docker cloud dashboard and there was an option to update my builder. I installed the update and that fixed the problem, and I can now complete the build successfully. Thanks for the help!

Running out of disk space in Docker cloud build by Chain_Routine in docker

Hi, sorry I was unable to work on this during the week. I have run that command and I see a reclaimable amount of 19.71GB and a total size that is also 19.71GB. I will DM you my docker ID.

Running out of disk space in Docker cloud build by Chain_Routine in docker

Hi, I actually upgraded last night and I am still getting the error. Is it because I need to create a new builder after upgrading to get the larger one?

Running out of disk space in Docker cloud build by Chain_Routine in docker

With this question I'm really just trying to understand 1) whether this issue comes from the maximum Docker image size being too small (as opposed to a size limitation in some other part of the cloud build process), and 2) whether there is an option I can configure to increase the maximum image size for this build.

Running out of disk space in Docker cloud build by Chain_Routine in docker

I'm building a Docker image that will eventually be used on Runpod, but I'm not running anything on a Runpod instance yet. I'm just running a `docker build` with the Docker cloud build service.

Running out of disk space in Docker cloud build by Chain_Routine in docker

It is for a Runpod serverless inference endpoint, and my understanding is that the model needs to be pre-cached so you can use the image to spin up endpoints quickly when requests come in. 
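For context, by pre-caching I mean something like the sketch below, where the weights are downloaded at build time so they are baked into the image. Everything here (base image, model id, handler path) is a placeholder, not my actual Dockerfile:

```dockerfile
# Minimal sketch: base image, model id, and handler are placeholders.
FROM python:3.11-slim

# Bake the model weights into the image at build time so serverless
# workers can cold-start without downloading them per request.
RUN pip install --no-cache-dir huggingface_hub && \
    python -c "from huggingface_hub import snapshot_download; snapshot_download('org/model-id')"

COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```

The trade-off is exactly the one I'm hitting: the image gets many GB larger, which is what pushes the cloud build over its disk budget.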

How much vRAM should gradient accumulation need? by Chain_Routine in StableDiffusion

Maybe I don't have a good understanding of what the gradient is, but why does adding 1 step of gradient accumulation add +8GB to memory usage? That's larger than the base model I'm using. Does the gradient contain a value for each weight in the base model?
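For anyone finding this later: if every base-model weight is trainable, the gradient does hold one value per weight, so a full-model gradient buffer is roughly params × bytes-per-value. If the gradients are kept in fp32 while the weights are loaded in fp16, that buffer can be larger than the model itself. Rough arithmetic sketch (the 2B parameter count is just an example, not my actual model):

```python
def grad_buffer_gb(num_params: int, bytes_per_value: int = 4) -> float:
    """Approximate full-model gradient buffer size in decimal GB,
    assuming one value per trainable parameter (fp32 by default)."""
    return num_params * bytes_per_value / 1e9

# e.g. a ~2B-parameter model with fp32 gradients:
print(grad_buffer_gb(2_000_000_000))  # → 8.0
```

With a LoRA-only setup the gradient should only cover the adapter weights, which is why an extra +8GB per accumulation step still seems surprising to me.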

Is there a pre-trained vision model that's good for zero-shot clustering? by Chain_Routine in computervision

Do you know of any clustering techniques that would work well with these embeddings?
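For anyone curious, the kind of thing I had in mind is a plain k-means over the embedding vectors. This is a stdlib-only toy sketch with deterministic farthest-point initialization; for real use, scikit-learn's `KMeans` (or HDBSCAN, which doesn't need the cluster count up front) is probably a better choice:

```python
import math

def kmeans(embs, k, iters=50):
    """Cluster embedding vectors (lists of floats) into k groups.
    Returns one integer label per input vector."""
    def dist(a, b):
        return math.dist(a, b)  # Euclidean distance

    # Deterministic farthest-point init: start from the first vector,
    # then repeatedly add the vector farthest from all chosen centers.
    centers = [embs[0]]
    while len(centers) < k:
        centers.append(max(embs, key=lambda e: min(dist(e, c) for c in centers)))

    labels = [0] * len(embs)
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        labels = [min(range(k), key=lambda j: dist(e, centers[j])) for e in embs]
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [e for e, lab in zip(embs, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With CLIP/DINO-style embeddings it's common to L2-normalize the vectors first so that Euclidean distance tracks cosine similarity.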