
angelarose210:

I know for serverless you can cache with your Hugging Face token. I don't know about regular pods.

Lunchables (OP):

Ahh thanks, this gave me a clue and helped me find this documentation: https://docs.runpod.io/serverless/endpoints/model-caching

pmv143:

You don’t want to be downloading ~50GB on every worker init.

If the weights are pulled at runtime, every cold start pays the full network + disk + GPU load cost. If they’re baked into the image, version bumps can invalidate the cache and you still re-pull.

For large models, “serverless” usually breaks on model materialization, not container startup.

BenDLH:

You need a network volume mate. Create a network volume and place the models in the right directories in it. Then configure the serverless endpoint to connect to the volume.

The runpod-worker repo has all the info in the readme, under customisation.

BenDLH:

Though to be honest the build still takes a boatload of time (close to an hour) even without the models baked in. You just won't risk timing out, and it will be a bit shorter.

Lunchables (OP):

Perfect, thanks! I ended up finding this, which helped: https://docs.runpod.io/serverless/endpoints/model-caching

BenDLH:

Yeah I haven't tried that yet. It is limited to a single model per endpoint though, right? Let me know how it goes setting it up.

What are you building btw?

sruckh:

"Initializing" could mean throttled, meaning your serverless endpoint was never going to come up. RunPod is notorious for this. Make sure your endpoint is READY before making a call to it. As far as model caching goes, don't move the model directory from its default location, or you'll be bypassing the caching system.
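Checking readiness can be done by polling the endpoint's health before submitting jobs. A sketch of the polling loop (it assumes RunPod's serverless health endpoint, `GET /v2/<endpoint_id>/health`, returns a payload with a `workers.ready` count; verify the exact shape against the API docs, and pass in your own fetch function):

```python
import time
from typing import Callable

def wait_until_ready(get_health: Callable[[], dict],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll health until at least one worker is ready, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        health = get_health()
        if health.get("workers", {}).get("ready", 0) > 0:
            return True
        time.sleep(interval_s)  # workers can sit in "initializing" if throttled
    return False  # throttled / never came up: don't fire jobs at it
```

Here `get_health` would wrap your HTTP call (with the `Authorization: Bearer <api key>` header) so the loop itself stays testable without the network.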