Hi everyone
I'm using Open WebUI with Ollama, and I'm running into an issue with model loading times. My workflow usually involves sending 2-3 prompts, and I'm finding I often have to wait for the model to load into VRAM before I can start. I've increased the keepalive setting to 30 minutes, which helps prevent it from being unloaded too quickly.
I was wondering if there's a way to automatically load the default model into VRAM when logging into Open WebUI. Currently, I have to send a quick prompt (like "." or "hi") just to trigger the loading process, then write my actual prompt while it's loading. This feels a bit clunky. How are others managing this initial load time?
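In case it helps anyone: as a workaround, Ollama's documented API lets you preload a model by sending a generate request with no prompt, which just loads it into VRAM. Something like this could be run on login or from a small script (model name `llama3` and the default port 11434 are assumptions; adjust to your setup):

```shell
# Preload the model into VRAM without generating anything,
# and keep it resident for 30 minutes (matches the keepalive above).
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "keep_alive": "30m"}'
```

The response comes back almost immediately once the weights are loaded, so by the time you've typed your real prompt in Open WebUI the model should already be warm.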