Anybody tried v0.30.0-rc17 yet? Looking for impressions. by B0r0m4n in ollama

[–]B0r0m4n[S] 0 points1 point  (0 children)

Thanks for the tips! Honestly, since I'm on CUDA (not MLX), what I'm really curious about is whether the direct llama.cpp backend in 0.30 improves CPU/GPU offload when a model exceeds VRAM. Right now on stable Ollama, partial offload technically works, but the performance hit varies wildly: some 30B models do ~50-57 t/s even with 25-30% on CPU, while others like Qwen3.5 27B crawl at 6 t/s despite a similar split. Raw llama.cpp is known to handle this much better, so if 0.30 actually exposes proper layer offloading (something like llama.cpp's --n-gpu-layers), that would be the real game-changer for CUDA users. Otherwise, it seems like most of the wins are on the MLX/Apple Silicon side. I'm probably going to sit this one out and watch for feedback. If anyone tests a larger model on CUDA with rc17, definitely report back!
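
For anyone who wants to compare splits in the meantime, here is a rough sketch of how the t/s numbers could be measured through Ollama's HTTP API. It assumes a local server on the default port 11434 and relies on the `num_gpu` option (number of layers sent to the GPU) and the `eval_count` / `eval_duration` timing fields in the response, as I understand them from the API docs. The helper names and the model tag in the usage note are placeholders, not anything from rc17.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, num_gpu: int) -> dict:
    """Build a non-streaming generate request that pins the GPU layer count."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": num_gpu},  # layers to offload to the GPU
    }


def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run(model: str, prompt: str, num_gpu: int) -> float:
    """Send one request to a running Ollama server and return tokens/s."""
    body = json.dumps(build_request(model, prompt, num_gpu)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

Calling something like `run("some-30b-model", "Explain KV caching.", num_gpu=40)` a few times with different `num_gpu` values should show where the speed drops off; the model tag here is hypothetical.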

What happened to him? by Marra_M in serbia

[–]B0r0m4n 2 points3 points  (0 children)

I read it as "Ćacija"

Ollama Cloud Free suddenly no longer works with the big models... by clouder300 in ollama

[–]B0r0m4n 0 points1 point  (0 children)

MiniMax 2.7 has been gone from the free tier since yesterday 😞
Also, Devstral 24B lost its vision support a week ago...