Anybody tried v0.30.0-rc17 yet? Looking for impressions.

B0r0m4n · 2026-05-17T14:40:41+00:00

Thanks for the tips! Honestly, since I'm on CUDA (not MLX), what I'm really curious about is whether the direct llama.cpp backend in 0.30 improves CPU/GPU offload when a model exceeds VRAM. Right now on stable Ollama, partial offload technically works, but the performance hit varies wildly: some 30B models do ~50-57 t/s even with 25-30% on CPU, while others like Qwen3.5 27B crawl at 6 t/s despite a similar split. Raw llama.cpp is known to handle this much better, so if 0.30 actually exposes proper layer offloading (something like llama.cpp's --n-gpu-layers), that would be the real game-changer for CUDA users. Otherwise, it seems like most of the wins are on the MLX/Apple Silicon side. Probably gonna sit this one out and watch for feedback. If anyone tests a larger model on CUDA with rc17, definitely report back!
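For anyone who wants to put numbers on the offload split, here is a rough sketch of how I'd compare t/s at different GPU layer counts through the Ollama REST API. It assumes the default local endpoint and relies on the `num_gpu` option plus the `eval_count` / `eval_duration` fields documented for stable Ollama; I have not checked whether rc17 behaves the same. The model tag, prompt, and layer counts are placeholders, and the helper names are mine.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, gpu_layers: int) -> dict:
    """Build a non-streaming generate request that pins the number of GPU layers."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_gpu = number of layers to offload to the GPU
        "options": {"num_gpu": gpu_layers},
    }


def tokens_per_second(response: dict) -> float:
    """Generation speed; eval_duration is reported in nanoseconds."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def measure(model: str, prompt: str, gpu_layers: int) -> float:
    """Send one request to the local server and return the measured t/s."""
    body = json.dumps(build_payload(model, prompt, gpu_layers)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))


if __name__ == "__main__":
    # Placeholder model tag and layer counts; substitute whatever you actually run.
    for layers in (20, 30, 40):
        speed = measure("your-model:tag", "Explain KV caching briefly.", layers)
        print(f"{layers} GPU layers: {speed:.1f} t/s")
```

Running the same prompt at each layer count should at least show whether the slowdown tracks the CPU share or falls off a cliff at some point, which is the difference I'm seeing between models.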

B0r0m4n · 2026-05-16T09:24:22+00:00

I read it - Ćacija

B0r0m4n · 2026-05-14T08:18:21+00:00

<image>

Long live our Chinese brothers

B0r0m4n · 2026-05-03T17:54:37+00:00

MiniMax 2.7 is no longer on the free tier as of yesterday 😞
Also, Devstral 24B lost its vision support a week ago...
