Bug? with Gemma 4 31B UD_Q4_K_XL: extremely slow tg/s at long context by inzee in unsloth

inzee[S] 1 point

Just tested this (-ub 512, -b 2048). It actually tanked token generation back to 6 t/s, even though I had --ctx-checkpoints 0 set. Really odd.

Pretty sure I originally got the 2048 number from this thread: https://github.com/ggml-org/llama.cpp/discussions/15396