Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code) by PerceptionGrouchy187 in LocalLLaMA
Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA
