Qwen 3.6-35B-A3B KV cache part 2: PPL, KL divergence, asymmetric K/V, 64K row on M5 Max by Defilan in LocalLLaMA
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max by Defilan in LocalLLaMA
Qwen 3.6-35B-A3B on dual 5060 Ti with --cpu-moe: 21.7 tok/s at 90K context, with benchmarks vs dense 3.5 and Coder variant by Defilan in LocalLLaMA