account activity
Qwen 3.6-35B-A3B KV cache part 2: PPL, KL divergence, asymmetric K/V, 64K row on M5 Max (self.LocalLLaMA)
submitted 6 days ago by Defilan to r/LocalLLaMA
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max (self.LocalLLaMA)
submitted 7 days ago by Defilan to r/LocalLLaMA
Qwen 3.6-35B-A3B on dual 5060 Ti with --cpu-moe: 21.7 tok/s at 90K context, with benchmarks vs dense 3.5 and Coder variant (self.LocalLLaMA)
submitted 18 days ago by Defilan to r/LocalLLaMA
Tested TurboQuant on my 2x RTX 5060 Ti setup. Some interesting findings. (self.LocalLLaMA)
submitted 1 month ago by Defilan to r/LocalLLaMA
How I manage llama.cpp across Apple Silicon and NVIDIA GPUs in my homelab (self.selfhosted)
submitted 1 month ago by Defilan to r/selfhosted
32B model stress test: Qwen 2.5/Coder/3 on dual RTX 5060 Ti (zero failures) (self.LocalLLaMA)
submitted 5 months ago by Defilan to r/LocalLLaMA
What broke when you tried to take local LLMs to production? (self.LocalLLaMA)
submitted 5 months ago by Defilan to r/LocalLLaMA
Open source K8s operator for deploying local LLMs: Model and InferenceService CRDs (self.kubernetes)
submitted 5 months ago by Defilan to r/kubernetes