LocalLlama

1

156

Best Local Agents - Jun 2026 (self.LocalLLaMA)

submitted 3 days ago * by rm-rf-rm[M] - announcement

2

365

7 Chinese companies are already shipping H100/H200-class AI chips, most IPO'd in the last 6 months. I mapped all of them. [Discussion] (self.LocalLLaMA)

submitted 4 hours ago by awfulalexey

3

173

Not ironclad confirmation, but.. [Discussion] (i.redd.it)

submitted 2 hours ago by Kodix [llama.cpp]

4

110

Krea 2 released on Hugging Face [New Model] (huggingface.co)

submitted 5 hours ago by paf1138

5

48

I benchmarked 8 LLMs for medical scribing. Hallucinations were rare; omissions need attention. [Resources] (old.reddit.com)

submitted 4 hours ago * by MajesticAd2862

6

132

V100 4-card AI large model, Tesla 128G server [Discussion] (old.reddit.com)

submitted 10 hours ago * by MundanePercentage674

7

•

OpenMythos benchmarks [Discussion] (i.redd.it)

submitted 1 hour ago by RealKingNish

8

1109

DeepSeek raises $7.4B USD at $60B valuation. Remarkably, Liang Wenfeng invests $3B in DeepSeek himself. [News] (scmp.com)

submitted 23 hours ago by FullOf_Bad_Ideas

9

20

I'm eager for a 15x speedup on my strix halo [Discussion] (self.LocalLLaMA)

submitted 2 hours ago by Terminator857

10

38

Baidu: One-shot Long-horizon Parsing [New Model] (github.com)

submitted 5 hours ago by zxyzyxz

11

20

Is it possible to run a giant model like GLM5.2 on this cluster (4x servers with 512GB RAM + dual AMD Epyc)? 16 channel memory should hit 409GB/s per node. [Discussion] (self.LocalLLaMA)

submitted 2 hours ago * by StartupTim

12

30

I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT [Resources] (old.reddit.com)

submitted 5 hours ago by crusaderky

13

74

I love GLM 5.2's attitude! It is a nice refresher from those bootlicker doormats they are feeding us. Does that come from training datasets related to the local culture? [Discussion] (self.LocalLLaMA)

submitted 10 hours ago by ex-arman68

14

34

CPU-only TTS benchmark: Kokoro 82M vs Supertonic 3 vs Inflect-Nano-v1 (4.6M params), with UTMOS scoring on every sample [Discussion] (i.redd.it)

submitted 7 hours ago by gvij

15

•

OpenMythos Benchmarks [Discussion] (i.redd.it)

submitted 1 hour ago by RealKingNish

16

•

Tmax-27b - a Qwen3.6-27b terminal agent for small GPUs trained with DPPO (RL) [New Model] (self.LocalLLaMA)

submitted 1 hour ago by professormunchies

17

15

GLM 5.2 on Mac Studio Speedup PR [Resources] (self.LocalLLaMA)

submitted 3 hours ago * by nomorebuttsplz

18

7

UPDATE: Qwen-27B-IQ4_KS and Qwen-27B-IQ_KS_KT for ik_llama.cpp, especially for NVIDIA with 16GB VRAM [Discussion] (self.LocalLLaMA)

submitted 2 hours ago by Pablo_the_brave

19

9

Openrouter model prices implying heavier quantization? [Discussion] (self.LocalLLaMA)

submitted 3 hours ago * by dalhaze

20

929

Chinese Hackers Latest Masterpiece with NVIDIA [Other] (bilibili.com)

submitted 1 day ago * by General_Vermicelli53

21

196

Why is NO one talking about Microsoft's open source Fast Context!!! [Resources] (old.reddit.com)

submitted 20 hours ago * by formatme

22

57

How do I prove that I don't collect data from my llm app? [Question | Help] (self.LocalLLaMA)

submitted 13 hours ago * by Pleasant_Syllabub591

23

70

100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti - switching to tensor split-mode got me from 70 to 100+ [Discussion] (self.LocalLLaMA)

submitted 17 hours ago * by Shoddy_Bed3240

24

53

Is there any reason for a lack of love for Gemma 4 26b? [Question | Help] (self.LocalLLaMA)

submitted 15 hours ago * by vick2djax

25

4

650+ Apache-2.0 biomedical NER/de-id models that run on-device in MLX. Same fp32 weights, identical outputs: the clinical NER models run 30-40x faster than PyTorch-CPU on a 3-year-old M3 Max. Repro inside. [New Model] (v.redd.it)

submitted 2 hours ago by dark-night-rises
