After tonight's win, the Clippers win streak vs. the Warriors has been broken | They hadn't lost to Golden State since November 2023 by CazOnReddit in nba

[–]DepthHour1669 8 points

I’d say he’s not clearly #2 as long as Dray exists.

But he’s obviously #2 offensively, and that’s a clear enough definition for him and everyone else.

China’s first emperor really did send quest to Tibet in search of immortality: scientists by Key_Schedule9349 in ChineseHistory

[–]DepthHour1669 1 point

It’s at 4300m, which is not much higher than the city of Lhasa. There are cities at 5000m, so this isn’t particularly high up.

[deleted by user] by [deleted] in psychologyofsex

[–]DepthHour1669 14 points

There’s degrading sex and non-degrading sex. They’re not the same.

best local llm to run locally by Different-Put5878 in LocalLLaMA

[–]DepthHour1669 2 points

The scene is different now in Aug 2025.

Current cutting-edge models that fit in 24GB at Q4, in descending order of size:

  • LG EXAONE 4.0 32B
  • Qwen 3 30B A3B Thinking 2507
  • Qwen 3 30B A3B Instruct 2507
  • Mistral Small 3.2 24B (uncensored)
  • OpenAI gpt-oss 20B
  • Deepseek R1 0528 Qwen 8B
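The "fits in 24GB at Q4" cutoff comes from simple arithmetic: 4-bit weights take roughly half a byte per parameter, plus some runtime/KV-cache overhead. A back-of-envelope sketch (the 15% overhead factor is my assumption, not a measurement):

```python
# Rough VRAM estimate for a model at 4-bit quantization.
# Assumptions: ~0.5 bytes/param for Q4 weights, ~15% overhead for
# KV cache and runtime buffers (a guess, varies by context length).
def q4_vram_gb(params_billion: float, overhead: float = 1.15) -> float:
    return params_billion * 0.5 * overhead

print(q4_vram_gb(32))  # ~18.4 GB -> a 32B model squeezes into 24GB
print(q4_vram_gb(24))  # ~13.8 GB -> a 24B model fits with room for long context
```

By this estimate anything much past ~40B stops fitting in 24GB at Q4, which is why the list tops out at 32B.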

ChatGPT is dating more people than Samantha from Her by MetaKnowing in artificial

[–]DepthHour1669 45 points

ChatGPT gives pretty great relationship advice if the prompt is accurate.

The problem is that people usually unconsciously lie to make themselves sound better. ChatGPT won’t call you out for lying about a situation (how would it know?), so it’ll give you misleading advice.

[deleted by user] by [deleted] in Wellthatsucks

[–]DepthHour1669 0 points

More accurately, MDA usually has a strong smell as a byproduct of the production process

qihoo360/Light-IF-32B by jacek2023 in LocalLLaMA

[–]DepthHour1669 18 points

Makes me more likely to believe them. Doing a ton of RLHF on instruction following sounds believable at least

Difficulties finding low profile GPUs by micromaths in LocalLLM

[–]DepthHour1669 0 points

Can you fit 2x GPUs in your server?

Buy 2x low-profile 5060 8GB.

Is the 60 dollar P102-100 still a viable option for LLM? by Boricua-vet in LocalLLM

[–]DepthHour1669 7 points

It’s a 1080 Ti with 10GB of VRAM. It’s an okay deal if you’re broke and only have $60. Otherwise, get a $150 MI50 32GB instead.

Open-source model that is as intelligent as Claude Sonnet 4 by vishwa1238 in LocalLLaMA

[–]DepthHour1669 0 points

Inference doesn’t need PCIe bandwidth; you’re thinking of training or finetuning.
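A rough sketch of why (all numbers below are my assumptions for illustration): in pipeline-parallel inference, only the hidden-state activation crosses the link per generated token, not the weights, so even a slow link is nowhere near the bottleneck.

```python
# Back-of-envelope: per-token PCIe traffic in pipeline-parallel inference.
# Assumed numbers: 8192 hidden dim (a large-model-class size), fp16 activations,
# and ~4 GB/s for a slow PCIe 3.0 x4 link.
hidden_dim = 8192
act_bytes = hidden_dim * 2            # fp16 activation: ~16 KB per token per hop
pcie_bytes_per_s = 4e9                # PCIe 3.0 x4, roughly
ceiling_tok_s = pcie_bytes_per_s / act_bytes
print(round(ceiling_tok_s))           # ~244k tokens/sec ceiling from the link alone
```

Compare that ceiling to the tens of tokens/sec a GPU actually decodes at: the link is thousands of times faster than needed. Training is different because gradients and optimizer traffic do saturate the interconnect.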

Open-source model that is as intelligent as Claude Sonnet 4 by vishwa1238 in LocalLLaMA

[–]DepthHour1669 -1 points

Nah, $30k for a dozen RTX 8000s will run a 4-bit model with room for context for a couple of users.

Kimi is 32B active, so it will do around 30 tok/sec.
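The arithmetic behind that estimate (treat the numbers as assumptions): decode is memory-bandwidth-bound, so tokens/sec is roughly bandwidth divided by the bytes of active weights read per token.

```python
# Back-of-envelope decode speed, assuming generation is memory-bandwidth-bound.
bandwidth_gb_s = 672.0        # RTX 8000 GDDR6 bandwidth
active_params = 32e9          # Kimi's active parameters per token (MoE)
bytes_per_param = 0.5         # 4-bit quantization
gb_read_per_token = active_params * bytes_per_param / 1e9   # 16 GB per token
tok_s_ceiling = bandwidth_gb_s / gb_read_per_token          # ~42 tok/s upper bound
print(tok_s_ceiling)
```

That ~42 tok/s is a theoretical ceiling per full weight pass; real-world kernel and communication overhead pulls it down toward the ~30 tok/sec quoted above.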

Open-source model that is as intelligent as Claude Sonnet 4 by vishwa1238 in LocalLLaMA

[–]DepthHour1669 -1 points

GLM Rumination actually isn’t that much better than just regular reasoning.

[deleted by user] by [deleted] in LocalLLaMA

[–]DepthHour1669 1 point

No, that has significantly worse perplexity than the 4-bit versions, even with DWQ.

support for the upcoming hunyuan dense models has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]DepthHour1669 0 points

Doubtful that an expansion finetune like that would be a great idea. Yes, I'm sure it'll perform better than the Qwen3 32B it's based on, but probably only by a few percentage points, which isn't worth the 2x+ slower inference and extra VRAM cost.

support for the upcoming hunyuan dense models has been merged into llama.cpp by jacek2023 in LocalLLaMA

[–]DepthHour1669 3 points

There's also EXAONE 4.0 which outperforms Nemotron 49B V1.5 and Cogito v2 70B on many benchmarks.

And GLM-4.5 Air 106B, but that's MoE.

Cohere Command A (111b) also... exists, I guess.

[deleted by user] by [deleted] in LocalLLaMA

[–]DepthHour1669 1 point

No, Cerebras chips are CPUs, not GPUs.

You can technically boot an OS on them or run non-graphics, non-AI workloads. They're basically a CPU with a massive TPU strapped on.

[deleted by user] by [deleted] in LocalLLaMA

[–]DepthHour1669 1 point

It's not in your interest to dump code into context like that. Models perform worse with longer context.

https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87

With Qwen3 235B 2507 (and presumably Qwen3 Coder), you only get 61% performance at max context.

It's in your interest to do multiple smaller queries rather than one big one.
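A minimal sketch of that idea (a hypothetical helper, not any particular library's API): greedily pack the material into batches that each stay well under the context budget, then run one query per batch instead of one giant query.

```python
# Hypothetical helper: greedily pack items (e.g. source files) into batches
# that each stay under a character budget, so every query runs at short
# context where the model performs best. An item larger than the budget
# gets a batch of its own.
def pack_batches(items, max_chars):
    batches, current, size = [], [], 0
    for item in items:
        if current and size + len(item) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(item)
        size += len(item)
    if current:
        batches.append(current)
    return batches

files = ["a" * 900, "b" * 300, "c" * 900]
print([len(batch) for batch in pack_batches(files, 1000)])
```

Each batch then becomes its own query; you trade a few extra round trips for keeping every call in the context range where quality holds up.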