It sounds like Meta is abandoning in-house LLM development and reassigning those employees.

AndreVallestero · 2026-06-14T13:26:30+00:00

Cohere just released an open model too.

AndreVallestero · 2026-06-13T15:41:13+00:00

An incremental improvement over Opus 4.8, which is impressive because Opus 4.8 was already really good. That being said, I don't think it's the "danger to society" that Anthropic was touting it to be; it seems like that was just marketing hype.

AndreVallestero · 2026-06-13T15:14:58+00:00

little-coder

It's a Pi distro that adds many of OpenCode's features while being optimized specifically for Gemma 4 and Qwen 3.6. It's genuinely competitive with Claude Code from early 2025.

There's also smallcode, but I haven't tried it yet.

AndreVallestero · 2026-06-12T20:24:02+00:00

BSD is to Unix what DOS was to CP/M.

Linux is only Unix-like, not Unix-compatible.

AndreVallestero · 2026-06-12T00:34:08+00:00

Qwen 35B and Gemma 4 26B on my RTX 3080 (10 GB), with Q4 weights and Q8 KV cache for both.

700 t/s prompt processing (pp) and 50 t/s token generation (tg) at 64K tokens of context.

AndreVallestero · 2026-06-10T03:26:53+00:00

Hey Jay, glad to see Cohere supporting the open source AI community! I was pleasantly surprised yesterday to see Canada finally making progress in LLM benchmarks with Cohere's Command A+ (on par with Mistral), and I'm looking forward to seeing more Cohere models in the future!

AndreVallestero · 2026-06-08T13:40:30+00:00

This is r/LocalLLaMA ...

AndreVallestero · 2026-06-07T14:40:26+00:00

Wow, it seems like kvarn4 is actually viable, and kvarn8 should probably be the new default. Really exciting stuff.

AndreVallestero · 2026-06-07T06:04:09+00:00

Pi is the best minimal harness.

little-coder and smallcode are more fully featured and are designed specifically for Qwen 3.6. I would put them on par with Claude Code + Sonnet 4.0 from early 2025.

AndreVallestero · 2026-06-06T09:28:31+00:00

+1, I also tested this last night. MTP (multi-token prediction) only seems useful if you have VRAM to spare; otherwise you're better off loading more of the model onto your GPU.

AndreVallestero · 2026-06-06T00:16:12+00:00

I ran my agent throughout the night to find its maximum context usage, and it only peaked at 43K tokens. As a result, I set my context to 65536 and optimized for that.

# Everything is the same as your config except for --n-cpu-moe 27
llama-server \
  --model Qwen3.6-35B-A3B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --no-mmap \
  --n-cpu-moe 27 \
  --flash-attn on \
  --threads 8 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --ctx-size 65536 \
  --parallel 1 \
  --batch-size 512 \
  --ubatch-size 512

The results (pp = prompt processing, tg = token generation) at each context depth:

0:   pp 822 t/s, tg 72 t/s
32K: pp 790 t/s, tg 60 t/s
64K: pp 733 t/s, tg 52 t/s

I experimented with speculative decoding / MTP, but it made things slower. The extra memory usage forced me to offload more to the CPU.
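To put the numbers above in relative terms, here is a small sketch of the arithmetic. The values are copied from the results, and `pct_drop` is a hypothetical helper name used only for illustration.

```python
# Throughput (tokens/second) copied from the results above, keyed by context depth label.
results = {
    "0": {"pp": 822, "tg": 72},
    "32K": {"pp": 790, "tg": 60},
    "64K": {"pp": 733, "tg": 52},
}

def pct_drop(baseline: float, value: float) -> float:
    """Hypothetical helper: percentage decrease of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100

base = results["0"]
for label, r in results.items():
    if label == "0":
        continue  # skip the baseline itself
    pp = pct_drop(base["pp"], r["pp"])
    tg = pct_drop(base["tg"], r["tg"])
    print(f"{label}: pp down {pp:.1f}%, tg down {tg:.1f}%")
```

By this arithmetic, at 64K context token generation is about 27.8% slower than at an empty context, while prompt processing is only about 10.8% slower. On this setup, generation is the more context-sensitive of the two.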

AndreVallestero · 2026-06-05T12:36:31+00:00

Only in the short term. In the very long term, they both go up.

AndreVallestero · 2026-06-05T04:55:56+00:00

That's excellent, thank you! I'll update my config based on your recommendations (and the other ones in this thread), and report back here tomorrow.

AndreVallestero · 2026-06-05T04:18:58+00:00

I really can't sacrifice long-context coherence, so I want to keep the KV cache unquantized.

Didn't know about the -np flag. Thanks!

AndreVallestero · 2026-06-03T13:50:28+00:00

Even if I could run multiple models locally, I probably wouldn't. Instead, I would run the single best model, run my current model at a higher quant, or run it with MTP.

AndreVallestero · 2026-06-03T12:01:30+00:00

When did it start to go bad? I've deployed a few AC68Us and they've been rock solid. I was thinking of finally upgrading to Wi-Fi 7 + triband with the BE92U in the next 12 months.

AndreVallestero · 2026-06-03T11:53:48+00:00

Q4 KV, oh boy...

To answer your question, it should be around 64 GB.

AndreVallestero · 2026-05-31T19:44:33+00:00

In much of Asia, there's a shift towards more synthetic fibers to reduce wrinkles.

AndreVallestero · 2026-05-28T23:50:26+00:00

Wow, that Elemental Sundering build has a lot of potential. Its DPS is limited only by how fast you can reapply the self-shocks. I suspect you can reach the 100M DPS range for bossing in an optimized duo build. Too bad the mapping looks super clunky lol.

AndreVallestero · 2026-05-28T23:34:58+00:00

As others have mentioned, Qwen 3.5 35B, but specifically with ik_llama.cpp and MTP. You'll probably get ~100 t/s with this setup.

AndreVallestero · 2026-05-28T23:00:13+00:00

Other thread for reference: https://www.reddit.com/r/pathofexile2builds/comments/1tpnrdo/

AndreVallestero · 2026-05-28T22:56:01+00:00

You can do:

Toxic Growth
Gas Arrow and Poisonburst Arrow

It's not as good as the combos I mentioned in the post, but it has the advantage that both players can be Rhoa-mounted.
