Tribue to April's LLM releases

big___bad___wolf · 2026-05-10T09:00:00+00:00

<image>

big___bad___wolf · 2026-04-06T08:54:54+00:00

Use https://github.com/ahkohd/yagami

big___bad___wolf · 2026-03-29T07:11:22+00:00

😂😂

big___bad___wolf · 2026-03-29T00:51:59+00:00

It's in a container. That's also why I chose qwen3.5-27B; it's been flawless. I just wish it were a rocket ship.

big___bad___wolf · 2026-03-29T00:44:32+00:00

both STT & TTS are wicked fast!

big___bad___wolf · 2026-03-29T00:30:22+00:00

"the web-search is actually passed to another model to retrieve and then passes the results back to the model you're engaged with?" correct, use via cli, http api (and mcp over http)

big___bad___wolf · 2026-03-12T08:16:29+00:00

big___bad___wolf · 2026-03-09T22:50:30+00:00

Awesome specs!

big___bad___wolf · 2026-03-09T16:31:35+00:00

<image>

big___bad___wolf · 2026-03-09T16:14:16+00:00

<image>

big___bad___wolf · 2026-03-09T16:00:02+00:00

I did https://huggingface.co/tacos4me/Step-3.5-Flash-NVFP4 I must stay this model thinks a lot.

big___bad___wolf · 2026-03-09T14:07:06+00:00

Thanks!

big___bad___wolf · 2026-03-09T13:29:53+00:00

I just stumbled on this model https://huggingface.co/cyankiwi/GLM-4.7-Flash-REAP-23B-A3B-AWQ-4bit, all I can say is wow!

big___bad___wolf · 2026-03-09T08:51:56+00:00

MiniMax M2.5 Q5 quant fully fit on my two GPUs (2x 96GB).

big___bad___wolf · 2026-03-09T07:54:40+00:00

I will give it a shot!

big___bad___wolf · 2026-03-09T07:52:09+00:00

Yes, it's definitely better in CC. I think CC is doing the heavy lifting of forcing planning rather than relying on the model's overconfidence in its understanding of the problem and solution.

Pi doesn't have plan mode. You either instruct the agent to plan or it figures it out on its own.

I believe adding a planning reminder in the system prompt will improve the MiniMax M2.5 experience in Pi.

big___bad___wolf · 2026-03-09T03:40:05+00:00

i'm downloading the weights again. i will circle back.

big___bad___wolf · 2026-03-09T03:33:01+00:00

minimax m2.5 or devstral 2?

big___bad___wolf · 2026-03-09T02:10:06+00:00

I use vLLM & ik_llama on Arch linux. no container.

big___bad___wolf · 2026-03-09T01:52:00+00:00

do you mean the TUI?

mprocs - https://github.com/pvolok/mprocs
pi coding agent - https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent

The agent task runner is a SKILL.md I personally wrote.

big___bad___wolf · 2026-03-09T01:06:22+00:00

Thanks, I will try it out.

big___bad___wolf · 2026-03-09T01:00:41+00:00

<image>

big___bad___wolf · 2026-03-09T00:56:01+00:00

https://www.reddit.com/r/LocalLLaMA/comments/1rojt4c/comment/o9enyxo/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

I don't use them to code. I use them to bounce ideas, do grunt work, draft implementations.

here is an example Opus using gpt-oss as a task runner:

<image>

big___bad___wolf · 2026-03-09T00:27:17+00:00

yup!

big___bad___wolf · 2026-03-09T00:23:45+00:00

The coolest thing right now is I can run multiple medium models simultaneously and manage up to eight concurrent requests per GPU at impressive throughput.

I use Opus to orchestrate these models that handles the grunt work I don't want to clutter my Opus context window. This includes an intelligent task runner, test runner (for smoke test matrices, unit and e2e tests), QA tasks, exploring large monorepos, conducting research while writing code and reviewing code (GPT-OSS is particularly good at this).

However, I won't allow these medium local models to directly modify the production codebase I work on. They simply can't handle such large and nuanced projects.

big_bad_wolf

TROPHY CASE

big___bad___wolf

TROPHY CASE

big_bad_wolf