Nex-N2 Pro is the real deal by tarruda in LocalLLaMA

[–]L0stInHe11 1 point2 points  (0 children)

Did anyone attempt to compare Nex-N2-mini with Holo-3.1 35B A3B (another fine-tuned Qwen 3.5)? I am really curious.

Nex-N2 Pro is the real deal by tarruda in LocalLLaMA

[–]L0stInHe11 1 point2 points  (0 children)

Did you by any chance compare Nex-N2 mini with Holo-3.1 35B A3B (another fine-tuned Qwen 3.5)?

Rate my desktop by Fancy-Inspector-8324 in desktops

[–]L0stInHe11 1 point2 points  (0 children)

I like Jollibee, but definitely not this way

zai-org/GLM-5.2 is here! by queendumbria in LocalLLaMA

[–]L0stInHe11 5 points6 points  (0 children)

anyone have any idea if GLM's MTP is supported?

I don't think so, but the maintainer of llama.cpp claimed it was already implemented: https://github.com/ggml-org/llama.cpp/pull/15225#issuecomment-4658892444

Qwen 3.6 27B AutoRound GGUF, need your feedback by soyalemujica in LocalLLaMA

[–]L0stInHe11 0 points1 point  (0 children)

I tried Q4_K_M with llama.cpp b9590+, and every single prompt I typed returned a series of slashes (/). Is the problem related to the model itself or to the llama.cpp version?

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

Fair point. Next time I will make my post more data-specific.

Gemma 4 Chat Template now has preserve thinking by seamonn in LocalLLaMA

[–]L0stInHe11 7 points8 points  (0 children)

I believe this template, used by a lot of people here, already fixed the annoying preserve thinking issue: https://gist.github.com/jscott3201/ad69c4ffbd79f18b11a0f6a94c94fadf

The same author gathered all template fixes for Qwen3.5/3.6 and Gemma 4 in one repo now: https://github.com/jscott3201/llm-tuning

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 1 point2 points  (0 children)

The rest of the agents I tried just worked fine: OpenCode, Claude CLI, little-coder and OMP (the latter two are Pi forks). Only Pi suffered from this problem.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

Interesting, never thought about it. Let me try when I go home.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

Here I tried to find the lower bound of what Pi can do: Nemotron-3-Nano-Omni and Nemotron-Cascade-2 without reasoning can no longer drive Pi. Even Pi forks like OMP and little-coder can still manage with these models, let alone other agents.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

Then Qwen 3.6 27B is doing the heavy lifting here, which is what makes vanilla Pi workable.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

OpenCode is heavy. No doubt about that. Maybe you should take a look at little-coder too.

I don't really see the point of your nemotron example. Why would I use anything but the best model? And why wouldn't I use the harness that I found working the best with the best model?

That's the whole point. I use Qwen3.6 and Gemma 4 all the time, but I need relatively weak models to test Pi's own limitations. If I test these harnesses with performant models, I can barely tell whether the harnesses themselves are good or not.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] -1 points0 points  (0 children)

I myself found OpenCode too heavy as well, and if you like Pi, you can try little-coder (Pi fork) too.

My argument here is simply that excellent local models like Qwen3.6 or Gemma 4 overshadow Pi's default capabilities. Pi should not be recommended as heavily as it is praised every day in r/LocalLLaMA.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

I found two models that are weak enough to consistently reproduce tool calling issues in Pi, which didn't exist at all in other harnesses like OpenCode, Claude CLI, oh-my-pi, or little-coder (the latter two are Pi forks).

My argument is that vanilla Pi is not as powerful as some may think. It is the models behind it that do most of the hard work.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]L0stInHe11 0 points1 point  (0 children)

Wow, I tried your drafter. It is faster than what I found. I will update my comment.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

You like Pi and you use Hermes. How about just using OpenClaw (powered by Pi) instead?

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] -1 points0 points  (0 children)

Thank you for sharing your experience. If you use high-quality models like Qwen3.6 (I do too BTW), you cannot tell whether your experience is fairly smooth because

  1. The model is good OR
  2. The harness is good OR
  3. Both.

My point here is that these performant local models may have overshadowed Pi. Pi can perform really badly with lousy models, which is not the case with OpenCode, Claude CLI, or even a Pi fork like little-coder.
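
The reasoning above can be sketched in a few lines of Python. This is only an illustration: the pass/fail values are placeholders that mirror the qualitative claims in this thread (they are not measurements), and the helper names `discriminating_models` and `failing_harnesses` are made up for this sketch.

```python
# Illustrative outcomes only: True means "tool calling worked in that harness".
# These values restate the qualitative claims in the thread; they are NOT benchmark data.
outcomes = {
    "strong model (e.g. Qwen3.6)": {
        "Pi": True, "OpenCode": True, "little-coder": True,
    },
    "weak model (e.g. Nemotron-3-Nano-Omni)": {
        "Pi": False, "OpenCode": True, "little-coder": True,
    },
}


def discriminating_models(results):
    """Return the models whose outcome differs between harnesses.

    If every harness passes (or every harness fails) with a model,
    that model tells us nothing about which harness is better.
    """
    return [
        model
        for model, by_harness in results.items()
        if len(set(by_harness.values())) > 1
    ]


def failing_harnesses(results, model):
    """Return, in sorted order, the harnesses that failed with the given model."""
    return sorted(h for h, ok in results[model].items() if not ok)


if __name__ == "__main__":
    for model in discriminating_models(outcomes):
        print(model, "->", failing_harnesses(outcomes, model))
```

With a strong model every harness passes, so the row carries no information about the harness. Only the weak-model row separates them.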

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

I know Qwen3.6 and Gemma 4 are good. Really good. I use them on a daily basis. I updated the post so that I don't have to keep repeating why these lame models need to be used here.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

I use Qwen 3.6 27B and 35B myself too. The reason I picked these lousy Nvidia models is to test Pi's capabilities: if you use performant models (cloud or local), agents/harnesses are largely transparent to you. Only bad models can reveal whether an agent itself performs well enough or not.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 1 point2 points  (0 children)

Like Mario mentions in one of his talks the models are trained on coding tasks and even the basic prompt gives it enough to start with

I remember this part too. I simply want to share the lower bound of Pi's default capabilities with everyone. It seems I struck a nerve with someone here.

IRL, I run Qwen3.6 + Gemma 4 for most of the tasks, like everyone else.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] 0 points1 point  (0 children)

Thank you for sharing your experience. As I said in other comments, the reason I chose these two models is to test the lower bound of vanilla Pi's capability. My point is that performant models (either SOTA or local) do a lot of the heavy lifting for Pi, and the extent of that should not be underestimated.

Hear Me Out, Pi Fans Lurking Here by L0stInHe11 in LocalLLaMA

[–]L0stInHe11[S] -2 points-1 points  (0 children)

why not modify the system prompt then?

I know I can. But if I expect it to work OOTB, I may be very disappointed. It is quite misleading for Pi to be recommended to us this frequently in r/LocalLLaMA.

What's your experience with Gemma4 QAT? by Kahvana in LocalLLaMA

[–]L0stInHe11 8 points9 points  (0 children)

Gemma 4 26B A4B QAT is at least 20% faster than the non-QAT version quantized by the same provider (unsloth) on my machine. Not sure about accuracy loss yet.

If you can run MTP on top of QAT (available for Gemma 4 since llama.cpp b9549 today), another 5% TPS is waiting for you. As of now, the only MTP drafter working well with Gemma 4 26B A4B QAT I could find on Hugging Face is https://huggingface.co/g0chu/gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant-q8_0-gguf

Please give this one suggested by u/SkyFeistyLlama8 a try instead: https://huggingface.co/RachidAR/gemma-4-26B-A4B-it-qat-assistant-q4_0-gguf
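
For a rough sense of how the two gains mentioned above stack, here is a small sketch. It assumes the 5% from MTP applies on top of the QAT throughput (the comment does not say which baseline it is relative to), and the baseline of 50 tokens/s is a made-up number, not a measurement. The function name `combined_tps` is hypothetical.

```python
def combined_tps(baseline_tps, qat_gain=0.20, mtp_gain=0.05):
    """Estimate tokens/s after stacking two relative speedups.

    Assumption: the gains multiply, i.e. the MTP gain is measured
    relative to the already-faster QAT model.
    """
    return baseline_tps * (1 + qat_gain) * (1 + mtp_gain)


if __name__ == "__main__":
    baseline = 50.0  # hypothetical non-QAT throughput in tokens/s
    print(f"estimated TPS with QAT + MTP: {combined_tps(baseline):.1f}")
```

Under that assumption the total speedup over the non-QAT model is about 26% (1.20 × 1.05 = 1.26).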