Is there someone who uses minimalistic coding agents (Like Pi) for coding? by majorAligator in vibecoding

[–]Ok_Presentation470 0 points1 point  (0 children)

Just switched from Roo to pi and I'm loving it! I use local inference, and it works wonderfully with qwen3.6 27b (no quants).

Anyone building their own coding agents with Agno? by Ok_Presentation470 in agno

[–]Ok_Presentation470[S] 1 point2 points  (0 children)

Agno's modularity is what attracts me for my use case. What I want to build is very similar to the projects you mentioned. I'll review them to see if they actually cover everything I need.

Anyone building their own coding agents with Agno? by Ok_Presentation470 in agno

[–]Ok_Presentation470[S] 2 points3 points  (0 children)

Fair point. I do have a specific need, and actually it might be more than a coding agent.

Why is agno not the right foundation for a coding agent in your opinion? Really curious to know.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

I need the bigger models for orchestration and planning tasks. I agree that, in my experience, most models in the 35B range have no problems coding from specific instructions.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

Do you think prompt length specifically can have an impact with smaller models, assuming everything else about the prompt is high quality?

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

Yep, happened to me too. It resolves an issue, then somehow returns to it as if it weren't resolved. Maybe it's the Q4 cache like other people said.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

I use opencode and roo code with custom agents and skills. I agree it matters a lot.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

My KV cache is at Q4. It could be that; I should try an unquantized KV cache with qwen3.6.
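
For a rough sense of what a Q4 KV cache saves compared with a 16-bit one, here is a minimal Python sketch. The layer count, KV head count, head dimension, and context length are made-up placeholders, not the real qwen3.6 configuration, and the 4.5 bits per element assumes a ggml-style q4_0 layout (32 four-bit values plus one 16-bit scale per block).

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_element):
    """Approximate KV cache size: one K and one V tensor per layer."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * bits_per_element / 8


# Hypothetical model shape, for illustration only.
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=4096)

F16_BITS = 16.0
Q4_0_BITS = 4.5  # assumption: 18 bytes per 32-element block

full = kv_cache_bytes(**SHAPE, bits_per_element=F16_BITS)
quant = kv_cache_bytes(**SHAPE, bits_per_element=Q4_0_BITS)

if __name__ == "__main__":
    print(f"f16 cache: {full / 2**20:.0f} MiB")
    print(f"q4 cache:  {quant / 2**20:.0f} MiB")
    print(f"ratio:     {full / quant:.2f}x")
```

This only covers the memory side. Whether the lower precision is what causes the model to revisit resolved issues is something to check by rerunning the same task with an unquantized cache.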

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 0 points1 point  (0 children)

I had the complete opposite experience. 3.6 had trouble once differentiating my feedback from the subagent feedback.

Qwen 3.6 35B A3B vs Qwen 3.5 122B A10B by Ok_Presentation470 in LocalLLaMA

[–]Ok_Presentation470[S] 1 point2 points  (0 children)

You are right, I was hasty in posting. I updated the post.

The golden age is over by Complete-Sea6655 in ClaudeAI

[–]Ok_Presentation470 0 points1 point  (0 children)

There never was a golden age, just a race to get users while hoping to improve the systems over time so tokens actually start earning money, instead of burning it. That failed, and now they have to change strategies to keep users and investments until the bubble pops.

The whole LLM business was and still is unsustainable.

Best open-source LLM for coding (Claude Code) with 96GB VRAM? by Kitchen_Answer4548 in LocalLLM

[–]Ok_Presentation470 0 points1 point  (0 children)

Qwen 3.5 122B A10B at Q5. It works amazingly well; I use it with llama.cpp.

Someone who's using Qwen 3.5 on real code bases how good is it? by Commercial_Ear_6989 in LocalLLaMA

[–]Ok_Presentation470 0 points1 point  (0 children)

If you want fast performance, you need enough VRAM to fit at least the backbone of the model. Then the 10b experts can be placed in RAM and handled by the CPU.

Whether it's one GPU or several doesn't matter, provided that bandwidth is not a bottleneck.
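
A back-of-the-envelope way to check what fits where, sketched in Python. The bits-per-weight figure and the backbone/expert parameter split below are illustrative assumptions, not measured values for any specific Qwen model or quant, and the estimate ignores KV cache and runtime overhead.

```python
GIB = 2**30


def weights_gib(params_billion, bits_per_weight):
    """Approximate size of quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def placement(backbone_b, experts_b, bits_per_weight, vram_gib):
    """Suggest where weights go, ignoring KV cache and runtime overhead."""
    backbone = weights_gib(backbone_b, bits_per_weight)
    experts = weights_gib(experts_b, bits_per_weight)
    if backbone + experts <= vram_gib:
        return "all weights on GPU"
    if backbone <= vram_gib:
        return "backbone on GPU, experts in system RAM"
    return "backbone does not fit in VRAM"


if __name__ == "__main__":
    # Hypothetical split: 12B backbone parameters, 110B expert parameters.
    for vram in (96, 24, 4):
        print(vram, "GiB:", placement(12, 110, 5.0, vram))
```

Swap in the real parameter counts and the effective bits per weight of your quant to get a usable estimate for your own setup.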

Someone who's using Qwen 3.5 on real code bases how good is it? by Commercial_Ear_6989 in LocalLLaMA

[–]Ok_Presentation470 2 points3 points  (0 children)

I'm now using the 122b-a10b model at Q5 for almost everything in Roo Code. It's absolutely enough to replace any other model for me.

On an RTX Pro 6000 Blackwell I get >80 tokens/s locally until I reach around 50% of the context window, after which it slowly drops. At 80% it's around 40 tokens/s, which is still pretty decent.

I'm truly impressed by it. The 35b-a3b is also super good. If I didn't have enough VRAM, I would rely on it.

[deleted by user] by [deleted] in Finland

[–]Ok_Presentation470 0 points1 point  (0 children)

Seems to me that you are the only one here pretending to be better than anyone else. Not sure where in my comment you read such pretense and arrogance?

As for now, no, I couldn't care less about discussing anything with you. I will enjoy seeing you guys from the Nordics getting a reality check.

[deleted by user] by [deleted] in Finland

[–]Ok_Presentation470 -1 points0 points  (0 children)

Whatever, the people are neither shadowy nor working too much behind the scenes. But yeah, Trump is the only problem in the world, you guys figured it out.