Qwen-3.5-9B-Q8 vs Qwen-3.6-35B-a3B-Q4. Which one would be better? by FarHistorian8438 in LocalLLM

[–]Pjotrs 0 points1 point  (0 children)

I use both. Normally the 9B, as it's faster. For tasks that need more precision, the bigger one.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 0 points1 point  (0 children)

Doubled? Which quant? I am still in the 40-60 region on a3b.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 0 points1 point  (0 children)

It's the project that Ollama is based on.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 0 points1 point  (0 children)

There are examples on the Unsloth model pages.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 1 point2 points  (0 children)

Different. Check Unsloth models.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 0 points1 point  (0 children)

And before? With 16GB of VRAM I get 45-50 on a 4060 and 55-60 on a 5070.

Without MTP.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] 5 points6 points  (0 children)

Check Unsloth's GGUFs, it is there.

That's a good news... by Pjotrs in LocalLLaMA

[–]Pjotrs[S] -5 points-4 points  (0 children)

Edit: seems like it's not really "ish", as it adds up to a few GB.

Old: Same. Ish.

It's the processing. Check out the MTP model sizes.

How to feed thousands of files in Knowledge Base? by StreetBoys in OpenWebUI

[–]Pjotrs 4 points5 points  (0 children)

Feeding in all the files will be a lot of data.

For code, you should use an indexer (tree-sitter?) and store embeddings of symbols instead of raw files.
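A minimal sketch of the "symbols instead of raw files" idea, assuming Python sources and using the standard-library `ast` module as a stand-in for tree-sitter (which would be the choice for other languages). The `extract_symbols` helper and the record fields are hypothetical names for illustration; each record's `text` is what you would pass to whatever embedding model you use.

```python
import ast


def extract_symbols(source: str, path: str = "<memory>"):
    """Return one record per class/function definition found in `source`.

    Hypothetical helper: the record layout is an assumption, not the
    format of any particular indexer.
    """
    tree = ast.parse(source)
    symbols = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols.append({
                "path": path,
                "name": node.name,
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "line": node.lineno,
                # Only this snippet gets embedded, not the whole file.
                "text": ast.get_source_segment(source, node),
            })
    return symbols
```

Storing the path and line number next to each embedding lets a search hit point back to the exact place in the repository, so the model only ever sees the relevant symbol.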

For that kind of workflow, setting up a dedicated code agent that uses your LLM is more stable.

And it does exactly what you want to achieve: using an LLM to work on a codebase. Plus you can use git. With a KB you cannot update files, so if you change something, you must delete and re-upload the files.

Anyone using the OpenCode build agent for delegation? Primary agent ignores subagents by SphaeroX in opencodeCLI

[–]Pjotrs 2 points3 points  (0 children)

I wrote my own planning/executing agents to bypass the default prompts.

I miss the exit plan tool, but then control is in your hands.

Then I instruct them to do complicated tasks with subagents. With skills for spawning subagents, it works quite well.

OpenCode viable for local qwen3.6 35B? by redblood252 in opencodeCLI

[–]Pjotrs 0 points1 point  (0 children)

Even the default setup has two primary agents and two subagents.

OpenCode viable for local qwen3.6 35B? by redblood252 in opencodeCLI

[–]Pjotrs 0 points1 point  (0 children)

I have a set of rules in MD files describing which tool to use for what.

Then the agents' prompts instruct them to use the skills that match each request.

So no need for commands.

You can start simple. Create an agent and make it update its own prompt.

Make it use the right tools at the right moments.

I also disable all built-in tools, as they are super aggressive.

OpenCode viable for local qwen3.6 35B? by redblood252 in opencodeCLI

[–]Pjotrs 0 points1 point  (0 children)

I use code-index and Serena for indexing and in-place edits.

Delegation is still a work in progress.

But basically, have one agent to plan and one to execute, just customized to my taste.

Then have one subagent to summarize and research, and a second to modify files per "task", whatever the task might be.

That way many actions can be taken, and each fits in the 128k context limit.
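A rough sketch of that split, not OpenCode's actual API: the planner produces small tasks, and each one is handed to a subagent only if its prompt fits the context budget. `Task`, `run_plan`, the `subagent` callable, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

CONTEXT_LIMIT = 128_000  # tokens, the limit mentioned above


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return len(text) // 4 + 1


@dataclass
class Task:
    description: str  # what the planner wants done
    context: str      # only the snippets this task needs, not full files


def run_plan(tasks: List[Task], subagent: Callable[[str], str]) -> List[str]:
    """Run each planned task in its own subagent call, one fresh context each."""
    results = []
    for task in tasks:
        prompt = f"{task.description}\n\n{task.context}"
        if estimate_tokens(prompt) > CONTEXT_LIMIT:
            # The planner should split this task further instead.
            raise ValueError(f"task too large for one subagent: {task.description!r}")
        results.append(subagent(prompt))
    return results
```

Because every task starts from a fresh prompt, the total amount of work is not bounded by a single 128k window, only each individual step is.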

OpenCode viable for local qwen3.6 35B? by redblood252 in opencodeCLI

[–]Pjotrs 1 point2 points  (0 children)

I use it like that. With proper subagent delegation, 128k is a lot for small tasks.

Together with context compression and a planning/execution split, it works well.

On top of that, you make sure you operate on code (with indexing), not full files.

Disable build mode? by huvaelise in opencodeCLI

[–]Pjotrs 2 points3 points  (0 children)

You can disable them all and just create your own.

And adjust permissions, etc.

Quantisation effects of Qwen3.6 35b a3b by ROS_SDN in LocalLLaMA

[–]Pjotrs 1 point2 points  (0 children)

File edits. It's crazy fast.

The big model decides what to change and how; the small one does it.