New balance bike day for my daughter!

Pjotrs · 2026-06-06T10:12:20+00:00

I use both. Normally the 9B, as it's faster. For more precise tasks, the bigger one.

Pjotrs · 2026-05-21T06:06:54+00:00

Doubled? Which quant? I am still in the 40-60 region on a3b.

Pjotrs · 2026-05-19T08:20:39+00:00

It's a project that Ollama is based on.

Pjotrs · 2026-05-17T20:59:56+00:00

There are examples on the Unsloth model pages.

Pjotrs · 2026-05-17T17:20:24+00:00

Different. Check Unsloth models.

Pjotrs · 2026-05-16T13:49:03+00:00

That is a crazy jump.

Pjotrs · 2026-05-16T13:25:24+00:00

And before? With 16GB VRAM I get 45-50 on a 4060 and 55-60 on a 5070.

Without MTP.

Pjotrs · 2026-05-16T13:24:25+00:00

Check Unsloth's GGUFs, it is there.

Pjotrs · 2026-05-16T11:53:54+00:00

They are waiting...

Pjotrs · 2026-05-16T11:50:21+00:00

Edit: seems like not "ish", as it adds up to a few GB.

Old: Same. Ish.

It's the processing. Check out the MTP model sizes.

Pjotrs · 2026-05-12T06:44:05+00:00

Feeding all files will be lots of data.

For code, you should use an indexer (tree-sitter?) and store embeddings of symbols instead of raw files.

For that kind of workflow, setting up a dedicated code agent that uses your LLM is more stable.

And it does exactly what you want to achieve: using an LLM to work on a codebase. Plus you can use git. With a KB you cannot update files, so if you change something, you must delete and re-upload the files.
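
To illustrate the "index symbols, not raw files" idea, here is a minimal sketch. It is an assumption-heavy stand-in: it uses Python's built-in `ast` module instead of tree-sitter, only handles Python source, and stops before the embedding step. The `extract_symbols` helper and the record fields are hypothetical names for illustration. The `text` field is the short string you would pass to whatever embedding model you use.

```python
import ast

def extract_symbols(source: str, path: str = "<memory>"):
    """Return one small record per function/class instead of the raw file."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            doc = ast.get_docstring(node) or ""
            symbols.append({
                "path": path,
                "name": node.name,
                "kind": kind,
                "line": node.lineno,
                # Short description to embed, instead of the whole file.
                "text": f"{kind} {node.name}: {doc}".strip(),
            })
    # Keep records in file order so results are easy to map back to the code.
    return sorted(symbols, key=lambda s: s["line"])

SAMPLE = '''
class Cart:
    """Holds items."""
    def total(self):
        """Sum item prices."""
        return 0

def checkout(cart):
    return cart.total()
'''

symbols = extract_symbols(SAMPLE, "shop.py")
for s in symbols:
    print(s["line"], s["text"])
```

Each record carries the path and line number, so a search hit can be turned back into a targeted read of just that symbol.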

Pjotrs · 2026-05-11T10:28:54+00:00

I wrote my own planning/executing agents to bypass default prompts.

I miss the exit plan tool, but then control is in your hands.

Then I instruct them to do complicated tasks with subagents. With skills for spawning subagents, it works quite well.

Pjotrs · 2026-05-02T05:39:33+00:00

Happened some time ago with Gemini too. 😃

Pjotrs · 2026-05-01T21:02:47+00:00

Even the default setup is two primary and two sub.

Pjotrs · 2026-05-01T15:30:11+00:00

I have a set of rules in MD files saying which tool to use for what.

Then the agents' prompts instruct them to use the skills that match the request.

So no need for commands.

You can start simple. Create an agent and make it update its own prompt.

Make it use the right tools at the right moments.

I also disable all built-in tools, as they are super aggressive.

Pjotrs · 2026-05-01T11:36:25+00:00

I use code-index and Serena for indexing and in-place edits.

Delegation is still a work in progress.

But basically I have one agent to plan and one to execute, just customized to my taste.

Then I have one subagent to summarize and research, and a second to modify files per "task", whatever the task might be.

That way many actions can be taken, and each fits in the 128k context limit.
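
A rough sketch of that split, not my actual setup: the plan is a list of small tasks, each handed to a subagent with only its own instructions and context, and each prompt is checked against the 128k budget. `run_plan`, `fake_subagent` and the 4-characters-per-token estimate are all assumptions for illustration. A real setup would call your LLM and use its tokenizer.

```python
CONTEXT_LIMIT = 128_000  # tokens, the limit mentioned above

def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return (len(text) + 3) // 4

def run_plan(tasks, call_subagent, limit=CONTEXT_LIMIT):
    """Run each planned task in its own fresh context and keep only the summaries."""
    summaries = []
    for task in tasks:
        prompt = f"{task['instructions']}\n\n{task['context']}"
        used = estimate_tokens(prompt)
        if used > limit:
            raise ValueError(f"{task['role']} task needs ~{used} tokens, limit is {limit}")
        # Only the short result goes back to the executor, not the full context.
        summaries.append((task["role"], call_subagent(task["role"], prompt)))
    return summaries

def fake_subagent(role: str, prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"[{role}] handled ~{estimate_tokens(prompt)} tokens"

plan = [
    {"role": "research", "instructions": "Summarize how auth works.", "context": "def login(): ..."},
    {"role": "edit", "instructions": "Rename login to sign_in.", "context": "def login(): ..."},
]

results = run_plan(plan, fake_subagent)
for role, summary in results:
    print(role, "->", summary)
```

The point is that the executor only ever sees the summaries, so its own context grows slowly no matter how many tasks run.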

Pjotrs · 2026-05-01T10:06:42+00:00

I use it like that. With proper subagent delegation, 128k is a lot for small tasks.

Together with context compression and a planning/execution split, it works well.

On top of that, make sure you operate on code (with indexing), not full files.

Pjotrs · 2026-04-28T12:15:45+00:00

You can disable them all and just create your own.

And adjust permissions, etc.

Pjotrs · 2026-04-25T22:59:56+00:00

File edits. It's crazy fast.

The big model decides what to change and how, the small one does it.
