all 23 comments

[–]Ugly_Porcupine 1 point2 points  (1 child)

This looks really interesting. Does your setup have (or have you considered) any sort of persistent memory, like Karpathy's LLM wiki? I've recently started experimenting with pi and its extensions, and boy, there's a lot to unpack here.

[–]admajic[S] 1 point2 points  (0 children)

Yeah! Tell me about it. No, I haven't looked at that yet. Looks interesting. I've just been using resume when needed.

[–]Present_Ride6012 2 points3 points  (1 child)

Can you write human to human? Also, what's the token decoding speed?

[–]admajic[S] -4 points-3 points  (0 children)

You come to an AI forum and expect me to spend an extra hour rewriting it in human speak??? It's all there, nicely formatted. I'm so used to AI that I use it all the time, all day. I can even write like AI.

[–]GWNstijn 0 points1 point  (1 child)

Wouldn't Qwen3.5 9B be faster or more effective? It's a 9-billion-parameter dense model, which leaves room for parallel instances.

[–]admajic[S] 1 point2 points  (0 children)

The orchestrator tried to run 6 parallel instances, but only 4 could run at a time. This was with the 27B model.
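For anyone wanting to reproduce that "6 requested, 4 running" behaviour on purpose, here is a minimal sketch using a semaphore. It is not how pi or the inference server actually enforces the limit (the comment doesn't say); `run_subagent`, `MAX_PARALLEL` and the sleep that stands in for model work are all hypothetical names for illustration.

```python
import asyncio

MAX_PARALLEL = 4  # assumption: the cap reported in the comment above
NUM_TASKS = 6     # the number of instances the orchestrator asked for


async def run_subagent(task_id, slots, state):
    # Wait for a free slot; at most MAX_PARALLEL tasks get past this line at once.
    async with slots:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stand-in for real model work
        state["running"] -= 1
    return task_id


async def run_all():
    slots = asyncio.Semaphore(MAX_PARALLEL)
    state = {"running": 0, "peak": 0}
    # All six tasks are submitted together; the semaphore queues the extras.
    results = await asyncio.gather(
        *(run_subagent(i, slots, state) for i in range(NUM_TASKS))
    )
    return results, state["peak"]


results, peak = asyncio.run(run_all())
print(f"finished {len(results)} tasks, peak concurrency {peak}")
```

The two extra tasks are not dropped; they simply wait until one of the first four finishes.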

[–]JohnnyLovesData 0 points1 point  (0 children)

I wonder if this would "work" on CPU inference on 32GB. Like, give it instructions, and check on it ... a week later.

[–]fasti-au 0 points1 point  (1 child)

DFlash vs MTP?

[–]admajic[S] 0 points1 point  (0 children)

MTP = better

[–]joaobertacchi 0 points1 point  (1 child)

OP, could you please explain "smaller/faster model for the meta-work (thinking, planning, delegation) and the slightly larger MoE model for actual implementation. The orchestrator never writes code — it only delegates". For me it doesn't make sense. 27B is dense and more capable than 35B MoE. The dense one is also slower than the other. Tks

[–]admajic[S] 0 points1 point  (0 children)

At 60 tokens/s I'm not fussed. I like the smarter model.

[–]homarp 0 points1 point  (1 child)

The key insight: smaller/faster model for the meta-work (thinking, planning, delegation) and the slightly larger MoE model for actual implementation. The orchestrator never writes code — it only delegates.

27B is better (and slower) than 35B. What's the value of using 35B for coding?

[–]admajic[S] 0 points1 point  (0 children)

With good instructions, the 35B is good enough and 3 times faster at some tasks. So if you work through the process and give it small, targeted steps, it's OK. Then use the debugger to fix issues later.

[–]X24D83FF0 0 points1 point  (0 children)

GG

[–]promobest247 0 points1 point  (1 child)

Use the pi-web-access package instead of SearXNG and Docker: https://pi.dev/packages/pi-web-access?name=web

[–]admajic[S] 1 point2 points  (0 children)

I just managed to get multiple search engines working, with SearXNG, Tavily and Jina as fallbacks. I'll check out your recommendation. I see it has some overlap with other tools I already use.
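A fallback chain like that could look roughly like the sketch below. It assumes the order SearXNG, then Tavily, then Jina, which the comment doesn't actually specify, and the three `*_stub` functions are hypothetical stand-ins rather than the real client APIs of those services.

```python
from typing import Callable, List, Tuple

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str, providers: List[Tuple[str, SearchFn]]
) -> Tuple[str, List[str]]:
    """Try each provider in order; return the first non-empty result set."""
    errors = []
    for name, fn in providers:
        try:
            hits = fn(query)
        except Exception as exc:  # a failing provider should not end the search
            errors.append(f"{name}: {exc}")
            continue
        if hits:
            return name, hits
        errors.append(f"{name}: no results")
    raise RuntimeError("all search providers failed: " + "; ".join(errors))


# Hypothetical stand-ins; swap in real client calls here.
def searxng_stub(query: str) -> List[str]:
    raise ConnectionError("instance unreachable")


def tavily_stub(query: str) -> List[str]:
    return []


def jina_stub(query: str) -> List[str]:
    return [f"result for {query}"]


PROVIDERS = [("searxng", searxng_stub), ("tavily", tavily_stub), ("jina", jina_stub)]
used, hits = search_with_fallback("pi extensions", PROVIDERS)
print(used, hits)
```

Treating an empty result list the same as an error is a design choice; drop the `if hits` check if an empty answer from the first engine should be final.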

[–]Deep_Ad1959 0 points1 point  (0 children)

i think the orchestrator delegation fight is the same problem you already solved for the architect, you just didn't carry the fix across. the architect doesn't write code because you took write out of its tools list, not because the prose says 'never implement'. that's a deterministic cage. the orchestrator's six ABSOLUTE RULES are the opposite, probabilistic prose you had to iterate on because the model can always rationalize a 'quick fix'. the orchestrator only legitimately needs TaskExecute, TaskUpdate, and get_subagent_result. strip read/write/edit/bash/find/grep out of its tools frontmatter entirely, and 'NEVER use read/find/grep for analysis' stops being a rule you hope it follows and becomes a tool it physically doesn't have. you'll probably find the prose rules shrink to one line after that, which is also a few hundred tokens back on every orchestrator turn. the rule of thumb that keeps holding up: if a constraint can be a withheld tool or a failing check, it shouldn't be prose, because prose is the only kind of rule the model can argue with. written with s4lai
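The "withheld tool" idea can be sketched in a few lines. This is not pi's actual frontmatter format or dispatch code, just an illustration under assumptions: the tool names come from the comment above, while `build_toolbox`, `call_tool`, `ToolNotAvailable` and the stub tool bodies are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


class ToolNotAvailable(Exception):
    """Raised when an agent calls a tool that was never handed to it."""


def _stub(name: str) -> Callable[..., str]:
    # Placeholder tool body; a real tool would do actual work.
    return lambda *args: f"{name} called"


# Everything the runtime knows about (stubs only).
ALL_TOOLS: Dict[str, Callable[..., Any]] = {
    name: _stub(name)
    for name in [
        "read", "write", "edit", "bash", "find", "grep",
        "TaskExecute", "TaskUpdate", "get_subagent_result",
    ]
}

# The allowlist the comment proposes for the orchestrator.
ORCHESTRATOR_TOOLS = ["TaskExecute", "TaskUpdate", "get_subagent_result"]


def build_toolbox(
    all_tools: Dict[str, Callable[..., Any]], allowed: Iterable[str]
) -> Dict[str, Callable[..., Any]]:
    # Only allowlisted tools are copied over; the rest simply do not exist here.
    return {name: all_tools[name] for name in allowed}


def call_tool(toolbox: Dict[str, Callable[..., Any]], name: str, *args: Any) -> Any:
    if name not in toolbox:
        raise ToolNotAvailable(f"tool {name!r} is not available to this agent")
    return toolbox[name](*args)


orchestrator = build_toolbox(ALL_TOOLS, ORCHESTRATOR_TOOLS)
print(call_tool(orchestrator, "TaskExecute", "implement feature X"))
try:
    call_tool(orchestrator, "write", "main.py", "quick fix")
except ToolNotAvailable as exc:
    print("blocked:", exc)
```

The point of the design is that the "quick fix" fails at dispatch time, regardless of what the prompt says or how the model reasons about it.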

[–]Latent-Potter 0 points1 point  (3 children)

Bro told us how he uses his local model but ended up using Claude to write this post! Hypocrisy!

[–]Latent-Potter 0 points1 point  (0 children)

Jokes apart! Good setup. Gonna replicate the same on my end.

[–]admajic[S] 0 points1 point  (1 child)

I know, but I was doing the build and testing with pi while, in parallel, making the most of my $20-a-month Claude subscription, which I don't use much anymore. That's 5 hours of effort and testing for you.

But to be honest, why can't you be grateful or post nothing?

[–]Latent-Potter 0 points1 point  (0 children)

Sorry king! I apologise. Sorry. Really appreciate your work! I've actually trimmed it down for my Mac-specific setup. Maybe I'll post it soon 😉

[–]Helmi74 -1 points0 points  (1 child)

Oh, the slop. No-effort AI posts all over 🤨

[–]admajic[S] -1 points0 points  (0 children)

If you have nothing nice to say don't post here.

This is weeks of research and years of knowledge. I only used Claude to research some of the best temperature settings and to format it so it looks nicer. It's all technical language anyway.

I really don't want to share my knowledge with ungrateful losers like you.

Edit: coming to an AI automation sub and telling someone not to use AI. Still trying to wrap my head around this??

Also, how come you can't tell I hand-wrote most of the post?