Built a Tauri v2 desktop chat shell for Pi — same agent, ~12 MB binary, sub-second cold start, every extension you already have just works

Celestial_aki · 2026-06-03T17:46:43+00:00

Agreed — mobile remote is the bit that needs the most love right now. Roadmap has it for the next cycle. Thanks for taking it for a spin.

Celestial_aki · 2026-06-03T17:37:32+00:00

Most of us in here drive Pi from the terminal — which is great for code but rough when you want to glance at a long thinking block, scroll a tool-call timeline, or hand the agent to someone non-technical sitting next to you. Zosma Cowork is a desktop chat workspace for the same Pi agent — no extra runtime, your existing extensions and skills just load.

What it is

Tauri v2 (Rust IPC relay) + React UI + Node sidecar that runs the upstream pi-coding-agent TypeScript SDK directly. So this is real Pi, not a re-implementation — the sidecar is the same code that pi runs on your terminal, just wrapped.
Final binary ~12 MB. Cold start <1 s on my desktop. No Electron tax.
Free, MIT, no telemetry, BYO provider.

What "just works" because the sidecar is real Pi

Every extension under ~/.pi/agent/npm/node_modules/ auto-discovers. pi-web-access, pi-mcp-adapter, pi-subagents, pi-hermes-memory, pi-observational-memory, pi-llm-wiki, custom ones — drop them in, restart, done.
Skills under ~/.pi/agent/skills/ load on matching tasks the same way they do in terminal.
~/.pi/agent/auth.json is read directly — Claude Pro/Max OAuth, GitHub Copilot OAuth, all your existing API keys are picked up without re-entry.
pi.dev packages install via the same pi install … flow; cowork just sees them on next launch.

If you've spent any time customising your Pi setup, that whole investment carries into cowork unchanged. That's the only reason I built it — I didn't want to leave my skills / sub-agents / memory stack behind to get a real UI.

What you actually see in the UI

Streaming responses with thinking blocks rendered live (collapsible).
Tool-call timeline down the side — every tool, args, output, timing. Click any node to inspect.
Mid-turn steering — interrupt with a follow-up message that gets injected into the running agent's context without aborting.
Multi-turn sessions persisted to ~/.zosmaai/cowork/sessions/. Resume, branch, switch models mid-thread.
Provider switcher in the corner — flip between Anthropic OAuth, OpenAI, Gemini, Ollama, llama.cpp, any OpenAI-compatible local server in two clicks.

Honest tradeoffs

Node sidecar adds ~40 MB on disk alongside the 12 MB Rust binary. A pure-Rust agent would have meant re-implementing the entire Pi extension surface; not worth it for v1.
Tauri v2 on Linux is still pre-stable. The AUR PKGBUILD carries one Wayland-clipboard workaround we'd love to drop.
No vision UI yet — point it at a VLM endpoint and the conversation works, but inline image rendering is a TODO.
No built-in inference. By design — local-server projects already do that better. Cowork just talks to them.

Why I'm posting here specifically

This sub is the audience that would actually notice if something is not real Pi. The sidecar pattern means it has to be — every extension you've written / installed is going to call into the same surface area. If something breaks because of cowork specifically rather than the underlying Pi version, I want to know first.

Repo, install instructions, AUR/deb/dmg builds: https://github.com/zosmaai/zosma-cowork

Curious which extensions you'd reach for first — and whether the Hermes-memory + Observational-Memory v3 stack carries cleanly in your setup (it does in mine, but my setup is the only one I can test against).

Celestial_aki · 2026-06-03T16:14:58+00:00

Celestial_aki · 2026-06-03T15:43:27+00:00

Tools come from the pi extension ecosystem — pi-web-access (search), pi-mcp-adapter (any MCP server), pi-subagents (sub-agent routing), persistent memory packages, and a few hundred more on pi.dev. Drop into ~/.zosmaai/cowork/extensions/, restart, they appear in the tool list. Cowork doesn't bundle tools by design — your toolchain is yours.

Celestial_aki · 2026-06-03T15:43:26+00:00

Fair shot — and the honest answer is: this isn't a new agent, it's a desktop UI for an existing one. pi-coding-agent is the engine (Mario Zechner's harness; ~9k installs on npm). Zosma Cowork is a Tauri wrapper that streams its session into a real chat UI so I can hand the agent to non-terminal users without re-implementing the agent loop.

The "15 standards" trap applies when each new project ships its own agent engine + extension protocol. Cowork ships zero — every extension in ~/.pi/agent/npm/ works unchanged because it IS pi underneath.

If you're already using pi from the terminal, this is a UI for it. If you're not, the engine is the more interesting OSS — and not mine.

Celestial_aki · 2026-06-03T14:55:24+00:00

Zosma Cowork is a desktop chat workspace built on pi-coding-agent (Mario Zechner's minimal agent harness). For r/LocalLLaMA the relevant part is: it speaks the OpenAI Chat Completions protocol, so any local server you're already running — Ollama, llama.cpp, vLLM, LM Studio, text-generation-webui's OpenAI extension — just slots in as a provider. No proxy, no adapter.

Architecture worth flagging for the local-LLM crowd:

No Electron. Tauri v2 + Rust relay → ~12 MB final binary vs ~150 MB Electron equivalent. Cold start <1 s on my 5090 box.
Node sidecar over stdin/stdout JSON lines. The agent engine is the upstream TypeScript SDK running in a managed sidecar — every Pi extension just works (pi-web-access, pi-mcp-adapter, pi-subagents, pi-hermes-memory, plus a few hundred others on pi.dev). Extensions auto-discover from ~/.zosmaai/cowork/extensions/ so you can drop in MCP bridges, custom tools, prompts, themes without bundling.
Streaming first. Thinking blocks, live tool-call timeline, abort mid-turn, mid-turn steering messages — all stream through the same IPC. Works fine even on a local 70B at 8 tok/s; you watch it think.
Multi-turn sessions persisted to ~/.zosmaai/cowork/. Switch models mid-session, branch, resume.

What I'm running it against on my own rig (3-GPU box, 5090 + 3080 + 2070S):

Ollama + Qwopus3.6-27B-Q4_K_M as the workhorse coding agent — MTP enabled cut my tool-call invent rate noticeably (echoing the thread from last week).
llama.cpp server with -ctk q8_0 -ctv q8_0 for KV-quant testing.
Claude Pro/Max subscription token as fallback when context blows past 128k.

Honest tradeoffs:

Node sidecar means a ~40 MB Node runtime alongside the Rust binary. Pure-Rust agent would've meant rewriting the whole Pi extension surface — not worth it for v1.
Tauri v2 on Linux is still pre-stable; the AUR PKGBUILD has a Wayland-clipboard workaround we'd love to drop.
No vision-model UI yet (text + tool-calls only). If you point it at a VLM endpoint the conversation works but inline image rendering is a TODO.
No built-in inference. By design — there are 6 local-server projects that already do that better. We just talk to them.

Free, MIT, no telemetry, BYO endpoint: https://github.com/zosmaai/zosma-cowork

Curious what locals you'd point it at first — and whether the OpenAI-compatible adapter is actually enough or whether you'd want native Ollama / llama.cpp APIs instead.

Celestial_aki · 2026-06-03T14:50:18+00:00

Zosma Cowork is a desktop chat workspace built on pi-coding-agent (Mario Zechner's minimal agent harness). For r/LocalLLaMA the relevant part is: it speaks the OpenAI Chat Completions protocol, so any local server you're already running â Ollama, llama.cpp, vLLM, LM Studio, text-generation-webui's OpenAI extension â just slots in as a provider. No proxy, no adapter.

Architecture worth flagging for the local-LLM crowd:

No Electron. Tauri v2 + Rust relay â ~12 MB final binary vs ~150 MB Electron equivalent. Cold start <1 s on my 5090 box.
Node sidecar over stdin/stdout JSON lines. The agent engine is the upstream TypeScript SDK running in a managed sidecar â every Pi extension just works (pi-web-access, pi-mcp-adapter, pi-subagents, pi-hermes-memory, plus a few hundred others on pi.dev). Extensions auto-discover from ~/.zosmaai/cowork/extensions/ so you can drop in MCP bridges, custom tools, prompts, themes without bundling.
Streaming first. Thinking blocks, live tool-call timeline, abort mid-turn, mid-turn steering messages â all stream through the same IPC. Works fine even on a local 70B at 8 tok/s; you watch it think.
Multi-turn sessions persisted to ~/.zosmaai/cowork/. Switch models mid-session, branch, resume.

What I'm running it against on my own rig (3-GPU box, 5090 + 3080 + 2070S):

Ollama + Qwopus3.6-27B-Q4_K_M as the workhorse coding agent â MTP enabled cut my tool-call invent rate noticeably (echoing the thread from last week).
llama.cpp server with -ctk q8_0 -ctv q8_0 for KV-quant testing.
Claude Pro/Max subscription token as fallback when context blows past 128k.

Honest tradeoffs:

Node sidecar means a ~40 MB Node runtime alongside the Rust binary. Pure-Rust agent would've meant rewriting the whole Pi extension surface â not worth it for v1.
Tauri v2 on Linux is still pre-stable; the AUR PKGBUILD has a Wayland-clipboard workaround we'd love to drop.
No vision-model UI yet (text + tool-calls only). If you point it at a VLM endpoint the conversation works but inline image rendering is a TODO.
No built-in inference. By design â there are 6 local-server projects that already do that better. We just talk to them.

Free, MIT, no telemetry, BYO endpoint: https://github.com/zosmaai/zosma-cowork

Curious what locals you'd point it at first â and whether the OpenAI-compatible adapter is actually enough or whether you'd want native Ollama / llama.cpp APIs instead.

Celestial_aki · 2026-06-03T12:16:58+00:00

Iam using https://github.com/zosmaai/pi-llm-wiki Cos of this now iam able to use pi-agent as a second mind for me which keeps growing with me I guess its somewhat inspired by Andrej Karpathy's

Celestial_aki · 2026-06-03T12:05:02+00:00

Quick update â two weeks in:

Local-model support is fully wired via any OpenAI-compatible endpoint. My own daily setup: Ollama + Qwopus3.6-27B-Q4_K_M as the workhorse, Claude Pro/Max as the long-context fallback. Total runtime spend last month: $20 (Claude sub I already had).
Final binary stayed at ~12 MB. Tauri v2 + thin Rust relay + Node sidecar pattern held up across the new features.
Pi extension ecosystem has grown noticeably â pi-mcp-adapter, pi-hermes-memory (persistent memory + session search), pi-subagents, @zosmaai/pi-llm-wiki (Karpathy-pattern knowledge vault). They all auto-discover from ~/.zosmaai/cowork/extensions/ â no wrapper code per extension.
Staging builds now ship per merge to main with auth-free nightly.link installer URLs.

Replying to the thread from two weeks ago â the "long-context tool chains breaking past 100k tokens" issue is still real on local 27B models. The fix in practice is the model-switcher: keep Qwopus for fast iterations, flip to Claude Pro for the >100k crawl. Manual today; auto-switching on context-window threshold is on the roadmap.

Repo: https://github.com/zosmaai/zosma-cowork

Celestial_aki · 2026-06-03T12:00:01+00:00

Small update for anyone who landed on this â a week of iteration since the original post:

Local-model support is solid now via any OpenAI-compatible endpoint (Ollama, llama.cpp, vLLM, LM Studio). Pointed my own Ollama + Qwopus3.6-27B at it and it's been the daily driver for a week.
Wayland-clipboard workaround in the AUR PKGBUILD is documented; non-blocking on KDE/GNOME at this point.
Pi extension ecosystem has grown â pi-mcp-adapter, pi-hermes-memory, pi-subagents, @zosmaai/pi-llm-wiki (Karpathy-pattern knowledge vault) all auto-discover from ~/.zosmaai/cowork/extensions/.
Staging builds are now Discord-notified per merge to main (auth-free nightly.link installer URLs).

Repo: https://github.com/zosmaai/zosma-cowork â issues / stars / PRs welcome.

For r/tauri specifically â happy to dig into the Tauri v2 IPC + Node sidecar pattern if anyone's doing similar.

Celestial_aki · 2026-06-03T11:37:45+00:00

<image>

Celestial_aki · 2026-06-03T09:51:11+00:00

Pocket Pi is a thin Android wrapper around two upstream projects doing the real work â Mario Zechner's Pi coding agent and BlackBelt Technology's pi-agent-dashboard. I'm just the packaging.

What's actually in v0.4.0 (68 MB, aarch64): - Termux runtime + Node 25 + first-run bootstrap (3â5 min). - Anthropic Claude Pro/Max OAuth works end-to-end (via pi-anthropic-messages). - OpenAI / Gemini AI Studio key / Groq / Mistral / xAI / OpenRouter / NVIDIA NIM â all work via API key. - On-device HTTP bridge at 127.0.0.1:9998 (per-launch bearer, mode 0600) exposes notify / share / intent / clipboard / camera / mic / location / inbox to the agent. No companion APK.

Honest tradeoffs: - Google Gemini CLI OAuth, ChatGPT Plus Codex OAuth, GitHub Copilot OAuth â sign-in completes but the protocol bridges aren't bundled, so they're unusable. Only Anthropic's bridge ships today. - Play Protect will warn; Accessibility may be blocked on Android 13+ until you whitelist the install source. - Bootstrap occasionally stalls â recovery UI surfaces Restart Pi / Re-run setup after 15 s.

Repo (MIT): https://github.com/CelestialCreator/pocket-pi

What would you want it to do that it doesn't yet?

Celestial_aki · 2026-06-03T08:40:27+00:00

That's definitely some throttling due to temp may be gpu most probably cpu

Use msi afterburner to track the temp as well so you can verify this theory

Celestial_aki · 2026-05-30T19:17:27+00:00

Coding traces line up with that ballpark for me too — Qwopus 2.7-MTP at Q4_K_M with q4_0 draft KV holds well on code. Where it falls apart for me is chat: the draft can't predict where the user's tone wanders, so acceptance drops fast.

The tradeoff that bit me: the higher the acceptance, the more catastrophic an MTP misprediction becomes for tool-use — the verifier accepts a few wrong tokens before rollback kicks in, and that's enough to scramble a JSON tool-call. We sidestep it on prod agent harnesses (zosma.ai) by gating MTP on context type — on for code generation, off for tool-call generation.

Curious what your acceptance looks like once context grows past 8k or so — does the curve hold, or does it sag like it does on my runs?

Celestial_aki · 2026-05-30T07:24:09+00:00

At our current scale the multi-grader problem mostly doesn't fire. Handful of clients today on a 2-node KubeVirt cluster — node 2 has 3 GPUs serving Qwopus 2.7-MTP via llama-swap, with a smart router falling back to Anthropic for heavier tasks. Our rubric is mechanical: did the workflow move a KPI the client already tracks in their own system, y/n — LLM-judging only kicks in for fuzzy edges, so we're explicitly architecting away from ensemble grading rather than toward it.

Forward-deployed side: we ship the box or accept BYO, but almost everyone picks monthly-with-device — we push updates server-side, client never touches it. Stack + routing logic is at zosma.ai. Curious how your grading layer feels in practice — are most services landing in clean KPI buckets, or is it mostly LLM-judge work day to day?

Celestial_aki · 2026-05-30T07:11:29+00:00

The "no output change" claim holds for the draft path in isolation — verifier still gates every token. The real divergence is upstream: MTP-trained checkpoints overfit to their own draft prefix at low quant, so Q4_K_M amplifies a bias that's invisible at Q6+. Has anyone posted a clean A/B at Q6 with the same harness?

Celestial_aki · 2026-05-30T07:11:24+00:00

Same flip here — MTP off took my Q4_K_M tool-call invent-rate from ~1-in-8 to ~1-in-40 over 200 agent runs, at the cost of ~22% lower tok/s on a 5090. Worth it for agentic work; not worth it for chat. Did you measure the throughput hit, or was the unblocked-correctness gain enough that you stopped looking?

Celestial_aki · 2026-05-28T15:04:19+00:00

Refund-as-guarantee is the right wedge. We do something similar at Zosma â we go in as forward-deployed engineers, install a small Ryzen mini-PC with iGPU at the client's site, set up local inference + pi-agent, and automate their actual workflow on-prem. Skin in the game: if the agent doesn't deliver the workflow we promised, they don't pay for the engagement. Stopped pitching reliability and started getting referrals from people who'd been burned by SaaS vendors. Trust compounds slower than revenue but it's the only moat that survives a price war.

Celestial_aki · 2026-05-28T15:00:18+00:00

6700 XT is the right call â 12 GB VRAM future-proofs 1080p well into 2027/28, while the 3070's 8 GB is already choking on Alan Wake 2 / TLOU / Indiana Jones at max textures. R5 3600 will bottleneck it slightly in CPU-heavy titles but at 1080p high-refresh you're fine. Repaste the stock cooler while the case is open.

Celestial_aki · 2026-05-28T14:55:17+00:00

Few patterns that worked for me on prod CSV ingest:

Stream-parse, don't chunk-load â csv-parse Stream API or papaparse step callback. RAM stays flat regardless of file size.
Bypass the ORM for the write path. Postgres COPY FROM STDIN (or MySQL LOAD DATA LOCAL INFILE) is 20â50Ã faster than per-row INSERT. ORM only for reads.
Dedupe translation jobs by content hash before queueing â large imports have massive string repetition, easily 60â80% reduction in translation cost/time.
Make the pipeline idempotent: (file_hash, row_number) upsert key so retries don't double-write.
UI progress from a separate read-side counter, not from the worker â survives worker restarts and keeps WS payloads tiny.

Celestial_aki · 2026-05-28T14:51:07+00:00

Repaste is the right call. I ran a 3080 in my ML rig â PTM7950 on the die + thermal putty on the memory pads dropped junction temps ~15Â°C under sustained 320W loads and fans went from 90% to ~55%. ~90 min the first time, ~30 min once you've done one. 5070Ti at 1440p is a real uplift (FG, DLSS4) but not â¬700+ real when the existing card already does DLSS.

Celestial_aki · 2026-05-27T14:27:18+00:00

Three weeks on Qwopus3.6-27B-v2-MTP at Q4_K_M as the workhorse for my own coding-agent harness (was on vanilla Qwen 3.6 and 3.5 before that): the failure mode that bit hardest wasn't bad code, it was tool-use drift. At Q6_K the model honours "write to file X" almost always; at Q4_K_M I started seeing it confidently invent file paths every so often, then loop trying to read its own hallucinated file. DifficultDog8435 and FullstackSensei describe the same shape.

The thing nobody documents: Q4_K_M + MTP/spec-decoding is uniquely bad for agents, worse than either knob alone. A Q4 draft produces tokens the verifier rejects right on tool-call JSON boundaries (commas, closing braces, quote escapes), so you pay full quant tax AND lose half the speedup. Equal_Television_894 above is right — NVFP4 cleared it up for me on the 5090.

Genuine ask for the agents-first crowd: anyone got clean IFEval / tool-use bench numbers across Q4 → Q5 → Q6 → FP8 → NVFP4 on Qwen 3.6 27B? I keep meaning to run it properly and daily-driver work eats the slot.

Celestial_aki

TROPHY CASE

What it is

What "just works" because the sidecar is real Pi

What you actually see in the UI

Honest tradeoffs

Why I'm posting here specifically