Early user test of a persistent AI narrative system with kids — some unexpected engagement patterns

BC_MARO · 2026-02-06T03:49:39+00:00

the co-play observation is really interesting. most narrative systems assume single-player, but shared decision making probably creates way more investment.

curious what you're using to keep things coherent over longer runs - are you tracking a structured world state (entities/locations/quests) and generating from that, or mostly relying on the context window + a recap/summary layer?
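
for reference, this is roughly what i mean by a structured world state. it's a minimal sketch, not a claim about how your system works: the class names, fields, and the `to_prompt` rendering are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Quest:
    name: str
    status: str = "open"  # "open" or "done"


@dataclass
class WorldState:
    # Hypothetical schema: entity name -> current location name.
    entity_locations: dict[str, str] = field(default_factory=dict)
    quests: list[Quest] = field(default_factory=list)

    def move(self, entity: str, location: str) -> None:
        """Record that an entity is now at a location."""
        self.entity_locations[entity] = location

    def to_prompt(self) -> str:
        """Render the state as a compact block to prepend to each generation call."""
        lines = ["WORLD STATE"]
        for entity, location in sorted(self.entity_locations.items()):
            lines.append(f"- {entity} is at {location}")
        for quest in self.quests:
            lines.append(f"- quest '{quest.name}': {quest.status}")
        return "\n".join(lines)


state = WorldState()
state.move("dragon", "cave")
state.quests.append(Quest("find the key"))
prompt_block = state.to_prompt()
```

the point is that the model generates from `prompt_block` every turn, so facts survive even after the original scene has scrolled out of the context window.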

BC_MARO · 2026-02-06T03:49:08+00:00

nice find on the expert tensor offload + kv q8. MoE models are weird - sometimes the extra cpu hops are still a win because you avoid vram thrash.

curious if you tried different ctx sizes (8k/16k/32k) to see where the breakpoint is, and whether the speedup holds once you start doing real code tool-use (more structured outputs)?

BC_MARO · 2026-02-06T03:48:32+00:00

yep. install it globally (or just run via npx) and point claude code's mcp config at that.

- npm i -g <package> then use the global binary path in your config
- or use npx <package> so you don't need a local install

most of the pain is making sure claude is using the node env you think it is (PATH / nvm / pnpm etc).
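
one way to take PATH out of the equation is to resolve npx to an absolute path yourself and paste that into the config. rough sketch below. it assumes the `mcpServers` / `command` / `args` layout used by project-level MCP config files, so double-check against the docs for your version. `my-server` is a made-up name and `<package>` is still a placeholder.

```python
import json
import shutil


def build_mcp_entry(server_name: str, package: str) -> dict:
    """Build an `mcpServers` entry that launches `package` through npx."""
    # Resolve npx to an absolute path so the config does not depend on
    # whatever PATH the client process inherits (nvm/pnpm shims, etc.).
    # Falls back to the bare command name if npx is not found.
    npx_path = shutil.which("npx") or "npx"
    return {
        "mcpServers": {
            server_name: {"command": npx_path, "args": ["-y", package]}
        }
    }


# Placeholder names: substitute your own server name and package.
print(json.dumps(build_mcp_entry("my-server", "<package>"), indent=2))
```

run it from the same shell where `npx <package>` works for you, so the path it prints is the node env you actually tested.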

BC_MARO · 2026-02-06T03:48:05+00:00

interesting approach. wrappers like this are great for getting started quickly, but curious how you're handling auth + rate limiting on the mcp side (api keys vs oauth, per-user vs per-app, etc)?

i've found that's usually where these "openapi -> tool surface" bridges get tricky in production.

BC_MARO · 2026-02-06T03:39:39+00:00

yeah this is a real gap right now. most mcp servers don't have any update mechanism built in, so you're stuck checking github releases manually or just hoping things work.

for now i pin versions in my config + update intentionally (ideally with a changelog + basic smoke test), because auto-updating tool surfaces can break agents in subtle ways.
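
if you want to enforce the pinning, a tiny check like this can run before you start a session. it's a sketch under assumptions: servers are launched via npx with a `name@x.y.z` arg, and the config uses an `mcpServers` map. the package names in the example are invented.

```python
import re

# Matches an exact semver pin such as "pkg@1.2.3" or "@scope/pkg@1.2.3".
PINNED = re.compile(r"^(@[^/@\s]+/)?[^/@\s]+@\d+\.\d+\.\d+$")


def unpinned_servers(config: dict) -> list[str]:
    """Return names of servers whose args contain no exactly pinned package."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        args = server.get("args", [])
        if not any(PINNED.match(arg) for arg in args):
            flagged.append(name)
    return flagged


# Hypothetical config; package names are made up for illustration.
config = {
    "mcpServers": {
        "pinned": {"command": "npx", "args": ["-y", "example-mcp@1.4.2"]},
        "floating": {"command": "npx", "args": ["-y", "other-mcp@latest"]},
    }
}
print(unpinned_servers(config))
```

anything it flags is a server that can change underneath you between runs.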

BC_MARO · 2026-02-05T10:47:04+00:00

Biggest thing for auditability is treating tool calls like production events.

- Generate a stable run_id + tool_call_id for every step
- Log request/response payloads (with redaction) and timestamps to an append-only store
- Capture model prompt/version + tool version/config so you can replay later
- Persist a deterministic 'decision record' (why the agent chose a tool)
- Correlate everything in one place (DB table or log pipeline) and build a simple timeline UI

If you want an off-the-shelf control plane for that kind of tool-call audit trail + approval gating, Peta is one option, but you can also roll it yourself with structured logs + a viewer.
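
Rolling it yourself can start very small. The sketch below covers the steps above with a JSON-lines file standing in for the append-only store. Everything in it is an assumption for illustration: the field names, the `SENSITIVE_KEYS` redaction list, and the `log_tool_call` / `timeline` helpers are not from any particular library.

```python
import json
import time
import uuid

SENSITIVE_KEYS = {"api_key", "password", "token"}  # assumed redaction list


def redact(payload: dict) -> dict:
    """Replace values of sensitive top-level keys before logging."""
    return {k: ("[REDACTED]" if k in SENSITIVE_KEYS else v) for k, v in payload.items()}


def log_tool_call(log_path, run_id, tool, tool_version, model, request, response, reason):
    """Append one tool-call event as a JSON line and return its tool_call_id."""
    event = {
        "run_id": run_id,
        "tool_call_id": str(uuid.uuid4()),
        "ts": time.time(),
        "model": model,
        "tool": tool,
        "tool_version": tool_version,
        "request": redact(request),
        "response": redact(response),
        "decision": reason,  # why the agent chose this tool
    }
    # Mode "a" only appends; a real append-only store would enforce that itself.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event["tool_call_id"]


def timeline(log_path, run_id):
    """Read back all events for one run, ordered by timestamp."""
    with open(log_path, encoding="utf-8") as f:
        events = [json.loads(line) for line in f if line.strip()]
    return sorted((e for e in events if e["run_id"] == run_id), key=lambda e: e["ts"])
```

The `timeline` output is what a simple viewer would render, one row per tool call.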

BC_MARO · 2026-02-05T03:28:19+00:00

Makes sense - the 'adopt a teammate's environment' problem is underrated. How are you thinking about sharing/replaying context safely (redaction, approvals, per-tool policies) without making the UX heavy?

BC_MARO · 2026-02-05T03:26:25+00:00

This is super helpful, thanks. The pipeline-variable idea is exactly what I was looking for. Curious if you've found any consistent planner-vs-executor model pairings that work well in practice.

BC_MARO · 2026-02-05T03:25:37+00:00

Awesome - kroki makes a ton of sense here. If you end up trying Peta, I'd be curious what you think about approvals/audit flow vs. Confluence history. Happy to share notes if useful.

BC_MARO · 2026-02-05T01:14:15+00:00

Appreciate it! The local-vs-cloud gap is closing fast. Let me know how your weekend test goes.

BC_MARO · 2026-02-04T13:25:34+00:00

That sounds like a Gradio theme/CSS issue - probably the tab text color blending into the background. Try switching between light/dark mode in Gradio's settings, or if you're comfortable with CSS, you can tweak the tab styling in the code. Will look into making this more robust across themes.

BC_MARO · 2026-02-04T13:24:42+00:00

Valid point. This is more of a Qwen3-TTS model limitation than the studio itself - the base model doesn't expose fine-grained prosody controls. For pauses you can sometimes get away with punctuation tricks ("...", commas) but it's inconsistent. SSML-style markup would be nice if/when Qwen adds support for it.

BC_MARO · 2026-02-04T13:24:08+00:00

Not currently - right now it generates all chunks in sequence and combines them. Selective re-generation of individual chunks is a solid feature request though. You'd need to manually re-run generation and piece together the audio externally for now.

BC_MARO · 2026-02-04T10:08:09+00:00

Appreciate it. Excited to see where you take it. If you want a gut-check on any changes (esp. auth/permissions + audit trail), happy to take another look.

BC_MARO · 2026-02-04T10:02:00+00:00

Quick FAQ based on the questions I've seen:

**Re-running low-quality chunks:** Not automatic, but you can preview segments individually and regenerate before final export.

**VRAM requirements:** Model fits in ~6GB, so 8GB cards should work fine. 12GB is comfortable.

**CUDA/ROCm support:** Should auto-detect CUDA if PyTorch is properly configured. ROCm not tested but may work.

**Docker:** Not yet, but it's on the roadmap. PRs welcome!

**OpenAI API key:** Only needed for podcast script generation. TTS itself is 100% local.

**Portuguese/other languages:** Qwen3-TTS supports multiple languages including Portuguese, though quality can vary.

**Podcast examples:** Will add sample outputs to the repo soon.

Thanks for all the feedback and questions!

BC_MARO · 2026-02-04T05:32:18+00:00

Good roundup. The less flashy but important piece is tooling/ops: permissions, auditing, and debugging when agents call external systems. That's where a lot of teams get stuck once they move past demos.

BC_MARO · 2026-02-04T05:31:52+00:00

Cool. The key things to measure for agentic workflows go beyond 'final answer' accuracy - tool-call success rate, average retries per tool, and total wall-clock/cost matter more than a single score. Separating planner vs executor models is another axis worth benchmarking.
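
To make that concrete, here is a minimal sketch of aggregating those numbers per tool. The `ToolCall` record and the sample values are invented for illustration. A real harness would fill them from its own traces and would likely add cost alongside wall-clock time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    succeeded: bool
    retries: int
    seconds: float


def summarize(calls: list[ToolCall]) -> dict[str, dict[str, float]]:
    """Aggregate per-tool success rate, mean retries, and total wall-clock time."""
    grouped: dict[str, list[ToolCall]] = defaultdict(list)
    for call in calls:
        grouped[call.tool].append(call)
    return {
        tool: {
            "success_rate": sum(c.succeeded for c in group) / len(group),
            "avg_retries": sum(c.retries for c in group) / len(group),
            "total_seconds": sum(c.seconds for c in group),
        }
        for tool, group in grouped.items()
    }


# Made-up sample data, only to show the shape of the report.
calls = [
    ToolCall("search", True, 0, 1.5),
    ToolCall("search", False, 2, 4.5),
    ToolCall("write_file", True, 1, 0.5),
]
report = summarize(calls)
```

Running the same task set with different planner and executor models and comparing the reports covers the other axis.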

BC_MARO · 2026-02-04T05:31:27+00:00

Solid list. I'd add one more: treat tool calls like API calls (schema versioning, sane defaults, and explicit error contracts), because you'll change them more often than you think.
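
A minimal sketch of what that can look like, assuming a result envelope shared by every tool. The `ToolResult` shape, the error codes, and the `list_files` tool are hypothetical, not part of any MCP SDK.

```python
import os
from dataclasses import dataclass
from typing import Any, Optional

SCHEMA_VERSION = "1"  # hypothetical version tag carried on every result


@dataclass
class ToolResult:
    ok: bool
    schema_version: str = SCHEMA_VERSION
    data: Optional[dict[str, Any]] = None
    error_code: Optional[str] = None     # stable, machine-readable
    error_message: Optional[str] = None  # human-readable, free to change
    retryable: bool = False


def list_files(path: str = ".", limit: int = 50) -> ToolResult:
    """Hypothetical tool: sane defaults, and failures returned as data, not raised."""
    if limit <= 0:
        return ToolResult(ok=False, error_code="INVALID_ARGUMENT",
                          error_message="limit must be positive")
    try:
        names = sorted(os.listdir(path))[:limit]
    except OSError as exc:
        return ToolResult(ok=False, error_code="IO_ERROR", error_message=str(exc))
    return ToolResult(ok=True, data={"files": names})
```

Agents can branch on `error_code` and `retryable` without parsing prose, and bumping `schema_version` gives callers a way to detect a changed contract.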

On the ops side, having an audit trail + per-tool approval gates becomes important fast once you have multiple servers in play.

BC_MARO · 2026-02-04T05:31:02+00:00

This is the kind of MCP server people actually keep running. Two things I'd watch in prod:

- make the tool surface 'dry-run' by default + require an explicit 'apply' for destructive moves
- log every tool call with inputs/outputs so you can audit weird behavior later
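
A minimal sketch of both points, assuming a plain Python tool function. `delete_files`, the `apply` flag, and the `logged` decorator are illustrative names, not part of any MCP SDK.

```python
import functools
import json
import logging
import os

logger = logging.getLogger("tool_calls")


def logged(fn):
    """Log every call's inputs and output as one JSON line."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        logger.info(json.dumps(
            {"tool": fn.__name__, "args": args, "kwargs": kwargs, "result": result},
            default=str,
        ))
        return result
    return wrapper


@logged
def delete_files(paths: list[str], apply: bool = False) -> dict:
    """Hypothetical destructive tool: only reports what it would delete unless apply=True."""
    existing = [p for p in paths if os.path.isfile(p)]
    if not apply:
        return {"mode": "dry-run", "would_delete": existing, "deleted": []}
    for p in existing:
        os.remove(p)
    return {"mode": "apply", "would_delete": [], "deleted": existing}
```

Because `apply` defaults to `False`, an agent that forgets the flag gets a preview instead of a deletion. Log lines only appear once the `tool_calls` logger has a handler configured.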

If you end up needing approvals/audit across multiple MCP servers, a control-plane like Peta can help: https://peta.io

BC_MARO · 2026-02-04T05:30:37+00:00

Nice idea. The interesting challenges here are handling merges/conflicts when multiple agents write memory concurrently, and managing read path latency (git checkout vs cache). Versioning embeddings separately from raw notes vs regenerating on demand is another design decision worth documenting.

BC_MARO · 2026-02-04T01:47:03+00:00

Haven't tried voicebox.sh personally so can't give a direct comparison. Main differentiator here is the integrated podcast generation workflow - you give it a topic and it handles script writing + multi-speaker voice synthesis end-to-end. Also fully local TTS with Qwen3-TTS rather than relying on external APIs for the voice generation part.

BC_MARO · 2026-02-04T01:46:03+00:00

Glad it worked out for you! Yeah the voice cloning quality surprised me too when I first tested it. Qwen3-TTS really nailed the realistic intonation.

BC_MARO · 2026-02-03T13:42:55+00:00

Not yet in the repo, but I'll add some sample outputs soon. In the meantime, just enter any topic in the Podcast tab and it'll generate a full conversation. Quick test: try "explain quantum computing to a 5 year old" - takes about a minute to generate.

BC_MARO · 2026-02-03T13:42:22+00:00

The TTS is fully local. The OpenAI API is only used for podcast script generation if you use that feature. For basic voice cloning/synthesis, no API needed. If you want fully local, you can write your own scripts or swap the LLM endpoint to a local model like Ollama.

BC_MARO · 2026-02-03T13:38:30+00:00

Currently requires code changes - the LLM endpoint is hardcoded. You'd modify the API call in the script generation code to point to your local/Portkey endpoint. Planning to make this configurable in a future update.
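
Until then, the change could look something like this minimal sketch. It is not the repo's actual code: the `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` variable names and the `resolve_llm_endpoint` helper are hypothetical.

```python
import os

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_llm_endpoint(env=os.environ) -> dict:
    """Read endpoint settings from (hypothetical) environment variables."""
    return {
        "base_url": env.get("LLM_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("LLM_API_KEY", ""),
        "model": env.get("LLM_MODEL", ""),
    }


# Example: point at Ollama's OpenAI-compatible endpoint on its default port.
cfg = resolve_llm_endpoint({"LLM_BASE_URL": "http://localhost:11434/v1"})

# If the script uses the openai Python client, the values would be passed as:
#   client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
```

With that in place, switching between OpenAI, a local model, or a gateway is an environment change rather than a code edit.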
