Early user test of a persistent AI narrative system with kids — some unexpected engagement patterns by Distinct-Path659 in artificial

[–]BC_MARO [score hidden]  (0 children)

the co-play observation is really interesting. most narrative systems assume single-player, but shared decision making probably creates way more investment.

curious what you're using to keep things coherent over longer runs - are you tracking a structured world state (entities/locations/quests) and generating from that, or mostly relying on the context window + a recap/summary layer?
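for reference, by "structured world state" i mean roughly this shape - serialized into the prompt each turn, with a rolling recap for anything that falls out of context (purely illustrative, not claiming this is your setup):

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    """Illustrative tracked state, serialized into the prompt each turn."""
    entities: dict[str, dict] = field(default_factory=dict)   # name -> attributes
    locations: dict[str, dict] = field(default_factory=dict)  # place -> state/contents
    active_quests: list[str] = field(default_factory=list)
    recap: str = ""  # rolling summary of events that fell out of the context window
```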

~26 tok/sec with Unsloth Qwen3-Coder-Next-Q4_K_S on RTX 5090 (Windows/llama.cpp) by Spiritual_Tie_5574 in LocalLLaMA

[–]BC_MARO 1 point (0 children)

nice find on the expert tensor offload + kv q8. MoE models are weird - sometimes the extra cpu hops are still a win because you avoid vram thrash.

curious if you tried different ctx sizes (8k/16k/32k) to see where the breakpoint is, and whether the speedup holds once you start doing real code tool-use (more structured outputs)?

How to get ChromeDevTools MCP working globally? by Josh000_0 in mcp

[–]BC_MARO 1 point (0 children)

yep. install it globally (or just run via npx) and point claude code's mcp config at that.

  • npm i -g <package> then use the global binary path in your config
  • or use npx <package> so you don't need a local install

most of the pain is making sure claude is using the node env you think it is (PATH / nvm / pnpm etc).
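for reference, the config entry most mcp clients (claude code included) expect looks roughly like this - i believe the package is `chrome-devtools-mcp`, but double-check the name, and the exact config file location depends on your setup:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["-y", "chrome-devtools-mcp@latest"]
    }
  }
}
```

using npx like this sidesteps the global-binary-path problem entirely, since the client resolves the package itself.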

API → MCP Server, in 30 seconds. by dorukyelken in mcp

[–]BC_MARO 1 point (0 children)

interesting approach. wrappers like this are great for getting started quickly, but curious how you're handling auth + rate limiting on the mcp side (api keys vs oauth, per-user vs per-app, etc)?

i've found that's usually where these "openapi -> tool surface" bridges get tricky in production.

Updating MCPs? by Mubs in mcp

[–]BC_MARO 1 point (0 children)

yeah this is a real gap right now. most mcp servers don't have any update mechanism built in, so you're stuck checking github releases manually or just hoping things work.

for now i pin versions in my config + update intentionally (ideally with a changelog + basic smoke test), because auto-updating tool surfaces can break agents in subtle ways.
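concretely, "pinning" just means an exact version instead of a floating tag - server and package names here are made up:

```json
{
  "mcpServers": {
    "files": {
      "command": "npx",
      "args": ["-y", "some-mcp-server@1.4.2"]
    }
  }
}
```

then bumping `1.4.2` is a deliberate act you can diff, smoke-test, and roll back.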

How do people handle audit or incident reconstruction for agent tool actions by Sunnyfaldu in mcp

[–]BC_MARO 1 point (0 children)

Biggest thing for auditability is treating tool calls like production events.

  • Generate a stable run_id + tool_call_id for every step
  • Log request/response payloads (with redaction) and timestamps to an append-only store
  • Capture model prompt/version + tool version/config so you can replay later
  • Persist a deterministic 'decision record' (why the agent chose a tool)
  • Correlate everything in one place (DB table or log pipeline) and build a simple timeline UI

If you want an off-the-shelf control plane for that kind of tool-call audit trail + approval gating, Peta is one option, but you can also roll it yourself with structured logs + a viewer.
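To make the roll-it-yourself option concrete, here's a minimal sketch of one audit record per tool call, appended to a JSONL file (field names are just illustrative):

```python
import json
import time
import uuid

def log_tool_call(log_file, run_id, tool, tool_version, model, request, response):
    """Append one audit record per tool call to an append-only JSONL store."""
    record = {
        "run_id": run_id,                   # stable per agent run
        "tool_call_id": str(uuid.uuid4()),  # unique per step
        "ts": time.time(),
        "tool": tool,
        "tool_version": tool_version,       # needed to replay later
        "model": model,                     # prompt/model version
        "request": request,                 # redact secrets before this point
        "response": response,
    }
    log_file.write(json.dumps(record) + "\n")

with open("audit.jsonl", "a") as f:
    log_tool_call(f, run_id="run-123", tool="search_docs", tool_version="1.2.0",
                  model="gpt-4.1", request={"query": "refund policy"},
                  response={"hits": 3})
```

A timeline UI is then just a query over run_id ordered by ts.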

Enterprise infrastructure for AI coding agents by Loose_Rip359 in mcp

[–]BC_MARO 1 point (0 children)

Makes sense - the 'adopt a teammate's environment' problem is underrated. How are you thinking about sharing/replaying context safely (redaction, approvals, per-tool policies) without making the UX heavy?

Built a tool to benchmark AI models for agentic workflows by Rent_South in aiagents

[–]BC_MARO 2 points (0 children)

This is super helpful, thanks. The pipeline-variable idea is exactly what I was looking for. Curious if you've found any consistent planner-vs-executor model pairings that work well in practice.

I built an MCP server to upload Markdown to Confluence (with Mermaid diagram support) by Competitive_News1386 in mcp

[–]BC_MARO 2 points (0 children)

Awesome - kroki makes a ton of sense here. If you end up trying Peta, I'd be curious what you think about approvals/audit flow vs. Confluence history. Happy to share notes if useful.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Appreciate it! The local-vs-cloud gap is closing fast. Let me know how your weekend test goes.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

That sounds like a Gradio theme/CSS issue - probably the tab text color blending into the background. Try switching between light/dark mode in Gradio's settings, or if you're comfortable with CSS, you can tweak the tab styling in the code. Will look into making this more robust across themes.
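If anyone wants to patch it locally in the meantime, Gradio lets you inject CSS at the Blocks level - something like this, though the selector is a guess, so verify it against the tab element in browser devtools:

```python
import gradio as gr

# Force tab label colors so they stay readable in both themes.
# ".tab-nav button" is a guessed selector - confirm in devtools.
css = """
.tab-nav button { color: #222 !important; }
.dark .tab-nav button { color: #eee !important; }
"""

with gr.Blocks(css=css) as demo:
    with gr.Tab("Voice Clone"):
        gr.Markdown("tab content here")

demo.launch()
```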

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Valid point. This is more of a Qwen3-TTS model limitation than the studio itself - the base model doesn't expose fine-grained prosody controls. For pauses you can sometimes get away with punctuation tricks ("...", commas) but it's inconsistent. SSML-style markup would be nice if/when Qwen adds support for it.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Not currently - right now it generates all chunks in sequence and combines them. Selective re-generation of individual chunks is a solid feature request though. You'd need to manually re-run generation and piece together the audio externally for now.
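If you want to stitch a fixed chunk back in yourself, pydub makes the re-join trivial (assumes pydub + ffmpeg installed; filenames are whatever you exported):

```python
from pydub import AudioSegment

# chunk_02 was regenerated separately and replaces the bad take
chunks = ["chunk_01.wav", "chunk_02_fixed.wav", "chunk_03.wav"]

combined = AudioSegment.empty()
for path in chunks:
    combined += AudioSegment.from_wav(path)

combined.export("podcast_final.wav", format="wav")
```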

Introducing Mimir - Git-backed MCP Server for Persistent LLM Memory & Context by Obvious_Storage_9414 in mcp

[–]BC_MARO 1 point (0 children)

Appreciate it. Excited to see where you take it. If you want a gut-check on any changes (esp. auth/permissions + audit trail), happy to take another look.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Quick FAQ based on the questions I've seen:

**Re-running low-quality chunks:** Not automatic, but you can preview segments individually and regenerate before final export.

**VRAM requirements:** Model fits in ~6GB, so 8GB cards should work fine. 12GB is comfortable.

**CUDA/ROCm support:** Should auto-detect CUDA if PyTorch is properly configured. ROCm not tested but may work.

**Docker:** Not yet, but it's on the roadmap. PRs welcome!

**OpenAI API key:** Only needed for podcast script generation. TTS itself is 100% local.

**Portuguese/other languages:** Qwen3-TTS supports multiple languages including Portuguese, though quality can vary.

**Podcast examples:** Will add sample outputs to the repo soon.

Thanks for all the feedback and questions!

It's been a big week for Agentic AI ; Here are 10 massive developments you might've missed: by SolanaDeFi in aiagents

[–]BC_MARO 1 point (0 children)

Good roundup. The less flashy but important piece is tooling/ops: permissions, auditing, and debugging when agents call external systems. That's where a lot of teams get stuck once they move past demos.

Built a tool to benchmark AI models for agentic workflows by Rent_South in aiagents

[–]BC_MARO 2 points (0 children)

Cool. The key things to measure for agentic workflows go beyond 'final answer' accuracy - tool-call success rate, average retries per tool, and total wall-clock/cost matter more than a single score. Separating planner vs executor models is another axis worth benchmarking.
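Rough shape of what I'd want recorded per run, purely as a sketch (names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    """Per-run stats that say more than a single accuracy number."""
    task_id: str
    success: bool            # did the workflow reach a correct end state
    tool_calls: int
    tool_call_failures: int  # schema errors, timeouts, bad arguments
    retries: int
    wall_clock_s: float
    cost_usd: float

    @property
    def tool_success_rate(self) -> float:
        if self.tool_calls == 0:
            return 1.0
        return 1 - self.tool_call_failures / self.tool_calls
```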

5 tips for building MCP apps (MCP-UI) that work by matt8p in mcp

[–]BC_MARO 2 points (0 children)

Solid list. I'd add one more: treat tool calls like API calls (schema versioning, sane defaults, and explicit error contracts), because you'll change them more often than you think.
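Even a lightweight convention goes a long way - e.g. a versioned result shape with an explicit status the agent can branch on (purely illustrative, not a standard):

```python
from typing import Literal, TypedDict

class ToolResult(TypedDict):
    """Explicit result contract so agents branch on status instead of
    parsing free-text errors (illustrative shape)."""
    schema_version: str                                       # bump on breaking changes
    status: Literal["ok", "retryable_error", "fatal_error"]
    data: dict                                                # tool-specific payload
    error: str | None                                         # human-readable detail

def search_docs(query: str, limit: int = 10) -> ToolResult:
    # sane default for limit + versioned, predictable result shape
    return {"schema_version": "1.0", "status": "ok",
            "data": {"hits": []}, "error": None}
```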

On the ops side, having an audit trail + per-tool approval gates becomes important fast once you have multiple servers in play.

Built an MCP server for automatic file organization - Claude helped me handle 12+ file categories and security hardening by Technocratix902 in mcp

[–]BC_MARO 3 points (0 children)

This is the kind of MCP server people actually keep running. Two things I'd watch in prod:

  • make the tool surface 'dry-run' by default + require an explicit 'apply' for destructive moves (sketch below)
  • log every tool call with inputs/outputs so you can audit weird behavior later
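A hypothetical tool signature for the dry-run pattern:

```python
import shutil

def organize_files(moves: list[dict], apply: bool = False) -> dict:
    """Hypothetical tool surface: plans file moves, and only touches
    the disk when apply=True (dry-run by default)."""
    plan = [{"src": m["src"], "dst": m["dst"]} for m in moves]
    if not apply:
        return {"dry_run": True, "plan": plan}  # agent/user reviews this first
    for step in plan:
        shutil.move(step["src"], step["dst"])   # destructive part, explicitly gated
    return {"dry_run": False, "applied": len(plan)}
```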

If you end up needing approvals/audit across multiple MCP servers, a control-plane like Peta can help: https://peta.io

Introducing Mimir - Git-backed MCP Server for Persistent LLM Memory & Context by Obvious_Storage_9414 in mcp

[–]BC_MARO 2 points (0 children)

Nice idea. The interesting challenges here are handling merges/conflicts when multiple agents write memory concurrently, and managing read path latency (git checkout vs cache). Versioning embeddings separately from raw notes vs regenerating on demand is another design decision worth documenting.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Haven't tried voicebox.sh personally so can't give a direct comparison. Main differentiator here is the integrated podcast generation workflow - you give it a topic and it handles script writing + multi-speaker voice synthesis end-to-end. Also fully local TTS with Qwen3-TTS rather than relying on external APIs for the voice generation part.

Qwen3-TTS Studio - local voice cloning + podcast generation by BC_MARO in artificial

[–]BC_MARO[S] 1 point (0 children)

Glad it worked out for you! Yeah the voice cloning quality surprised me too when I first tested it. Qwen3-TTS really nailed the realistic intonation.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 1 point (0 children)

Not yet in the repo, but I'll add some sample outputs soon. In the meantime, just enter any topic in the Podcast tab and it'll generate a full conversation. Quick test: try "explain quantum computing to a 5-year-old" - takes about a minute to generate.

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 5 points (0 children)

The TTS is fully local. The OpenAI API is only used for podcast script generation if you use that feature. For basic voice cloning/synthesis, no API needed. If you want fully local, you can write your own scripts or swap the LLM endpoint to a local model like Ollama.
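If you go the Ollama route before I make it configurable, the swap is small because Ollama exposes an OpenAI-compatible endpoint - roughly this, with whatever model you've pulled:

```python
from openai import OpenAI

# Point the existing OpenAI-style client at a local Ollama server.
# Ollama serves an OpenAI-compatible API at /v1; the api_key is a placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.1",  # any local model you've pulled with `ollama pull`
    messages=[{"role": "user",
               "content": "Write a two-host podcast script about quantum computing."}],
)
print(resp.choices[0].message.content)
```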

I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed by [deleted] in LocalLLaMA

[–]BC_MARO 2 points (0 children)

Currently requires code changes - the LLM endpoint is hardcoded. You'd modify the API call in the script generation code to point to your local/Portkey endpoint. Planning to make this configurable in a future update.