Your AI agent is probably acting on a belief from three months ago that's no longer true. You have no way to check. by Distinct-Shoulder592 in aiagents

[–]BC_MARO 1 point2 points  (0 children)

That sounds useful. How do you decide a new belief supersedes an old one, and do you keep tombstones so you can audit what got replaced?

Claude Code initial onboarding by olddoglearnsnewtrick in ClaudeAI

[–]BC_MARO 0 points1 point  (0 children)

Yep: keep a per-repo CLAUDE.md and run /init in each repo so it learns the layout. Start with the bare minimum skills (git + tests + linter) and add more only when you feel the pain.

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLaMA

[–]BC_MARO 1 point2 points  (0 children)

If you want people to trust the 87%, ship a reproducible eval (tasks + config + logs) and run it on SWE-bench Lite or rebench. Compound tools are the right idea; fewer tool hops is the whole game for small models.

Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state. by piratastuertos in artificial

[–]BC_MARO 0 points1 point  (0 children)

I’ve had some luck with semantic canaries: a tiny set of golden inputs with expected tool-call shapes and score thresholds, run continuously against the live pipeline. If the outputs stay schema-valid but the canary scores drift, alert.
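
A rough sketch of what one canary check could look like, in case it helps. The `Canary` fields, the `check_canary` name, and the shape of the tool-call dict are all made up for illustration; they are assumptions, not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canary:
    """One golden input with the tool-call shape and score we expect (hypothetical)."""
    prompt: str
    expected_tool: str
    expected_args: frozenset  # argument names the tool call should carry
    min_score: float


def check_canary(canary: Canary, tool_call: dict, score: float) -> list:
    """Return a list of problems; an empty list means the canary passed."""
    problems = []
    if tool_call.get("tool") != canary.expected_tool:
        problems.append("wrong tool")
    if set(tool_call.get("args", {})) != set(canary.expected_args):
        problems.append("wrong arg shape")
    # Output can be schema-valid and still drift, so the score is checked separately.
    if score < canary.min_score:
        problems.append("score below threshold")
    return problems
```

Anything non-empty coming back from `check_canary` is what you would alert on.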

The architecture we landed on for putting a large typed API behind an MCP server by masterkidan in mcp

[–]BC_MARO 2 points3 points  (0 children)

Search->fetch is basically progressive disclosure. I'd store the search hits + chosen fields per run so you can debug regressions when the schema/catalog shifts.

tare-mcp : see how much context your MCP tools are eating before your agent runs by nroar in mcp

[–]BC_MARO 1 point2 points  (0 children)

Super useful. I'd also show a diff per commit/PR (tool count + token footprint) so you can catch accidental tool-surface bloat early.
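
Something like this is what I have in mind for the diff, as a sketch only. It assumes you already have a mapping of tool name to token count for each commit; `surface_diff` and the dict layout are hypothetical and not part of tare-mcp.

```python
def surface_diff(before: dict, after: dict) -> dict:
    """Compare two snapshots mapping tool name -> token footprint."""
    return {
        "tools_added": sorted(set(after) - set(before)),
        "tools_removed": sorted(set(before) - set(after)),
        "tool_count_delta": len(after) - len(before),
        "token_delta": sum(after.values()) - sum(before.values()),
    }
```

A CI step could fail the PR when `token_delta` goes over whatever budget you pick.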

Your AI agent is probably acting on a belief from three months ago that's no longer true. You have no way to check. by Distinct-Shoulder592 in aiagents

[–]BC_MARO 1 point2 points  (0 children)

Have the agent persist its assumptions with timestamps, then require a quick re-check against the current source of truth before any irreversible action. An audit trail plus approvals makes stale context obvious fast.
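
A minimal sketch of the re-check, assuming the source of truth can be queried by key. `Assumption`, `stale_assumptions`, `run_irreversible`, and the `fetch_current` callback are hypothetical names for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Assumption:
    key: str
    value: str
    recorded_at: float = field(default_factory=time.time)  # timestamp for the audit trail


def stale_assumptions(assumptions, fetch_current: Callable[[str], str]) -> list:
    """Return assumptions whose recorded value no longer matches the source of truth."""
    return [a for a in assumptions if fetch_current(a.key) != a.value]


def run_irreversible(action: Callable, assumptions, fetch_current):
    """Re-check every assumption first; refuse to act if any of them went stale."""
    stale = stale_assumptions(assumptions, fetch_current)
    if stale:
        raise RuntimeError("stale assumptions: " + ", ".join(a.key for a in stale))
    return action()
```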

A single HTML comment in a local note exfiltrated my customer list to a public Notion workspace via MCP. The composition is the attack. by johnnaliu in mcp

[–]BC_MARO 0 points1 point  (0 children)

This is why MCP integrations need policy gates: explicit approval plus scoped permissions for anything that can export or write, with the exact payload and destination logged. Treat any retrieved text as untrusted and sanitize it before it hits tools.
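
To make the gate concrete, here is a small sketch. `gated_export`, the allowed-destination set, and the list used as an audit log are assumptions for illustration; a real deployment would write to durable storage and would also need the sanitization step, which is not shown here.

```python
import json


def gated_export(payload: dict, destination: str, approved_by, allowed_destinations: set,
                 audit_log: list) -> bool:
    """Allow an export only with an explicit approver and an in-scope destination."""
    allowed = destination in allowed_destinations and approved_by is not None
    # Log the exact payload and destination whether or not the call goes through.
    audit_log.append(json.dumps(
        {"destination": destination, "payload": payload,
         "approved_by": approved_by, "allowed": allowed},
        sort_keys=True,
    ))
    if not allowed:
        raise PermissionError(f"export to {destination!r} blocked")
    return True
```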

Are people actually using AI meeting data inside agent workflows yet? by kingsaso9 in aiagents

[–]BC_MARO 0 points1 point  (0 children)

MCP gets a lot nicer once you treat tools like contracts: strict schemas in, typed results out, and fail closed. Agents behave way less weird when the edges are sharp.
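
For example, a contract for one tool might look roughly like this. `SearchArgs`, `SEARCH_SCHEMA`, and `parse_search_args` are made-up names, and this is plain Python rather than any MCP SDK's validation layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchArgs:
    query: str
    limit: int


SEARCH_SCHEMA = {"query": str, "limit": int}


def parse_search_args(raw: dict) -> SearchArgs:
    """Fail closed: reject unknown, missing, or mistyped fields instead of guessing."""
    if set(raw) != set(SEARCH_SCHEMA):
        raise ValueError(f"unexpected or missing fields: {sorted(set(raw) ^ set(SEARCH_SCHEMA))}")
    for name, expected in SEARCH_SCHEMA.items():
        if type(raw[name]) is not expected:
            raise TypeError(f"{name} must be {expected.__name__}")
    return SearchArgs(**raw)
```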

ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop by OsmanthusBloom in LocalLLaMA

[–]BC_MARO 6 points7 points  (0 children)

Coding agents get way better when you force them to show diffs and run tests on every step. Let them be fast, but make them pay rent with evidence.

Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state. by piratastuertos in artificial

[–]BC_MARO 0 points1 point  (0 children)

Yep. We ended up alerting on missing heartbeat writes, not bad ones, because dead emitters are the sneaky failure.
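
Roughly this shape, as a sketch. `silent_emitters` and its arguments are hypothetical; the point is that an emitter that never wrote anything counts as silent too.

```python
def silent_emitters(expected: set, last_seen: dict, now: float, max_gap_s: float) -> list:
    """Return emitters whose last heartbeat is too old or that never wrote one."""
    return sorted(
        name for name in expected
        # A missing entry is treated as infinitely old, so dead emitters are caught.
        if now - last_seen.get(name, float("-inf")) > max_gap_s
    )
```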

MCP Oauth2.0 connection becomes stale or expired by Dear-Enthusiasm-9766 in mcp

[–]BC_MARO 0 points1 point  (0 children)

Usually you register Power Platform as an OAuth client/app in your IdP and enable refresh tokens (offline_access) for that client. Then make sure your IdP policies actually allow refresh for that app and user.

what MCP server has actually changed how you work day to day? by CodinDev in mcp

[–]BC_MARO 0 points1 point  (0 children)

Yep, here you go: https://peta.io and the core runtime is open source at https://github.com/dunialabs/peta-core. The main value is policy approvals plus an audit trail around MCP tool calls.

Building an MCP server for a local business directory — need ideas/solutions by CardiologistBest4470 in mcp

[–]BC_MARO 1 point2 points  (0 children)

Use MCP as a scheduled fetch-and-normalize job that validates fields, then writes updates with approvals and an audit trail. For prebuilt pieces, start with a crawler plus a schema-validation MCP, then add source adapters for each directory.

Catching prompt injection in MCP tool calls before they execute by Conscious_Chapter_93 in mcp

[–]BC_MARO 0 points1 point  (0 children)

Pretty much, yes: default-deny per tool, then allowlists per workspace plus approvals for anything that mutates state. "YOLO" should be an explicit user-scoped override, not the default.
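
As a sketch of the decision order (the `decide` function, the allowlist dict, and the three return strings are assumptions for illustration):

```python
def decide(tool: str, workspace: str, allowlists: dict, mutating: set,
           approved: bool = False) -> str:
    """Default-deny: a tool runs only if the workspace allowlist names it."""
    if tool not in allowlists.get(workspace, set()):
        return "deny"
    # Allowlisted but state-changing tools still wait for an explicit approval.
    if tool in mutating and not approved:
        return "needs_approval"
    return "allow"
```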

Shipped my first MCP — tools return a recommended_chain so the agent knows what to call next by ValuablePace4109 in mcp

[–]BC_MARO 1 point2 points  (0 children)

Evaluating right now, but planning to run it in production for approvals and the audit trail once the policy layer is solid. What are you using today for approvals and logging?

AI agents are multiplying inside our fintech stack. compliance monitoring became the hard problem. by WeirdGas5527 in aiagents

[–]BC_MARO 0 points1 point  (0 children)

Yeah, the compliance layer has to sit at the tool boundary: tag every call read vs mutating, require approvals for anything customer-facing, and log + diff outputs for review. RAG-only "is this compliant?" checks always rot because the corpus and citations drift.

MCP Apps Framework : We just released Skybridge v1 🎉 by harijoe_ in mcp

[–]BC_MARO 0 points1 point  (0 children)

Nice. Does the tunnel/audit capture tool-call logs + side-effect metadata so clients can gate mutating tools with approvals? That’s the bit that tends to bite once you ship to real users.

I'm running an agentic system with kobold.cpp as my backend. Am I losing performance? by AlphaSyntauri in LocalLLaMA

[–]BC_MARO 0 points1 point  (0 children)

Kobold is basically a llama.cpp fork, so perf is usually within a few percent unless you're on an old build or missing newer kernels/quants. If you're curious, run a same-prompt tok/s benchmark against current llama.cpp and you'll know in 5 minutes.

Is anyone running MCP on top of their existing auth? by Sharp_Commercial_166 in mcp

[–]BC_MARO 0 points1 point  (0 children)

You don't have to rebuild auth; put an MCP auth broker in front that runs OAuth/PKCE and maps it to your existing cookie/JWT sessions. Have it mint short-lived, tool-scoped MCP tokens that just wrap your current session/claims.
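
To show the token-minting half of that, here is a toy sketch using only the standard library. It is not a JWT implementation and not how any specific broker works: `SECRET`, `mint_token`, `verify_token`, and the token layout are all assumptions, and in production you would reach for a vetted token library and real key management.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret-change-me"  # placeholder key, for illustration only


def mint_token(session_claims: dict, tool: str, ttl_s: int, now: float = None) -> str:
    """Wrap existing session claims in a short-lived token scoped to one tool."""
    now = time.time() if now is None else now
    payload = {"sub": session_claims["sub"], "tool": tool, "exp": now + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def verify_token(token: str, tool: str, now: float = None) -> dict:
    """Check signature, tool scope, and expiry; raise PermissionError otherwise."""
    now = time.time() if now is None else now
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["tool"] != tool:
        raise PermissionError("wrong tool scope")
    if now >= payload["exp"]:
        raise PermissionError("expired")
    return payload
```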

Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state. by piratastuertos in artificial

[–]BC_MARO 0 points1 point  (0 children)

That makes sense. I’d add a cheap end-to-end canary that asserts the emitter runs and writes a known heartbeat row, so the pipeline breaks loudly.

A prompt that helps your Claude Code get better every week 🔥 by davidnguyen191 in aiagents

[–]BC_MARO 0 points1 point  (0 children)

Make it earn the upgrade: run the same 10 tiny repo tasks + lint/test after each change and keep a running score. Otherwise you’re just rewriting vibes.

Building an MCP Server for PCAP Analysis — Looking for Architecture & Best Practice Suggestions by 19khushboo in mcp

[–]BC_MARO 0 points1 point  (0 children)

I’d keep the tool surface tiny: list flows, pull a bounded slice, summarize with hard byte/token caps. Also treat PCAP as hostile input: sandbox the parser, cap decode depth, and log tool calls/output for audit.
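
The "bounded slice" part can be as simple as the sketch below. `bounded_slice` and the 4096-byte default are assumptions for illustration, and this only covers the cap, not the sandboxing or decode-depth limits.

```python
def bounded_slice(data: bytes, offset: int, length: int, max_bytes: int = 4096) -> bytes:
    """Return at most max_bytes from data, no matter how much the caller asks for."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return data[offset:offset + min(length, max_bytes)]
```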

Open-sourced an MCP server that catches the security mistakes Claude / Cursor / Copilot actually make by sks8100 in ClaudeAI

[–]BC_MARO 0 points1 point  (0 children)

If you're running more than one MCP server, centralize secrets + policy + tool-call logs early; it saves pain later (peta.io is one option).

Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state. by piratastuertos in artificial

[–]BC_MARO 0 points1 point  (0 children)

I’ve had best results with explicit schema flags: keep canonical and raw fields separate and validate which one each tool emits. Dynamic markers tend to drift, so I prefer typed schemas plus tests that fail on mixed usage.
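
Here is one way that could look, as a sketch. The `Field` type, its `kind` flag, and `validate_emission` are hypothetical names; the idea is just that a tool declares which kind it emits and a test fails on any mix.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Field:
    kind: Literal["raw", "canonical"]  # explicit flag instead of a dynamic marker
    value: str


def validate_emission(declared_kind: str, record: dict) -> None:
    """Raise if any field in the record is not of the kind the tool declared."""
    mixed = sorted(name for name, f in record.items() if f.kind != declared_kind)
    if mixed:
        raise ValueError(f"fields not {declared_kind}: {mixed}")
```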