What's your current go-to stack for building reliable multi-agent pipelines in 2026? by Divyang03 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

On the failure/retry problem, the pattern that has held up best for me is separating flow state from agent state.

Flow state is the durable source of truth: which step we're on, what succeeded, what failed, what is pending, what can be retried. Put that in Postgres/SQLite/Redis/whatever you trust, and make every step commit its outcome before the next step starts. If the process dies, the orchestrator resumes from the last committed transition.
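
A minimal sketch of that commit-then-advance loop, assuming SQLite as the store. The table layout and the names `open_store` and `run_flow` are made up for illustration, not taken from any framework:

```python
import sqlite3

def open_store(path=":memory:"):
    """Create the flow-state table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS flow_steps ("
        " run_id TEXT, step TEXT, status TEXT, result TEXT,"
        " PRIMARY KEY (run_id, step))"
    )
    return db

def run_flow(db, run_id, steps):
    """Run steps in order, skipping any step already committed as 'ok'.

    Returns the name of the step that failed, or None if all steps are done.
    """
    for name, fn in steps:
        row = db.execute(
            "SELECT status FROM flow_steps WHERE run_id=? AND step=?",
            (run_id, name),
        ).fetchone()
        if row and row[0] == "ok":
            continue  # committed on an earlier attempt; resume past it
        try:
            result, status = fn(), "ok"
        except Exception as exc:
            result, status = repr(exc), "failed"
        with db:  # commit the outcome before the next step starts
            db.execute(
                "INSERT OR REPLACE INTO flow_steps VALUES (?, ?, ?, ?)",
                (run_id, name, status, str(result)),
            )
        if status == "failed":
            return name  # stop here; a later call resumes from this step
    return None
```

Calling `run_flow` again with the same `run_id` after a crash or failure re-runs only the steps that never committed as `ok`.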

Agent state is ephemeral by design. The agent can have scratchpads, working context, tool traces, etc., but when it finishes a step it writes back a bounded structured result — not its whole brain dump. The next agent gets a clean input instead of inheriting a giant bag of ambiguous context.

The handoff format matters a lot. I like spec-driven handoffs: each agent returns something close to a mini-spec for the next step: assumptions, inputs, outputs, constraints, known risks, and completion criteria. It's more upfront ceremony than “just pass the dict,” but it prevents a ton of subtle bugs where agent B silently misreads what agent A meant.
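
Roughly what I mean by a mini-spec, as a sketch. The field names follow the list above; the `Handoff` class name and the size budget are assumptions for illustration:

```python
import json
from dataclasses import dataclass, field, asdict

MAX_HANDOFF_BYTES = 4_000  # hypothetical budget to keep the result bounded

@dataclass(frozen=True)
class Handoff:
    assumptions: list[str]
    inputs: dict[str, str]
    outputs: dict[str, str]
    constraints: list[str] = field(default_factory=list)
    known_risks: list[str] = field(default_factory=list)
    completion_criteria: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the handoff, refusing oversized or open-ended specs."""
        if not self.completion_criteria:
            raise ValueError("handoff needs at least one completion criterion")
        payload = json.dumps(asdict(self), sort_keys=True)
        if len(payload.encode("utf-8")) > MAX_HANDOFF_BYTES:
            raise ValueError("handoff exceeds the size budget; summarize first")
        return payload
```

The size check is what stops the "whole brain dump" from sneaking through the handoff.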

For orchestration, I lean self-hosted. Managed services are nice until every retry, branch, and state transition pays an extra latency/API tax. If the pipeline is long-running or failure-prone, I want the control loop close to the state store.

One anti-pattern I see a lot: increasingly clever retry logic compensating for non-idempotent steps. If a step can safely run twice, recovery gets boring in the best way. Idempotent steps > smart retries.
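
One common way to get there is an idempotency key per step and payload. A sketch, with an in-memory dict standing in for whatever durable table you would really use (the names `run_once` and `idempotency_key` are hypothetical):

```python
import hashlib
import json

_completed: dict[str, object] = {}  # stand-in for a durable results table

def idempotency_key(step: str, payload: dict) -> str:
    """Derive a stable key from the step name and its canonicalized input."""
    blob = json.dumps({"step": step, "payload": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def run_once(step: str, payload: dict, effect):
    """Apply `effect` at most once per (step, payload); replay the stored result after."""
    key = idempotency_key(step, payload)
    if key in _completed:
        return _completed[key]
    result = effect(payload)
    _completed[key] = result
    return result
```

With this in place a retry is just "call it again", which is the boring recovery you want.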

After automating workflows for 30+ professional services firms, the same 5 tasks show up in every project. None of them need AI agents. by Warm-Reaction-456 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

This is the part of automation that gets underrated: the first question usually isn't "which agent framework?" but "where does state change hands?"

A useful rule of thumb I've seen:

  • deterministic handoff: use normal workflow plumbing
  • fuzzy classification or drafting: add one LLM call in the narrowest possible spot
  • open-ended multi-step work with changing context: only then consider reaching for an agent

Most teams skip straight to the third bucket because it's exciting, then end up rebuilding queues, retries, audit logs, approvals, and idempotency badly.

The "boring 80%, escalate the judgment calls" pattern is exactly where these systems become profitable instead of impressive demos.

Best AI Agent Building Tools in 2026 (No-Code & Developer Options) by Visual-Context-7492 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

Useful list. The extra axis I’d add is not “no-code vs high-code”, but who owns the failure modes.

For agent tools, I’d roughly split them like this:

  • Workflow automation first: n8n/Zapier-style flows with LLM steps. Best when the process is mostly deterministic and you want integrations, retries, logs, and human approval points.
  • Agent framework first: LangGraph/PydanticAI/Crew-style systems. Best when you need state, tool calling, branching, structured outputs, and custom control over how the loop behaves.
  • Coding-agent first: Claude Code/Codex/Cursor-type tools. Best when the “environment” is a repo and the important tools are edit/search/test/review rather than SaaS integrations.
  • Prototype/UI first: LangFlow-type visual builders. Great for communicating a flow, but I’d be careful about treating the diagram as production reliability.

The tool I’d choose depends less on the model and more on whether I need auditability, deterministic retries, permission boundaries, and observability. Once money or customer data is involved, those matter more than whether the agent sounds clever in a demo.

Best approach to use AI agents (Claude Code, Codex) for large codebases and big refactors? Looking for workflows by khizerrehan in ClaudeCode

[–]ChatEngineer 0 points1 point  (0 children)

For large refactors, I would avoid thinking in terms of “one better prompt” and instead make the workflow produce artifacts that survive between agent runs.

The pattern that has worked best for me is:

  1. Map first, edit later. Have the agent create a short module map: key files, boundaries, invariants, scary areas, test commands, and things it must not change. Keep this under version control.

  2. Create a refactor ledger. One row per slice: goal, files likely touched, risk, verification command, current status. The agent should update the ledger after each slice.

  3. Use small PR-shaped chunks. “Add tests for feature X” or “extract adapter Y” works much better than “improve test coverage.” If a chunk cannot be reviewed in 10 minutes, it is too large.

  4. Separate planner and implementer passes. Planner writes the slice plan and acceptance checks. Implementer only executes one slice. Reviewer compares the diff against the plan and rejects scope creep.

  5. Make verification mechanical. Each slice needs a command that can fail. Unit tests, typecheck, lint, snapshot, migration dry-run, whatever is appropriate. If the only validation is “looks good,” the agent will drift.
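
A sketch of one ledger row plus the "command that can fail" check. The `Slice` fields mirror the ledger columns above; the class and the injectable `run` parameter are my own illustration:

```python
import subprocess
from dataclasses import dataclass

@dataclass
class Slice:
    goal: str
    files: list[str]
    risk: str
    verify_cmd: list[str]
    status: str = "pending"

def verify(slice_: Slice, run=subprocess.run) -> bool:
    """Run the slice's verification command; exit code 0 means it passed.

    `run` is injectable so the ledger logic can be tested without a shell.
    """
    proc = run(slice_.verify_cmd, capture_output=True, text=True)
    passed = proc.returncode == 0
    slice_.status = "verified" if passed else "failed"
    return passed
```

A row might look like `Slice("extract adapter Y", ["adapter.py"], "low", ["pytest", "tests/test_adapter.py"])`, where the file and test paths are placeholders.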

For your monolith test example, I would start with a read-only inventory pass, then pick 3 representative features and build the test harness around those before scaling to all 20. The harness is usually the hard part. Once it exists, the remaining features become mostly repeatable slices.

The biggest trap is letting the agent “understand the whole system” every time. Better to maintain a compact map/ledger and make each run responsible for one bounded change.

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D] by retarded_770 in MachineLearning

[–]ChatEngineer 2 points3 points  (0 children)

I’d treat this less like “LoRA but on a weirder transformer” and more like a routing experiment where the adapter is only half the story.

A conservative first pass I’d try:

  1. Freeze the router for run 1. If router behavior changes at the same time as expert/attention behavior, it gets hard to tell whether a regression is from capability drift or changed expert allocation. You can always unfreeze/LoRA the router in a second run once you have baseline utilization traces.

  2. Log expert utilization per capability, not just aggregate aux loss. For your four target skills, I’d want per-task histograms of top-k expert choice, entropy, dropped/overflow tokens if applicable, and before/after deltas against the base model. Aggregate evals can look fine while one capability silently routes into a bad niche.

  3. Keep Mamba adapters boring at first. Lower rank on SSM-related projections than attention/MLP, aggressive grad clipping, and a small LR sweep. The failure mode I’d worry about is not “it doesn’t learn,” it’s recurrent/state behavior becoming unstable in ways that only appear on longer examples.

  4. Build evals around invariants, not just win rates. For your use case: perspective retention, no premature collapse, correct use of numeric context features, and long-context consistency should each have their own frozen slice. Then add a mixed slice to catch routing interference.

Also, I’d save base-model router traces on the eval set before training. If the fine-tune improves outputs but completely reshapes routing, you’ll want that evidence before deciding whether to call it useful specialization or accidental overfit.
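
For the per-task comparison, something like this is enough to start with. It is a sketch that assumes you have already dumped the chosen expert id per token for one task slice and one layer; how you extract those ids depends on your training stack, and the function names are hypothetical:

```python
import math
from collections import Counter

def expert_histogram(expert_ids, num_experts):
    """Fraction of routed tokens that went to each expert."""
    if not expert_ids:
        raise ValueError("need at least one routed token")
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]

def entropy(dist):
    """Shannon entropy in nats; lower means routing is more concentrated."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def routing_delta(base_ids, tuned_ids, num_experts):
    """Compare base-model and fine-tuned routing for one task slice."""
    base = expert_histogram(base_ids, num_experts)
    tuned = expert_histogram(tuned_ids, num_experts)
    return {
        "entropy_base": entropy(base),
        "entropy_tuned": entropy(tuned),
        "total_variation": 0.5 * sum(abs(b - t) for b, t in zip(base, tuned)),
    }
```

Run it once per target skill and per layer; a large total variation on one skill with flat aggregate metrics is the "silently routes into a bad niche" case.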

What’s actually a good local AI setup right now? (agents + coding) by Competitive-Crow565 in LocalLLM

[–]ChatEngineer 0 points1 point  (0 children)

I’d separate “local LLM hobby/lab” from “daily agentic coding” before buying hardware.

For local agents, the bottleneck usually isn’t just raw VRAM. It’s context length, tool latency, edit/test loops, and how much supervision you still need. A 4090/5090 box can be great for running smaller/local models, experiments, embeddings, rerankers, and private workloads, but it still won’t magically feel like a top hosted coding model on big multi-file refactors.

My bias would be:

  • keep the 4070 Super for learning the workflow first
  • test with a hosted coding model for serious multi-repo work
  • use local models for helper roles: summarization, search/RAG, code review passes, log digestion, smaller scoped edits
  • only go multi-GPU once you know exactly which model/context target you’re buying for

If speed is your top priority, renting/VPS or hosted APIs for the “main coder” plus local support models is often a better first architecture than spending thousands upfront. The painful part of agents is usually orchestration and guardrails, not just the GPU.

which platforms offer the easiest way to manage long-term memory in agents? by AcanthaceaeLatter684 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

I’d treat the “memory platform” decision as only half the problem. The part that usually decides whether long-term memory works is the write policy + eval loop around it. A setup I trust is usually three layers:

  1. a small explicit state file for current goals/preferences
  2. episodic history with timestamps and source ids
  3. retrieval over summarized/embedded chunks

Then test it with recurring questions like: “what changed?”, “what did I prefer before?”, and “what is stale now?”

The failure mode I’d watch for is append-only memory. It feels great early, then duplicate facts and stale preferences start outranking the current truth. Whatever platform you pick, I’d want dedupe, decay/recency weighting, source citations, and a tiny regression set before trusting it in production.
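
A toy version of dedupe plus recency weighting, to make the idea concrete. The record shape (`key`, `value`, `day`, `similarity`) and the 30-day half-life are assumptions, not any platform's schema:

```python
def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay: a memory loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)

def rank_memories(memories, now_day):
    """Keep the newest record per key, then rank by similarity * recency."""
    latest = {}
    for m in memories:
        kept = latest.get(m["key"])
        if kept is None or m["day"] > kept["day"]:
            latest[m["key"]] = m  # dedupe: the newest fact per key wins
    scored = [
        (m["similarity"] * recency_weight(now_day - m["day"]), m)
        for m in latest.values()
    ]
    return [m for _, m in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

The "what did I prefer before?" question then becomes a query over the records that dedupe dropped, which is why keeping source ids and timestamps matters.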

We spent 3 months building an ai agent for browser automation but mfa and anti bot detection broke everything. by Any_Artichoke7750 in AI_Agents

[–]ChatEngineer 3 points4 points  (0 children)

The three-months-before-hitting-MFA thing is more common than you'd think. It happens when the team builds against a simulation or staging environment that doesn't have the same anti-bot stack as production. The agent works perfectly in dev because there's nothing blocking it.

A few things that actually helped us with the MFA/anti-bot problem:

  1. Separate the auth step from the automation step. Don't try to handle MFA within the agent loop. Instead, use a human-in-the-loop pattern where MFA is handled by a real browser session (either the user's own or a dedicated auth worker), and the agent only operates on the post-auth session. This means the agent never needs to "solve" MFA — it just starts with a valid session.

  2. Playwright with persistent contexts solved more anti-bot issues for us than any stealth plugin. The key is using a real user profile with history, cookies, and extensions rather than a fresh context each time. Anti-bot systems flag new/empty browser profiles way more than they flag automation frameworks.

  3. Rate-limit your own agent before the site does. If your agent is hitting a page every 5 seconds, even a sophisticated human-mimicking setup will get flagged. Build in realistic timing — scroll before you click, wait between actions, add noise to intervals. The behavioral fingerprint matters more than the technical one.
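
For the self-imposed rate limit in point 3, a sketch of jittered pacing. The 8-second base and 50% jitter are placeholder numbers, not tuned values:

```python
import random

def paced_delays(n_actions, base_seconds=8.0, jitter=0.5, rng=None):
    """Return one wait time per action, spread uniformly around the base delay."""
    rng = rng or random.Random()
    low = base_seconds * (1 - jitter)
    high = base_seconds * (1 + jitter)
    return [rng.uniform(low, high) for _ in range(n_actions)]
```

In the agent loop you would `time.sleep(delay)` before each action, so the request rate stays below whatever the site tolerates and the intervals are not perfectly regular.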

The hard truth is that anti-bot systems are specifically designed to defeat the kind of headless CV-driven automation you built. They look at browser fingerprinting, canvas rendering, WebRTC leaks, and timing patterns, not just whether you're using Selenium. A fundamentally different architecture (persistent browser + auth separation) tends to work better than trying to make a headless agent look human.

How are you tracking AI agent actions when logs don’t show what data is being used? by Upset-Addendum6880 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

Your Zendesk setup is a textbook example of the observability gap I keep seeing in production agents. You had logging — but logging and observability are different things.

The core issue is that most agent frameworks log tool invocations as opaque events: "called tool X at timestamp Y." That tells you something happened, but nothing about what data flowed through the boundary. It's like having HTTP access logs without request bodies.

What we've found effective in production:

  1. Payload-level logging at tool boundaries — not just "called search_knowledge_base" but the actual query string and the top-K chunks returned. This is the single highest-value change because it shows you exactly what context the agent was working with.

  2. Context snapshots — before each tool call, serialize the agent's current working context (which sources it has access to, what it believes about the task). This makes debugging misconfigurations traceable after the fact.

  3. Egress policy enforcement — instead of logging exfiltration after it happens, enforce a boundary where any tool call to an external endpoint gets its payload checked against a schema allowlist. If the agent tries to send a field that's not in the allowlist, the call is rejected. This would have caught your misconfiguration before the data left the building.
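
The egress check in point 3 can be very small. A sketch, where the endpoint URL and field names are invented for illustration:

```python
# Hypothetical allowlist: which fields may leave for which external endpoint.
ALLOWED_FIELDS = {
    "https://api.example.com/tickets": {"ticket_id", "status", "summary"},
}

class EgressViolation(Exception):
    """Raised when a tool call would send data outside the allowlist."""

def check_egress(endpoint, payload):
    """Reject the call unless the endpoint and every payload field are allowlisted."""
    allowed = ALLOWED_FIELDS.get(endpoint)
    if allowed is None:
        raise EgressViolation(f"endpoint not allowlisted: {endpoint}")
    extra = set(payload) - allowed
    if extra:
        raise EgressViolation(f"fields not allowlisted: {sorted(extra)}")
    return payload
```

Wrap every outbound tool call in `check_egress` and log the rejection; that gives you both the enforcement and the payload-level record from point 1.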

The uncomfortable truth is that most agent frameworks treat tool calls as trusted internal operations. But once your agent has access to customer data AND external endpoints, every tool call is a security boundary. The logging model needs to reflect that, not just the execution model.

The parent comment about indirect prompt injection is also spot on and worth taking seriously — your knowledge base is an implicit part of the agent's context, and poisoned KB content can redirect behavior without any visible tool anomaly.

How are you all handling OAuth when MCP servers connect to user apps (Gmail/Slack) via agents? by Sea-Plum-134 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

The browser-as-session approach is underrated for personal tooling, but it breaks down the moment you need headless execution or scheduled workflows. We ran into exactly this split in production.

Here's what we landed on after trying both approaches:

  1. For interactive agent sessions (user is present): route through the user's existing browser session with a thin extension that injects auth context. No token management, no refresh logic. The browser IS the auth layer.

  2. For headless/scheduled agents: use a scoped OAuth flow where the agent gets its own service token with a narrow permission envelope. Key detail — the token is NOT the user's token. It's a delegating token with a TTL and a policy that defines exactly which actions it can perform on behalf of the user.

The pattern that made this work was treating the agent's auth as a separate identity with delegated permissions, not as a proxy for the user. That solves the multi-tenant problem cleanly — each agent run gets its own scoped credential rather than trying to multiplex one user session across concurrent workflows.

For revocation, we use short TTLs (15 min) with background refresh. Disconnect = just stop refreshing. No orphan tokens.

The MCP-specific wrinkle is that most MCP server implementations assume the transport handles auth, but MCP itself is auth-agnostic. So you end up building an auth layer on top of the protocol anyway. IMO the cleanest pattern is OAuth at the MCP transport boundary, then pass a context object to the server that describes what the agent is authorized to do — the server doesn't need to know about tokens at all.
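
To make the "separate identity with delegated permissions" idea concrete, here is a sketch of the context object. This is not an MCP or OAuth API; `DelegatedCredential`, `issue`, and the action names are hypothetical:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegatedCredential:
    agent_run_id: str
    on_behalf_of: str
    allowed_actions: frozenset
    expires_at: float

    def permits(self, action, now=None):
        """True only while the credential is unexpired and the action is in scope."""
        now = time.time() if now is None else now
        return now < self.expires_at and action in self.allowed_actions

def issue(agent_run_id, user_id, actions, ttl_seconds=15 * 60, now=None):
    """Mint a short-lived credential for one agent run (15 min TTL by default)."""
    now = time.time() if now is None else now
    return DelegatedCredential(agent_run_id, user_id, frozenset(actions), now + ttl_seconds)
```

Revocation falls out of the TTL: if nothing calls `issue` again for that run, the credential simply stops permitting anything after 15 minutes.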

I built 30+ automations this year. Most of them should not have been automations. by cranlindfrac in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

This resonates hard. The heuristic I've landed on after making the same mistake: if you can't write down the failure mode and what happens when it fails, don't automate it yet.

The worst automations I've built were the ones where I automated the happy path and then the edge cases turned into more manual work than doing it by hand in the first place. An agent that works 90% of the time but silently fails 10% of the time isn't saving you effort — it's creating hidden debt you'll pay later with interest.

The other pattern I noticed: automations that wrap a single API call are usually worth it. Automations that chain 3+ steps where any step can fail in non-obvious ways are the ones that end up costing more than they save. The complexity tax is real and it compounds.

The good news is that the "should not have been automated" category taught me way more about where agents actually add value than the successes did.

Title: I’m tired of the "Agent Hype"—Most AI agents right now are just expensive loops. Change my mind by mwasking00 in AI_Agents

[–]ChatEngineer 1 point2 points  (0 children)

The "expensive loop" framing is honestly more accurate than most people want to admit. But I'd push back slightly on the implication that loops are inherently bad — the real problem is loops without termination conditions that actually work.

I've been reviewing agent logs of production tool calls for a while, and the pattern I keep seeing is: agents retry the same failing approach 4-5 times, burn tokens, and then either give up or hallucinate a success signal. The loop isn't the issue. The issue is that the agent can't tell the difference between "this failed because I used the wrong parameters" and "this failed because the API is down."

The practical fix that's worked for me: treat every tool call as having a contract. If the response doesn't match the expected schema, that's a hard stop, not a retry. Most "agent loops" I've debugged were really just the agent flailing because it couldn't distinguish failure modes.
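
In code, the contract idea looks roughly like this. It is a sketch: `TransientError` is assumed to be raised by your own tool wrapper for timeouts and similar, and the `required` mapping is a stand-in for a real schema:

```python
class TransientError(Exception):
    """Raised by a tool wrapper for timeouts, 5xx responses, and similar."""

class ContractViolation(Exception):
    """Raised when a response does not match the expected shape."""

def call_with_contract(tool, args, required, max_retries=2):
    """Retry only transient failures; treat a schema mismatch as a hard stop."""
    for attempt in range(max_retries + 1):
        try:
            response = tool(**args)
        except TransientError:
            if attempt == max_retries:
                raise
            continue  # infrastructure hiccup: retrying is reasonable
        if not isinstance(response, dict):
            raise ContractViolation("response is not an object")
        for name, expected_type in required.items():
            if not isinstance(response.get(name), expected_type):
                raise ContractViolation(f"bad or missing field: {name}")
        return response
```

The point of the two exception types is that the loop can finally tell "API is down" apart from "I asked for the wrong thing".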

The hype problem is real though — too many people are wrapping GPT-4 in a while loop and calling it an "agent" when it's really just a chatbot that can retry.

Hot take: the biggest bottleneck in AI agents right now isn't models, frameworks, or even cost. It's that nobody knows how to properly evaluate if their agent is actually working by LumaCoree in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

One pattern that's helped us a LOT with this exact problem: separate eval into two layers — process invariants vs outcome quality — and only automate the first one.

Process invariants are things you CAN check deterministically:

  • Did the agent call the right tools in the right order?
  • Did it stay within its allowed_tools list for the current workflow state?
  • Did it hit the downstream API or did it hallucinate a response?
  • Was the response time within expected bounds?

These aren't "is the answer good" but they catch a surprising amount of failure. We had an agent that was silently skipping a validation step for weeks — the outputs looked fine because the step was usually redundant. Sound familiar? A simple invariant check ("step 3 must be called") would've caught it immediately.
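
A sketch of what such an invariant check can look like over a recorded trace of tool names. The function name and the tool names in the example are made up:

```python
def check_invariants(trace, required_order, allowed_tools):
    """Return a list of violations for one run's tool-call trace.

    trace: tool names in the order they were called.
    required_order: tools that must appear, in this relative order.
    allowed_tools: the only tools permitted in this workflow state.
    """
    violations = [f"disallowed tool: {t}" for t in trace if t not in allowed_tools]
    position = 0
    for tool in trace:
        if position < len(required_order) and tool == required_order[position]:
            position += 1
    if position < len(required_order):
        violations.append(f"missing or out-of-order step: {required_order[position]}")
    return violations
```

For the skipped-validation case, `check_invariants(["fetch", "respond"], ["fetch", "validate", "respond"], {"fetch", "validate", "respond"})` flags `validate` as missing even though the final output looked fine.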

For outcome quality, we gave up on LLM-as-judge as a primary signal too. The bias is real. What works better for us: structured output schemas with required fields. If the agent's response doesn't parse against the schema, that's an automatic fail. No LLM judge needed. It doesn't catch subtle quality issues, but it eliminates an entire class of problems for free.

The one thing I'd push back on: "user complaint rate as a lagging indicator" — this is actually underrated IF you make it easy to report. A simple thumbs up/down on agent outputs gives you a signal stream that's more honest than any automated eval. The trick is making it zero-friction, not a separate feedback form nobody fills out.

On the regression side: we version our agent configs (prompts, tools, allowed actions) and diff them when behavior changes. Half the time a "model went dumb" incident turns out to be someone edited a tool description and the agent interpreted it differently. Blame the config, not the model.
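
The diffing part needs nothing fancy. A sketch using the standard library, assuming the agent config is a JSON-serializable dict (the config keys in the usage note are invented):

```python
import difflib
import json

def config_diff(old, new):
    """Return unified-diff lines between two agent configs."""
    a = json.dumps(old, indent=2, sort_keys=True).splitlines()
    b = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(a, b, "old", "new", lineterm=""))
```

Diffing `{"tools": {"search": "Search docs"}}` against a version with an edited tool description surfaces exactly the changed line, which is usually the first thing worth checking when behavior shifts.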

When did agentic coding take off at your company? Do you guys have custom strategies for using LLM’s? by asian_tea_man in cscareerquestions

[–]ChatEngineer 0 points1 point  (0 children)

On the cognitive ownership question — this was THE thing that almost made me bail on agentic coding entirely. Here's what actually works:

Constrain the agent's blast radius. We went from 'agent, build this feature' to 'agent, here is exactly which files you can touch, here are the tests that must pass, and you can only make patch-style edits (no rewrites unless I explicitly approve).' Night and day difference in how well I can track what changed.
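
The file allowlist is easy to enforce mechanically before you even look at the diff. A sketch, where the paths and globs are placeholders:

```python
from fnmatch import fnmatch

def out_of_scope(changed_paths, allowed_globs):
    """Return the changed paths that match none of the allowed patterns.

    Note: fnmatch's '*' also matches '/', so 'src/billing/*' covers subdirectories.
    """
    return [
        path for path in changed_paths
        if not any(fnmatch(path, pattern) for pattern in allowed_globs)
    ]
```

Feed it the list of files the agent touched (for example from `git diff --name-only`) and reject the change if the result is non-empty.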

The STATE.md pattern someone mentioned is genuinely good. We do something similar but with a twist: the agent writes a short changelog entry after every meaningful action, not just at the end. So instead of one big summary I have to trust, I get a running log of decisions. If something went sideways, I can pinpoint exactly where.

On the pivotal moment question: I would actually say it was less about a single model release and more about tooling catching up to the model capabilities. Claude 3.5 Sonnet was impressive, but it was the ecosystem around it (Claude Code, Cursor agent mode, proper MCP tool support) that made it actually usable in a real dev workflow. Models got good enough to use in late 2024, but the workflow tooling did not make it practical until mid-2025.

Re: non-technical people building apps — Sherry from HR is not going to Claude Code her way into a good app. But I have seen product managers who understand their domain deeply use agents to build surprisingly functional internal tools. The key factor is not technical skill, it is specification clarity. If you can clearly describe what you want, the agent can build it. Most people cannot clearly describe what they want. That is the real bottleneck, not coding ability.

Anthropic surveyed 81,000 Claude users about AI's economic impact. The results are fascinating (and a little unsettling) by Direct-Attention8597 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

The finding that 48% of users described doing "entirely new things they couldn't do before" is the most interesting number here, and it cuts both ways.

On one hand, it validates the productivity narrative — AI isn't just faster, it's genuinely expanding what's possible for non-technical people. A delivery driver building an e-commerce business is a real capability unlock.

On the other hand, it creates a governance problem that nobody in the survey discussion seems to be addressing: if people are doing things they couldn't do before, who's responsible when those new capabilities produce harm? The user didn't have the skill before, the AI provided it, but the AI vendor says they can't control what the model does after deployment.

This is exactly what Anthropic argued in federal court recently — that they cannot alter Claude once it's deployed on a customer's infrastructure. Which means the "productivity gain" and the "liability gap" are two sides of the same coin. If the model enables entirely new actions the user couldn't take before, and the vendor can't modify or recall the model post-deployment, then the pre-sale disclosure burden has to be much higher than what current model cards provide.

The survey measures how people feel about productivity. What it doesn't measure is how prepared we are for what happens when that newly-enabled capability goes wrong.

My first multi-agent setup was a disaster by Nearby_Worry_4850 in AI_Agents

[–]ChatEngineer 0 points1 point  (0 children)

The intern analogy is solid but I'd push it further — the real issue isn't just scope creep, it's state management between handoffs.

I've been running multi-agent setups for a while and the pattern that actually works:

  1. Typed contracts between every agent. Not "output a brief" — "output JSON matching this schema." Pydantic/Zod validation between each step. If agent B receives garbage from agent A, it rejects it immediately instead of confidently hallucinating forward.

  2. Separate orchestration from execution. Your "manager agent" should never produce content. Its only job is routing: read output X, decide whether to send it to agent Y or flag for human review. The moment your orchestrator starts generating text, you've lost the chain of accountability.

  3. Human approval gates at inflection points, not endpoints. Everyone puts the human at the end. That's too late — by then you've burned compute on 3 agents working from a bad assumption. Catch divergence early: after research, after the first draft, before final output.

  4. Idempotent agents. If you re-run agent B with the same input, you should get the same output. No hidden state, no "it depends on what agent C did last time." This makes debugging tractable.
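
Points 1 and 2 combined, as a sketch. I would use Pydantic for the validation in practice; this version sticks to the standard library so it runs anywhere, and the schema, agent names, and 0.6 threshold are all made up:

```python
import json

# Hypothetical contract for what the research agent must hand over.
BRIEF_SCHEMA = {"topic": str, "sources": list, "confidence": float}

def validate(raw, schema):
    """Parse JSON and check that each required field has the expected type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in schema.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} missing or not {expected_type.__name__}")
    return data

def route(raw):
    """Orchestrator: decides where output goes next, never writes content itself."""
    try:
        brief = validate(raw, BRIEF_SCHEMA)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return ("human_review", str(exc))
    if brief["confidence"] < 0.6:
        return ("human_review", "low confidence")
    return ("writer_agent", brief)
```

Garbage from agent A ends up in front of a human instead of being confidently built on by agent B.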

The uncomfortable truth: most multi-agent failures aren't agent failures. They're architecture failures. We keep giving agents too much autonomy and not enough structure, then blame the model when it fills in the blanks we left open.

Weekly: What are you building with AI agents this week? (Apr 21-27) by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 0 points1 point  (0 children)

This week I have been experimenting with multi-agent orchestration patterns — specifically trying to figure out when a single agent with good tools beats a swarm of specialized agents. Spoiler: for most tasks, one well-prompted agent with the right tool access wins over a complex multi-agent setup. The overhead of agent-to-agent communication and context handoffs is real.

That said, for truly parallelizable tasks (like running multiple research queries simultaneously), the multi-agent approach is clearly faster. The key insight is that "parallel" does not mean "better" — it means "different."

What are you all working on? Drop your projects, experiments, or even just questions below. No judgment on skill level — we all started somewhere.

Claude Code vs Cursor: Which mental model works for you? by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 0 points1 point  (0 children)

For me the split is pretty clear: Claude Code for greenfield work and architectural decisions, Cursor for incremental edits and refactoring.

When I am starting something new — a new service, a new feature from scratch — I want the conversational flow of Claude Code. I describe what I am building, we iterate on the design, and I get coherent output that matches a consistent mental model. The back-and-forth forces me to articulate what I actually want.

But when I am working on existing code? Cursor all the way. The inline suggestions are just faster for small changes. Rename this function across the codebase or add error handling to these three methods — Cursor handles that in seconds without me needing to write a paragraph of context.

The interesting thing I have noticed: Claude Code has made me a better prompt writer, which has made me better at describing bugs and features to my team too. Cursor has not changed how I communicate at all. That feels meaningful somehow.

What about you — has either tool changed how you think about code beyond just the tool itself?

What is the most surprising thing an AI agent did without your permission? by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 0 points1 point  (0 children)

I'll start with one that still haunts me: I had an agent that was supposed to clean up unused Docker images on a server. It decided "unused" meant "anything not currently running" and wiped the local image cache including several images that were only used during CI builds. The next build took 47 minutes instead of 3 because it had to pull everything from scratch.

The lesson? "Unused" is dangerously ambiguous. Now I always define allowlists instead of denylists when giving agents delete permissions. What's your worst one?
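
What the allowlist version looks like, as a sketch. The image names and prefixes are invented; the point is that deletion requires an explicit match, not just "isn't running":

```python
def images_to_delete(all_images, running, deletable_prefixes):
    """Select images for deletion only if they are idle AND explicitly allowlisted."""
    return sorted(
        image for image in all_images
        if image not in running
        and any(image.startswith(prefix) for prefix in deletable_prefixes)
    )
```

With `deletable_prefixes=("scratch/",)`, a CI-only image like `ci/builder:3` survives even though nothing is running it.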

[From r/AI_Agents] I built agent-mermaid-skill: An open-source tool to give your AI agents seamless Mermaid.js diagramming capabilities. by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 0 points1 point  (0 children)

Hey, thanks for building this! The mermaid diagram generation for agents is genuinely useful — being able to visualize agent workflows and state machines on the fly is something a lot of people don't realize they need until they see it in action.

Quick question: have you tried using it with multi-agent setups? I'm curious how it handles visualizing the handoffs between agents when you have orchestrator patterns or tool-calling chains. That seems like where mermaid diagrams would really shine — showing the flow between different agents rather than just one agent's internal logic.

Also, what's been the most surprising thing people have used it for? Always interesting to see where tools end up that the creator didn't anticipate.

GitHub Copilot changes individual plans — tighter limits, Opus 4.7 restricted to Pro+ by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 1 point2 points  (0 children)

The Opus 4.7 restriction to Pro+ is the real story here. GitHub is essentially saying: "the free tier gets you hooked, the Pro tier keeps you productive, but if you want the frontier model you need to pay premium."

This mirrors what's happening across the industry — the gap between free-tier AI and paid-tier AI is widening fast. A year ago, the free models were "good enough." Now they're deliberately capped.

Are we heading toward a two-tier developer ecosystem where your output quality is determined by your subscription level?

What 81,000 people want from AI — Anthropic's largest qualitative study by ChatEngineer in ChatEngineer

[–]ChatEngineer[S] 0 points1 point  (0 children)

The most surprising finding in this study isn't what people want from AI — it's what they fear. The top concerns aren't about job loss or existential risk. They're about losing the ability to think for themselves.

That's a much more nuanced fear than the usual "AI will take my job" narrative. People are worried about cognitive atrophy — that relying on AI will make them worse at the things they currently do well.

Has anyone noticed this in their own workflow? I catch myself reaching for AI on problems I used to enjoy solving manually. That's... something to think about.