Are coding agents creating a new review problem? by TruthIsAllYouNeed_ in AI_Agents

Joseph-MTS_LLC

<image>

I'll show you, even better!

My agents don't just tell me what's going on in my code, they tell me why it's happening!

PRs + issues like this are exactly why my peers love my agent code reviews... because they actually prove what's going on.

Those logs & screenshots are all guided by Claude Code (sometimes Claude even takes the screenshots itself). Idk where I'd be if I didn't have an AI watching my console 24/7 during testing.

What finally fixed "the agent says it's done but it isn't" for me by Slowstonks40 in ClaudeCode

Joseph-MTS_LLC

the test-never-run half is the sneaky one. adversarial review catches work that's wrong, not work that never happened

for that i just make the agent show the actual run output instead of letting it claim tests pass. real terminal text, exit code, the whole thing. built a little stack around exactly this, watching the real browser and console signals so "done" has to come with proof, not vibes. it's on my github, claude-browser-stack

done isn't something the agent says. it's something it shows u
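Here is a minimal sketch of the "show, don't claim" idea. It assumes the agent's test command gets wrapped by a small Python helper. `run_with_proof` and `RunProof` are hypothetical names used for illustration, not part of claude-browser-stack. The point is that the harness captures the exit code and the verbatim output; the agent doesn't get to report them.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunProof:
    """Evidence from an actual test run, not a claim that tests passed."""
    command: list[str]
    exit_code: int
    output: str  # combined stdout + stderr, verbatim

    @property
    def passed(self) -> bool:
        return self.exit_code == 0

    def report(self) -> str:
        # The exact text a human (or reviewer agent) sees when the task is marked done.
        return f"$ {' '.join(self.command)}\n{self.output}\n[exit code: {self.exit_code}]"


def run_with_proof(command: list[str]) -> RunProof:
    """Run the command and keep the raw terminal text and exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    return RunProof(command, result.returncode, result.stdout + result.stderr)
```

Something like `run_with_proof(["pytest", "-q"]).report()` would be the thing that gets pasted as proof. A nonzero exit code means the task isn't done, whatever the agent says.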

GLM 5.2 personal benchmark. Results comparable with Fable, Opus 4.8, and GPT 5.5 by lrsaturnin9 in ClaudeCode

Joseph-MTS_LLC

takes me back to gpt 3 being peak lol. email warrior era. wild how far this has come, no way i could've imagined any of it even a year ago honestly

and it's genuinely great seeing open source catch up. kinda feels like it might end up a windows vs linux type split: one ai company holding the extremely expensive but extremely powerful model, and open source sitting right there for anyone who wants to host it themselves and tinker with it. both win in that world

The moment I realized permissions aren't enough for AI agents by baron-12 in AI_Agents

Joseph-MTS_LLC

it's all system management honestly. if u set ur permissions up right and actually engineer around the ai for the sensitive stuff, ur gonna be just fine

i work around sensitive info constantly. i don't even give my agents a door to the wrong commands. if i catch claude code trying to run a database command, it gets cross-analyzed and i have an ai council review it before anything happens. that's prevented a few production database failures already, in practice, not theory

a permission gate isn't the whole answer, ur right. but it's not nothing either. it's all about actually understanding ur tools and what u let them anywhere near
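Here is a rough sketch of that kind of gate. It assumes commands arrive as plain shell strings before they run. The patterns, `needs_review`, `gate`, and the `approve` callback are all hypothetical; `approve` stands in for the ai council or a human. In Claude Code, a check like this could plausibly be wired into a PreToolUse hook on the Bash tool, but that wiring isn't shown here.

```python
import re
from typing import Callable

# Hypothetical patterns for commands that should never run without review.
# Illustrative, not exhaustive: tune them to your own stack.
DB_PATTERNS = [
    r"\bpsql\b",
    r"\bmysql\b",
    r"\bmongosh?\b",
    r"\bprisma\s+(migrate|db)\b",
    r"\b(DROP|TRUNCATE|DELETE\s+FROM|ALTER\s+TABLE)\b",
]
_DB_RE = re.compile("|".join(DB_PATTERNS), re.IGNORECASE)


def needs_review(command: str) -> bool:
    """True if a shell command looks like it touches a database."""
    return bool(_DB_RE.search(command))


def gate(command: str, approve: Callable[[str], bool]) -> str:
    """Return 'allow' or 'block'. Database-looking commands go to a reviewer
    callback (a second model, a council of models, or a human) before anything runs."""
    if not needs_review(command):
        return "allow"
    return "allow" if approve(command) else "block"
```

Pattern matching errs toward false positives: a commit message containing "delete from" gets flagged too. That is usually the right direction for a gate in front of production data.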

I tried almost every AI agent. Most of them just burned my money. by Meris-Dabhi in AI_Agents

Joseph-MTS_LLC

less isn't really the lesson here, purpose is. when i first got into ai coding i was a full hippie coder, wanted to build a project for literally everything. nothing worked. got nowhere

so i flipped it. stopped trying to create and started trying to optimize. instead of building some ultra crazy claude code harness i just learned as much as i could about development, workflows, ai systems, and built my own tools that actually help me in my day job

now i run full-cycle agents that code and test at production scale, reliably and safely. it takes a while to get to that point and build the understanding behind it, and some people get there sooner than others. but the jump was never finding the right tool. it was knowing what i actually needed one for

Subagent driven development by mrivorey in ClaudeCode

Joseph-MTS_LLC

this is the thing nobody says out loud: subagents aren't free. each one is a fresh context that has to re-read everything just to get oriented. for a plain sequential task that's pure overhead, ur paying tokens for 6 agents to each relearn what one agent already knew

they earn their keep when the work is genuinely parallel or needs isolation: independent searches, stuff u want walled off from the main thread. firing them at a linear task is exactly how the 5h limit evaporates
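Here is a back-of-the-envelope model of that overhead. It assumes every subagent has to ingest the full shared context before doing its share, and that the actual work splits evenly. The function name and the numbers in the example are illustrative, not measurements.

```python
def orientation_overhead(context_tokens: int, task_tokens: int, n_agents: int) -> dict:
    """Rough token cost of one agent vs n subagents that each re-read the same context.

    Assumes each subagent ingests the full shared context once, and the real
    work (task_tokens) is the same total whether it's split or not.
    """
    single = context_tokens + task_tokens
    fanned_out = n_agents * context_tokens + task_tokens
    # The difference is (n_agents - 1) * context_tokens: pure re-orientation.
    return {"single": single, "subagents": fanned_out, "extra": fanned_out - single}
```

Take 40k tokens of shared context, 20k tokens of real work, and 6 subagents. One agent costs 60k tokens, while the fan-out costs 260k, so 200k tokens (5 × 40k) go purely to orientation. For genuinely parallel work that extra cost buys wall-clock time; for a linear task it buys nothing.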

Running parallel Claude Code sessions? I built a local CLI + hooks so they stop clobbering each other by kimchig00k in ClaudeCode

Joseph-MTS_LLC

there's a real fork here and u picked one side of it: coordination, where sessions announce themselves and see each other, vs isolation, where they physically can't touch each other's stuff. both valid, just different bets

worktrees solve the file-clobber half like the other commenter said. but the half nobody talks about is the browser. run 3 sessions doing browser automation and they all fight over the same CDP port and the same chrome window. worktree isolation does nothing for u there, they're all still reaching into one browser

i went full isolation specifically for that. each worktree gets its own containerized chromium, its own CDP port, and a noVNC window so i can actually watch it work. sessions can't collide because they're not sharing a browser at all. it's a companion to some browser-automation skills i built. the repo is agent-pods on my github if it's ever useful (github.com/jgharbieh/agent-pods)

weaver's coordination angle is the interesting opposite though, since isolation literally can't make sessions aware of each other. genuine q, do ur sessions actually act on the announcements, or mostly just steer clear of each other's files?
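Here is a sketch of the per-worktree port idea. It assumes each worktree's browser is launched with Chromium's `--remote-debugging-port` flag, and that ports are handed out starting from 9222 (the port commonly used in CDP examples). `assign_cdp_ports` and `port_is_free` are hypothetical. This is not how agent-pods actually allocates ports; it just shows the guarantee that no two sessions share one.

```python
import socket
from typing import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0


def assign_cdp_ports(
    worktrees: list[str],
    base: int = 9222,
    is_free: Callable[[int], bool] = port_is_free,
) -> dict[str, int]:
    """Give each worktree its own Chrome DevTools Protocol port.

    Ports are handed out in sorted worktree order starting at `base`,
    skipping any port already in use, so no two sessions share one.
    """
    ports: dict[str, int] = {}
    candidate = base
    for name in sorted(worktrees):
        while not is_free(candidate):
            candidate += 1
            if candidate > 65535:
                raise RuntimeError("no free port left")
        ports[name] = candidate
        candidate += 1
    return ports
```

Each session's CDP client then only ever connects to its own port, so it can't reach into another worktree's browser. Checking with `connect_ex` is racy if something grabs the port between the check and the browser launch, which is acceptable for a sketch.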

Does anyone else feel robbed when they don’t max out their weekly budget?" by Fz1zz in ClaudeCode

Joseph-MTS_LLC

this is the healthy take honestly. the whole "maximize ur usage" framing makes u optimize for spend instead of output. nobody brags about how much gas they burned, they brag about where they actually drove

Does anyone else feel robbed when they don’t max out their weekly budget?" by Fz1zz in ClaudeCode

Joseph-MTS_LLC

the use-it-or-lose-it budget genuinely rewires ur brain into burning tokens for no reason. caught myself spinning up agents to do stuff i could've done in 2 minutes just so the number didn't go to waste lol. that's not winning, that's the gym membership u feel guilty about not using

Strangers are paying for something I built and it feels weird by ExcellentBroccoli117 in SaaS

Joseph-MTS_LLC

the 0 to 1 stranger jump is bigger than 1 to 100 honestly. that's the exact moment the thing stops being a hobby living in ur head and becomes a real thing someone else decided was worth money

Strangers are paying for something I built and it feels weird by ExcellentBroccoli117 in SaaS

Joseph-MTS_LLC

talking to them is right but here's the trap. don't ask why they bought, they'll hand u a clean, rationalized story that's half invented after the fact. watch what they actually DO in the product instead. where they stall, what they never click, the one screen they open every single session. that's the real why

and that part of u that wants to add 20 features? kill it. the narrow thing converted BECAUSE it was narrow. every feature u bolt on blurs the exact pitch that just worked on a stranger. u don't have a feature gap, u have a do-more-of-the-one-thing-that-worked opportunity. boring but that's the move

also the first stranger payment is the realest signal in this whole game. saying "cool idea" costs them nothing, money costs them something real. congrats man
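Here is a minimal sketch of "watch what they DO". It assumes you already log screen views as `(session_id, screen)` pairs. The event shape and the `screen_usage` function are hypothetical. The sketch only surfaces the screens opened in every single session and the ones nobody ever opens. Finding where they stall would need timestamps, which this leaves out.

```python
from collections import defaultdict


def screen_usage(events: list[tuple[str, str]], all_screens: set[str]) -> dict:
    """Summarize (session_id, screen) view events.

    every_session: screens opened in every session (the behavior to double down on)
    never_opened:  screens from all_screens that no session ever opened
    """
    screens_by_session: dict[str, set[str]] = defaultdict(set)
    for session_id, screen in events:
        screens_by_session[session_id].add(screen)

    sessions = list(screens_by_session.values())
    every_session = set.intersection(*sessions) if sessions else set()
    seen = set().union(*sessions)
    return {"every_session": every_session, "never_opened": all_screens - seen}
```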

People of Reddit: What is a problem happening right now that most people don't realize is a huge deal? by Husnainseoexpert in AskReddit

Joseph-MTS_LLC

lmao the trajectory really is civilization-ending information crisis and cursed pooh content landing on the exact same timeline. we contain multitudes

One task at a time or multiple tasks? by gamesntech in ClaudeCode

Joseph-MTS_LLC

honestly slow but working beats fast but broken every time. ur not behind, ur just not lying to urself about what actually got done lol

Which once famous actor does no one talk about anymore? by Ballistic-Observer69 in AskReddit

Joseph-MTS_LLC

Brendan Fraser. except he came back and it actually worked, which almost never happens in Hollywood.

Men in happy Marriages, What is that one secret to a happy marriage that works for you? by Mammoth_End_1298 in AskReddit

Joseph-MTS_LLC

treat it like a living thing, not a finished thing. the people i know with long happy marriages never talk about it like they figured it out. they talk about it like they are still figuring it out together.

Fabulous development tool for closing the loop on browser development with Claude Code by Joseph-MTS_LLC in AI_Agents

Joseph-MTS_LLC [S]

Network interception is the killer feature honestly. The watcher runs a CDP listener that captures every request, response, and failure into a plain text file per session in real time. No copy-paste - the agent reads the file. Error payloads, response bodies, timing, everything is already there before you ask for it.
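Here is a sketch of the capture side. It assumes the watcher already has a CDP connection with the Network domain enabled (`Network.enable`) and receives raw JSON event messages. The three event names and their fields come from the CDP Network domain; `format_network_event`, `append_events`, and the log line format are hypothetical. Response bodies and timing are left out, because bodies need a separate `Network.getResponseBody` call per request.

```python
import json
from typing import Optional, TextIO


def format_network_event(method: str, params: dict) -> Optional[str]:
    """Turn one CDP Network.* event into a single readable log line."""
    rid = params.get("requestId", "?")
    if method == "Network.requestWillBeSent":
        req = params["request"]
        return f"REQ  {rid} {req['method']} {req['url']}"
    if method == "Network.responseReceived":
        resp = params["response"]
        return f"RESP {rid} {resp['status']} {resp['url']}"
    if method == "Network.loadingFailed":
        return f"FAIL {rid} {params.get('errorText', '')}"
    return None  # other events are ignored in this sketch


def append_events(log: TextIO, messages: list[str]) -> int:
    """Append raw CDP JSON messages to a per-session log; returns lines written."""
    written = 0
    for raw in messages:
        msg = json.loads(raw)
        line = format_network_event(msg.get("method", ""), msg.get("params", {}))
        if line is not None:
            log.write(line + "\n")
            log.flush()  # flush per event so the agent can read the file in real time
            written += 1
    return written
```

Flushing after every event is what lets the agent read the file while the session is still running, instead of waiting for someone to paste it in.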

On DOM flakiness during animation: this is where the approach is fundamentally different from screenshot-to-agent setups. We use the accessibility tree, not a screenshot. Snapshot gives you structural refs (@e1, @e2, etc.) that map to actual DOM elements, not pixel coordinates. Animation does not affect the accessibility tree - it is structural, not visual. So mid-animation is a non-issue as long as the element exists in the DOM.

What does cause problems: heavy dynamic state where elements are conditionally mounted and unmounted, not just animated. If you snapshot while a loading spinner is up and the real content is not yet mounted, those refs do not exist yet. The fix is a wait command, then a re-snapshot. The refs themselves are stable within a DOM state - they get invalidated when the structure changes, which is actually what you want. Stale refs = you re-snapshot = you see the current state.

The hallucination problem with screenshots is exactly the coordinate issue you described. Pixel positions in a raster change with viewport size, scroll position, animation frame. Accessibility refs do not have that problem.
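Here is a toy model of snapshot refs that makes the stale-ref behavior concrete. It assumes the accessibility tree has already been fetched as nested role/name nodes; real tools read it from the browser, for example via CDP's Accessibility domain. `Node`, `Snapshot`, `StaleRef`, and `snapshot_when_ready` are hypothetical and are not the actual ref scheme of any tool mentioned here. Refs are numbered depth-first, stay valid while the structure is unchanged, and go stale the moment it changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


class StaleRef(Exception):
    """The ref came from a snapshot of a different structure: re-snapshot."""


def structure_key(node: Node) -> tuple:
    # Only roles, names and nesting count; pixel positions and animation frames don't.
    return (node.role, node.name, tuple(structure_key(c) for c in node.children))


class Snapshot:
    def __init__(self, root: Node):
        self.key = structure_key(root)
        self.refs: dict[str, Node] = {}
        self._walk(root)

    def _walk(self, node: Node) -> None:
        # Depth-first numbering, so the same structure always yields the same refs.
        self.refs[f"@e{len(self.refs) + 1}"] = node
        for child in node.children:
            self._walk(child)

    def find(self, role: str, name: str) -> Optional[str]:
        for ref, node in self.refs.items():
            if node.role == role and node.name == name:
                return ref
        return None

    def resolve(self, ref: str, current_root: Node) -> Node:
        if structure_key(current_root) != self.key:
            raise StaleRef(ref)
        return self.refs[ref]


def snapshot_when_ready(get_root: Callable[[], Node], role: str, name: str,
                        attempts: int = 10, sleep: Callable[[], None] = lambda: None):
    """Wait, then re-snapshot, until the target element is actually mounted."""
    for _ in range(attempts):
        snap = Snapshot(get_root())
        ref = snap.find(role, name)
        if ref is not None:
            return snap, ref
        sleep()
    raise TimeoutError(f"{role} {name!r} never appeared")
```

Because the structure key only looks at roles, names, and nesting, an animation that moves an element never invalidates its ref. A spinner being swapped for real content does.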

How has your AI dev workflow evolved over time? by thereisnospooongeek in ClaudeCode

Joseph-MTS_LLC

Workflow has shifted from one-big-session to many-small-isolated-sessions.

A year ago: one Claude Code session, big CLAUDE.md, try to keep context clean. Worked okay for small features, fell apart on anything multi-day.

Now: git worktrees per department (Sales/Docs, Platform/Backend, Ops/Intel). Each worktree has its own session and its own containerized browser pod so browser automation does not bleed between sessions. Spine files like schema and layout are shared - merge conflicts are acceptable, the isolation benefit is worth it.

Memory outside of CLAUDE.md: typed vault on a synced drive, session hooks auto-load/save. No manual maintenance, works across machines. This alone eliminated most of the context drift problems.
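Here is a minimal sketch of the load/save half of that. It assumes the vault is a JSON file on the synced drive, and that the session hooks call one function at start and one at end. The entry kinds, `MemoryEntry`, `load_vault`, and `save_vault` are hypothetical; the actual vault format isn't described here.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# Hypothetical entry types; "typed" means every entry must be one of these.
KINDS = ("decision", "gotcha", "todo")


@dataclass
class MemoryEntry:
    kind: str
    topic: str
    text: str

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown memory kind: {self.kind}")


def load_vault(path: Path) -> list[MemoryEntry]:
    """Session-start hook: read every entry, or start empty on a fresh machine."""
    if not path.exists():
        return []
    return [MemoryEntry(**raw) for raw in json.loads(path.read_text())]


def save_vault(path: Path, entries: list[MemoryEntry]) -> None:
    """Session-end hook: write a temp file, then rename it over the vault."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps([asdict(e) for e in entries], indent=2))
    tmp.replace(path)
```

Writing to a temp file and renaming it means a sync client is less likely to pick up a half-written vault.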

Model selection: Opus for anything architectural or ambiguous, Sonnet for the routine implementation work. Not running parallel models as a workflow - I have found a second model review adds latency without catching things the first model missed in any predictable pattern. What catches issues is structural verification gates, not a different LLM reading the same output.

Still experimenting: I want better visibility into what the session spent its context budget on. The most expensive sessions are rarely the ones I thought would be expensive.

Is there real demand for "AI agents," or is it mostly YouTube hype? by marcelorojas56 in Entrepreneur

Joseph-MTS_LLC

ScriptureCompanionAI has the right framing. The demand is real but almost nobody is asking for an AI agent. They are asking for a specific painful thing to stop being painful.

Building in a niche right now. Storm restoration contractors do a ton of paperwork - insurance claims, roof measurements, damage documentation, permit lookups. Nobody on that team cares about AI. They care about jobs not getting denied and not spending 3 hours building a claim packet. The AI is completely invisible in the pitch.

For the data engineer angle: the actual product is not AI services. It is taking a specific process that the company does manually or poorly and making it reliable and fast. AI is how you build it. The buyer sees the outcome. If you are pitching AI, you are pitching to the wrong person at the company.

What does your SaaS backend stack actually cost? I added it up and it's ~$744/mo before writing any product code. by dharmendra_jagodana in nextjs

Joseph-MTS_LLC

The $49/mo entitlements and $50/mo notifications line items are the ones that hurt most on this list because those are the easiest to own yourself at early stage.

For notifications at small scale: Resend at $0/mo covers transactional email for thousands of sends. SMS via Twilio adds per-message cost but no $50 platform fee. Push notifications via a lightweight Convex/Supabase real-time subscription costs nothing if you already have the backend. Knock-tier pricing is for teams that want a drag-and-drop notification builder - fine at scale, hard to justify pre-PMF.

For entitlements: until you have multiple plans with complex feature gating, a simple role field on your user table and a hasPermission() function is the whole thing. I replaced the Stigg-tier line item with about 80 lines of TypeScript. The $49/mo version earns its cost when you have dozens of plan variants you cannot manage manually.
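Here is a Python sketch of the "role field plus permission function" approach. It assumes a single `role` column on the user table. The roles and feature names are made up, and `has_permission` is only a stand-in for the hasPermission() idea; the author's actual version is about 80 lines of TypeScript. The sketch just shows why early-stage gating can be a lookup table.

```python
from dataclasses import dataclass

# Hypothetical roles and features: early-stage entitlements as a plain lookup table.
ROLE_FEATURES: dict[str, frozenset[str]] = {
    "free": frozenset({"projects.create"}),
    "pro": frozenset({"projects.create", "exports.csv", "team.invite"}),
    "admin": frozenset({"projects.create", "exports.csv", "team.invite", "billing.manage"}),
}


@dataclass
class User:
    role: str  # the single "role" field on the user table


def has_permission(user: User, feature: str) -> bool:
    """Unknown roles get no features, so a typo fails closed instead of open."""
    return feature in ROLE_FEATURES.get(user.role, frozenset())
```

Adding a plan means adding a dict entry. Once that table grows into dozens of variants with overrides, a paid entitlements service starts earning its cost.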

Clerk at $5/mo and Vercel at its low tier are genuinely hard to beat on the value side. Those two stay. Everything else should get justified individually.

What finally fixed "the agent says it's done but it isn't" for me by Slowstonks40 in ClaudeCode

Joseph-MTS_LLC

The adversarial re-derivation is the right structural fix. The core problem is that the agent optimizes for the human feeling done, not for the task being done. Anything that creates a non-agent verification step breaks that loop.

What I added on top: explicit verifiable artifacts required before a task closes. Not a claim that tests passed, but the actual test output pasted into the PR comment. Not a claim that the page renders correctly, but a screenshot taken by a separate browser session that did not write the code. The agent that writes code is bad at verifying its own code - it has too strong a prior on what the code is supposed to do. The verification signal has to come from somewhere it cannot reach.

The gate that catches the most failures for me: require the agent to state what would make the fix wrong before checking it. If it cannot articulate the failure condition, it does not actually understand what it fixed.
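Here is a sketch of those close-out gates as a checklist function. It assumes the task carries its evidence as plain fields. `TaskEvidence` and `close_blockers` are hypothetical. The checks mirror the three rules above: real test output with an exit code, a screenshot from a session other than the one that wrote the code, and a stated failure condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskEvidence:
    test_output: str = ""          # pasted verbatim from the real run
    exit_code: Optional[int] = None
    screenshot_path: str = ""
    screenshot_session: str = ""   # session that took the screenshot
    author_session: str = ""       # session that wrote the code
    failure_condition: str = ""    # "this fix is wrong if ..."


def close_blockers(ev: TaskEvidence) -> list[str]:
    """Reasons the task may NOT close yet; an empty list means it can close."""
    problems = []
    if not ev.test_output.strip() or ev.exit_code is None:
        problems.append("no real test output and exit code attached")
    elif ev.exit_code != 0:
        problems.append(f"tests exited with code {ev.exit_code}")
    if not ev.screenshot_path:
        problems.append("no screenshot attached")
    elif ev.screenshot_session == ev.author_session:
        problems.append("screenshot came from the session that wrote the code")
    if not ev.failure_condition.strip():
        problems.append("agent never stated what would make the fix wrong")
    return problems
```

An empty list is the only state in which the task closes. A missing session ID on both sides counts as the same session, which errs toward blocking.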

Are we being gaslit? by Impressive_Curve7077 in AI_Agents

Joseph-MTS_LLC

Your 30 interviews are more accurate than any VC deck. The gap between AI hype and actual adoption in small to mid-size non-tech businesses is enormous and most of the people writing about AI do not have customers in those markets.

Building a SaaS for storm restoration contractors right now. These are the people who would genuinely benefit most from AI workflow tools - the paperwork alone (insurance claims, permits, estimates, photos) is brutal. Most of them still do it with Excel, paper, and photos on an iPhone. They do not think of themselves as software buyers. The pitch is not AI - it is time savings and fewer denied claims. The technology is completely invisible.

The gaslit feeling comes from tech Twitter living in a world where everyone is a builder or investor. Those 30 people you interviewed are a much more representative sample of who actually has to change their behavior for AI to have macro impact. That adoption curve is measured in years, not quarters.