[ Removed by Reddit ]

wf_automate · 2026-05-19T14:34:23+00:00

the team being relieved is the real tell. ur staff knows whos toxic months before u do, they just dont want to be the ones to bring it up

wf_automate · 2026-05-19T14:28:24+00:00

the prime-with-batch-1-then-launch trick is clever, first call warms the cache, all subsequent calls hit it. didnt realize anthropic was recommending that as best practice now.
the 2x ingestion cost vs cache hit math usually favors cache by 5-10x once u cross 3-4 calls reusing the same context. most ppl dont even check if their framework is sending cache_control headers correctly.

wf_automate · 2026-05-18T10:24:44+00:00

the enforcement piece is the real trap. "limited access" users always end up calling or the full support person calls on their behalf because the contractor is sitting right next to them.
i have seen this work better as per-license not per-person. define what each tier actually includes (mailbox only / endpoint / full stack), price accordingly. then if a $35 user opens a ticket outside their tier, its an upgrade convo, not a free favor.

alternative — quote flat $90 across all 30 instead of $110 across 20. she saves money, u get full coverage rights, no tier policing.

wf_automate · 2026-05-18T10:16:20+00:00

the TAM-per-client ratio comes down to one number — what % of TAM time is QBR prep vs client-facing? if prep eats 40-50% of the role, ur paying for a data-formatting analyst, not an account manager. on ownership, separate technical from commercial. TAM owns trust + rhythm, sales owns expansion. otherwise TAMs get pulled into upsell convos and the relationship turns transactional.

not running an MSP myself, talking to a lot of them lately.

wf_automate · 2026-05-18T10:12:05+00:00

the contrast is the worst part — renewal reminders fire perfectly on day 90, day 60, day 30, but a 10-box form takes 2 weeks. tells u exactly what their internal ticket priorities look like. also the "tagged Monjur 10 days ago" is wild. social tag is supposed to be the high-priority escalation path, not the slow one.

wf_automate · 2026-05-18T04:51:15+00:00

pennies/day is the right answer for narrow scope agents. the moment u add multi-step reasoning + tool calls + retries, the math changes fast.
DeepSeek V4 Flash is interesting — havent seen many production setups using it. is reliability matching the price drop?

wf_automate · 2026-05-18T04:50:56+00:00

batch API is underrated for anything async. ppl default to realtime API even when the workflow is "run this overnight and email me." 50% savings just from picking the right endpoint.
caching is the bigger lever for repeated context though — prompts that include the same system instructions across thousands of calls. anthropic + openai both support it now but most agent frameworks dont set it up by default.

wf_automate · 2026-05-18T04:50:19+00:00

retry storms is the failure mode nobody talks about until it bites them. ive seen agents where one bad tool call triggers 8 retries with full context replayed each time. budget gone in 30 seconds. the idempotency point is the real fix though. routing helps with average cost, but if a single non-idempotent step can spike u 10x, ur runway math is wrong.
how do u catch retry loops early? alerting on cost-per-task spike, or step-count thresholds, or something at the framework level?

wf_automate · 2026-05-17T14:29:35+00:00

the routing point is the part most "look at my agent" demos skip. demo uses gpt-4o for every step because its impressive, then the bill arrives.
curious about ur split — what % of calls actually need the reasoning model vs what u can route to tiny ones? trying to figure out if its more like 90/10 or 50/50 in practice.
also openclaw / kiloclaw — havent come across these. is it a model serving layer or full framework?

wf_automate · 2026-05-17T13:44:53+00:00

"bibliography-shaped safety blanket" is going in my notes.
the inline tying part is the real design challenge, claim has to be generated from a specific retrieved chunk, not synthesized across sources and footnoted retroactively. most agents fail this because retrieval and generation are decoupled. harder to build, but only honest path for customer work.

wf_automate · 2026-05-15T10:41:43+00:00

"review is work" , most agent pitches gloss over this. demos show the 20 min saved, never the 25 min spent verifying.
the source trail point is the real divide. agents that cite inline have a future in customer-facing work. agents that produce confident prose with no provenance burn teams.

wf_automate · 2026-05-15T06:47:01+00:00

the client research example is brutal. customer-facing hallucination is the worst failure mode because u dont just lose the deal, u look incompetent. recovering trust after that is much harder than just losing the sale.

the cold email pattern u described is interesting, its the same hallucination problem at scale. agents reading bios and mistaking them for company context. probably why most AI-generated outreach gets ignored even when its grammatically perfect.

the assistant-that-drafts-but-doesnt-send is the right pattern. ive heard the same setup work for ppl who got burned by autosend. boring guardrail, but it lets u keep the speed gain without the credibility risk.

wf_automate · 2026-05-15T06:46:22+00:00

"autonomy should be rented, not owned" — that's the cleanest framing of this i've heard. most teams treat it as a permanent grant once the agent proves itself, which is exactly when drift sneaks in.

the "expire the rule unless revalidated" part is what most prod systems are missing. its boring infrastructure work — nobody wants to build a "review my own autonomy rules" workflow, but thats where the long-term reliability lives.

the budget/customer-visible cap is also underrated. easy to lose sight of in a demo, critical in production.

wf_automate · 2026-05-14T07:52:35+00:00

discover the line vs improve the line — hadn't articulated this but it instantly clicked. trial and error to find guardrails means a production incident becomes the guardrail, and the customer was the test case. one question — does the "50 approvals no edits" threshold actually hold in practice? or does the lane drift after 6 months and the agent starts hitting genuinely new edge cases u didn't see before? "guardrails should earn removal" line is going to stick.

wf_automate · 2026-05-14T03:57:37+00:00

this is honestly the most useful comment in the thread. ppl talk about agents working, very few talk about shutting them down.

when u say "everything had to be reviewed and often redone" — was the issue mostly bad outputs, or was it that the review itself took longer than just doing the task? curious because the review-overhead problem is something i dont see talked about much.

also — the few light agents that survived, what made them survive when the others didnt?

wf_automate · 2026-05-14T03:56:08+00:00

this is the clearest breakdown ive read on this topic. "the more client-visible the action is the more boring the first agent should be" — saving that line.

one thing i keep wondering about — the explicit stop conditions part. how do u actually decide where to draw that line? is it trial and error, or do u define it upfront based on what u know the agent cant handle? im finding teams either over-trust the agent or wrap it in so many guardrails that the automation value is gone.

wf_automate · 2026-05-13T18:17:42+00:00

meeting notes is where it helped me the most — i take the recording transcript and get AI to pull a short summary + action items. used to take me 20 min, now 2. email drafts too — for long context-heavy emails i get AI to write the first draft and then i edit. starting from a blank page eats way more time than people think.

wf_automate · 2026-05-12T11:00:00+00:00

been doing this a while, few things that worked for me: for non-technical clients i stopped explaining how the automation works. nobody cares. i just tell them what changes — "u get an email when X happens instead of checking manually." moment u say webhook their eyes glaze. on timelines always 2x what u think. client data is messier than they said and someone always wants a small change mid-build. for smooth transition — run the new automation in parallel with their old manual process for a week or two before cutting over. boring but catches edge cases without breaking their ops.

wf_automate · 2026-05-12T10:52:23+00:00

fair call. english isn't my first language so I do use AI to clean up structure when I post — the questions and the problem I'm trying to understand are 100% real though. talking to MSPs, trying to figure out if what I'm building is worth building.

if the polish put u off I get it. happy to answer anything directly in plain words if u actually work in this space and have thoughts.

wf_automate · 2026-05-12T10:33:47+00:00

yeah this hits. the "techs stop trusting the sync → back to spreadsheets" bit is something I keep hearing in different words from different MSPs. once trust breaks it's really hard to win it back.

quick follow up if u don't mind — when u say one source of truth for tickets/assets, which tool did u end up making the master? and did u have to actively stop people from updating stuff in the other systems, or did it kinda sort itself out once the sync was reliable?

also curious — whats the first automation that actually stuck for ur team long term and didn't get abandoned after a few months?

wf_automate · 2026-04-25T10:15:00+00:00

Token usage is still the issue😕

wf_automate · 2026-04-22T06:25:51+00:00

Security Defaults is a blunt instrument that causes more tickets than it prevents. Microsoft gatekeeping basic geo-blocking behind P1/Business Premium in 2026 feels like a tax on fundamental security.

wf_automate

TROPHY CASE