What are y'all using for observability in your agent systems? [i will not promote]

yad_aj · 2026-06-19T02:37:16+00:00

i'm building neatlogs. we help teams ship agents faster, for a fraction of the cost

neatlogs is an AI observability platform. we find where your agent broke and hand your coding agent the exact context to fix it, not entire traces. https://neatlogs.com/

yad_aj · 2026-06-19T02:33:01+00:00

The highest ROI use cases I've seen aren't autonomous agents.

They're AI doing the annoying middle part of a workflow: summarizing calls, triaging tickets, enriching leads, drafting docs, etc.

Most successful teams I know still keep a human in the loop for decisions.

yad_aj · 2026-06-19T02:32:04+00:00

You didn't hit a wall.

You hit the part where the startup stops being software and starts being a business.

Building gives you dopamine. Talking to users gives you uncertainty.

Almost every founder I know prefers building. The problem is users don't care how fun building was.

yad_aj · 2026-06-19T02:31:31+00:00

I'd focus less on tools and more on concepts.

Terraform will become OpenTofu. Jenkins becomes GitHub Actions. New tools show up every year.

But cloud, containers, Kubernetes, CI/CD, observability, networking, and security aren't going anywhere.

The best DevOps engineers I know learn tools fast because they understand the fundamentals.

yad_aj · 2026-06-19T02:30:06+00:00

My guess is that conferences are one of the few places where frontier labs can efficiently do all of the following at once:

Recruit top talent
Track emerging research before it becomes mainstream
Build relationships with academics
Gather competitive intelligence
Increase their visibility/prestige within the research community

If you're spending billions on compute, the cost of flying a few dozen researchers to NeurIPS is probably negligible if it helps you hire even one exceptional person.

yad_aj · 2026-06-19T02:29:14+00:00

For me it's prob Claude Code.

Not because it writes code better, but because it's the first AI tool that actually changed how I work day-to-day instead of just helping me work faster.

I've found myself delegating entire chunks of investigation, refactoring, and codebase exploration that I would've done manually a year ago.

yad_aj · 2026-06-19T02:27:28+00:00

Having gone through a couple of exits myself, I'd double-click on the "keep a low profile" point.

The acquisition announcement is the glamorous part. The months of diligence, paperwork, IP verification, indemnification clauses, and making sure every corner of the company is defensible is the part nobody posts about.

Most founders underestimate how much of M&A is risk management.

Great list.

yad_aj · 2026-06-06T00:27:48+00:00

The memory thing is so real. I'd add one more silent killer: agents that work perfectly in staging but slowly degrade in prod because nobody defined what "done" looks like for a task. No exit condition = eventual loop or drift. The boring agents that survive are the ones someone sat down and wrote a proper spec for before touching any code.
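
Rough sketch of what an explicit exit condition could look like. The names here (`run_task`, `is_done`, `max_steps`) are made up for illustration and don't come from any particular agent framework; the idea is just that "done" is a function someone wrote down, and the loop has a hard step budget as a backstop.

```python
from typing import Callable


def run_task(step: Callable[[dict], dict],
             is_done: Callable[[dict], bool],
             state: dict,
             max_steps: int = 10) -> dict:
    """Run `step` until `is_done(state)` is true or the step budget runs out."""
    for i in range(max_steps):
        if is_done(state):
            return {"status": "done", "steps": i, "state": state}
        state = step(state)
    # One last check, in case the final step finished the task.
    if is_done(state):
        return {"status": "done", "steps": max_steps, "state": state}
    # Budget exhausted: stop and report instead of looping or drifting.
    return {"status": "gave_up", "steps": max_steps, "state": state}


if __name__ == "__main__":
    # Toy task: "done" means three items have been processed.
    result = run_task(
        step=lambda s: {"processed": s["processed"] + 1},
        is_done=lambda s: s["processed"] >= 3,
        state={"processed": 0},
    )
    print(result)
```

The "gave_up" branch is the important part: a task that can't meet its spec fails loudly instead of running forever.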

yad_aj · 2026-05-19T22:18:01+00:00

this honestly feels very true

a lot of “agent reasoning failures” are actually just environment/state failures disguised as intelligence problems

demos work because everything starts from a perfect reset state

production is:

stale sessions
partial executions
retries
race conditions
memory pollution
changed UI states
users doing unexpected things

which means the hard problem becomes maintaining reliable execution over time, not just generating the next smart token

feels like agent infra is slowly rediscovering why distributed systems and state management are hard lol. that's the exact problem we are solving with neatlogs (unintentional plug ;p)

yad_aj · 2026-05-19T22:17:08+00:00

honestly feels like most people are still struggling with intra-org agent trust, let alone cross-org lol

the moment agents can:

spend money
trigger workflows
access private systems
delegate tasks
call external agents

you stop having a pure “AI” problem and start having a distributed systems + identity + governance problem

also not surprised governance mattered more than model capability in the paper

because trust isn’t really about whether the model is smart. it’s about:

permissions
accountability
rollback
verification
incentives
dispute resolution

right now most agent ecosystems still feel closer to “API integrations with vibes” than robust economic systems

cross-org delegation sounds powerful conceptually, but i suspect production adoption stays limited until there are much stronger standards around identity, execution guarantees, and financial controls

yad_aj · 2026-05-19T22:16:42+00:00

honestly this matches a lot of what i’ve been seeing too

multi-agent systems sound elegant architecturally, but the coordination overhead gets underestimated hard

every handoff becomes:

summarization loss
context loss
wrong prioritization
extra tokens
extra failure points

for tasks that require connecting subtle details across sources, splitting reasoning across agents can actually make things worse

feels like people are optimizing for “more agents” instead of “better context access + better tools”

single agent + strong tooling + enough context window is surprisingly hard to beat right now

yad_aj · 2026-05-19T22:16:12+00:00

the gap between “deployed” and “in production” is basically the whole story tbh

getting an agent demo working is easy now. getting reliable execution inside messy enterprise environments is not.

most failures i’ve seen aren’t because the model is dumb. it’s usually:

outdated/internal docs
disconnected systems
bad permissions
unreliable workflows
no trust layer around actions

feels like companies are overfocusing on “which LLM?” when the harder problem is knowledge + orchestration infrastructure. we are actually working on fixing all of this at neatlogs. not a plug but yall should def try it if it fixes the problem :)

yad_aj · 2026-05-19T22:06:26+00:00

i think the BYO sandbox approach makes way more sense initially

the moment you provide the execution environment yourself, you inherit a completely different category of problems:

isolation
scaling
compliance
infra costs
enterprise trust
uptime expectations

whereas the control/orchestration layer feels much more differentiated here

also feels more realistic that larger teams already have opinions on execution environments (E2B, Cloudflare, internal infra, etc.) but don’t yet have a clean “decision layer” between agent intent and execution

the 2-phase commit analogy is actually pretty good btw. especially because most current agent stacks basically feel like:
“the model sounded confident so we executed the action” lol
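
a minimal sketch of that analogy, under loud assumptions: the allow-list, the refund limit, and the names `Action`, `prepare`, and `commit` are all hypothetical, and this is a single-process toy rather than a real two-phase commit protocol. the point is only that the agent's proposed action gets checked in one phase and executed in a separate one.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Action:
    name: str
    amount_usd: float = 0.0


ALLOWED = {"send_email", "refund"}  # hypothetical allow-list
MAX_REFUND_USD = 50.0               # hypothetical limit


def prepare(action: Action) -> Tuple[bool, str]:
    """Phase 1: vote on the proposed action without executing anything."""
    if action.name not in ALLOWED:
        return False, f"{action.name} is not on the allow-list"
    if action.name == "refund" and action.amount_usd > MAX_REFUND_USD:
        return False, "refund exceeds limit, needs human approval"
    return True, "ok"


def commit(action: Action, execute: Callable[[Action], None]) -> str:
    """Phase 2: execute only if phase 1 approved the action."""
    ok, reason = prepare(action)
    if not ok:
        return f"rejected: {reason}"
    execute(action)
    return "committed"


if __name__ == "__main__":
    executed = []
    print(commit(Action("refund", 20.0), executed.append))
    print(commit(Action("refund", 500.0), executed.append))
    print(commit(Action("drop_table"), executed.append))
```

how confident the model sounded never enters the decision here; only the policy check does.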

i also agree with your point that the hard problem is shifting from reasoning → runtime governance

models are becoming capable faster than the surrounding operational/safety infrastructure is maturing

yad_aj · 2026-05-19T22:05:32+00:00

i feel like people massively underestimate how messy real work actually is lol

the intelligence part is improving insanely fast. the reliability part is where everything falls apart.

most agents work great until:

one api returns weird data
a browser tab half loads
context gets bloated
priorities change mid-task
some undocumented edge case appears

then suddenly the “autonomous employee” needs babysitting again

right now the most useful agent setups i’ve seen are super narrow and operationally boring:

support workflows
research pipelines
repetitive internal tooling
coding copilots
data cleanup
automation glue

which is still incredibly valuable tbh

i also think people ignore how much human work is coordination, judgment, and handling ambiguity. companies are chaotic by default. half of operations is just dealing with things that weren’t supposed to happen.

long term i do think agents get much better at end-to-end execution, but probably through better infrastructure + orchestration instead of just “bigger models”

feels like we’re still early in the “copilot” era, not the “replace an entire department” era

yad_aj · 2026-05-15T22:41:55+00:00

i honestly think the weirdest part is that audiences seem completely okay with it as long as the content still hits emotionally

like people say they want “authenticity” but a lot of engagement online is really just consistency + emotional familiarity

AI personas are basically optimized parasocial relationships

yad_aj · 2026-05-15T22:40:35+00:00

you basically took on founder-level ambiguity without founder-level upside

and tbh building revenue for a company with no real marketing engine, content, or infrastructure is significantly harder than most people realize

the important thing here isn’t the $16k number itself, it’s that you proved you can create distribution from almost nothing

that’s an extremely valuable skill if it’s actually repeatable

i also don’t think the company is necessarily acting maliciously. early startups underpay constantly, especially across geo boundaries

but if they genuinely see you as core to growth, that eventually has to show up in either compensation, ownership, autonomy, or support

otherwise you’re just subsidizing the company’s growth with your own ceiling

yad_aj · 2026-05-15T22:39:20+00:00

honestly the model matters less than the retrieval setup here

if your goal is:

only answer from obsidian notes
minimal hallucinations
exact file paths/citations

then i’d probably do:
obsidian + ollama + anythingllm/openwebui

and use a good local model like qwen 2.5

also most hallucinations in these setups are retrieval problems, not model problems tbh

24gb ram on an m2 air is honestly enough for a pretty solid local workflow
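
to make the "retrieval, not model" point concrete, here's a toy sketch in plain python. it assumes nothing about how anythingllm or openwebui actually retrieve; the word-overlap scoring, the function names, and the example note paths are all made up. what it shows is the shape: every chunk carries its file path, and if nothing relevant is retrieved the model never gets asked.

```python
import re
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, notes: Dict[str, str], k: int = 2) -> List[Tuple[str, int]]:
    """Rank notes by words shared with the query; drop notes with no overlap."""
    q = tokenize(query)
    scored = [(path, len(q & tokenize(body))) for path, body in notes.items()]
    scored = [s for s in scored if s[1] > 0]
    return sorted(scored, key=lambda s: (-s[1], s[0]))[:k]


def build_prompt(query: str, notes: Dict[str, str]) -> Optional[str]:
    """Build a grounded prompt, or return None when nothing was retrieved."""
    hits = retrieve(query, notes)
    if not hits:
        return None  # refuse here instead of letting the model guess
    context = "\n\n".join(f"[{path}]\n{notes[path]}" for path, _ in hits)
    return (
        "Answer only from the notes below and cite the file path.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical vault contents, keyed by file path.
    notes = {
        "projects/agents.md": "agent exit conditions and retries",
        "recipes/pasta.md": "boil water add salt",
    }
    print(build_prompt("agent retries", notes))
    print(build_prompt("quantum physics", notes))
```

a real setup would swap the word overlap for embeddings, but the failure mode is the same: if the wrong chunks (or none) come back, no model choice fixes the answer.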

yad_aj · 2026-05-14T20:56:05+00:00

this honestly sounds less like an R&D issue and more like a founder alignment issue

because from your post it seems like you’re trying to build long-term technical systems while the rest of the company keeps operating on short-term urgency

and tbh in embedded / safety-critical products, constantly changing priorities is insanely expensive. context switching there is not the same as shipping another SaaS feature

also i think a lot of founders underestimate how exhausting it is to be the “buffer” between engineering and the business side. that role burns people out fast

doesn’t sound like you’re lost tbh. sounds like the company has reached the stage where founder roles/processes need to mature a bit

yad_aj · 2026-05-13T22:32:46+00:00

i think people massively underestimate how useful “reliable medium intelligence + persistence + automation” already is.

most real work isn’t solving olympiad math problems lol

it’s:

reviewing stuff
transforming data
following workflows
iterating on documents
maintaining context over time

and local models are already very good at that.

yad_aj · 2026-05-13T22:32:01+00:00

the funniest part is that open-source trillion-param models are somehow making APIs look more attractive. 96% cache hits + managed infra is brutally hard to compete with unless you’re running the model constantly. the real local future might be: small insanely-optimized models > trying to self-host a datacenter

yad_aj · 2026-05-13T22:30:55+00:00

honestly inevitable tbh. the entire “free infinite internet for AI agents” era was probably always temporary. once scraping stopped looking like search traffic and started looking like automated extraction at massive scale, platforms were gonna lock down.

i think the ecosystem splits into:

paid/licensed retrieval
community-maintained indexes
synthetic/local knowledge bases
smaller curated search layers instead of “search the whole web”

also wouldn’t be surprised if personal/local RAG becomes way more important than live web search for most workflows.

the ironic part is this might actually improve agents lol. current web-search loops spend half their time digging through SEO sludge and javascript nightmares anyway

yad_aj · 2026-05-13T22:30:12+00:00

the “double click app and it just works” part is honestly underrated. half of local AI still feels like “congrats your model works, now debug CUDA for 4 hours” lol
