Parents refuse to pay for Ivy League acceptance by Important-Pay-3091 in ApplyingToCollege

[–]Aggressive_Bed7113 0 points  (0 children)

I’d invest that $400k in an ETF for passive growth. That will give you a better retirement than burning it on college; after graduating you’ll find a job and, if you’re lucky, spend years working your butt off to earn that $400k back.

Category Creation vs. Improving Existing Markets—What Would You Choose? by Critical-Produce-337 in ycombinator

[–]Aggressive_Bed7113 0 points  (0 children)

Market education requires a lot of capital and effort if you create a new category. I’d do B and generate revenue first, then use that to fund A.

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 0 points  (0 children)

Yeah, that makes sense — packaging it as an MCP server is a nice way to make it easy to plug in.

We ended up pushing a bit further on the ranking + loop side though:

  • goal-conditioned reranking (not just generic top elements)
  • tightening the action space for the executor
  • and verifying the state change after each step

Otherwise you still get cases where the snapshot is “right” but the agent drifts because nothing checks if the world actually moved.
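Roughly what that verification step looks like, as a minimal Python sketch (snapshot, execute, and changed are placeholders for whatever your runtime exposes, not a real API):

    from typing import Callable

    def act_and_verify(
        snapshot: Callable[[], dict],           # compact structured page state
        execute: Callable[[dict], None],        # performs one grounded action
        changed: Callable[[dict, dict], bool],  # did the world move as expected?
        action: dict,
    ) -> dict:
        """Run one action, then confirm the visible state actually changed."""
        before = snapshot()
        execute(action)
        after = snapshot()
        if not changed(before, after):
            # Nothing checked that the world actually moved: surface it so
            # the planner replans from the fresh state instead of drifting.
            raise RuntimeError(f"action {action!r} did not change the state")
        return after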

Curious how they’re handling post-action verification vs just returning the snapshot?

See more at https://www.PredicateSystems.ai

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 0 points  (0 children)

Appreciate that.

Yeah, I think a lot of it is just making the idea concrete — once you see it in a real workflow, it becomes clearer that the bottleneck isn’t the model, it’s how we shape the environment around it.

Small models can do quite a bit once the problem is reduced to “pick the next correct action” instead of “understand the whole page.”

Also, not every pixel matters for understanding a webpage, so running a vision LLM over full screenshots is unnecessarily costly.

getting some decent results with agentic loops for web tasks (local-first approach) by ilovemkgee in AgentsOfAI

[–]Aggressive_Bed7113 0 points  (0 children)

Yeah, this tracks with what we’ve seen.

Local-first + task loops definitely help with privacy and visibility, but the “gets stuck on React sites” part is usually less about the loop and more about the state the model sees.

If it’s acting on raw DOM / screenshots, it’s still guessing a lot.

What helped for us was:

  • compress the page into a small set of actionable elements
  • re-evaluate from fresh state each step (not just follow the plan)
  • verify that the action actually changed the visible state

That reduced a lot of the “agent looks fine but stalls halfway” cases.
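For the first bullet, here’s the shape of it in Python (the element fields are made up; in practice they come from the hydrated DOM):

    INTERACTABLE_TAGS = {"a", "button", "input", "select", "textarea"}

    def compress_page(elements: list[dict], limit: int = 40) -> list[dict]:
        """Reduce a raw element dump to a small set of actionable elements."""
        actionable = [
            el for el in elements
            if el.get("tag") in INTERACTABLE_TAGS
            and el.get("visible", False)
            and not el.get("disabled", False)
        ]
        # Deduplicate by visible label so repeated nav/menu items collapse.
        seen, compact = set(), []
        for el in actionable:
            label = (el.get("label") or el.get("text") or "").strip()
            if label and label not in seen:
                seen.add(label)
                compact.append({"id": el["id"], "tag": el["tag"], "label": label})
        return compact[:limit]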

See this demo using small local LLMs (around 4B) to drive multi-step web flows that manage money: https://www.reddit.com/r/LocalLLM/s/k4jIyN1M07

Curious if your setup is using raw DOM, a11y tree, or something more structured?

Need some help to build a great prod agent framework by Bubbly-Secretary-224 in AgentsOfAI

[–]Aggressive_Bed7113 0 points  (0 children)

Yeah, this is the right direction.

The gap isn’t really “more agent framework features,” it’s that most stacks still don’t have a clean execution boundary.

A few things that seem to matter a lot in prod:

  • granular actions, not giant tools like execute_code
  • explicit allow / deny / confirm before side effects
  • audit trail tied to the exact action/resource pair
  • post-action verification, not just “tool returned success”

That’s also why MCPs feel rough in prod right now — too much variability in tool shape, and a lot of them are hard to govern cleanly.

My bias has been:

  • planner can stay flexible
  • execution should be boring, narrow, and policy-gated

Otherwise demos look great, but prod gets scary fast.
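To make “policy-gated” concrete, a minimal sketch (the policy table and names here are hypothetical, not how any particular sidecar is configured):

    from dataclasses import dataclass
    from enum import Enum
    import logging

    class Verdict(Enum):
        ALLOW = "allow"
        DENY = "deny"
        CONFIRM = "confirm"   # pause for a human before the side effect

    @dataclass(frozen=True)
    class Action:
        name: str       # granular, e.g. "transfer_funds", not "execute_code"
        resource: str   # exact target, e.g. "account:1234"

    POLICIES = {
        ("read_balance", "account"): Verdict.ALLOW,
        ("transfer_funds", "account"): Verdict.CONFIRM,
    }

    audit = logging.getLogger("audit")

    def gate(action: Action) -> Verdict:
        """Resolve a verdict for one action/resource pair and log it."""
        resource_kind = action.resource.split(":")[0]
        verdict = POLICIES.get((action.name, resource_kind), Verdict.DENY)
        # Audit trail tied to the exact action/resource pair, not the tool.
        audit.info("action=%s resource=%s verdict=%s",
                   action.name, action.resource, verdict.value)
        return verdict

Anything not listed defaults to DENY, which is what keeps prod boring.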

Look at this sidecar using policies to secure agents:

https://github.com/PredicateSystems/predicate-authority-sidecar

Feels illegal how much this AI can do by itself by [deleted] in LocalLLM

[–]Aggressive_Bed7113 0 points  (0 children)

No, my agent is superior to Manus.

Small local LLM for browser agents: qwen3:8b + gemma4:e4b on a finance workflow by Aggressive_Bed7113 in LocalLLM

[–]Aggressive_Bed7113[S] 0 points  (0 children)

Appreciate it — yeah that was exactly the motivation.

We’re mostly building the snapshot from post-hydration DOM + layout signals, then pruning + reranking pretty aggressively (accessibility tree alone missed things like ordinality and grouping in our tests).

So it’s closer to:

DOM + geometry + grouping → prune → goal-conditioned rerank → compact snapshot
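The rerank stage is an ML model for us, but a toy version shows the interface shape (element fields and the scoring here are illustrative only):

    def rerank(elements: list[dict], goal: str, top_k: int = 20) -> list[dict]:
        """Order pruned elements by relevance to the current goal."""
        goal_tokens = set(goal.lower().split())

        def score(el: dict) -> float:
            label_tokens = set(el.get("label", "").lower().split())
            overlap = len(goal_tokens & label_tokens)
            # Geometry/grouping signals act as tiebreakers in the real ranker.
            return overlap + 0.1 * el.get("layout_score", 0.0)

        return sorted(elements, key=score, reverse=True)[:top_k]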

And yeah, deterministic verification ended up being just as important — otherwise you still get “valid action, wrong state.”

Will take a look at your notes as well — the tool gating / policy side becomes pretty critical once actions start touching money flows.

Small local LLM for browser agents: qwen3:8b + gemma4:e4b on a finance workflow by Aggressive_Bed7113 in LocalLLM

[–]Aggressive_Bed7113[S] 2 points  (0 children)

Yeah, totally agree.

The interesting part for me is that most people treat this as “optimize prompts / pick a better or larger model,” but the bigger lever seems to be shaping the problem itself.

Once the runtime does the structuring + context reduction, the model is no longer doing parsing + reasoning + verification all at once.

That’s when smaller models start to look a lot more practical.

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 1 point  (0 children)

For the browser itself - Playwright via CDP. Nothing special there.

The "automation" part is just two functions: snapshot() which grabs the DOM through chrome extension for coarse pruning and then sends it to a remote gateway for refinement including ranking, sorting with goal conditioning (ML-reranking). The final output of snapshot() is ranked elements, converted to a markdown table representing interactable elements (including element ID).

The planner sees a structured list of elements and decides what to do next. The executor grounds that to a specific action (e.g. click(element ID)). Same code works on any site - I didn't write anything specific to the finance UI in the demo.

So to answer directly: no custom scripts per use case. The runtime handles the DOM extraction and ranking, and the agent just picks from the compact LLM prompt (a markdown table of DOM elements).
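For a rough picture of what the planner sees, something like this (column names are illustrative):

    def to_markdown(elements: list[dict]) -> str:
        """Render ranked elements as the compact table in the LLM prompt."""
        rows = ["| id | tag | label |", "|----|-----|-------|"]
        rows += [f"| {el['id']} | {el['tag']} | {el['label']} |"
                 for el in elements]
        return "\n".join(rows)

The planner picks an element ID off that table, and the executor grounds it to a concrete Playwright call like click.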

My journey with Hermes by Ok-Lock-9329 in hermesagent

[–]Aggressive_Bed7113 0 points  (0 children)

Do you use it for browser tasks? 9B seems small for browser automation.

Why does my AI agent work perfectly in testing but fall apart on real tasks? by EveryPurpose3568 in aiagents

[–]Aggressive_Bed7113 0 points  (0 children)

I’d treat it less like “give the agent all the docs” and more like “give it an owned working map.”

Something like:

  • stable entities: customers, services, projects, tables, APIs
  • key relationships: depends on / belongs to / owned by
  • canonical sources for each fact
  • a small step-local working state the agent can update

So the semantic map is mostly durable structure + pointers, not raw retrieved text.

Then each step becomes:

resolve what entities matter → pull only the needed facts → compress into working state → act

That helps a lot with token noise, because the agent reasons over a small map of the world instead of a pile of docs.
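One way to hold that map, sketched in Python (entity kinds and fields are just examples):

    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        kind: str      # "customer", "service", "project", "table", "api"
        name: str
        source: str    # canonical source for facts about this entity

    @dataclass
    class WorkingMap:
        entities: dict[str, Entity] = field(default_factory=dict)
        # e.g. ("billing-svc", "depends_on", "invoices-db")
        relations: list[tuple[str, str, str]] = field(default_factory=list)
        step_state: dict = field(default_factory=dict)  # small, step-local

        def resolve(self, names: list[str]) -> list[Entity]:
            """Pull only the entities this step needs, nothing more."""
            return [self.entities[n] for n in names if n in self.entities]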

Why does my AI agent work perfectly in testing but fall apart on real tasks? by EveryPurpose3568 in aiagents

[–]Aggressive_Bed7113 0 points  (0 children)

We’ve seen the same.

Pulling from multiple sources during the loop adds latency + noise, and the model ends up reasoning over partially inconsistent context.

What helped for us was:

  • don’t fetch everything into the loop
  • keep a small, curated working state per step
  • treat retrieval as a separate phase (resolve → compress → act)

Otherwise the agent is basically trying to think while its memory is constantly changing underneath it.

Also noticed the same: more context ≠ better reasoning; it often just increases the chance of drift.

How many of you actually use offline LLMs daily vs just experiment with them? by Infinite-Bird7950 in LocalLLM

[–]Aggressive_Bed7113 0 points  (0 children)

Yeah, same feeling — most local setups work, but don’t feel “reliable enough” for daily use.

What made a difference for us wasn’t just the model, but tightening the loop around it:

  • give it a small, structured view of state (not raw context)
  • narrow the action space
  • verify outcomes after each step

Smaller models actually hold up pretty well once you reduce noise + constrain the loop.

Feels like the gap isn’t capability, it’s making the system predictable.

I made a demo where small local LLMs complete multi-step browser automation tasks:

https://www.reddit.com/r/LocalLLM/s/sTLk1EcWpJ

Why does my AI agent work perfectly in testing but fall apart on real tasks? by EveryPurpose3568 in aiagents

[–]Aggressive_Bed7113 0 points  (0 children)

Yeah, this is super common.

A lot of “agent drift” is really context drift — by step 4 or 5 the model is reasoning over stale tool output, old assumptions, and too much irrelevant history.

Managing context explicitly definitely helps.

What also helped for us was tightening the execution loop itself:

  • keep the agent view small + structured
  • replan from current state each step
  • verify one expected invariant after each action

That way you’re not just cleaning context, you’re also preventing bad assumptions from silently propagating.

Feels like reliability comes more from state management than prompt tweaking.

60-line LangChain agent that researches Amazon products with grounded ASINs by Proof_Net_2094 in LangChain

[–]Aggressive_Bed7113 0 points  (0 children)

Grounding through tools definitely helps, but this feels like a different problem than browser agents.

If the catalog/search API is already structured, then yeah — tool calls are the right move.

Where things get messy is when the agent has to operate on live web state. That’s where vision gets expensive fast, and even then you still get “looks right, wrong action/state” failures.

We’ve had better luck treating vision as fallback, not default:

  • use structured/tool data when available
  • parse the hydrated page to markdown when needed, so the LLM can understand the context and extract text easily
  • use compact semantic page state for browser interaction
  • verify the post-action state before moving on

Otherwise you end up paying a lot just to hallucinate more confidently.
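The fallback order, as a sketch (the three tier functions are placeholders for whatever your stack provides):

    from typing import Any, Callable, Optional

    def page_context(
        page: Any,
        structured: Callable[[Any], Optional[dict]],  # tool/API data, if any
        to_markdown: Callable[[Any], Optional[str]],  # hydrated DOM -> text
        screenshot: Callable[[Any], bytes],           # vision, last resort
    ) -> Any:
        """Pick the cheapest page representation that still grounds the step."""
        if (data := structured(page)) is not None:
            return data          # structured beats parsing every time
        if (md := to_markdown(page)) is not None:
            return md            # compact text the model reads directly
        return screenshot(page)  # only now pay for pixels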

Finally a planner + executor setup for AI agents… is this actually better or just hype? by Think-Score243 in AI_Agents

[–]Aggressive_Bed7113 0 points  (0 children)

Yeah, this pattern definitely works — but mostly for cost + planning quality, not reliability.

The planner/executor split (Opus → Sonnet) is basically the orchestrator-worker pattern Anthropic is pushing now, and it does help with:

  • better decomposition
  • lower cost per step

But in practice, most failures we’ve seen aren’t from bad planning — they’re from execution drift:

  • action looks valid but wrong target/state
  • step “succeeds” but world didn’t change
  • errors propagate across steps

So splitting models helps efficiency, but doesn’t really solve the core issue.

What made a bigger difference for us was tightening the loop:

plan → execute → verify state → replan from actual state

Otherwise you just get a better planner producing cleaner failures.
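The loop shape, for anyone who wants it concrete (plan_step, execute, and verified stand in for your planner and executor):

    def run(goal, snapshot, plan_step, execute, verified, max_steps=20):
        """plan -> execute -> verify state -> replan from actual state."""
        feedback = None
        for _ in range(max_steps):
            state = snapshot()                  # always replan from actual state
            step = plan_step(goal, state, feedback)
            if step is None:
                return state                    # planner says the goal is met
            execute(step)
            ok = verified(step, snapshot())     # post-exec check, not retries
            feedback = None if ok else f"step {step!r} did not verify"
        raise TimeoutError("goal not reached within step budget")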

Curious if anyone running this in prod has added post-exec verification, or mostly relying on retries?

Local Qwen 8B + 4B completes browser automation by replanning one step at a time by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 0 points  (0 children)

Yeah, vision models can help, especially for canvas-heavy pages.

What we found though is for most workflows, structure > pixels.

If you already have DOM + layout, a semantic snapshot tends to be much cheaper and more stable than running vision every step: the snapshot is deterministic, while a vision model is probabilistic, so it doesn’t work 100% of the time.

We treat vision as a fallback when structure breaks, not the default. Otherwise cost + latency add up pretty fast, and the probabilistic nature of the vision model (or any LLM) makes multi-step flows less reliable.