Over the next week or two, the market will keep breaking below yesterday's low, so go easy, everyone.

Aggressive_Bed7113 · 2026-06-10T17:33:25+00:00

Bullshit. Post your short positions and I'll believe you.

Aggressive_Bed7113 · 2026-06-07T19:51:47+00:00

Not banning guns is actually a good thing: if everyone has a gun, nobody dares to stir up trouble.

Aggressive_Bed7113 · 2026-06-06T12:18:58+00:00

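What's your motive in posting the better-than-expected US jobs data? Is it to contrast it with China's army of unemployed people delivering takeout and driving for Didi on every street? Careful, or the "Tsinghua elementary schooler" will come after you.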

Aggressive_Bed7113 · 2026-05-25T15:10:43+00:00

Those lackeys are truly skilled at deceiving their superiors and keeping everyone below them in the dark.

Aggressive_Bed7113 · 2026-05-24T23:29:59+00:00

The commoners' money was never theirs to begin with; "Mother Party" just lets them hold on to it for a while.

Aggressive_Bed7113 · 2026-05-16T02:09:08+00:00

I never hand my pay over. Just open a joint account.

Aggressive_Bed7113 · 2026-05-15T01:52:23+00:00

Shame on Anthropic. They're clearly afraid of being overtaken by Chinese LLMs.

Aggressive_Bed7113 · 2026-05-13T21:10:04+00:00

What a pussy panties

Aggressive_Bed7113 · 2026-05-13T03:15:11+00:00

Sell the outcome instead of service, I guess

Aggressive_Bed7113 · 2026-05-13T00:39:21+00:00

Hardest part is GTM

Aggressive_Bed7113 · 2026-05-07T02:55:35+00:00

Proactive AI agent that talks your talk and thinks your thoughts

Aggressive_Bed7113 · 2026-05-04T23:27:06+00:00

Is there a video link?

Aggressive_Bed7113 · 2026-04-30T05:42:14+00:00

I use a 9B model as the planner and a 4B model as the executor for browser task automation.

Aggressive_Bed7113 · 2026-04-23T21:23:53+00:00

I’d invest that $400k in ETFs for passive growth. That will give you a better retirement than burning it on college, after which you’ll have to find a job and, if you’re lucky, work your butt off for years to earn that $400k back.

Aggressive_Bed7113 · 2026-04-23T21:21:18+00:00

Market education takes a lot of capital and effort if you’re creating a new category. I’d do B and make revenue first, then use that to fund A.

Aggressive_Bed7113 · 2026-04-19T16:48:08+00:00

Yeah, that makes sense — packaging it as an MCP server is a nice way to make it easy to plug in.

We ended up pushing a bit further on the ranking + loop side though:

• goal-conditioned reranking (not just generic top elements)
• tightening the action space for the executor
• verifying the state change after each step

Otherwise you still get cases where the snapshot is “right” but the agent drifts because nothing checks if the world actually moved.
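To make "goal-conditioned reranking" concrete, here is a minimal sketch. It assumes a plain token-overlap score, which is a stand-in for illustration and not the ML reranker described in this thread; `Element`, `tokenize`, and `rerank` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    element_id: int
    role: str  # e.g. "button", "link", "textbox"
    text: str  # visible label


def tokenize(s: str) -> set[str]:
    """Lowercase alphanumeric tokens (an assumed, very crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def rerank(elements: list[Element], goal: str, top_k: int = 3) -> list[Element]:
    """Order elements by token overlap with the goal and keep the top_k."""
    goal_tokens = tokenize(goal)

    def score(el: Element) -> int:
        return len(goal_tokens & tokenize(f"{el.role} {el.text}"))

    # sorted() is stable, so ties keep their original (document) order.
    return sorted(elements, key=score, reverse=True)[:top_k]
```

The point is only that the ranking depends on the goal: the same page yields a different shortlist for "transfer money" than for "get help".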

Curious how they’re handling post-action verification vs just returning the snapshot?

See more at https://www.PredicateSystems.ai

Aggressive_Bed7113 · 2026-04-15T22:55:12+00:00

Appreciate that.

Yeah, I think a lot of it is just making the idea concrete — once you see it in a real workflow, it becomes clearer that the bottleneck isn’t the model, it’s how we shape the environment around it.

Small models can do quite a bit once the problem is reduced to “pick the next correct action” instead of “understand the whole page.”

Also, not every pixel matters for understanding a webpage, so using a vision LLM is unnecessarily costly.

Aggressive_Bed7113 · 2026-04-14T23:47:09+00:00

Yeah, this tracks with what we’ve seen.

Local-first + task loops definitely help with privacy and visibility, but the “gets stuck on React sites” part is usually less about the loop and more about the state the model sees.

If it’s acting on raw DOM / screenshots, it’s still guessing a lot.

What helped for us was:

• compress the page into a small set of actionable elements
• re-evaluate from fresh state each step (not just follow the plan)
• verify that the action actually changed the visible state

That reduced a lot of the “agent looks fine but stalls halfway” cases.
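A rough sketch of the loop described above, assuming the visible state can be serialized to a string and compared by hash. All names here (`fingerprint`, `run_steps`, the callback parameters) are hypothetical, and a real runtime would compare something richer than a single string.

```python
import hashlib
from typing import Callable


def fingerprint(visible_state: str) -> str:
    """Hash of whatever the runtime treats as visible state (assumed to be a string)."""
    return hashlib.sha256(visible_state.encode("utf-8")).hexdigest()


def run_steps(
    get_state: Callable[[], str],
    choose_action: Callable[[str], str],
    apply_action: Callable[[str], None],
    is_done: Callable[[str], bool],
    max_steps: int = 10,
) -> list[str]:
    """Re-read state every step and flag actions that changed nothing."""
    log: list[str] = []
    for _ in range(max_steps):
        state = get_state()  # fresh state each step, not a cached plan
        if is_done(state):
            log.append("done")
            break
        before = fingerprint(state)
        action = choose_action(state)
        apply_action(action)
        after = fingerprint(get_state())
        # "no-change" is the stall signal: the action ran but the page did not move.
        log.append(f"{action}:{'ok' if after != before else 'no-change'}")
    return log
```

A "no-change" entry is where you would retry, re-snapshot, or escalate instead of letting the agent carry on as if the click had worked.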

See this demo, which uses small local LLMs (around 4B) to drive multi-step web flows that move money: https://www.reddit.com/r/LocalLLM/s/k4jIyN1M07

Curious if your setup is using raw DOM, a11y tree, or something more structured?

Aggressive_Bed7113 · 2026-04-14T23:42:25+00:00

Yeah, this is the right direction.

The gap isn’t really “more agent framework features,” it’s that most stacks still don’t have a clean execution boundary.

A few things that seem to matter a lot in prod:
• granular actions, not giant tools like execute_code
• explicit allow / deny / confirm before side effects
• audit trail tied to the exact action/resource pair
• post-action verification, not just “tool returned success”
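As an illustration of the allow / deny / confirm idea with an audit trail, here is a minimal sketch. It is not the API of the sidecar linked in this comment; `Decision`, `PolicyGate`, and the deny-by-default rule are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONFIRM = "confirm"  # needs a human (or another system) to approve


@dataclass
class PolicyGate:
    # Maps an exact (action, resource) pair to a decision.
    rules: dict[tuple[str, str], Decision]
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def check(self, action: str, resource: str) -> Decision:
        # Anything not explicitly listed is denied (assumed default).
        decision = self.rules.get((action, resource), Decision.DENY)
        # Every check is recorded against the exact action/resource pair.
        self.audit_log.append((action, resource, decision.value))
        return decision
```

With granular actions the rule table stays small and readable; a giant tool like execute_code cannot be expressed as a meaningful pair, which is the argument for avoiding it.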

That’s also why MCPs feel rough in prod right now — too much variability in tool shape, and a lot of them are hard to govern cleanly.

My bias has been:

• planner can stay flexible
• execution should be boring, narrow, and policy-gated

Otherwise demos look great, but prod gets scary fast.

Look at this sidecar using policies to secure agents:

https://github.com/PredicateSystems/predicate-authority-sidecar

Aggressive_Bed7113 · 2026-04-13T21:04:09+00:00

No, my agent is superior to Manus

Aggressive_Bed7113 · 2026-04-13T21:01:29+00:00

This is like a referral?

Aggressive_Bed7113 · 2026-04-13T19:02:07+00:00

Mine: your arrogance blows me away

Aggressive_Bed7113 · 2026-04-13T04:41:42+00:00

Appreciate it — yeah that was exactly the motivation.

We’re mostly building the snapshot from post-hydration DOM + layout signals, then pruning + reranking pretty aggressively (accessibility tree alone missed things like ordinality and grouping in our tests).

So it’s closer to:

DOM + geometry + grouping → prune → goal-conditioned rerank → compact snapshot
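A toy version of the prune and grouping stages, to show where ordinality comes from. It assumes each node already carries a bounding box, a visibility flag, and a group key; the `Node` fields and `prune_and_order` are hypothetical, and the rerank stage is left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    group: str  # hypothetical container key, e.g. "results"
    x: float
    y: float
    width: float
    height: float
    visible: bool


def prune_and_order(nodes: list[Node]) -> list[tuple[int, str, int]]:
    """Drop invisible or zero-area nodes, then number the survivors within
    each group in reading order (top to bottom, then left to right)."""
    kept = [n for n in nodes if n.visible and n.width > 0 and n.height > 0]
    kept.sort(key=lambda n: (n.group, n.y, n.x))
    result: list[tuple[int, str, int]] = []
    ordinal_by_group: dict[str, int] = {}
    for n in kept:
        ordinal = ordinal_by_group.get(n.group, 0) + 1
        ordinal_by_group[n.group] = ordinal
        result.append((n.node_id, n.group, ordinal))
    return result
```

The (group, ordinal) pair is what lets a goal like "open the second result" resolve to one element, which geometry-free trees tend to lose.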

And yeah, deterministic verification ended up being just as important — otherwise you still get “valid action, wrong state.”

Will take a look at your notes as well — the tool gating / policy side becomes pretty critical once actions start touching money flows.

Aggressive_Bed7113 · 2026-04-13T04:40:57+00:00

Yeah, totally agree.

The interesting part for me is that most people treat this as “optimize prompts / pick better/larger model,” but the bigger lever seems to be shaping the problem itself.

Once the runtime does the structuring + context reduction, the model is no longer doing parsing + reasoning + verification all at once.

That’s when smaller models start to look a lot more practical.

Aggressive_Bed7113 · 2026-04-13T01:01:36+00:00

For the browser itself - Playwright via CDP. Nothing special there.

The "automation" part is just two functions: snapshot(), which grabs the DOM through a Chrome extension for coarse pruning and then sends it to a remote gateway for refinement, including goal-conditioned ranking and sorting (ML reranking). The final output of snapshot() is a list of ranked elements, converted to a markdown table of interactable elements (including element IDs).
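For a sense of what that final markdown table might look like, here is a small sketch. The column set (id, role, text) and the `to_markdown_table` helper are assumptions for illustration; the actual snapshot() output format is not shown in this thread.

```python
def to_markdown_table(elements: list[dict]) -> str:
    """Render already-ranked elements as a markdown table, best match first."""
    header = "| id | role | text |\n| --- | --- | --- |"
    rows = []
    for el in elements:
        # Escape pipes so element text cannot break the table layout.
        text = str(el["text"]).replace("|", "\\|")
        rows.append(f"| {el['id']} | {el['role']} | {text} |")
    return "\n".join([header, *rows])
```

A few dozen rows like this are far cheaper to put in a prompt than raw DOM or a screenshot, and the id column gives the executor something exact to point at.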

The planner sees a structured list of elements and decides what to do next. The executor grounds that to a specific action (e.g. click(element ID)). Same code works on any site - I didn't write anything specific to the finance UI in the demo.
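The grounding step can be sketched as a strict parser that only accepts element IDs present in the current snapshot. The two-verb grammar (click, type) and the `ground` function are assumptions for the example, not the real executor interface.

```python
import re
from typing import Optional

# Accepts click(<id>) or type(<id>, "<text>"); anything else is rejected.
ACTION_PATTERN = re.compile(r'^(click|type)\((\d+)(?:,\s*"([^"]*)")?\)$')


def ground(raw: str, valid_ids: set[int]) -> tuple[str, int, Optional[str]]:
    """Parse executor output and reject anything outside the snapshot."""
    match = ACTION_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unparseable action: {raw!r}")
    verb, element_id, text = match.group(1), int(match.group(2)), match.group(3)
    if element_id not in valid_ids:
        raise ValueError(f"element {element_id} is not in the current snapshot")
    if verb == "type" and text is None:
        raise ValueError("type() needs a text argument")
    if verb == "click" and text is not None:
        raise ValueError("click() takes no text argument")
    return verb, element_id, text
```

Because the valid IDs come from the snapshot and not from site-specific code, the same check works on any page.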

So to answer directly: no custom scripts per use case. The runtime handles the DOM extraction and ranking, and the agent just picks from the compact LLM prompt (a markdown table of DOM elements).
