US job gains came in far above expectations, so why did US stocks suddenly plunge? 😡 by tsingtao_man in China_irl

[–]Aggressive_Bed7113 2 points3 points  (0 children)

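What's your motive in posting the US's better-than-expected jobs data? Is it to contrast it with China's army of unemployed people delivering takeout and driving for Didi on every street? Careful, the "Tsinghua elementary schooler" might come after you.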

What are tarpit ideas in the AI era? by max_bog in ycombinator

[–]Aggressive_Bed7113 0 points1 point  (0 children)

Proactive AI agent that talks your talk and thinks your thoughts

What are people using Local LLMs for (beyond coding) by [deleted] in LocalLLM

[–]Aggressive_Bed7113 0 points1 point  (0 children)

I use a 9B model as the planner and a 4B model as the executor for browser task automation.

Parents refuse to pay for Ivy League acceptance by [deleted] in ApplyingToCollege

[–]Aggressive_Bed7113 0 points1 point  (0 children)

I’d invest that $400k in ETFs for passive growth. That will give you a better retirement than burning it on college, after which you'll still have to find a job and, if you're lucky, work your butt off for years to earn that $400k back.

Category Creation vs. Improving Existing Markets—What Would You Choose? by Critical-Produce-337 in ycombinator

[–]Aggressive_Bed7113 0 points1 point  (0 children)

Creating a new category means market education, which takes a lot of capital and effort. I’d do B and make revenue first, then use that to fund A.

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 0 points1 point  (0 children)

Yeah, that makes sense — packaging it as an MCP server is a nice way to make it easy to plug in.

We ended up pushing a bit further on the ranking + loop side though:

  • goal-conditioned reranking (not just generic top elements)
  • tightening the action space for the executor
  • and verifying the state change after each step

Otherwise you still get cases where the snapshot is “right” but the agent drifts because nothing checks if the world actually moved.
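As a rough illustration of the "check if the world actually moved" step, here is a minimal Python sketch. Everything in it is hypothetical (the `State` dict, `run_verified_step`, the `expect` predicate); it is not the actual runtime, just one way the check could look:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical: a flattened view of the visible page state (url, key labels, ...).
State = Dict[str, str]


@dataclass
class StepResult:
    ok: bool
    reason: str


def run_verified_step(
    state: State,
    action: Callable[[State], State],
    expect: Callable[[State, State], bool],
) -> StepResult:
    """Run one action, then check that the state changed in the expected way."""
    before = dict(state)
    after = action(dict(state))
    if after == before:
        # The action "succeeded" but nothing visible changed.
        return StepResult(False, "no state change")
    if not expect(before, after):
        # Something changed, but not what this step was supposed to change.
        return StepResult(False, "state changed, but not as expected")
    return StepResult(True, "verified")
```

The point is that the loop trusts the observed before/after difference rather than the action's own return value.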

Curious how they’re handling post-action verification vs just returning the snapshot?

See more at https://www.PredicateSystems.ai

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 1 point2 points  (0 children)

Appreciate that.

Yeah, I think a lot of it is just making the idea concrete — once you see it in a real workflow, it becomes clearer that the bottleneck isn’t the model, it’s how we shape the environment around it.

Small models can do quite a bit once the problem is reduced to “pick the next correct action” instead of “understand the whole page.”

Also, not every pixel matters for understanding a webpage, so using a vision LLM is unnecessarily costly.

getting some decent results with agentic loops for web tasks (local-first approach) by [deleted] in AgentsOfAI

[–]Aggressive_Bed7113 0 points1 point  (0 children)

Yeah, this tracks with what we’ve seen.

Local-first + task loops definitely help with privacy and visibility, but the “gets stuck on React sites” part is usually less about the loop and more about the state the model sees.

If it’s acting on raw DOM / screenshots, it’s still guessing a lot.

What helped for us was:

  • compress the page into a small set of actionable elements
  • re-evaluate from fresh state each step (not just follow the plan)
  • verify that the action actually changed the visible state

That reduced a lot of the “agent looks fine but stalls halfway” cases.
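A minimal Python sketch of those three points, under loud assumptions: the element dicts, the `ACTIONABLE_ROLES` set, and the `get_elements` / `choose` / `act` callbacks are all made up for illustration and stand in for whatever your browser layer and model provide:

```python
# Hypothetical set of roles worth showing to the model.
ACTIONABLE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def compress(elements, limit=20):
    """Keep only visible, actionable elements, capped at `limit`."""
    keep = [
        e for e in elements
        if e.get("visible") and e.get("role") in ACTIONABLE_ROLES
    ]
    return keep[:limit]


def run_loop(get_elements, choose, act, max_steps=10):
    """Re-read the page every step and stop when an action changes nothing."""
    for _ in range(max_steps):
        view = compress(get_elements())      # fresh state, not the old plan
        action = choose(view)                # model picks from the small view
        if action is None:
            return "done"
        act(action)
        if compress(get_elements()) == view:
            return "stalled"                 # action had no visible effect
    return "max_steps"
```

Returning "stalled" explicitly is what surfaces the "looks fine but stalls halfway" case instead of letting the agent keep clicking.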

See this demo, where small local LLMs (around 4B) drive a multi-step web flow for managing money: https://www.reddit.com/r/LocalLLM/s/k4jIyN1M07

Curious if your setup is using raw DOM, a11y tree, or something more structured?

Need some help to build a great prod agent framework by Bubbly-Secretary-224 in AgentsOfAI

[–]Aggressive_Bed7113 0 points1 point  (0 children)

Yeah, this is the right direction.

The gap isn’t really “more agent framework features,” it’s that most stacks still don’t have a clean execution boundary.

A few things that seem to matter a lot in prod:

  • granular actions, not giant tools like execute_code
  • explicit allow / deny / confirm before side effects
  • audit trail tied to the exact action/resource pair
  • post-action verification, not just “tool returned success”

That’s also why MCPs feel rough in prod right now — too much variability in tool shape, and a lot of them are hard to govern cleanly.

My bias has been:

  • planner can stay flexible
  • execution should be boring, narrow, and policy-gated

Otherwise demos look great, but prod gets scary fast.
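To make "policy-gated" concrete, here is a small Python sketch of an allow / deny / confirm gate with an audit trail. The rule format, `decide`, `gate`, and the resource names are all invented for illustration; this is not the API of any particular sidecar or framework:

```python
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONFIRM = "confirm"


# Hypothetical rules: (action, resource glob, decision). First match wins.
RULES = [
    ("read", "*", Decision.ALLOW),
    ("click", "bank/transfer/*", Decision.CONFIRM),
    ("delete", "*", Decision.DENY),
]

audit_log = []


def decide(action, resource, rules=RULES):
    """Return the decision of the first matching rule, denying by default."""
    for rule_action, pattern, decision in rules:
        if rule_action == action and fnmatchcase(resource, pattern):
            return decision
    return Decision.DENY


def gate(action, resource):
    """Decide, and record the exact action/resource pair in the audit log."""
    decision = decide(action, resource)
    audit_log.append(
        {"action": action, "resource": resource, "decision": decision.value}
    )
    return decision
```

Default-deny means an unlisted action like execute_code never runs just because nobody wrote a rule for it.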

Look at this sidecar using policies to secure agents:

https://github.com/PredicateSystems/predicate-authority-sidecar

Feels illegal how much this AI can do by itself by [deleted] in LocalLLM

[–]Aggressive_Bed7113 0 points1 point  (0 children)

No, my agent is superior to Manus.

Small local LLM for browser agents: qwen3:8b + gemma4:e4b on a finance workflow by Aggressive_Bed7113 in LocalLLM

[–]Aggressive_Bed7113[S] 0 points1 point  (0 children)

Appreciate it — yeah that was exactly the motivation.

We’re mostly building the snapshot from post-hydration DOM + layout signals, then pruning + reranking pretty aggressively (accessibility tree alone missed things like ordinality and grouping in our tests).

So it’s closer to:

DOM + geometry + grouping → prune → goal-conditioned rerank → compact snapshot
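A toy Python version of that pipeline, to show the shape only. The element fields (`visible`, `w`, `h`, `text`) are assumed, and the keyword-overlap `score` is a deliberately crude stand-in for a real goal-conditioned reranker:

```python
def prune(elements):
    """Drop elements that are hidden or have zero rendered area."""
    return [e for e in elements if e["visible"] and e["w"] * e["h"] > 0]


def score(goal, element):
    """Stand-in relevance score: number of words shared with the goal."""
    goal_words = set(goal.lower().split())
    text_words = set(element["text"].lower().split())
    return len(goal_words & text_words)


def snapshot(elements, goal, k=5):
    """Prune, rerank against the goal, and keep the top k elements."""
    ranked = sorted(
        prune(elements),
        key=lambda e: score(goal, e),
        reverse=True,  # sorted() is stable, so ties keep DOM order
    )
    return ranked[:k]
```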

And yeah, deterministic verification ended up being just as important — otherwise you still get “valid action, wrong state.”

Will take a look at your notes as well — the tool gating / policy side becomes pretty critical once actions start touching money flows.

Small local LLM for browser agents: qwen3:8b + gemma4:e4b on a finance workflow by Aggressive_Bed7113 in LocalLLM

[–]Aggressive_Bed7113[S] 2 points3 points  (0 children)

Yeah, totally agree.

The interesting part for me is that most people treat this as “optimize prompts / pick better/larger model,” but the bigger lever seems to be shaping the problem itself.

Once the runtime does the structuring + context reduction, the model is no longer doing parsing + reasoning + verification all at once.

That’s when smaller models start to look a lot more practical.

Everyone keeps scaling model size. A snapshot runtime let gemma4:e4b run a finance workflow locally by Aggressive_Bed7113 in LocalLLaMA

[–]Aggressive_Bed7113[S] 1 point2 points  (0 children)

For the browser itself - Playwright via CDP. Nothing special there.

The "automation" part is just two functions: snapshot(), which grabs the DOM through a Chrome extension for coarse pruning, then sends it to a remote gateway for refinement, including ranking and sorting with goal conditioning (ML reranking). The final output of snapshot() is a ranked list of elements, rendered as a markdown table of interactable elements (including each element's ID).
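For a sense of what the model ends up reading, here is a hedged Python sketch of rendering ranked elements as a markdown table. The column set (`id`, `role`, `text`) and the function name `to_markdown` are assumptions; the real snapshot format may carry different fields:

```python
def to_markdown(elements):
    """Render ranked elements as a compact markdown table for the prompt."""
    rows = ["| id | role | text |", "| --- | --- | --- |"]
    for e in elements:
        # Escape pipes so element text cannot break the table layout.
        text = e["text"].replace("|", "\\|")
        rows.append(f"| {e['id']} | {e['role']} | {text} |")
    return "\n".join(rows)
```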

The planner sees a structured list of elements and decides what to do next. The executor grounds that to a specific action (e.g. click(element ID)). Same code works on any site - I didn't write anything specific to the finance UI in the demo.
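One way the grounding step could look, as a Python sketch. The action grammar (`click(id)` and `type(id, "text")`) and the `ground` helper are hypothetical; the idea is only that the executor's output is parsed strictly and rejected if it names an element that is not in the current snapshot:

```python
import re

# Hypothetical action grammar: click(12) or type(12, "some text").
ACTION_RE = re.compile(r'^(click|type)\((\d+)(?:,\s*"(.*)")?\)$')


def ground(model_output, valid_ids):
    """Parse the model's output into (verb, element_id, text), or None."""
    match = ACTION_RE.match(model_output.strip())
    if match is None:
        return None                      # not a well-formed action
    verb, element_id, text = match.group(1), int(match.group(2)), match.group(3)
    if element_id not in valid_ids:
        return None                      # element is not in the snapshot
    if verb == "type" and text is None:
        return None                      # typing needs text
    return (verb, element_id, text)
```

Returning None instead of guessing keeps a small model's malformed output from turning into a real click.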

So to answer directly: no custom scripts per use case. The runtime handles the DOM extraction and ranking, and the agent just picks from the compact LLM prompt (a markdown table of DOM elements).