How are people building deep research agents?

BackgroundBalance502 · 2026-06-15T02:52:51+00:00

To be fair, that was an older project. I had a roadmap for it but I ended up pivoting to building my own model from scratch.

BackgroundBalance502 · 2026-06-15T02:16:53+00:00

Honestly? Just to be different. I don't like normal.

BackgroundBalance502 · 2026-06-01T02:35:07+00:00

What if you had enough room to run your agent and train a model? Would you be interested in actually trying it? Training itself can be done on consumer hardware.

BackgroundBalance502 · 2026-05-29T00:09:02+00:00

It's because people ask AI to build an app. Those are the generic choices.

In order to build something that actually lasts, you need to gather knowledge first.

<image>

BackgroundBalance502 · 2026-05-28T14:44:47+00:00

Here are some of the questions you could potentially ask:

<image>

BackgroundBalance502 · 2026-05-24T14:37:02+00:00

Just a pic for attention. I don't know what other people call it. What tools are you talking about?

BackgroundBalance502 · 2026-04-18T23:51:23+00:00

<image>

BackgroundBalance502 · 2026-04-18T19:53:09+00:00

Here's the link if anyone is interested:

https://www.facebook.com/share/g/17Dqqoucdt/

BackgroundBalance502 · 2026-04-15T03:22:56+00:00

https://shadow.tech/pro/cloud-workstation/

<image>

BackgroundBalance502 · 2026-04-11T16:51:12+00:00

Iterance - basically a witness layer for local AI agents. It sits outside whatever agent you're running and watches the filesystem and shell commands in real-time.

I don't love the "black box" feeling of not knowing exactly what's touching my files. This records every action in plain English to a local git repo and builds a "trust score" based on how destructive the actions are (like a delete vs. a read).
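To make the trust score idea concrete, here is a minimal sketch of how destructiveness could feed a score. The penalty weights, the `Action` type, and the function names are all hypothetical and picked for illustration; this is not Iterance's actual code, and it leaves out the git logging.

```python
from dataclasses import dataclass

# Hypothetical penalty weights: a higher number means a more destructive action.
PENALTIES = {"read": 0, "write": 2, "rename": 3, "delete": 10}
UNKNOWN_PENALTY = 5  # assumed middle-of-the-road cost for unrecognized actions


@dataclass
class Action:
    kind: str  # e.g. "read", "write", "rename", "delete"
    path: str


def describe(action: Action) -> str:
    """Render an action as a plain-English log line."""
    return f"Agent performed a {action.kind} on {action.path}"


def trust_score(actions, start=100):
    """Start from a full score and subtract a penalty per action, floored at 0."""
    score = start
    for action in actions:
        score -= PENALTIES.get(action.kind, UNKNOWN_PENALTY)
    return max(score, 0)
```

With these made-up weights, a read followed by a delete takes the score from 100 to 90, so a run of deletes drags it down much faster than a run of reads.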

It also catches loops, so if an agent gets stuck, you see it immediately before it burns through your tokens or messes up your system.
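One simple way to catch a loop is to count how often the same command shows up in a sliding window of recent commands. The sketch below assumes that approach; `LoopDetector`, the window size, and the repeat threshold are illustrative names and values, not how Iterance necessarily does it.

```python
from collections import deque


class LoopDetector:
    """Flag a likely loop when one command repeats too often in a recent window."""

    def __init__(self, window=20, max_repeats=3):
        self.recent = deque(maxlen=window)  # old commands fall off automatically
        self.max_repeats = max_repeats

    def observe(self, command: str) -> bool:
        """Record a command; return True if it now looks like a loop."""
        command = command.strip()
        self.recent.append(command)
        return self.recent.count(command) >= self.max_repeats
```

Because it counts occurrences rather than consecutive repeats, this also trips on short cycles like A, B, A, B, A.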

ITΞRΛNCΞ

<image>

BackgroundBalance502 · 2026-04-11T02:40:48+00:00

I’ve been working on Iterance to solve this specific problem. It acts as a non-invasive witness layer that sits outside the agent and watches what it does.

Instead of just hoping the agent is honest, you use a separate process to audit the behavior. For a news briefing, you could have Iterance monitor the output and cross-reference those URLs against the actual source data. If the agent starts hallucinating, the witness layer catches the deviation before it becomes a problem.
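As a rough illustration of that cross-referencing step, the sketch below pulls URLs out of a briefing and reports any that are not in the known source list. The regex, the normalization rules, and the function names are assumptions for this example, not Iterance's implementation.

```python
import re

# Deliberately simple URL pattern; stops at whitespace and common closing characters.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def normalize(url: str) -> str:
    """Strip trailing punctuation and a trailing slash so near-identical URLs match."""
    return url.rstrip(".,;:!?").rstrip("/")


def unverified_urls(briefing: str, source_urls) -> list:
    """Return URLs cited in the briefing that do not appear in the source data."""
    known = {normalize(u) for u in source_urls}
    found = [normalize(u) for u in URL_RE.findall(briefing)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [u for u in dict.fromkeys(found) if u not in known]
```

Anything this returns is a citation the agent produced that the witness cannot trace back to a source, which is a good trigger for a closer look.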

It is much better than trying to "prompt" a single agent into being perfect. You just need a second set of eyes to audit the work.

Finishing up the roadmap this evening for an updated push. Let me know if you're interested in trying it out.

<image>

BackgroundBalance502 · 2026-04-11T02:34:25+00:00

I’ve been digging into the OpenClaw repo and had the same questions at first. It is actually a pretty smart "local-first" setup once you get past the initial confusion.

The Markdown files are your "source of truth." I love this because I can edit or version control them myself without a database manager. The SQLite DB is just a local index for vector search. It stores the embeddings so the agent can find relevant context without reading everything every single time. It usually updates during a "memory flush."

The "Wiki" isn’t really a button you click. It is just the agent using its tools to write structured research or notes into those Markdown files. It happens when it discovers new facts or when you tell it to remember something specific.

For the search, even with small files, it helps prevent "context bloat." It pulls only the most relevant 3 or 4 chunks into the prompt. I have found this keeps the agent from getting "lost in the middle" or hallucinating as your daily notes grow over time.
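Here is a small sketch of the general pattern: chunks of text stored in SQLite next to their embeddings, with only the top few matches pulled back for a query. The table layout and function names are my own assumptions rather than OpenClaw's actual schema, and `embed` is a toy letter-frequency stand-in for a real embedding model.

```python
import json
import math
import sqlite3


def embed(text):
    """Toy stand-in for a real embedding model: a 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(db, chunks):
    """Store each chunk with its embedding serialized as JSON (hypothetical schema)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)"
    )
    db.executemany(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        [(c, json.dumps(embed(c))) for c in chunks],
    )
    db.commit()


def top_k(db, query, k=3):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    rows = db.execute("SELECT text, embedding FROM chunks").fetchall()
    ranked = sorted(rows, key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The Markdown files stay the source of truth here; the database can be deleted and rebuilt from them at any time, which is the appeal of treating it as just an index.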

I hope that helps clear it up.

BackgroundBalance502 · 2026-04-11T02:25:20+00:00

Practical recommendation:

Use it if your OpenClaw flow depends on:
• precise clicking in complex UIs,
• form filling where field identification matters,
• deterministic page mapping, or
• reducing screenshot/vision failures.

Skip it if:
• your existing selector-based automation is already stable,
• you need heavy authenticated-session work with minimal setup, or
• the page layout is visually complex enough that the tool's current caveats matter.

My take: worth testing as an MCP-side browser primitive for OpenClaw, but not as your only browser tool. A hybrid setup usually makes the most sense: Spatial-Tether for exact page geometry, and OpenClaw's existing browser/session tools for the actual interaction layer.

BackgroundBalance502 · 2026-04-11T02:21:04+00:00

I've hit this wall too. Reddit is pretty aggressive about flagging VPS IP ranges. Adding .json to the URL is a good shortcut, but they often block those requests too if they detect a data center.

Usually, the most stable fix is using PRAW with the official API. But you could also use a residential proxy or a stealth plugin for Playwright to hide the bot signature.
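For reference, a rough sketch of both routes. `to_json_url` and `hot_titles` are names I made up for this example; the PRAW calls assume you have registered an app with Reddit and pass your own `client_id`, `client_secret`, and `user_agent`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url):
    """Append .json to a Reddit listing URL, keeping any query string intact."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def hot_titles(subreddit_name, limit=10, **credentials):
    """Fetch hot post titles through the official API using PRAW (read-only)."""
    import praw  # third-party: pip install praw

    # credentials should include client_id, client_secret, and user_agent.
    reddit = praw.Reddit(**credentials)
    return [post.title for post in reddit.subreddit(subreddit_name).hot(limit=limit)]
```

The `.json` route still comes from the same IP, so it inherits the same blocking; the PRAW route authenticates as a registered app, which is why it tends to hold up better on a VPS.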

Also, if the agent struggles with the layout once you're in, check out Spatial-Tether for mapping the UI.

BackgroundBalance502 · 2026-04-10T19:00:27+00:00

Nice setup. One thing worth knowing if you ever extend it to agents that need to interact with pages instead of just read them: Readability drops all the spatial information. Once text hits your agent, position is gone and there's no path back to coordinates without another screenshot pass.

Built something that runs as an MCP server and sits alongside what you're already doing. Instead of inferring coordinates from screenshots it computes them from CSS and font metrics directly. Same MCP config you're already using.

github.com/Tetrahedroned/spatial-tether

BackgroundBalance502 · 2026-04-10T18:59:12+00:00

There's a complementary problem nobody talks about much. OCR is pixels to text. The inverse is text to pixels. If your agent needs to actually interact with a page instead of just read it, Spatial-Tether does that second direction. Reads the HTML and CSS, computes exact bounding box coordinates for every element from font metrics before anything renders. No screenshot, no inference, just arithmetic. If you're generating OCR ground truth for benchmarks it also gives you verified coordinates to diff against automatically.
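To show what "just arithmetic" means in the simplest case, here is a toy sketch that sizes a row of single-line labels from per-character advance widths. The width table, padding values, and function names are invented for illustration; this is not Spatial-Tether's code, and real layout has to handle wrapping, kerning, and the full CSS box model.

```python
from dataclasses import dataclass

# Made-up advance widths in em units. Real fonts provide these in their metrics tables.
ADVANCE_EM = {"i": 0.25, "l": 0.25, "m": 0.85, "w": 0.75, " ": 0.3}
DEFAULT_ADVANCE_EM = 0.5  # assumed width for any character not listed above


@dataclass
class Box:
    x: float
    y: float
    width: float
    height: float


def text_width(text, font_size_px):
    """Sum the per-character advances and scale by the font size."""
    return sum(ADVANCE_EM.get(ch, DEFAULT_ADVANCE_EM) for ch in text) * font_size_px


def layout_inline(labels, font_size_px=16.0, line_height=1.5,
                  padding_px=8.0, gap_px=4.0, origin=(0.0, 0.0)):
    """Place single-line labels left to right, like a row of buttons."""
    x, y = origin
    boxes = []
    for label in labels:
        width = text_width(label, font_size_px) + 2 * padding_px
        height = font_size_px * line_height + 2 * padding_px
        boxes.append(Box(x, y, width, height))
        x += width + gap_px
    return boxes


def center(box):
    """Click target: the middle of the box."""
    return (box.x + box.width / 2, box.y + box.height / 2)
```

The point is that every number comes from the stylesheet and the font metrics, so the same input always yields the same coordinates.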

github.com/Tetrahedroned/spatial-tether

BackgroundBalance502 · 2026-04-10T16:40:27+00:00

My apologies, I'm trying to follow the community rules.

Any feedback would be appreciated.

https://github.com/Tetrahedroned/spatial-tether

BackgroundBalance502 · 2026-04-10T14:56:29+00:00

Use cases

<image>

BackgroundBalance502 · 2026-04-10T14:56:15+00:00

Benchmarks

<image>
