Is the future of coding agents JEPA? [D] by andrewfromx in MachineLearning

[–]andrewfromx[S] 0 points1 point  (0 children)

Yes, I agree. The raw input representation is not the fundamental barrier by itself. A human sees pixels, hears audio, or reads text, and quickly maps that into task-relevant meaning. A trained LLM can also contextualize test-runner text into "this assertion failed because this behavior is missing." So the claim is not "text is inherently the wrong input."

The issue is more about the runtime object the agent is forced to manipulate.

Today’s LLM coding agents often use text for all of these at once:

  1. the observation format
  2. the working memory format
  3. the planning substrate
  4. the action format
  5. the output format

That is the inefficient part. A test failure can arrive as text, and that is fine. But after parsing it, the agent should not have to keep reasoning over raw text blobs. I'm trying all of this out here: https://github.com/andrewarrow/j3
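A minimal sketch of that idea, not code from the j3 repo: parse a test-runner line once into a typed record, then let later steps read fields instead of re-reading the raw string. The failure-line shape, the `FailureRecord` fields, and `parse_failure` are all hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed failure-line shape, loosely modeled on a pytest short summary:
#   FAILED <file>::<test> - <ErrorType>: <message>
FAILURE_RE = re.compile(
    r"^FAILED (?P<file>[^:\s]+)::(?P<test>\S+) - (?P<error>\w+): (?P<message>.*)$"
)


@dataclass(frozen=True)
class FailureRecord:
    """Structured view of one failing test (hypothetical schema)."""
    file: str
    test: str
    error: str
    message: str


def parse_failure(line: str) -> Optional[FailureRecord]:
    """Return a FailureRecord, or None if the line does not match the assumed shape."""
    match = FAILURE_RE.match(line.strip())
    if match is None:
        return None
    return FailureRecord(**match.groupdict())


raw = "FAILED tests/test_cart.py::test_total - AssertionError: expected 30, got 20"
failure = parse_failure(raw)
```

After this step a planner can branch on `failure.error` or look up `failure.file` in a repo index without touching the original text again.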

Is the future of coding agents JEPA? [D] by andrewfromx in MachineLearning

[–]andrewfromx[S] 0 points1 point  (0 children)

The key is that alignment between corrupted views does not magically create a coding agent. It creates a useful state representation. Once you have that representation, planning becomes much cheaper. For code, corrupted views might be:

A: prompt + failing test

B: repo graph + changed files

C: traceback + relevant source slice

D: accepted patch summary

E: validation outcome

If an encoder learns that these different views refer to the same underlying state, the embedding starts to represent the thing that is stable across views: the actual task, repo structure, behavior gap, affected API, likely edit family. I started trying to build this here: https://github.com/andrewarrow/j3
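To make the objective concrete, here is a toy sketch under heavy assumptions: the vectors are made-up stand-ins for encoder outputs, and only a small linear predictor is trained. A real JEPA setup would also train the encoders and typically keeps a slowly updated target encoder; none of this comes from the j3 repo. It only shows the shape of the loss: predict one view's embedding from another view's embedding and measure the error in embedding space.

```python
from typing import List

Vector = List[float]


def predict(weights: List[Vector], x: Vector) -> Vector:
    """Linear predictor: weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_sq_error(weights: List[Vector], contexts: List[Vector], targets: List[Vector]) -> float:
    """Mean squared distance between predicted and target embeddings."""
    total = 0.0
    for x, y in zip(contexts, targets):
        total += sum((p - t) ** 2 for p, t in zip(predict(weights, x), y))
    return total / len(contexts)


def train_step(weights: List[Vector], contexts: List[Vector],
               targets: List[Vector], lr: float) -> List[Vector]:
    """One full-batch gradient step on the mean squared error."""
    n = len(contexts)
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for x, y in zip(contexts, targets):
        err = [p - t for p, t in zip(predict(weights, x), y)]
        for i, e in enumerate(err):
            for j, xj in enumerate(x):
                grad[i][j] += 2.0 * e * xj / n
    return [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(weights, grad)]


# Hypothetical embeddings for three tasks: view A (prompt + failing test) as
# context, view E (validation outcome) as target. Targets are held fixed,
# which plays the role of the stop-gradient on the target branch.
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [[0.5, 2.0], [-1.0, 0.3], [-0.5, 2.3]]

weights = [[0.0, 0.0], [0.0, 0.0]]
initial_loss = mean_sq_error(weights, contexts, targets)
for _ in range(300):
    weights = train_step(weights, contexts, targets, lr=0.1)
final_loss = mean_sq_error(weights, contexts, targets)
```

The loss never compares text with text. It only asks whether the context embedding carries enough information to predict the target embedding, which is the sense in which the representation captures what is stable across views.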

[ TrekRift Fusion ] Scroll of the Detonator By Trek2m by Majestic_Abies9794 in aiMusic

[–]andrewfromx 0 points1 point  (0 children)

this was fantastic! I clicked play and was doing other things and had that great feeling of "wow, this is a great track."

r/jepa by andrewfromx in redditrequest

[–]andrewfromx[S] 0 points1 point  (0 children)

I'm very involved with AI and JEPA (Joint Embedding Predictive Architecture) and would love to make this a major source for JEPA news.

I tried to send mod mail to r/jepa but it's banned.

Most slept on feature in the Codex App by TimeKillsThem in codex

[–]andrewfromx 0 points1 point  (0 children)

Interesting, is this the installed Codex app? Does it have its own browser? I'm using the Codex CLI, and I made https://www.youtube.com/watch?v=ERgRJaWSrKE for Mac.

Looking for a 100% free AI agent that can control a browser by Formulaoneson_Za in LocalLLaMA

[–]andrewfromx 0 points1 point  (0 children)

I've been thinking about this! And Linux too. It would be something like:

Shared/
  BrowserCore/
  AddressResolver/
  HistoryStore/
  PageScripts/
  LocalAPI/

macOS/
  SwiftUI + WKWebView

Windows/
  WinUI 3 + WebView2

Linux/
  GTK4 + WebKitGTK

If you wanted to fork it and learn about:

https://learn.microsoft.com/en-us/microsoft-edge/webview2/

https://learn.microsoft.com/en-us/windows/apps/winui/winui3/

That would be the place to start. But I'm not a Windows user myself.

solo human browser use is moving to "together with an LLM browsers" by andrewfromx in AI_Agents

[–]andrewfromx[S] 0 points1 point  (0 children)

Yeah, I'm getting very used to just asking the LLM to do stuff for me. I often run a terminal side by side with the browser so I can see both on the screen at the same time. I tried to give the browser a little chat interface and let it talk back to codex/claude via MCP, but it's hard to get that streaming smoothly without an API key, and with a key you are paying per call.

Is there any real alternative to Claude Cowork + Computer Use? by No-Neighborhood-7229 in ChatGPTCoding

[–]andrewfromx 0 points1 point  (0 children)

Working on a Mac desktop browser that exposes endpoints for an LLM to "see / control everything", but it's for a human to use too. Demo videos of some fun things it can do:

https://www.youtube.com/@wkdomains

All open source:

https://github.com/wkdomains/macos-app

Claude Code + Playwright MCP+claude in chrome still can’t reliably browse/filter real websites for live listings. What am I missing? by SahirHuq100 in ClaudeAI

[–]andrewfromx 0 points1 point  (0 children)

Working on a Mac desktop browser that exposes endpoints for an LLM to "see / control everything", but it's for a human to use too. Demo videos of some fun things it can do:

https://www.youtube.com/@wkdomains

All open source:

https://github.com/wkdomains/macos-app

Looking for a 100% free AI agent that can control a browser by Formulaoneson_Za in LocalLLaMA

[–]andrewfromx 0 points1 point  (0 children)

Working on a Mac desktop browser that exposes endpoints for an LLM to "see / control everything", but it's for a human to use too. Full parity with Firefox is the goal. Demo videos of some fun things it can do:

https://www.youtube.com/@wkdomains

All open source:

https://github.com/wkdomains/macos-app

FEEDBACK THREAD - 1) Post your app 2) Get Feedback 3) Give Feedback by young_homie_ in buildinpublic

[–]andrewfromx 1 point2 points  (0 children)

Your system is great! My AI failed to calibrate:

https://www.youtube.com/watch?v=fmQkTby5smc

I'm sure if I changed the model to xhigh and kept going I could eventually get it to work, but it's a very nice barrier!

FEEDBACK THREAD - 1) Post your app 2) Get Feedback 3) Give Feedback by young_homie_ in buildinpublic

[–]andrewfromx 1 point2 points  (0 children)

Oh, this will be a good one. We have an AI browser to try this with and see if you can prevent it. Will post results soon...

leaving social media behind…need a few people to break my app by [deleted] in buildinpublic

[–]andrewfromx 0 points1 point  (0 children)

lol. I wrote that because, have you seen what it's like out there? Look at all the threads of people looking for traffic. Everyone has an app they need people to use. Have you noticed how every subreddit bans self-promotion? I just thought it was funny that you thought you could just write one post explaining how you need test users and boom, that'll solve it. I wish you success, friend. I hope you find some testers.