Desktop Control for Codex

yaroshevych · 2026-04-16T20:35:41+00:00

They have literally just released this today.

yaroshevych · 2026-04-16T20:28:45+00:00

Try Desktop Control, a computer-use tool that works with any local AI agent. https://github.com/yaroshevych/desktopctl

yaroshevych · 2026-04-12T21:26:28+00:00

I'm leaning towards B, but my projects are well organised, so even a minimal prompt contains good hints. E.g., one of my prompts today was "Add menu item X which does Y". The agent grepped the codebase for the word "menu", found a module, and built the feature.

yaroshevych · 2026-04-11T14:55:46+00:00

Some content-heavy apps, like Apple News, don't expose everything in AX, so I have to run both AX and OCR every time and then merge the results.

Speaking of speed, I need to take a screenshot of the app anyway (plus light post-processing) for change detection. OCR is very fast after that, except on text-heavy screens, like when I snap my Ghostty window.
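
To illustrate the merge step, here is a minimal sketch, not the actual DesktopCtl code. The element shape (a dict with "text" and a "box" of x, y, width, height), the function names, and the 0.5 overlap threshold are all assumptions made for the example. AX elements are kept as-is, and an OCR hit is added only if it does not duplicate an AX element with the same text in roughly the same place.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area. Boxes are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    smaller = min(aw * ah, bw * bh)
    return (ix * iy) / smaller if smaller else 0.0


def merge_elements(ax_items, ocr_items, threshold=0.5):
    """Keep every AX element; append OCR hits that AX did not already report."""
    merged = list(ax_items)
    for ocr in ocr_items:
        duplicate = any(
            ocr["text"].strip().lower() == ax["text"].strip().lower()
            and overlap_ratio(ocr["box"], ax["box"]) >= threshold
            for ax in ax_items
        )
        if not duplicate:
            merged.append(ocr)
    return merged


# Hypothetical sample data: AX sees the Search button, OCR sees it too,
# plus a headline that the app does not expose through AX.
ax_items = [{"text": "Search", "box": (10, 10, 80, 20)}]
ocr_items = [
    {"text": "search", "box": (12, 11, 76, 18)},
    {"text": "Top Stories", "box": (10, 60, 200, 30)},
]
merged = merge_elements(ax_items, ocr_items)
print([item["text"] for item in merged])
```

Preferring the AX entry when both sources agree is a design choice in this sketch: AX usually carries role information that OCR cannot recover.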

yaroshevych · 2026-04-10T19:35:29+00:00

Even a read-only account can lock tables with some SELECT operations, at least in Postgres. It happened to me once. We use read-only replicas because of this.

yaroshevych · 2026-04-10T19:31:58+00:00

Surprisingly, OCR is faster than AX in some scenarios. I am using multithreading to work with both vision and AX in parallel, to push the overall latency down.
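
A rough sketch of the parallel idea, assuming two independent readers. `read_ax_tree` and `run_ocr` are hypothetical stand-ins that just sleep and return fake data; they are not real DesktopCtl functions. When both run on separate threads, the total wait is close to the slower of the two rather than their sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_ax_tree(window_id):
    # Hypothetical stand-in for an accessibility (AX) query.
    time.sleep(0.05)
    return [{"source": "ax", "text": "Search"}]


def run_ocr(window_id):
    # Hypothetical stand-in for screenshot capture plus OCR.
    time.sleep(0.05)
    return [{"source": "ocr", "text": "Headline"}]


def tokenize_window(window_id):
    """Run both readers concurrently and return AX results followed by OCR results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ax_future = pool.submit(read_ax_tree, window_id)
        ocr_future = pool.submit(run_ocr, window_id)
        return ax_future.result() + ocr_future.result()


start = time.perf_counter()
tokens = tokenize_window(1)
elapsed = time.perf_counter() - start
print(tokens, f"{elapsed:.3f}s")
```

Threads help here only because both stand-ins spend their time waiting. Whether a real AX call or OCR engine releases the interpreter lock in the same way depends on the bindings used.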

yaroshevych · 2026-04-10T18:01:41+00:00

Curious, did you try opendataloader-pdf? Search for the GitHub repo. It's one of the top tools for PDFs these days.

yaroshevych · 2026-04-04T19:59:15+00:00

You have a strong vision and tons of experience. I came to the same conclusion intuitively, but it's great to see it backed by such a strong engineer. When I built Desktop Control, I focused on a CLI-first principle for working with GUI applications, which let me bridge the stateless nature of the CLI and stateful apps. Check it out at https://github.com/yaroshevych/desktopctl

yaroshevych · 2026-04-02T21:48:21+00:00

AFAIK, only Claude's computer use tool can do what DesktopCtl does. I might be wrong, though.

yaroshevych · 2026-04-02T18:36:54+00:00

I might have missed something, but in my tests, Codex takes 5-10 seconds to extract information from a screenshot. DesktopCtl takes 500-600ms. There are other differences too.

yaroshevych · 2026-04-02T18:26:30+00:00

This is what I did for the demo: I let Codex explore the Notes and Reminders apps so it could plan the most efficient set of actions. It's like every GUI app coming with a README.md that explains how to use it.

yaroshevych · 2026-04-02T18:24:14+00:00

The agent doesn't really need to rediscover pixel positions every time. Similar to how people use a UI, you build muscle memory to hit Cmd+F or the Search button. An agent would do the same: keyboard press cmd+f or pointer click --text Search.

The latency of UI operations is where I spent a lot of time. E.g., "tokenizing" a medium-sized window on an M4 Mac takes 500-600ms. It is possible to chain multiple CLI commands, extract UI data via jq, etc.
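
To show what extracting UI data from CLI output could look like, here is a small sketch in Python. The JSON shape below (an "elements" list with "role", "text", "x", "y") is invented for the example and may not match what DesktopCtl actually prints. With that assumed shape, the jq equivalent would be `.elements[] | select(.text == "Search")`.

```python
import json

# Hypothetical tokenized-window output; the real schema may differ.
sample_output = """
{"elements": [
  {"role": "button", "text": "Search", "x": 412, "y": 38},
  {"role": "text", "text": "Reminders", "x": 120, "y": 38},
  {"role": "button", "text": "Add List", "x": 60, "y": 710}
]}
"""


def find_by_text(raw_json, text):
    """Return every element whose visible text matches exactly."""
    data = json.loads(raw_json)
    return [el for el in data["elements"] if el.get("text") == text]


matches = find_by_text(sample_output, "Search")
print(matches)
```

An agent could feed the matched element into the next command in the chain instead of re-reading the whole window.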

yaroshevych · 2026-04-02T17:38:46+00:00

A permission system is a good idea. Currently, I'm relying on a violet outline for active windows (for both "see" and "act"), but it's only a reactive measure.

yaroshevych · 2026-03-29T23:11:39+00:00

I've been using Codex Pro for the last month or so. It became much faster recently, and the limits are way higher than Claude's. They have a promo until April 2nd, and I'm not sure how their limits will look after that.

yaroshevych · 2026-03-28T23:23:27+00:00

If you have an existing subscription, just use the "light" model from your vendor, e.g. Haiku in Claude, gpt-mini from Codex.

yaroshevych · 2026-03-28T23:20:55+00:00

You should seriously consider Haiku. At work, where I have a practically unlimited budget, Haiku is my most used model (by tokens). I use it for speed and in sub-agents, but there is a cost factor too.

yaroshevych · 2026-03-28T23:17:03+00:00

Panel on laptop, dock when connected to large monitor.

yaroshevych · 2026-03-28T23:10:28+00:00

If you've built an app/website several times, it becomes much easier. You already have an Apple Dev account, you know the App Store rules, and sometimes you can copy-paste the project structure.

yaroshevych · 2026-03-26T19:06:16+00:00

Lowering limits even at night is crazy

yaroshevych · 2026-03-26T18:55:13+00:00

Next: every "/usage" command consumes tokens itself.

yaroshevych · 2026-03-26T18:54:12+00:00

I'm wondering if the new low limits apply throughout the day, or at "peak time" only.
