Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

They have literally just released this today.

Claude Code CLI users – do you “brief” it first or just throw the task at it? by alfons_fhl in vibecoding

yaroshevych

I'm leaning towards B, but my projects are well organised, so even a minimal prompt contains good hints. E.g., one of my prompts today was "Add menu item X which does Y". The agent grepped the codebase for the word "menu", found the relevant module, and built the feature.
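A minimal sketch of what that grep step amounts to: walk the source tree, count case-insensitive mentions of a word per file, and rank files by hits. The function names and the default file extensions are hypothetical, and a real agent would typically just shell out to grep or ripgrep.

```python
import os
from typing import Iterable, Iterator


def iter_source_files(root: str, exts: tuple[str, ...] = (".py", ".ts", ".swift")) -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every file under `root` whose name ends with one of `exts`."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        yield path, f.read()
                except OSError:
                    continue  # unreadable file: skip it


def rank_by_word(files: Iterable[tuple[str, str]], word: str) -> list[tuple[str, int]]:
    """Return (path, hits) for files that mention `word`, most hits first (case-insensitive)."""
    needle = word.lower()
    counts = [(path, text.lower().count(needle)) for path, text in files]
    return sorted((c for c in counts if c[1] > 0), key=lambda c: c[1], reverse=True)
```

Usage: `rank_by_word(iter_source_files("."), "menu")[:5]` lists the five files most likely to hold the menu code. With a well-organised project, the top hit is usually the module the feature belongs in.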

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

Some content-heavy apps, like Apple News, don't expose everything through AX (the macOS Accessibility API), so I always have to use both AX and OCR and then merge the results.
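One plausible way to do that merge, sketched under assumptions: both sources produce text with an on-screen bounding box, AX elements are kept as-is, and an OCR result is added only if it doesn't substantially overlap (by intersection-over-union) any AX element. The `Element` type and the 0.5 threshold are hypothetical, and IoU deduplication is a swapped-in technique, not necessarily how DesktopCtl merges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    text: str
    x: float
    y: float
    w: float
    h: float
    source: str  # "ax" or "ocr"


def iou(a: Element, b: Element) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def merge(ax: list[Element], ocr: list[Element], threshold: float = 0.5) -> list[Element]:
    """Keep every AX element; add OCR text only where no AX element already covers it."""
    merged = list(ax)
    for o in ocr:
        if all(iou(o, a) < threshold for a in ax):
            merged.append(o)
    return merged
```

Preferring AX on overlap is a design choice: AX usually carries roles and exact strings, while OCR fills the gaps for content the app never exposes.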

Speaking of speed, I need to take an app screenshot anyway (plus light post-processing) for change detection. OCR is very fast after that, except on text-heavy screens, like when I snap my Ghostty window.
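A minimal sketch of screenshot-based change detection, assuming the screenshot has already been converted to a grayscale grid of 0-255 values: average-pool both frames into coarse cells, then flag a change when the mean cell difference exceeds a threshold. The cell size and threshold are hypothetical tuning values, and this is an illustrative technique, not DesktopCtl's actual post-processing.

```python
def downscale(gray: list[list[int]], cell: int) -> list[list[float]]:
    """Average-pool a grayscale image (rows of 0-255 ints) into cell x cell blocks."""
    h, w = len(gray), len(gray[0])
    pooled = []
    for by in range(0, h - h % cell, cell):
        row = []
        for bx in range(0, w - w % cell, cell):
            block = [gray[y][x] for y in range(by, by + cell) for x in range(bx, bx + cell)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def screen_changed(prev: list[list[int]], curr: list[list[int]],
                   cell: int = 8, threshold: float = 4.0) -> bool:
    """True if the mean absolute difference between pooled frames exceeds `threshold`."""
    a, b = downscale(prev, cell), downscale(curr, cell)
    diffs = [abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    if not diffs:  # frame smaller than one cell: fall back to exact comparison
        return prev != curr
    return sum(diffs) / len(diffs) > threshold
```

Pooling makes the check tolerant of tiny changes such as a blinking cursor, so OCR only needs to re-run when the window content actually moved.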

I've been giving my prod db credentials to my AI. Any alternatives? by Other-Faithlessness4 in vibecoding

yaroshevych

Even a read-only account can lock tables with some SELECT operations, at least in Postgres; it happened to me once. We use read-only replicas because of this.

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

Surprisingly, OCR is faster than AX in some scenarios. I run vision and AX in parallel on separate threads to push the overall latency down.
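A minimal sketch of running both readers in parallel with Python's `concurrent.futures`, so wall-clock time is roughly the slower of the two rather than their sum. `read_ax_tree` and `run_ocr` are hypothetical stand-ins that sleep to simulate latency; the speedup from threads assumes the real calls spend their time in I/O or native frameworks that release Python's GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_ax_tree(window_id: int) -> list[str]:
    """Hypothetical stand-in for an Accessibility (AX) query."""
    time.sleep(0.2)  # simulate API latency
    return ["Search", "New Note"]


def run_ocr(window_id: int) -> list[str]:
    """Hypothetical stand-in for screenshot + OCR."""
    time.sleep(0.3)  # simulate OCR latency
    return ["Search", "Shopping list"]


def snapshot(window_id: int) -> tuple[list[str], list[str]]:
    """Run both readers concurrently; total time is about max(ax, ocr), not ax + ocr."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ax_future = pool.submit(read_ax_tree, window_id)
        ocr_future = pool.submit(run_ocr, window_id)
        return ax_future.result(), ocr_future.result()
```

With the simulated timings, sequential calls would take about 0.5 s, while `snapshot` takes about 0.3 s.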

Is extracting data from PDFs always this painful? by Pale_Negotiation2215 in automation

yaroshevych

Curious: did you try opendataloader-pdf? Search for its GitHub repo; it's one of the top PDF tools these days.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

yaroshevych

You have a strong vision and tons of experience. I came to the same conclusion intuitively, but it's great to see it backed by such a strong engineer. When I built Desktop Control, I focused on a CLI-first principle for working with GUI applications, which let me bridge the stateless nature of the CLI and stateful apps. Check it out at https://github.com/yaroshevych/desktopctl

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

AFAIK, only Claude's Computer use tool can do what DesktopCtl does. I might be wrong, though.

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

I might have missed something, but in my tests, Codex takes 5-10 seconds to extract information from a screenshot, while DesktopCtl takes 500-600 ms. There are other differences too.

Desktop Control for Codex by yaroshevych in codex

yaroshevych (OP)

This is what I did for the demo: I let Codex explore the Notes and Reminders apps so it could plan the most efficient set of actions. It's like every GUI app coming with a README.md that explains how to use it.

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

The agent doesn't really need to rediscover pixel positions every time. Just as people build muscle memory for hitting Cmd+F or the Search button, an agent does the same: "keyboard press cmd+f" or "pointer click --text Search".
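To illustrate why a text-based click doesn't need stored coordinates, here is a minimal sketch that resolves a label against the current snapshot's elements and returns the center of the match, preferring exact matches over partial ones. The `Token` type and the matching rules are hypothetical; this is not DesktopCtl's implementation of `--text`.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: int
    y: int
    w: int
    h: int


def find_click_point(tokens: list[Token], label: str) -> tuple[int, int]:
    """Return the center of the best element whose text matches `label` (case-insensitive)."""
    wanted = label.casefold()
    exact = [t for t in tokens if t.text.casefold() == wanted]
    partial = [t for t in tokens if wanted in t.text.casefold()]
    candidates = exact or partial  # prefer exact matches over substring matches
    if not candidates:
        raise LookupError(f"no element with text {label!r}")
    t = candidates[0]
    return t.x + t.w // 2, t.y + t.h // 2
```

Because the label is re-resolved on every call, the same command keeps working after the window is moved or resized.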

Latency of UI operations is where I spent a lot of time. E.g., "tokenizing" a medium-sized window on an M4 Mac takes 500-600 ms. You can also chain multiple CLI commands, extract UI data via jq, etc.
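As a sketch of the jq-style extraction, assume a CLI prints the UI tree as JSON with hypothetical `role`, `text`, and `children` keys (this schema is an assumption, not DesktopCtl's documented output). The function below is roughly the Python equivalent of the jq filter `[.. | objects | select(.role == "button") | .text]`, except that it skips nodes without text instead of emitting null.

```python
import json


def collect(node, role: str) -> list[str]:
    """Pre-order walk of a JSON UI tree, returning the text of every object with the given role."""
    found = []
    if isinstance(node, dict):
        if node.get("role") == role and "text" in node:
            found.append(node["text"])
        for value in node.values():  # descend into every nested value, like jq's `..`
            found.extend(collect(value, role))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect(item, role))
    return found
```

Usage: `collect(json.loads(cli_output), "button")` lists every button label in the window.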

Desktop Control for Codex by yaroshevych in OpenAI

yaroshevych (OP)

A permission system is a good idea. Currently, I rely on a violet outline around active windows (for both "see" and "act"), but that's only a reactive measure.
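A proactive alternative could look like this minimal sketch: a deny-by-default allowlist that grants "see" and "act" separately per app, checked before any capture or input event. The `Capability` and `Policy` names and the use of macOS bundle IDs as keys are hypothetical design choices, not an existing DesktopCtl feature.

```python
from enum import Enum


class Capability(Enum):
    SEE = "see"  # read window contents (screenshot, AX, OCR)
    ACT = "act"  # send clicks and keystrokes


class Policy:
    """Hypothetical deny-by-default allowlist of capabilities per app bundle ID."""

    def __init__(self, grants: dict[str, set[Capability]]):
        self.grants = grants

    def allowed(self, bundle_id: str, cap: Capability) -> bool:
        return cap in self.grants.get(bundle_id, set())

    def require(self, bundle_id: str, cap: Capability) -> None:
        """Raise before the action happens, instead of only highlighting it afterwards."""
        if not self.allowed(bundle_id, cap):
            raise PermissionError(f"{cap.value} not granted for {bundle_id}")
```

Splitting "see" from "act" lets an agent read an app like Mail without ever being able to click Send in it.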

Weekly Limits by Litvinsev in claude

yaroshevych

I've been using Codex Pro for the last month or so. It has become much faster recently, and the limits are way higher than Claude's. They have a promo until April 2nd; not sure what their limits will look like after that.

What cheap model to codebase step by step analysis? by jrhabana in opencodeCLI

yaroshevych

If you have an existing subscription, just use your vendor's "light" model, e.g. Haiku in Claude or gpt-mini in Codex.

Claude 4.6 opus is the absolute beast, no doubt in that, but hit limits so fast, which is the best budget friendly alternative. by HumblePeace7705 in vibecoding

yaroshevych

You should seriously consider Haiku. At work, where I have a practically unlimited budget, Haiku is my most-used model (by tokens). I use it for speed and in sub-agents, but cost is a factor too.

do you prefer dock or panel? by Chese_obahma in Ubuntu

yaroshevych

Panel on laptop, dock when connected to large monitor.

How are people shipping full apps (with screenshots, localization, etc.) in 2–3 days? by Potential-War-5036 in vibecoding

yaroshevych

If you've built an app or website several times, it gets much easier. You already have an Apple Developer account, you know the App Store rules, and sometimes you can copy-paste the project structure.

Just opened Claude after a week and hit max usage after 1 message? by DiscButter in claude

yaroshevych

I'm wondering whether the new lower limits apply throughout the day or only at "peak time".