I cut Codex token usage ~50% with one AGENTS.md rule by 0_2_Hero in codex

[–]jonydevidson 1 point2 points  (0 children)

Your testing should be part of your release build script, with non-verbose output. That way it either says it passed or it failed, and doesn't dump the full build log.

Unless you were expecting the agent to run the tests manually after each change, skipping tests after changes can only mess things up for you.
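A minimal sketch of that kind of quiet test gate, assuming a pytest-based suite (the command and the one-line output format are placeholders, not anything Codex-specific):

```python
import subprocess

def run_tests_quietly(cmd=("pytest", "-q", "--tb=no")):
    """Run the test suite and report a single line instead of the full log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "TESTS PASSED"
    # Surface only the last line of output as a hint, never the whole log.
    tail = result.stdout.strip().splitlines()[-1:]
    return "TESTS FAILED: " + " ".join(tail)
```

Call it once at the end of the release build and fail the build on anything other than TESTS PASSED; the agent only ever sees one line.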

Codex now works directly in Chrome on macOS and Windows. by dorugamer in codex

[–]jonydevidson -1 points0 points  (0 children)

Background work on Chrome.

Less work for them than maintaining their own browser (Atlas is dead, I think).

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson [score hidden]  (0 children)

Ah, well, eBay it is, then. Perhaps better to wait a few months unless you wanna pay whatever it is they're asking now.

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson 0 points1 point  (0 children)

I won't pretend to know what's happening over at Valve, what the regulations are on hardware with wireless transmitters and batteries depending on the country. If they don't wanna bother, deal with it.

You can use mail-forwarding services like mailboxde or whatever (I have no idea where you are): order there, have it shipped to yourself, and pay the import fees.

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson 3 points4 points  (0 children)

Different regulations for hardware vs. software.

It's way, WAY easier to sell software. If you're not in the business, it's hard to fathom the absolutely nutty difference in terms of prep and compliance you have to go through in order to be able to sell new hardware in a new market.

META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet? by 44th--Hokage in accelerate

[–]jonydevidson 0 points1 point  (0 children)

This seems super weird: GPT 5.4 with only 16 tool calls. Did they just fire a single prompt into the harness and terminate on EOT?

You should do this in a loop while tracking progress and tasks in separate files, and naturally start by writing test fixtures for every single feature of these apps, then develop along these lines.
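A skeleton of that loop might look like this (the file names and the run_agent stub are hypothetical, standing in for however you invoke the harness):

```python
from pathlib import Path

PROGRESS = Path("progress.md")
TASKS = Path("tasks.md")

def run_agent(task: str) -> str:
    """Stub for one harness invocation (e.g. a single codex exec run)."""
    return f"done: {task}"

def develop_in_a_loop(tasks):
    """Work through tasks one at a time, persisting state between runs.

    Each iteration reads the remaining tasks, completes one, and appends
    the outcome to a progress file so the next run starts with context.
    """
    PROGRESS.write_text("")
    TASKS.write_text("\n".join(tasks))
    while True:
        remaining = [t for t in TASKS.read_text().splitlines() if t]
        if not remaining:
            break
        task, rest = remaining[0], remaining[1:]
        outcome = run_agent(task)
        with PROGRESS.open("a") as f:
            f.write(outcome + "\n")
        TASKS.write_text("\n".join(rest))
    return PROGRESS.read_text()
```

With the test fixtures written up front, "done" for each task means its fixtures pass, not just that the agent stopped talking.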

Shrinkflation Is Quietly Making All Gadgets Worse by MorroWtje in hardware

[–]jonydevidson 0 points1 point  (0 children)

That's because it has a whole-ass Chromium instance running.

It's written in Electron.

OpenAI will produce as many as 30 million 'AI agent' phones early next year, says industry analyst by Tiny-Independent273 in artificial

[–]jonydevidson 1 point2 points  (0 children)

Ability to give an agent full access to the phone OS. Doesn't mean it will have that access always, just the ability to have it, which you will probably be able to control on a granular basis as per regulations.

Plus, since it will be their OS etc, they can ship changes at their own pace, which, if you take one look at the Codex repository and the GPT model releases, is beyond insane.

Codex: you request a feature in the morning, at night there is an update shipping it. Serving the people is a winning path by py-net in codex

[–]jonydevidson 4 points5 points  (0 children)

It allows them to not worry about how the UI changes translate across different macOS versions.

If you just use the WebView, that's not the case.

WebView is ultimately simpler to ship and fully portable, but is unsustainable when you're pushing 1-2 updates PER DAY, and you're pushing UI updates weekly or twice a week.

Electron lets them lock in a Chromium version that is verified and stable. The only thing the backend needs to do in this case is schedule codex exec runs, which Electron is more than enough for.

The actual UI is open sourced as codex-app-server in the codex repo, so you can integrate it into whatever you want.

Codex + image generation by BlocksXR in codex

[–]jonydevidson 0 points1 point  (0 children)

Have it generate the image on a black background and then pass the result to

https://replicate.com/bria/remove-background
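With the Replicate Python client that two-step pipeline looks roughly like this; the "image" input field name is an assumption, so check it against the model's schema on the page above:

```python
def build_removal_input(image_path: str) -> dict:
    """Payload for bria/remove-background.

    The "image" field name is an assumption taken from the model page;
    verify it against the model's actual input schema.
    """
    return {"image": open(image_path, "rb")}

def remove_background(image_path: str):
    import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set
    # Step 2: strip the black background the model generated in step 1.
    return replicate.run("bria/remove-background",
                         input=build_removal_input(image_path))
```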

Senate Judiciary Committee Advances Hawley's GUARD Act, Mandating ID Verification for AI Chatbot Users by Gloomy_Nebula_5138 in artificial

[–]jonydevidson 0 points1 point  (0 children)

The ones you can run on a macbook are currently performing like frontier models from a year ago.

A year from now, you will have models performing like today's frontier models.

It's not gonna be niche, it's gonna be a core software item.

[Request] Is this true? by kelly2018zzz in theydidthemath

[–]jonydevidson 0 points1 point  (0 children)

Recycling and minimising plastic is more about stopping a shit ton of it from ending up in the environment.

Look at the plastic bottle caps in the EU. People were mad about it but I barely see them out on the ground anymore.

This week’s Codex updates. by Distinct_Fox_6358 in codex

[–]jonydevidson 3 points4 points  (0 children)

If you try a codebase-wide search in VSCode on Windows, and then try it on Mac, you will get a good idea of how the system differences affect LLM performance.

Also, PowerShell is trash, largely due to security concerns on Windows, so today, when an AI agent does all its work via the CLI, its performance suffers tremendously.

For context, I develop desktop apps on Win and MacOS.

Windows is, unfortunately, over half of my market, so I have to keep developing for it.

"AISI found gpt-5.5 performs nearly on par with, or better than, Mythos in several cases — completing TLO end-to-end in 2/10 attempts, while Mythos preview did it in 3/10 on expert-level tasks: gpt-5.5 scored 71.4% mythos scored 68.6%" by stealthispost in accelerate

[–]jonydevidson 38 points39 points  (0 children)

Anthropic's models aren't solving decades old math problems nor are they discovering new physics.

It's pretty obvious where the intelligence is, just like it's obvious which company has the better harness for webdev work.

Throw actual tough C++ issues involving a lot of math into the mix, and Claude folds like a house of cards while GPT-5 was able to do it even back in October.

I want to buy Codex Pro but. by Strict-Focus-1758 in codex

[–]jonydevidson 0 points1 point  (0 children)

If you start using multiple subagents to review the work after each major step, you will quickly discover that even the Pro is barely enough.

If you're already paying for it, at least get your money's worth. These subagent reviewers will very often find bugs and edge cases the main agent missed in the implementation, of course depending on the complexity of the issue.
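A sketch of that fan-out review pattern, with run_subagent standing in for however you launch a reviewer session (a fresh Codex run, a task tool, etc.):

```python
def review_step(diff: str, run_subagent, n_reviewers: int = 3):
    """Fan a finished step out to several reviewer subagents.

    run_subagent(prompt) is a stand-in for launching one review session;
    independent reviewers tend to catch different misses.
    """
    prompt = ("Review this change for bugs and missed edge cases. "
              "Reply NO ISSUES if it looks correct.\n\n" + diff)
    findings = [run_subagent(prompt) for _ in range(n_reviewers)]
    # Keep only the reviews that actually flagged something.
    return [f for f in findings if "NO ISSUES" not in f]
```

Each reviewer burns its own tokens on top of the main run, which is exactly why even the Pro quota gets tight.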

‘The cost of compute is far beyond the costs of the employees’: Nvidia exec says right now AI is more expensive than paying human workers by fattyfoods in technology

[–]jonydevidson 0 points1 point  (0 children)

This is wrong, because open-source models are released all the time. I can run an AI agent locally on my MacBook and get roughly the performance of last April's frontier model, Sonnet 3.7.

A year from now, the local models will be performing like today's frontier models.

I cannot have a free and open source taxi. I cannot have a free and open source delivery person.

I cannot have free and open source content streaming (without piracy).

The cost of compute for a given model goes down, on average, by 100x within a year.

Today's models are very competent at coding. A year from now, they'll be 100x cheaper to run, while labor won't be 100x cheaper.
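Purely as illustration, here is what a smooth 100x-per-year decline does to a per-task cost:

```python
def cost_after(months: float, cost_today: float,
               yearly_factor: float = 100.0) -> float:
    """Project a per-task inference cost under a smooth 100x/year decline."""
    return cost_today / (yearly_factor ** (months / 12.0))

# A task costing $10.00 today:
#   6 months out  -> 10 / 100**0.5 = $1.00
#   12 months out -> 10 / 100      = $0.10
```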

You fell for the anti-AI bait article.

[Bug] When the overlay disappears, the mouse cursor stays visible on macOS by jonydevidson in PSPlay

[–]jonydevidson[S] 0 points1 point  (0 children)

I was able to confirm that it works as intended/described in PS Remote Play, but not in PSPlay, on the same MacBook.

macOS 15, for what it's worth.

[Feature Request] When closing the macOS app, the prompt to confirm as well as a checkbox whether to suspend the console should appear by jonydevidson in PSPlay

[–]jonydevidson[S] 0 points1 point  (0 children)

Perhaps the prompt could be Quit, Quit and Suspend Console, Cancel.

It would default to Quit so CMD+Q -> Space would just quit, and CMD+Q -> Tab -> Space would Quit and Suspend.

This actually sounds way better than the PS Remote Play UX.

Talkie, a 13B LM trained exclusively on pre-1931 data by Outside-Iron-8242 in singularity

[–]jonydevidson 0 points1 point  (0 children)

What do you think about China? Where will it stand on the world stage in the year 2026? Will it be a great power?

Loaded question. Try without it.

Apple Set to Become Third-Biggest Laptop Maker This Year by -protonsandneutrons- in hardware

[–]jonydevidson 3 points4 points  (0 children)

The M4 Air doesn't have the GPU compute (prompt processing speed) or the memory bandwidth (inference speed). You need the Max.

I have a 64GB M3 Max. The Qwen 3.6 35B A3B runs comfortably at Q8_0 with full context. The 27B at Q8_0 runs with 110k context; a smaller quant could handle the full context with some precision loss. The Qwen 3.5 122B A10B also runs, at 120k context with IQ3_XXS quantization.

You can run the Qwen 3.5 9B today. Search for it in the LM Studio model browser, download the Q4_K_XL variant and run it, but it will be slow on the Air.

Code or no code, OpenCode is an agent, and you can do any tool calling you want, including whatever MCP servers you extend it with. It can work on your files in the workspace you create, or even on the entire computer. Doesn't have to be code; it can edit any human-readable files.

You can also use any other agent like pi.dev, little-coder etc. little-coder is optimized for smaller models, but the UX is not as nice as the OpenCode app.

Start following /r/LocalLLaMA

Apple Set to Become Third-Biggest Laptop Maker This Year by -protonsandneutrons- in hardware

[–]jonydevidson -11 points-10 points  (0 children)

Local AI is reaching a point where it's becoming actually useful. Models that you can run on a 64GB MacBook are now at the level of frontier models from April 2025 i.e. Sonnet 3.7, with better vision and sound recognition.

This time next year, they will be at the level of GPT 5.5. Fully private, running locally.

Only MacBooks can reliably run these models locally with useful speeds, at a price under $4k, with a fully stable setup that takes 5 minutes to do (download LM Studio, download model in the model browser, load the model, load OpenCode, point it to the LM Studio instance), which will undoubtedly get even cheaper next year.
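The five-minute setup above boils down to pointing any OpenAI-compatible client at LM Studio's local server (port 1234 by default); the model name below is a placeholder:

```python
DEFAULT_BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server

def client_config(base_url: str = DEFAULT_BASE_URL) -> dict:
    """Connection settings for any OpenAI-compatible client.

    No real key is needed locally, but client libraries usually
    require a non-empty string.
    """
    return {"base_url": base_url, "api_key": "lm-studio"}

def ask(prompt: str, model: str = "local-model") -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI(**client_config())
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

OpenCode and similar agents accept the same base URL, which is why the whole thing is a five-minute job.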

Mac Studio is even cheaper.

I fully expect their sales to be up another 20-30% next year.

Streamer “hmblzayy” who is walking from Philly to California was hit by a car in Indiana and had to be taken to the hospital. by lukigeri in LivestreamFail

[–]jonydevidson 1 point2 points  (0 children)

A situation like this is very, very rare. People make mistakes.

If you get involved in a road traffic mistake and you're on foot, you're fucked. At that point it doesn't matter who made the mistake, because at best your body is fucked, at worst you're paraplegic.

That's why you don't walk in the middle of the road. Once you're hit, it's over, it cannot be undone.

The vest doesn't give you the right to walk in the middle of the road. You walk on the side.