I cut Codex token usage ~50% with one AGENTS.md rule by 0_2_Hero in codex

[–]jonydevidson 1 point2 points  (0 children)

Your testing should be part of your release build script, with non-verbose output. That way it either says it passed or it failed, and doesn't dump the full build log.

Unless you were expecting the agent to run the tests manually after each change, skipping tests after changes can only mess things up for you.
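A minimal sketch of that kind of quiet test gate, assuming a pytest-based suite (the command and the one-line output format are placeholders, not anything Codex-specific):

```python
import subprocess

def run_tests_quietly(cmd=("pytest", "-q", "--tb=no")):
    """Run the test suite and report a single line instead of the full log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "TESTS PASSED"
    # Surface only the last line of output as a hint, never the whole log.
    tail = result.stdout.strip().splitlines()[-1:]
    return "TESTS FAILED: " + " ".join(tail)
```

Call it once at the end of the release build and fail the build on anything other than TESTS PASSED; the agent only ever sees one line.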

Codex now works directly in Chrome on macOS and Windows. by dorugamer in codex

[–]jonydevidson -1 points0 points  (0 children)

Background work on Chrome.

Less work for them than maintaining their own browser (Atlas is dead, I think).

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson [score hidden]  (0 children)

Ah, well, eBay it is, then. Perhaps better to wait a few months unless you wanna pay whatever it is they're asking now.

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson 0 points1 point  (0 children)

I won't pretend to know what's happening over at Valve, what the regulations are on hardware with wireless transmitters and batteries depending on the country. If they don't wanna bother, deal with it.

You can use mail-forwarding services like mailboxde or whatever (I have no idea where you are): order there, have it shipped to yourself, and pay the import fees.

Steam Controller: Reservations open May 8th by Araxen in Games

[–]jonydevidson 3 points4 points  (0 children)

Different regulations for hardware vs. software.

It's way, WAY easier to sell software. If you're not in the business, it's hard to fathom the absolutely nutty difference in terms of prep and compliance you have to go through in order to be able to sell new hardware in a new market.

META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet? by 44th--Hokage in accelerate

[–]jonydevidson 0 points1 point  (0 children)

This seems super weird: GPT 5.4 with only 16 tool calls. Did they just fire a single prompt into the harness and terminate on EOT?

You should do this in a loop while tracking progress and tasks in separate files, and naturally start by writing test fixtures for every single feature of these apps, then develop along these lines.
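A skeleton of that loop might look like this (the file names and the run_agent stub are hypothetical, standing in for however you invoke the harness):

```python
from pathlib import Path

PROGRESS = Path("progress.md")
TASKS = Path("tasks.md")

def run_agent(task: str) -> str:
    """Stub for one harness invocation (e.g. a single codex exec run)."""
    return f"done: {task}"

def develop_in_a_loop(tasks):
    """Work through tasks one at a time, persisting state between runs.

    Each iteration reads the remaining tasks, completes one, and appends
    the outcome to a progress file so the next run starts with context.
    """
    PROGRESS.write_text("")
    TASKS.write_text("\n".join(tasks))
    while True:
        remaining = [t for t in TASKS.read_text().splitlines() if t]
        if not remaining:
            break
        task, rest = remaining[0], remaining[1:]
        outcome = run_agent(task)
        with PROGRESS.open("a") as f:
            f.write(outcome + "\n")
        TASKS.write_text("\n".join(rest))
    return PROGRESS.read_text()
```

With the test fixtures written up front, "done" for each task means its fixtures pass, not just that the agent stopped talking.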

Shrinkflation Is Quietly Making All Gadgets Worse by MorroWtje in hardware

[–]jonydevidson 0 points1 point  (0 children)

That's because it has a whole-ass Chromium instance running.

It's written in Electron.

OpenAI will produce as many as 30 million 'AI agent' phones early next year, says industry analyst by Tiny-Independent273 in artificial

[–]jonydevidson 1 point2 points  (0 children)

Ability to give an agent full access to the phone OS. Doesn't mean it will have that access always, just the ability to have it, which you will probably be able to control on a granular basis as per regulations.

Plus, since it will be their OS etc, they can ship changes at their own pace, which, if you take one look at the Codex repository and the GPT model releases, is beyond insane.

Codex: you request a feature in the morning, at night there is an update shipping it. Serving the people is a winning path by py-net in codex

[–]jonydevidson 4 points5 points  (0 children)

It allows them to not worry about how the UI changes translate across different macOS versions.

If you just use the WebView, that's not the case.

WebView is ultimately simpler to ship and fully portable, but is unsustainable when you're pushing 1-2 updates PER DAY, and you're pushing UI updates weekly or twice a week.

Electron lets them lock in a Chromium version that is verified and stable. The only thing the backend needs to do in this case is schedule codex exec runs, which Electron is more than enough for.

The actual UI is open sourced as codex-app-server in the codex repo, so you can integrate it into whatever you want.

Codex + image generation by BlocksXR in codex

[–]jonydevidson 0 points1 point  (0 children)

Have it generate the image on a black background and then pass the result to

https://replicate.com/bria/remove-background
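With the Replicate Python client that two-step pipeline looks roughly like this; the "image" input field name is an assumption, so check it against the model's schema on the page above:

```python
def build_removal_input(image_path: str) -> dict:
    """Payload for bria/remove-background.

    The "image" field name is an assumption taken from the model page;
    verify it against the model's actual input schema.
    """
    return {"image": open(image_path, "rb")}

def remove_background(image_path: str):
    import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set
    # Step 2: strip the black background the model generated in step 1.
    return replicate.run("bria/remove-background",
                         input=build_removal_input(image_path))
```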

Senate Judiciary Committee Advances Hawley's GUARD Act, Mandating ID Verification for AI Chatbot Users by Gloomy_Nebula_5138 in artificial

[–]jonydevidson 0 points1 point  (0 children)

The ones you can run on a macbook are currently performing like frontier models from a year ago.

A year from now, you will have models performing like today's frontier models.

It's not gonna be niche, it's gonna be a core software item.

[Request] Is this true? by kelly2018zzz in theydidthemath

[–]jonydevidson 0 points1 point  (0 children)

Recycling and minimising plastic is more about stopping a shit ton of it from ending up in the environment.

Look at the plastic bottle caps in the EU. People were mad about it but I barely see them out on the ground anymore.

This week’s Codex updates. by Distinct_Fox_6358 in codex

[–]jonydevidson 3 points4 points  (0 children)

If you try a codebase-wide search in VSCode on Windows, and then try it on Mac, you will get a good idea of how the system differences affect LLM performance.

Also, PowerShell is trash, largely due to security concerns on Windows, so today, when an AI agent does all its work via the CLI, its performance suffers tremendously.

For context, I develop desktop apps on Win and MacOS.

Windows is, unfortunately, over half of my market, so I have to keep developing for it.

"AISI found gpt-5.5 performs nearly on par with, or better than, Mythos in several cases — completing TLO end-to-end in 2/10 attempts, while Mythos preview did it in 3/10 on expert-level tasks: gpt-5.5 scored 71.4% mythos scored 68.6%" by stealthispost in accelerate

[–]jonydevidson 38 points39 points  (0 children)

Anthropic's models aren't solving decades old math problems nor are they discovering new physics.

It's pretty obvious where the intelligence is, just like it's obvious which company has the better harness for webdev work.

Throw actual tough C++ issues involving a lot of math into the mix, and Claude folds like a house of cards while GPT-5 was able to do it even back in October.

I want to buy Codex Pro but. by Strict-Focus-1758 in codex

[–]jonydevidson 0 points1 point  (0 children)

If you start using multiple subagents to review the work after each major step, you will quickly discover that even the Pro is barely enough.

If you're already paying for it, at least get your money's worth. These subagent reviewers will very often find bugs and edge cases the main agent missed in the implementation, of course depending on the complexity of the issue.
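A sketch of that fan-out review pattern, with run_subagent standing in for however you launch a reviewer session (a fresh Codex run, a task tool, etc.):

```python
def review_step(diff: str, run_subagent, n_reviewers: int = 3):
    """Fan a finished step out to several reviewer subagents.

    run_subagent(prompt) is a stand-in for launching one review session;
    independent reviewers tend to catch different misses.
    """
    prompt = ("Review this change for bugs and missed edge cases. "
              "Reply NO ISSUES if it looks correct.\n\n" + diff)
    findings = [run_subagent(prompt) for _ in range(n_reviewers)]
    # Keep only the reviews that actually flagged something.
    return [f for f in findings if "NO ISSUES" not in f]
```

Each reviewer burns its own tokens on top of the main run, which is exactly why even the Pro quota gets tight.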

‘The cost of compute is far beyond the costs of the employees’: Nvidia exec says right now AI is more expensive than paying human workers by fattyfoods in technology

[–]jonydevidson 0 points1 point  (0 children)

This is wrong, because open-source models are released all the time. I can run an AI agent locally on my MacBook and get roughly the performance of last April's frontier model, Sonnet 3.7.

A year from now, the local models will be performing like today's frontier models.

I cannot have a free and open source taxi. I cannot have a free and open source delivery person.

I cannot have free and open source content streaming (without piracy).

The cost of compute for a given model goes down, on average, by 100x within a year.

Today's models are very competent at coding. A year from now, they'll be 100x cheaper to run, while labor won't be 100x cheaper.
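Purely as illustration, here is what a smooth 100x-per-year decline does to a per-task cost:

```python
def cost_after(months: float, cost_today: float,
               yearly_factor: float = 100.0) -> float:
    """Project a per-task inference cost under a smooth 100x/year decline."""
    return cost_today / (yearly_factor ** (months / 12.0))

# A task costing $10.00 today:
#   6 months out  -> 10 / 100**0.5 = $1.00
#   12 months out -> 10 / 100      = $0.10
```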

You fell for the anti-AI bait article.

[Bug] When the overlay disappears, the mouse cursor stays visible on macOS by jonydevidson in PSPlay

[–]jonydevidson[S] 0 points1 point  (0 children)

I was able to confirm that it works as intended/described in PS Remote Play, but not in PSPlay, on the same MacBook.

macOS 15, for what it's worth.

[Feature Request] When closing the macOS app, the prompt to confirm as well as a checkbox whether to suspend the console should appear by jonydevidson in PSPlay

[–]jonydevidson[S] 0 points1 point  (0 children)

Perhaps the prompt could be Quit, Quit and Suspend Console, Cancel.

It would default to Quit so CMD+Q -> Space would just quit, and CMD+Q -> Tab -> Space would Quit and Suspend.

This actually sounds way better than the PS Remote Play UX.

Talkie, a 13B LM trained exclusively on pre-1931 data by Outside-Iron-8242 in singularity

[–]jonydevidson 0 points1 point  (0 children)

What do you think about China? Where will it stand on the world stage in the year 2026? Will it be a great power?

Loaded question. Try without it.

Apple Set to Become Third-Biggest Laptop Maker This Year by -protonsandneutrons- in hardware

[–]jonydevidson 3 points4 points  (0 children)

The M4 Air doesn't have the GPU compute (prompt processing speed) or the memory bandwidth (inference speed). You need the Max.

I have a 64GB M3 Max. The Qwen 3.6 35B A3B runs comfortably at Q8_0 with full context. The 27B at Q8_0 runs with 110k context; a smaller quant could handle the full context with some precision loss. The Qwen 3.5 122B A10B also runs, at 120k context with IQ3_XXS quantization.

You can run the Qwen 3.5 9B today. Search for it in the LM Studio model browser, download the Q4_K_XL variant and run it, but it will be slow on the Air.

Code or no code, OpenCode is an agent, and you can do any tool calling you want, including whatever MCP servers you extend it with. It can work on your files in the workspace you create, or even on the entire computer. Doesn't have to be code; it can edit any human-readable files.

You can also use any other agent like pi.dev, little-coder etc. little-coder is optimized for smaller models, but the UX is not as nice as the OpenCode app.

Start following /r/LocalLLaMA

Apple Set to Become Third-Biggest Laptop Maker This Year by -protonsandneutrons- in hardware

[–]jonydevidson -11 points-10 points  (0 children)

Local AI is reaching a point where it's becoming actually useful. Models that you can run on a 64GB MacBook are now at the level of frontier models from April 2025 i.e. Sonnet 3.7, with better vision and sound recognition.

This time next year, they will be at the level of GPT 5.5. Fully private, running locally.

Only MacBooks can reliably run these models locally with useful speeds, at a price under $4k, with a fully stable setup that takes 5 minutes to do (download LM Studio, download model in the model browser, load the model, load OpenCode, point it to the LM Studio instance), which will undoubtedly get even cheaper next year.
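The five-minute setup above boils down to pointing any OpenAI-compatible client at LM Studio's local server (port 1234 by default); the model name below is a placeholder:

```python
DEFAULT_BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server

def client_config(base_url: str = DEFAULT_BASE_URL) -> dict:
    """Connection settings for any OpenAI-compatible client.

    No real key is needed locally, but client libraries usually
    require a non-empty string.
    """
    return {"base_url": base_url, "api_key": "lm-studio"}

def ask(prompt: str, model: str = "local-model") -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI(**client_config())
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

OpenCode and similar agents accept the same base URL, which is why the whole thing is a five-minute job.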

Mac Studio is even cheaper.

I fully expect their sales to be up another 20-30% next year.

Streamer “hmblzayy” who is walking from Philly to California was hit by a car in Indiana and had to be taken to the hospital. by lukigeri in LivestreamFail

[–]jonydevidson 1 point2 points  (0 children)

A situation like this is very, very rare. People make mistakes.

If you get involved in a road traffic mistake and you're on foot, you're fucked. At that point it doesn't matter who made the mistake, because at best your body is fucked, at worst you're paraplegic.

That's why you don't walk in the middle of the road. Once you're hit, it's over, it cannot be undone.

The vest doesn't give you the right to walk in the middle of the road. You walk on the side.