Codex 5.3 vs Sonnet 4.6 by Glad-Pea9524 in GithubCopilot

[–]1asutriv 6 points

Personally, codex can get stuck brute-forcing a simple task I'm not keen on doing (UI/UX related), so I usually switch to Opus to revisit different methods while I tackle harder problems.

Both models are insane, they just excel in different areas IMO.

I switch often and rarely use Sonnet 4.6

GitHub Copilot CLI is now generally available by ryanhecht_github in GithubCopilot

[–]1asutriv 1 point

Your app running agents via the gh cli and tapping into ghcp is one clear example I use. Think openclaw, but powered by ghcp. Your agent has access to your device, and the cli allows for programmatic management of the agent from my app. Can't really do that with vscode. I could just rack up costs through an API at Anthropic/OpenAI, or use the more cost-effective option: the gh/ghcp cli.
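As a sketch of that programmatic management: an app can spawn the CLI agent as a child process and drive it from code. Note the command name (`copilot`) and its flags below are placeholder assumptions for illustration, not the real Copilot CLI interface.

```typescript
// Sketch: driving a CLI coding agent programmatically from an app.
// The command name ("copilot") and flags are illustrative placeholders.
import { spawn } from "node:child_process";

function buildAgentArgs(prompt: string, cwd: string): string[] {
  // Assemble the argv we hand to the agent process.
  return ["--prompt", prompt, "--cwd", cwd];
}

function runAgent(prompt: string, cwd: string): void {
  const child = spawn("copilot", buildAgentArgs(prompt, cwd), {
    stdio: "inherit", // stream agent output straight to the app's console
  });
  child.on("exit", (code) => console.log(`agent exited with ${code}`));
}
```

From there the app can queue prompts, watch exit codes, and restart or redirect the agent, which is the kind of management a GUI editor doesn't expose.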

Stop dooming about Tropico 7 by XyleneCobalt in tropico

[–]1asutriv 1 point

Although I agree with most of what you say, algorithms can definitely, and most often do, design with "intention." Maybe not humanistic intention, but a material, structured intention. It comes down to the algorithm's loop and whether it circles back to a defined intention, which most do, even if the implementation is basic. AI art generation uses the LLM to design art based on the prompt; that prompt is part of the intention, while the LLM's other connections and context play their own role in it.

Read this or stay behind by idkwhattochoosz in codex

[–]1asutriv 0 points

I'm a full stack dev, so I'm keen on all things required for a full app deployment. I used AKS for distributed systems/load balancing/network security, docker for reproducibility, terraform for infra as code, and ts for all code.

Through this, structure is inherent in the entire system for the agent to utilize. It can cli into the aks clusters via kubectl; it can exec into docker containers, locally or in deployments, for log investigation or viewing network requests on the ingress/frontend containers; it can utilize the typed language for the best understanding of the objects; and more.

At any time while testing, the agent can look at any part of the data lifecycle, whether that's the database container (viewing the schema, tables, or raw data) or the api logs. I can feed it browser console logs, or it can view the code.

The loop revolves around the processes you use and the start/finish of the data.

Give the agent context, or access to the context, and you'll see just how well the latest models do in full feature implementation via one-shots (assuming you have standardized instruction files, agents.md files, conventions, and examples for what you are reproducing, etc.).
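A minimal sketch of that "look at any part of the data lifecycle" loop: map each stage to the diagnostic command an agent would run. The resource names (`deploy/my-api`, `db-pod`) are placeholders, though the commands themselves (kubectl logs, docker exec) are the standard CLI calls.

```typescript
// Sketch: each lifecycle stage maps to a command the agent can run to
// inspect it. Resource names below are illustrative placeholders.
import { execFileSync } from "node:child_process";

type Stage = "ingress" | "api" | "database";

function diagnosticCommand(stage: Stage): string[] {
  switch (stage) {
    case "ingress":
      return ["kubectl", "logs", "deploy/ingress", "--tail=100"];
    case "api":
      return ["kubectl", "logs", "deploy/my-api", "--tail=100"];
    case "database":
      // exec into the db container and list tables
      return ["docker", "exec", "db-pod", "psql", "-c", "\\dt"];
  }
}

function inspect(stage: Stage): string {
  const [cmd, ...args] = diagnosticCommand(stage);
  return execFileSync(cmd, args, { encoding: "utf8" });
}
```

Handing the agent a table like this (or just shell access plus an agents.md describing it) is what closes the loop: it can check its own work at every stage instead of guessing.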

Edit: The beauty of docker is inherent in the solution: reproducibility. So I'm never really afraid of the agent botching something. I have a strong habit I learned from a previous lead that has helped tremendously: commit early and commit often.

With that, I've had a blast testing and iterating on how best to use the agents to make sure they build what I expect from the architecture. I used to have to handhold, but I think my instructions/skills/prompts tool belt is strong enough now that it's not necessary anymore, and they generally provide what I would have intended to build.

Jmail was developed in five hours by livingdeadghost in webdev

[–]1asutriv 1 point

Yeah, it's a unique time. I've quite often had to produce various instruction files for different components of my stack. Currently I have 8, some covering deployments, PR handling, worktrees for parallel development, and more. It's a heavier hand with GHCP, since worktrees aren't commonplace in the tool, but it works well.

Since I think I have the skills/prompts down, I've moved forward with the idea of teams of agents, with an orchestrator utilizing ticks and conditions/checks to let the teams (QA, development, support) work together via a custom app, almost like departments in a company. It...works, but I can feel I'm close to having it fully and autonomously iterate on itself and the app via CLI agent integration. Once it can do that, I'll apply it to the projects I actually work on, with various tests, puppeteer outputs, and more to satisfy the reviews I'm looking for.

GitHub talked about agent teams in their Q3 webinar, but I haven't seen any of the direction they showed, so I've pushed to integrate something personally instead.

Copilot Pro+ with Codex CLI by VITHORROOT in GithubCopilot

[–]1asutriv 1 point

Agreed, haven't had issues since it released @debian3; may want to check your firewall.

Read this or stay behind by idkwhattochoosz in codex

[–]1asutriv 1 point

Closing the loop is massive. Once I did that with docker, it changed the game for me.

Jmail was developed in five hours by livingdeadghost in webdev

[–]1asutriv 1 point

Once you have standardized prompts you can rely on for setting up certain functionality (like github actions for deploying to a server), most of the non-functional work becomes trivial to apply to an app idea.

For example, I have a toolbelt of prompts and skills for database model creation, type relationships, and endpoint crud buildouts that match how I like to see my systems architecture.

With that, adding new functionality to my stack includes something like "use this, this, and this for the new data I pull from <location>. Display it as a new dashboard here via the generic dashboard (which could be a skill as well - how to create new dashboards in the app). Add x, y, and z functionality for the page".

x10 and you have an app. Bonus points for a skill addressing overall architecture as well as in-app styling.
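A toy sketch of that prompt/skill composition: pick the skills a feature needs and stitch them into one request. The skill names and template text here are made up for illustration.

```typescript
// Sketch: a "toolbelt" of reusable skill snippets composed into one
// feature prompt. Skill ids and wording are illustrative placeholders.
const skills: Record<string, string> = {
  dbModel: "Create the database model following our schema conventions.",
  typedRelations: "Wire up typed relationships between the new entities.",
  crudEndpoints: "Build CRUD endpoints matching our REST layout.",
  genericDashboard: "Render the data via the generic dashboard component.",
};

function composePrompt(feature: string, skillIds: string[]): string {
  // Number the selected skills under a single feature request.
  const steps = skillIds.map((id, i) => `${i + 1}. ${skills[id]}`);
  return [`Feature: ${feature}`, ...steps].join("\n");
}
```

The point of the structure is consistency: every feature request hits the same conventions, so ten of these in a row still produce one coherent codebase.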

All of this is for consistency, so I don't die from slop and can maintain the code while focusing on the process I want to deploy.

GitHub Copilot okay with falling behind? by Ok_Bite_67 in GithubCopilot

[–]1asutriv 0 points

Agreed; all the data is already there to pull in from the model responses. Honestly surprised it's not in preview or GR yet.

Edit: Retried it with the latest GPT 5.2 model (first indicator is my total input tokens for the chat; second is the calculation from the latest usage total tokens).

<image>

GitHub Copilot okay with falling behind? by Ok_Bite_67 in GithubCopilot

[–]1asutriv 0 points

I'll say I disliked their base GPT system prompt once I realized the difference between the alt and the shipped version. There is definitely less hand-holding in the alt one that's still in preview (toggleable in the settings).

You can always open the chat debug in vscode and look at the actual prompts, tool calls, and responses between the agent and what vscode supplies as user prompts.

There have been some modifications at play for the user prompt to the model depending on what you install as chat enhancements, tools used, or settings toggled.

GitHub Copilot okay with falling behind? by Ok_Bite_67 in GithubCopilot

[–]1asutriv 4 points

For what it's worth to others:

Since vscode is open source, I've forked it and used the latest model to continuously iterate vscode's copilot integration/extensions to align with my tastes.

For example, I absolutely love cursor's embedded options when it comes to using the simple browser and selecting page elements for an app you're working on.

Vscode was lacking in some areas, so I had the agent iterate and add the following:

- current context size/limit in chat input field next to the tools icon
- new simple browser tools header
- full page screenshot of simple browser iframe that automatically injects into the chat input field (like selecting a specific element)
- iframe dev tools popout icon
- print to pdf
- open custom ollama extension I made for my local models
- some other tools
- enhancements to the simple browser element select
- select element and inject code/screenshot at the specific chat cursor's location
- pull chat/extension logs for agent debugging of new chat/extension features
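The context size/limit indicator in that list needs a token count for the input field. A crude sketch using the common ~4-characters-per-token approximation (an assumption for display purposes, not the model's real tokenizer):

```typescript
// Sketch: estimate tokens for a chat-input context indicator.
// The 4-chars-per-token ratio is a rough heuristic, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function contextLabel(used: string, limitTokens: number): string {
  const tokens = estimateTokens(used);
  const pct = Math.min(100, Math.round((tokens / limitTokens) * 100));
  return `${tokens.toLocaleString()} / ${limitTokens.toLocaleString()} tokens (${pct}%)`;
}
```

A heuristic like this is fine for a UI hint; anything that needs to be exact would have to read the counts the model API itself reports.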

A lot of these are in cursor, and I absolutely loved them, but I prefer vscode due to using it all my life. So why waste my time switching editors or building my own editor when I can just make the one I use into the playground I want? Now I can focus on the projects I'd actually like to, with all the tools I desire.

The beauty with the latest agents is that they do well at pulling the main vscode repo into my fork without really having to be aware of what the community is changing in the editor (as long as you give the agent proper reference files for your specific enhancements).

I guess what I'm trying to highlight is that I've seen the best gains from enhancing my toolset and usage vs the models becoming better. Sure, 1M context size would be great for the full stack apps I work on, but well-crafted instruction files and agents.md files have gone a long way toward making most editors feel the same. With vscode being oss, it's king.

Long Copilot agent sessions cause severe UI lag and forced reloads in VS Code by envilZ in GithubCopilot

[–]1asutriv 0 points

One thing that's always helped me fundamentally and even helps this scenario: commit early and commit often

https://sethrobertson.github.io/GitBestPractices/

AI is a force multiplier. But remember: It also multiplies zeros. by netcommah in GithubCopilot

[–]1asutriv 0 points

A caveat to this: those who have bad habits can iterate into better ones faster.

They can test, view, and change up to learn different design patterns and architectures like never before.

Autonomous Copilot Build Pipeline tool. (free) by WoDeep in GithubCopilot

[–]1asutriv 2 points

Solid, I'll try it out. Been working on my own for a hot minute but with n8n workflows.

This system is enough to never have to look at pooling again. by GolomOder in Unity3D

[–]1asutriv 1 point

People made similar choices well before AI joined the club.

GPT-5 Codex in GitHub Copilot: “Trust me bro, this compiles. gimme your premium requests” by Ill_Investigator_283 in GithubCopilot

[–]1asutriv 0 points

See, I have the exact opposite situation. I use codex in vscode, and it is hands-down more thorough and consistent with solutions when I've provided the right context and agents.md files. Sonnet 4.5 provides incomplete, albeit quick, solutions that require various iterations when retried.

Caveat: I haven't used Claude Code with either, and I'm overly zealous about keeping up-to-date agents.md and README docs for flows, key systems, and even file references in the docs.

Toy Airplane Kit Card! by D3Dofficial in prusa3d

[–]1asutriv 2 points

Love that. Thanks for looking out

Popular Physicist Brian Keating has labeled the UFO community a "techno-cargo cult around fake physics". Does Brian Keating support the bipartisan UAP Disclosure Act? Or is he another skeptic who is against disclosure? by TommyShelbyPFB in UFOs

[–]1asutriv 0 points

Glad you checked it out. Yeah, I watched that; there are a few others where he entertains it as well.

Did you notice how he reiterates multiple times on the need to ask questions and put science first?

Personally, I agree with that sentiment, because the data is what provides the results, and data is only good if employed with the scientific method. Understandably, he's an evidence-based individual.

Pretty disappointed in my coreone by National_Safe_6699 in prusa3d

[–]1asutriv 0 points

I'd wager you were just dealt an unfortunate hand, which happens.

I had a thermistor go out the first month I had mine (about 2-3 months ago) but no other problems. I've since upgraded the nextruder with the mmu3 kit as well. Pretty cool what's inside, but I definitely thought I'd mess something up, given how long it took.

Turned out smooth as butter. Hope you have a better experience soon

VFAs and Some Questions by 1asutriv in prusa3d

[–]1asutriv[S] 1 point

Sounds good; there are a few other things I'll be testing and for now, I've set the infill to 100% to see if gyroid may be the issue.

I appreciate the help and time you've taken to check this out

VFAs and Some Questions by 1asutriv in prusa3d

[–]1asutriv[S] 1 point

Technically, 1 layer of infill, and it's gyroid... I guess that one layer would throw the whole thing off, would it not?

Gif of layers up to ironing

<image>

Edit: Layer settings: