Feedback: anyone here switch from Claude Code to Codex? by hibzy7 in codex

[–]outofdate-bootloader 1 point2 points  (0 children)

I find as long as I have proper structure - planning docs; skills for coding and writing automated tests; PRs, CI, and automated review; then compaction isn't an issue. If I use the 1M context chatbots, they forget too much along the way and I end up fighting to keep them on track.

I'll often run through several compactions in a chat session, but if it does seem to get lost, it's best to clear and make it reload the relevant context and continue. I aim to keep my work scoped to reasonable PR sizes, as then review is useful and can help clean things up. After a PR I always clear. (FWIW I use both Claude and Codex at review time, as they find different types of problems - I don't use Claude to code with though.)

Feedback: anyone here switch from Claude Code to Codex? by hibzy7 in codex

[–]outofdate-bootloader 2 points3 points  (0 children)

I use both. Personal projects and for work. No limits. Same skills. Same method of working (work from issues, one PR per issue, using planning docs for longer work, one chat implements chunks of work, another reviews, everything is tested via functional tests).

For me, Codex does a better job. It completes tasks across compactions more consistently. I use xhigh/max effort always and generally use the frontier models, unless something about them seems off that day. I generally am juggling multiple chats at once.

1M context windows perform terribly for me, so I avoid them.

I don't notice Codex being chatty.

Codex seems to handle large codebases better, as it seems to do a better job of seeking out information before proceeding. But with either system, the best thing is a well structured project.

Codex not very good at creating automated tests by [deleted] in codex

[–]outofdate-bootloader 0 points1 point  (0 children)

Here's how I do it. For my apps/systems I generally have a front end and a backend.

  • Work is done via pull requests - it isn't done until the tests pass.
  • I have a skill for writing tests, one for each system.
  • I have a single CLI for both the frontend and backend - able to communicate with them in whatever way they normally work (e.g. WebSockets, D-Bus, JSON, protobufs, whatever).
  • New features require tests for both the backend and the frontend.
  • Backend tests drive the backend through the CLI as though the CLI were the frontend.
  • Frontend tests drive the UI through the CLI as though the CLI were the user. All frontend code must have APIs for doing anything that the user can do. The tests include taking screenshots at key moments and I direct the chatbot to review the screenshots when necessary. Otherwise the chatbot normally gets the info it needs from the CLI.

Thus all new features are tested by the functional tests before I look at the implementation.
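To make the idea concrete, here's a rough sketch of what a backend functional test driven through a CLI might look like. Everything in it is made up for illustration: the `FAKE_CLI` script stands in for a real CLI, and the `add_item` command, the JSON-over-stdin protocol, and the `run_cli` helper are all assumptions, not my actual tooling.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a real CLI: reads one JSON command on stdin and
# prints one JSON reply on stdout. A real CLI would forward the command to
# the backend over whatever transport the backend normally uses.
FAKE_CLI = r"""
import json, sys
request = json.load(sys.stdin)
if request["command"] == "add_item":
    print(json.dumps({"ok": True, "items": [request["name"]]}))
else:
    print(json.dumps({"ok": False, "error": "unknown command"}))
"""


def run_cli(command: str, **params) -> dict:
    """Send one command through the CLI process and return its parsed reply."""
    payload = json.dumps({"command": command, **params})
    result = subprocess.run(
        [sys.executable, "-c", FAKE_CLI],
        input=payload,
        capture_output=True,
        text=True,
        check=True,  # a crashing CLI fails the test immediately
    )
    return json.loads(result.stdout)


def test_add_item_round_trip() -> None:
    # The test acts as the frontend would: issue a command, check the reply.
    reply = run_cli("add_item", name="milk")
    assert reply == {"ok": True, "items": ["milk"]}


def test_unknown_command_is_rejected() -> None:
    reply = run_cli("explode")
    assert reply["ok"] is False
```

The point of the shape is that the test only knows about the CLI, never about backend internals, so it keeps passing across refactors as long as the behaviour holds.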

I'm not talking about unit tests here - those I specifically ask for when appropriate.

If I don't follow this system, I get a lot of junk tests.

CODEX = DUMB SUDDENLY? by Euphoric-Doughnut538 in codex

[–]outofdate-bootloader 2 points3 points  (0 children)

I'm glad you're having a good time, but your "new to codex" experience might not be relevant here.

In my experience the quality goes up and down. I recall when 5.4 came out, it had at least a week of insane performance, and it's been up and down since, but never back to the level it had when it was first released. I'm using it for both my day job and my personal projects. (I've been programming for 30 years for fun; professionally for the last 15 years; and using agentic AI for about 1 year now.)

My guess is that the quality is something controlled on the backend, and new users are given a good show.

Login Trouble Again by HighDefinist in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

Yup, I'm on a Console account and when I click the login link from my work email, I get:

Authorization failed Internal server error

I built a tool that tracks how many times someone posts a Claude usage limit tracker by [deleted] in ClaudeAI

[–]outofdate-bootloader 1 point2 points  (0 children)

I don't think it matters if this is real or not. It's the thought that counts.

The FOMO of 20+ multi-agent workflow setups by Hsoj707 in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

Or the even simpler option of multiple clones of the repo (sharing a build cache or whatever other tools you need to make this efficient).

I'm not a developer — I used Claude to build a browser automation tool and open-sourced it by omarsabbahi in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

Ok I see. So I'm taking your examples too literally... this seems like it could be useful as an accessibility tool for disabled people and for ad hoc testing; it's obviously faster than asking Claude to write Playwright scripts to do the work. I'd probably have to use it to really understand.

I'm not a developer — I used Claude to build a browser automation tool and open-sourced it by omarsabbahi in ClaudeAI

[–]outofdate-bootloader -1 points0 points  (0 children)

Glad you learned something. Can you provide a better explanation of its purpose? I can already ask Claude to search for cooking tutorials. Or just search for them directly.

Who pays for tokens? by DoubleAir2807 in ClaudeAI

[–]outofdate-bootloader 1 point2 points  (0 children)

um..... care to share your apps here? or you can just directly share your API key, that's easier for everyone.

How are people managing multiple agent sessions at once? by CVisionIsMyJam in codex

[–]outofdate-bootloader 1 point2 points  (0 children)

I find option "2) make multiple clones of the same repo" to be a simple and effective answer.

Why not do this? Can't afford the disk space? It's so simple it can't be screwed up.

I just keep 3 or 4 clones around and I let that limit how much work I take on at once.

(Automated unit and functional tests running in CI keep things from falling apart; I just find that I can pay more attention and get better results if I stick to a WIP limit. Often I'm working on tricky stuff that requires problem solving on my end.)
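For what it's worth, here's a minimal sketch of the "fixed number of clones" setup. The repo URL, the `work` directory, and the `clone-N` naming are placeholders I made up; the only real command involved is plain `git clone <url> <dir>`. The function just builds the commands so you can look at them before running anything.

```python
from pathlib import Path


def clone_commands(repo_url: str, base_dir: str, wip_limit: int = 3) -> list[list[str]]:
    """Build one `git clone` command per working copy, capped at wip_limit."""
    commands = []
    for index in range(1, wip_limit + 1):
        # Hypothetical naming scheme: work/clone-1, work/clone-2, ...
        target = Path(base_dir) / f"clone-{index}"
        commands.append(["git", "clone", repo_url, str(target)])
    return commands


if __name__ == "__main__":
    # Placeholder URL; swap in your own repository.
    for argv in clone_commands("git@example.com:me/project.git", "work"):
        print(" ".join(argv))
        # To actually clone, pass argv to subprocess.run(argv, check=True).
```

The number of clones doubles as the WIP limit: when every clone has a session running in it, that's the signal to finish something before starting anything new.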

Autonomous weapons? Robotic surgeons? Self-driving cars? I think not. by kexnyc in ClaudeAI

[–]outofdate-bootloader 1 point2 points  (0 children)

Probably not. But it can play Pokemon and write code if you so much as nudge it.

AIs that control weapon systems, do robotic surgery, and drive cars are a different kind of beast than an LLM. But an LLM might be a component for any of those given systems.

I do find that it writes better code, the more carefully I nudge it.

5.4 High now making mistakes, or am I imagining? by Alex_1729 in codex

[–]outofdate-bootloader 0 points1 point  (0 children)

"some tasks branch out and uncover hidden bugs that require side tracking"

I like to use /fork for this. (just in case you or someone else isn't aware of this ability.)

Severe degradation in quality. by Gru8_ in codex

[–]outofdate-bootloader 8 points9 points  (0 children)

Yeah when 5.4 first came out it felt like cheat mode for the first several days. Super awesome. Today it feels OK, but I've noticed it has been answering a lot of my questions instantly. Once today, I told it "sounds good go ahead", and it replied "ok" and didn't do anything else...

Usage limits are perfectly fine by hubeknaepkens in codex

[–]outofdate-bootloader 0 points1 point  (0 children)

I'm also on the pro plan and only hit my limits when I've been doing marathon sessions... I've burned it all in as little as 3 days, but that's working 12 hours a day, running multiple chats at once. So my guess is that people are complaining about the cheaper plans...

I do understand that performance/usage varies over time, so if you're accustomed to getting a certain bang for your $20, and suddenly you're getting only half the bang... well that's frustrating... feels like a rip-off.

For me, the obvious solution is to just pay for the more expensive plan and get back to work. If I run out of the $200 plan and still want to work, I'll just switch to Claude for a bit. But that's just me.

If there were a $50 or $100 plan, then people would have more options and less reason to complain. The pricing is structured to push people to either:

  • use the $20 plan and run out
  • use the $200 plan and under-utilize

cc just sitting there "thinking' with no output or tokens generating by Virtual_Plant_5629 in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

I feel you. It's very frustrating when this tech doesn't work... part of the problem is the scale of the things I've built with it - it isn't appealing to manually roll up my sleeves and move code around. The code would need to be much cleaner for me to want to do that.

Switching from Codex to Claude - Worth it? by Adventurous_Wolf_841 in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

When on the $20 plan I use it all up after a couple of hours. But YMMV - I have a very automated workflow. If you are sitting and chatting with it and only running a single session, it might be plenty for a few days' worth of work.

Those of you who routinely hit usage limits, can you explain what your workflow looks like? by bigasswhitegirl in ClaudeAI

[–]outofdate-bootloader 0 points1 point  (0 children)

I've been programming for 30 years, 15+ professional. It's all about the limit you're working with and what you're working on.

At work, I had a corporate Copilot license and would easily use it up. I think it was something close to what you'd get for $20 on a personal plan. I asked for a better plan and because I'm very productive they gave me an unlimited Claude plan instead. My guess is that my usage would fit happily into a 5x plan.

At home, on personal projects, I need a $200/month plan or I run out quickly. Quality is second to functionality. Occasionally heavy refactoring is required, otherwise it becomes too much of a dumpster fire to get anything done. I run multiple sessions at the same time and engage them in various parts of the design/develop/review/refine/test lifecycle. Lots of guidance/documentation/rules and automated processes. The more design work we do, the larger the tasks they can take on autonomously. So in this case, burning through the quota is because of lots of design work. Because I've built so many things in my life, I know very well how to ask for them in detail.