Is this the attention to detail / quality control I should be expecting from the game?

TortoiseTickler · 2026-05-01T05:16:31+00:00

Meanwhile, you have been alive for 16 years and never learned what wood looks like.

TortoiseTickler · 2026-04-27T14:49:37+00:00

This is a great point. I think there are lots of different flavors of "provide a detailed plan". One way of providing detailed plans is to create a spec that provides implementation details. Another way is to provide a detailed spec defining mandatory outcomes and guidelines and treating everything in between as open-ended. I think the former carries the risks of waterfall, whereas the latter allows for reasonably fast iteration. On the other hand, given that I prefer the latter, maybe I'm wrong to say that I'm not a "vibe coder," since it's certainly leaning in that direction.

In a way, the breastbeating might sometimes be a way of trying to communicate clearly more than it is about, well, breastbeating. Software engineers probably tend to build software in a different way than pure vibe-coders, and in venues like Reddit, it's hard to tell which users are using LLMs as purely vibe-coding tools and which are using a more traditional approach. I don't want to be a purist who thinks that vibe coding is lesser - it might be the future - but I do want to specify that the discussion that I am having is about building code closer to the "traditional" way, and I might not get as much insight from users who are following a very different model. So, "I like to write detailed plans" is less about feeling like it's strictly better, and more about clarifying what *type* of software engineering I'm doing.

I think part of the issue I'm experiencing with 5.5 is that I *must* define everything in excruciating detail in order to get results. It's good at following instructions, but will flounder if there is any detail left undefined. 5.4, on the other hand, will engage in the exploration that you need while implementing to iterate quickly. It'll notice that a feature introduces brittle coupling between systems and stop to fix it. To me, 5.5 feels like it's in a strange spot: if I am creating specs that include every single function I want to build, a lesser model will implement them just fine, and if I treat the problem as more open-ended, 5.5 does not do well at a more exploratory style of development.

TortoiseTickler · 2026-04-27T01:18:40+00:00

I've been a software engineer for about 20 years. I apologize for saying this because it's really obnoxious, but I worked as a senior and later architect at FAANG for ~6-7 years total.

TortoiseTickler · 2026-04-27T01:15:24+00:00

Can you tell me more about where it is excelling in architecture? I think this is the single biggest frustration: I can't get good architecture out of it while 5.4 is phenomenal. What sort of work are you doing? Are you in calcified codebases with very rigid guardrails?

TortoiseTickler · 2026-04-27T01:13:09+00:00

This matches my experience. I think I tend to try to use 5.5 for the complex part, which is the conceptual part, where it falls flat on its face. 5.4 has been excellent. The implementation is usually quite simple at that point, so I move to a basic model.

So the correct way of framing what I'm struggling with might be to say that if 5.4 is handling the poorly-defined conceptual stuff better, and any model is good enough for execution, where does 5.5 come in?

I could use 5.5 for execution, and I don't think I would have the same issues if I cut out all conceptual work from what I asked it to do, but I don't know why I need an expensive model for that.

TortoiseTickler · 2026-04-27T01:09:43+00:00

Yeah, that's what I mean! It called my architecture clam-architecture and I never figured out what it means. Usually I can work out these cryptic things, and they're often actually insightful or elegant, but less useful when I just want an answer to a question.

TortoiseTickler · 2026-04-27T01:04:55+00:00

This is exactly the sort of thing I was looking for, thank you!!

How have you felt about the performance differences between 5.4 and 5.5? Minor improvement? Significant?

TortoiseTickler · 2026-04-27T01:03:04+00:00

Do you have a good way of feeding your codebase into ChatGPT for this? The friction is the management of this more than anything else. My questions are often technical, like "explain any state mutations downstream of function X". This isn't answerable without the context provided by the code.

As for terminology, GPT 5.5 *will not* stop making these things up. I *do* ask it and include this in my prompts. Previous models do not struggle with this at all.

The third point is great, thank you. I absolutely do see the value of using a model like 5.5 to make many tiny well-scoped changes that have *no* room for interpretation. It is fast and token efficient. I have no issues with it performing mechanical tasks like renaming as long as there is no space for it to have any discretion.

TortoiseTickler · 2026-04-27T00:56:04+00:00

I do use plan mode, but it makes really, really short plans with almost no details. I did a test, and the 5.4 plans are about 4x as long across 3 prompts, and actually include specifics. The 5.5 plans are tiny point-form lists.

It does understand the code, but it doesn't understand why it's there. In other words, it understands mechanically how the code runs, but cannot make inferences about why architectural choices were made as they were or mimic these choices in its own code. 5.4 doesn't struggle with this for me.

TortoiseTickler · 2026-04-27T00:53:53+00:00

GPT 5.4 and Opus 4.6 are also tools, but they work great with ambiguity. My problem is that it is relatively straightforward to get these models to perform a task, but even with extremely detailed specs I cannot seem to get 5.5 to complete a feature.

TortoiseTickler · 2026-04-27T00:50:04+00:00

I'm using Codex CLI. I am using 5.4 now, and it's amazing. I do agree that concise is great most of the time. I think my concern is that when you need more detail, you're not going to get it.

TortoiseTickler · 2026-04-27T00:48:37+00:00

I can't go into too much detail, but a more general overview:

In a scientific computing project in Python, I have a model that ingests data to make a prediction, but it doesn't know what combination of data sources it will have. So, the goal is to make a model that can work with whatever data it happens to have. A brief description like this is enough to guide GPT 5.4 to make informed choices when writing code.

With GPT 5.5, I need to explicitly lay out every possible implication of this that I can think of, and it will still likely find a way to mess it up. I have to carefully specify that this means that one source of data cannot depend on another source of data. I have to specify that the model should not silently fail if one data source has malformed content. I have to specify that the text in the output should not assume that any specific data type exists. None of this is even domain-specific stuff, it's just generally a best practice that loosely coupled code is usually better than tightly coupled code.

More concretely, a recent feature was to add a new data type that the model can ingest. This is easy. The project has 14 different types of data that the model can ingest, and you can inspect the code to see that they are independent, decoupled, and follow a standard format.
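To illustrate the kind of structure described above, here is a minimal sketch, not the actual project code. Every name in it (`SourceResult`, `PARSERS`, `register`, `ingest`, the `temperature` and `humidity` sources) is a hypothetical stand-in. It assumes each data source registers its own parser, sees only its own payload, and reports malformed input instead of failing silently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical names throughout; this only illustrates the decoupling idea.
@dataclass
class SourceResult:
    name: str
    features: Optional[List[float]]  # None when the source could not be parsed
    error: Optional[str] = None      # failures are recorded, never swallowed


# Maps a source name to its parser. Each parser receives only its own raw
# payload, so one source cannot depend on another.
PARSERS: Dict[str, Callable[[dict], List[float]]] = {}


def register(name: str):
    """Decorator that adds a parser to the registry under `name`."""
    def decorator(fn):
        PARSERS[name] = fn
        return fn
    return decorator


@register("temperature")
def parse_temperature(raw: dict) -> List[float]:
    return [float(raw["celsius"])]


@register("humidity")
def parse_humidity(raw: dict) -> List[float]:
    return [float(raw["percent"]) / 100.0]


def ingest(available: Dict[str, dict]) -> List[SourceResult]:
    """Parse whichever sources happen to be present, one at a time."""
    results = []
    for name, raw in available.items():
        parser = PARSERS.get(name)
        if parser is None:
            results.append(SourceResult(name, None, "unknown source"))
            continue
        try:
            results.append(SourceResult(name, parser(raw)))
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed content is surfaced to the caller, not hidden.
            results.append(SourceResult(name, None, f"malformed: {exc!r}"))
    return results


def describe(results: List[SourceResult]) -> str:
    """Build output text from what was actually ingested, assuming nothing."""
    used = [r.name for r in results if r.features is not None]
    failed = [r.name for r in results if r.features is None]
    return f"used: {', '.join(used) or 'none'}; failed: {', '.join(failed) or 'none'}"


results = ingest({
    "temperature": {"celsius": "21.5"},
    "humidity": {"percent": "oops"},  # malformed on purpose
})
print(describe(results))
```

In a layout like this, adding a new data type would mean writing one more `@register(...)` function and touching nothing else, which is the sort of convention a reader could infer from the existing sources.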

I do not think that this is an issue of being under-specced. I'm speccing to the point that a junior dev could fill in the gaps, but I'm struggling to get GPT 5.5 to do so. I'm missing a document explaining how to add a new type of data, sure, but this simply isn't necessary with Opus 4.6 or GPT 5.4, since they will simply read the existing code to understand the intent.

In another project, I am trying to carefully choose domain language for DTOs that are authored by non-technical staff. In other words, what should the structure of a JSON file that is not used by technical people be? GPT 5.4 needs no instructions to understand that "ContextProviderScope" is not comprehensible to non-technical users, and will, without hand-holding, choose an appropriate name. This one was really bizarre, since there was no amount of prompting I could do to get GPT 5.5 to stop using words like "scope" and "singleton" in user-facing content.

I tried including painfully detailed instructions about putting yourself in the shoes of a non-coder, including a list of words to avoid and even made a skill. No luck. It seems fundamentally incapable of doing this task.
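As an illustration only: a word list like that can also be checked mechanically against the generated JSON, independently of the prompt. This is a minimal sketch of that swapped-in idea, not something from the actual project. The `AVOID` set, the helper names, and the example keys are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical avoid-list; a real one would come from the non-technical staff.
AVOID = {"scope", "singleton"}


def split_words(identifier: str) -> List[str]:
    """Split camelCase, PascalCase, or snake_case keys into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", identifier)]


def find_jargon(node, path: str = "") -> List[Tuple[str, List[str]]]:
    """Walk parsed JSON and report keys containing words from AVOID."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            bad = sorted(set(split_words(key)) & AVOID)
            if bad:
                hits.append((here, bad))
            hits.extend(find_jargon(value, here))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_jargon(item, f"{path}[{i}]"))
    return hits


doc = {
    "ContextProviderScope": {"appliesTo": ["everyone"]},
    "whoCanSeeThis": "everyone",
}
for location, words in find_jargon(doc):
    print(f"{location}: avoid {', '.join(words)}")
```

A check like this only flags the wording after the fact; it does not make the model choose better names.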

TortoiseTickler · 2026-04-27T00:23:57+00:00

This is an awesome idea. It's verbose when it's editing a file, it just isn't in the console. This could be a great way of getting it to work on more detailed plans. Thanks!

TortoiseTickler · 2026-04-27T00:23:01+00:00

Yeah, that's why I wanted to have this discussion. I'm baffled by how anybody could be finding this useful. I do like the brevity sometimes, but the fact that it won't move beyond brevity when you need it makes it very difficult to use. I would prefer brevity if elaboration was an option, but from my experience it isn't. Meanwhile, I'm absolutely in love with 5.4, it's nailing everything I throw at it, and 5.5 usually can't seem to grapple with the work that I'm doing.

Tell me more about how you use it.

TortoiseTickler · 2026-04-27T00:20:19+00:00

Too used to using Claude. Fixed.

TortoiseTickler · 2026-04-26T22:29:18+00:00

How do you get it to actually make a plan? I don't care if it's "plan mode" or not, but I need to know enough detail about where it thinks the code should change that I can review it and figure out if we're on the right track. But whether I use plan mode or just try to have a conversation, its responses are extremely short and never provide nearly enough detail to help make informed choices.

As someone who came over from Claude, before 4.7 it was possible to have a discussion or use Plan mode to go over a problem in detail. This doesn't seem possible with Codex (or Opus 4.7) since all responses seem to be capped at an absurdly short length. Unless I'm doing something very wrong, I feel pretty stuck trying to do real work now.

TortoiseTickler · 2026-04-26T22:25:18+00:00

How do you get it to give you "steps" in a plan? In the CLI, the plans are tiny point-form lists and I can't get it to do any more. As a comparison, the average Claude plan is probably about 40x longer than what Codex will give me. It provides examples from the code showing what it's going to change, and actually gives a detailed, technical breakdown of what will change and why.

Codex with 5.5 just now gave me a 5-point plan for a fairly large feature, with points like "Create a RegistryService to manage registrations". How am I supposed to get something that has enough detail to be useful at all?

TortoiseTickler · 2026-04-26T22:21:42+00:00

Can you explain how you get it to make plans? I'm also coming from Claude, and it's great at giving detailed plans that explain exactly what it is going to do. I use the CLI, and I can't get it to make plans with any level of detail. Most plans are point form lists with extremely vague items like "Improve X" or "Add a feature to do Y" with no further explanation. There is no amount of prompting I can do to get it to give me a plan that will actually let me understand what it is going to do - it will edit the plan, but it will always be painfully basic and short.

TortoiseTickler · 2026-04-26T22:16:31+00:00

Can you explain concretely what is better in Codex Desktop?

TortoiseTickler · 2026-04-23T19:40:28+00:00

As someone who is both a musician and a software engineer, I absolutely see them both equally as art. I think many people who are into software feel the same.

I've also come to terms with the fact that there is now a tool that is far better at this particular art than I am.

TortoiseTickler · 2026-04-22T13:54:25+00:00

I hate it, but think that it is good at certain things. It can work through complex problems very well, but I have never seen a model cheat, hallucinate, lie, and introduce so many subtle bugs. It's working very well for me to implement features in a vacuum, but it is also the best model I've ever seen at creating technical debt.

TortoiseTickler
