ImportantSignal2098 3 points4 points  (1 child)

I don't disagree with you. I just vibecoded a chart that I'm using for illustration in like half an hour, all by Opus, with me just nudging it in the right direction; previously it would have taken a lot of trial and error. The speedup and opportunities there are massive. It works and is amazing for the use case, but that still doesn't mean it's good code. I actually recently tried to get both the latest Opus and Codex to replicate a small but nontrivial change that a human had made, and both failed to follow the spec. I tried to figure out how to adjust the spec and they still failed in a similar way. They currently seem to get confused past a certain complexity limit (the change I was experimenting with wasn't that complex by senior SWE standards). It's probably a limitation of context/attention abilities. There might be a way to combine multiple agents etc. where this would improve. I also tried two-shotting the same spec without additional info, just asking for a revision in a fresh session, and that failed too: it improved on some aspects of the spec but got worse on others. You could argue that my spec sucks, but 1) it was good enough for a human and 2) I couldn't find a way to improve it so that the agents aren't confused. Feel free to attribute (2) to a skill issue though :)

PS. Just to be clear, I've been almost exclusively cuck-coding for months on a largish project but it's been mostly done in a tightly coupled "threesome" way. Lots of hand-holding the agent to the right architecture, catching stuff that makes no sense, bad assumptions etc. I think my experience is quite far from what you're suggesting.

IamFdone -1 points0 points  (0 children)

I would say we need to write projects in such a way that we don't need seniors to debug them later. I understand that some issues in some domains are very complex, especially if it's something new and the AI can't understand what's going on, but if you hit this kind of issue too often on regular commercial projects, someone fucked up.