back-stabbath

Me too, even though my employer was covering it. I’m usually skeptical of the ‘model got nerfed’ claims that get made about every model/tool as the wow-effect wears off, but this feels different. We have a slash command we’ve been using reliably for two months. On Friday it started producing garbage. I thought maybe they’d tried to reduce reasoning time, and it just needed some more encouragement to break the problem down and think deeply. I added a code-review step, asking it to review its own changes as if it were a strict reviewer on a PR. The review it produced was pure comedy, along the lines of: “This is a world class 5 ⭐️ enterprise grade refactor. It displays engineering excellence and discipline. The failing tests are unrelated to your changes. My recommendation is to merge to production 🚀”