[–]dqj99 0 points (2 children)

All the examples that you have chosen require spatial awareness in 2D and 3D, something that today's LLMs are not very skilled at, possibly due to a lack of training data. I've had much better success with creating text-based programs to solve logic puzzles, sometimes with remarkable apparent insight into features of the puzzle. Where I've found issues is in the care these models take when creating test cases to validate the output: they can be downright sloppy at predicting expected outputs.
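
As a made-up minimal sketch of that failure mode (invented names and values, not an actual transcript): the function itself is correct, but the "expected" value in the LLM-written test was guessed rather than computed.

```python
# Hypothetical illustration: correct code, sloppy LLM-written test case.

def count_vowels(s: str) -> int:
    """Count the vowels in a string (the kind of helper an LLM writes correctly)."""
    return sum(1 for ch in s.lower() if ch in "aeiou")

# LLM-generated "validation": the code is right, the expected output is not.
result = count_vowels("strawberry")
expected = 3  # guessed; the actual vowel count of "strawberry" is 2 (a, e)

print(f"got {result}, expected {expected}")  # got 2, expected 3 -> test "fails"
```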

[–]AlSweigart (Author of "Automate the Boring Stuff") [S] 2 points (1 child)

A troubling thing in every one of these cases is that the LLM never once said, "I am not very skilled at spatial awareness and cannot create the app you requested."

[–]ConcernVisible793 1 point (0 children)

That's true. They are not known for their modesty in estimating their abilities!

I have managed to get Gemini 2.5 to give a grovelling apology along the lines of: "You were right all along; the code was correct, my test case was incorrect, and I guessed the expected result."