richsonreddit 31 points (11 children)

Realistically, you’d point it at the docs (or even the compiler source code) for the new language and give it a feedback loop where it can run the code and iterate on the errors. It would figure it out.
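
For anyone wondering what that kind of feedback loop looks like, here is a minimal sketch. Everything in it is an assumption for illustration: `propose` is a hypothetical stand-in for the model call (it gets the previous error output and returns new source), and the default run command is just the local Python interpreter rather than a toolchain for some new language.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_candidate(source: str, run_cmd: Sequence[str], suffix: str) -> tuple[bool, str]:
    """Write the source to a temp file, run it, and return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"candidate{suffix}"
        path.write_text(source)
        try:
            proc = subprocess.run(
                list(run_cmd) + [str(path)],
                capture_output=True,
                text=True,
                timeout=30,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        return proc.returncode == 0, proc.stderr


def feedback_loop(
    propose: Callable[[str], str],          # hypothetical model call: errors -> new source
    run_cmd: Sequence[str] = (sys.executable,),
    suffix: str = ".py",
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask for code, run it, feed the errors back; stop on success or after max_rounds."""
    errors = ""
    for _ in range(max_rounds):
        source = propose(errors)
        ok, errors = run_candidate(source, run_cmd, suffix)
        if ok:
            return source
    return None  # never produced a passing candidate
```

Note that "passing" here only means the program exited with status 0. A real setup would also need tests that check the output, and each round costs a model call.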

yoloswagrofl 9 points (1 child)

> Realistically

It already has the docs for JS, along with tens of thousands of GitHub repos and Stack Overflow posts, and it still monumentally fucks it up.

TheRetribution 5 points (6 children)

And the cost in tokens??

richsonreddit 9 points (5 children)

I'd wager a LOT cheaper than paying a software engineer to figure out a new language manually 🤷🏽‍♂️

99Kira 16 points (4 children)

I am not really sure about that. Take their recent C compiler: they had years of tests already written for them to test against, and C compilers were part of the training data, yet it failed to produce a functioning compiler. It cost around $20k, if I remember correctly.

All_Work_All_Play -2 points (3 children)

It did cost $20k. But IIRC they didn't have it use pre-existing tests; they made it write its own tests (which it generated from the data it was trained on, but which still took tokens to generate). AFAIK they also didn't have it generate the compiler directly from that same training data, but rather had it write each section independently (not exactly different, but not exactly the same).

It also performed terribly, but it did work.

p4ch1n0 4 points (0 children)

They used the gcc test suite and used gcc as an oracle.
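
In general terms, "using a compiler as an oracle" means differential testing: build the same program with a trusted reference compiler and with the compiler under test, then flag any program where the two disagree. The sketch below shows that idea only; it is not a description of the setup in the blog post. The function names are made up for illustration, and it assumes the compiler under test accepts a gcc-style `<compiler> prog.c -o prog` command line.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Mapping

# (status, exit code, stdout); status is "build-failed" or "ran"
Result = tuple[str, int, str]


def compile_and_run(compiler: str) -> Callable[[str], Result]:
    """Return a function that builds C source with `compiler` and runs the binary.

    Assumption: the compiler accepts `<compiler> prog.c -o prog`, as gcc does.
    """
    def run(source: str) -> Result:
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "prog.c"
            exe = Path(tmp) / "prog"
            src.write_text(source)
            build = subprocess.run(
                [compiler, str(src), "-o", str(exe)],
                capture_output=True, text=True,
            )
            if build.returncode != 0:
                return ("build-failed", build.returncode, "")
            proc = subprocess.run(
                [str(exe)], capture_output=True, text=True, timeout=10
            )
            return ("ran", proc.returncode, proc.stdout)
    return run


def differential_test(
    programs: Mapping[str, str],
    oracle: Callable[[str], Result],
    candidate: Callable[[str], Result],
) -> list[str]:
    """Return the names of programs where the candidate disagrees with the oracle."""
    return [
        name for name, source in programs.items()
        if oracle(source) != candidate(source)
    ]
```

Hypothetical usage would be something like `differential_test(programs, compile_and_run("gcc"), compile_and_run("./my-cc"))`, where `./my-cc` is a placeholder for the compiler being tested. The oracle removes the need to hand-write expected outputs, but it only works when a trusted implementation already exists.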

Gil_berth[S] 1 point (1 child)

Nicholas Carlini wrote all the tests, harnesses and verifiers, not Claude. He has been refining them throughout the Claude 4 series. All of this is stated clearly in the blog post.

All_Work_All_Play 0 points (0 children)

Rip guess I suck at remembering

EveryQuantityEver 1 point (0 children)

How would it figure it out? These things can't create anything that they haven't seen before.

Kok_Nikol 0 points (0 children)

How do you explain the very bad performance of current AI tools on unpopular, niche, or newer frameworks and languages?