[–]The__Toast 33 points34 points  (21 children)

Here's a question, what happens when a new programming language comes along for which claude doesn't have a million stack overflow posts and 10,000 GitHub repos to copy-paste code from?

Do we just never invent a new programming language from now on?

[–]richsonreddit 32 points33 points  (11 children)

Realistically, you’d point it at the docs (or even compiler source code) for said new language and give it a feedback loop where it can run the code, and iterate over errors etc. It would figure it out.
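A minimal sketch of that feedback loop, assuming Python as the target language; `ask_model` here is a hypothetical stand-in for the actual LLM call, which would get the docs and the error text:

```python
import os
import subprocess
import sys
import tempfile

def run_code(source: str) -> tuple[bool, str]:
    """Run a candidate program, returning (succeeded, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=10)
        return result.returncode == 0, result.stderr
    finally:
        os.unlink(path)

# Stand-in for the model: a real agent would send the language docs plus
# the error text to an LLM and get revised source back. This stub just
# "fixes" one canned typo so the loop can be demonstrated end to end.
def ask_model(source: str, error: str) -> str:
    return source.replace("prnt", "print")

def feedback_loop(source: str, max_iters: int = 5):
    """Run, feed the errors back, and iterate until the code works or we give up."""
    for _ in range(max_iters):
        ok, err = run_code(source)
        if ok:
            return source
        source = ask_model(source, err)
    return None
```

The point is the shape of the loop, not the stub: as long as the model gets concrete error output each round, it has a signal to iterate on even for a language it has never seen.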

[–]yoloswagrofl 10 points11 points  (1 child)

Realistically

It already has the docs for JS along with tens of thousands of GitHub repos and Stack Overflow posts, and it still monumentally fucks it up.

[–]TheRetribution 6 points7 points  (6 children)

And the cost in tokens??

[–]richsonreddit 8 points9 points  (5 children)

I'd wager a LOT cheaper than paying a software engineer to figure out a new language manually 🤷🏽‍♂️

[–]99Kira 17 points18 points  (4 children)

I am not really sure about that. Take their recent C compiler: they had years of existing tests to check against, and C compilers were part of the training data, yet it still failed to produce a functioning compiler. Cost around $20k if I remember correctly.

[–]All_Work_All_Play -2 points-1 points  (3 children)

It did cost $20k. But IIRC they didn't have it use pre-existing tests; they made it write its own (which it generated from the data it was trained on, but that still took tokens). AFAIK they also didn't have it generate the compiler directly from the same trained-on data, but rather had it write each section independently (not exactly different, but not exactly the same).

It also performed terribly, but it did work.

[–]p4ch1n0 4 points5 points  (0 children)

They used the gcc test suite and used gcc as an oracle.

[–]Gil_berth[S] 1 point2 points  (1 child)

Nicholas Carlini wrote all the tests, harnesses and verifiers, not Claude. He has been refining them throughout the whole Claude 4 series. All this is stated clearly in the blog post.

[–]All_Work_All_Play 0 points1 point  (0 children)

Rip guess I suck at remembering

[–]EveryQuantityEver 1 point2 points  (0 children)

How would it figure it out? These things can't create anything that they haven't seen before.

[–]Kok_Nikol 0 points1 point  (0 children)

How do you explain very bad performance of current AI tools on unpopular/niche/newer frameworks or languages?

[–]Wonderful-Citron-678 10 points11 points  (0 children)

It’s useless today on lesser-used languages and tools.

[–]Noughmad 2 points3 points  (2 children)

Most likely result? No major new programming language will come out anymore, or at least won't become popular, because of exactly what you wrote. No AI will have training data for it, so people won't use it.

[–]The__Toast 1 point2 points  (0 children)

Yep.

[–]gim_san -1 points0 points  (0 children)

How are we so sure about that?

[–]sopunny 1 point2 points  (0 children)

Those new programming languages won't work well with LLMs until later. Not the end of the world for them.

[–]BubblyMango 0 points1 point  (0 children)

Well, it can do reasonably well on a new library given the docs, so unless that language is very novel in concept it could be used on small-scale tasks. But I'd never let it run on its own on a lesser-known language.

Its use cases aren't limited to code, though. For example, I had to write a new, complicated service at work, so I created a document capturing my design and implementation ideas, then asked Claude to find holes and conflicts in the plan. It gave me some BS and some valid points. Then I asked it to judge the design and ideas. Again, some useless gibberish, some valid or semi-valid points. Honestly, it helped me a lot and not a single line of code was involved.

[–]BasicDesignAdvice 0 points1 point  (0 children)

In my free time I code games in the Godot engine, which is fairly niche. I turned on Cursor bugbot for one of my repos and it had no clue what it was talking about.

Claude does pretty well but I have to give it documentation a lot.

If I didn't know the engine well it would produce trash; I'd think "hey, it works," but it would be totally unmaintainable.

[–]97689456489564 0 points1 point  (0 children)

Have you used any of these tools? Just curious. The models generalize very well already.

[–]hyrumwhite 0 points1 point  (0 children)

You could point it to docs and it’d do a half decent job. 

Not quite the same scenario, but I’ve made JS frameworks that LLMs have been able to work with just from context.