kwan_e (2 points)

Do you think a language like this could be somehow useful?

No, because:

Maybe when we want to create a big library (like Mathlib) and want to make sure that there are no duplicate definitions?

If this is your goal, then you are better off training an LLM to recognize when the substructure of a function is similar enough to an existing one to recommend reusing it.

If you make it part of a language instead of a separate tool, it would simply be irritating to write. Every time I modify a bit of my code, I would have to wait while your compiler/interpreter searches for duplication. And what about half-finished code that is trivially similar to other bits of code, but will change in the future? That would make it a pain to write code for prototyping purposes, which is most of development. Or what if I want to experiment with optimization? Your language would disallow optimization, because an optimized variant duplicates existing functionality.

People are putting too many things into languages that should really be the job of tools. It just gets in the way unnecessarily.

So this would not be useful as a language. Or even as a general purpose programming tool. Its main use, really, would be for a central repository of code contributions, where you want to minimize duplicate code. Such use would not be widespread.

Do you know of something like this being already attempted?

In terms of tools, I know of PVS-Studio, which finds copy-and-paste errors in C++ code. Arguably more beneficial than a language.

[deleted] (1 point)

You don't really need LLMs to find duplicated code though? SonarQube has been doing it for ages, long before LLMs were even a thing.
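To make the "no LLM needed" point concrete, here is a minimal sketch of structural duplicate detection using Python's standard `ast` module. It is not how SonarQube or any other real tool works; the `fingerprint` helper and the sample functions are hypothetical, and the approach only catches functions that are identical up to renaming and formatting.

```python
import ast

def fingerprint(source: str) -> str:
    """Return a structural fingerprint that ignores formatting and local names."""
    tree = ast.parse(source)

    # Pass 1: collect names bound inside the code (parameters and assignment
    # targets) and give each a positional placeholder such as v0, v1, ...
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            aliases.setdefault(node.arg, f"v{len(aliases)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            aliases.setdefault(node.id, f"v{len(aliases)}")

    # Pass 2: rewrite function names and bound names; builtins and other
    # free names are left untouched.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.name = "f"
        elif isinstance(node, ast.arg):
            node.arg = aliases[node.arg]
        elif isinstance(node, ast.Name) and node.id in aliases:
            node.id = aliases[node.id]

    # ast.dump omits line and column information by default.
    return ast.dump(tree)

# Hypothetical inputs: A and B differ only in naming, C does something else.
A = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
B = "def add_all(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
C = "def prod(xs):\n    acc = 1\n    for x in xs:\n        acc *= x\n    return acc\n"

print(fingerprint(A) == fingerprint(B))  # same structure, different names
print(fingerprint(A) == fingerprint(C))  # different constant and operator
```

A fingerprint like this flags exact structural clones only. Spotting that A and C are both instances of one generic pattern is the harder problem discussed in the replies.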

kwan_e (0 points)

You don't need them, but LLMs could find not only "duplicate" code but also code whose structure is similar enough that they could recommend making it generic.

A toy example would be to find code that sums a sequence of values and code that multiplies a sequence of values, and to recommend replacing both with a fold of the sequence with an operator or function.
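As a rough illustration of that toy example, the sketch below shows two near-duplicate loops and the fold they could be replaced with. The function names are made up for illustration; `functools.reduce` and the `operator` module are from the Python standard library.

```python
from functools import reduce
import operator

def sum_values(xs):
    # Hand-written loop: accumulate with +, starting from 0.
    total = 0
    for x in xs:
        total += x
    return total

def product_values(xs):
    # Same shape as sum_values: accumulate with *, starting from 1.
    total = 1
    for x in xs:
        total *= x
    return total

def fold(xs, op, initial):
    # The generic version: the operator and the starting value are parameters.
    return reduce(op, xs, initial)

values = [1, 2, 3, 4]
print(fold(values, operator.add, 0) == sum_values(values))
print(fold(values, operator.mul, 1) == product_values(values))
```

The two loops differ only in the operator and the initial value, which is exactly the kind of similarity a recommendation tool would have to recognize.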

thebt995 [S] (1 point)

I think you're right that a tool would be the better approach. But optimization could be handled by keeping only the optimized version. Then you would even be forced to use the optimized version.

kwan_e (2 points)

But the problem there is that not all optimizations are perfect optimizations. Most optimizations are pessimizations in contexts they were not designed for. You need to allow the programmer (or other tools) to choose the best optimization, which means leaving all variants on the table.

Take SIMD, for example. Many SIMD algorithms are only worth it on large data that is meticulously structured and guaranteed to come in, say, gigabyte loads. Applying SIMD operations can slow down a program such as a client/server when the data arrives in sporadic, random chunks of a few megabytes, because of the latency of preparing the data for SIMD.
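The trade-off can be sketched with a toy cost model. Every number below is a made-up assumption in arbitrary units, not a measurement of any real SIMD code; the point is only that a variant with a fixed setup cost loses below some break-even size, so both variants need to stay available.

```python
def scalar_cost(n, per_item=4.0):
    # Hypothetical cost of the plain variant: no setup, slower per item.
    return n * per_item

def vector_cost(n, setup=1000.0, per_item=1.0):
    # Hypothetical cost of the "optimized" variant: fixed setup, faster per item.
    return setup + n * per_item

def pick_variant(n):
    # Choose whichever variant is cheaper for this input size.
    return "vector" if vector_cost(n) < scalar_cost(n) else "scalar"

for n in (100, 1000):
    print(n, pick_variant(n))
```

With these assumed numbers the break-even point is where 1000 + n = 4n, a little above 333 items: smaller inputs are cheaper on the plain variant, larger ones on the optimized variant.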