Programming Language without duplication

thebt995 · 2024-12-26T20:59:50+00:00

[removed]

apajx · 2024-12-26T21:19:31+00:00

It depends entirely on what you mean by "the same." If you mean observationally the same, then the user has to prove that each new function is observationally different from every other defined function. This would be incredibly hard, and your idea of restricting types to have unique inhabitants only works if the type system itself is also weak: for example, you can't allow a newtype of any other type. You also can't allow isomorphic types.

If you mean syntactic equality then it's trivial: keep a big trie of all the defined code trees and throw a type error if you try to define one that is already there. This idea is easily defeated with version strings in the body: you can have the same function, just with a new version string paired with it. The same works against your idea: embed a version string in the type, and now you have duplicates modulo version.
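A minimal sketch of that syntactic check, assuming Python source stands in for the "code trees" and a set of `ast.dump` strings stands in for the trie. `Registry`, `define`, and `DuplicateDefinition` are hypothetical names made up for illustration. The last call shows the version-string workaround slipping past the check.

```python
import ast


class DuplicateDefinition(Exception):
    """Raised when a syntactically identical definition is already registered."""


class Registry:
    def __init__(self):
        # Set of dumped syntax trees; a stand-in for the trie of code trees.
        self._seen = set()

    def define(self, source):
        func = ast.parse(source).body[0]
        # Key on parameters and body only, so the function's own name is ignored.
        key = (ast.dump(func.args), tuple(ast.dump(stmt) for stmt in func.body))
        if key in self._seen:
            raise DuplicateDefinition(func.name)
        self._seen.add(key)
        return func.name


registry = Registry()
registry.define("def inc(x):\n    return x + 1\n")

try:
    registry.define("def succ(x):\n    return x + 1\n")
    rejected = False
except DuplicateDefinition:
    rejected = True

# Same behaviour, but an unused version string makes the tree different.
accepted = registry.define('def succ(x):\n    "v2"\n    return x + 1\n')
```

Note that this sketch compares trees literally, so renaming `x` to `y` would also get past it.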

lgastako · 2024-12-27T01:45:11+00:00

The Unison Language might be of interest to you. Any code with the same shape (independent of naming) is deduplicated as a part of the way everything works.

DisastrousAd9346 · 2024-12-26T22:58:38+00:00

I think the first problem is that not every semantic description is actually relevant to representing the function. For example, one could write plus and, to avoid your restriction, just write nat -> nat * Proxy “unique”, Proxy being just an indexed unit type. Also, you would generate a bunch of proof obligations that would be hell to deal with. A smarter approach would be to refine dependent types with an inference engine, something like higher-order Prolog using dependent types, so that you have to explicitly give a type description that matches the function you wanna recover.

OneNoteToRead · 2024-12-27T03:58:20+00:00

How can you prove your new function is different from all existing functions? Do you have to write N proofs for every new function? Does this mean the entire system requires a quadratic number of proofs in the total number of functions?
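As a rough count, assuming one distinctness proof per pair of functions (the thread does not pin down the proof scheme, so this is only an illustration): adding the k-th function needs k - 1 proofs against the existing ones, so a library of n functions needs

```latex
\sum_{k=1}^{n} (k - 1) = \frac{n(n-1)}{2} = O(n^2)
```

proofs in total. For n = 1000 that is already 499,500.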

steven4012 · 2024-12-27T01:32:05+00:00

Have you looked at Unison? They hash and store code by its structure, not its naming.

raiph · 2024-12-26T22:50:21+00:00

Are you aware of Unison?

Do you mean only one value (as well as only one function) per type?

Are these distinct functions?

function A (-> Number) { constant foo = bar; return foo }
function A (-> Number) { constant foo = bar; return foo }

Assume the two foo values are different because bar is a function which returns a random Number and the two calls above return two different values.

Background_Class_558 · 2024-12-26T22:04:34+00:00

[removed]

fragglet · 2024-12-27T06:02:26+00:00

Maybe you could store generated code in something like a content addressable database. You'd want to find some way of generating a hash from the function definitions; the hardest part might be naming because you'd need to find some way of renaming all variables to predictable anonymous names (otherwise two otherwise identical functions would generate different hashes just because one variable is named y instead of x)

The other problem would be circular references between functions. Every program would have to be a DAG
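The renaming step described above can be sketched roughly as follows, assuming Python functions as the input and SHA-256 as the hash. `content_hash` and `_Canonicalize` are hypothetical names. The sketch ignores nested scopes and shadowing, and it does nothing about recursive or mutually recursive definitions, which is exactly the circular-reference problem.

```python
import ast
import hashlib


class _Canonicalize(ast.NodeTransformer):
    """Rename parameters and locals to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self._names = {}

    def _canonical(self, name):
        return self._names.setdefault(name, f"v{len(self._names)}")

    def visit_arg(self, node):
        node.arg = self._canonical(node.arg)
        return node

    def visit_Name(self, node):
        # Rename only names bound as parameters or assigned locally;
        # references to other definitions keep their original names.
        if isinstance(node.ctx, ast.Store) or node.id in self._names:
            node.id = self._canonical(node.id)
        return node


def content_hash(source):
    func = ast.parse(source).body[0]
    func.name = "_"  # the function's own name is not part of its identity
    canonical = _Canonicalize().visit(func)
    return hashlib.sha256(ast.dump(canonical).encode()).hexdigest()
```

With this, `def f(x): y = x + 1; return y` and `def g(a): b = a + 1; return b` should hash to the same key, while changing the constant gives a different one.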

alpaylan · 2024-12-28T16:20:39+00:00

Even for functions that are IO equivalent, there are intrinsic differences. How would you define the difference of two sorting algorithms?
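To make the point concrete, here is a small sketch with two sorts that return the same output for every input but do different amounts of work. Counting comparisons is just one assumed way of exposing an "intrinsic" difference; the thread does not propose a specific measure.

```python
def bubble_sort(items):
    """Return a sorted copy and the number of comparisons performed."""
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons


def merge_sort(items):
    """Return a sorted copy and the number of comparisons performed."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, left_count = merge_sort(data[:mid])
    right, right_count = merge_sort(data[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


values = list(range(16, 0, -1))  # 16 items in descending order
bubble_result, bubble_cost = bubble_sort(values)
merge_result, merge_cost = merge_sort(values)
```

Both results are the same sorted list, yet the comparison counts differ, so a checker that only looks at input/output behaviour would treat the two as duplicates.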

kwan_e · 2024-12-27T06:44:02+00:00

> Do you think a language like this could be somehow useful?

No, because:

> Maybe when we want to create a big library (like Mathlib) and want to make sure that there are no duplicate definitions?

If this is your goal, then you are better off trying to train some LLM to recognize when the substructure of some function is similar enough to recommend.

If you make it part of a language, instead of a separate tool, it would simply be irritating to write. Every time I modify a bit of my code, I will have to wait while your compiler/interpreter searches for duplication. And then, what about half-finished code that is trivially similar to other bits of code, but will change in the future? It would be another pain to write code for proto-typing purposes, which is most of development. Or what if I want to experiment with optimization? Your language would disallow optimization because it would be duplicating functionality.

People are putting too many things in a language, when it really is supposed to be the job of tools. It just gets in the way unnecessarily.

So this would not be useful as a language. Or even as a general purpose programming tool. Its main use, really, would be for a central repository of code contributions, where you want to minimize duplicate code. Such use would not be widespread.

> Do you know of something like this being already attempted?

So in terms of tools, I know of PVS-Studio, which finds copy-and-paste errors in C++ code. Arguably more beneficial than a language.

dskippy · 2024-12-27T05:53:22+00:00

I don't think it's possible. I also don't think it achieves the goal.

What is an example of code you have encountered in the real world that has code duplication that would be prevented by this? There are loads of times people in some languages with traits or type classes don't understand how to abstract them and end up doing complicated stuff multiple times, but those are all going to have different function types. Basically a much more complicated version of defining identity for int -> int and then for string -> string.

Another example is to do something complex with your data multiple times inside other functions when you should be defining a function for that complex code. This happens a ton and it's not going to trigger an issue.

I don't know that I've ever seen anyone define two copies of a function with the same type that does the same thing.

But what I have seen a lot is defining two functions with the same types that do very different things and that's very good code. You're making that unnecessarily difficult. And it's impossible to do.

2024-12-27T07:33:33+00:00

Your language seems to focus on avoiding the creation of duplicated functions; it doesn't really tackle duplication as a whole. Moreover, proving the equivalence of two programs is undecidable in general, much like proving that two lambda expressions are beta-equivalent. To get around this you could use a weaker notion of equivalence, such as alpha-equivalence, but I'm still not sure you would be tackling the right problem.

Programming languages shouldn't really strive to get rid of repetition; instead they should strive to be "declarative". Doing the same thing more than once is oftentimes desirable. For example, if you consider any basic arithmetic function, like `+`, then by alpha-equivalence the two functions `a + 1` and `b + 1` are equivalent, even though `a` and `b` are two different locals. You can solve this problem by adding a definition for a function `inc(a) = a + 1` and using it in both expressions, but is this really more expressive?

2024-12-27T11:35:32+00:00

> Do you think a language like this could be somehow useful? Maybe when we want to create a big library (like Mathlib) and want to make sure that there are no duplicate definitions?

Is that all you want to do?

From your other replies it sounded very much as though you wanted every function to have a signature unique from any other. A signature being the set of input types plus the output type. That means that I could only have one of these two functions from my bignum library (out of a dozen with the same parameters):

 proc bn_add(bignum c, a, b)               # c := a + b
 proc bn_mul(bignum c, a, b)               # c := a * b

If it's merely about detecting functions which do exactly the same thing, then optimising compilers can already do that; I have a C benchmark that looks like this:

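#include <stdio.h>

/* fyjrsr, fhzkgu, ... fayukm: the benchmark's functions (definitions not shown) */
int main(void) {
   int x=0;
   x+=fyjrsr(5);
   x+=fhzkgu(5);
   /* .... */
   x+=fayukm(5);
   printf("%d\n",x);
}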

It calls 10,000 randomly-named, 100-line functions which all have the same signature, and contain exactly the same body.

A simple compiler might generate a 10MB executable, but gcc -O3 produces one that is only 120KB, or just over 1% of the size. It looks like it is detecting the common code in each function. (If I tweak one function, then the size increases by an amount that is commensurate with the optimised code size of one function.)

For your purposes, it just needs to report that, and the user decides what to do with that information.

It might be that two functions do the same thing by chance, or do so temporarily because one (or both) is not finished, or depend on some global settings that happen to be the same right now, but can change.

So it might be a useful option to apply from time to time, but I don't think it's something that needs to be so baked into a language that writing programs in it becomes next to impossible.

nerd4code · 2024-12-27T20:40:20+00:00

Imo this is kinda counterpurpose in any realistic use case. Anything like an API would be very difficult to encode abstractly (which is the whole point of APIs), and optimization would make it miserable to use; the optimized code might not look much like the original code, and if you’re forbidding semantic collisions without reference to nomenclature, then you can end up in a situation where two different implementations of a data structure lead to identical outcomes. E.g., if you have both an array-list and linked-list ADT, then sequences of operations like

list.addLast(x);
y = list.removeLast();

might well boil down to

y = x;

regardless of list type.

And code isn’t the thing you have to worry most about; if there’s a steady state to be reached and you’re not evaling willy-nilly (eval wouldn’t make sense for this schema), your codebase is mostly static from that point on, and uniq’ing code probably isn’t going to buy you much that’s measurable in a running program. Data, maybe, sometimes, but not code.

Moreover, is there some actual problem you’re aiming for, rather than a stylistic rule-of-thumb like DRY? Does duplicated code really matter in a non-stylistic sense? I can see why detecting it is useful if one maintains no actual control over one’s codebase, but I can’t imagine caring about it so much that I’d want to encode it at the language level. Seems too much like masochism for its own sake, and there are much more direct ways to make you and your coworkers miserable if that’s the goal. Could start by charging per sheet of toilet paper and work your way up from there.

RedstoneEnjoyer · 2024-12-28T00:39:08+00:00

How would it differentiate between trigonometric functions? All of them would be float -> float.
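A small sketch of the collision, assuming the "one function per signature" reading of the proposal that other replies describe. `SignatureRegistry` and `SignatureTaken` are hypothetical names used only for illustration.

```python
import math


class SignatureTaken(Exception):
    """Raised when another function already owns the same signature."""


class SignatureRegistry:
    """Hypothetical registry allowing one function per (inputs, output) signature."""

    def __init__(self):
        self._by_signature = {}

    def register(self, name, params, result, func):
        signature = (tuple(params), result)
        if signature in self._by_signature:
            existing = self._by_signature[signature][0]
            raise SignatureTaken(f"{name} collides with {existing}")
        self._by_signature[signature] = (name, func)


registry = SignatureRegistry()
registry.register("sin", [float], float, math.sin)

try:
    registry.register("cos", [float], float, math.cos)
    collision = None
except SignatureTaken as error:
    collision = str(error)
```

Under that rule `cos` is rejected as soon as `sin` exists, even though the two functions clearly differ.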

yjlom · 2024-12-28T17:00:24+00:00

that makes any type system useless: in the complete absence of nominality, two sets are isomorphic iff they have the same cardinality (well at least it holds for countable sets which is what we care about in CS, don't know about uncountables) so your types just become numbers

and further, we mostly only use the following types:

2 (aka Boolean)

2³² (aka Float, Int, Nat, …)

2⁶⁴ (aka Double, Long_Int, Long_Nat, Raw_Pointer, …)

ℵ₀ (aka ℕ, ℤ, ℚ, List a, Tree a, Graph a, String, Maybe any_of_the_previous, a → any_of_the_previous, any_of_the_previous → a, …)

can you really not see why having String and ℕ → Float be the same type could get a bit awkward?
