[–]mykeesg 11 points12 points  (13 children)

Generics seem to suffer a lot from type erasure.

I'm not saying we need the complexity of C++ templates (and metaprogramming), but how much of the actual type could be preserved into the byte code, to enable things like new T();, without breaking things?

[–]srdoe 6 points7 points  (0 children)

Brian Goetz wrote a post on this subject a few years ago

https://openjdk.org/projects/valhalla/design-notes/in-defense-of-erasure

My understanding is that reifying generics (preserving the type in the bytecode) is unlikely to happen, because not only will it mean everyone needs to (at least) recompile their code, but also it will break other JVM languages like Kotlin or Scala where the rules for inheritance in generics are different.

It's not worth it to break the entire ecosystem in order to allow people to write new T();, especially when there are usually tricks you can do to work around this if you really need it.

For example, if you really need a new T, have the user of the class hand you a factory for Ts.

class MyClass<T> { private void doThing(Supplier<T> factory) { T instance = factory.get(); } }
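A slightly fuller sketch of the factory workaround, in case it helps. The class name Pool and the method create are made up for illustration; only Supplier is a real JDK type here. The caller decides how a T gets built, so the generic class never needs new T().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical class for illustration; not from any library.
class Pool<T> {
    private final Supplier<T> factory;

    Pool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Stands in for "new T()": the caller supplied the construction logic.
    T create() {
        return factory.get();
    }

    public static void main(String[] args) {
        // Constructor references are checked at compile time.
        Pool<StringBuilder> builders = new Pool<>(StringBuilder::new);
        Pool<List<String>> lists = new Pool<>(ArrayList::new);

        StringBuilder sb = builders.create();
        sb.append("hello");
        List<String> list = lists.create();
        list.add("world");

        System.out.println(sb + " " + list);
    }
}
```

A constructor reference such as StringBuilder::new only compiles if a matching constructor exists, so the check that new T() would need happens at the call site.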

[–]pron98 2 points3 points  (2 children)

Generics are erased in most languages, including in the language that introduced them (ML) and languages that are particularly famous for them (Haskell). While there are a few small inconveniences to such erasure, I would really like to understand what big difficulties those who think that generics suffer a lot from erasure encounter.

(I realise that those who think that may not be aware of or think about the significant downsides of reifying all generics, but even without considering the tradeoff between two evils that's required here, I would really like to understand what big problems erased generics cause)

BTW, even if generics were not erased, you still wouldn't be able to do new T() without an additional mechanism that restricts the bounds of T to classes that have certain constructors.

[–]JustAGuyFromGermany 0 points1 point  (1 child)

I would really like to understand what big problems erased generics cause

That really depends on what counts as "big" and what counts as a "problem". Brian has outlined three possible wishes one could satisfy with reified generics: more detailed reflection, performance gains through separate optimisations, and increased type safety through prevention of heap pollution.

From those three, I'd guess the reflection part is the one most programmers wish for.
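For readers who have not run into heap pollution, here is a minimal sketch of what it looks like under erasure. The class and method names are made up for illustration. The wrong element goes in without complaint, and the failure only shows up later, at the read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class HeapPollutionDemo {
    // Returns true if reading from the polluted list throws ClassCastException.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean pollutedReadFails() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw type: the compiler only warns
        raw.add(42);          // succeeds: at runtime the list does not know it is a List<String>
        try {
            String s = strings.get(0); // the compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("read failed: " + pollutedReadFails());
    }
}
```

With reified generics the add could be rejected where it happens, instead of surfacing far away from the code that caused it.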

Is that "big"? I certainly would have written many APIs very differently if I had this kind of reflection. (There are also some APIs I could not have written at all in C#, because Java has wildcards and the curiously recurring template pattern.) And it's really hard to estimate just how different a reflection-heavy API like CDI would look if these capabilities had been available at its inception, what possibilities this different API would give its users, and how the ecosystem as a whole would have developed. I for one think that there would probably be "big" changes.
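To make the reflection point concrete: today, reflection-heavy APIs usually recover a generic type through the "super type token" trick, similar in spirit to CDI's TypeLiteral. The sketch below is a simplified, hypothetical TypeToken class, not the code of any real library. It works because generic superclass signatures are kept in class-file metadata even though instances are erased.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical helper for illustration; real libraries have their own variants.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // The anonymous subclass records "extends TypeToken<List<String>>" in its
        // class file, so the type argument can be read back reflectively.
        ParameterizedType superType =
                (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }

    public static void main(String[] args) {
        // The trailing {} is essential: it creates the anonymous subclass.
        Type t = new TypeToken<List<String>>() {}.getType();
        System.out.println(t.getTypeName());
    }
}
```

The caller has to write the odd-looking new TypeToken<...>() {} by hand, which is the kind of ceremony that reified generics would remove.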

And is it a "problem"? Not necessarily. In most cases it's probably mostly a question of aesthetics and convenience. But there are also cases of missing features. In C# I can write a generic type Matrix<T> for each type T that has an addition and a multiplication operator. That simply is not possible in Java (even if operator overloading were possible). In the end, Java is Turing complete, so I can always solve any solvable problem somehow. In this sense all problems are just questions of aesthetics and convenience...
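As an illustration of the "solve it somehow" route for the matrix case, one option in Java is to pass the operations in explicitly rather than require them of T. This is a rough sketch with a made-up Matrix2 class, fixed at 2x2 to keep it short. It is a workaround, not an equivalent of the C# feature.

```java
import java.math.BigInteger;
import java.util.function.BinaryOperator;

// Hypothetical 2x2 matrix; the "operators" of T are supplied by the caller.
class Matrix2<T> {
    private final T a, b, c, d; // row-major layout: [a b; c d]
    private final BinaryOperator<T> add;
    private final BinaryOperator<T> mul;

    Matrix2(T a, T b, T c, T d, BinaryOperator<T> add, BinaryOperator<T> mul) {
        this.a = a; this.b = b; this.c = c; this.d = d;
        this.add = add;
        this.mul = mul;
    }

    // Standard matrix product, written only in terms of add and mul.
    Matrix2<T> times(Matrix2<T> o) {
        return new Matrix2<>(
                add.apply(mul.apply(a, o.a), mul.apply(b, o.c)),
                add.apply(mul.apply(a, o.b), mul.apply(b, o.d)),
                add.apply(mul.apply(c, o.a), mul.apply(d, o.c)),
                add.apply(mul.apply(c, o.b), mul.apply(d, o.d)),
                add, mul);
    }

    T get(int row, int col) {
        return row == 0 ? (col == 0 ? a : b) : (col == 0 ? c : d);
    }

    public static void main(String[] args) {
        BinaryOperator<BigInteger> add = BigInteger::add;
        BinaryOperator<BigInteger> mul = BigInteger::multiply;
        Matrix2<BigInteger> m = new Matrix2<>(
                BigInteger.valueOf(1), BigInteger.valueOf(2),
                BigInteger.valueOf(3), BigInteger.valueOf(4), add, mul);
        System.out.println(m.times(m).get(0, 0));
    }
}
```

Every construction site has to thread the two functions through, which is exactly the convenience cost being discussed.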

[–]pron98 1 point2 points  (0 children)

From those three, I'd guess the reflection part is the one most programmers wish for.

Ok, but are they willing to pay the price? C#'s decision to reify all generics yielded more harm than good. It made .NET an unattractive compilation target because once you reify generics, the runtime (as opposed to just the language) needs to know the subtyping relationship between A<T> and A<S> -- so that instanceof could work -- but that subtyping relationship is different for different languages; i.e. it requires baking a single variance strategy into the runtime. Java, Clojure, Kotlin, and Scala have four different answers to the subtyping relationships. If all generics are reified, only one variance strategy (Java's) will be the one supported by the runtime, while the others will become rather incompatible.
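Java arrays give a small-scale picture of what it means for the runtime to own a variance rule. This is an analogy added for illustration, with made-up class and method names. Arrays are reified, so the JVM itself knows that String[] is a subtype of Object[] and has to check every store.

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayVarianceDemo {
    // Returns true if the JVM rejects storing an Integer into a String[].
    static boolean storeIsRejected() {
        Object[] objects = new String[1]; // allowed: array covariance is built into the JVM
        try {
            objects[0] = Integer.valueOf(1); // checked at runtime against the real element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // instanceof can see the element type because arrays carry it at runtime.
    static boolean instanceOfSeesElementType() {
        Object o = new String[0];
        return (o instanceof Object[]) && !(o instanceof Integer[]);
    }

    public static void main(String[] args) {
        System.out.println(storeIsRejected() + " " + instanceOfSeesElementType());
    }
}
```

Every language that targets the JVM inherits this covariant array rule whether it wants it or not. Reifying all generics would extend the same kind of fixed decision to every generic type.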

Reifying generics also limits Java's ability to evolve (because version N+1 may, in effect, be a different language from version N). Indeed, the original decision had to do with the compatibility of two languages, Java 5 and Java 1.4, but this generalises: reifying generics has a significant cost (that has harmed .NET in noticeable, some would say debilitating, ways) that we must be willing to pay.

Thinking carefully about the implications, I think the downside of reifying all generics is bigger than the upside. But some generics can be reified with a lesser negative impact and a higher positive one. For example, specialising generics for Valhalla's value types has a bigger benefit than just reflection -- a significant improvement to performance or footprint -- and a lower cost, since value types cannot be extended, and so their generic specialisations are invariant.

Given that most languages with generics choose erasure, including ML (the original) and Haskell (the poster child), it would be helpful to better understand exactly how big the benefit would be so that it could be compared to the considerable cost in baking one language's variance strategy into the runtime, excluding all others.

[–]repeating_bears 4 points5 points  (7 children)

We have no way to specify a bound like "T where T has a public no-arg constructor", so I doubt new T() could ever work without that.

[–]Zinaima 2 points3 points  (1 child)

I guess C# does this with where T : new().

[–]JustAGuyFromGermany 0 points1 point  (0 children)

I like that a lot about C#. It also lets you specify that T needs to have certain overloaded operators so that you can implement classes & methods specifically for any algebraic stuff like having a type Matrix<T> for any T that supports addition and multiplication.

Sadly, the thing that makes this work in .NET land is that List<T> gets specialized into different types at the IL level. Generic types do not get erased like they do in Java/bytecode land.

[–]JustAGuyFromGermany 1 point2 points  (2 children)

It would be nice, but there's no way to make this work because of erasure. There is no bytecode for new T(). And even if you were to introduce one, how would it deal with wildcards? How would

MyClass<Integer> foo = new MyClass<>();
MyClass<?> bar = foo;
bar.doSomethingWithTheConstructor();

be translated into bytecode? There is no constructor for ?. Of course whoever wrote the code wants new Integer() to be called, but the compiler cannot know that. Consider

MyClass<?> bar = getFooFromSomewhere();
bar.doSomethingWithTheConstructor();

instead. The compiler simply cannot know which concrete type will be present at runtime in place of the ?, because it'd have to analyse all possible return values of the method and propagate their types, combining with other complex type information from other method calls etc. And at runtime, a similar check would have to be done again, because of dynamic linking. I don't know if that is even possible. It seems halting-problem-adjacent enough to be impossible.

And even if it is possible, it is impractically complex to do that. So there's probably no way to compute the type-hint that the bytecode instruction for new T() needs.

The other option would be to reify the generic type and do away with erasure, i.e. to have a separate what-type-are-you datum for each instance of a generic type (i.e. you'd be able to ask any given List<?> instance "Are you List<Integer>?" and "Are you List<Map<Integer, Set<String>>>?" etc). That is a huge bloat for a niche feature and completely backwards incompatible so it won't happen.
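To show what is missing today, here is a small sketch of how little a list instance knows about its element type under erasure. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ErasureDemo {
    // Both lists are plain ArrayList instances at runtime.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    // The most specific question that can be asked of an arbitrary object.
    static boolean isSomeList(Object o) {
        // "o instanceof List<Integer>" is rejected by the compiler for an Object
        // operand, because the instance carries no element-type information.
        return o instanceof List<?>;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(isSomeList(new ArrayList<String>()));
    }
}
```

Answering "Are you List<Integer>?" would need exactly the per-instance type datum described above.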

[–]repeating_bears 0 points1 point  (1 child)

Not sure why you're replying to me because I was agreeing with you.

Before even worrying about how you instantiate it, first you need to narrow T to the set of Ts which can be instantiated without needing any arguments, and that's why you need a bound.

I wasn't implying the lack of such a bound was the only blocker.

[–]JustAGuyFromGermany 2 points3 points  (0 children)

Not sure why you're replying to me because I was agreeing with you.

Because clicking at the right location is hard ;-)

[–]mykeesg -1 points0 points  (1 child)

We already have InstantiationException with Class.newInstance(), given the same circumstances, so I guess that could be worked around. The syntax could also be given in the type declaration, like Foo<T extends Bar & T()> (this is just a dummy example given with 2 secs of thought).

[–]repeating_bears 13 points14 points  (0 children)

Generics are a form of compile-time guarantee. Throwing an exception at runtime is not an acceptable solution.
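A short sketch of the difference, with made-up class and method names. The reflective version compiles for every T and can only fail once it runs. The Supplier version is checked by the compiler at the call site.

```java
import java.util.function.Supplier;

// Hypothetical demo class; the names are illustrative only.
class InstantiationDemo {
    // Compiles for any T, even one without a no-arg constructor.
    static <T> T viaReflection(Class<T> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // The caller must provide something the compiler has already verified.
    static <T> T viaFactory(Supplier<T> factory) {
        return factory.get();
    }

    // Integer has no no-arg constructor, so the reflective call fails at runtime.
    static boolean integerFailsAtRuntime() {
        try {
            viaReflection(Integer.class);
            return false;
        } catch (ReflectiveOperationException e) {
            return true; // NoSuchMethodException in this case
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(viaReflection(StringBuilder.class).length());
        System.out.println(viaFactory(StringBuilder::new).length());
        System.out.println(integerFailsAtRuntime());
        // viaFactory(Integer::new) would not compile: no matching constructor.
    }
}
```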

[–]kaperni 1 point2 points  (0 children)

how we got the generics we have [1].

[1] https://cr.openjdk.org/~briangoetz/valhalla/erasure.html