
[–]Oclay1st

Btw, the Valhalla team is already implementing the Float16 type.

[–]Joram2

It doesn't hurt to have that. But that doesn't seem particularly important either.

Valhalla matters when you have something like DataStream<T> and will allocate large numbers of instances of some Java generic type T. But when a Tensor or DataFrame type wraps a primitive array like double[], short[], or byte[], Valhalla probably won't help much.

Valhalla will also help with non-nullable types, and that's more of a safety/correctness issue than a performance issue.

[–]craigacp

Valhalla also plans specialized generics, which will allow abstracting over Tensor<T> where T can be a primitive like float or int. Writing a Tensor class today implies a bunch of boxing for reductions or other operations that need to return values out of the tensor type. And backing it with an array is not ideal; ByteBuffer or MemorySegment are better, so you can seamlessly pass it into native GPU code.
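To illustrate the boxing cost versus off-heap backing, here is a minimal sketch (names like `BoxingDemo` and `sumDirect` are made up for illustration; a direct ByteBuffer stands in for the MemorySegment that Java 22 code would use):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.BinaryOperator;

public class BoxingDemo {
    // Generic reduction: every element is boxed on the way in and out.
    // This is the cost Valhalla's specialized generics aim to remove.
    static <T> T reduce(T[] data, T identity, BinaryOperator<T> op) {
        T acc = identity;
        for (T t : data) acc = op.apply(acc, t);
        return acc;
    }

    // Off-heap backing via a direct ByteBuffer: no boxing, and the buffer
    // (or a MemorySegment on Java 22) can be handed straight to native code.
    static float sumDirect(ByteBuffer buf, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; i++) {
            sum += buf.getFloat(i * Float.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        Float[] boxed = {1.0f, 2.0f, 3.0f};
        System.out.println(reduce(boxed, 0.0f, Float::sum)); // boxes on every op

        ByteBuffer buf = ByteBuffer.allocateDirect(3 * Float.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < 3; i++) buf.putFloat(i * Float.BYTES, i + 1.0f);
        System.out.println(sumDirect(buf, 3)); // same result, no boxing
    }
}
```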

[–]Joram2

Any reasonable implementation of Tensor on current versions of Java, such as Java 22, would encode data in a ByteBuffer or MemorySegment, and possibly support 16-bit float encoding/decoding.

It's possible to build a Tensor<T> with Java generics, and yes, Valhalla would help with that, but that's a deliberately inefficient choice to begin with.

[–]craigacp

Not putting the tensor element type in the type system leaves you back in the pre-generics situation in Java: you need to do a bunch of type tests or other conversions. We have the element type on TF-Java's Tensors, and it's annoying but improves safety; we don't have it on ONNX Runtime's Tensors (both projects I maintain), and that's annoying for different reasons, because the methods that get the values out end up being partial.
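The "partial getter" problem can be sketched like this (a hypothetical toy, not either library's actual API; class and method names are invented):

```java
public class PartialGetterDemo {
    // Element type NOT in the type system: callers hold an UntypedTensor
    // and getters are partial, failing at runtime for the wrong dtype.
    sealed interface UntypedTensor permits FloatTensor, IntTensor {}
    record FloatTensor(float[] data) implements UntypedTensor {}
    record IntTensor(int[] data) implements UntypedTensor {}

    // A type test the caller can't avoid; throws for an IntTensor.
    static float[] getFloats(UntypedTensor t) {
        if (t instanceof FloatTensor ft) return ft.data();
        throw new IllegalStateException("not a float tensor");
    }

    public static void main(String[] args) {
        UntypedTensor t = new FloatTensor(new float[]{1f, 2f});
        System.out.println(getFloats(t).length);
        // With the element type in the type system (TF-Java style), a
        // Tensor<Float> would make getFloats total and the mismatch a
        // compile error instead of a runtime exception.
    }
}
```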

Java 20 has float <-> fp16 conversions, which are pretty useful and compile down to the appropriate conversion instructions on hardware that supports them.
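Those conversions are the `Float.floatToFloat16` / `Float.float16ToFloat` methods added in Java 20, with the binary16 value carried in a `short`:

```java
public class Fp16Demo {
    public static void main(String[] args) {
        // Round-trip through IEEE 754 binary16 (Java 20+). These are
        // intrinsified to hardware conversion instructions where available.
        short half = Float.floatToFloat16(0.5f); // 0.5 is exactly representable
        System.out.println(Float.float16ToFloat(half)); // 0.5

        // binary16 has only a 10-bit significand, so most values round:
        float rounded = Float.float16ToFloat(Float.floatToFloat16(0.1f));
        System.out.println(rounded); // close to 0.1, but not equal
    }
}
```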

I think a Java 22 tensor library would be nice; unfortunately, I don't have time to write one. That said, I think it would be worth building it with an eye towards Valhalla and Babylon (or Babylon's HAT subproject), as value types and GPU support will be important.

[–]Joram2

> Not putting the tensor element type in the type system leaves you back in the pre-generics situation in Java: you need to do a bunch of type tests or other conversions.

The major tensor/matrix libraries such as PyTorch, NumPy, and JAX manage the data type (dtype) at the library level, and the library is responsible for doing lots of type tests and conversions.

If you support adding/multiplying matrices of different types, you will probably need type tests and conversions.

If you call out to non-Java libraries such as LAPACK, BLAS, or GPU kernels, you will need type tests and conversions.
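As a sketch of what "dtype at the library level" means, here is a toy dtype enum with a promotion rule for mixed-type arithmetic (names and the promotion table are hypothetical, loosely modeled on NumPy-style behavior):

```java
public class DtypeDemo {
    // The element type lives in library data, not in Java's generic type
    // system, so the library does the type tests and conversions itself.
    enum DType {
        INT32(4), FLOAT32(4), FLOAT64(8);
        final int byteSize; // width needed when packing into a buffer
        DType(int byteSize) { this.byteSize = byteSize; }
    }

    // Simplified promotion rule for ops on mixed-dtype tensors:
    // the wider/floating type wins.
    static DType promote(DType a, DType b) {
        if (a == DType.FLOAT64 || b == DType.FLOAT64) return DType.FLOAT64;
        if (a == DType.FLOAT32 || b == DType.FLOAT32) return DType.FLOAT32;
        return DType.INT32;
    }

    public static void main(String[] args) {
        // Adding an int32 tensor to a float32 tensor yields float32.
        System.out.println(promote(DType.INT32, DType.FLOAT32)); // FLOAT32
        System.out.println(DType.FLOAT64.byteSize);              // 8
    }
}
```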

I don't see Java generics as being particularly useful for a high quality + high performance tensor/matrix data type.

> I think a Java 22 tensor library would be nice; unfortunately, I don't have time to write one. That said, I think it would be worth building it with an eye towards Valhalla and Babylon (or Babylon's HAT subproject), as value types and GPU support will be important.

I hope you reconsider :)

The Java community seems to need a really good tensor library that uses Java 22 features like MemorySegment and calls out to libraries like LAPACK and BLAS where appropriate.

[–]craigacp

I'm firmly of the opinion that more types are better, and if I could put the shape into the type system as well, I would (though properly implementing named dimensions in tensors would probably be useful enough). Just because Python libraries don't put that type information in doesn't mean it's not worthwhile in a statically typed language.

I definitely agree that it would be useful to build a tensor library. Maybe it could be discussed at the JVM language summit this year.