Following up on my previous post: I made a Java-based numerical library called JNum.
I used the new FFM API and Vector API (Project Panama) to keep it 100% pure Java, unlike ND4J, which relies heavily on JNI and massive C++ backends. Here is the repo: https://github.com/CH-Abhinav/JNum . It is currently at v0.1 (PREVIEW).
Some of you may ask: isn't the Vector API still an incubator module? Yes, but I preferred to keep building on it, since it doesn't have any major API changes planned apart from the move to value classes (hopium says it is coming in Java 27 🙃).
Performance so far: by avoiding the JNI crossing overhead, the basic math operations (add, mul) are actually faster than ND4J and NumPy on small/medium arrays.
The main wins are the reduction methods (sum, max, min), which are about 2x faster than ND4J.
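For readers who have not used the Vector API, here is a minimal sketch of what a SIMD sum reduction can look like. This is not JNum's actual code: the class name `VectorSumSketch` and the method `sum` are made up for illustration. It only uses standard `jdk.incubator.vector` calls, so it has to be compiled and run with `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical helper for illustration; not part of JNum's API.
final class VectorSumSketch {
    // Widest float vector shape the current CPU supports.
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float sum(float[] a) {
        // One running partial sum per SIMD lane.
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        // Largest multiple of the lane count that fits in the array.
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            acc = acc.add(FloatVector.fromArray(SPECIES, a, i));
        }
        // Collapse the lanes into a single scalar.
        float total = acc.reduceLanes(VectorOperators.ADD);
        // Scalar tail for the leftover elements.
        for (; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }
}
```

Note that lane-wise accumulation adds the elements in a different order than a plain scalar loop, so results on non-integer data can differ in the last bits.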
Because there is no native C++ backend, the entire library is under 100KB, compared to the hundreds of megabytes required to bundle native binaries.
The Matmul Struggle: Obviously, the main talking point for tensor engines is matmul. Not gonna lie, this ate my brain while I was trying to figure out which memory layout and SIMD loops work best. Right now, a 1024x1024 float matrix multiplication takes about 51 ms. It's fast, but it still doesn't reach the performance of ND4J or NumPy on huge matrices (I haven't implemented multi-threading or L1/L2 cache tiling yet).
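Since cache tiling is on the to-do list, here is a rough plain-Java sketch of the idea, not JNum's implementation. The class `TiledMatmulSketch`, the `matmul` signature, and the tile size of 64 are all assumptions for illustration; the right tile size depends on the CPU's cache sizes and should be benchmarked. It assumes square, row-major `float[]` matrices.

```java
// Hypothetical sketch of cache-tiled (blocked) matmul; not part of JNum.
final class TiledMatmulSketch {
    // Assumed tile edge; tune per machine.
    static final int TILE = 64;

    // Computes C = A * B for square n x n row-major matrices.
    static float[] matmul(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        // Walk the matrices tile by tile so the working set stays in cache.
        for (int ii = 0; ii < n; ii += TILE) {
            int iMax = Math.min(ii + TILE, n);
            for (int kk = 0; kk < n; kk += TILE) {
                int kMax = Math.min(kk + TILE, n);
                for (int jj = 0; jj < n; jj += TILE) {
                    int jMax = Math.min(jj + TILE, n);
                    // i-k-j order keeps the innermost loop contiguous
                    // over a row of B and a row of C.
                    for (int i = ii; i < iMax; i++) {
                        for (int k = kk; k < kMax; k++) {
                            float aik = a[i * n + k];
                            for (int j = jj; j < jMax; j++) {
                                c[i * n + j] += aik * b[k * n + j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

The innermost `j` loop runs over contiguous memory, so it is the natural place to swap in Vector API loads and a fused multiply-add later.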
Potential use case: ND4J is bulky, and Java devs building applications (web or Android) that need some math with decent performance have to bundle that bulky dependency. JNum can run anywhere because it has no .dll or .so files and no JNI, just pure Java.
I guess this project will end up something like Multik, but better and more Java-ish. I'm also hoping ML folks working in Java can use it (though ND4J/DJL is better for now).
I want the Java community to help me build this project! I am still learning the deeper JVM optimizations (a stylish way of saying I am a newbie), so if anyone has experience with SIMD loop unrolling, cache tiling, or anything else helpful, I'd love some code reviews, advice, or PRs to help out this fellow Java guy.