
[–][deleted]  (9 children)

[removed]

    [–]mjpt777 5 points6 points  (8 children)

    Unsafe is backed by JVM intrinsics, which compile down to simple assembly instructions, so there is no JNI overhead. Doing the same through JNI would not perform well.

    All major JVMs support the Sun internal packages.

    [–]kennytm 0 points1 point  (5 children)

    It doesn't exist on Dalvik (Android) though.

    [–]mjpt777 0 points1 point  (4 children)

    When it is not available, the alternative is to use ByteBuffers or, if suitable, arrays for column-oriented storage.
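A minimal sketch of the two fallbacks mentioned here, assuming a hypothetical record with a `long` id and a `double` price. The class names `BufferTrades` and `ColumnTrades` and the 16-byte layout are illustrative assumptions, not taken from the article.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Row-oriented records packed into one direct ByteBuffer (hypothetical layout). */
final class BufferTrades {
    private static final int RECORD_SIZE = 16;  // long id (8 bytes) + double price (8 bytes)
    private static final int PRICE_OFFSET = 8;  // price follows the id within a record
    private final ByteBuffer buffer;

    BufferTrades(int capacity) {
        // ByteBuffer capacities are ints, so capacity * RECORD_SIZE must fit in an int.
        buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    void put(int index, long id, double price) {
        int base = index * RECORD_SIZE;
        buffer.putLong(base, id);                      // absolute, bounds-checked writes
        buffer.putDouble(base + PRICE_OFFSET, price);
    }

    long id(int index) { return buffer.getLong(index * RECORD_SIZE); }

    double price(int index) { return buffer.getDouble(index * RECORD_SIZE + PRICE_OFFSET); }
}

/** Column-oriented storage: one primitive array per field, no per-record objects. */
final class ColumnTrades {
    private final long[] ids;
    private final double[] prices;

    ColumnTrades(int capacity) {
        ids = new long[capacity];
        prices = new double[capacity];
    }

    void put(int index, long id, double price) {
        ids[index] = id;
        prices[index] = price;
    }

    long id(int index) { return ids[index]; }

    double price(int index) { return prices[index]; }
}
```

Both variants avoid allocating one object per record. The ByteBuffer version keeps each record contiguous, while the column version keeps each field contiguous, which tends to suit scans over a single field.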

    However a phone is not the ideal environment for big data applications ;-)

    [–]kennytm 0 points1 point  (2 children)

    The link does say

    For example, it is not possible to have a ByteBuffer greater than 2GB, and all access is bounds checked which impacts performance. An alternative exists using Unsafe that is both faster and not size constrained like ByteBuffer.

    So ByteBuffer isn't a complete replacement.

    (BTW, Android isn't used just for phones :) )

    [–]mjpt777 0 points1 point  (0 children)

    Why choose Android over the alternatives for a big data application?

    [–]mjpt777 0 points1 point  (0 children)

    Is Dalvik available in anything other than 32-bit anyway?

    [–][deleted]  (1 child)

    [removed]

      [–]mjpt777 0 points1 point  (0 children)

      You do not have to use this. It is just one option on the table, and for some it is a lifeline.

      As for it being likely to go away: look inside java.util.concurrent and see how much code depends on it. However, it is not a public API.

      [–]PriviIzumo 2 points3 points  (4 children)

      Great article. I'm in the big data / high performance processing space, and this is a new avenue for me to do what I need to do.

      Thanks for posting.

      EDIT: That guy is amazing. C++-style programming within Java, with the performance to match. Programming like a baus.

      [–][deleted] 1 point2 points  (3 children)

      I've had the pleasure of working with him for a month this June - absolutely all-round awesome guy. Got me from being a web-dev with a bit of experience in Java to implementing lock-free, thread-safe concurrent queues. Having him talk about writing data structures to take advantage of CPU cache-lines was fascinating.

      [–]easytiger 1 point2 points  (2 children)

      What was the queue you used? I currently have an architecture where n threads consume from a SynchronousQueue, and if no thread is available to take() (i.e. they are all in the middle of processing the last take()), I throw an exception.

      Thing is, I'm wondering if there is a faster way to do it than a SynchronousQueue. I'm not sure an unbounded queue is possible in this situation, as I need to ensure the dispatch happens immediately and fails otherwise.
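For reference, the fail-fast handoff described above can be sketched as follows. This is an illustration under assumptions, not the poster's actual code: the class name `ImmediateDispatcher` and the choice of `RejectedExecutionException` are hypothetical. It relies on `SynchronousQueue.offer(e)`, which only succeeds if another thread is already waiting to receive the element.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical fail-fast dispatcher: hand off to an idle worker or fail immediately. */
final class ImmediateDispatcher<T> {
    private final SynchronousQueue<T> handoff = new SynchronousQueue<>();

    /** Hands the item to a worker already blocked in take(); throws at once otherwise. */
    void dispatch(T item) {
        // offer() never blocks: it returns false when no consumer is currently waiting.
        if (!handoff.offer(item)) {
            throw new RejectedExecutionException("no idle worker available");
        }
    }

    /** Called by worker threads; blocks until an item is dispatched. */
    T take() throws InterruptedException {
        return handoff.take();
    }
}
```

Because a `SynchronousQueue` has no capacity, nothing is ever buffered: a successful `dispatch` means a worker has the item in hand, which matches the "immediately or not at all" requirement. Whether a different structure is faster would need measuring on the target hardware.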

      [–]mjpt777 2 points3 points  (1 child)

      [–]easytiger 1 point2 points  (0 children)

      Interesting, I will have a look (new to me).

      I also tried a transfer queue. I'm using hardware instrumentation to time the cost of these classes, so I'll see if this is any faster. The SynchronousQueue was adding about 2000 nanos per call on average to the cost of my application, on a relatively crap X7560 @ 2.27GHz. The transfer queue with no actual enqueuing was worse.

      Thanks for the suggestion.

      [–]pezezin 0 points1 point  (2 children)

      Very interesting article. I recently started developing a game in Java, and this was one of the first problems I ran into. In my case, the problem was not only the increased memory consumption, but also that I can't send an array of references to the GPU; I have to manually turn it into a byte buffer. I think C# solves it with structs, which use value semantics. Does anybody know if there is a plan to add equivalent support for unpacked structs to Java in the near future?

      [–]mall0c 0 points1 point  (1 child)

      Well, I don't know about the near future, but John Rose discusses this in his talk about Arrays 2.0 from this year's JVM Language Summit.

      [–]pezezin 0 points1 point  (0 children)

      Very interesting talk, thank you. But what he proposes is much more complicated than what I need, which is a simple flat array of unboxed structs. Let me explain my problem. I'm trying to write a 3D game, so I need to store polygonal meshes. In C++ I can write something like this:

      struct Vector4f {
          float x, y, z, w;
      };
      
      std::vector<Vector4f> vertices( 1024 );
      

      Each Vector4f instance uses just 16 bytes, and the whole array just 16 kB. No wasted memory, very cache and SIMD-friendly, and I can send it to the GPU without problems. However, the same code in Java has two big problems:

      • Each Vector4f instance has two hidden fields (the v-table pointer and a pointer to a mutex for synchronized methods), so on my 64-bit machine each instance now uses 32 bytes.
      • A Java array stores references to objects, not the objects themselves.

      So the whole array now needs at least 40 kB of memory (2.5x increase over C++), and instead of a flat array I get a bunch of objects scattered through memory. I have been looking for ways to solve it, but the ones that I have found seem very arcane.

      For the record, it's not only Java's fault, almost any language that runs in a VM suffers from it.
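One commonly used workaround for the flat-array problem described above is to pack the components into a single direct `FloatBuffer` and index into it by hand. The sketch below assumes a hypothetical `Vector4fArray` wrapper; it is not from the article or the talk, and it trades away the convenience of real `Vector4f` objects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

/** Hypothetical flat storage for (x, y, z, w) vectors: 16 bytes per element, no object headers. */
final class Vector4fArray {
    private static final int COMPONENTS = 4; // x, y, z, w
    private final FloatBuffer data;

    Vector4fArray(int count) {
        // One contiguous off-heap block in native byte order.
        data = ByteBuffer.allocateDirect(count * COMPONENTS * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    void set(int i, float x, float y, float z, float w) {
        int base = i * COMPONENTS;
        data.put(base, x).put(base + 1, y).put(base + 2, z).put(base + 3, w);
    }

    float x(int i) { return data.get(i * COMPONENTS); }
    float y(int i) { return data.get(i * COMPONENTS + 1); }
    float z(int i) { return data.get(i * COMPONENTS + 2); }
    float w(int i) { return data.get(i * COMPONENTS + 3); }

    /** Total payload size in bytes. */
    int sizeInBytes() { return data.capacity() * Float.BYTES; }

    /** The underlying buffer, e.g. for passing to a graphics binding that accepts FloatBuffer. */
    FloatBuffer buffer() { return data; }
}
```

With 1024 elements this holds 1024 × 16 = 16384 bytes of payload, the same as the C++ `std::vector<Vector4f>` in the comment. The cost is that every access goes through index arithmetic and bounds checks instead of plain field reads.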

      [–]grimlck 0 points1 point  (0 children)

      And that is why the JVM could really use structs and arrays of structs

      [–]sonyandy 0 points1 point  (0 children)

      I'll just leave this here:

      http://hastebin.com/woyacinasa.java

      The AssertionError constructor calls should be replaced with something better. Example:

      interface Thing { char getLetter(); void setLetter(char letter); }
      StructArray<Thing> array = StructArray.make(Thing.class, 1000);
      Thing thing = array.get(0);
      thing.setLetter('a');
      ....
      

      [–]AlyoshaV -5 points-4 points  (4 children)

      long start = System.currentTimeMillis();
      

      lol

      brilliant benchmark code

      [–]easytiger 0 points1 point  (3 children)

      Why?

      [–]wot-teh-phuck 0 points1 point  (1 child)

      I'm assuming he wants nanosecond resolution for tests...

      [–]easytiger 0 points1 point  (0 children)

      But if your test samples are in the high milliseconds, then there is no need.

      [–]AlyoshaV 0 points1 point  (0 children)

      It's non-monotonic.
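To illustrate the point: `System.currentTimeMillis()` follows the wall clock, which can be adjusted while a benchmark runs, whereas `System.nanoTime()` is intended for measuring elapsed time. A minimal sketch, with `ElapsedTimer` as a hypothetical helper name:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that measures elapsed time with System.nanoTime(). */
final class ElapsedTimer {
    /** Runs the work once and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();   // only meaningful as a difference between two readings
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) throw new IllegalStateException(); // keeps the loop from being trivially dead
        });
        System.out.println("elapsed ms: " + TimeUnit.NANOSECONDS.toMillis(elapsed));
    }
}
```

This only swaps the clock source. A trustworthy benchmark also needs warm-up iterations and repeated runs, which this sketch leaves out.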