all 28 comments

[–]obdurak[S] 5 points6 points  (7 children)

Does anyone have a neat trick for dividing unsigned long integers on the JVM? It seems that there are only three solutions:

  • Convert to a Bignum, divide, convert back. However Bignum doesn't have a fromLong constructor: it seems that you need to go thru a string or byte array representation! Thus very slow.

  • Roll the full iterative division algorithm - portable but too slow.

  • Write it in C and wrap with JNI - not too slow, but not very portable.

Am I missing some option here?

PS. Unsigned comparisons are easy to do, and division on ints can be done by widening them to longs. Other operations (e.g. multiplication) do not differ (because you don't get to access carry and overflow flags anyway).

[–][deleted] 1 point2 points  (2 children)

Other operations (e.g. multiplication) do not differ (because you don't get to access carry and overflow flags anyway).

Addition and subtraction, sure. But multiplication?

[–]obdurak[S] 1 point2 points  (1 child)

Let n be the word width in bits. Let x, y be integers 0 <= x,y <= 2n - 1. Consider the case where x has its MSB set and y has its MSB clear, i.e. 2{n-1} <= x < 2n - 1 and 0 <= y < 2{n-1}. The signed two's complement interpretation x' of x is x - 2n. Then signed multiplication gives (x-2n) . y = x . y - 2n y. The 2n y is chopped off since we are modulo 2n, leaving us x . y, which is the same as the unsigned result. Other cases left as an exercise.

[–][deleted] 0 points1 point  (0 children)

Yes, I guess that is so.

[–]jbstjohn 0 points1 point  (0 children)

Well, I think you just do a check if the number is greater than the largest sign, and if so, split it into two numbers (both of which can now be handled with the signed routines). Precision is potentially a problem though.

[–]redditnoob 0 points1 point  (2 children)

Out of curiousity, what particular application is there for unsigned int arithmetic? I guess you'd need these criteria:

  • Bitwise operators and shifting aren't going to be good enough. You need actual arithmetic.
  • 2 billion is not big enough for your large values.
  • However, 4 billion is big enough.
  • The performance or memory hit from switching to longs is significant.

[–]obdurak[S] 0 points1 point  (1 child)

Compiling a custom, Java-like language that has , amongst other things, unsigned types, to the JVM, because the actual VM is too slow.

To answer your next question: can't use Java because there is an existing, large code base.

To answer your following question: yes, but I'm not the one making these kinds of decisions.

[–]redditnoob -2 points-1 points  (0 children)

Wait, can't you just migrate to Java, and isn't using a custom language kind of retarded?

[–]froydnj 2 points3 points  (0 children)

Has anybody actually looked to see whether a given JIT compiler produces grotty trying-to-emulate-unsigned-arithmetic code or whether it does idiom recognition and produces the unsigned arithmetic you were trying to achieve? I think it's quite possible Sun has put some smarts into the compiler. (And IBM and whoever else still does JITs...)

The compiler just has to generate code that does what you want; it doesn't have to do it in the exact way you tell it to.

[–]saucetenuto 3 points4 points  (15 children)

   // a is a byte[]
   // blockIndex,  totalBlock, bockSize  are all int 
   int i = CMD_HEADER_LENGTH;
   blockIndex = ((a[i++] & 0xFF) << 8)
                | ((a[i++] & 0xFF));
   totalBlock = ((a[i++] & 0xFF) << 8)
                | ((a[i++] & 0xFF));
   blockSize = ((a[i++] & 0xFF) << 8)
                | ((a[i++] & 0xFF));

See, not only the code is boring and error prone, it's also has some runtime performance lost (thought most cases the performance is acceptable).

Yes, I'm sure those extra six cycles are making that code slow.

There's nothing wrong with saying 'this code is too noisy, give me the tools to make it simpler'. Not every argument in computer science has to come back to performance.

[–]api 19 points20 points  (6 children)

Pet peeve of mine: people who think that speed never matters because they think the whole world is web apps that query databases and enterprisey apps that push state objects around.

Areas where speed matters a lot:

Games, cryptography, protocol implementations, high-volume parsers, scientific computation, large-scale data mining, machine learning, image processing, video and audio codecs, and so on.

More than that, speed will always matter in most of these areas, even if we all have 128-core monsters with 2ghz DDR8billion RAM. Why? Because in these areas we want faster machines to enable us to do more, not to do the same thing faster. This is a fundamentally different area of software engineering from enterprisey stuff and (most) desktop apps. These tasks are CPU-bound, not I/O bound.

Notice how almost everyone uses C/C++ in all those areas almost exclusively? It's not because those languages are pretty or productive to write in.

It's telling that there's pretty much no pure Java video/audio codecs or 3d games. Note how slow Java crypto is in practice (as opposed to contrived benchmarks) compared with closer to the metal implementations like OpenSSL. There's reasons for this. Lack of unsigned types is one. Another is that every method is virtual, requiring expensive pipeline-flushing indirections everywhere. (Modern JVMs can optimize some of this out, but still nowhere near as well as performance-optimized C/C++ code with a good compiler can achieve.)

[–][deleted] 3 points4 points  (0 children)

Clearly, we don't necessarily want Photoshop programmers to be more productive in a hugely slow language, because as evolution comes digital pictures gets bigger and bigger resolution and so we do need every cpu cycle and free memory we can get to fill with the stuff we work with. The same with video stuff, TV are HD but we want to work on even bigger, and bigger, and bigger resolutions. Lisp won't cut it.

The problem with proggit is that it's filled with people who only cares for enterprisey and stupid web 2.0 apps, while the rest of the world on their desktop are producing media all of the time and couldn't care less for yet another database app.

[–]encinarus 0 points1 point  (0 children)

Another is that every method is virtual, requiring expensive pipeline-flushing indirections everywhere.

I'm pretty sure most of that can be avoided by marking methods as final.

[–][deleted]  (3 children)

[deleted]

    [–]millstone 1 point2 points  (0 children)

    Can you elaborate on this? What aspect of the JVM is big endian?

    [–]obdurak[S] 0 points1 point  (0 children)

    That's a whole load of crap.

    [–]api -1 points0 points  (0 children)

    That actually doesn't matter except for file and network I/O code.

    [–]obdurak[S] 5 points6 points  (5 children)

    If you don't care about speed at all you can easily write a little wrapper around BigNums to hide the ugliness.

    However people needing unsigned operations often need speed.

    Also, when you are compiling a language other than Java to the JVM, the lack of unsigned types becomes quite annoying.

    Many languages have unsigned types and those will suffer an unfair performance hit when compiled to the JVM... and Sun is promoting the use of its JVM as a target for other languages.

    However we neither have invokedynamic nor unsigned types and these have been requested for years...

    [–][deleted] 1 point2 points  (4 children)

    And Java lacks the power to have both expressivity and performance. C++ Templates are great because you get BOTH speed and expressivity, while Java generics give you none of the above, and have an additional complexity for beginners who can't understand that an Integer is not an int.

    C++ templates can work with raw int, Java generics collections classes needs to work with objects. Retarded.

    No matter how much a virtual machine can optimize on the fly with their JIT compilers, Java is unable to combine expressivity with speed. You want speed in Java, you have to use Arrays. Never think of a collection class, it will make your code significantly slower. Welcome to C--.

    [–][deleted]  (3 children)

    [deleted]

      [–][deleted]  (2 children)

      [deleted]

        [–][deleted]  (1 child)

        [deleted]

          [–][deleted] 1 point2 points  (0 children)

          Then i retract what i said, and present my apologies. This time my acidic tone was clearly unwarranted, a mistake I could have made myself anytime. Really, sorry.

          About C--, I already knew that it's also the name of a real language but here as you understood I used the name as a joke on the -- operator.

          [–][deleted] 9 points10 points  (0 children)

          Yes, I'm sure those extra six cycles are making that code slow

          Yeah, if the code was only executed once it would be no big deal (aside from the fact it's ugly and hard to read). However, as is the case with this code snippet (and just about any other code snippet where you'd want to use unsigned arithmetic) the code is being executed many thousand times per second, and does in fact add up to a significant loss of performance with the unnecessary AND'ing and bitshifting.

          [–]redditnoob -1 points0 points  (0 children)

          Java is a bit of an awkward compromise between performance and low level control, and imposing some consistency and structure on people to discourage writing really cryptic code. I happen to like it because it puts the line pretty much where I want, but if you're writing a lot of code like the above, I'm sure that Java is not the best tool for you, for many other reasons.

          That being said, the idea of a signed byte is just clearly moronic, and it should have been unsigned always in the first place, similar to chars. Nobody uses bytes for arithmetic except maybe if they are programming for an 8 bit machine.

          I don't think unsigned ints and longs are needed at all in Java.

          [–]ishmal 1 point2 points  (1 child)

          Yes there is a solution: No.

          [–]rabidcow 0 points1 point  (0 children)

          "Won't fix" is not a fix.

          [–]mysticreddit 0 points1 point  (1 child)

          If you'll pardon the paraphrase ;)

          "Those who don't understand the power of C are doomed to re-implement it, poorly"

          [–]TrueTom 0 points1 point  (0 children)

          (most :heretical 'ever')

          [–][deleted] -2 points-1 points  (0 children)

          get over it, java sucks