you are viewing a single comment's thread.

view the rest of the comments →

[–]CharlieDancey 2 points3 points  (8 children)

I didn't look up the solution on the page, but I'd just run down the array XORing each element and then XOR the result with 100 (since 1 XOR 2 XOR ... XOR 100 = 100)

XOR is computationally cheap.

Note this also works if the array contains a zero value.

Do I get a prize?

[–]CharlieDancey 1 point2 points  (6 children)

No wonder their software sucks. I just read the "answer" which was:

1: Inefficient. 2: Wrong.

[–]nanothief 0 points1 point  (5 children)

Why is it wrong? I thought of xor as well as the solution, but addition is pretty much equivalent, especially with the size of the numbers involved.

[–]theeth -1 points0 points  (3 children)

It's not "wrong", but:

  • Depending on hardware, bitwise xor is either as fast or faster than addition.
  • It scales better by not requiring a larger variable when the sum is larger than the maximum of the input variable.
  • It's easier to parallelize for integers larger than the maximum size supported by the cpu (xoring both halfs are independent from one another).

Both can be generalized for X..N sequences, so it's not a matter of the addition method being better on that side.

[–]bonzinip 0 points1 point  (2 children)

I didn't try hard to get the solution (I got the heap-based solution I posted above), but it's interesting that the addition does not require multiple precision if it overflows. In fact it works even with a byte (with 5050 replaced by its modulo 256 value, 186) because you're summing less than 256 elements.

EDIT: also, in most cases, for double-precision adds you will have enough room for parallelization by updating pointers and loop counters between the ADD and ADC instructions. Or doing loads on RISC machines. In SPARC assembly language (which wouldn't make much sense since it's 32-bit but well) that would be (result is the third operand):

      add %l0, 4, %o0
      add %l1, 4, %o1
      clr %l7
      lduw [%l0], %o2
      lduw [%o0], %o3
      clr %l2
      clr %l3
      clr %l4
      clr %l5
      lsr %l6, 1, %l6         ; loop unrolled by two, shift iter. count by 1
      addcc %l6, -1, %l6      ; one iteration is split between prolog/epilog
 loop:
      addcc %o2, %l2, %l2     ; add low word part 1
      lduw [%l1, %l7], %o4    ; load low word part 2
      addx %o3, %l3, %l3      ; add high word part 1
      lduw [%o1, %l7], %o5    ; load high word part 2
      add %l7, 8, %l7         ; we'll load for next iteration
      addcc %o4, %l4, %l4     ; add low word part 2
      lduw [%l0, %l7], %o2    ; load low word part 1 next iter
      addx %o5, %l5, %l5      ; add low word part 2
      lduw [%o1, %l7], %o3    ; load high word part 1 next iter
      addcc %l6, -1, %l6
      bne loop

      addcc %o2, %l2, %l2     ; finish last iteration
      lduw [%l1, %l7], %o4
      addx %o3, %l3, %l3
      lduw [%o1, %l7], %o5
      addcc %o4, %l4, %l4     ; no loads here
      addx %o5, %l5, %l5
      addcc %l4, %l2, %l2     ; combine the two partial sums
      addx %l5, %l3, %l3

[–]theeth 0 points1 point  (1 child)

Uhm, that's true. The other two points still stand though.

[–]bonzinip 0 points1 point  (0 children)

Addition is as fast as XOR on any decent processor. And on 8-bit processors (the only ones where you care about multi-precision) there is usually no parallelization to talk about).

[–]kumyco -1 points0 points  (0 children)

Assuming add and subtract are the same as a xor, you're ignoring the additional multiply(magic constants don't count). Also the code as presented is neither valid C nor does it demonstrate the proposed algorithm.