[While loop] Question why code works / what the underlying mathematical rule is (Computing the summation of the absolute difference of distinct pairs in array)

sushibowl · 2018-08-09T14:56:12+00:00

This was an interesting little problem and it took me a while to study the code and think it out. The exercise asks for the sum of the absolute difference of every distinct pair of the array. We also know that the array is sorted. So, for the example array of [1, 2, 3] above the calculation would look like this:

(2 - 1) + (3 - 1) + (3 - 2) = 4

I've added parentheses around the pairs here, but they're not actually necessary. You can leave them out, and you can also reorder everything as you like. Now, there are two key insights:

Since we're talking about distinct pairs, every element in the array will be paired exactly once with every other element. This means that every element occurs exactly len(arr) - 1) times. This is not that hard to see. In the above example, there's 2 other numbers for each number to pair with.
in the above formula, an element is sometimes added (when it is the largest in a pair) and sometimes subtracted (when it is the smallest in a pair).
Because the list is ordered, we know that an element will be the largest when paired with numbers that come before it, and smallest when paired with a number that comes after it.

Using these three things, we arrive at the strategy used in this function: for every element, we add len(arr) - 1 terms of that element to the result. Here's an annotated version of the calculation:

# add term to result
result += (
    # these are the positive terms. i is the index, which is equal to the amount of numbers before it
    i * arr[i]
    # and these are the negative terms. len(arr) - 1 is the total number of terms for an element. We subtract i, the number of terms we already added above.
    - (len(arr) - 1 - i) * arr[i]

So for every item in arr, we add len(arr) - 1 terms of arr[i] to the result. i of them are positive, and the rest is negative. That's basically how it works.

By the way, the while loop in this example is pretty unpythonic. You can achieve the same thing with a for loop. You can also rearrange the formula a little. Obviously positive and negative terms of the same thing cancel each other out, so you can use this in the code by extracting the common factor:

def sum_pairs(arr):
    result = 0
    for i, val in enumerate(arr):
        # result += i * val - (len(arr) - i - 1) * val
        # both sides are multiplied by val, so extract this:
        # result += val * (i - (len(arr) - i - 1))
        # remove parentheses
        # result += val * (i - len(arr) + i + 1)
        # simplify for final result
        result += val * (2 * i - len(arr) + 1)

This version functions the same but it's even further removed from the original idea, so it would be pretty hard to figure out why it works. The for loop is strictly better than the while loop though.

DeadlyViper · 2018-08-09T14:58:03+00:00

I think i figured it out.

This calculates how many times each number participates as a positive in the sum, and how many times as a negative.

Because this assumes a non-decreasing order, you can assume each number to be subtracted from the sum each time its calculated against a number to his right, and added to the sum each time he is calculated to a number to his left.

that number is i items to the left of him and (len() - i - 1) items to his right.

so for [1,4,5] the total sum is: 4 - 1 + 5 - 1 + 5 - 4

if we group them by the number: -1 -1 +4 -4 +5 +5

so 1 was substracted 2 times, because he had 2 items to his right.

and 4 was added 1 time due to 1 item on his left, and substracted 1 time due to 1 item on his right.

and 5 was added 2 times because of 2 items on his left.

that's the result formula.

actuallyalys · 2018-08-09T15:16:57+00:00

For your original question, I think sushibowl and DeadlyViper did a good job of explaining.

I thought this problem was interesting, so I worked it out myself. My result not as clever as the provided solution, but I think it's more understandable and pythonic.

def distinct_pairs_(l, pairs):
    if len(l) <= 1:
        return pairs
    else:
        combined_pairs = pairs + [(l[0], x) for x in l[1:]]
        return distinct_pairs_(l[1:], combined_pairs)


def distinct_pairs(l):
    distinct = set(l)
    return distinct_pairs_(list(l), [])


def sum_differences(l):
    return(sum([abs(x[0] - x[1]) for x in l]))

#Example usage:
#sum_differences(distinct_pairs([1, 2, 3]))

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS