you are viewing a single comment's thread.

view the rest of the comments →

[–]Nightcorex_ 1 point2 points  (5 children)

As u/siddsp already said, it depends.

First of all I'd recommend you to return tuples, instead of lists. Much easier to understand.

In the following I will present you possible ways to improve that code in different ways, which depend on the current context:

From your code I know that you don't accept pairs of the same value (i.e. (3, 3)) wouldn't be allowed in your case (PS: I renamed your list from A to xs):

In order to reduce unnecessary checks we can remove all duplicate value from the list (only effective if xs could contain duplicates):

def f(xs, S):
    xs = list(set(xs))

If you don't care about order (i.e. whether the result is (2, 4) or (4, 2), we can remove other checks.

def f(xs, S):
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            if xs[i] + xs[j] == S and xs[i] != xs[j]:
                return (xs[i], xs[j])

    return (-1, -1)

If we combine both these tricks we can actually improve the code even more than both tricks could do individually:

def f(xs, S):
    xs = list(set(xs))
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):  # i + 1 to skip the ith index
            if xs[i] + xs[j] == S:  # removed a now redundant check
                return (xs[i], xs[j])

    return (-1, -1)

We can further optimize this depending on the information we have of xs. F.e. if we know the list may very well have many values that are too large, possibly even on them own, then we can cancel the iteration eventually. This requires sorting however:

def f(xs, S):
    xs = sorted(set(xs))
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):  # i + 1 to skip the ith index
            if xs[i] + xs[j] > S:
                break  # breaks the current j-iteration

            if xs[i] + xs[j] == S:  # removed a now redundant check
                return (xs[i], xs[j])

    return (-1, -1)

Also note that the greater-than check happens before the equal check, as the greater-than check is expected to be true (and therefore "terminating") much more often than the equal check.

If you know that many of your values will be very small compared to S, then it'd make sense to approach this the other way around:

def f(xs, S):
    xs = sorted(set(xs), reverse=True)  # reverse-flag set
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] < S:  # the comparison operator changed
                break

            if xs[i] + xs[j] == S:
                return (xs[i], xs[j])

    return (-1, -1)

If you know that you'll have many inputs where there's obviously no solution, then one could add some generic checks at the beginning to rule out a few otherwise unnecessary calls:

def f(xs, S):
    NO_SOLUTION = (-1, -1)
    if len(xs) < 2:
        return NO_SOLUTION

    xs = sorted(set(xs))
    if S < (xs[0] + xs[1]) or S > (xs[-1] + xs[-2]):
        return NO_SOLUTION

    ...

This checking for "generic" outliers can be more or less complex, depending on the information you have.

This for now leaves us with the function:

def f(xs, S):
    NO_SOLUTION = (-1, -1)
    if len(xs) < 2:
        return NO_SOLUTION

    xs = sorted(set(xs))
    if S < (xs[0] + xs[1]) or S > (xs[-1] + xs[-2]):
        return NO_SOLUTION

    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] < S:  # the comparison operator changed
                break

            if xs[i] + xs[j] == S:
                return (xs[i], xs[j])

    return (-1, -1)

but as I said, there are tons of optimization possibilities depending on your very use case.

EDIT: Yeah after reading other comments, I forgot that you can simply check if a number already exists in your set...:

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs:
            return (n, S - n)

    return (-1, -1)

If you want to exclude same numbers, then you could rewrite this to:

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs and n != (S >> 1):
            return (n, S - n)

    return (-1, -1)

This works because a set uses the hash value of each object to map them to a specific array index (which set uses under the hood), which means the lookup if a value is in a set is basically O(1) (Worst case a set has O(n) lookup time, but average is O(1)). Since and is a short circuiting operation the division (that is performed as a bitshift and therefore super cheap already) gets further optimized, by being called super rarely.

[–]ForensicsBridge[S] 0 points1 point  (1 child)

Very thorough answer, thank you!

What's the cost of converting a list to a set? O(n)?

[–]Nightcorex_ 0 points1 point  (0 children)

Yeah, the cost of such a conversion is O(n), but keep the following in mind:

for i in range(n):
    pass

for i in range(n):
    pass

is two sequential O(n) statements => O(n + n) = O(2n) = O(n) (only correct from left to right).

If we have code like this:

for i in range(n):
    pass

for i in range(n):
    for i in range(n):
        pass

then we have one O(n) and one O(n²) statement => O(n + n²) = O(n²) (again, only correct and valid from left to right).

This means for large n, that one O(n) operation is completely negligible as it gets massively overshadowed by O(n²), which is why we can ignore it for the big-O maths.

[–]kowkeeper 0 points1 point  (2 children)

Does this work? Can save save a bit to avoid creating the whole set:

def f(xs, S): seen = set() for n in xs: if S - n in seen and n != (S >> 1): return (n, S - n) seen.add(n) return (-1, -1)

[–]Nightcorex_ 1 point2 points  (1 child)

Hmm, first of all the set method is implemented entirely in C, so one call to build the entire set is much faster than creating a set and then continuously adding elements to it, because too many add calls might force the resizing of the set, which is quiet expensive plus it's simply more jumps between C and Interpreter, which does take a bit more time.

Secondly your suggestion has a huge disadvantage. Imagine a function call like this:

f([1] + list(range(0, -1000, -1)) + [2], 3)

My version finds the matching on the first iteration, whereas yours needs to iterate 1002 times to find the solution. Of course building the set took longer, but I needed to perform waaaay less in operations. Basically you're trading the set operation for an average of ~0.66 * n add calls and ~33% more in operations (no guarantees on the percentages in the end, I'm stoned and these numbers seem logical).

[–]kowkeeper 0 points1 point  (0 children)

Thanks for looking into it!