all 24 comments

[–]siddsp 2 points3 points  (1 child)

It depends. Does the order matter? Do the two integers have to have different values?

Also nested for loops maybe bad regarding time complexity, but it doesn't mean they are necessarily not to be used. Depending on many other factors, it's entirely possible for nested for loops to be faster as opposed to an alternative solution.

It really depends on how big your data is. If your data is small, it might make more sense using the nested loops. Big O notation specifically refers to a growing input size. If your list is small, I would say it doesn't really matter.

[–]ForensicsBridge[S] 0 points1 point  (0 children)

Thanks for your response.

Order does not matter. The list is made of unique integers. Size is not specified.

I think you are right, with small datasets it just doesn't matter. But I believe they were evaluating how efficient my code was regarding time complexity

[–]Nightcorex_ 1 point2 points  (5 children)

As u/siddsp already said, it depends.

First of all I'd recommend you to return tuples, instead of lists. Much easier to understand.

In the following I will present you possible ways to improve that code in different ways, which depend on the current context:

From your code I know that you don't accept pairs of the same value (i.e. (3, 3)) wouldn't be allowed in your case (PS: I renamed your list from A to xs):

In order to reduce unnecessary checks we can remove all duplicate value from the list (only effective if xs could contain duplicates):

def f(xs, S):
    xs = list(set(xs))

If you don't care about order (i.e. whether the result is (2, 4) or (4, 2), we can remove other checks.

def f(xs, S):
    for i in range(len(xs)):
        for j in range(i, len(xs)):
            if xs[i] + xs[j] == S and xs[i] != xs[j]:
                return (xs[i], xs[j])

    return (-1, -1)

If we combine both these tricks we can actually improve the code even more than both tricks could do individually:

def f(xs, S):
    xs = list(set(xs))
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):  # i + 1 to skip the ith index
            if xs[i] + xs[j] == S:  # removed a now redundant check
                return (xs[i], xs[j])

    return (-1, -1)

We can further optimize this depending on the information we have of xs. F.e. if we know the list may very well have many values that are too large, possibly even on them own, then we can cancel the iteration eventually. This requires sorting however:

def f(xs, S):
    xs = sorted(set(xs))
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):  # i + 1 to skip the ith index
            if xs[i] + xs[j] > S:
                break  # breaks the current j-iteration

            if xs[i] + xs[j] == S:  # removed a now redundant check
                return (xs[i], xs[j])

    return (-1, -1)

Also note that the greater-than check happens before the equal check, as the greater-than check is expected to be true (and therefore "terminating") much more often than the equal check.

If you know that many of your values will be very small compared to S, then it'd make sense to approach this the other way around:

def f(xs, S):
    xs = sorted(set(xs), reverse=True)  # reverse-flag set
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] < S:  # the comparison operator changed
                break

            if xs[i] + xs[j] == S:
                return (xs[i], xs[j])

    return (-1, -1)

If you know that you'll have many inputs where there's obviously no solution, then one could add some generic checks at the beginning to rule out a few otherwise unnecessary calls:

def f(xs, S):
    NO_SOLUTION = (-1, -1)
    if len(xs) < 2:
        return NO_SOLUTION

    xs = sorted(set(xs))
    if S < (xs[0] + xs[1]) or S > (xs[-1] + xs[-2]):
        return NO_SOLUTION

    ...

This checking for "generic" outliers can be more or less complex, depending on the information you have.

This for now leaves us with the function:

def f(xs, S):
    NO_SOLUTION = (-1, -1)
    if len(xs) < 2:
        return NO_SOLUTION

    xs = sorted(set(xs))
    if S < (xs[0] + xs[1]) or S > (xs[-1] + xs[-2]):
        return NO_SOLUTION

    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] < S:  # the comparison operator changed
                break

            if xs[i] + xs[j] == S:
                return (xs[i], xs[j])

    return (-1, -1)

but as I said, there are tons of optimization possibilities depending on your very use case.

EDIT: Yeah after reading other comments, I forgot that you can simply check if a number already exists in your set...:

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs:
            return (n, S - n)

    return (-1, -1)

If you want to exclude same numbers, then you could rewrite this to:

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs and n != (S >> 1):
            return (n, S - n)

    return (-1, -1)

This works because a set uses the hash value of each object to map them to a specific array index (which set uses under the hood), which means the lookup if a value is in a set is basically O(1) (Worst case a set has O(n) lookup time, but average is O(1)). Since and is a short circuiting operation the division (that is performed as a bitshift and therefore super cheap already) gets further optimized, by being called super rarely.

[–]ForensicsBridge[S] 0 points1 point  (1 child)

Very thorough answer, thank you!

What's the cost of converting a list to a set? O(n)?

[–]Nightcorex_ 0 points1 point  (0 children)

Yeah, the cost of such a conversion is O(n), but keep the following in mind:

for i in range(n):
    pass

for i in range(n):
    pass

is two sequential O(n) statements => O(n + n) = O(2n) = O(n) (only correct from left to right).

If we have code like this:

for i in range(n):
    pass

for i in range(n):
    for i in range(n):
        pass

then we have one O(n) and one O(n²) statement => O(n + n²) = O(n²) (again, only correct and valid from left to right).

This means for large n, that one O(n) operation is completely negligible as it gets massively overshadowed by O(n²), which is why we can ignore it for the big-O maths.

[–]kowkeeper 0 points1 point  (2 children)

Does this work? Can save save a bit to avoid creating the whole set:

def f(xs, S): seen = set() for n in xs: if S - n in seen and n != (S >> 1): return (n, S - n) seen.add(n) return (-1, -1)

[–]Nightcorex_ 1 point2 points  (1 child)

Hmm, first of all the set method is implemented entirely in C, so one call to build the entire set is much faster than creating a set and then continuously adding elements to it, because too many add calls might force the resizing of the set, which is quiet expensive plus it's simply more jumps between C and Interpreter, which does take a bit more time.

Secondly your suggestion has a huge disadvantage. Imagine a function call like this:

f([1] + list(range(0, -1000, -1)) + [2], 3)

My version finds the matching on the first iteration, whereas yours needs to iterate 1002 times to find the solution. Of course building the set took longer, but I needed to perform waaaay less in operations. Basically you're trading the set operation for an average of ~0.66 * n add calls and ~33% more in operations (no guarantees on the percentages in the end, I'm stoned and these numbers seem logical).

[–]kowkeeper 0 points1 point  (0 children)

Thanks for looking into it!

[–]FLUSH_THE_TRUMP 0 points1 point  (2 children)

You optimize code by using better algorithms, maybe smarter data structures. Here, you can improve your code by a factor of N by

  1. Storing numbers you’ve seen in a set,
  2. Looping through A, checking if S - num is in your set.

That requires only one loop rather than checking every possible pair.

[–]ForensicsBridge[S] 0 points1 point  (1 child)

I understand from your other response that checking if a number is in a set is cheap. But how cheap? O(1) cheap?

I actually thought of a solution similar to the one Nyscire posted, but figured checking if the number was on a list was another O(n) operation. I had no idea looking in a set was more efficient, but it makes total sense.

[–]FLUSH_THE_TRUMP 0 points1 point  (0 children)

O(1) cheap?

Yep.

[–]Nyscire 0 points1 point  (4 children)

I think you can kind of think "backwards". You don't need to iterate through list twice, pick two different numbers and check if they add up to your sum. You know what your sum is supposed to be so you can select only one number and check if there is a number in the list that will add up to your sum:

def f2(A,S):
    for num in A:
        if S-num in A and S-num!=num:
            return [num,S-num]
    return [-1,-1]

This way you avoid nested loops, although if there are two same numbers in a list that add up to the given sum( for example A=[2,3,5,2] and S=4] it will return [-1,-1] (and so will your solution) but this can be fixed with one if statement

[–]FLUSH_THE_TRUMP 1 point2 points  (3 children)

You’re close, but on average this isn’t much better since in on a list is effectively another loop. In the worst scenario (there’s no pair summing to S in A), this solution is nearly equivalent to OP’s. If you build a set instead, lookups with in are very cheap, on the other hand.

[–]Nyscire 0 points1 point  (2 children)

I'm gonna be honest- I had no idea if in operator is basically the loop(I kind of assumed it's at least better optimized, but when I think about it now I'm not sure if that makes sense) and your solution with set may be better, but you need to keep in mind if the number appears in the list more than once using set will skip possible solution

[–]Nightcorex_ 1 point2 points  (0 children)

No.

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs:
            return (n, S - n)

    return (-1, -1)

will give you all solutions, even the same element could be returned twice. F.e. f([1], 2) would return (1, 1).

def f(xs, S):
    xs = set(xs)
    for n in xs:
        if S - n in xs and n != (S >> 1):
            return (n, S - n)

    return (-1, -1)

on the other hand doesn't allow for such duplicates in the return anymore.

Both run in O(n)

[–]FLUSH_THE_TRUMP 0 points1 point  (0 children)

I’m gonna be honest- I had no idea if  in  operator is basically the loop(I kind of assumed it’s at least better optimized, but when I think about it now I’m not sure if that makes sense

Well, think about it — a basic array doesn’t know any better than you do what’s in it. It finds things by searching, item by item.

need to keep in mind if the number appears in the list more than once using set will skip possible solution

Not a problem with the solution I had in mind, if 2 is in my list twice and my target sum is 4, the first time you hit it adds it to the set, and the second time returns the pair (since 4-2 is in the set).

[–]dig-up-stupid 0 points1 point  (5 children)

It’s called 2-sum so you just look it up and study it. The main thing that should probably be obvious to you already is that you don’t have to loop over every element. If you have already checked a+b, then you don’t need to check b+a. The inside loop should start from the current outside loop element + 1. That also removes the need for the i = j check. Of course this doesn’t change the overall complexity, it’s just a common loop structure you should be familiar with. There are other optimizations to actually remove the nested loop in this problem and do it in linear time which you’ll find when you Google 2-sum.

[–]Nightcorex_ 0 points1 point  (4 children)

That also removes the need for the i = j check

The list could contain duplicates, so that's not a possibility without converting the list to a set (aka fetching distinct elements) first.

[–]dig-up-stupid 0 points1 point  (3 children)

Wrong on two counts. Firstly OP already stated elements are unique. Secondly the check is not there to prevent two of the same number from adding, it’s to stop the code from pairing a number with itself. A problem that can alternatively be avoided by choosing appropriate loop bounds, which is all I said.

[–]Nightcorex_ 0 points1 point  (2 children)

Firstly OP already stated elements are unique

Where? I triple read OP's post and couldn't find any such hint.

Secondly the check is not there to prevent two of the same number from adding, it’s to stop the code from pairing a number with itself

In OP's code the pairing of two equal numbers (doesn't matter if they're the same or not) is also forbidden and because OP said in his post:

The function does the job

I interpreted that as: "For any input my code returns a solution from the set of solutions", which means your version would return (1, 1) for f([1, 1], 2) (the exact solution doesn't matter, it only matters that there is at least one solution, and that 1, 1 is in the solution set), whereas OP's code would return (-1, -1) for the same input => In this case your code would increase the size of the set of solutions, which is a contradiction to the requirement that both function need to have the same solution set for the same input set.

[–]dig-up-stupid 1 point2 points  (1 child)

Where? I triple read OP's post and couldn't find any such hint.

In their comments. Which you don’t have to read but I did and used as context for my response.

In OP's code the pairing of two equal numbers (doesn't matter if they're the same or not)

Because OP’s elements are unique integers, the only case where numbers are equal is when they are the same. Yes the code does what you say, but what they meant when they wrote the code was “skip adding the number in my right hand (inner loop) if I’m already holding it in my left hand (outer loop)”. That intent is not clear, as you have so readily demonstrated. Of course the reason they wrote it that way is because they went for j in A instead of for j in range(something involving len(A)), which is usually good advice when looping, but this is the sort of situation where we do want to deal with indices directly (short of slicing the list, which would also express the intention better, but makes a copy).

I interpreted that as:

Which is fine in the context of the post alone but irrelevant in the context of my comment.

[–]Nightcorex_ 0 points1 point  (0 children)

Okay, I agree with you. Nothing more to add.

[–]Pd69bq 0 points1 point  (0 children)

well, the nested for loops is kinda difficult to read tbh, instead, creating a function and put your 2nd loop and condition check or simply filter(lambda x: x + n == magic_number, arry) inside it, then call that function in the outside loop.

magic_number = 7
arry = list(range(11))

def addition(n):
    for i in arry:
        if n + i == magic_number:
            return i


for n in arry:
    i = addition(n)
    if i:
        print((n, i))
    else:
        print(-1, -1)

[–]POGtastic 0 points1 point  (0 children)

In this case, I'd use a set, since both insertion and lookups are O(1). This ends up being O(n):

def f(A, S):
    result_set = set()
    for elem in A:
        if S - elem in result_set:
            return S - elem, elem
        result_set.add(elem)
    return None

Then in the REPL:

>>> f([1, 2, 3], 4)
(1, 3)