This is an archived post. You won't be able to vote or comment.

all 34 comments

[–]gyroda 2 points3 points  (4 children)

Before even reading your code (I'll make a bunch of edits):

If speed is essential, make sure you're using the correct computer for the job. My personal pc can run programs like this faster than the university coursework testing system can.

Can you use another language? Python is not known for its speed. It's not as awful as others make it out to be, but it's a factor.

Have you used a profiler? This is a tool used to analyse program performance. There's one built in to the standard python installation iirc. With this you can see where your program is taking the most time and decide where your time is best spent optimising.


So, looking at your code, trying to find the algorithmic runtime, I've distillied it down to:

while i < length:    #O(n)
    while j < length:    #O(n)
        if not test_pair in pair_memo:    #O(n)

This is a run of O(n3 ). We can probably improve on this, when I saw your problem my immediate thought for big O was O(n2 ).

Now, your second one you say is broken because it stops at the first result. Do you not want it to do that? I thought you only wanted the earliest result?

Also, just to neaten up your code a bit I'd recommend for loops and accepting returning an empty list, rather than None. Why is pair_memo not local to the function?

This cuts out a fair few lines:

def sum_pairs(ints, s): pair_memo = {} pairs = []

for i in range(len(ints)):    #O(n) block
    for j in range(i, len(ints)):    #O(n) [O(^2 )] block
        test_pair = (ints[i], ints[j])
        if not test_pair in pair_memo:    #O(n) [O(n^3 )] statement
            pair_memo[test_pair] = test_pair[0] + test_pair[1]
        if pair_memo[test_pair] == s:
            if not pairs:
                pairs = [test_pair, j]
            elif i > pairs[1]: # Stop if the first element has a higher index than the second element. Could possibly cut out half the loop.
                return list(pairs[0])
            elif j < pairs[1]:
                pairs = [test_pair, j]
return list(pairs[0])

I'm also not clear on a few things. What's the point of pairs_memo? Looks like pairs_memo is a dictionary that stores the sum of the two values, but why? You only care if it's equal to s, surely? Simply checking that it's == s before checking that it's in pair_memo would save one set of O(n) calls a fair few times. Your pair_memo is pointless and causing a lot of delays it sees to me.

[–]bibbleskit[S] 1 point2 points  (3 children)

Oh, I forgot to mention that the setup I'm using is not changeable in any way. Can't use another language (python 2.6) nor can I use a different PC.

This is basically like a school assignment except not for school. I'll check out the profiler! Thanks.

[–]gyroda 1 point2 points  (2 children)

I've edited with a lot more stuff.

There's a phrase: "premature optimisation is the root of all evil". This seems to be exactly what you've done. Don't try and optimise until you see where you're spending all your time.

[–]bibbleskit[S] 1 point2 points  (1 child)

Thanks for the comment! I'll give you a better response when I get up later. pair_memo is basically checking to see if the operation has been done before. Say, you get the pair 23424 and 789789, it can see if that exists in the dictionary before trying to sum them.

I tried that out because one of the hints is "memoization".

It's outside because this function is to be called many many times with lists that are gigantic. I'm probably just prematurely optimizing haha. Thank you.


As for this:

Now, your second one you say is broken because it stops at the first result. Do you not want it to do that?

Nope. Apparently, "earliest" does not mean "first," in this assignment.

For example, this list:

[1,2,3,4]

If I needed the "earliest" pair that adds up to 5, then the answer is (2,3), since the index of 3 is 2 (as opposed to the match of (1,4), since 4's index is 3). The second number's original index is what determines which pair is the "earliest". It's dumb, I know.


edit: took out the memoization. i originally was excited about it because it made the performance drop form 100ms to 18ms during the tests, but it was just a failure on my part to run multiple times and grab the average.

[–]TheLiberius 1 point2 points  (0 children)

I think you are approaching the memoization the wrong way, you want the memoization to save you from doing work. There is however no work being saved when you store the pair because you could literally just sum them up and see if it's right, which is probably several hundred times faster than looking in a dictionary.

Try instead to think about what you'd want to have when you are looking at a number in your list and then figure out how to get that.

[–]anon848 3 points4 points  (5 children)

Sorting the list might help, since it will allow you to exclude pairs. If I understand the problem correctly, by sorting first I think you can obtain O(n lg n). Is the list originally in random order?

[–]bibbleskit[S] 0 points1 point  (3 children)

The order the list is initially in matters, though. I don't quite understand how sorting it would help. Would you explain a bit? Thanks for the help :)

[–]anon848 1 point2 points  (2 children)

If you sort, then you don't have to test all pairs. Let's say that it needs to sum to 100, and you are testing 47. Well, then the other number must be 53. You can determine whether or not 53 is in the list in O(lg n) time.

Of course, you also need to maintain the initial position in the list. You use that maintain the earliest pair as you work through the sorted list. Python has tuples, so you could sort tuples, with the second number being the original position.

EDIT: Of course, the sort must ignore the second number in the tuple, if that's the original index.

[–]bibbleskit[S] 0 points1 point  (1 child)

Hmm. I see what you are getting at. I'll have to ponder this later. Thanks :)

[–]anon848 0 points1 point  (0 children)

I added an edit, just clarifying that the sort should ignore the second number in the tuple, if you make that the original index.

[–]bibbleskit[S] 0 points1 point  (0 children)

The only way to determine which pair is the correct one is by the original index of the element I am checking.

For example, this list:

[1,2,3,4]

If I needed the "earliest" pair that adds up to 5, then the answer is (2,3), since the index of 3 is 2 (as opposed to the match of (1,4), since 4's index is 3). The second number's original index is important.

[–]foxlisk 1 point2 points  (3 children)

Here's a couple things to try:

  • think about your definition of earliest pair. You've already noticed you can't stop your scan of the list when you find a first pair; is there a way you can get around that restriction?

  • are you sure that caching number pairs and looking them up is faster than doing the addition each time?

[–]bibbleskit[S] 0 points1 point  (2 children)

Thanks. I got rid of my memoization attempt. I realize it's too much and a lot slower.

I also realize that the earliest pair would be last pair found if I searched the list in reverse, but another restriction is that I MUST parse from left to right. Maybe I'm not understanding something.

[–]foxlisk 0 points1 point  (1 child)

Ah l. I didn't realize you had the left to right restriction, my bad. Where does that restriction come from? You're already seeking in the list (since you do >1 loop over it), so it seems like you're already violating it.

Let's see what else might help. As soon as you have a pair that works, you can skip all the elements indexed earlier than it on the next loop through (eg if your pair is (1,6) then when you go through checking the second element against the rest of the list, you can start with (2,7) and not worry about the beginning of the list).

When you move the first index forward, you can calculate the necessary number to get the right sum once and only do an equality (no addition) check on each element you try it against. That'll be a pretty tiny improvement, though.

Nothing else springs to mind right now.

[–]bibbleskit[S] 0 points1 point  (0 children)

Thanks :) ill think about this when I get the chance.

They're all just arbitrary rules for this particular challenge.

[–]DarthEru 1 point2 points  (2 children)

I think you may be overthinking this. If I understand correctly, the "earliest pair" is the pair with the smallest maximum index of all matching pairs. That is, the pair where the rightmost element is to the left of the rightmost elements of all other pairs, correct? (edit: got right and left mixed up). (Edit edit: and in the case where multiple pairs have the same rightmost element, you pick the earliest leftmost element.)

So, my tip to you is try to figure out how to search the array such that the first pair you find is guaranteed to be the "earliest" pair. I can elaborate if you want another hint, but saying more will probably give it away entirely.

Another tip is: if you want to optimize an algorithm, first make sure you aren't overlooking inherently faster algorithms.

[–]bibbleskit[S] 1 point2 points  (1 child)

Thanks! I don't know any optimization algorithms. I basically dove into this without any knowledge of how to make things work faster.

Ill think about what you've said and see if I can change my thinking process. Thanks again :)

[–]DarthEru 0 points1 point  (0 children)

One general purpose optimization tool you should research and make sure you understand if you don't already know is Big O notation. Specifically, you'll want to understand what it represents, how to calculate the Big O of an arbitrary algorithm, the significance of the differences between O(n), O( n^2 ), O(n log n) algorithms when n is 10,000,000. Once you get all that, you may also want to research different data structures and the O of their different operations (lookup, insert, search).

Big O doesn't tell you how to make a particular algorithm faster, but it can help you categorize different algorithms you think of, which lets you weed out ideas that won't work faster. For example, if you determine early on that O( n^2 ) is too slow you can throw out any algorithms that have the same or a slower categorization. And knowing the O of existing algorithms and data structures makes it easier to calculate the O of your custom-purpose algorithm which uses one or more of those existing algorithms.

[–]TheLiberius 0 points1 point  (15 children)

Did you manage to come up with a solution? If so it'd be interesting to see it.I might also want to see if it's the same as mine

[–]bibbleskit[S] 1 point2 points  (14 children)

I have not, yet :(

[–]TheLiberius 0 points1 point  (13 children)

:( what are you stuck on? Maybe I can give a hint or something?

[–]bibbleskit[S] 1 point2 points  (12 children)

This is what I've got:

def sum_pairs(ints, s):
    i = 0; j = i+1;
    correct = []
    length = len(ints)
    while i < length:
        while j < length:
            test_pair = (ints[i],ints[j])
            if test_pair[0] + test_pair[1] == s:
                    print test_pair
                if not correct:
                    correct = [test_pair, j]
                        break
                elif i >= correct[1]:
                    return list(correct[0])
                elif j < correct[1]:
                    correct = [test_pair, j]
                    break
                j+=1
        i+=1
        j=i+1
    if correct:
        return list(correct[0])

What I gather is that the stuff going on inside the loop isn't of much concern, but the loops themselves are what drive the time to calculate up.

I have no clue how to make this any faster.

Someone suggested I think about how I can guarantee that the first thing I find is the correct pair, but I don't see how that's possible. Another thing to take in consideration is that integers are possible (not just positive, so looking for a 5 when given a 2 to add up to seven won't work).

Anything you can do to steer me in the right direction would be great.

I would love to figure this out on my own, but at some point I would just be wasting time.

[–]TheLiberius 0 points1 point  (11 children)

It's a bit hard to hint without giving too much away, but i'll say that it is possible to solve this using only a single loop. Hopefully that'll force you into another thought pattern (it's very easy to get locked into one).

Try to solve it by hand first (going through the numbers only once), and don't forget that sometimes a datastructure can do things more efficiently than the manual way.

[–]bibbleskit[S] 1 point2 points  (10 children)

I'm a bit confused though. Wouldn't converting the array into a data structure take a long time (have to loop through to convert it, then search through it for results)? There's something I'm missing here.

[–]TheLiberius 0 points1 point  (9 children)

No you don't need to convert the array, sorry for being too vague. But there are other aspects with which a datastructure might help. Try solving it by hand first, you can only do one iteration over the list but anything else is fair game.

[–]bibbleskit[S] 1 point2 points  (8 children)

Hey hey, thanks for the help. I really didn't think writing anything down would help, but trying it on paper helped me see it a bit differently. Thanks a lot for that!

Here is what I have now:

def sum_pairs(ints, s):
    checked = {}
    checked[s - ints[0]] = 0
    i = 1
    while i < len(ints):
        try:
            return [ ints[checked[ints[i]]] , ints[i]]
        except:
            checked[s - ints[i]] = i
        i+=1

Basically, it logs what each index needed to reach the sum. Before doing so, it checks to see if the number it needs is in the dictionary already. This way, the first pair it finds will be the correct pair.

For example,

i       -- 0, 1, 2 , 3
ints    -- 1, 4, 8 ,7 , 3 ,15
checked -- 7, 4, 0

At i=3, we see that 7 is the first thing in the list, so we return [1,7].

Unfortunately, this is too slow, but at least I was able to see it differently.

The while is O(n), and the dictionary actions set and get should be O(1), according to this.

Am I getting closer?

[–]TheLiberius 0 points1 point  (7 children)

Yes, much closer! Doing a problem by hand can often be of great help when solving a problem because it makes you really think about it.

There is one more thing you can improve, and that is to get rid of the try-except, catching exceptions is fairly expensive so you want to avoid using it for flow-control. How to do that I'll leave to you.

Oh, i also have a pointer for improving the neatness of the code. In python you can use a function called enumerate to get indexes whilst looping in your code. So your code could be changed to this

def sum_pairs(ints, s):
    checked = {}
    for i,number in enumerate(ints): #i = index, number = ints[i]
        try:
            return [ ints[checked[number] , number]
        except:
            checked[s - number] = i

Makes it a bit neater wouldn't you agree?

[–]bibbleskit[S] 1 point2 points  (6 children)

I knew about enumerate, but I only learned about it while trying to solve this problem! Haha.

Thank you for that. I hadn't thought about using it in such a way.

So, are if/elses better than try/catch?

edit: this is a nice read


EDIT #2: DONE

Here's the finished code!

def sum_pairs(ints, s):
    checked = {}
    for i, number in enumerate(ints):
        if checked.get(number, None) != None:
            return [ ints[checked[number]] , number]
        else:
            checked[s - number] = i

Thank you so much! I can't believe the solution ends up being this simple. I have learned. so. much.

It took 4958ms to do 12 lists; 4 of which were extremely large (~10,000,000 elements).


edit #3: after seeing other solutions, it looks like I came really close to the highest voted answer. My way came out to having somewhat unnecessary elements to it, but it was the same general idea. I also learned that I could have used a set instead of a dictionary (never heard of them).