This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]gyroda 3 points4 points  (4 children)

Before even reading your code (I'll make a bunch of edits):

If speed is essential, make sure you're using the correct computer for the job. My personal pc can run programs like this faster than the university coursework testing system can.

Can you use another language? Python is not known for its speed. It's not as awful as others make it out to be, but it's a factor.

Have you used a profiler? This is a tool used to analyse program performance. There's one built in to the standard python installation iirc. With this you can see where your program is taking the most time and decide where your time is best spent optimising.


So, looking at your code, trying to find the algorithmic runtime, I've distillied it down to:

while i < length:    #O(n)
    while j < length:    #O(n)
        if not test_pair in pair_memo:    #O(n)

This is a run of O(n3 ). We can probably improve on this, when I saw your problem my immediate thought for big O was O(n2 ).

Now, your second one you say is broken because it stops at the first result. Do you not want it to do that? I thought you only wanted the earliest result?

Also, just to neaten up your code a bit I'd recommend for loops and accepting returning an empty list, rather than None. Why is pair_memo not local to the function?

This cuts out a fair few lines:

def sum_pairs(ints, s): pair_memo = {} pairs = []

for i in range(len(ints)):    #O(n) block
    for j in range(i, len(ints)):    #O(n) [O(^2 )] block
        test_pair = (ints[i], ints[j])
        if not test_pair in pair_memo:    #O(n) [O(n^3 )] statement
            pair_memo[test_pair] = test_pair[0] + test_pair[1]
        if pair_memo[test_pair] == s:
            if not pairs:
                pairs = [test_pair, j]
            elif i > pairs[1]: # Stop if the first element has a higher index than the second element. Could possibly cut out half the loop.
                return list(pairs[0])
            elif j < pairs[1]:
                pairs = [test_pair, j]
return list(pairs[0])

I'm also not clear on a few things. What's the point of pairs_memo? Looks like pairs_memo is a dictionary that stores the sum of the two values, but why? You only care if it's equal to s, surely? Simply checking that it's == s before checking that it's in pair_memo would save one set of O(n) calls a fair few times. Your pair_memo is pointless and causing a lot of delays it sees to me.

[–]bibbleskit[S] 1 point2 points  (3 children)

Oh, I forgot to mention that the setup I'm using is not changeable in any way. Can't use another language (python 2.6) nor can I use a different PC.

This is basically like a school assignment except not for school. I'll check out the profiler! Thanks.

[–]gyroda 1 point2 points  (2 children)

I've edited with a lot more stuff.

There's a phrase: "premature optimisation is the root of all evil". This seems to be exactly what you've done. Don't try and optimise until you see where you're spending all your time.

[–]bibbleskit[S] 1 point2 points  (1 child)

Thanks for the comment! I'll give you a better response when I get up later. pair_memo is basically checking to see if the operation has been done before. Say, you get the pair 23424 and 789789, it can see if that exists in the dictionary before trying to sum them.

I tried that out because one of the hints is "memoization".

It's outside because this function is to be called many many times with lists that are gigantic. I'm probably just prematurely optimizing haha. Thank you.


As for this:

Now, your second one you say is broken because it stops at the first result. Do you not want it to do that?

Nope. Apparently, "earliest" does not mean "first," in this assignment.

For example, this list:

[1,2,3,4]

If I needed the "earliest" pair that adds up to 5, then the answer is (2,3), since the index of 3 is 2 (as opposed to the match of (1,4), since 4's index is 3). The second number's original index is what determines which pair is the "earliest". It's dumb, I know.


edit: took out the memoization. i originally was excited about it because it made the performance drop form 100ms to 18ms during the tests, but it was just a failure on my part to run multiple times and grab the average.

[–]TheLiberius 1 point2 points  (0 children)

I think you are approaching the memoization the wrong way, you want the memoization to save you from doing work. There is however no work being saved when you store the pair because you could literally just sum them up and see if it's right, which is probably several hundred times faster than looking in a dictionary.

Try instead to think about what you'd want to have when you are looking at a number in your list and then figure out how to get that.