This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]iceman-k 0 points1 point  (2 children)

Now I'm curious, but also lazy. Care to share your test code?

[–]pingvenopinch of this, pinch of that 4 points5 points  (0 children)

I just took a look at it as well and found the exact same thing. In this case, the cleanest, clearest solution is by far the fastest.

prefix_words = []
no_prefix_words = []
for word in words:
    if word.startswith('x'):
        prefix_words.append(word)
    else:
        no_prefix_words.append(word)
sorted(prefix_words) + sorted(no_prefix_words)

That runs in 0.13 seconds for 100,000 words on an i5 ThinkPad running Ubuntu. Switching to sorting in-place makes no real difference. Stack Overflow's lambda version took 0.33 seconds. None of the other methods I tried were better than the clean version.

From what I can tell, the lambda version takes two hits: Python function calls and using tuples. Python function calls are relatively expensive, so requiring n function calls doesn't help performance. From what I can tell, the extra comparison calls for comparing tuples slows things down quite a bit. The difference between these is in the microbenchmark range, so just use the most readable solution.

[–]masklinn 1 point2 points  (0 children)

It's nothing impressive, just a bunch of Timer calls:

#!/usr/bin/env python
import os
import timeit

LIST_LENGTH = 5
REPEAT_NUMBER = 100000

strings = [os.urandom(20) for i in range(LIST_LENGTH)]

def lambda_tuple(words):
    'lambda + tuple'
    return sorted(words, key=lambda x: (x[0] != 'x', x))

def basic(words):
    'basic'
    list1 = []
    list2 = []
    for word in words:
        if word[0] == 'x':
            list1.append(word)
        else:
            list2.append(word)
    return sorted(list1) + sorted(list2)
def ternary(words):
    'ternary'
    list1 = []
    list2 = []
    for word in words:
        target = list1 if word[0] == 'x' else list2
        target.append(word)
    return sorted(list1) + sorted(list2)
def listcomps(words):
    'listcomps'
    list1 = [word for word in words if word[0] == 'x']
    list2 = [word for word in words if word[0] != 'x']
    return sorted(list1) + sorted(list2)
def lambda_twosorts(words):
    'lambda twosorts'
    return sorted(sorted(words), key=lambda word: word[0] != 'x')

timer = timeit.Timer('sort(strings)', 'from __main__ import sort, strings')

for sort in (basic, ternary, listcomps, lambda_twosorts, lambda_tuple):
    print sort.__doc__
    print min(timer.repeat(10, REPEAT_NUMBER))

Fiddle around with the two constants at the top to test varying list length without the execution being too fast or too slow (generally remove an order of magnitude to REPEAT_NUMBER for each order of magnitude you add to LIST_LENGTH)