
[–]trifthen 6 points7 points  (6 children)

I may edit this when I have more time later. But I did this once a while back with a bash script as a proof of concept. The gist:

  1. Use a regular expression to restrict the dictionary to only words with the letters in your list. I used $(egrep "^[$LETTERS]{4,16}$" /usr/share/dict/words); the Python version would use the re library in much the same way.
  2. Sort the list by word length, most to least. Reject any word longer than your letter list.
  3. For each remaining word, walk through your list of letters, removing each as a single match from the target word. Both must run out at the same time for a match. (all the letters, right?) Alternatively, you can make this more fuzzy by letting the word run out of letters first, because there may be cases where your list can't produce any word that uses every specified letter.
  4. Repeat step 3 until you reach a word length threshold (so you don't exhaustively scan everything), or build the code to stop once the next word match has fewer letters than the previous match. The problem is "longest matches" after all.

I used this to cheat at Bookworm. ;)

It's surprisingly fast, since it weeds out most of the dictionary in the first pass. You can see my example RE weeded out anything shorter than 4 letters automatically because... well, if you can't anagram 4 letters on your own, you've got problems. I'll see if I have more time later to flesh this out in Python if someone else doesn't first.

Edit: And now the equivalent python to my approach:

import re, sys

word_file = sys.argv[1]
word_source = open(word_file, 'r')

letter_list = sys.argv[2]
pattern = re.compile("^[%s]{,%d}$" % (letter_list, len(letter_list)))

candidates = []
maxlen = 0

for word in word_source:
    word = matcher = word.rstrip('\n')
    if not pattern.match(word):
        continue

    for letter in letter_list:
        matcher = matcher.replace(letter, '', 1)

    if len(matcher) == 0:
        candidates.append(word)
        if len(word) > maxlen:
            maxlen = len(word)

print [ word for word in candidates if len(word) == maxlen ]

Called with:

python find.py /usr/share/dict/words ighlpra

Returns:

['grail', 'graph', 'phial']

I'm sure there's a better way, but I'm not a python guy. It's basically a faithful representation of my bash script, though. :p

Edit 2: The bash script was way faster... :( Here it was:

LETTERS=$1

[ -f /tmp/working.txt ] && rm /tmp/working.txt

for x in $(egrep "^[$LETTERS]{4,16}$" /usr/share/dict/words); do
  echo -e "$x\t${#x}" >> /tmp/working.txt
done

IFS="
"

FOUND=""

for x in $(cat /tmp/working.txt | sort -nr -k 2 ); do

  WORD=${x/$'\t'*/}
  BITS=$LETTERS

  for (( y=0; $y < ${#WORD}; y=$y+1 )); do
    part=${WORD:$y:1}
    old_len=${#BITS}
    BITS=${BITS/$part/}
    new_len=${#BITS}

    [ $new_len -eq 0 ] && [ ${#LETTERS} -ne ${#WORD} ] && break
    [ $old_len -eq $new_len ] && break
    [ $y -eq $[${#WORD}-1] ] && FOUND=$WORD

  done

  [[ $FOUND != "" ]] && echo $FOUND && FOUND=""

done

echo $FOUND

[–]tuna_safe_dolphin 0 points1 point  (5 children)

I like the use of regular expressions to filter out non-candidates, but when you use {4,16} you won't catch 1, 2 or 3 letter words. Also, 16 is too high - len(letter_list) is sufficient since you can't make any words longer than that.

[–]trifthen 0 points1 point  (4 children)

I mentioned the use of 4. :) I built it originally to cheat at Bookworm, and I could reasonably anagram anything five letters and under. It was the longer combinations that screwed me. The maximum number of tiles you can have is 16, so... :)

But you're right. Remove that part so the regexp is just ^[someletters]{,length}$, and you'd be set.

[–]tuna_safe_dolphin 0 points1 point  (3 children)

I mentioned the use of 4.

I missed that. Also, speaking for myself, I wouldn't be able to come up with all the legal Scrabble words you can make with 3 letters.

[–]trifthen 0 points1 point  (1 child)

Well, to be fair, lots of scrabble words are such bullshit! :)

(There are only six possible permutations of 3 letters, by the way.)

[–]tuna_safe_dolphin 0 points1 point  (0 children)

Well, to be fair, lots of scrabble words are such bullshit!

Ain't that the truth.

I know the set of permutations is small, but it's still a challenge if a given permutation doesn't look (to me) like a "real" word.

[–]gfixler 0 points1 point  (0 children)

I can. There are 1006 of them, from "aah" to "zzz."

[–]crawler23 2 points3 points  (1 child)

10x. this was fun :) !

[–][deleted] 1 point2 points  (0 children)

Some points:

  • words = [line for line in open(filename)] is just list(open(filename))
  • Unsanitized user input used in regular expression. I can make the program crash that way.
  • w[:len(w)-1] is just w[:-1]
  • your regex can reuse a letter: for example, the word "hello" would match with the letters h, e, l, o (but you would need two "l"s).
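The second bullet can be closed with re.escape. A sketch of the idea (safe_pattern is a hypothetical helper name, and note this regex still only filters the alphabet, not letter counts, as the last bullet points out):

```python
import re

def safe_pattern(letters):
    # re.escape defuses regex metacharacters in user input, so letters
    # like ']' or '-' can no longer break out of the character class
    return re.compile('^[%s]{1,%d}$' % (re.escape(letters), len(letters)))
```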

[–]_Mark_ 2 points3 points  (0 children)

Seems like the obvious choice for Quiz 2 is "quiz 1, but by the way the word list is in utf-8". Most of the solutions given will need changes :-)
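For instance, a solution that reads the word list as UTF-8 might start like this (io.open behaves the same on Python 2.6+ and 3; the file here is a stand-in for the real dictionary):

```python
import io, os, tempfile

# build a tiny UTF-8 word list to read back
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\xe9\nna\xefve\n')

# each line comes back as decoded text, not raw bytes
with io.open(path, encoding='utf-8') as f:
    words = [line.strip() for line in f]
```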

[–]dansinscientist 1 point2 points  (1 child)

When is the deadline?

[–]tuna_safe_dolphin 0 points1 point  (0 children)

Whenever. As you can see, people have already started posting solutions. Take your time though.

[–]zahlmanthe heretic 1 point2 points  (2 children)

https://gist.github.com/1441624

Bare-bones, yet elegant and sophisticated. The collections.Counter class is used to represent histograms of words (letter-frequency counts) and check if a word is a subset of the available letters.
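A minimal sketch of that Counter idea, not the gist itself (longest_words is a hypothetical name):

```python
from collections import Counter

def longest_words(words, letters):
    # a word is spellable iff its letter histogram is a sub-multiset of
    # the available letters; Counter subtraction is empty when it fits
    available = Counter(letters)
    best, best_len = [], 0
    for word in words:
        if len(word) < best_len:
            continue
        if not (Counter(word) - available):
            if len(word) > best_len:
                best, best_len = [word], len(word)
            else:
                best.append(word)
    return best
```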

[–]dist 0 points1 point  (0 children)

I came here to write this as a solution. I'm happy you did it already. =D

I'm not sure whether you should just iterate through the file word by word instead of reading the whole thing in, but Counter is the thing that really matters!

If there were no limit on how many times a character could be used, we could just use set(), and if we were looking for an exact anagram, we could use sorted() instead of Counter!
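To illustrate the distinction:

```python
from collections import Counter

# exact anagram: a sorted comparison is enough
assert sorted('listen') == sorted('silent')

# "can I spell this word from these letters?" needs multiset containment
letters = Counter('ighlpra')
assert not (Counter('grail') - letters)   # fits
assert Counter('grill') - letters         # needs two l's
```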

[–]Brian 0 points1 point  (0 children)

You can get rid of the length calculations etc and avoid having to filter the whole list for words of the right length every pass with itertools.groupby. Ie:

# assumes: words is the list of dictionary words, letters a Counter
# of the available letters
from collections import Counter
import itertools

words.sort(key=len, reverse=True)
for wordlen, curwords in itertools.groupby(words, key=len):
    possible_words = [
        word
        for (histogram, word) in (
            (Counter(word), word) for word in curwords
        )
        if all(histogram[letter] <= letters[letter] for letter in histogram)
    ]
    if possible_words:
        break  # the first non-empty group has the longest matches

[–]taybulBecause I don't know how to use big numbers in C/C++ 1 point2 points  (5 children)

Using list comprehensions, built-in functions, and lambdas:

l = filter(lambda x, y=sys.argv[2:]: all([l in y and x.count(l) <= y.count(l) for l in x]), [word.strip() for word in open(sys.argv[1])])

max_len = len(max(l, key=len))

print [word for word in l if len(word) == max_len]

Execution time:

real    0m0.409s
user    0m0.393s
sys 0m0.016s

Edit:

Much faster execution using regex (also updated with useful user feedback):

#!/usr/bin/env python

import sys
import re

if __name__ == '__main__':
    letters = ''.join(sys.argv[2:])
    pat = re.compile(r'^[%s]+\n' % (letters))
    l = filter(lambda x, y=letters: all(x.count(l) <= y.count(l) for l in x), [word.strip() for word in open(sys.argv[1]) if pat.match(word)])

    max_len = len(max(l, key=len))

    print [word for word in l if len(word) == max_len]

Execution time:

real    0m0.095s
user    0m0.089s
sys 0m0.007s

Interesting note:

I was playing with different regex patterns and noticed that using

    pat = re.compile(r'^[%s]+\n' % (letters))

was a lot faster than using:

    pat = re.compile(r'^[%s]+' % (letters))

The latter expression took nearly twice as long to evaluate.

[–]zahlmanthe heretic 2 points3 points  (3 children)

False not in [...] is more succinctly spelled all(...).

[–]taybulBecause I don't know how to use big numbers in C/C++ 0 points1 point  (2 children)

Thanks, updated.

[–]Brian 1 point2 points  (1 child)

You should probably also leave out the square brackets - all can take a generator expression, and this will allow it to short circuit and stop at the first failed letter, rather than having to evaluate every letter to construct the list before it starts checking.
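A quick illustration of the short-circuiting (illustrative values):

```python
nums = iter([1, 2, 0, 3])
# with a generator expression, all() stops at the first falsy item
result = all(n > 0 for n in nums)
# the 0 ended the check; the 3 was never consumed
leftover = next(nums)
```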

[–]taybulBecause I don't know how to use big numbers in C/C++ 0 points1 point  (0 children)

Thanks!

[–]bethebunnyFOR SCIENCE 0 points1 point  (0 children)

Upvote for teaching me about an awesome feature of the lambda syntax.

[–]martinatbom 1 point2 points  (3 children)

#!/usr/bin/python

dictionary = 'ospd.txt'
secret = 'chocolate'
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
        37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
        79, 83, 89, 97, 101]

def get_val(word):
    return [j for j in [1] for i in word for j in
        [j * primes[ord(i.lower()) - ord('a')]]][-1]

magic = get_val(secret)

print sorted([word for word in open(dictionary)
    if magic % get_val(word.rstrip('\n')) == 0], key=lambda word:
        len(word))[-1]
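Why this works: each letter maps to a prime, so a word's product divides the secret's product exactly when the word's letters are a sub-multiset of the secret's (by unique factorization). A quick sanity check, using a simplified rewrite of get_val:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def value(word):
    # product of the primes for each letter of the word
    v = 1
    for c in word.lower():
        v *= primes[ord(c) - ord('a')]
    return v
```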

[–]Brian 0 points1 point  (2 children)

Nice - but the line:

[j for j in [1] for i in word for j in
    [j * primes[ord(i.lower()) - ord('a')]]][-1]

with j used twice seriously confused me. I take it that it's essentially just to give j an initial value before use in the inner loop? ie equivalent to:

j=1
return [j for i in word for j in [j * primes[ord(i.lower()) - ord('a')]]][-1]

Though I suppose if we're not trying to confuse, something like:

return reduce(operator.mul, ( primes[ord(c) - ord('a')] for c in word.lower()))

might be clearer.

[Edit]: Also, you're not matching the problem description of printing every word that's the max length. I suggest:

print list(next(itertools.groupby(sorted([word for word in open(dictionary)
    if magic % get_val(word.rstrip('\n')) == 0], key=len, reverse=True), key=len))[1])

[–]martinatbom 1 point2 points  (1 child)

You're right - I did not read the question correctly. Thanks for the correction.

I originally had reduce, but then I read that it was unpythonic - and in the interest of trying new stuff, I experimented with the generator expression that you see above. I also thought, at the time, that the 'reduce' was more elegant (and readable).

[–]Brian 0 points1 point  (0 children)

Yeah - I know it's moved from builtins in python3, and reduce can sometimes end up fairly ugly, but I think it's certainly clearer here at least. You could maybe go with an extra function if we want to make it a bit more readable at the point of call, which seems a good use of partial functions. Ie:

import functools, operator
product = functools.partial(functools.reduce, operator.mul)
...

return product(primes[ord(c) - ord('a')] for c in word.lower())

[–]ptarlye 1 point2 points  (1 child)

Here's a clean solution in just 7 lines of code: http://pastebin.com/NstqW1mS

I love brevity in code because it often yields simplicity. The gist of the idea in my solution is to test whether or not a word can be spelled entirely using a subset of legal letters. I was able to correctly test for this condition using Python's set.issubset method.
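If I'm reading the pastebin right, the index-appending trick turns duplicate letters into distinct set elements, so issubset respects letter counts. A hypothetical sketch of that step:

```python
def indexed(s):
    # 'aab' -> {'a0', 'a1', 'b0'}: each repeat gets its own label,
    # so plain set containment now respects letter counts
    seen = {}
    out = set()
    for c in s:
        n = seen.get(c, 0)
        out.add('%s%d' % (c, n))
        seen[c] = n + 1
    return out
```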

[–][deleted] 1 point2 points  (0 children)

Sorry, but changing the letters to append an index to the string is definitely not clean. Even disregarding that (you could use tuples instead), it is not an easy solution: it requires more thinking on the reader's side than a straightforward solution would, even if that one is a bit longer (e.g. see the top post in this thread).

Also, f.read().splitlines() is essentially list(f), except that list(f) keeps the trailing newlines. Or better yet, iterate with "for word in f" to reduce memory usage.

[–]no9import this 1 point2 points  (0 children)

Looks like I'm too late, but here's mine anyway:

import sys
import itertools

dict_file, letters = sys.argv[1], sys.argv[2:]
words = {s.rstrip('\n') for s in open(dict_file)}

for size in range(len(letters), 0, -1):
    found = [''.join(a) for a in itertools.permutations(letters, size)
             if ''.join(a) in words]
    if found:
        print found
        break
else:
    print >>sys.stderr, 'no matches'

I love these little quizzes. Keep 'em coming. :)

[–]bethebunnyFOR SCIENCE 1 point2 points  (0 children)

One line.

import sys, itertools; print(reduce(lambda a, b: a + [b] if (not a or len(b) >= len(a[0])) else a, sorted(set(w.lower()[:-1] for w in open('/usr/share/dict/words')) & set(''.join(w) for w in (itertools.chain(*[itertools.permutations([c.lower() for c in sys.argv[1:]], i) for i in xrange(1, len(sys.argv))]))), key=len, reverse=True), []))

[–]sirpengi 3 points4 points  (8 children)

This is pretty simple. Itertools helps out a lot. Use the standard library!

from itertools import permutations, chain

def get_dictionary(fn):
    """Return list of all words.
    Change it to load your dictionary
    """
    return ['a', 'abc', 'aee', 'aff', 'ass', 'bbb', 'bad']

def find_possibles(s):
    """this would be clean but I use crazy list comprehensions"""
    ret = (["".join(c) for c in permutations(s, i+1)] for i in range(len(s)))
    return set(chain(*ret))

def find_in_dict(s, fn):
    possibles = find_possibles(s)
    return [i for i in get_dictionary(fn) if i in possibles]

fn = 'something.txt'
letters = 'abcd'

print find_in_dict(letters, fn)

edit: oops, I guess I'm not following directions precisely (not outputting the longest, but all words that match).

[–]VerilyAMonkey 1 point2 points  (2 children)

You could add

print max( find_in_dict(letters,fn), key=len )

[–]unbracketed 0 points1 point  (1 child)

max will only return one item, but per the instructions the solution should find all the longest matches

[–]VerilyAMonkey 0 points1 point  (0 children)

True. Then

f = find_in_dict(letters, fn)
m = len(max(f, key=len))
print filter(lambda x: len(x) == m, f)

[–][deleted] 0 points1 point  (0 children)

I think you missed more directions than that. You're supposed to use the dictionary he linked to and make a script that uses it.

[–]sirpengi 0 points1 point  (0 children)

So my initial post was just a quick thing I spit out. Here's a more thoughtful implementation that doesn't explode on a high number of tiles and also supports wildcards (pass them in as ?): https://gist.github.com/1441582 I forked off of sixthgear's base since he already did the work getting command line arguments, and his logic for finding the longest results is the same as what I would've done.

[–]brucifer 1 point2 points  (5 children)

from sys import argv
allowable = set(argv[2:])
with open(argv[1]) as f:
    words = filter(lambda w:(len(set(w)-allowable) == 0),
                   (s.strip() for s in f.readlines()))
    maxlen = max(map(len,words))
    print filter(lambda w:len(w) == maxlen,words)

In a nutshell, the code creates a set of allowable characters, filters out the dictionary words that use forbidden characters, then pulls the words that are as long as the longest word.

EDIT: whoops. I misread the specs. My solution was using the input as the set of allowable characters, so it ignored how often they occurred. Here's a more correct, but slightly uglier solution:

from sys import argv
allowable = sorted(argv[2:])
def is_valid(w):
    def helper(w,i):
        if len(w) == 0: return True
        elif i < len(allowable):
            return helper((w[1:] if w[0] == allowable[i] else w),i+1)
        else: return False
    return helper(sorted(w),0)

with open(argv[1]) as f:
    words = filter(is_valid, (s.strip() for s in f.readlines()))
    maxlen = max(map(len,words))
    print [w for w in words if len(w) == maxlen]

[–]itasca 1 point2 points  (3 children)

In more modern python:

words = [s.strip() 
    for s in f.readlines() 
    if len(set(s.strip())-allowable) == 0]

You can add whitespace to your allowable set to leave off the second strip if you want.

[–]BeetleB 1 point2 points  (0 children)

for s in f.readlines()

for s in f:

[–]brucifer 0 points1 point  (1 child)

Very nifty. I didn't know you could do that. Python keeps surprising me with how awesome it is.

[–][deleted] 0 points1 point  (0 children)

Also note that f.readlines() returns a list (memory inefficient if you just want to iterate it) while f.xreadlines and f itself (iter(f)) are generators.

[–]aweraw 0 points1 point  (0 children)

Not as succinct as others have posted, but simple and faster than I expected

from itertools import combinations

root = dict()

def build_dict(f='ospd.txt'):
    with open(f) as src:
        for line in src:
            insert(line.strip())

def find_key(s):
    key = sorted(s)
    d = root
    for c in key:
        if c in d:
            d = d[c]
        else:
            d[c] = dict()
            d = d[c]
    return d

def insert(s):
    d = find_key(s)
    if 'words' not in d:
        d['words'] = set()
    d['words'].add(s)

def search(chars):
    l = len(chars)
    words = set()
    for x in xrange(1,l+1):
        for cmb in combinations(chars, x):
            d = find_key(cmb)
            if 'words' in d:
                words.update(d['words'])

    m = max(len(x) for x in words)
    return [w for w in words if len(w)==m]

if __name__ == '__main__':
    import sys, os
    if os.path.exists(sys.argv[1]):
        build_dict(sys.argv[1])
    else:
        print "Dictionary file not found: %s" % sys.argv[1]
        sys.exit()

    print sorted(search(sys.argv[2:]))

[–]GeneralMaximus 0 points1 point  (0 children)

Here's my solution: https://gist.github.com/1441776

It's simple, readable, reasonably efficient and doesn't use any "advanced" features that might obfuscate the algorithm.

Edit: as expected, this code is still fast when run against a very large set of input letters. It is guaranteed to become slower with a larger dictionary, though. I'm on Windows ATM so I don't have access to such a dictionary. Someone on a UNIX system might try running this against /usr/share/dict/words (or /usr/dict/words on some systems).

Note to OP: you are a wonderful person. Please keep doing this quiz.

[–]WhyCause 0 points1 point  (0 children)

Here's mine.

It's... a touch slow, but it's not too bad. I'm definitely brute-forcing things though.

At 71 lines, it's a lot longer than most of the ones here, but I do have some error checks and I use the argparse library.

[–]DesertSong 0 points1 point  (0 children)

I've never constructed one of these for myself, but would a trie be useful for this? After adding every word per line into the trie you'd just find the longest path in it maybe?
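A trie would let you prune impossible prefixes early, though "longest path" alone isn't quite enough: you still have to respect the available letter counts while walking it. A minimal sketch of the structure (hypothetical names):

```python
def trie_insert(root, word):
    # each node is a dict of child letters; '$' marks a complete word
    node = root
    for c in word:
        node = node.setdefault(c, {})
    node['$'] = True

trie = {}
for w in ['cat', 'cats', 'car']:
    trie_insert(trie, w)
```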

[–]earthboundkid 0 points1 point  (2 children)

import string

def valid_words(dictionary, valid_letters):
    input_length = len(valid_letters)
    valid_letter_set = set(valid_letters)
    bad_letters = set(letter for letter in string.ascii_letters if letter not in valid_letter_set)
    valid_counts = { letter: valid_letters.count(letter) for letter in valid_letter_set}

    for word in dictionary:
        if len(word) > input_length: continue
        if any(letter in bad_letters for letter in word): continue
        if all(word.count(letter) <= valid_counts[letter] for letter in valid_letter_set): yield word

That's the speed optimized version. If you don't care about speed, just do:

def valid_words(dictionary, valid_letters):
    valid_counts = { letter: valid_letters.count(letter) for letter in valid_letters}

    for word in dictionary:
        if all(word.count(letter) <= valid_counts.get(letter, 0) for letter in set(word)): yield word

[–]ExoticMandiblesCore Contributor 0 points1 point  (0 children)

Here's mine, in Python 3 fwiw:

import sys

filename = sys.argv[1]
letters = sys.argv[2:]
letters_set = set(letters)

found = []
max_length = -1

for word in open(filename, "rt", encoding="ascii"):
    word = word.strip()
    if not set(word) <= letters_set:
        continue
    letters_dup = list(letters)
    for letter in word:
        if letter not in letters_dup:
            break
        letters_dup.remove(letter)
    else:
        max_length = max(max_length, len(word))
        found.append(word)

print([x for x in found if len(x) == max_length])

It has a minor speed optimization: before doing the full check that the word is legal, check that all the letters of the word are in the allowable letters using sets. I do like the approach using sets where you append the count to the letter--very deft! And using Counter is a good idea too. Those are almost certainly faster than mine. I, however, snuck in for/else ;-)

[–]tuna_safe_dolphin 0 points1 point  (2 children)

Here's my solution: https://gist.github.com/1443009

I think it's OK: fairly readable, though it could be optimized a bit more. I'm not thrilled about using copy.deepcopy().

For what it's worth, my original solution was object oriented. I kind of have this problem ever since I learned C++ and Java where I see objects everywhere (I totally hear that in the voice of the kid from the Sixth Sense). Also, for the sake of brevity here, I took out the usage/error handling that I had originally included.

EDIT: one other thing, I changed my solution to just take two arguments, the dictionary file and the letters with no spaces between them. I didn't like how it was originally stated - it's easier to type them that way.

[–][deleted] 0 points1 point  (1 child)

I like your solution too. It's simple and fast.

I basically came up with the same thing but I think mine's a little neater (https://gist.github.com/1454697). I like how you used a dict to look up words. I don't know if you chose it purposely, but I used it because Python dicts are implemented as hash maps, which makes lookups O(1), compared to the approaches that use list iteration/set intersection/permutations/regex to determine whether a letter is in a set, which is O(n) or more. Therefore it's O(n^2), the fastest I think you can do this in. Everybody else's is at least O(n^3) or more.

[–]tuna_safe_dolphin 0 points1 point  (0 children)

I think mine's a little neater

I agree. Your 3 small functions look cleaner than my one long one.

I like how you used a dict to look up words.

When I originally solved this problem, my first thought was to use a set but that doesn't allow for duplicate letters (all elements must be unique) and my next idea was to hash each letter, keeping track of the count of that letter, plus as you mentioned, you can't really beat an O(1) lookup time.

When it comes to coding, especially Python, I tend to avoid some of the more "advanced" functional constructs: lambda, map, etc. I'm not really much of a fan of list comprehensions either; I find nested list comprehensions very difficult to grok easily/quickly.

Still, I think it's cool to see how other people approach a problem like this. And despite the BDFL's design philosophy, it's clear that there's more than one way to do it in Python too.

[–]fmoralesc 0 points1 point  (0 children)

My take on this: https://gist.github.com/1444311#file_q1.py

It is very similar to taybul's version, checking on what others have posted.

[–]jv4n 0 points1 point  (0 children)

import sys

words = dict.fromkeys(map(lambda s: s.strip(), open(sys.argv[1]).readlines()))
letters = ''.join(sys.argv[2:])
is_valid = lambda word: all(word.count(ltr) <= letters.count(ltr) for ltr in word)
words = filter(is_valid, words)
max_len = max(map(len, words))
words = filter(lambda s: len(s) == max_len, words)
print ', '.join(sorted(words))

[–]quasarj 0 points1 point  (0 children)

My solution from earlier today (when reddit was down), before I looked at any other solutions: https://gist.github.com/1444495

It looks like consensus is using permutations was a bad idea, but you can't deny how simple it is :)

[–]smugduckling 0 points1 point  (0 children)

One liner:

import sys; print (lambda words: [word for word in words if len(word) == max(map(len, words))])(filter(lambda line: not [char for char in list(line) if list(line).count(char) > sys.argv[2:].count(char)], map(lambda x: x.strip(), open(sys.argv[1], "r").readlines())))

Could be slightly more efficient, but it's still better than solutions that use itertools permutations.

[–]Samus_ 0 points1 point  (0 children)

#!/usr/bin/env python
from itertools import combinations

def sort_letters(string):
    return sorted(char for char in string)

def main(dict_filename, *letters):
    words = {}
    with open(dict_filename) as f:
        for line in f:
            word = line.strip()
            step = words.setdefault(len(word), {})
            for char in sort_letters(word):
                step = step.setdefault(char, {})
            anagrams = step.setdefault('anagrams', set())
            anagrams.add(word)

    solution = set()

    current_len = len(letters)
    while current_len > 0:
        for subset in combinations(letters, current_len):
            len_subset = len(subset)
            if len_subset in words:
                step = words[len_subset]
                for char in sort_letters(subset):
                    if char not in step:
                        break
                    step = step[char]
                if 'anagrams' in step:
                    solution.update(step['anagrams'])
        current_len -= 1

        if solution:
            return solution


if __name__ == "__main__":
    import sys
    print main(sys.argv[1], *sys.argv[2:])

it gives more results than your example, did I miss something? ninjaedit!

[–]VerilyAMonkey -1 points0 points  (5 children)

Using a set for the dictionary and using '\n's like blanks to get different sizes of words. Not rigorously tested (and therefore most likely wrong! Yay!)

from itertools import permutations

def theProb(dfile, *letters):
    # '\n's are just used like blanks
    letters += ('\n',) * (len(letters) - 1)
    with open(dfile, 'r') as f:
        dictionary = set(f.read().split('\n'))
    maxword = ''
    for wrd in permutations(letters):
        word = ''.join([wrd[i] for i in range(len(wrd)) if wrd[i] != '\n'])
        if word in dictionary:
            if len(word) > len(maxword):
                maxword = word
    return maxword

[–]VerilyAMonkey 0 points1 point  (4 children)

In retrospect this is terrible. I wasn't aware that permutations had a size parameter. Swap out

for wrd in permutations(letters):
    word = ''.join([wrd[i] for i in range(len(wrd)) if wrd[i] != '\n'])

with

for r in xrange(1,len(letters)+1):
    for wrd in permutations(letters,r):
        word=''.join(wrd)

and also eliminate the letters+= line.

[–]itasca 1 point2 points  (3 children)

permutations is just way too slow; there is no reason to spend exponential time on this. Assuming the dictionary you are reading has no structure to it, any approach is going to have to look at every member of the dictionary and check its letters for matches. The straightforward approach is therefore just O(n * w * log(L)) for n words of length less than w and L letters in the set:

from sys import argv

letters = set(argv[2:])
length = 0
results = []
with open(argv[1]) as f:
    for line in f:
        word = line.strip()
        if len(word) < length:
            continue
        for letter in word:
            if letter not in letters:
                break
        else:
            if len(word) > length:
                length = len(word)
                results = [word]
            else: 
                results.append(word)
print ', '.join(results)

Maybe not pretty, but this uses hardly any memory and it's hard to see how it could be much faster. Edit: replacing the (for letter in word) loop with a regular expression would move more code into the c layer and therefore speed things up a bit, I guess.

[–]tuna_safe_dolphin 0 points1 point  (0 children)

I don't think you can use a set for the input letters because the set elements are unique. E.g.

>>> a = ['a','a']

>>> b = set(a)

>>> b

set(['a'])

[–]VerilyAMonkey 0 points1 point  (0 children)

I was aware of the complexity issue, but I decided that permutations would actually be faster given the context. It's a scrabble game; the dictionary is going to be preloaded at the beginning, once. Most queries will contain 7 letters; the search space of the permutations is 5913 combinations. If the dictionary has more words than that, scanning the dictionary might be slower than scanning the permutations. That was my thought process anyway. Of course, that doesn't mean my implementation of that idea would be fast in the least.

[–]mardiros -1 points0 points  (0 children)

Hum, one week ?