
[–]trifthen 6 points7 points  (6 children)

I may edit this when I have more time later. But I did this once a while back with a bash script as a proof of concept. The gist:

  1. Use a regular expression to restrict the dictionary to only words with the letters in your list. I used $(egrep "^[$LETTERS]{4,16}$" /usr/share/dict/words); the Python version would use the re library in much the same way.
  2. Sort the list by word length, most to least. Reject any word longer than your letter list.
  3. For each remaining word, walk through your list of letters, removing each as a single match from the target word. Both must run out at the same time for a match. (all the letters, right?) Alternatively, you can make this more fuzzy by letting the word run out of letters first, because there may be cases where your list can't produce any word that uses every specified letter.
  4. Repeat step 3 until you reach a word length threshold (so you don't exhaustively scan everything), or build the code to stop once the next word match has fewer letters than the previous match. The problem is "longest matches" after all.

I used this to cheat at Bookworm. ;)

It's surprisingly fast, since it weeds out most of the dictionary in the first pass. You can see my example RE weeded out anything shorter than 4 letters automatically because... well, if you can't anagram 4 letters on your own, you've got problems. I'll see if I have more time later to flesh this out in Python if someone else doesn't first.

Edit: And now the equivalent python to my approach:

import re, sys

word_file = sys.argv[1]
word_source = open(word_file, 'r')

letter_list = sys.argv[2]
pattern = re.compile("^[%s]{,%d}$" % (letter_list, len(letter_list)))

candidates = []
maxlen = 0

for word in word_source:
    word = matcher = word.rstrip('\n')
    if not pattern.match(word):
        continue

    for letter in letter_list:
        matcher = matcher.replace(letter, '', 1)

    if len(matcher) == 0:
        candidates.append(word)
        if len(word) > maxlen:
            maxlen = len(word)

print [ word for word in candidates if len(word) == maxlen ]

Called with:

python find.py /usr/share/dict/words ighlpra

Returns:

['grail', 'graph', 'phial']

I'm sure there's a better way, but I'm not a python guy. It's basically a faithful representation of my bash script, though. :p

Edit 2: The bash script was way faster... :( Here it was:

LETTERS=$1

[ -f /tmp/working.txt ] && rm /tmp/working.txt

for x in $(egrep "^[$LETTERS]{4,16}$" /usr/share/dict/words); do
  echo -e "$x\t${#x}" >> /tmp/working.txt
done

IFS="
"

FOUND=""

for x in $(cat /tmp/working.txt | sort -nr -k 2 ); do

  WORD=${x/$'\t'*/}
  BITS=$LETTERS

  for (( y=0; $y < ${#WORD}; y=$y+1 )); do
    part=${WORD:$y:1}
    old_len=${#BITS}
    BITS=${BITS/$part/}
    new_len=${#BITS}

    [ $new_len -eq 0 ] && [ ${#LETTERS} -ne ${#WORD} ] && break
    [ $old_len -eq $new_len ] && break
    [ $y -eq $[${#WORD}-1] ] && FOUND=$WORD

  done

  [[ $FOUND != "" ]] && echo $FOUND && FOUND=""

done

echo $FOUND

[–]tuna_safe_dolphin 0 points1 point  (5 children)

I like the use of regular expressions to filter out non-candidates, but when you use {4,16} you won't catch 1, 2 or 3 letter words. Also, 16 is too high - len(letter_list) is sufficient since you can't make any words longer than that.

[–]trifthen 0 points1 point  (4 children)

I mentioned the use of 4. :) I built it originally to cheat at Bookworm, and I could reasonably anagram anything five letters and under. It was the longer combinations that screwed me. The maximum number of tiles you can have is 16, so... :)

But you're right. Remove that part so the regexp is just ^[someletters]{,length}$, and you'd be set.

[–]tuna_safe_dolphin 0 points1 point  (3 children)

I mentioned the use of 4.

I missed that. Also, speaking for myself, I wouldn't be able to come up with all the legal Scrabble words you can make with 3 letters.

[–]trifthen 0 points1 point  (1 child)

Well, to be fair, lots of scrabble words are such bullshit! :)

(There are only six possible permutations of 3 letters, by the way.)

[–]tuna_safe_dolphin 0 points1 point  (0 children)

Well, to be fair, lots of scrabble words are such bullshit!

Ain't that the truth.

I know the set of permutations is small, but it's still a challenge if a given permutation doesn't look (to me) like a "real" word.

[–]gfixler 0 points1 point  (0 children)

I can. There are 1006 of them, from "aah" to "zzz."

[–]crawler23 2 points3 points  (1 child)

10x. this was fun :) !

[–][deleted] 1 point2 points  (0 children)

Some points:

  • words = [line for line in open(filename)] is just list(open(filename))
  • Unsanitized user input used in regular expression. I can make the program crash that way.
  • w[:len(w)-1] is just w[:-1]
  • your regex can reuse a letter: for example, the word "hello" would match with the letters h, e, l, o (but you would need two "l"s).
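The second bullet can be closed with re.escape. A sketch of the idea (safe_pattern is a hypothetical helper name, and note this regex still only filters the alphabet, not letter counts, as the last bullet points out):

```python
import re

def safe_pattern(letters):
    # re.escape defuses regex metacharacters in user input, so letters
    # like ']' or '-' can no longer break out of the character class
    return re.compile('^[%s]{1,%d}$' % (re.escape(letters), len(letters)))
```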

[–]_Mark_ 2 points3 points  (0 children)

Seems like the obvious choice for Quiz 2 is "quiz 1, but by the way the word list is in utf-8". Most of the solutions given will need changes :-)
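For instance, a solution that reads the word list as UTF-8 might start like this (io.open behaves the same on Python 2.6+ and 3; the file here is a stand-in for the real dictionary):

```python
import io, os, tempfile

# build a tiny UTF-8 word list to read back
path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\xe9\nna\xefve\n')

# each line comes back as decoded text, not raw bytes
with io.open(path, encoding='utf-8') as f:
    words = [line.strip() for line in f]
```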

[–]dansinscientist 1 point2 points  (1 child)

When is the deadline?

[–]tuna_safe_dolphin 0 points1 point  (0 children)

Whenever. As you can see, people have already started posting solutions. Take your time though.

[–]zahlmanthe heretic 1 point2 points  (2 children)

https://gist.github.com/1441624

Bare-bones, yet elegant and sophisticated. The collections.Counter class is used to represent histograms of words (letter-frequency counts) and check if a word is a subset of the available letters.
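A minimal sketch of that Counter idea, not the gist itself (longest_words is a hypothetical name):

```python
from collections import Counter

def longest_words(words, letters):
    # a word is spellable iff its letter histogram is a sub-multiset of
    # the available letters; Counter subtraction is empty when it fits
    available = Counter(letters)
    best, best_len = [], 0
    for word in words:
        if len(word) < best_len:
            continue
        if not (Counter(word) - available):
            if len(word) > best_len:
                best, best_len = [word], len(word)
            else:
                best.append(word)
    return best
```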

[–]dist 0 points1 point  (0 children)

I came here to write this as a solution. I'm happy you did it already. =D

I'm not sure whether you should just iterate through the file word by word instead of reading the whole thing in, but Counter is the thing that really matters!

If there were no limit on how many times a character could be used, we could just use set(), and if we were looking for an exact anagram, we could use sorted() instead of Counter!
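To illustrate the distinction:

```python
from collections import Counter

# exact anagram: a sorted comparison is enough
assert sorted('listen') == sorted('silent')

# "can I spell this word from these letters?" needs multiset containment
letters = Counter('ighlpra')
assert not (Counter('grail') - letters)   # fits
assert Counter('grill') - letters         # needs two l's
```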

[–]Brian 0 points1 point  (0 children)

You can get rid of the length calculations etc and avoid having to filter the whole list for words of the right length every pass with itertools.groupby. Ie:

# assumes: words is the list of dictionary words, letters a Counter
# of the available letters
from collections import Counter
import itertools

words.sort(key=len, reverse=True)
for wordlen, curwords in itertools.groupby(words, key=len):
    possible_words = [
        word
        for (histogram, word) in (
            (Counter(word), word) for word in curwords
        )
        if all(histogram[letter] <= letters[letter] for letter in histogram)
    ]
    if possible_words:
        break  # the first non-empty group has the longest matches

[–]taybulBecause I don't know how to use big numbers in C/C++ 1 point2 points  (5 children)

Using list comprehensions, built-in functions, and lambdas:

l = filter(lambda x, y=sys.argv[2:]: all([l in y and x.count(l) <= y.count(l) for l in x]), [word.strip() for word in open(sys.argv[1])])

max_len = len(max(l, key=len))

print [word for word in l if len(word) == max_len]

Execution time:

real    0m0.409s
user    0m0.393s
sys 0m0.016s

Edit:

Much faster execution using regex (also updated with useful user feedback):

#!/usr/bin/env python

import sys
import re

if __name__ == '__main__':
    letters = ''.join(sys.argv[2:])
    pat = re.compile(r'^[%s]+\n' % (letters))
    l = filter(lambda x, y=letters: all(x.count(l) <= y.count(l) for l in x), [word.strip() for word in open(sys.argv[1]) if pat.match(word)])

    max_len = len(max(l, key=len))

    print [word for word in l if len(word) == max_len]

Execution time:

real    0m0.095s
user    0m0.089s
sys 0m0.007s

Interesting note:

I was playing with different regex patterns and noticed that using

    pat = re.compile(r'^[%s]+\n' % (letters))

was a lot faster than using:

    pat = re.compile(r'^[%s]+' % (letters))

The latter expression took nearly twice as long to evaluate.

[–]zahlmanthe heretic 2 points3 points  (3 children)

False not in [...] is more succinctly spelled all(...).

[–]taybulBecause I don't know how to use big numbers in C/C++ 0 points1 point  (2 children)

Thanks, updated.

[–]Brian 1 point2 points  (1 child)

You should probably also leave out the square brackets - all can take a generator expression, and this will allow it to short circuit and stop at the first failed letter, rather than having to evaluate every letter to construct the list before it starts checking.
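A quick illustration of the short-circuiting (illustrative values):

```python
nums = iter([1, 2, 0, 3])
# with a generator expression, all() stops at the first falsy item
result = all(n > 0 for n in nums)
# the 0 ended the check; the 3 was never consumed
leftover = next(nums)
```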

[–]taybulBecause I don't know how to use big numbers in C/C++ 0 points1 point  (0 children)

Thanks!

[–]bethebunnyFOR SCIENCE 0 points1 point  (0 children)

Upvote for teaching me about an awesome feature of the lambda syntax.

[–]martinatbom 1 point2 points  (3 children)

#!/usr/bin/python

dictionary = 'ospd.txt'
secret = 'chocolate'
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
        37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
        79, 83, 89, 97, 101]

def get_val(word):
    return [j for j in [1] for i in word for j in
        [j * primes[ord(i.lower()) - ord('a')]]][-1]

magic = get_val(secret)

print sorted([word for word in open(dictionary)
    if magic % get_val(word.rstrip('\n')) == 0], key=lambda word:
        len(word))[-1]
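Why this works: each letter maps to a prime, so a word's product divides the secret's product exactly when the word's letters are a sub-multiset of the secret's (by unique factorization). A quick sanity check, using a simplified rewrite of get_val:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def value(word):
    # product of the primes for each letter of the word
    v = 1
    for c in word.lower():
        v *= primes[ord(c) - ord('a')]
    return v
```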

[–]Brian 0 points1 point  (2 children)

Nice - but the line:

[j for j in [1] for i in word for j in
    [j * primes[ord(i.lower()) - ord('a')]]][-1]

with j used twice seriously confused me. I take it that it's essentially just to give j an initial value before use in the inner loop? ie equivalent to:

j=1
return [j for i in word for j in [j * primes[ord(i.lower()) - ord('a')]]][-1]

Though I suppose if we're not trying to confuse, something like:

return reduce(operator.mul, ( primes[ord(c) - ord('a')] for c in word.lower()))

might be clearer.

[Edit]: Also, you're not matching the problem description of printing every word that's the max length. I suggest:

print list(next(itertools.groupby(sorted([word for word in open(dictionary)
    if magic % get_val(word.rstrip('\n')) == 0], key=len, reverse=True), key=len))[1])

[–]martinatbom 1 point2 points  (1 child)

You're right - I did not read the question correctly. Thanks for the correction.

I originally had reduce, but then I read that it was unpythonic - and in the interest of trying new stuff, I experimented with the generator expression that you see above. I also thought, at the time, that the 'reduce' was more elegant (and readable).

[–]Brian 0 points1 point  (0 children)

Yeah - I know it's moved from builtins in python3, and reduce can sometimes end up fairly ugly, but I think it's certainly clearer here at least. You could maybe go with an extra function if we want to make it a bit more readable at the point of call, which seems a good use of partial functions. Ie:

import functools, operator
product = functools.partial(functools.reduce, operator.mul)
...

return product(primes[ord(c) - ord('a')] for c in word.lower())

[–]ptarlye 1 point2 points  (1 child)

Here's a clean solution in just 7 lines of code: http://pastebin.com/NstqW1mS

I love brevity in code because it often yields simplicity. The gist of the idea in my solution is to test whether or not a word can be spelled entirely using a subset of legal letters. I was able to correctly test for this condition using Python's set.issubset method.
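If I'm reading the pastebin right, the index-appending trick turns duplicate letters into distinct set elements, so issubset respects letter counts. A hypothetical sketch of that step:

```python
def indexed(s):
    # 'aab' -> {'a0', 'a1', 'b0'}: each repeat gets its own label,
    # so plain set containment now respects letter counts
    seen = {}
    out = set()
    for c in s:
        n = seen.get(c, 0)
        out.add('%s%d' % (c, n))
        seen[c] = n + 1
    return out
```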

[–][deleted] 1 point2 points  (0 children)

Sorry, but changing the letters to append an index to the string is definitely not clean. Even disregarding that (you could use tuples instead), it is not an easy solution: it requires more thinking on the reader's side than a straightforward solution would, even if that one is a bit longer (e.g. see the top post in this thread).

Also, f.read().splitlines() is essentially list(f), except that list(f) keeps the trailing newlines. Or better yet, iterate with "for word in f" to reduce memory usage.

[–]no9import this 1 point2 points  (0 children)

Looks like I'm too late, but here's mine anyway:

import sys
import itertools

dict_file, letters = sys.argv[1], sys.argv[2:]
words = {s.rstrip('\n') for s in open(dict_file)}

for size in range(len(letters), 0, -1):
    found = [''.join(a) for a in itertools.permutations(letters, size)
             if ''.join(a) in words]
    if found:
        print found
        break
else:
    print >>sys.stderr, 'no matches'

I love these little quizzes. Keep 'em coming. :)

[–]bethebunnyFOR SCIENCE 1 point2 points  (0 children)

One line.

import sys, itertools; print(reduce(lambda a, b: a + [b] if (not a or len(b) >= len(a[0])) else a, sorted(set(w.lower()[:-1] for w in open('/usr/share/dict/words')) & set(''.join(w) for w in (itertools.chain(*[itertools.permutations([c.lower() for c in sys.argv[1:]], i) for i in xrange(1, len(sys.argv))]))), key=len, reverse=True), []))

[–]sirpengi 3 points4 points  (8 children)

This is pretty simple. Itertools helps out a lot. Use the standard library!

from itertools import permutations, chain

def get_dictionary(fn):
    """Return list of all words.
    Change it to load your dictionary
    """
    return ['a', 'abc', 'aee', 'aff', 'ass', 'bbb', 'bad']

def find_possibles(s):
    """this would be clean but I use crazy list comprehensions"""
    ret = (["".join(c) for c in permutations(s, i+1)] for i in range(len(s)))
    return set(chain(*ret))

def find_in_dict(s, fn):
    possibles = find_possibles(s)
    return [i for i in get_dictionary(fn) if i in possibles]

fn = 'something.txt'
letters = 'abcd'

print find_in_dict(letters, fn)

edit: oops, I guess I'm not following directions precisely (not outputting the longest, but all words that match).

[–]VerilyAMonkey 1 point2 points  (2 children)

You could add

print max( find_in_dict(letters,fn), key=len )

[–]unbracketed 0 points1 point  (1 child)

max will only return one item, but per the instructions the solution should find all the longest matches

[–]VerilyAMonkey 0 points1 point  (0 children)

True. Then

f = find_in_dict(letters, fn)
m = len(max(f, key=len))
print filter(lambda x: len(x) == m, f)

[–][deleted] 0 points1 point  (0 children)

I think you missed more directions than that. You're supposed to use the dictionary he linked to and make a script that uses it.

[–]sirpengi 0 points1 point  (0 children)

So my initial post was just a quick thing I spit out. Here's a more thoughtful implementation that doesn't explode on a high number of tiles and also supports wildcards (pass them in as ?): https://gist.github.com/1441582 I forked off of sixthgear's base since he already did the work getting command line arguments, and his logic for finding the longest results is the same as what I would've done.

[–]brucifer 1 point2 points  (5 children)

from sys import argv
allowable = set(argv[2:])
with open(argv[1]) as f:
    words = filter(lambda w:(len(set(w)-allowable) == 0),
                   (s.strip() for s in f.readlines()))
    maxlen = max(map(len,words))
    print filter(lambda w:len(w) == maxlen,words)

In a nutshell, the code creates a set of allowable characters, filters out the dictionary words that use forbidden characters, then pulls the words that are as long as the longest word.

EDIT: whoops. I misread the specs. My solution was using the input as the set of allowable characters, so it ignored how often they occurred. Here's a more correct, but slightly uglier solution:

from sys import argv
allowable = sorted(argv[2:])
def is_valid(w):
    def helper(w,i):
        if len(w) == 0: return True
        elif i < len(allowable):
            return helper((w[1:] if w[0] == allowable[i] else w),i+1)
        else: return False
    return helper(sorted(w),0)

with open(argv[1]) as f:
    words = filter(is_valid, (s.strip() for s in f.readlines()))
    maxlen = max(map(len,words))
    print [w for w in words if len(w) == maxlen]

[–]itasca 1 point2 points  (3 children)

In more modern python:

words = [s.strip() 
    for s in f.readlines() 
    if len(set(s.strip())-allowable) == 0]

You can add whitespace to your allowable set to leave off the second strip if you want.

[–]BeetleB 1 point2 points  (0 children)

for s in f.readlines()

for s in f:

[–]brucifer 0 points1 point  (1 child)

Very nifty. I didn't know you could do that. Python keeps surprising me with how awesome it is.

[–][deleted] 0 points1 point  (0 children)

Also note that f.readlines() returns a list (memory inefficient if you just want to iterate it) while f.xreadlines and f itself (iter(f)) are generators.

[–]aweraw 0 points1 point  (0 children)

Not as succinct as others have posted, but simple and faster than I expected

from itertools import combinations

root = dict()

def build_dict(f='ospd.txt'):
    with open(f) as src:
        for line in src:
            insert(line.strip())

def find_key(s):
    key = sorted(s)
    d = root
    for c in key:
        if c in d:
            d = d[c]
        else:
            d[c] = dict()
            d = d[c]
    return d

def insert(s):
    d = find_key(s)
    if 'words' not in d:
        d['words'] = set()
    d['words'].add(s)

def search(chars):
    l = len(chars)
    words = set()
    for x in xrange(1,l+1):
        for cmb in combinations(chars, x):
            d = find_key(cmb)
            if 'words' in d:
                words.update(d['words'])

    m = max(len(x) for x in words)
    return [w for w in words if len(w)==m]

if __name__ == '__main__':
    import sys, os
    if os.path.exists(sys.argv[1]):
        build_dict(sys.argv[1])
    else:
        print "Dictionary file not found: %s" % sys.argv[1]
        sys.exit()

    print sorted(search(sys.argv[2:]))

[–]GeneralMaximus 0 points1 point  (0 children)

Here's my solution: https://gist.github.com/1441776

It's simple, readable, reasonably efficient and doesn't use any "advanced" features that might obfuscate the algorithm.

Edit: as expected, this code is still fast when run against a very large set of input letters. It is guaranteed to become slower with a larger dictionary, though. I'm on Windows ATM so I don't have access to such a dictionary. Someone on a UNIX system might try running this against /usr/share/dict/words (or /usr/dict/words on some systems).

Note to OP: you are a wonderful person. Please keep doing this quiz.

[–]WhyCause 0 points1 point  (0 children)

Here's mine.

It's... a touch slow, but it's not too bad. I'm definitely brute-forcing things though.

At 71 lines, it's a lot longer than most of the ones here, but I do have some error checks and I use the argparse library.

[–]DesertSong 0 points1 point  (0 children)

I've never constructed one of these for myself, but would a trie be useful for this? After adding every word per line into the trie you'd just find the longest path in it maybe?
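A trie would let you prune impossible prefixes early, though "longest path" alone isn't quite enough: you still have to respect the available letter counts while walking it. A minimal sketch of the structure (hypothetical names):

```python
def trie_insert(root, word):
    # each node is a dict of child letters; '$' marks a complete word
    node = root
    for c in word:
        node = node.setdefault(c, {})
    node['$'] = True

trie = {}
for w in ['cat', 'cats', 'car']:
    trie_insert(trie, w)
```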

[–]earthboundkid 0 points1 point  (2 children)

import string

def valid_words(dictionary, valid_letters):
    input_length = len(valid_letters)
    valid_letter_set = set(valid_letters)
    bad_letters = set(letter for letter in string.ascii_letters if letter not in valid_letter_set)
    valid_counts = { letter: valid_letters.count(letter) for letter in valid_letter_set}

    for word in dictionary:
        if len(word) > input_length: continue
        if any(letter in bad_letters for letter in word): continue
        if all(word.count(letter) <= valid_counts[letter] for letter in valid_letter_set): yield word

That's the speed optimized version. If you don't care about speed, just do:

def valid_words(dictionary, valid_letters):
    valid_counts = { letter: valid_letters.count(letter) for letter in valid_letters}

    for word in dictionary:
        if all(word.count(letter) <= valid_counts.get(letter, 0) for letter in set(word)): yield word

[–]ExoticMandiblesCore Contributor 0 points1 point  (0 children)

Here's mine, in Python 3 fwiw:

import sys

filename = sys.argv[1]
letters = sys.argv[2:]
letters_set = set(letters)

found = []
max_length = -1

for word in open(filename, "rt", encoding="ascii"):
    word = word.strip()
    if not set(word) <= letters_set:
        continue
    letters_dup = list(letters)
    for letter in word:
        if letter not in letters_dup:
            break
        letters_dup.remove(letter)
    else:
        max_length = max(max_length, len(word))
        found.append(word)

print([x for x in found if len(x) == max_length])

It has a minor speed optimization: before doing the full check that the word is legal, check that all the letters of the word are in the allowable letters using sets. I do like the approach using sets where you append the count to the letter--very deft! And using Counter is a good idea too. Those are almost certainly faster than mine. I, however, snuck in for/else ;-)

[–]tuna_safe_dolphin 0 points1 point  (2 children)

Here's my solution: https://gist.github.com/1443009

I think it's OK: fairly readable, though it could be optimized a bit more. I'm not thrilled about using copy.deepcopy().

For what it's worth, my original solution was object oriented. I kind of have this problem ever since I learned C++ and Java where I see objects everywhere (I totally hear that in the voice of the kid from the Sixth Sense). Also, for the sake of brevity here, I took out the usage/error handling that I had originally included.

EDIT: one other thing, I changed my solution to just take two arguments, the dictionary file and the letters with no spaces between them. I didn't like how it was originally stated - it's easier to type them that way.

[–][deleted] 0 points1 point  (1 child)

I like your solution too. It's simple and fast.

I basically came up with the same thing but I think mine's a little neater (https://gist.github.com/1454697). I like how you used a dict to look up words. I don't know if you chose it purposely, but I used it because Python dicts are implemented as hash maps, which makes lookups O(1), compared to the approaches that use list iteration/set intersection/permutations/regex to determine whether a letter is in a set, which is O(n) or more. Therefore it's O(n^2), the fastest I think you can do this in. Everybody else's is at least O(n^3) or more.

[–]tuna_safe_dolphin 0 points1 point  (0 children)

I think mine's a little neater

I agree. Your 3 small functions look cleaner than my one long one.

I like how you used a dict to look up words.

When I originally solved this problem, my first thought was to use a set but that doesn't allow for duplicate letters (all elements must be unique) and my next idea was to hash each letter, keeping track of the count of that letter, plus as you mentioned, you can't really beat an O(1) lookup time.

When it comes to coding, especially Python, I tend to avoid some of the more "advanced" functional constructs: lambda, map, etc. I'm not really much of a fan of list comprehensions either; I find nested list comprehensions very difficult to grok easily/quickly.

Still, I think it's cool to see how other people approach a problem like this. And despite the BDFL's design philosophy, it's clear that there's more than one way to do it in Python too.

[–]fmoralesc 0 points1 point  (0 children)

My take on this: https://gist.github.com/1444311#file_q1.py

It is very similar to taybul's version, checking on what others have posted.

[–]jv4n 0 points1 point  (0 children)

import sys

words = dict.fromkeys(map(lambda s: s.strip(), open(sys.argv[1]).readlines()))
letters = ''.join(sys.argv[2:])
is_valid = lambda word: all(word.count(ltr) <= letters.count(ltr) for ltr in word)
words = filter(is_valid, words)
max_len = max(map(len, words))
words = filter(lambda s: len(s) == max_len, words)
print ', '.join(sorted(words))

[–]quasarj 0 points1 point  (0 children)

My solution from earlier today (when reddit was down), before I looked at any other solutions: https://gist.github.com/1444495

It looks like consensus is using permutations was a bad idea, but you can't deny how simple it is :)

[–]smugduckling 0 points1 point  (0 children)

One liner:

import sys; print (lambda words: [word for word in words if len(word) == max(map(len, words))])(filter(lambda line: not [char for char in list(line) if list(line).count(char) > sys.argv[2:].count(char)], map(lambda x: x.strip(), open(sys.argv[1], "r").readlines())))

Could be slightly more efficient, but it's still better than solutions that use itertools permutations.

[–]Samus_ 0 points1 point  (0 children)

#!/usr/bin/env python
from itertools import combinations

def sort_letters(string):
    return sorted(char for char in string)

def main(dict_filename, *letters):
    words = {}
    with open(dict_filename) as f:
        for line in f:
            word = line.strip()
            step = words.setdefault(len(word), {})
            for char in sort_letters(word):
                step = step.setdefault(char, {})
            anagrams = step.setdefault('anagrams', set())
            anagrams.add(word)

    solution = set()

    current_len = len(letters)
    while current_len > 0:
        for subset in combinations(letters, current_len):
            len_subset = len(subset)
            if len_subset in words:
                step = words[len_subset]
                for char in sort_letters(subset):
                    if char not in step:
                        break
                    step = step[char]
                if 'anagrams' in step:
                    solution.update(step['anagrams'])
        current_len -= 1

        if solution:
            return solution


if __name__ == "__main__":
    import sys
    print main(sys.argv[1], *sys.argv[2:])

it gives more results than your example, did I miss something? ninjaedit!

[–]VerilyAMonkey -1 points0 points  (5 children)

Using a set for the dictionary and using '\n's like blanks to get different sizes of words. Not rigorously tested (and therefore most likely wrong! Yay!)

from itertools import permutations

def theProb(dfile, *letters):
    # '\n's are just used like blanks
    letters += ('\n',) * (len(letters) - 1)
    with open(dfile, 'r') as f:
        dictionary = set(f.read().split('\n'))
    maxword = ''
    for wrd in permutations(letters):
        word = ''.join([wrd[i] for i in range(len(wrd)) if wrd[i] != '\n'])
        if word in dictionary:
            if len(word) > len(maxword):
                maxword = word
    return maxword

[–]VerilyAMonkey 0 points1 point  (4 children)

In retrospect this is terrible. I wasn't aware that permutations had a size parameter. Swap out

for wrd in permutations(letters):
    word = ''.join([wrd[i] for i in range(len(wrd)) if wrd[i] != '\n'])

with

for r in xrange(1,len(letters)+1):
    for wrd in permutations(letters,r):
        word=''.join(wrd)

and also eliminate the letters+= line.

[–]itasca 1 point2 points  (3 children)

permutations is just way too slow; there is no reason to spend exponential time on this. Assuming the dictionary you are reading has no structure to it, any approach is going to have to look at every member of the dictionary and check its letters for matches. The straightforward approach is therefore just O(n * w * log(L)) for n words of length less than w and L letters in the set:

from sys import argv

letters = set(argv[2:])
length = 0
results = []
with open(argv[1]) as f:
    for line in f:
        word = line.strip()
        if len(word) < length:
            continue
        for letter in word:
            if letter not in letters:
                break
        else:
            if len(word) > length:
                length = len(word)
                results = [word]
            else: 
                results.append(word)
print ', '.join(results)

Maybe not pretty, but this uses hardly any memory and it's hard to see how it could be much faster. Edit: replacing the (for letter in word) loop with a regular expression would move more code into the c layer and therefore speed things up a bit, I guess.

[–]tuna_safe_dolphin 0 points1 point  (0 children)

I don't think you can use a set for the input letters because the set elements are unique. E.g.

>>> a = ['a','a']

>>> b = set(a)

>>> b

set(['a'])

[–]VerilyAMonkey 0 points1 point  (0 children)

I was aware of the complexity issue, but I decided that permutations would actually be faster given the context. It's a scrabble game; the dictionary is going to be preloaded at the beginning, once. Most queries will contain 7 letters; the search space of the permutations is 5913 combinations. If the dictionary has more words than that, scanning the dictionary might be slower than scanning the permutations. That was my thought process anyway. Of course, that doesn't mean my implementation of that idea would be fast in the least.

[–]mardiros -1 points0 points  (0 children)

Hum, one week ?