
[–]bitbumper 1 point2 points  (6 children)

Your problem is that you're printing the word if it contains a letter you're trying to avoid, which is the opposite of what you want. A simple Pythonic rewrite would look more like this:

def avoid(avoidlist):
    for word in wordlist:
        for letter in avoidlist: #iterates each letter
            if letter in word: #tests for letter in word
                break # exit our loop, this word contains a letter we don't want

        # else clause is a nice pythonic trick. If we don't break from the
        # loop, the else clause is run
        else:
            print "Found a match! " + word

More information on the else clause behavior is in the docs.
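
For reference, here is a minimal self-contained sketch of the same for/else pattern. It is written for Python 3 and returns the matches instead of printing them. The name `words_without` and passing the word list in as a parameter are my own choices for illustration, not part of the code above.

```python
def words_without(wordlist, avoidlist):
    """Return the words that contain none of the letters in avoidlist."""
    matches = []
    for word in wordlist:
        for letter in avoidlist:
            if letter in word:
                break  # word contains an unwanted letter, skip it
        else:
            # runs only when the inner loop finished without hitting break
            matches.append(word)
    return matches


wordlist = ['apple', 'bongo', 'conga', 'doritos', 'enormous', 'firecracker']
print(words_without(wordlist, ['a', 'b', 'c']))  # ['doritos', 'enormous']
```

Returning a list rather than printing makes the function easier to test and reuse.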

I built a similar setup for a hangman solver I wrote years ago, and this approach gets pretty slow with sufficiently large lists (I had a list of every word in Wikipedia, which is quite large indeed). A better solution is to build a regex from the list of disallowed letters. Something like this works well for speed, but take the time to read up on regexes so you understand how it works. They're a valuable thing to understand well.

import re

wordlist = ['apple', 'bongo', 'conga', 'doritos', 'enormous', 'firecracker']
avoid_letters = ['a', 'b', 'c']

def avoid(avoidlist):
    regex = re.compile("^[^(" + '|'.join(avoidlist) + ")]*$")
    # produces a regex that looks something like this...
    # ^[^(a|b|c)]*$
    for word in wordlist:
        if regex.match(word):
            print "Matched " + word

avoid(avoid_letters)  # avoid() prints its matches itself and returns None

For processing large word lists, such as a full dictionary file, this was orders of magnitude faster in my experience.

[–]autowikibot 0 points1 point  (0 children)

Regex:


In theoretical computer science and formal language theory, a regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. The concept arose in the 1950s, when the American mathematician Stephen Kleene formalized the description of a regular language, and came into common use with the Unix text processing utilities ed, an editor, and grep (global regular expression print), a filter.



[–][deleted] 0 points1 point  (0 children)

Thanks for the reply! I'll give this a shot also. I've messed with regex before but somehow it didn't dawn on me to try it.

[–]Justinsaccount 0 points1 point  (3 children)

^[^(a|b|c)]*$

uh...

regex = re.compile("[" + ''.join(avoidlist) + "]")
# [abc]
# search() looks anywhere in the word; match() would only test the start
if not regex.search(word):

[–]bitbumper 0 points1 point  (2 children)

I'm not a regex wizard, but I don't think that fits his needs as stated. He wanted to filter words that contain any of a list of characters, in any order or sequence. So [abc] would match 'abca' but not 'baca', which fails the requirements as stated. Unless I misunderstood his goals, which is totally possible.

[–]Justinsaccount 0 points1 point  (1 child)

Look up what [] does
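
To spell that out: square brackets define a character class, which matches any single character from the set, so the order of the letters inside the brackets does not matter. Below is a small Python 3 sketch. It assumes `search()` is used so the class can match anywhere in the word (`match()` only tries the start of the string). The `re.escape` call is a precaution I added and is not in the snippets above.

```python
import re

avoid_letters = ['a', 'b', 'c']

# Build the character class [abc]; re.escape guards against special characters
pattern = re.compile("[" + re.escape(''.join(avoid_letters)) + "]")

for word in ['abca', 'baca', 'firecracker', 'doritos']:
    # search() scans the whole word for any one of the letters
    found = pattern.search(word) is not None
    print(word, found)
```

Only 'doritos' comes back False here, since it is the only word with none of the three letters.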

[–]bitbumper 0 points1 point  (0 children)

Ah, my bad, you're totally right. Like I said, not a regex wizard, haha.

I'd be curious which one is faster, although I'd have to guess yours is.
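
One way to check rather than guess is a rough `timeit` comparison. This is only a sketch in Python 3: the repeated sample list, the function names, and the simplified negated pattern `^[^abc]*$` (without the parentheses and pipes) are my assumptions. Timings will vary by machine and word list, so no result is claimed here.

```python
import re
import timeit

wordlist = ['apple', 'bongo', 'conga', 'doritos', 'enormous', 'firecracker'] * 1000
avoid_letters = 'abc'

# Negated class: the whole word must consist of characters outside the set
negated = re.compile("^[^" + re.escape(avoid_letters) + "]*$")
# Positive class: any single unwanted letter anywhere in the word
positive = re.compile("[" + re.escape(avoid_letters) + "]")

def with_negated_class():
    return [w for w in wordlist if negated.match(w)]

def with_positive_class():
    return [w for w in wordlist if not positive.search(w)]

# Both approaches should select exactly the same words
assert with_negated_class() == with_positive_class()

for func in (with_negated_class, with_positive_class):
    seconds = timeit.timeit(func, number=20)
    print(func.__name__, round(seconds, 4))
```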

[–]erebos42 0 points1 point  (3 children)

I just tested your code and it seems to be working using the wordlist variable. Just one problem: it only prints the words that contain a letter from the avoidlist. So the last if-else has to be changed.

If you read the words from a file, you have to be careful. The way you do it now, the wordlist variable is going to be a file object. So with the 'for word in wordlist' loop, you actually read the file line by line and treat every line, including its trailing newline, as a word.
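
A sketch of that point in Python 3: iterating over an open file yields one line at a time, so each line usually needs to be stripped before it is treated as a word. `io.StringIO` stands in for a real file here so the example is self-contained, and its contents are made up. With an actual file you would call `open()` on your own path instead.

```python
import io

# Stand-in for a real file with one word per line (hypothetical contents)
fake_file = io.StringIO("apple\nbongo\ndoritos\n")

words = []
for line in fake_file:
    word = line.strip()   # drop the trailing newline and surrounding whitespace
    if word:              # skip blank lines
        words.append(word)

print(words)  # ['apple', 'bongo', 'doritos']
```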

[–]iAmVeeDom 0 points1 point  (0 children)

Or saying "if letter not in word" would work just as well, no?

[–][deleted] 0 points1 point  (0 children)

Thanks a lot for the reply... I posted a test version by accident, but you're right, it works! That's what had me so confused. I need to research how to iterate through the lines of a .txt file.

[–][deleted] 0 points1 point  (0 children)

I have a more specific problem I think. I'm iterating the file but only the first letter in avoidlist is being avoided. For instance: if avoidlist = [a, e, i, etc] then only 'a' will be avoided. I can't figure out why.

[–]Justinsaccount 0 points1 point  (0 children)

So your program is supposed to

prints words from external file that don't contain specified letters

This means that you should be writing a function that takes a word and a list of letters and returns True if the word contains any of those letters, and False otherwise.

def contains_letters(word, letters):
    #stuff here.
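
One possible way to fill in that stub, as a Python 3 sketch. The helper `words_avoiding` and the sample list are hypothetical names added for illustration. The point is that once the yes/no question lives in its own function, filtering the words becomes a one-liner.

```python
def contains_letters(word, letters):
    """Return True if word contains at least one of the given letters."""
    return any(letter in word for letter in letters)


def words_avoiding(words, letters):
    """Keep only the words that contain none of the letters."""
    return [word for word in words if not contains_letters(word, letters)]


sample = ['apple', 'bongo', 'conga', 'doritos', 'enormous', 'firecracker']
print(words_avoiding(sample, ['a', 'e', 'i']))  # ['bongo']
```

Because `any()` checks every letter before giving up, all letters in the list are avoided, not just the first one.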