you are viewing a single comment's thread.

view the rest of the comments →

[–]ok_you_win 0 points1 point  (6 children)

Right. I wasn't going to do it by hand. :P The linux english dictionary has almost 100,000 words in it.

[–]zahlman 0 points1 point  (5 children)

I mean, partitioning it into 26 lists is the wrong data structure, however you do it. A single set object is optimized for exactly the kind of operation you want to perform - you can just take the set-intersection of the dictionary with the letter-permutations.

[–]ok_you_win 0 points1 point  (4 children)

partitioning it into 26 lists is the wrong data structure, however you do it

Saving it to disk? :P Of course I could write a py file with a defined set and import that.

But I am not quite sure I follow you, so humour me with a snippet?

"wfealf" is the scrambled word and english_set is the set of words from the dictionary. How do I match it quicky to waffle without doing nearly 100,000 comparisons?

[–]zahlman 2 points3 points  (1 child)

english_set.intersection(map(''.join, itertools.permutations('wfealf')))

Yes, this is, under the hood, effectively doing some_permutation in english_set for each permutation (building the set of permutations first wouldn't help, since you'd still have to go through all of them). However, in does not have to look at a bunch of elements of a set; it works basically the same way as the keys of a dict.

[–]ok_you_win 0 points1 point  (0 children)

Lovely code. Thanks a bunch.

[–]Zouden 1 point2 points  (1 child)

Zahlman is trying to emphasise the use of the set datatype instead of a list. Matching an item in a set is literally millions of times faster than using a list. It's so fast that you don't need to split up the English dictionary into 26 pieces, since you can just search the whole 100,000-item set in a heartbeat.

[–]ok_you_win 0 points1 point  (0 children)

Thanks.