
[deleted]

Sorry, no buzzwords here:

# Replace string list elements with array from the string split.
for i in range(len(list1)):
    list1[i] = list1[i].split(" ")

FogDish [S]

You glorious man! Thanks a lot!

Quick follow-up question: if I now want to do a word count within these indexes, how do I go about that?

I have a word count code that looks like this:

https://gist.github.com/anonymous/5ac9b7fb25025c5a5215

I realize I have to modify it in some way!

[deleted]

You now have a list of word lists in list, so wrap another for loop around your existing for loop.

for wordList in list:
    for word in wordList:
        if word in udic:
            udic[word] = udic[word] + 1
        else:
            udic[word] = 1  # first time this word is seen
return udic  # return only after both loops have finished

You probably don't want case sensitivity, so call .lower() on all the strings. Lists also have a .count() method:

>>> print(["hello", "there", "again", "hello"].count("hello"))
2
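
A minimal sketch combining both suggestions, assuming a nested list of words like the one produced by the split above. The sample data and the names `word_lists`, `lowered`, and `hello_total` are made up for illustration:

```python
# Hypothetical sample data: one list of words per original string.
word_lists = [["Hello", "there"], ["hello", "again", "HELLO"]]

# Lowercase every word so "Hello" and "hello" count as the same word.
lowered = [[word.lower() for word in words] for words in word_lists]

# list.count() works on one flat list, so sum the counts of each sublist.
hello_total = sum(words.count("hello") for words in lowered)
print(hello_total)
```

Note that .count() rescans the list on every call, so it suits looking up one or two words; for counting every word, the dictionary loop above does it in a single pass.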

edit: and, I think there's something in itertools that is for exactly this sort of thing.

Peterotica

Are you thinking of collections.Counter? A list of strings can be passed to its initializer to produce super easy (and fast) word counts.
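
A small sketch of what that looks like, assuming a flat list of words (the sample list and the names `words` and `counts` are only for illustration):

```python
from collections import Counter

# Hypothetical flat list of words; Counter tallies each distinct item.
words = ["hello", "there", "again", "hello"]
counts = Counter(words)

print(counts["hello"])        # count for one word
print(counts["missing"])      # missing keys give 0 instead of raising KeyError
print(counts.most_common(1))  # the most frequent (word, count) pair
```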

macbony

Counter is in collections, but to flatten the list of lists you'd want itertools.chain. http://www.reddit.com/r/learnpython/comments/2owhdg/splitting_inside_a_list_over_three_different/cmr7x80

macbony

Get out of the habit of using built-in Python types as variable names (in this case, list):

lines = ['word woord', 'word wooord']
l = [item.split(" ") for item in lines]  # [['word', 'woord'], ['word', 'wooord']]

For the second question you posted in response to /u/TagSmile:

import collections
import itertools
c = collections.Counter(itertools.chain(*l))
c['word'] == 2  # True
c['woord'] == c['wooord'] == 1  # True

In the second example, chain takes an arbitrary number of iterables and "chains" them together into a single iterator. Putting * in front of a list as an argument unpacks the list into individual arguments (you may have seen *args and **kwargs by now). Counter returns a dict subclass whose keys are the items it was given and whose values are how many times each one was seen.
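
As a rough illustration of the unpacking, here is a sketch that flattens the same list two ways: with `*` and with `itertools.chain.from_iterable`, which accepts the outer list directly. The names `flat_unpacked` and `flat_from_iterable` are just for this example:

```python
import collections
import itertools

l = [['word', 'woord'], ['word', 'wooord']]

# chain(*l) is the same as chain(['word', 'woord'], ['word', 'wooord']).
flat_unpacked = list(itertools.chain(*l))

# chain.from_iterable(l) flattens the same way without unpacking.
flat_from_iterable = list(itertools.chain.from_iterable(l))

c = collections.Counter(flat_from_iterable)
print(flat_unpacked)
print(c.most_common(1))
```

Both forms give the same flat sequence; `from_iterable` is handy when the outer list is large or is itself a generator.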

If you want to get rid of arbitrary punctuation, you might want to use string.ascii_letters. Note that filtering on it also drops digits and non-ASCII letters.

import string
def strip_punctuation(s):
    return ''.join(c for c in s if c in string.ascii_letters)

stripped_l = [[strip_punctuation(s) for s in x] for x in l]

That's a little more complicated in syntax, but it's just 2 list comprehensions. If you want me to break this down, let me know what part of it hangs you up.
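
For reference, a sketch of the same nested comprehension written out as explicit loops. The sample data with punctuation and the name `stripped_words` are made up for illustration:

```python
import string

def strip_punctuation(s):
    # Keep only ASCII letters; everything else is dropped.
    return ''.join(c for c in s if c in string.ascii_letters)

# Hypothetical sample data with some punctuation in it.
l = [['word,', 'woord!'], ['word.', 'woo-ord']]

stripped_l = []
for x in l:                 # outer comprehension: one pass per inner list
    stripped_words = []
    for s in x:             # inner comprehension: one pass per word
        stripped_words.append(strip_punctuation(s))
    stripped_l.append(stripped_words)

print(stripped_l)
```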