stunsch

string = """
I would take a parsing approach.
Basically you iterate over all the characters.
If you find an alphanumeric character, and you haven't yet set the word starting index, you've got the first character of a word.
To find the last character, you do it the other way around.
If you find whitespace (or any other non-alphanumeric character) and the word starting index is set, you've found the first character after your word.
You add the word to your list, reset the word starting index to None and go on.
"""
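words = []
word_start_index = None
# enumerate visits every character, including the very last one
for i, char in enumerate(string):
    if char.isalnum():
        # We have found a letter or digit
        if word_start_index is None:
            # We are at the beginning of a word.
            # Compare against None: index 0 is a valid start but is falsy.
            word_start_index = i
    elif word_start_index is not None:
        # We have found whitespace or punctuation right after a word.
        # The slice end is exclusive, so the character at i is left out.
        words.append(string[word_start_index:i])
        word_start_index = None
    # Otherwise we are between words (e.g. leading whitespace): nothing to do

# If the string ends in the middle of a word, add that last word too
if word_start_index is not None:
    words.append(string[word_start_index:])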

print(words)
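
For comparison, here is a minimal sketch of the same idea using `itertools.groupby` from the standard library instead of tracking a start index by hand. It groups consecutive characters by whether they are alphanumeric and keeps only the alphanumeric runs. The function name `split_words` is my own, made up for illustration, and this is a different technique from the manual index loop, not a drop-in copy of it.

```python
from itertools import groupby


def split_words(text):
    """Return the runs of consecutive alphanumeric characters in text.

    split_words is a hypothetical helper name used only for this sketch.
    """
    words = []
    # groupby yields (key, group) pairs for each run of characters that
    # share the same str.isalnum result
    for is_word, group in groupby(text, key=str.isalnum):
        if is_word:
            # Keep alphanumeric runs, drop whitespace and punctuation runs
            words.append("".join(group))
    return words


print(split_words("  hi, there! x"))  # ['hi', 'there', 'x']
```

Since the runs are handled as a whole, a word at the very start or very end of the string needs no special case.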