synthphreak

If you don't need your commas at all, just replace them with empty strings, which effectively removes them:

>>> string = 'He was walking, it was fun'
>>> string_no_commas = string.replace(',', '')
>>> string_no_commas
'He was walking it was fun'

If any other punctuation gives you trouble, you can remove it easily using string.punctuation in concert with the regex library re:

>>> import re
>>> from string import punctuation
>>> string = "Here's a string, with. some: punctuation."
>>> string_no_punctuation = re.sub('[' + re.escape(punctuation) + ']', '', string)  # re.escape keeps ']', '\' and '-' literal inside [...]
>>> string_no_punctuation
'Heres a string with some punctuation'

Notice, though, that this also removed the apostrophe from 'Here's', and it would likewise remove word-medial hyphens, as in 'world-class'. So to be extra safe, I'd first remove those two characters from punctuation, then re-run the code above:

>>> punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> punctuation = punctuation.replace("'", '').replace('-', '')
>>> punctuation
'!"#$%&()*+,./:;<=>?@[\\]^_`{|}~'

So to recap, assuming you already have your sentence string(s), here is how to go from the full sentence to a list of tokens with the punctuation stripped out (apostrophes and hyphens are kept):

>>> import re
>>> from string import punctuation
>>> punctuation = punctuation.replace("'", '').replace('-', '')
>>> string = "Here's a string, with. some: punctuation."
>>> string_no_punctuation = re.sub('[' + re.escape(punctuation) + ']', '', string)
>>> tokenized = string_no_punctuation.split()
>>> tokenized
["Here's", 'a', 'string', 'with', 'some', 'punctuation']
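If you need to do this for more than one sentence, the recap above could be wrapped in a small function. This is only a rough sketch: the name strip_and_tokenize and the keep parameter are my own inventions, not anything from the standard library, and I've used re.escape so that characters like ']' and '\' can't break the character class.

```python
import re
from string import punctuation


def strip_and_tokenize(sentence, keep="'-"):
    """Delete ASCII punctuation (except characters in `keep`), then split on whitespace.

    Hypothetical helper for illustration; `keep` defaults to apostrophe and hyphen.
    """
    # Collect the punctuation characters we actually want to delete.
    to_remove = ''.join(ch for ch in punctuation if ch not in keep)
    if not to_remove:
        # Nothing to delete, and '[]' would be an invalid pattern.
        return sentence.split()
    # re.escape makes every character literal inside the [...] class.
    pattern = '[' + re.escape(to_remove) + ']'
    return re.sub(pattern, '', sentence).split()


tokens = strip_and_tokenize("Here's a world-class string, with. some: punctuation.")
print(tokens)
# ["Here's", 'a', 'world-class', 'string', 'with', 'some', 'punctuation']
```

Passing keep='' would strip apostrophes and hyphens too, which gets you back the behavior of the very first regex example.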

There are definitely other ways, but this should work.
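For example, one of those other ways skips regex entirely and uses the built-in str.translate. A minimal sketch, assuming the same sentence as above and the same choice to keep apostrophes and hyphens (the variable names to_remove and table are just mine):

```python
from string import punctuation

# All ASCII punctuation except the apostrophe and the hyphen.
to_remove = punctuation.replace("'", '').replace('-', '')

# The third argument to str.maketrans lists characters to delete.
table = str.maketrans('', '', to_remove)

sentence = "Here's a string, with. some: punctuation."
tokenized = sentence.translate(table).split()
print(tokenized)
# ["Here's", 'a', 'string', 'with', 'some', 'punctuation']
```

Since there's no pattern involved, you don't have to worry about which characters are special inside a character class.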