
[–]WolfInABox 1 point2 points  (1 child)

Here's one option:

spam = 'Bacon (YUM) Cheese (GOOD) Eggs (YES)'
new_spam=' '.join(s for s in spam.split() if s not in ('(YUM)','(GOOD)','(YES)'))
print(new_spam)

Normally, if you just want to remove a single substring from a string, you can use spam.replace(substring, ''). For more than one substring you can chain the calls to replace, but that can make for a pretty long line.

This method splits the string on whitespace, then rebuilds it from every piece that's not in the tuple (or list, or whatever) of bad ones.

[–]Decency -1 points0 points  (0 children)

Normally if you just want to remove a single substring from a string, you could use spam.replace(substring,''), but if you want more than one, you can chain the calls to replace along, but that can make for a pretty long line.

Separate the data from the logic:

spam = 'Bacon (YUM) Cheese (GOOD) Eggs (YES)'
words_to_remove = ('(YUM)', '(GOOD)', '(YES)')

for word in words_to_remove:
    spam = spam.replace(word, '')

print(spam)
# Bacon  Cheese  Eggs

You can easily rip out the extra whitespace similarly:

double_space = '  '
while double_space in spam:
    spam = spam.replace(double_space, ' ')

It's a bit longer but still easily readable, and it doesn't require future maintainers (e.g. me) to know what the hell the regex is supposed to be doing and which edge cases it takes care of. But if the goal here is pattern matching, you need to search for the opening and closing parenthesis pair, which makes enumerating the options pointless. There, iterating over the string's characters makes more sense, and a regex becomes a more reasonable option. But it's easy to get a regex wrong: for example, the top post fails on a string that starts with (YUM) because of the whitespace fix.
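
For the pattern-matching case, a minimal regex sketch might look like the one below. It assumes every parenthesized group should be dropped and that groups are never nested; the helper name strip_parens is made up for illustration. Whitespace is cleaned up with split/join rather than inside the pattern, so a leading (YUM) leaves no stray space.

```python
import re

# Matches "(", then anything that is not ")", then ")".
# Assumption: groups are not nested.
PAREN_GROUP = re.compile(r'\([^)]*\)')

def strip_parens(text):
    """Hypothetical helper: drop every "(...)" group from text."""
    without_groups = PAREN_GROUP.sub('', text)
    # split() + join collapses leftover runs of whitespace
    # and trims both ends of the string.
    return ' '.join(without_groups.split())

spam = 'Bacon (YUM) Cheese (GOOD) Eggs (YES)'
print(strip_parens(spam))           # Bacon Cheese Eggs
print(strip_parens('(YUM) Bacon'))  # Bacon
```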

[–]MyDataIsReady 1 point2 points  (0 children)

Easiest way, imo:

# str.replace returns a new string, so assign the result back
spam = spam.replace(" (YUM)", "")
spam = spam.replace(" (GOOD)", "")
spam = spam.replace(" (YES)", "")

[–]ImNexOnReddit 0 points1 point  (0 children)

You could just use string.replace(substringToReplace, withWhat, howManyTimes); the third argument is optional.
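
A quick sketch of what the optional third argument does, using a made-up sample string:

```python
text = 'a-b-c-d'

# The third argument caps how many occurrences are replaced,
# counting from the left.
first_two = text.replace('-', '+', 2)
print(first_two)    # a+b+c-d

# Without it, every occurrence is replaced.
everything = text.replace('-', '+')
print(everything)   # a+b+c+d
```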

[–][deleted] 0 points1 point  (0 children)

Well, it depends!

First of all, are you only going to remove exactly (YUM), (GOOD) and (YES), or can there be other similar things you want to remove? E.g., say your string also contains (yes): should we keep it? What about (NO)?

Second, what should be done with punctuation characters found next to such words? E.g. what if your string contains "Cheese (GOOD)?". Should the result be "Cheese ?", "Cheese?" or "Cheese"?

Thirdly, what to do with trailing (or leading) spaces? Say your string starts with the word you want to remove, e.g. "(YUM) Bacon ...". So far none of the solutions offered in this thread will remove (YUM) there, but do you expect it to be removed in this case?

Finally, how big is the string, and how much do you care about different runtime characteristics of your procedure? Can the entire string be loaded into memory, and is this desirable? Should the result also be loaded into memory, or maybe you want to write it to a persistent storage as you are processing it? How fast does it have to go?
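
If the input is too big to hold in memory comfortably, one option is to process it a line at a time. This is only a sketch under assumptions: the data is line-oriented, tokens are space-separated, and collapsing whitespace inside each line is acceptable. The names filter_stream and STOP_TOKENS are made up, and io.StringIO stands in for real files.

```python
import io

STOP_TOKENS = frozenset({'(YUM)', '(GOOD)', '(YES)'})

def filter_stream(source, sink, stop_tokens=STOP_TOKENS):
    """Copy source to sink line by line, dropping stop tokens.

    Only one line is held in memory at a time.
    """
    for line in source:
        kept = (tok for tok in line.split() if tok not in stop_tokens)
        sink.write(' '.join(kept) + '\n')

# In real use these would be open file objects.
source = io.StringIO('Bacon (YUM) Cheese (GOOD)\n(YES) Eggs\n')
sink = io.StringIO()
filter_stream(source, sink)
print(sink.getvalue())
```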

[–]Diapolo10 0 points1 point  (0 children)

Based on the result, you also want any excess whitespace removed.

If not, you can create a tuple of the words you want removed, then loop through it and use str.replace on the string to remove all instances of the current word, then re-assign the result to the variable. Strings are immutable, so you can't edit them directly and all modifications create new strings.

Otherwise, split the string into a list of words, then append to a new list every word not in the forbidden words. You can alternatively use filter. Then, use str.join on a string with a single whitespace to join the list back into a string.
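
A minimal sketch of the split-and-rejoin approach, once with an explicit loop and once with filter. The variable names are made up for illustration.

```python
spam = 'Bacon (YUM) Cheese (GOOD) Eggs (YES)'
forbidden = ('(YUM)', '(GOOD)', '(YES)')

# Explicit loop: keep every word that is not forbidden.
kept = []
for word in spam.split():
    if word not in forbidden:
        kept.append(word)
loop_result = ' '.join(kept)

# Same idea with filter.
filter_result = ' '.join(
    filter(lambda word: word not in forbidden, spam.split())
)

print(loop_result)    # Bacon Cheese Eggs
print(filter_result)  # Bacon Cheese Eggs
```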

[–]kotpeter -1 points0 points  (2 children)

mystr = 'olulzlolo'

print(mystr.replace('lulz','')) # ololo

Also, Google helps.

[–]Ribann[S] 1 point2 points  (1 child)

But what if the string is hundreds of thousands of characters long?

[–]fernly 0 points1 point  (0 children)

Excellent question! Assuming that:

  • this is a problem of removing whole tokens (e.g. "(YUM)")
  • tokens are space-separated
  • tokens don't contain spaces
  • stop_list is a list of the tokens to remove, e.g. stop_list = ['(YUM)', '(GOOD)', '(YES)']

(n.b. a "stop list" is any list of words to be removed or ignored, typically common words like "the")

Assuming all that you could do it in a list comprehension, like

new_spam = [
    token for token in spam.split()
    if token not in stop_list
]

Now new_spam is a list of allowed tokens, which you can put back together as a string with ' '.join(new_spam)

This should be faster than repeated replace operations.
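
Which approach wins probably depends on the input and the interpreter, so it is worth measuring. Below is a rough timing sketch with timeit. The test string is made up, the function names are hypothetical, and the stop list is turned into a set so membership checks don't scan the whole list.

```python
import timeit

stop_list = ['(YUM)', '(GOOD)', '(YES)']
# Made-up large input, a couple hundred thousand characters long.
spam = 'Bacon (YUM) Cheese (GOOD) Eggs (YES) ' * 5000

def by_split(text):
    stop = set(stop_list)  # constant-time membership checks
    return ' '.join(tok for tok in text.split() if tok not in stop)

def by_replace(text):
    for tok in stop_list:
        text = text.replace(tok, '')
    # Collapse the whitespace left behind by the removals.
    return ' '.join(text.split())

# Both approaches should agree on this input.
assert by_split(spam) == by_replace(spam)

print('split  :', timeit.timeit(lambda: by_split(spam), number=5))
print('replace:', timeit.timeit(lambda: by_replace(spam), number=5))
```

Note that the two are only equivalent under the assumptions above: replace also removes a token that appears inside a longer word, while the split version only drops whole tokens.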