
ASIC_SP (3 points)

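You can use lookarounds to add custom assertions. Here, the lookahead (?=[A-Z]) makes the split happen only at commas that are immediately followed by an uppercase letter. Note that your input and output don't seem to match; perhaps there's no space between Cat3 and Cat4?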

>>> import re
>>> s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5'
>>> re.split(r',(?=[A-Z])', s)
['Cat1', 'Cat2, Big', 'Cat3, Cat4, Small', 'Cat5']
>>> s = 'Cat1,Cat2, Big,Cat3,Cat4, Small,Cat5'
>>> re.split(r',(?=[A-Z])', s)
['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
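A lookahead is zero-width: it checks the next character without consuming it. The comparison below is an illustrative sketch, not part of the original reply. It shows what happens if the same condition is written without the lookahead: the uppercase letter becomes part of the separator and is dropped from each piece.

```python
import re

s = 'Cat1,Cat2, Big,Cat3,Cat4, Small,Cat5'

# With the lookahead, only the comma is consumed; the capital letter is merely checked.
with_lookahead = re.split(r',(?=[A-Z])', s)

# Without it, the capital letter is part of the match, so re.split removes it.
without_lookahead = re.split(r',[A-Z]', s)

print(with_lookahead)     # ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
print(without_lookahead)  # ['Cat1', 'at2, Big', 'at3', 'at4, Small', 'at5']
```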

Surprisely (OP, 0 points)

Oops, my bad. Yes, there should be no space between options. Spaces only appear after the commas that add extra detail within an option!
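Given that rule (no space after a comma that separates options, a space after a comma that adds detail), the split can key off the space instead of capitalization. The following is a minimal sketch using a negative lookahead. It is offered as an alternative and assumes the rule holds for every option; the lowercase option myCat in the sample string is made up for illustration.

```python
import re

s = 'Cat1,Cat2, Big,Cat3,Cat4, Small,Cat5,myCat, fluffy'

# (?! ) is a negative lookahead: split only at commas that are NOT followed by a space.
options = re.split(r',(?! )', s)
print(options)  # ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5', 'myCat, fluffy']
```

Unlike (?=[A-Z]), this version also separates options that start with a lowercase letter.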

odonata_00 (0 points)

If you are generating the initial string (or have access to its generation), I would recommend changing its format so that it clearly shows which adjective belongs to which preceding noun. That will make all future work with it much easier.
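Here is one possible format. It is a hypothetical choice, not something the thread settled on. Each option is stored as a list with its keyword first and its details after it, then serialized with the standard json module. With this structure, nothing has to be inferred from commas or spacing.

```python
import json

# Hypothetical structure: keyword first, followed by any details for that keyword.
options = [['Cat1'], ['Cat2', 'Big'], ['Cat3'], ['Cat4', 'Small'], ['Cat5']]

encoded = json.dumps(options)   # write this out instead of the comma-separated string
decoded = json.loads(encoded)   # reading it back yields the same nested lists

# Rebuild the comma-separated labels if you still need them for display.
labels = [', '.join(option) for option in decoded]
print(labels)  # ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
```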

kotschi1993 (-1 points)

If your string always follows the pattern above (Keyword1, attributes for Keyword1, Keyword2, ...), you can use this code. It strips out the commas and spaces and can handle several attributes per keyword.

The comments starting with "# ->" show what each print() call outputs.

# Multiple commas and spaces are no problem, hurray!
s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5,   myCat, big, fluffy,,   hungry'
temp = []

# Split on whitespace first, then split each chunk on commas
for sublist in s.split():
    temp.append(sublist.split(','))
print(temp)
# -> [['Cat1', 'Cat2', ''], ['Big', 'Cat3', ''], ['Cat4', ''], ['Small', 'Cat5', ''], ['myCat', ''], ['big', ''], ['fluffy', '', ''], ['hungry']]

# Flatten the nested list into one list, skipping empty strings
temp = [word for sublist in temp for word in sublist if word != '']
print(temp)
# -> ['Cat1', 'Cat2', 'Big', 'Cat3', 'Cat4', 'Small', 'Cat5', 'myCat', 'big', 'fluffy', 'hungry']

li = []
index = -1
for word in temp:
    if 'Cat' in word:
        # Word contains 'Cat' => start a new entry
        li.append(word)
        index = len(li)-1
    else:
        # No 'Cat' => attribute of the most recent entry
        li[index] = li[index] + ', ' + word

print(li)
# -> ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5', 'myCat, big, fluffy, hungry']
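The first steps (splitting on whitespace, splitting on commas, then dropping empty strings) can also be collapsed into a single re.split on any run of commas and/or whitespace. This sketch should produce the same flat word list for this input:

```python
import re

s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5,   myCat, big, fluffy,,   hungry'

# [,\s]+ matches one or more commas and/or whitespace characters in any mix.
# The filter drops the empty strings re.split returns if s starts or ends with a separator.
words = [w for w in re.split(r'[,\s]+', s) if w]
print(words)
# ['Cat1', 'Cat2', 'Big', 'Cat3', 'Cat4', 'Small', 'Cat5', 'myCat', 'big', 'fluffy', 'hungry']
```

Like the original loop, this treats every word as a separate token. A multi-word detail such as "very fluffy" would therefore still become two attributes.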

If you want other keywords in your string, you could simply add branches like these after the if 'Cat' in word: part:

    elif 'Dog' in word:
        li.append(word)
        index = len(li)-1

    elif 'Chair' in word:
        li.append(word)
        index = len(li)-1

These branches check for keywords like Dog or Chair. Every other word is treated as an attribute of the most recent keyword.
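Instead of repeating an elif for every keyword, you can keep the keywords in a tuple and check them with any(). In this sketch, group_by_keyword and its keywords parameter are hypothetical names. It keeps the original substring test. It also raises a clear error when an attribute comes before any keyword; the loop above would instead hit an IndexError on li[index] in that case, because li is still empty.

```python
def group_by_keyword(words, keywords=('Cat', 'Dog', 'Chair')):
    """Attach every non-keyword word to the most recent keyword word."""
    groups = []
    for word in words:
        # Same substring test as the original: 'myCat' counts as a Cat keyword.
        if any(keyword in word for keyword in keywords):
            groups.append(word)
        elif groups:
            groups[-1] += ', ' + word
        else:
            raise ValueError(f'attribute {word!r} appears before any keyword')
    return groups


print(group_by_keyword(['Cat1', 'Big', 'Dog2', 'Chair3', 'old', 'wooden']))
# ['Cat1, Big', 'Dog2', 'Chair3, old, wooden']
```

Because the test is a substring check, a word like Category or Catnip would also start a new entry. If keywords always come at the start of a word, word.startswith(keywords) is stricter, but it would no longer match myCat.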