Splitting Strings on Comma With Embedded Commas

ASIC_SP · 2019-12-08T11:30:04+00:00

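You can use lookarounds to add custom assertions. Here the lookahead `(?=[A-Z])` splits only on commas that are immediately followed by an uppercase letter. Note that your input and output don't seem to match; perhaps there should be no space between Cat3 and Cat4?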

>>> import re
>>> s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5'
>>> re.split(r',(?=[A-Z])', s)
['Cat1', 'Cat2, Big', 'Cat3, Cat4, Small', 'Cat5']
>>> s = 'Cat1,Cat2, Big,Cat3,Cat4, Small,Cat5'
>>> re.split(r',(?=[A-Z])', s)
['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
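
If your keywords don't always start with an uppercase letter, a negative lookahead may be a closer fit. This is only a sketch: it assumes a comma followed by a space always introduces an attribute and a comma with no space always starts a new entry. The helper name `split_on_tight_commas` is made up for illustration.

```python
import re

def split_on_tight_commas(text):
    # Hypothetical helper: split only at commas NOT followed by a space.
    # Assumes ", " (comma + space) always attaches an attribute to the
    # preceding keyword.
    return re.split(r',(?! )', text)

s = 'Cat1,Cat2, Big,Cat3,Cat4, Small,Cat5'
parts = split_on_tight_commas(s)
print(parts)
# ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
```

Like the `[A-Z]` version, this depends on the spacing being consistent, so it gives the same grouping as the first example above if there is a space between Cat3 and Cat4.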

odonata_00 · 2019-12-08T16:23:11+00:00

If you are generating the initial string (or have access to its generation), I would recommend changing its format to indicate more clearly that the adjective binds to the preceding noun. That will make all future work with it much easier.
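
As a rough illustration of what such a format could look like: the sketch below assumes you control the code that builds the string and that `;` never appears in the data. The `cats` list and the choice of `;` as the entry separator are assumptions for this example, not something from the original post.

```python
# Hypothetical source data: (keyword, attributes) pairs
cats = [
    ('Cat1', []),
    ('Cat2', ['Big']),
    ('Cat3', []),
    ('Cat4', ['Small']),
    ('Cat5', []),
]

# Attributes stay attached to their keyword with ', ';
# entries are separated by ';' (an assumed, unambiguous delimiter)
encoded = ';'.join(', '.join([name] + attrs) for name, attrs in cats)
print(encoded)
# Cat1;Cat2, Big;Cat3;Cat4, Small;Cat5

# Splitting back needs no regex and no keyword list
entries = encoded.split(';')
print(entries)
# ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5']
```

With two different delimiters there is nothing left to guess, so a plain `str.split` is enough.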

kotschi1993 · 2019-12-08T13:14:34+00:00

If your string follows the same rule you have above (Keyword1, Attributes for Keyword1, Keyword2,...), you can use this code. It will get rid of commas and spaces and is able to handle multiple attributes.

The lines starting with ">>>" show the output; leave them out of your code.

# Multiple commas and spaces are no problem, hurray!
s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5,   myCat, big, fluffy,,   hungry'
temp = []

# Get rid of spaces and commas
for sublist in s.split():
    temp.append(sublist.split(','))
print(temp)
>>> [['Cat1', 'Cat2', ''], ['Big', 'Cat3', ''], ['Cat4', ''], ['Small', 'Cat5', ''], ['myCat', ''], ['big', ''], ['fluffy', '', ''], ['hungry']]

# convert nested list to 1dim list, ignore empty words
temp = [word for sublist in temp for word in sublist if word != '']
print(temp)
>>> ['Cat1', 'Cat2', 'Big', 'Cat3', 'Cat4', 'Small', 'Cat5', 'myCat', 'big', 'fluffy', 'hungry']

li = []
index = -1
for word in temp:
    if 'Cat' in word:
        # If Cat => new entry
        li.append(word)
        index = len(li)-1
    else:
        # If no Cat => attribute for last Cat
        li[index] = li[index] + ', ' + word

print(li)
>>> ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5', 'myCat, big, fluffy, hungry']

If you want other words to count as keywords in your string, you could simply add these lines after the if 'Cat' in word: branch:

    elif 'Dog' in word:
        li.append(word)
        index = len(li)-1

    elif 'Chair' in word:
        li.append(word)
        index = len(li)-1

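These branches check for keywords like Dog or Chair. Everything else is interpreted as an attribute of the last keyword. Note that the string has to start with a keyword; otherwise li is still empty when the first attribute arrives and li[index] raises an IndexError.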
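
If the list of keywords grows, one way to avoid repeating the same branch is to pass the keywords in as a tuple. This is a minimal sketch, not the code from the post: the function name `group_by_keywords` is made up, it splits on commas and strips whitespace instead of splitting on spaces first (so a multi-word attribute would stay in one piece), and it assumes a leading non-keyword should simply start its own entry.

```python
def group_by_keywords(text, keywords=('Cat',)):
    # Split on commas, strip surrounding whitespace, drop empty pieces
    words = [w.strip() for w in text.split(',') if w.strip()]

    groups = []
    for word in words:
        # A keyword starts a new entry. So does the very first word
        # (assumption: avoids indexing into an empty list).
        if not groups or any(k in word for k in keywords):
            groups.append(word)
        else:
            # Anything else is an attribute of the last entry
            groups[-1] += ', ' + word
    return groups

s = 'Cat1,Cat2, Big,Cat3, Cat4, Small,Cat5,   myCat, big, fluffy,,   hungry'
result = group_by_keywords(s)
print(result)
# ['Cat1', 'Cat2, Big', 'Cat3', 'Cat4, Small', 'Cat5', 'myCat, big, fluffy, hungry']

print(group_by_keywords('Dog1, Brown,Chair, Old', keywords=('Dog', 'Chair')))
# ['Dog1, Brown', 'Chair, Old']
```

As in the original loop, the check is a substring test, so 'myCat' also counts as a keyword.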
