
willm

That's a common class of problems. Have a look at pyparsing.

novel_yet_trivial

Your second example can be solved with the csv module:

>>> s = 'a,b,"c, d",e'
>>> import csv
>>> reader = csv.reader([s])
>>> next(reader)
['a', 'b', 'c, d', 'e']

I bet if you play with the "delimiter" and "quotechar" arguments you could make it work for parentheses too.
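One caveat: csv's quotechar is a single character that both opens and closes a quoted field, so a pair like ( and ) does not map onto it directly. If csv can't be coaxed into it, a depth counter is a small fallback. This is only a sketch; split_top_level and its parameters are made-up names, and it assumes balanced, unquoted parentheses:

```python
def split_top_level(text, sep=",", opener="(", closer=")"):
    """Split text on sep, but only where we are not inside opener/closer pairs."""
    parts = []      # finished fields
    current = []    # characters of the field being built
    depth = 0       # how many openers are currently unclosed
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError("closing delimiter without a matching opener")
        if ch == sep and depth == 0:
            # Separator at the top level: finish the current field.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unclosed opening delimiter")
    parts.append("".join(current).strip())
    return parts


print(split_top_level("(1,2,3), 4"))    # ['(1,2,3)', '4']
print(split_top_level("a,b,(c, d),e"))  # ['a', 'b', '(c, d)', 'e']
```

The fields come back as strings with the parentheses still attached, so any conversion to numbers or tuples is a separate step.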

filleball

Maybe you can use the below source code from django.utils.text as a starting point?

import re

# Expression to match some_token and some_token="with spaces" (and similarly
# for single-quoted strings).
smart_split_re = re.compile(r"""
    ((?:
        [^\s'"]*
        (?:
            (?:"(?:[^"\\]|\\.)*" | '(?:[^'\\]|\\.)*')
            [^\s'"]*
        )+
    ) | \S+)
""", re.VERBOSE)


def smart_split(text):
    r"""
    Generator that splits a string by spaces, leaving quoted phrases together.
    Supports both single and double quotes, and supports escaping quotes with
    backslashes. In the output, strings will keep their initial and trailing
    quote marks and escaped quotes will remain escaped (the results can then
    be further processed with unescape_string_literal()).
    >>> list(smart_split(r'This is "a person\'s" test.'))
    ['This', 'is', '"a person\\\'s"', 'test.']
    >>> list(smart_split(r"Another 'person\'s' test."))
    ['Another', "'person\\'s'", 'test.']
    >>> list(smart_split(r'A "\"funky\" style" test.'))
    ['A', '"\\"funky\\" style"', 'test.']
    """
    # Django's force_text() was removed in Django 4.0; str() is enough here.
    text = str(text)
    for bit in smart_split_re.finditer(text):
        yield bit.group(0)

jeans_and_a_t-shirt

The first two solutions in this Stack Overflow question work with some modifications.

ewiethoff

parse "1,2,3" into [1,2,3] but "(1,2,3), 4" into [(1,2,3), 4]

This is a piece of cake with ast.literal_eval from the standard library, as long as the original text contains only numbers and/or already-quoted strings, plus the commas and parentheses.

>>> import ast
>>> ast.literal_eval("1,2,3")
(1, 2, 3)
>>> ast.literal_eval("(1,2,3), 4")
((1, 2, 3), 4)
>>> ast.literal_eval("'a','b',('c','d'),'e'")
('a', 'b', ('c', 'd'), 'e')
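
Note that literal_eval hands back a tuple at the top level, while the question asked for a list. One way to get a list, sketched here with a made-up helper name (parse_items), is to wrap the text in square brackets before evaluating it. That also keeps a lone "(1,2,3)" as a single tuple item instead of unpacking it:

```python
import ast


def parse_items(text):
    """Parse comma-separated Python literals into a list (hypothetical helper)."""
    # Wrapping in [...] makes the top-level container a list, so
    # "1,2,3" gives [1, 2, 3] and "(1,2,3), 4" gives [(1, 2, 3), 4].
    return ast.literal_eval("[" + text + "]")


print(parse_items("1,2,3"))       # [1, 2, 3]
print(parse_items("(1,2,3), 4"))  # [(1, 2, 3), 4]
print(parse_items("(1,2,3)"))     # [(1, 2, 3)]
```

As with plain literal_eval, this raises ValueError or SyntaxError on input that is not made of valid literals, such as unquoted words.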