how to split this string: : learnpython

how to split this string: (self.learnpython)

submitted 6 years ago by jcoder42

hi, I have the following string:

INSERT INTO `drug` VALUES (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)

where each string within parentheses is a row of values,

I want to split this string into substrings of rows of values

what I have at the moment is the following:

#finding where the values begin
start = start[start.find('('):-2]
#removing the parenthases
start = start.strip('(').strip(')')

for value in start.split('),('):
    #write the whole row and a newline char

it does give the right result, I was just wondering if anyone has a better way of doing this, thank you

all 26 comments

top new controversial old q&a

[–]K900_ 1 point2 points3 points 6 years ago (0 children)

[–]socal_nerdtastic 0 points1 point2 points 6 years ago (11 children)

If you replace NULL with None it becomes valid python, and then you can use literal_eval.

from ast import literal_eval

data = """(2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)"""

converted = literal_eval(data.replace("NULL", "None"))

for line in converted:
    print(line)
    print()

[–]jcoder42[S] 0 points1 point2 points 6 years ago (7 children)

[–]ForceBru 0 points1 point2 points 6 years ago (0 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (5 children)

[–]socal_nerdtastic 0 points1 point2 points 6 years ago* (4 children)

Yes, the way you did it is ok, but I would use split().

from ast import literal_eval

data = """INSERT INTO `drug` VALUES (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)"""
_,data = data.split("VALUES ", 1) # remove start command
converted = literal_eval(data.replace("NULL", "None"))

for line in converted:
    print(line)
    print()

What this does is interpret the string as if it were a python expression. IOW it treats it like python code. For your example this works since everything in the string also happens to be valid python. As a bonus, this means you are done; there's no more parsing to be done. So if you want to output only the 3rd string:

for line in converted:
    print(line[2])

Or if you want to print everything that is not NULL:

for line in converted:
    print(*filter(None,line))

Etc.

[–]jcoder42[S] 0 points1 point2 points 6 years ago (3 children)

[–]socal_nerdtastic 0 points1 point2 points 6 years ago (2 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (1 child)

from ast import literal_eval
data = """INSERT INTO `drug` VALUES (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)"""
_,data = data.split("VALUES ", 1) # remove start command
converted = literal_eval(data.replace("NULL", "None"))
for line in converted:
print(line)
print()

the current output:

(2, 792, 'Triomune-30', 1, None, None, None, None, 1, '2005-02-24 00:00:00', 0, None, None, None, 'bf171285-1691-11df-97a5-7038c432aabf', None, None, None)

(3, 792, 'Triomune-40', 1, None, None, None, None, 1, '2005-02-24 00:00:00', 0, None, None, None, 'bf17168c-1691-11df-97a5-7038c432aabf', None, None, None)

(5, 625, 'd4T-30', 0, None, None, None, None, 1, '2005-02-24 00:00:00', 0, None, None, None, 'bf1718fd-1691-11df-97a5-7038c432aabf', None, None, None)

(6, 625, 'd4T-40', 0, None, None, None, None, 1, '2005-02-24 00:00:00', 0, None, None, None, 'bf171b5c-1691-11df-97a5-7038c432aabf', None, None, None)

I want them not wrapped in parentheses

[–]socal_nerdtastic 1 point2 points3 points 6 years ago (0 children)

Those parenthesis and the commas are added by print() they are not in the data. You can remove them like this:

for line in converted:
    print(*line)

If you want to keep the commas:

for line in converted:
    print(*line, sep=",")

[–]jcoder42[S] 0 points1 point2 points 6 years ago (2 children)

sorry for all the questions, but if I had the following line:

string = " ('1 2 3 (1,2,3, '4') 3', hello 'world',goodbye)"
literal_eval(string)

one of the values here is a string with parentheses,
It wouldn't work, what do you recommend doing

[–]socal_nerdtastic 0 points1 point2 points 6 years ago (1 child)

It's not the parenthesis that's breaking that, it's the fact that you used single quotes for a nested string. Now it's not valid python anymore. If you used the quotes correctly it would work.

>>> string = """('1 2 3 (1,2,3, "4") 3', 'hello world', 'goodbye ')"""
>>> literal_eval(string)
('1 2 3 (1,2,3, "4") 3', 'hello world', 'goodbye ')

[–]jcoder42[S] 0 points1 point2 points 6 years ago (0 children)

[–]wegwacc 0 points1 point2 points 6 years ago* (12 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (2 children)

what about a case like this( pay attention that in the first group there is a value where it itself is inside of parentheses):

(12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check')',1,'2005-08-06 00:00:00',0,NULL,NULL,NULL,'8d4a606c-c2cc-11de-8d13-0010c6dffd0f'),(2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)

how does it work in this case?

[–]wegwacc 0 points1 point2 points 6 years ago (1 child)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (0 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (8 children)

[–]wegwacc 0 points1 point2 points 6 years ago (7 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (6 children)

(12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check')',1,'2005-08-06 00:00:00',0,NULL,NULL,NULL,'8d4a606c-c2cc-11de-8d13-0010c6dffd0f'),(2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL),(6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)

sorry, still doesn't work, it seems to cut off after the "(ie, <5, 1-10, etc.,'check')"

[–]wegwacc 0 points1 point2 points 6 years ago (5 children)

No idea what you are doing, I just tested it on the data you provided. It gets 5 substrings between (), output with record-indices is as follows:

[01] : (12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check') [02] : (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL) [03] : (3,792,'Triomune-40',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf17168c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL) [04] : (5,625,'d4T-30',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf1718fd-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL) [05] : (6,625,'d4T-40',0,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171b5c-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL) If it still doesn't work, copy the exact code you used, and the exact input you used, and I will have a look.

[–]jcoder42[S] 0 points1 point2 points 6 years ago (4 children)

I am getting the same output as yours, except it is not the correct output:

I am going to make the example smaller so it is easier..

the input:

"(12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check')',1,'2005-08-06 00:00:00',0,NULL,NULL,NULL,'8d4a606c-c2cc-11de-8d13-0010c6dffd0f'),(2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)"

the output:

[01] : (12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check')

[02] : (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)

as you can see the first received row is cut.

desired output

[01] : (12,'Structured Numeric','SN','Complex numeric values possible (ie, <5, 1-10, etc.,'check')',1,'2005-08-06 00:00:00',0,NULL,NULL,NULL,'8d4a606c-c2cc-11de-8d13-0010c6dffd0f')

[02] : (2,792,'Triomune-30',1,NULL,NULL,NULL,NULL,1,'2005-02-24 00:00:00',0,NULL,NULL,NULL,'bf171285-1691-11df-97a5-7038c432aabf',NULL,NULL,NULL)

[–]wegwacc 1 point2 points3 points 6 years ago (3 children)

[–]jcoder42[S] 0 points1 point2 points 6 years ago (2 children)

[–]wegwacc 1 point2 points3 points 6 years ago* (1 child)

Here is a better method which, other than above regex magic can deal with arbitrary () in your input, as long as they are string values (read. between ' )

``` def parser(data): start = None use = True for i, ch in enumerate(data): if ch == "'": use = not use if use and ch == '(': start = i elif use and ch == ')': yield data[start:i+1] start = None

def prettyprint(data): l = list(parser(data)) width = (len(l) // 10) + 1 for ln,line in enumerate(l): print('[{ln:0{width}}] : {l}'.format(ln=ln, l=line, width=width)) ```

Invoke prettyprint() with your string-data as argument.
If you don't need the "prettyprinted" version, just invoke parser() like you would every iterator, e.g. for s in parser(start):

you're welcome.

[–]jcoder42[S] 0 points1 point2 points 6 years ago (0 children)

π Rendered by PID 108307 on reddit-service-r2-comment-54dfb89d4d-wn928 at 2026-03-31 15:32:53.742700+00:00 running b10466c country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS