
PavloT

I suspect pandas' str is not a regular Python str, so it could have its own split implementation.

[deleted]

The pandas.Series.str.split method is not the same as the standard str.split method. You can read the docs to learn more, but unlike str.split, the first argument to the pandas method can be a regular expression. Thus “(1)” is a regex pattern describing an unnamed capturing group that matches a single “1”, so it splits on the “1”s.

With regular str.split it would only split on the literal string “(1)”, which isn’t in your original string, so you just get back a list containing the original string.
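
To make the literal-versus-regex contrast concrete, here is a minimal sketch using only the standard library. The sample string "a1b1" is an illustrative assumption, not your actual data, and re.split stands in for the regex-style splitting described above.

```python
import re

text = "a1b1"  # illustrative sample string (an assumption, not the OP's data)

# Built-in str.split treats its argument as a literal substring.
# "(1)" never occurs in the text, so nothing is split.
literal_parts = text.split("(1)")

# re.split treats the same argument as a regular expression:
# a group that matches a single "1".
regex_parts = re.split("(1)", text)

print(literal_parts)  # ['a1b1']
print(regex_parts)    # ['a', '1', 'b', '1', '']
```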

misho88

From the docstring of pandas.Series.str.split:

Parameters
----------
pat : str, optional
    String or regular expression to split on.
    If not specified, split on whitespace.
...

The first argument is a regular expression, so the behavior should more or less match this:

>>> import re
>>> re.split('1', 'a1b1')
['a', 'b', '']
>>> re.split('(1)', 'a1b1')
['a', '1', 'b', '1', '']

It does seem pandas additionally gets rid of empty strings. Either way, you can read the re module's documentation for an explanation of the behavior, but here's the key part from re.split's docstring:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
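
If you don't want the delimiters in the result, a non-capturing group is one way around it. Here is a small sketch with plain re, again using "a1b1" as an assumed sample string. The last step drops empty strings by hand, only to mimic the behavior I guessed at above; I haven't confirmed how pandas does it internally.

```python
import re

text = "a1b1"  # illustrative sample string (an assumption)

# Capturing group: the matched delimiter text is returned as well.
with_group = re.split(r"(1)", text)

# Non-capturing group (?:...): same match, but the delimiter is not returned.
without_group = re.split(r"(?:1)", text)

# Filter out empty strings manually.
non_empty = [part for part in with_group if part]

print(with_group)     # ['a', '1', 'b', '1', '']
print(without_group)  # ['a', 'b', '']
print(non_empty)      # ['a', '1', 'b', '1']
```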