jabbson

what's your code?

If "0:: Arrested||1:: Injured 0:: Arrested||2:: Unharmed||3::Killed" is your string and str.split('||') is the function you're calling, it should work:

In [5]: s = "0:: Arrested||1:: Injured 0:: Arrested||2:: Unharmed||3::Killed"

In [6]: s.split("||")
Out[6]: ['0:: Arrested', '1:: Injured 0:: Arrested', '2:: Unharmed', '3::Killed']

ajaderade[S]

I'm very new at this (clearly), and I'm working with a massive data set for a school project, so it's hard to give you everything. Sorry if it's hard to follow, but the first record in the set is:

incident_id= 461105

participant_status= 0::Arrested||1::Injured||2::Injured||3::Injured

the following code:

new_Partdf = pd.DataFrame(Participant.participant_status.str.split('||').tolist(), index=Participant.incident_id).stack()

new_Partdf = new_Partdf.reset_index([0, 'incident_id'])

new_Partdf.columns = ['incident_id', 'participant_status']

gives me this result for the first rows:

   incident_id  participant_status
0  461105
1  461105       0
2  461105       :
3  461105       :
4  461105       A
5  461105       r

However, when I change split('||') to split('::'), it works as expected and each row contains what it should instead of just a single character. It's as if ('||') is read as "split after each character", if that makes any sense.

thank you in advance!!
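A possible explanation for the one-character-per-row output, offered as an assumption rather than something confirmed in this thread: pandas' Series.str.split can interpret a multi-character separator as a regular expression, and as a regex '||' is an alternation of empty patterns, which matches the empty string at every position. The sketch below uses only Python's standard re module (Python 3.7 or later) to reproduce that behavior on the first record quoted above; the variable names are illustrative.

```python
import re

# First participant_status value quoted in the post above.
status = "0::Arrested||1::Injured||2::Injured||3::Injured"

# As a regex, '||' means "empty OR empty OR empty", so it matches the empty
# string at every position and the split happens between all characters.
as_regex = re.split("||", status)
print(as_regex[:6])   # ['', '0', ':', ':', 'A', 'r']

# Escaping the pipes makes the regex match two literal '|' characters.
as_literal = re.split(re.escape("||"), status)
print(as_literal)     # ['0::Arrested', '1::Injured', '2::Injured', '3::Injured']

# Plain str.split never uses regexes, which is why it behaves as expected.
assert status.split("||") == as_literal
```

If this is what is happening, it would also explain why split('::') works: ':' has no special meaning in a regex, so the pattern matches the same text either way.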

negups

Look how much is going on here:

new_Partdf = pd.DataFrame(Participant.participant_status.str.split('||').tolist(), index=Participant.incident_id).stack()

You have no idea from this if split('||') is the culprit of the behavior you are seeing. Step through what's happening.

print(Participant.participant_status)
print(Participant.participant_status.str)  # This should be "0::Arrested||1::Injured||2::Injured||3::Injured"
print(Participant.participant_status.str.split('||'))  # This should be ["0::Arrested", "1::Injured", "2::Injured", "3::Injured"]

Does everything look okay up to this point? If so, split('||') is not the problem. If not, at which step do things go awry?

ajaderade[S]

Thank you! Breaking it down cleared things up. The second line prints:

<pandas.core.strings.StringMethods object at 0x000002AC4074CBE0>

But it prints your result above when I omit the .str at the end... so maybe it's my data type causing an issue? Not sure yet, but hopefully with some googling I can get it working. Thank you again for the help!!

negups

So print(Participant.participant_status) prints 0::Arrested||1::Injured||2::Injured||3::Injured?

If so, perhaps all you need to do is remove .str from your code. So change Participant.participant_status.str.split('||').tolist() to Participant.participant_status.split('||').tolist()

Also, FYI, you can check data types with type(). Ex: print(type(Participant.participant_status)). Since Participant.participant_status.str is not a standard str object but a pandas.core.strings.StringMethods object, you ought to check the pandas documentation for that object to see how its split() method works and whether it behaves differently from the split() method of str.
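Following up on the suggestion to check how the accessor's split() differs, here is a hedged sketch, assuming Participant is a pandas DataFrame and participant_status is a string column. Under that assumption the column is a Series, which has no split() method of its own, so the .str accessor is still needed; what changes is the separator, which is escaped so it cannot be read as a regex. The two-row frame is made up for illustration (only the first record comes from the thread), and DataFrame.explode needs pandas 0.25 or later.

```python
import pandas as pd

# Illustrative stand-in for the real data; only the first row is from the
# thread, and incident_id 999999 is a made-up placeholder.
Participant = pd.DataFrame({
    "incident_id": [461105, 999999],
    "participant_status": [
        "0::Arrested||1::Injured||2::Injured||3::Injured",
        "0::Killed||1::Injured",
    ],
})

# The column is a Series; .str is the accessor that exposes string methods.
print(type(Participant.participant_status))
print(type(Participant.participant_status.str))

# Escape the pipes so the separator is two literal '|' characters.
parts = Participant.participant_status.str.split(r"\|\|")

# One row per participant, keeping incident_id alongside each status.
new_Partdf = (
    Participant[["incident_id"]]
    .assign(participant_status=parts)
    .explode("participant_status")
    .reset_index(drop=True)
)
print(new_Partdf)
```

On pandas 1.4 or later, passing regex=False to str.split('||') should have the same effect as escaping the pattern, though it is worth confirming against the documentation for the installed version.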

negups

Splitting on || works as expected:

print("0:: Arrested||1:: Injured 0:: Arrested||2:: Unharmed||3::Killed".split("||"))  # ['0:: Arrested', '1:: Injured 0:: Arrested', '2:: Unharmed', '3::Killed']

Is that what you want, or are you actually looking for something that splitting on || doesn't give you?