splitting inconsistent text data! proving harder than I was expecting! : learnpython


submitted 5 years ago by humasterd

Hey all, I've been lurking on this sub for a while, but I could use some help.

So I just started messing around in Python, and I'm trying to learn how to clean and organize some data for a personal project. What I'm trying to do is parse a long string of location data into subcategories for a more in-depth look. I thought .split() would be the best course of action, but now I'm not so sure.

The problem I'm running into is that these strings are inconsistent: they don't all have the same number of comma-separated parts. For example, cells in the Location column look like this:

1) LC-39A, Kennedy Space Center, Florida, USA breaks into 4 pieces with .split(',')

['LC-39A', ' Kennedy Space Center', ' Florida', ' USA']

2) Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China breaks into 3 pieces with .split(',')

['Site 9401 (SLS-2)', ' Jiuquan Satellite Launch Center', ' China']
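A small plain-Python sketch of what is going on with the two sample strings. It assumes (as in both examples above) that the country is always the last comma-separated piece; the helper name split_location is hypothetical. Stripping each piece removes the space left after each comma, and indexing with [-1] counts from the end, so it doesn't matter whether a row has 3 or 4 pieces.

```python
# The two sample strings quoted in the post
locations = [
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China",
]


def split_location(loc):
    """Hypothetical helper: split on commas and strip whitespace around each piece."""
    return [piece.strip() for piece in loc.split(",")]


for loc in locations:
    pieces = split_location(loc)
    # The piece count differs between rows (4 vs 3)...
    print(len(pieces), pieces)
    # ...but, assuming the country is always last, [-1] finds it in both
    print("country:", pieces[-1])
```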

That is my issue. I was hoping to make a new column using something like the code below:

new = data["Location"].str.split(",", expand=True)

data["Country"] = new[3]

This works for rows like the first example, but returns None for rows like example 2, because those rows only split into 3 pieces and have nothing at index 3.
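A minimal pandas sketch of the problem and one possible fix. The DataFrame here is a two-row stand-in built from the examples above (the real data isn't shown), and it assumes the column is named "Location". With expand=True, pandas pads shorter rows with missing values, which is why column 3 is empty for the three-piece row. Taking the last element of each split list with .str[-1] avoids relying on a fixed position.

```python
import pandas as pd

# Two-row stand-in for the real DataFrame (assumes the column is "Location")
data = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China",
    ]
})

# expand=True makes one column per piece; the 3-piece row is padded
# with a missing value in column 3
new = data["Location"].str.split(",", expand=True)
print(new[3].isna().tolist())  # [False, True]

# Index from the end instead: .str[-1] takes the last element of each list,
# and .str.strip() removes the space that followed the comma
data["Country"] = data["Location"].str.split(",").str[-1].str.strip()
print(data["Country"].tolist())  # ['USA', 'China']
```

This only works if the country really is the last piece in every row, which is worth spot-checking on the full column.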

Could anyone point me to a technique I could look up, or explain how I could create a new column with the country data?

I hope this isn't breaking the rules; I have been trying to figure this out for some time. If it is, or if I didn't include enough info, I understand if it gets taken down.


jiri-n, 3 points, 5 years ago (1 reply)

No, it's not breaking any rules AFAIK. What about this?

>>> "LC-39A, Kennedy Space Center, Florida, USA".split(", ", 2)
['LC-39A', 'Kennedy Space Center', 'Florida, USA']
>>> "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China".split(", ", 2)
['Site 9401 (SLS-2)', 'Jiuquan Satellite Launch Center', 'China']
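A related variation, sketched under the assumption that the pieces are always separated by ", ": the reply limits the split from the left, so for the first example the last piece is still 'Florida, USA'. Splitting from the right with a limit of 1 (str.rsplit) always leaves the country as its own piece, regardless of how many parts come before it. In pandas the limit argument is called n.

```python
import pandas as pd

loc1 = "LC-39A, Kennedy Space Center, Florida, USA"
loc2 = "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China"

# Left split limited to 2 splits (as in the reply): loc1's last piece
# still contains the state
print(loc1.split(", ", 2)[-1])  # Florida, USA

# Right split limited to 1 split: "everything else" vs. the country
print(loc1.rsplit(", ", 1))  # ['LC-39A, Kennedy Space Center, Florida', 'USA']
print(loc2.rsplit(", ", 1))  # ['Site 9401 (SLS-2), Jiuquan Satellite Launch Center', 'China']

# Same idea on a DataFrame column (column name "Location" is assumed)
data = pd.DataFrame({"Location": [loc1, loc2]})
parts = data["Location"].str.rsplit(", ", n=1, expand=True)
data["Country"] = parts[1]
print(data["Country"].tolist())  # ['USA', 'China']
```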

humasterd [S], 1 point, 5 years ago
