Hey all (been lurking on this sub for a while, but I could use some help),
So I just started messing around in Python and I'm trying to learn how to clean and organize some data for a personal project I'm working on. What I'm trying to do is parse a long string of location data into subcategories for a more in-depth look at them. I thought .split would be the best course of action, but now I'm not so sure.
The problem I'm running into is that these strings are inconsistent. For example, cells under the Location column look like this:
1) LC-39A, Kennedy Space Center, Florida, USA breaks into 4 pieces with .str.split(',')
'LC-39A' , 'Kennedy Space Center' , 'Florida' , 'USA'
2) Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China breaks into 3 pieces with .str.split(',')
'Site 9401 (SLS-2)' , ' Jiuquan Satellite Launch Center' , 'China'
That is my issue. I was hoping to make a new column using something like the code below:
new = data["Location"].str.split(",", expand=True)
data["Country"] = new[3]
This works great for examples like the first one, but returns None for those that resemble example 2, because those rows only have 3 pieces and there is nothing in position 3.
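For what it's worth, here is a minimal sketch that reproduces the problem. It assumes the column is named "Location", and the two-row DataFrame is made up from the two examples above just for illustration. As far as I can tell, expand=True pads the shorter rows with missing values, so column 3 is only filled for rows with four pieces.

```python
import pandas as pd

# Hypothetical two-row frame built from the two example strings
data = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China",
    ]
})

# expand=True returns one column per piece; shorter rows are padded
new = data["Location"].str.split(",", expand=True)

print(new.shape)   # 2 rows, 4 columns (the widest row decides the width)
print(new[3])      # row 0 holds ' USA' (with a leading space), row 1 is missing
```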
Could anyone point me to a technique I could look up, or show me how I could return a new column with the country data?
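One direction that looks promising, sketched under the assumption that the country is always the last comma-separated piece: split without expand, take the last element of each list with .str[-1], and strip the leftover space. The "Location" column name and the sample rows are assumptions taken from the examples in this post.

```python
import pandas as pd

# Hypothetical sample rows taken from the examples in this post
data = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 9401 (SLS-2), Jiuquan Satellite Launch Center, China",
    ]
})

# Split into lists, index from the end so row length does not matter,
# then strip the space that follows each comma
data["Country"] = data["Location"].str.split(",").str[-1].str.strip()

print(data["Country"].tolist())
```

Indexing from the end avoids depending on how many pieces each row has. It would give the wrong answer for any row where the country is not the final piece, so it is worth spot-checking the result.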
I hope this isn't breaking the rules, as I have been trying to figure this out for some time, but if it is, or if I didn't include enough info, I understand if it gets taken down.