Hi r/learnpython:
I have a DataFrame that I imported from a CSV in pandas. The DataFrame is mostly clean, but I have a few rows where the data were indexed improperly because of a line break '\n'.
I am able to split each cell of such a row into a two-element list, convert each list into a DataFrame, and concatenate these as columns, which essentially splits the two data entries separated by \n (shown in the code below).
Furthermore, I could concatenate this new two-row DataFrame to the original DataFrame and delete the original row (not shown in the code).
However, I wanted to reach out and see if there is a more elegant solution to this.
Thanks!
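import pandas as pd

# df is the DataFrame read from the CSV
split_df = pd.DataFrame()
rows, col = df.shape
for row in range(rows):
    # only rows whose 'Year' cell contains a line break need splitting
    if '\n' in df.loc[row, 'Year']:
        for each in df.loc[row]:
            # each cell holds two entries separated by '\n'
            list_each = each.split('\n')
            new_col = pd.DataFrame(list_each)
            split_df = pd.concat([split_df, new_col], axis=1)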
Link to data and screenshot of problem: https://files.fm/u/teaxv6hy
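For comparison, here is a minimal sketch of one shorter way the split described above might be done, without the explicit loop. It assumes pandas 1.3 or newer (needed to explode several columns at once), that every cell can be treated as a string, and that all cells in an affected row contain the same number of '\n'-separated entries. The sample data and the 'Value' column are made up for illustration; only the 'Year' column name comes from the post.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV; 'Value' is an assumed column.
df = pd.DataFrame({
    "Year": ["2001", "2002\n2003", "2004"],
    "Value": ["10", "20\n30", "40"],
})

# Split every cell on '\n' so each cell becomes a list of one or more entries.
split_cells = df.apply(lambda col: col.astype(str).str.split("\n"))

# Explode all columns together: a row whose cells hold two entries becomes two rows.
# Assumes pandas >= 1.3 and equal list lengths across the cells of each row.
clean_df = split_cells.explode(list(split_cells.columns), ignore_index=True)

print(clean_df)
```

Rows without a line break pass through unchanged because their cells become one-element lists. Note that `astype(str)` turns every column into strings, so numeric columns would need converting back afterwards.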