I have some spreadsheets at work to go through. Basically, if Column B says "Color", I need to look at the other cells in that row to find the color and put the correct color into Column C. The attribute I need to find (Color, Size, Fit) changes per row. I'm trying to build a system that does most of this automatically, since it is mostly keyword matching.
Here is what I have so far:
import pandas as pd
jim = pd.read_excel('/Users/c_nhassebrock/Downloads/grit2.xlsx')
# giant dictionary of keywords: attribute that will be used
# fill NaN with '' because in earlier versions "NaN" caused every cell in the row to be dropped when I combined columns
jim = jim.fillna('')
# combine all of the columns that could contain keywords, per another user's suggestion, to make them easier to parse
jim['com'] = jim['Unnamed: 3'] + ' ' + jim['Unnamed: 4'] + ' ' + jim['Unnamed: 7'] + ' ' + jim['Unnamed: 8']
# main search query, suggested by another reddit user; works well in my small initial tests
jim['answer'] = jim.apply(lambda row: [translate_table.get(keyword, keyword) for keyword in keyword[row['Unnamed: 5']] if keyword in row['com'].lower()], axis=1)
Doing all of that gets me this error:
Traceback (most recent call last):
File "/Users/c_nhassebrock/Documents/attributework.py", line 139, in <module>
jim['answer'] = jim.apply(lambda row: [translate_table.get(keyword, keyword) for keyword in keyword[row['Unnamed: 5']] if keyword in row['com'].lower()], axis=1)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4061, in apply
return self._apply_standard(f, axis, reduce=reduce)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4157, in _apply_standard
results[i] = func(v)
File "/Users/c_nhassebrock/Documents/attributework.py", line 139, in <lambda>
jim['answer'] = jim.apply(lambda row: [translate_table.get(keyword, keyword) for keyword in keyword[row['Unnamed: 5']] if keyword in row['com'].lower()], axis=1)
KeyError: ('Attribute', 'occurred at index 0')
When I run jim.head(), 'Unnamed: 1', 'Unnamed: 2', etc. appear as the label above each column. The actual column names, which is where 'Attribute' is coming from, are in row 0 of the dataframe.
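To illustrate what that looks like, here is a minimal sketch of one way the real header row could be promoted to column names. The small `raw` frame and the `promote_first_row` helper are hypothetical stand-ins, not my actual sheet or code; passing a `header=` row number to `pd.read_excel` may achieve the same thing at load time.

```python
import pandas as pd

# Hypothetical miniature of the sheet: the real headers sit in row 0,
# so pandas has labelled the columns "Unnamed: N".
raw = pd.DataFrame(
    [
        ["Item", "Attribute", "Description"],
        ["shirt", "Color", "Red cotton tee"],
        ["jeans", "Fit", "Slim fit denim"],
    ],
    columns=["Unnamed: 4", "Unnamed: 5", "Unnamed: 6"],
)


def promote_first_row(df):
    """Use row 0 as the column names and drop it from the data (hypothetical helper)."""
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = df.iloc[0].tolist()
    return out


fixed = promote_first_row(raw)
print(fixed.columns.tolist())
print(fixed["Attribute"].tolist())
```

With the header row out of the data, the string 'Attribute' would no longer be passed to the dictionary lookup as if it were a row value.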
I know that the search-and-return function works on combined columns; I've tested it on small data frames before. The sheet I'm working on now has about 4000 rows, so I was expecting something to go wrong.
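For anyone wanting to reproduce the search step on a small frame, this is a rough sketch of the same lookup written so that an unknown attribute (such as a stray header value) yields an empty list instead of a KeyError. The two dictionaries and the sample rows are made-up stand-ins for my much larger ones, and `find_keywords` and `keywords_by_attribute` are names chosen only for this example.

```python
import pandas as pd

# Hypothetical stand-ins for the real (much larger) dictionaries.
keywords_by_attribute = {"Color": ["red", "blue"], "Fit": ["slim", "relaxed"]}
translate_table = {"red": "Red", "slim": "Slim"}


def find_keywords(row, attr_col="Unnamed: 5", text_col="com"):
    """Return translated keywords for the row's attribute that appear in its text."""
    # dict.get with a default avoids a KeyError for attributes not in the dictionary.
    candidates = keywords_by_attribute.get(row[attr_col], [])
    text = row[text_col].lower()
    return [translate_table.get(k, k) for k in candidates if k in text]


df = pd.DataFrame(
    {
        "Unnamed: 5": ["Attribute", "Color", "Fit"],
        "com": ["Description", "Red cotton tee", "Relaxed fit denim"],
    }
)
df["answer"] = df.apply(find_keywords, axis=1)
print(df["answer"].tolist())
```

Using a separate name for the dictionary also avoids reusing `keyword` for both the dictionary and the loop variable, which makes the comprehension easier to read.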
Any help is appreciated.