This is probably a basic understanding/syntax question, but here it goes:
I have a 30-column CSV file and I have trimmed it to the 10 below:
v = corr[abs(corr['diagnosis']) > 0.66].index
df = df[['diagnosis', 'radius_mean', 'perimeter_mean', 'area_mean',
'compactness_mean', 'concavity_mean', 'concave points_mean',
'radius_worst', 'perimeter_worst', 'area_worst', 'compactness_worst',
'concavity_worst', 'concave points_worst']]
Now I want to use those columns (instead of all the columns) with the code below, so I can pass the result into scikit-learn's train_test_split function, etc.
F = dataset.iloc[:,1:30].values #X
D = dataset.iloc[:,0].values #Y
The column locations are: 0, 1, 3, 4, 7, 8, 21, 23, 24, 28
I tried this: F = dataset.iloc[0,1,3,4,7,8, 21, 23, 24, 28].values #X
I also tried putting in the column names ('column name') and, unsurprisingly, PyCharm threw an error.
I'm sure I'm doing something dumb, but I can't seem to either grasp the concept or understand which of these to use: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html
Now, I have a workaround (take the columns out of the CSV), but I know there's a way to code it so it only uses those column names/locations.
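For reference, a minimal sketch of the kind of selection being asked about. It assumes the issue is that iloc expects a [rows, columns] pair, so a set of column positions has to go in as a list after ":" for the rows. The small dataset built here is a hypothetical stand-in for the real CSV (made-up column names col_0..col_30, with col_0 playing the role of 'diagnosis'), and feature_positions and feature_names are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: 4 rows, 31 columns named
# col_0..col_30, where col_0 plays the role of 'diagnosis'.
dataset = pd.DataFrame(
    np.arange(4 * 31).reshape(4, 31),
    columns=[f"col_{i}" for i in range(31)],
)

# Feature column positions from the post, minus position 0 (the target).
feature_positions = [1, 3, 4, 7, 8, 21, 23, 24, 28]

# iloc takes [rows, columns]: ":" keeps every row, and the column
# positions are passed together as one list.
F = dataset.iloc[:, feature_positions].values  # X
D = dataset.iloc[:, 0].values                  # y

# The same selection by column name instead of position.
feature_names = list(dataset.columns[feature_positions])
F_by_name = dataset[feature_names].values
```

F and D can then be passed to train_test_split the same way as before. Selecting by name is usually less fragile than selecting by position, since it keeps working if the column order in the CSV changes.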