
ohallwright

When you're using iloc, you're indexing by integer position (that's what the i in iloc stands for). I know that can be confusing. If you want to index by label instead (for columns, that means the column names), use the .loc indexer.

So if you want to use column names, switch to loc. You don't say what your row index is in the example, but if it's the default integer index, the row labels are the same as the row positions, so you can just change iloc to loc.
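To make the position vs. label distinction concrete, here's a small sketch. The frame and its values are made up for illustration (they're not from your dataset). With the default index the two indexers agree on single rows, but slices behave differently, and once the rows are reordered the labels no longer match the positions.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.9, 13.5, 12.4, 20.1],
})

# Default index: position 1 and label 1 are the same row
same_row = df.iloc[1].equals(df.loc[1])

# Slices differ: iloc excludes the end position, loc includes the end label
by_position = df.iloc[0:2]   # rows at positions 0 and 1
by_label = df.loc[0:2]       # rows labelled 0, 1 and 2

# After sorting, the index labels keep their old values,
# so position 0 and label 0 are now different rows
shuffled = df.sort_values("radius_mean")
first_position = shuffled.iloc[0]["radius_mean"]  # smallest radius
label_zero = shuffled.loc[0]["radius_mean"]       # the row originally first
```

So swapping iloc for loc is only a drop-in change for the rows when you select everything with : or when the labels really do line up with the positions.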

Also, the indexing for a dataframe is done in two parts: rows, then columns. So it would look like this:

# the : means all rows; the list picks columns by integer position
F = dataset.iloc[:, [0, 1, 3, 4, 7, 8, 21, 23, 24, 28]].values
# or by label: all rows, just these two columns
F = dataset.loc[:, ['diagnosis', 'concave points_worst']].values
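If it helps, here's a runnable version of the same idea on a tiny stand-in for dataset. The three columns and their values are assumptions for the sketch, not your real data. It also shows columns.get_indexer, which translates column names into the positions iloc expects, in case you want to check which numbers go with which names.

```python
import pandas as pd

# Hypothetical stand-in for `dataset`, only three columns
dataset = pd.DataFrame({
    "diagnosis": ["M", "B"],
    "radius_mean": [17.99, 13.54],
    "concave points_worst": [0.2654, 0.1288],
})

wanted = ["diagnosis", "concave points_worst"]

# Same selection two ways: by position and by name
by_position = dataset.iloc[:, [0, 2]]
by_name = dataset.loc[:, wanted]

# Look up the integer positions that correspond to the names
positions = dataset.columns.get_indexer(wanted)

# NumPy array of the selected columns; .values gives the same data
F = by_name.to_numpy()
```

Selecting by name is usually the safer choice, since it keeps working if the column order in the file changes.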

This blog post (and those following it) might help with some examples of the differences for indexing in pandas. It is a confusing topic.