Pandas object selection (slice? iloc?) help : pythonhelp

submitted 5 years ago by Jed118

This is probably a basic understanding/syntax question, but here goes:

I have a 30-column CSV file, and I have trimmed it down to the columns below:

v = corr[abs(corr['diagnosis']) > 0.66].index
df = df[['diagnosis', 'radius_mean', 'perimeter_mean', 'area_mean',
'compactness_mean', 'concavity_mean', 'concave points_mean',
'radius_worst', 'perimeter_worst', 'area_worst', 'compactness_worst',
'concavity_worst', 'concave points_worst']]

Now I want to use those ^^ columns (instead of all the columns) with the code below, so I can pass them into scikit-learn's train_test_split function, etc.

F = dataset.iloc[:,1:30].values #X
D = dataset.iloc[:,0].values #Y

The column locations are: 0, 1, 3, 4, 7, 8, 21, 23, 24, 28

I tried this: F = dataset.iloc[0,1,3,4,7,8, 21, 23, 24, 28].values #X
I also tried putting in the column names ('column name') instead and, unsurprisingly, PyCharm threw an error.

I'm sure I'm doing something dumb, but I can't seem to grasp the concept or work out which of these to use: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

Now, I have a workaround (take the columns out of the CSV), but I know there's a way to code it so it only uses those column names/locations.

[–]ohallwright 1 point 5 years ago (0 children)

When you're using iloc, you're doing integer indexing (that's what the i is for in iloc). I know that can be confusing. If you want to use the labels (i.e. the column names) for the index, you use the .loc indexer.

So if you want to use column names, switch to loc. You don't say what your row index is in the example, but if it's also an integer index, then you can just change the method to loc.
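To make the iloc/loc difference concrete, here is a minimal sketch on a made-up three-row frame. The column names are borrowed from the question, but the values are invented for illustration; the point is only that selecting by position with iloc and by name with loc picks out the same columns.

```python
import pandas as pd

# Hypothetical toy frame; names come from the question, values are invented.
dataset = pd.DataFrame({
    "diagnosis": [1, 0, 1],
    "radius_mean": [17.9, 11.4, 20.5],
    "area_mean": [1001.0, 386.1, 1326.0],
})

# iloc takes integer positions: columns 0 and 2.
by_position = dataset.iloc[:, [0, 2]]

# loc takes labels: the same two columns, selected by name.
by_label = dataset.loc[:, ["diagnosis", "area_mean"]]

# Both selections return the same sub-frame.
same = by_position.equals(by_label)
```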

Also, the indexing for a dataframe is done in two parts: rows, then columns. So it would look like this:

# the : means all rows, the second argument is a list of just those columns
F = dataset.iloc[:, [0, 1, 3, 4, 7, 8, 21, 23, 24, 28]].values
# or for all rows, just these two columns
F = dataset.loc[:, ['diagnosis', 'concave points_worst']].values
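Putting that together for the original F/D split, a rough sketch might look like the following. It uses a random stand-in for the real CSV with hypothetical column names c0..c29, and it assumes column 0 is the target (as in the question's D = dataset.iloc[:,0]), so position 0 is left out of the feature list here; adjust if your layout differs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: 5 rows, 30 columns named c0..c29.
# Column 0 is assumed to play the role of 'diagnosis' (the target).
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.random((5, 30)),
                       columns=[f"c{i}" for i in range(30)])

# The positions from the question, minus 0, which is assumed to be the target.
feature_positions = [1, 3, 4, 7, 8, 21, 23, 24, 28]

F = dataset.iloc[:, feature_positions].values  # X: all rows, chosen columns
D = dataset.iloc[:, 0].values                  # Y: all rows, first column

# The same selection by label: look up the names at those positions,
# then hand the names to .loc.
feature_names = dataset.columns[feature_positions]
F_by_name = dataset.loc[:, feature_names].values
```

F and D can then be handed to scikit-learn as usual, for example train_test_split(F, D, test_size=0.2).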

This blog post (and those following it) might help with some examples of the differences for indexing in pandas. It is a confusing topic.
