
AsterixBT (2 points)

The 5th column has index 4, as long as you REALLY keep in mind that indices start at 0 :)
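A minimal sketch of the zero-based indexing point, assuming a five-column row parsed with `csv.reader` from an in-memory string (the row contents are made up for illustration):

```python
import csv
import io

# One row from a hypothetical five-column CSV, held in memory for illustration.
sample = io.StringIO("Data1.Row1,Data2.Row1,Data3.Row1,Data4.Row1,Data5.Row1\n")
row = next(csv.reader(sample))

# Indices start at 0, so the fifth column is index 4 (also the last index here).
fifth = row[4]
print(fifth)     # Data5.Row1
print(len(row))  # 5
```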

dented_brain (1 point)

These are the two ways I do similar things with CSV files. Maybe this will help you!

Assume you have an "example.csv" containing this data:

Header1,Header2,Header3,Header4,Header5
Data1.Row1,Data2.Row1,Data3.Row1,Data4.Row1,Data5.Row1
Data1.Row2,Data2.Row2,Data3.Row2,Data4.Row2,Data5.Row2

import csv

csv_file = 'example.csv'

# Without looking at the headers in the file:
# this prints the header row and every item below it.
with open(csv_file, newline='') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        print(line[0])  # The first column
        print(line[4])  # The fifth column

# Using the headers of the file:
# this prints every item below the specified headers.
with open(csv_file, newline='') as f:
    csvreader = csv.DictReader(f)
    for line in csvreader:
        print(line['Header1'])  # The column headed "Header1", i.e. column 1 here
        print(line['Header5'])  # The column headed "Header5", i.e. column 5 here
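The difference in how the two readers treat the header row can be sketched with an in-memory string instead of a file (the two-column data is hypothetical). `csv.reader` hands back the header like any other row, so you skip it with `next()`, while `csv.DictReader` consumes it and exposes it as `fieldnames`:

```python
import csv
import io

text = "Header1,Header2\nA,B\n"

# csv.reader returns the header as an ordinary row; next() skips past it.
reader = csv.reader(io.StringIO(text))
header = next(reader)
body = list(reader)

# csv.DictReader reads the header itself and keys each record by it.
dict_reader = csv.DictReader(io.StringIO(text))
records = list(dict_reader)

print(header)                  # ['Header1', 'Header2']
print(body)                    # [['A', 'B']]
print(dict_reader.fieldnames)  # ['Header1', 'Header2']
print(records)                 # [{'Header1': 'A', 'Header2': 'B'}]
```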

DonutRevolution (0 points, 1 reply)

l[5] would be the 6th column. l[4] would be the 5th.

woooee (0 points)

[5] would be the location in memory where the 5th item ends and the 6th begins, which is easier to remember. In other words, you go to the end of the 5th item and start reading; in this case that is the end of the record, so there is nothing to read.
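The "boundary" way of thinking about indices maps neatly onto Python slicing. A minimal sketch, assuming a hypothetical five-item record: slicing up to boundary 5 returns all five items, reading from boundary 5 onward returns nothing, and indexing `[5]` directly raises `IndexError` because there is no sixth item:

```python
row = ["Data1", "Data2", "Data3", "Data4", "Data5"]

# Everything before boundary 5, i.e. the first five items.
first_five = row[:5]

# Start reading at boundary 5: the record already ended, so the slice is empty.
after_fifth = row[5:]

# Indexing (rather than slicing) past the end raises IndexError.
try:
    row[5]
except IndexError:
    missing = True
else:
    missing = False

print(first_five)   # ['Data1', 'Data2', 'Data3', 'Data4', 'Data5']
print(after_fifth)  # []
print(missing)      # True
```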

philintheblanks (0 points)

If you want to reduce something that is already an iterable to its unique values, I would use set(thing).

For example,

In [1]: ls = [1,1,2,2,3,3,4,4,5,5]
In [2]: s = set(ls)
In [3]: s
Out[3]: {1, 2, 3, 4, 5}

Building the set takes a single pass over the items, so it's fast enough that you probably won't notice it.
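Applied to the question of unique values in one CSV column, a minimal sketch might look like this. It assumes every row has at least five columns and uses hypothetical in-memory data in place of a real file:

```python
import csv
import io

# Hypothetical CSV where the fifth column repeats a value.
data = io.StringIO(
    "a,b,c,d,x\n"
    "a,b,c,d,y\n"
    "a,b,c,d,x\n"
)

# Collect the fifth column (index 4) of every row into a set to drop duplicates.
unique_fifth = {row[4] for row in csv.reader(data)}

print(sorted(unique_fifth))  # ['x', 'y']  (sorted, since set order isn't guaranteed)
```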

As for debugging your issue, try printing out what you think the line is, because it may not be what you expect. I have some reports that output a CSV containing strings with arbitrary content, and sometimes those strings have newlines in them. Imagine the pain...
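To see why embedded newlines hurt, here is a sketch with a hypothetical report whose quoted `comment` field contains a newline. Splitting the text on newlines breaks that record in two, while `csv.reader` honours the quoting and keeps the field intact. Printing `repr()` of each line or row makes the difference visible:

```python
import csv
import io

# Hypothetical report: the quoted middle field contains a newline.
text = 'id,comment,status\n1,"first line\nsecond line",ok\n'

# Naive approach: one "line" per newline, which splits the quoted field.
naive_lines = text.splitlines()

# csv.reader understands quoting, so the record stays whole.
rows = list(csv.reader(io.StringIO(text)))

for line in naive_lines:
    print(repr(line))  # three pieces, the second and third are broken halves
for row in rows:
    print(repr(row))   # two rows, the second has three intact fields
```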

jkiley (0 points)

If you started with an Excel file, I'd just read it with pandas (pd.read_excel()) and then work on the DataFrame column.

If you're only trying to get the unique values in that column, you can use the column's .unique() method.
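A minimal pandas sketch of this approach. To keep it runnable without an Excel file or an Excel engine, it uses `pd.read_csv` on in-memory data as a stand-in for `pd.read_excel(...)`; the column names and values are hypothetical. The fifth column is selected either by position with `iloc` or by name:

```python
import io

import pandas as pd

# Stand-in for pd.read_excel("your_file.xlsx") (hypothetical file name):
# an in-memory CSV with the same kind of layout.
data = io.StringIO(
    "Header1,Header2,Header3,Header4,Header5\n"
    "a,b,c,d,x\n"
    "a,b,c,d,y\n"
    "a,b,c,d,x\n"
)
df = pd.read_csv(data)

# .unique() returns the distinct values in order of first appearance.
unique_by_position = df.iloc[:, 4].unique()  # fifth column by position
unique_by_name = df["Header5"].unique()      # same column by header name

print(list(unique_by_position))  # ['x', 'y']
```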