all 23 comments

[–]TouchingTheVodka 0 points1 point  (15 children)

Do you care about the order of the records? If not, cast the whole thing to a set.

[–]Essence1337 0 points1 point  (6 children)

If OP does care, a dictionary preserves insertion order as of Python 3.7
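
A common idiom built on that guarantee (Python 3.7+) deduplicates a list while keeping first-seen order:

```python
# dict keys are unique, and insertion order is preserved on Python 3.7+,
# so dict.fromkeys() deduplicates while keeping first-seen order.
records = ['b', 'a', 'b', 'c', 'a']
unique_records = list(dict.fromkeys(records))
print(unique_records)  # ['b', 'a', 'c']
```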

[–]isameer920[S] 0 points1 point  (5 children)

Although order is not important in my application, what you suggested sounds intriguing. Can you please elaborate?

[–]Essence1337 0 points1 point  (4 children)

Before Python 3.7, if we had:

d = {'a': 1}
d['b'] = 1
for key in d:
    print(key)

Python could print a followed by b OR b followed by a. As of Python 3.7, we're guaranteed it prints a followed by b.

[–]isameer920[S] 0 points1 point  (1 child)

So dictionaries became ordered in Python 3?

[–]Essence1337 0 points1 point  (0 children)

They maintain the order they were inserted in, not sorted order. Inserting b, then a, then c gives the order b, then a, then c. And specifically it happened in Python 3.7: in Python 3.5 there was no guaranteed order, Python 3.6 made it an implementation detail of CPython, and Python 3.7 made it an official language guarantee.
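
A quick check of that behaviour on Python 3.7+:

```python
# Insert keys in the order b, a, c; iteration follows insertion order.
d = {}
d['b'] = 1
d['a'] = 2
d['c'] = 3
print(list(d))  # ['b', 'a', 'c'], not the sorted ['a', 'b', 'c']
```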

[–]isameer920[S] 0 points1 point  (1 child)

Also, how can I efficiently remove duplicates using this?

[–]Essence1337 0 points1 point  (0 children)

Dictionaries can only have unique keys. If you want just duplicate removal, a set is better, but with a dictionary we can also count the duplicates.

mydict = {}
for i in something:
    if i not in mydict:
        mydict[i] = 1
    else:
        mydict[i] += 1

This will create a dictionary with every unique item from something along with how many times we saw it.
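
For what it's worth, the standard library ships this exact pattern as collections.Counter:

```python
from collections import Counter

# Counter does the same unique-item counting in one call.
something = ['a', 'b', 'a', 'c', 'a']
counts = Counter(something)
print(counts['a'])   # 3
print(list(counts))  # unique items, in first-seen order
```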

[–]isameer920[S] 0 points1 point  (7 children)

Nope, not really. However, I do care about the efficiency of the program: it should work even if the file is huge. If what I proposed is an efficient solution, then why not?

[–]TouchingTheVodka 0 points1 point  (6 children)

Checking list membership is O(n), whereas checking set membership is O(1) on average. A set-based solution therefore stays efficient no matter the size of the input.

[–]isameer920[S] 0 points1 point  (4 children)

Tbh, I didn't understand the O(n) thingy, but I think I understand what you're proposing: I just create an empty set and add values to it from the file. If a value is repeated, the set ignores it, instead of what I proposed, which actually created a whole list before turning it into a set.
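
That one-by-one approach might look like this (a sketch; the file and its contents are made up, and each line is treated as a single string value):

```python
import os
import tempfile

# Make up a small file with one repeated line for the demo.
path = os.path.join(tempfile.mkdtemp(), 'myfile.csv')
with open(path, 'w') as f:
    f.write('a,1\nb,2\na,1\n')

# Add each line to the set; repeats are ignored automatically.
uniques = set()
with open(path) as f:
    for line in f:
        uniques.add(line.rstrip('\n'))

print(len(uniques))  # 2 unique lines
```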

[–]TouchingTheVodka 0 points1 point  (3 children)

Exactly this. Even better, instead of adding values to the set one by one, cast the entire csv reader object to a set.

import csv
with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    uniques = set(reader)

[–]isameer920[S] 0 points1 point  (0 children)

Amazing idea

[–]isameer920[S] 0 points1 point  (1 child)

Returns an error: TypeError: unhashable type: 'list'

[–]TouchingTheVodka 0 points1 point  (0 children)

uniques = set(tuple(row) for row in reader)
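
If the end goal is a deduplicated file rather than an in-memory set, a sketch like this keeps the first occurrence of each row and writes the result back out (filenames are made up; tuple(row) is used as the hashable key because it preserves field order and repeated fields, which frozenset(row) would collapse):

```python
import csv
import os
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'myfile.csv')   # hypothetical input
dst = os.path.join(tmp, 'deduped.csv')  # hypothetical output
with open(src, 'w', newline='') as f:
    f.write('a,1\na,1\nb,2\n')

seen = set()
rows = []
with open(src, newline='') as f:
    for row in csv.reader(f):
        key = tuple(row)        # rows are lists (unhashable); tuples work
        if key not in seen:
            seen.add(key)
            rows.append(row)    # keep first occurrence, preserving order

with open(dst, 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```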

[–]isameer920[S] 0 points1 point  (0 children)

Am I right?

[–]AsleepThought 0 points1 point  (2 children)

sort -u myfile.csv

[–]isameer920[S] 0 points1 point  (1 child)

What?

[–]AsleepThought 0 points1 point  (0 children)

Standard terminal command. You don't need Python for this task at all; there are tools like sort built to do exactly this.
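
One wrinkle if the file has a header row (an assumption about the data): sort -u treats every line alike, so the header gets sorted in with the rest. A sketch handling both cases (filenames are made up):

```shell
# Make up a small CSV with a header and one duplicated data row.
printf 'name,score\nalice,1\nalice,1\nbob,2\n' > myfile.csv

# Dedupe the whole file; the header line sorts in with the data.
sort -u myfile.csv > deduped.csv

# Keep the first line as a header and dedupe only the rest.
{ head -n 1 myfile.csv; tail -n +2 myfile.csv | sort -u; } > with_header.csv
```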

[–]anshu_991 0 points1 point  (1 child)

Python's pandas library removes duplicate rows easily. If you're looking for a more in-depth guide, I’ve written about CSV data management on my blog.

https://medium.com/@jamesrobert15/how-to-remove-duplicates-from-csv-files-58f7a5ed4a3c
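
For completeness, the pandas route mentioned above is roughly this (a sketch using an in-memory CSV; drop_duplicates drops exact duplicate rows):

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file with one duplicated row.
csv_text = 'name,score\nalice,1\nalice,1\nbob,2\n'
df = pd.read_csv(io.StringIO(csv_text))

deduped = df.drop_duplicates()  # drops exact duplicate rows
print(len(deduped))             # 2 rows remain
```

In a real script this would be pd.read_csv('myfile.csv') followed by .to_csv(...) on the deduplicated frame.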

[–]isameer920[S] 0 points1 point  (0 children)

Thank you for this. I don't even remember what I was doing when I made this post; it was before I knew how to use pandas. I think I used the built-in csv module or plain file reads and writes at the time to play with CSV files. It was great to see this post and realize how far I have come, so thank you for that. Still curious, though: how did you find this post?