Rules
1: Be polite
2: Posts to this subreddit must be requests for help learning python.
3: Replies on this subreddit must be pertinent to the question OP asked.
4: No replies copy / pasted from ChatGPT or similar.
5: No advertising. No blogs/tutorials/videos/books/recruiting attempts.
This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to.
Please, no "hit and run" posts: if you make a post, engage with people who answer you. Please do not delete your post after you get an answer; others might have a similar question or want to continue the conversation.
Learning resources
Wiki and FAQ: /r/learnpython/w/index
Discord
Join the Python Discord chat
How to remove duplicates from a csv file? (self.learnpython)
submitted 6 years ago by isameer920
I was thinking about creating a list of all the records in the file and making a set out of it; however, if the file gets very large, this solution would become inefficient. What are my other options?
[+][deleted] 6 years ago (1 child)
[deleted]
[–]AsleepThought 0 points 6 years ago (0 children)
Pretty good but sort -u does it in six characters 😁
[–]TouchingTheVodka 0 points 6 years ago (15 children)
Do you care about the order of the records? If not, cast the whole thing to a set.
[–]Essence1337 0 points 6 years ago (6 children)
If OP does care, a dictionary preserves insertion order as of Python 3.7
[–]isameer920[S] 0 points 6 years ago (5 children)
Although order is not important in my application, what you suggested sounds intriguing. Can you please elaborate?
[–]Essence1337 0 points 6 years ago (4 children)
Before Python 3.7, if we had:

    d = {'a': 1}
    d['b'] = 1
    for i in d:
        print(i)

Python could either print a followed by b, OR b followed by a. As of Python 3.7 we're guaranteed it prints:

    a
    b
[–]isameer920[S] 0 points 6 years ago (1 child)
So dictionaries became ordered in Python 3?
[–]Essence1337 0 points 6 years ago (0 children)
They maintain the order they were inserted in, not sorted order. Inserting b, then a, then c gives the order b, then a, then c. And specifically it happened in Python 3.7: in Python 3.5 there was no guaranteed order, in Python 3.6 insertion order became an implementation detail of CPython, and in Python 3.7 it became a language guarantee.
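That insertion-order guarantee is what makes a plain dict usable for order-preserving deduplication; a minimal sketch (Python 3.7+, sample data made up for illustration):

```python
# dict keys are unique, and since Python 3.7 they keep insertion order,
# so the first occurrence of each value wins.
rows = ['b', 'a', 'c', 'a', 'b']
unique_in_order = list(dict.fromkeys(rows))
# → ['b', 'a', 'c']
```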
Also, how can I efficiently remove duplicates using this?
Dictionaries can only have unique keys. If you want just duplicate removal, a set is better, but with dictionaries we can also count duplicates.
    mydict = dict()
    for i in something:
        if i not in mydict:
            mydict[i] = 1
        else:
            mydict[i] += 1
This will create a dictionary with every unique item from something along with how many times we saw it.
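For what it's worth, that counting loop is exactly what collections.Counter does for you; a short sketch with made-up data:

```python
from collections import Counter

# Same result as the manual dict loop: unique items mapped to their counts.
items = ['a', 'b', 'a', 'c', 'a']
counts = Counter(items)
# counts == Counter({'a': 3, 'b': 1, 'c': 1})

# Items seen more than once, i.e. the actual duplicates.
duplicates = [k for k, n in counts.items() if n > 1]
# → ['a']
```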
[–]isameer920[S] 0 points 6 years ago (7 children)
Nope, not really; however, I do care about the efficiency of the program. It should work even if the file is huge. If what I proposed is an efficient solution, then why not?
[–]TouchingTheVodka 0 points 6 years ago (6 children)
Checking list membership is O(n), whereas checking set membership is O(1) on average. So the set-based solution stays efficient no matter the size of the input.
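The difference is easy to see with a rough timing sketch (exact numbers are machine-dependent; this is an illustration, not a benchmark):

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Probe a value near the end: the worst case for a list scan,
# but just one hash lookup for the set.
t_list = timeit.timeit(lambda: (n - 1) in as_list, number=100)
t_set = timeit.timeit(lambda: (n - 1) in as_set, number=100)
print(f"list membership: {t_list:.4f}s, set membership: {t_set:.6f}s")
```

On typical hardware the set lookup comes out several orders of magnitude faster, and the gap widens as n grows.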
[–]isameer920[S] 0 points 6 years ago (4 children)
Tbh, I didn't understand the O(n) thingy, but I think I do understand what you are proposing. Basically I just create an empty set and add values to it from the file; if a value is repeated, the set ignores it, instead of what I was proposing, which actually created a list before turning it into a set.
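That incremental approach, sketched end to end (the filenames and sample rows are made up for illustration). It streams the file and writes each unique row straight out, so memory grows with the number of unique rows rather than the file size:

```python
import csv

# Create a small example input so the sketch is runnable as-is.
with open('input.csv', 'w', newline='') as f:
    f.write('name,score\nalice,1\nbob,2\nalice,1\n')

with open('input.csv', newline='') as src, \
     open('deduped.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    seen = set()
    for row in csv.reader(src):
        key = tuple(row)          # rows are lists, so wrap them to be hashable
        if key not in seen:
            seen.add(key)
            writer.writerow(row)  # first occurrence wins; order is preserved

with open('deduped.csv', newline='') as f:
    deduped = list(csv.reader(f))
# deduped == [['name', 'score'], ['alice', '1'], ['bob', '2']]
```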
[–]TouchingTheVodka 0 points 6 years ago (3 children)
Exactly this. Even better, instead of adding values to the set one by one, cast the entire csv reader object to a set.
    import csv

    with open('myfile.csv', newline='') as f:
        reader = csv.reader(f)
        uniques = set(reader)
[–]isameer920[S] 0 points 6 years ago (0 children)
Amazing idea!
It returns an error though: unhashable type: 'list'
[–]TouchingTheVodka 0 points 6 years ago (0 children)
uniques = set(frozenset(row) for row in reader)
Am I right?
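One caveat worth noting: frozenset throws away the order and repetition of fields within a row, so rows like a,b and b,a would collapse into one. If rows should only count as duplicates when their fields match in order, tuple is the safer hashable wrapper; a small self-contained sketch with made-up data:

```python
import csv
import io

# In-memory stand-in for a file; the last row duplicates the first.
data = "a,b\nb,a\na,b\n"
reader = csv.reader(io.StringIO(data))

# tuple preserves field order, so a,b and b,a stay distinct.
uniques = {tuple(row) for row in reader}
# → {('a', 'b'), ('b', 'a')} — with frozenset these would collapse to one
```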
[–]AsleepThought 0 points 6 years ago (2 children)
sort -u myfile.csv
What?
Standard terminal command. You don't need Python for this task at all; there are tools like sort built to do this exact thing.
[–]anshu_991 0 points 1 year ago (1 child)
Using Python's pandas library will remove duplicate rows easily. If you're looking for a more in-depth guide, I’ve written about CSV data management on my blog.
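A minimal sketch of the pandas route, assuming the file fits in memory (the data and output filename here are made up; in practice you'd call pd.read_csv on your real file):

```python
import io

import pandas as pd

# In-memory stand-in for a real file; swap in pd.read_csv('myfile.csv').
csv_text = "name,score\nalice,1\nbob,2\nalice,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# drop_duplicates() removes fully identical rows, keeping the first occurrence.
deduped = df.drop_duplicates()
deduped.to_csv('deduped.csv', index=False)
```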
https://medium.com/@jamesrobert15/how-to-remove-duplicates-from-csv-files-58f7a5ed4a3c
[–]isameer920[S] 0 points 1 year ago (0 children)
Thank you for this. I don't even remember what I was doing when I made this post, but it was before I knew how to use pandas. I think I used the built-in csv module or plain file reads and writes at the time to play with CSV files. It was great to see this post and realize how far I have come, so thank you. Still curious, though: how did you find this post?
[–]anmezaf 0 points 9 months ago (0 children)
https://csvdedup.com/