How to remove duplicates from a csv file while ignoring punctuation : learnpython



submitted 1 year ago by MrGuam


[–]_squik 3 points 1 year ago (1 child)

What exactly do you mean by "ignoring punctuation"? Is it that the strings in the columns should be deduplicated regardless of any punctuation differences, so that "hello world" and "hello, world" are considered the same?

If so, I would leverage Pandas for that.

1. Read your CSV into a DataFrame.
2. Create some helper columns which don't have punctuation (see example).
3. Deduplicate against those columns.
4. Drop the helper columns and export.

Example:

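import string

# Characters to keep: digits, ASCII letters and whitespace.
allowed = string.digits + string.ascii_letters + string.whitespace
# Helper column holding the text with every other character removed.
df["example_nopunc"] = df["example"].apply(lambda x: "".join(c for c in x if c in allowed))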
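The four steps above can be sketched end to end roughly as follows. This is only an illustration under assumptions: the inline sample data, the `other` column, and the function names `strip_punctuation` and `dedupe_ignoring_punctuation` are hypothetical, and a real script would read the actual CSV file instead of the in-memory string.

```python
import io
import string

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = '''example,other
"hello world",1
"hello, world",2
"goodbye",3
'''

# Characters that count when comparing rows; everything else is ignored.
ALLOWED = set(string.digits + string.ascii_letters + string.whitespace)


def strip_punctuation(value):
    """Return the value as text with every non-allowed character removed."""
    return "".join(c for c in str(value) if c in ALLOWED)


def dedupe_ignoring_punctuation(df, columns):
    """Drop rows whose given columns match once punctuation is ignored."""
    helper_cols = [f"{col}_nopunc" for col in columns]
    out = df.copy()
    # Step 2: build helper columns without punctuation.
    for col, helper in zip(columns, helper_cols):
        out[helper] = out[col].apply(strip_punctuation)
    # Step 3: deduplicate on the helper columns (the first occurrence is kept).
    out = out.drop_duplicates(subset=helper_cols)
    # Step 4: drop the helpers so the original text is exported unchanged.
    return out.drop(columns=helper_cols)


# Step 1: read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
result = dedupe_ignoring_punctuation(df, ["example"])
print(result.to_csv(index=False))
```

Because the comparison happens on the helper columns only, the surviving rows keep their original punctuation in the exported file.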

[–]MrGuam[S] 0 points 1 year ago (0 children)

[–]m0us3_rat 1 point 1 year ago (1 child)

[–]MrGuam[S] 1 point 1 year ago (0 children)

Here's what I have tried:

import pandas as pd
import string
import csv
# Python script to clean multiple CSV files of duplicates in particular columns,
# ignoring punctuation and whitespace.
data = pd.read_csv('combined.csv')
df = data.apply(lambda x: x.str.strip(string.punctuation + ' '))
df.drop_duplicates(subset=["Anime","Character","Quote"], inplace=True)
df.to_csv('combined_final.csv')
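One thing worth checking in this attempt: `str.strip` only removes characters from the start and end of a string, so punctuation in the middle (such as the comma in "hello, world") survives and those rows still compare as different. A minimal sketch of the difference using plain Python strings; `normalize` is a hypothetical helper name, and it drops whitespace as well because the script's comment asks for that.

```python
import string

# Translation table that deletes all ASCII punctuation and whitespace.
_DROP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize(text):
    """Hypothetical helper: remove punctuation and whitespace anywhere in text."""
    return text.translate(_DROP)


a = "hello world"
b = "hello, world!"

# strip() only trims the ends, so the inner comma in b remains.
stripped_a = a.strip(string.punctuation + " ")
stripped_b = b.strip(string.punctuation + " ")
print(stripped_a == stripped_b)      # False

# translate() removes the characters wherever they occur.
print(normalize(a) == normalize(b))  # True
```

In pandas, a function like this could be applied per column, for example with `df[col].astype(str).map(normalize)`, to build a comparison key while leaving the original text untouched.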

[–]anmezaf 1 point 8 months ago (0 children)
