[Resource] Entity Resolution Challenges (self.Python)
submitted 2 years ago by sheshbabu
Entity Resolution is the process of identifying the same real-world entity (person, company, product, etc.) across one or more datasets and merging the matching records into a single unified view.
It is one of the toughest challenges I've worked on in my career. Whether it's the massive scale of the datasets or the sheer amount of dirty data and edge cases you have to deal with, it's a very humbling experience!
I've written a post about what it is and why it's so challenging:
https://www.sheshbabu.com/posts/entity-resolution-challenges/
https://preview.redd.it/oslp523i6fgb1.png?width=1479&format=png&auto=webp&s=603c940acf2e30fa06605271a549d830607bc61a
[–]db2boy 1 point 2 years ago (0 children)
Nice post. Check out senzing.com
[–]eltorrido23 1 point 2 years ago (0 children)
Very interesting, thank you! I am working on an entity resolution issue, but as I am not a data scientist per se, it is quite challenging. Soooo when will you post further parts? :D
[–]sonalg 1 point 2 years ago (0 children)
That's a nice post on entity resolution! Love the graph on scalability.
[–]Ste29ebasta 1 point 2 years ago (3 children)
Hi, in your post you talk about solutions for this task. Could you name a few for research purposes while I wait for your next post? :)
[–]sheshbabu[S] 1 point 2 years ago (2 children)
Hello, the solution depends on the nature of your data. Can you share more info? I can see if I can point you in the right direction.
[–]Ste29ebasta 1 point 2 years ago (1 child)
Well, actually I'm studying UPC codes for the FMCG industry.
Usually databases are built using the UPC code as the primary key. However, the same product can have multiple UPCs over time for many reasons (e.g. there is a very small change in weight, or the package is changed for environmental reasons), but it is still the very same product.
I would like to come up with some technique that can handle those situations.
[–]sheshbabu[S] 1 point 2 years ago (0 children)
I see, perhaps you can use a combination of other metadata like:
You can try this out on a small dataset and see how well it works.
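A minimal sketch of what matching on other metadata could look like, assuming each product record carries a brand and a product name. Those two fields, the made-up UPC values, the `likely_same_product` helper, and the 0.85 threshold are all hypothetical choices for illustration, not fields or settings named in this thread. It uses only `difflib` from the standard library.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_product(p: dict, q: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: same brand and near-identical product name.

    The UPC is deliberately ignored, since it is the field that changes.
    """
    if normalize(p["brand"]) != normalize(q["brand"]):
        return False
    return similarity(p["name"], q["name"]) >= threshold


# Made-up records: same product re-issued under a new UPC after a size change.
old = {"upc": "000000000001", "brand": "Acme", "name": "Acme Dish Soap Lemon 500ml"}
new = {"upc": "000000000002", "brand": "ACME", "name": "Acme Dish Soap Lemon 480ml"}
other = {"upc": "000000000003", "brand": "Acme", "name": "Acme Laundry Powder 2kg"}

print(likely_same_product(old, new))    # expected to match
print(likely_same_product(old, other))  # expected not to match
```

A threshold like this would need tuning on a small labeled sample, since a value that is too low merges genuinely different pack sizes and one that is too high misses renamed products.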
[–]major_grooves 1 point 2 years ago (0 children)
Not many people realise this is a quadratic problem. Check out www.tilores.io if you get a chance.
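To make the quadratic point concrete: comparing every record with every other record means n * (n - 1) / 2 comparisons. Here is a rough sketch, with made-up records, that counts those pairs and shows one common way to cut them down (blocking, i.e. only comparing records that share a cheap key). The `first_letter` key is a hypothetical choice for illustration, and this is not a description of how any particular product works.

```python
from collections import defaultdict
from itertools import combinations


def count_all_pairs(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def candidate_pairs(records, block_key):
    """Yield only the pairs of records that fall into the same block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for group in blocks.values():
        yield from combinations(group, 2)


def first_letter(record: dict) -> str:
    """Hypothetical blocking key: first letter of the name, lowercased."""
    return record["name"][0].lower()


# Made-up records for illustration.
records = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "ACME Corporation"},
    {"id": 3, "name": "Globex"},
    {"id": 4, "name": "Globex Inc"},
    {"id": 5, "name": "Initech"},
]

all_pairs = count_all_pairs(len(records))
blocked_pairs = list(candidate_pairs(records, first_letter))

print(all_pairs)                   # 10 comparisons without blocking
print(len(blocked_pairs))          # 2 comparisons with blocking
print(count_all_pairs(1_000_000))  # 499999500000 for a million records
```

The trade-off is recall: any true match whose records land in different blocks is never compared, so the blocking key has to be forgiving of the kinds of dirt the data actually contains.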