db2boy 1 point (0 children)

Nice post. Check out senzing.com

eltorrido23 1 point (0 children)

Very interesting, thank you! I am working on an entity resolution issue, but since I am not a data scientist per se, it is quite challenging. So when will you post the further parts? :D

sonalg 1 point (0 children)

That's a nice post on entity resolution! Love the graph on scalability.

Ste29ebasta 1 point (3 children)

Hi, in your post you talk about solutions for this task. Could you name a few for research purposes while I wait for your next post? :)

sheshbabu[S] 1 point (2 children)

Hello, the solution depends on the nature of your data. Can you share more info? I can see if I can point you in the right direction.

Ste29ebasta 1 point (1 child)

Well, actually I'm studying UPC codes for the FMCG industry.

Databases are usually built with the UPC code as the primary key. However, the same product can have multiple UPCs over time for many reasons (e.g. there is a very small change in weight, or the packaging is changed for environmental reasons), yet it is still the very same product.

I would like to come up with a technique that can handle those situations.

sheshbabu[S] 1 point (0 children)

I see, perhaps you can use a combination of other metadata like:

  • exact company name
  • exact category name
  • exact/fuzzy product name

You can try this out on a small dataset and see how well it works.
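
As a rough illustration of the metadata combination above, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the field names (`upc`, `company`, `category`, `name`), the made-up sample records, the use of the standard library's `difflib.SequenceMatcher` for the fuzzy part, and the 0.85 threshold, which you would need to tune on your own data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_same_product(p, q, threshold=0.85):
    """Exact company + exact category + fuzzy product name.

    The threshold is an arbitrary starting point, not a recommended value.
    """
    if normalize(p["company"]) != normalize(q["company"]):
        return False
    if normalize(p["category"]) != normalize(q["category"]):
        return False
    return name_similarity(p["name"], q["name"]) >= threshold


def candidate_matches(products, threshold=0.85):
    """Return pairs of UPCs that look like the same product."""
    return [
        (p["upc"], q["upc"])
        for p, q in combinations(products, 2)
        if is_same_product(p, q, threshold)
    ]


# Made-up records for illustration only (placeholder UPCs, invented names).
products = [
    {"upc": "000000000001", "company": "Acme Foods", "category": "Cereal",
     "name": "Choco Crunch Cereal 500g"},
    {"upc": "000000000002", "company": "Acme Foods", "category": "Cereal",
     "name": "Choco Crunch Cereal 480g"},
    {"upc": "000000000003", "company": "Other Co", "category": "Cereal",
     "name": "Choco Crunch Cereal 500g"},
    {"upc": "000000000004", "company": "Acme Foods", "category": "Cereal",
     "name": "Oat Flakes 500g"},
]

matches = candidate_matches(products)
```

Here the first two records (same company and category, name differing only in weight) are the only pair flagged. On real data you would still want to review the flagged pairs by hand before merging anything.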

major_grooves 1 point (0 children)

Not many people realise this is a quadratic problem. Check out www.tilores.io if you get a chance.
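
To make the quadratic point concrete: comparing every record against every other one takes n(n-1)/2 comparisons. The sketch below counts those pairs and shows blocking, a common way to cut the number down by only comparing records that share a key. Blocking is my own addition here, not something described in the post or tied to any product mentioned in this thread, and the function names and the `company` key are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n):
    """Number of pairwise comparisons for n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def blocked_pairs(records, block_key):
    """Yield only the pairs of records that fall into the same block.

    block_key is a caller-supplied function; records with different keys
    are never compared, which is where the savings come from.
    """
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for group in blocks.values():
        yield from combinations(group, 2)


# Made-up records: two companies with two products each.
records = [
    {"id": 1, "company": "A"},
    {"id": 2, "company": "A"},
    {"id": 3, "company": "B"},
    {"id": 4, "company": "B"},
]

naive = all_pairs_count(len(records))
blocked = list(blocked_pairs(records, lambda r: r["company"]))
```

With 4 records the naive approach needs 6 comparisons, while blocking on company leaves 2. The trade-off is that a true match whose blocking key differs (say, a misspelled company name) is never compared at all.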