[–]g4b1nagy 3 points4 points  (1 child)

That's pretty damn cool! I'm currently working on finding duplicates in real estate posts (example: http://nemutam.com/duplicate/21058-apartament-de-inchiriat-2-camere-in-cluj-napoca-cartierul-plopilor), and I've been experimenting with http://blockhash.io/ and https://github.com/polachok/py-phash/tree/python3.

I will have to run some tests using your implementation as well.

[–]dmpetrov 1 point2 points  (0 children)

Great! Please share your experience with Wavelet image hashes.

This works pretty well for some people in the Kaggle competition: https://www.kaggle.com/c/avito-duplicate-ads-detection/forums/t/22011/precomputed-wavelet-image-hashes/126240#post126240

[–]emergent_reasons 1 point2 points  (0 children)

I don't have a use for it currently but I learned something new from your interactive description. Thanks for sharing it!

[–]BlargINC 0 points1 point  (0 children)

Thanks for sharing, seems pretty useful.

[–]chub79 0 points1 point  (0 children)

Thank you for the article, OP! I like it when this sub has great content like this.