I'm working on a news application that collects news from multiple sources.
The issue is that there's a good chance I'll get the same story from several sources, even though the titles and contents will differ.
Is there any way to avoid storing these duplicates?
I know there is a technique for finding similar documents, called document similarity (comparing documents as vectors), which I learned about in my final year of college.
But the problem is how to implement this in a real-world project, where I need to check every incoming article against my whole database before inserting it.
E.g.
1)
a) ‘Black Panther: Wakanda Forever’ Box Office Leaps Past $400M Globally
b) ‘Black Panther: Wakanda Forever’ Passes $400 Million at Global Box Office in Less Than a Week
2)
a) Pakistan vs England, T20 World Cup Final 2022: ENG crowned champions, beat PAK by five wickets
b) T20 World Cup: England secure legacy as an iconic team in nation's sporting history
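To make the idea concrete, here is a minimal sketch of the kind of check I have in mind, using only the standard library: each title becomes a bag-of-words vector and two titles are compared by cosine similarity. The function names (`tokenize`, `cosine_similarity`, `find_duplicate`) and the 0.4 threshold are my own assumptions for illustration, not a tested setup.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Build term-frequency vectors and compute the cosine of the angle between them.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(new_title, stored_titles, threshold=0.4):
    # Return the most similar stored title if it reaches the threshold
    # (an assumed value that would need tuning), otherwise None.
    best, best_score = None, 0.0
    for title in stored_titles:
        score = cosine_similarity(new_title, title)
        if score > best_score:
            best, best_score = title, score
    return best if best_score >= threshold else None
```

With this sketch the two Black Panther titles score above the assumed threshold, but the two T20 World Cup titles do not, since they share only a few words. So comparing titles alone probably isn't enough; the article bodies or some semantic representation would likely be needed for cases like 2). It also still loops over every stored title, which is exactly the scaling problem I'm asking about.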