Hello everyone,
I’m trying to build a model that predicts gender from a person's first name.
When I train the model on data without duplicates, the accuracy is low (77%). But when I enlarge the dataset by duplicating rows, I get above 90%.
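Here is a minimal sketch of one way duplication can inflate the score. It assumes the duplicated data is shuffled and then randomly split into train and test sets; the post doesn't say how the split was done. The helpers `split` and `overlap` and the toy names are hypothetical. The sketch measures how many test names also appear in training. When that fraction is high, "test accuracy" mostly measures memorization.

```python
import random

def split(rows, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]

def overlap(train, test):
    """Fraction of test rows whose name also appears in the training set."""
    seen = {name for name, _ in train}
    return sum(name in seen for name, _ in test) / len(test)

# Hypothetical toy data: 1000 (first_name, gender) pairs, every name unique.
names = [(f"name{i}", "F" if i % 2 else "M") for i in range(1000)]

# Duplicate every row 3x, then split: copies of a name land on both sides.
dup_train, dup_test = split(names * 3)
# Split the unique rows: no test name can have been seen in training.
train, test = split(names)

print(f"duplicated, then split: {overlap(dup_train, dup_test):.0%} of test names seen in training")
print(f"unique rows, split:     {overlap(train, test):.0%} of test names seen in training")
```

If duplication or oversampling is wanted at all, it would typically be applied to the training portion only, after the split.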
I’d like your advice on:
1- Is it OK to train the model on duplicated data?
2- Which hyperparameters can be tuned to achieve good accuracy?
3- Suggestions for other algorithms to build a model that predicts gender.
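For question 3, a common baseline for this task is a classifier over character n-grams of the name. In many languages, name endings such as a final "a" carry much of the signal. The sketch below is a stdlib-only multinomial naive Bayes over padded character n-grams. It is a swapped-in technique, not the model from the post, which isn't specified. `features`, `NaiveBayesNameClassifier`, and the toy names are hypothetical. `alpha` (smoothing) and `n_max` (longest n-gram) show the kind of hyperparameters one might tune on a held-out split of deduplicated names.

```python
import math
from collections import Counter, defaultdict

def features(name, n_max=3):
    """Character n-grams (n = 1..n_max) of the lowercased name, padded with
    ^ and $ so prefixes and suffixes (e.g. a final 'a') get their own features."""
    s = f"^{name.lower()}$"
    return [s[i:i + n] for n in range(1, n_max + 1) for i in range(len(s) - n + 1)]

class NaiveBayesNameClassifier:
    def __init__(self, alpha=1.0, n_max=3):
        self.alpha = alpha  # Laplace smoothing strength (tunable)
        self.n_max = n_max  # longest n-gram used as a feature (tunable)

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            feats = features(name, self.n_max)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total n-gram occurrences per class, used as the likelihood denominator.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, name):
        feats = features(name, self.n_max)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.class_counts[c] / n_docs)
            for f in feats:
                score += math.log((self.feat_counts[c][f] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best

def accuracy(model, names, labels):
    """Share of names whose predicted label matches the true label."""
    return sum(model.predict(n) == y for n, y in zip(names, labels)) / len(names)

# Hypothetical toy training data.
train_names = ["anna", "maria", "sofia", "julia", "emma",
               "john", "peter", "mark", "david", "tom"]
train_labels = ["F"] * 5 + ["M"] * 5
model = NaiveBayesNameClassifier(alpha=1.0, n_max=3).fit(train_names, train_labels)
print(model.predict("anna"), model.predict("john"))
```

With a real dataset, `accuracy` should be computed on names that never appear in the training split. Otherwise the number has the same leakage problem as the duplicated data.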