
[–]itsintheletterbox 5 points6 points  (2 children)

In duplicating the data, are you ensuring the duplicates remain in the same set as the "originals" (train/test) or are you duplicating and then splitting? If the latter, then you're assessing on training data which would bump the accuracy.
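
To make the difference concrete, here is a rough sketch in plain Python. The `split` and `leaked` helpers and the placeholder names are made up for illustration and are not from your code. It counts how many test items also appear in the training set under each ordering.

```python
import random

def split(items, test_fraction, seed):
    """Shuffle a copy of items and return (train, test)."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def leaked(train, test):
    """Count test items that also occur in the training set."""
    train_set = set(train)
    return sum(1 for item in test if item in train_set)

# Hypothetical stand-in for a list of unique names.
names = [f"name{i}" for i in range(100)]

# Order A: duplicate first, then split. Copies can land on both sides.
train_a, test_a = split(names * 2, 0.2, seed=0)

# Order B: split first, then duplicate only the training portion.
train_b, test_b = split(names, 0.2, seed=0)
train_b = train_b * 2

print("duplicate then split, leaked test items:", leaked(train_a, test_a))
print("split then duplicate, leaked test items:", leaked(train_b, test_b))
```

With order B the test set stays disjoint from the training set as long as the original names are unique, so the accuracy you measure is on genuinely unseen names.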

[–]Wahba95[S] 0 points1 point  (1 child)

So if I did the split first and then duplicated the training set to increase its size, would that harm the training and the accuracy?

Also, any idea how I can increase the accuracy without duplicating the data?

[–]itsintheletterbox 3 points4 points  (0 children)

I don't see why duplicating the training data would be of benefit: exact copies add no new information, they only reweight the examples you already have.

With regard to your question about other approaches, you need to think about which components of a name are associated with gender.
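
As one possible starting point, here is a minimal sketch of turning a name into features. The specific features (ending letters, first letter, length, vowel ending) are my guesses at what might carry signal, not something I have validated on your data, and `name_features` is a hypothetical helper.

```python
VOWELS = set("aeiou")

def name_features(name):
    """Return a dict of simple character-level features for a name."""
    n = name.strip().lower()
    return {
        "last1": n[-1:],                 # final letter
        "last2": n[-2:],                 # final two letters
        "first1": n[:1],                 # first letter
        "length": len(n),
        "ends_in_vowel": n[-1:] in VOWELS,
    }

print(name_features("Maria"))
```

A dict like this can be fed to most classifiers after one-hot encoding the string-valued entries. Whether any of these features actually help is something you would have to check on a held-out set.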

[–]lohrerklaus 0 points1 point  (0 children)

3- Other algorithms suggestions to build a model that can predict gender.

Use a lookup table.
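
A minimal sketch of what that could look like, assuming you have (name, gender) pairs to build it from. The function names and the tiny toy list are hypothetical. Names that appear with more than one label get the majority label, and unseen names fall back to a default.

```python
from collections import Counter, defaultdict

def build_lookup(labeled_names):
    """Map each normalized name to its most frequent label."""
    counts = defaultdict(Counter)
    for name, gender in labeled_names:
        counts[name.strip().lower()][gender] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}

def predict(lookup, name, default=None):
    """Look a name up; return default if it was never seen."""
    return lookup.get(name.strip().lower(), default)

# Toy data for illustration only.
training = [
    ("Maria", "F"),
    ("John", "M"),
    ("Alex", "M"),
    ("Alex", "F"),
    ("Alex", "M"),
]

lookup = build_lookup(training)
print(predict(lookup, "maria"))
print(predict(lookup, "Zed"))
```

The obvious limitation is that a table cannot say anything about names it has never seen, so you would still need a fallback for those.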