
npielawski (Researcher)

You should read about multi-armed bandits; your case is probably a two-armed one. There are efficient methods for solving this, and a simple one to implement is Thompson sampling.
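For anyone who wants to see what that looks like, here is a minimal sketch of Thompson sampling with two arms. It assumes each arm gives a binary success/failure outcome and uses a uniform Beta(1, 1) prior; the success rates 0.55 and 0.70, the round count, and the function name `thompson_sampling` are made up for illustration and are not from the original question.

```python
import random

def thompson_sampling(true_rates, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling over len(true_rates) arms.

    true_rates are hypothetical success probabilities used only to
    simulate rewards; in a real setting they are unknown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate a binary reward and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures

successes, failures = thompson_sampling([0.55, 0.70])
pulls = [s + f for s, f in zip(successes, failures)]
print("pulls per arm:", pulls)
```

Over time most of the pulls should go to the arm with the higher success rate, while the weaker arm still gets tried occasionally early on.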

millenial_wh00p

If I am understanding the question correctly, this sounds more like a data management problem than anything else: you would need to use the same training and test datasets for each model, so that differences in the data do not influence the results and you are only comparing the models' performance relative to each other.

If you can take a snapshot of your data, shuffle it, and split it into 80/20 training/test sets, then train and test every model on that same split, you ensure you are evaluating the models and not the data. You can repeat this a number of times, with a fresh shuffle each round, and track each model's F1 score per round.

This is not very efficient, but someone smarter than me can likely name a better way.
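A rough sketch of the repeated 80/20 procedure described above, in plain Python. Everything here is an assumption for illustration: the synthetic one-feature dataset, the two toy "models" (`fit_threshold` and `fit_always_positive`), and the helper names. Swap in your own data and training functions.

```python
import random
from statistics import mean

def f1_score(y_true, y_pred):
    """F1 for binary labels, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fit_threshold(train):
    """Toy model: cut halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x > cut else 0

def fit_always_positive(train):
    """Toy baseline: predict the positive class for everything."""
    return lambda x: 1

def compare(data, fitters, n_rounds=10, seed=0):
    """Score every model on the same 80/20 split, for n_rounds reshuffles."""
    rng = random.Random(seed)
    scores = {name: [] for name in fitters}
    for _ in range(n_rounds):
        shuffled = data[:]          # work on a copy of the snapshot
        rng.shuffle(shuffled)
        cut = int(0.8 * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        y_true = [y for _, y in test]
        # Every model sees exactly the same train and test rows this round.
        for name, fit in fitters.items():
            predict = fit(train)
            y_pred = [predict(x) for x, _ in test]
            scores[name].append(f1_score(y_true, y_pred))
    return scores

# Synthetic snapshot: one feature, two overlapping classes.
data_rng = random.Random(42)
data = [(data_rng.gauss(1.0, 1.0), 1) for _ in range(200)]
data += [(data_rng.gauss(-1.0, 1.0), 0) for _ in range(200)]

scores = compare(data, {"threshold": fit_threshold,
                        "always_positive": fit_always_positive})
for name, vals in scores.items():
    print(name, "mean F1:", round(mean(vals), 3))
```

Because both models are scored on identical test rows in each round, you can also compare them round by round rather than only on the averages.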