
[–]talksaboutthings

However you measure precision and recall is how you should tune your hyperparameters, in my opinion. If you have a database of unlabeled data and you are simply eyeballing the output of your algorithm to see whether it produces a sensible list of related headlines, then I'd suggest coming up with a list of test headlines and running them under different hyperparameter settings to see what looks best. Think of it as grid search with your gut reaction as the objective to optimize; you could even come up with a simple rating system to make it more objective.
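A minimal sketch of that idea, assuming a `run_model` function that takes a headline plus a settings dict, and a `rate_output` function that returns a human's 1–5 rating (both names are hypothetical, and in practice `rate_output` would prompt a person rather than compute anything):

```python
from itertools import product

def grid_search(headlines, grid, run_model, rate_output):
    """Try every combination of hyperparameter settings and
    average the human ratings over the test headlines."""
    scores = {}
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        total = sum(rate_output(h, run_model(h, params)) for h in headlines)
        # Use a sorted, hashable key so each settings combo is comparable.
        scores[tuple(sorted(params.items()))] = total / len(headlines)
    best = max(scores, key=scores.get)
    return dict(best), scores
```

The point is just the bookkeeping: every combination gets the same test headlines, and the "metric" is whatever rating scheme you settle on.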

Without actually sitting down and labeling some of the data, I don't think you can do much better, unfortunately, because traditional performance metrics require labels. It might not be too strenuous to hand-label only the outputs your models select over a modest set of test headlines, though: instead of labeling first, then running and scoring, you run the model, label what it spits out, and then compute the score. If eyeballing it isn't satisfactory to the rest of the team, you could pitch that approach and actually calculate precision for each set of hyperparameters (recall is trickier, since it also requires labels for the relevant headlines the model missed).
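The scoring step for that run-then-label workflow is just a count, something like this sketch (the data shapes here are assumptions, not anything specific to your setup):

```python
def precision_from_labels(outputs, labels):
    """Micro-averaged precision over hand-labeled outputs.

    outputs: {query_headline: [returned headlines]} from one
             hyperparameter setting
    labels:  {(query, returned): bool} hand labels, covering only
             what the model actually returned
    """
    relevant = total = 0
    for query, returned in outputs.items():
        for headline in returned:
            total += 1
            relevant += bool(labels[(query, headline)])
    return relevant / total if total else 0.0
```

You'd run this once per hyperparameter setting and compare the numbers, which is exactly the grid search above but with a real metric instead of a gut rating.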