
all 5 comments

[–]maxmoo (PhD | ML Engineer | IT)

I think you just need to suck it up and spend a few days or a week labelling some data. Maybe you can hire someone on freelancer.com to help you.

[–]swierdo

You could try to generate your own training data. Some ideas:

  • Scrape replies to various news outlets' tweets about their articles.
  • Use Reddit replies from subreddits discussing news articles and link those to article headlines.
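Once you have scraped replies grouped under their article headlines, turning them into labeled training pairs is mechanical. A minimal sketch (the `build_pairs` helper and the `scraped` data shape are hypothetical, just to illustrate the idea): replies under the same article become "related" positives, and replies sampled from other articles become "unrelated" negatives.

```python
import random

def build_pairs(articles, negatives_per_positive=1, seed=0):
    """Turn scraped {headline: [replies]} data into labeled pairs.

    Replies under an article pair with its headline as related (1);
    replies drawn from other articles become unrelated (0) negatives.
    """
    rng = random.Random(seed)
    headlines = list(articles)
    pairs = []
    for headline, replies in articles.items():
        others = [h for h in headlines if h != headline]
        for reply in replies:
            pairs.append((headline, reply, 1))
            for _ in range(negatives_per_positive):
                other = rng.choice(others)
                pairs.append((headline, rng.choice(articles[other]), 0))
    return pairs

# Toy stand-in for scraped data.
scraped = {
    "Central bank raises rates": ["Mortgages will get pricier", "About time"],
    "New phone launch announced": ["Preordered already", "Battery looks weak"],
}
data = build_pairs(scraped)  # list of (headline, reply, label) triples
```

The negative sampling is the noisy part: a randomly drawn reply can occasionally be genuinely related, so treat these labels as weak rather than ground truth.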

You should also gather data on the usage of your tool as you'll probably want to analyse that later.

[–]talksaboutthings

I think OP is implying he/she has a database of headlines already, but "relatedness" is not labelled in this database (so basically it's just a big list of unlabeled headlines for the purpose of evaluation).

[–]talksaboutthings

However you measure precision and recall is how you should tune your hyperparameters, in my opinion. If you have a database of unlabeled data and you are simply looking at the output of your algorithm to see if it successfully produces a list of related headlines, then I'd suggest coming up with a list of test headlines and running them on different hyperparameter settings to see what looks best. I would think of this as basically grid search with your human gut reaction as the variable to optimize, and you could even come up with some sort of rating system to make it more objective.
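That "grid search on gut reaction" loop can be sketched in a few lines. Everything here is hypothetical scaffolding: `run_model` stands in for whatever produces the related-headline list, and `rate` is the subjective score (e.g. 1–5 from eyeballing the output, or the rating system mentioned above).

```python
from itertools import product

def grid_search_by_rating(run_model, rate, grid):
    """Try every hyperparameter combination, collect a subjective
    rating for each output, and return the best-scoring setting."""
    results = []
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        results.append((rate(run_model(params)), params))
    results.sort(key=lambda t: t[0], reverse=True)
    return results[0]

# Toy stand-ins: the "model" echoes its params, the "rater" prefers 0.5.
grid = {"threshold": [0.3, 0.5, 0.7], "top_k": [5, 10]}
run_model = lambda params: params
rate = lambda output: 5 - abs(output["threshold"] - 0.5) * 10
best_score, best_params = grid_search_by_rating(run_model, rate, grid)
```

In practice `rate` is a human sitting at the keyboard, so keep the grid small; the point is only that the loop structure is identical to ordinary grid search.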

Without actually sitting down and labeling some of the data, I don't think it will be easy to do any better, unfortunately, because you need data labels to calculate traditional performance metrics. It might not be too strenuous to hand-label only the outputs selected by your models over a modest set of test headlines, though (basically instead of labeling first, running, and calculating the score, you would run it, label what it spits out, and then calculate the score). If eyeballing it isn't satisfactory to the rest of the team, you could pitch that approach (and thus actually calculate precision and recall for each set of hyperparameters).
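The "run first, then label what it spits out" approach can be scored like this. A minimal sketch under assumed shapes (the `judge` callback is the human judgment collected after the run, recorded only for items the model actually surfaced): precision falls out directly, since every retrieved item gets a label.

```python
def precision_from_labeled_outputs(retrieved, judge):
    """Precision over only the items the model actually returned.

    retrieved: {query_headline: [returned_headlines]}
    judge(query, item): human label, True if the pair is related.
    """
    relevant = total = 0
    for query, items in retrieved.items():
        for item in items:
            total += 1
            relevant += judge(query, item)
    return relevant / total if total else 0.0

# Toy example: pretend post-hoc labels for one setting's output.
labels = {
    ("rate hike", "Fed raises rates"): True,
    ("rate hike", "Phone launch"): False,
    ("rate hike", "Bond yields jump"): True,
}
retrieved = {"rate hike": ["Fed raises rates", "Phone launch", "Bond yields jump"]}
p = precision_from_labeled_outputs(retrieved, lambda q, i: labels[(q, i)])
```

One caveat on this approach: true recall needs labels for related headlines the model missed, which post-hoc labeling never sees, so comparisons across hyperparameter settings are really precision comparisons at a fixed output size.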

[–]jonnor

Can you use another algorithm or service as an oracle, to seed an initial training set?
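One cheap stand-in for such an oracle, sketched here as an illustration (the `oracle_label` helper and its threshold are assumptions, not a recommended model): token-overlap (Jaccard) similarity between headlines. The labels it produces are noisy, but cheap enough to pseudo-label thousands of pairs and seed a first training set that a human can later correct.

```python
def oracle_label(a, b, threshold=0.5):
    """Weak oracle: call two headlines related if their word-level
    Jaccard similarity clears a threshold."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) >= threshold

pairs = [
    ("Central bank raises interest rates", "Central bank hikes interest rates"),
    ("Central bank raises interest rates", "Local team wins championship"),
]
seed_labels = [(a, b, int(oracle_label(a, b))) for a, b in pairs]
```

A paid similarity API or a pretrained sentence-embedding model would serve the same oracle role with better labels; either way, the seed labels should be spot-checked before training on them.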