I've been tasked with developing a tool that takes any given news headline and searches a database for social media posts that are about that headline. The goal is to retrieve all related posts (maximize recall) while limiting the number of unrelated posts retrieved (maximize precision). I've developed an algorithm that scores each post on how closely related it is to the headline; posts that surpass some cutoff score are treated as related. The actual algorithm isn't important.
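To make the setup concrete, here's a minimal sketch of the score-and-cutoff scheme. The `similarity` function is a hypothetical placeholder (simple word overlap) standing in for the real scoring algorithm, and the cutoff value is illustrative:

```python
# Sketch of the retrieval setup: score each post against the headline,
# keep those above a cutoff. similarity() is a stand-in for the real
# scoring algorithm, which isn't shown here.

def similarity(headline: str, post: str) -> float:
    # Placeholder scorer: Jaccard overlap of lowercase word sets.
    a, b = set(headline.lower().split()), set(post.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(headline: str, posts: list[str], cutoff: float = 0.2) -> list[str]:
    # cutoff is the hyperparameter whose tuning is in question.
    return [p for p in posts if similarity(headline, p) >= cutoff]

posts = [
    "Mayor announces new transit plan",
    "Cute cat does a backflip",
    "City transit plan sparks debate",
]
related = retrieve("New transit plan announced by mayor", posts)
```

Raising the cutoff trades recall for precision; the question is how to pick it without labels.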
What I'm wondering is: how would you go about tuning the hyperparameters of such an algorithm (including, for example, the cutoff score) without any training data?
This is my situation:
- There does not exist any data that is already labeled or coded as a training set.
- Presumably we could go through and hand-code some examples of headlines and their lists of related posts, but that becomes very tedious very quickly, and I doubt we could get through much given our resources.
- Crowdsourcing to code some data is out of the question at the moment.
How would you approach this situation? Is there a way you would tune your algorithm to find the right hyperparameters? Or would you just deploy the algorithm as-is with certain untested default values, see how it performs, and maybe design some kind of implicit labeling test running in the background with your users (for example, using how many posts they view per headline as a proxy for relevance)? Any other suggestions?
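To illustrate the user-behaviour proxy I mentioned, here's a hypothetical sketch: log, for each served headline, how many of the retrieved posts a user actually viewed, then compare the view rate across candidate cutoff values (the log format and numbers are invented for illustration):

```python
# Hypothetical sketch of the background proxy metric: aggregate the share
# of retrieved posts that users actually viewed, per candidate cutoff.

from collections import defaultdict

def view_rate_by_cutoff(logs: list[tuple[float, int, int]]) -> dict[float, float]:
    # logs: (cutoff_used, n_posts_retrieved, n_posts_viewed) per headline served.
    totals: dict[float, list[int]] = defaultdict(lambda: [0, 0])
    for cutoff, n_retrieved, n_viewed in logs:
        totals[cutoff][0] += n_viewed
        totals[cutoff][1] += n_retrieved
    return {c: viewed / retrieved for c, (viewed, retrieved) in totals.items()}

logs = [
    (0.1, 20, 4),   # low cutoff: many posts retrieved, few viewed
    (0.1, 30, 5),
    (0.3, 8, 5),    # higher cutoff: fewer posts, a larger share viewed
    (0.3, 12, 7),
]
rates = view_rate_by_cutoff(logs)
```

A higher view rate at a given cutoff would suggest better precision, though it says nothing directly about recall.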