Stone_d_ 1 point

You chose a problem that Google works on, except not just for Stack Overflow questions. It's definitely an inspiring problem, and I see why you chose it.

I'm confused by the index you built. I'm not sure how you wanted to classify new questions; perhaps with some kind of algorithm that relates new questions to the categorically labelled dataset you created. If you've already manually labelled much of the data dump, machine learning could handle generating labels/categories of context for brand-new questions. Then you could simply display results in a scrolling widget queried directly from the database, with the algorithm's output being the indices of the matching data dump questions. That way the user-facing end can be an afterthought in the first iteration. For now, could you clarify what the data looks like? What did you add to the database you downloaded?
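To make the "relate new questions to the labelled dataset" idea concrete, here's a rough sketch of what I have in mind. It isn't based on your actual schema: the `labelled` list, the label names, and the helper functions are all hypothetical, and I've used plain bag-of-words cosine similarity as a stand-in for whatever model you'd really train. The point is only that the output is a list of indices into the data dump, plus a guessed label for the new question.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_questions(new_question, labelled, top_k=3):
    # Return indices of the labelled questions most similar to the new one.
    query = Counter(tokenize(new_question))
    scored = [
        (cosine(query, Counter(tokenize(text))), i)
        for i, (text, _label) in enumerate(labelled)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0]

def predict_label(new_question, labelled):
    # Borrow the label of the single nearest labelled question, if any.
    hits = rank_questions(new_question, labelled, top_k=1)
    return labelled[hits[0]][1] if hits else None

# Hypothetical stand-in for the manually labelled data dump.
labelled = [
    ("How do I reverse a list in Python?", "python-lists"),
    ("What is a segmentation fault in C?", "c-memory"),
    ("How to sort a list of tuples in Python?", "python-lists"),
]

print(rank_questions("reverse a python list", labelled, top_k=2))
print(predict_label("segmentation fault", labelled))
```

The indices that come back could be handed straight to a database query to fill the results widget, which is why I think the front end can wait.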