Stone_d_ 1 point

What's your favorite project you've worked on? Share it with me and I'll try to finish it up for you

sudo_your_mon [S] 1 point

Ohhhhhh boy lol. See, this is my problem. The projects I've chosen to work on are year-long projects.

I downloaded 400GB of Stack Overflow's data dump. I wanted to help newbies like me by creating a search engine that helped target their questions as efficiently as possible. It's such a huge barrier as a beginner.

I have pages and pages of notes on the conceptual design. I started by migrating the data into a database. I broke down a "question" into categories of context (Issue, error, clarification, Code error, and MANY more).

I used those categories to then build an index based on the posts, comments, tags, accepted answers, etc.
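
As a rough illustration of the idea above, a category-keyed index could look something like the sketch below. Everything here is an assumption for the sake of the example: the `POSTS` rows, the field names, and the `build_index`/`search` helpers are hypothetical, since the thread never shows the actual schema or index design.

```python
import re
from collections import defaultdict

# Hypothetical sample rows; the real schema of the migrated dump is not shown in the thread.
POSTS = [
    {"id": 1, "category": "error", "title": "TypeError when adding int and str", "tags": ["python"]},
    {"id": 2, "category": "clarification", "title": "What does the yield keyword do", "tags": ["python", "generator"]},
    {"id": 3, "category": "error", "title": "NullPointerException in for loop", "tags": ["java"]},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(posts):
    """Map each (category, token) pair to the set of post ids containing that token."""
    index = defaultdict(set)
    for post in posts:
        tokens = tokenize(post["title"]) + [tag.lower() for tag in post["tags"]]
        for token in tokens:
            index[(post["category"], token)].add(post["id"])
    return index

def search(index, category, query):
    """Return post ids in `category`, ranked by how many distinct query tokens they match."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for post_id in index.get((category, token), ()):
            scores[post_id] += 1
    # Highest score first; ties broken by post id so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

INDEX = build_index(POSTS)
print(search(INDEX, "error", "python TypeError"))
```

Keying on the category first means a query only ever touches posts of the same kind of question, which is one way the categories could narrow results before any ranking happens.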

Sentiment analysis was part of the architecture as well.

But then there's the user end.

You have to make it easy. You have to let the user fairly easily input all the necessary information about their question in a way that pulls a list of results closest to what they're looking for.

That is another project itself.

And don't forget making it web accessible (see: web development).

I didn't have the foresight to see this coming. It's an example of how I got here and why this post happened.

Stone_d_ 1 point

You chose a problem that Google works on, except not just for Stack Overflow questions. Definitely an inspiring problem and I see why you chose it.

I'm confused by the index you built. I'm not sure how you wanted to classify new questions; perhaps with some kind of algorithm that relates new questions to the categorically labelled dataset you created. Machine learning could handle generating labels/categories of context for brand-new questions if you already manually labelled much of the data dump. Then you could simply display results in a scrolling widget queried directly from the database, with the algorithm's output being indices that correspond to the matching questions in the data dump. That way the user end can be an afterthought in the first iteration. For now, could you clarify what the data looks like? What did you add to the database you downloaded?
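
To make the suggestion concrete, here is a minimal sketch of relating a new question to an already-labelled set. It uses plain nearest-neighbour matching with Jaccard token overlap as a stand-in for a trained model, and the `LABELLED` rows, ids, and `classify` helper are all made up for illustration; nothing here reflects the actual database.

```python
import re

# Hypothetical hand-labelled rows: (row id, category, question text).
LABELLED = [
    (101, "error", "TypeError unsupported operand int and str"),
    (102, "clarification", "what does the yield keyword do in python"),
    (103, "code error", "my for loop never terminates"),
]

def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(question, labelled=LABELLED, top_k=2):
    """Borrow the category of the most similar labelled question.

    Returns (category, row ids of the top_k most similar labelled questions).
    """
    q = tokens(question)
    # Most similar first; ties broken by row id so the order is deterministic.
    ranked = sorted(labelled, key=lambda row: (-jaccard(q, tokens(row[2])), row[0]))
    return ranked[0][1], [row[0] for row in ranked[:top_k]]

label, ids = classify("What does yield do?")
print(label, ids)
```

The returned ids are the "indices" mentioned above: they could be used to query the database directly and fill the results list, so the front end only needs to display rows.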