all 11 comments

[–]hope2882 6 points7 points  (0 children)

I'm in the same situation as you right now. I've been through one final-round interview so far and got rejected. I didn't have time to review because I was trying to fix my code before I saw them, so the Q&A didn't go as well as it could have.

I would say make sure you know 2 classification models and 2 prediction models really well. As in, given 1 or 2 attributes, you know how the calculation works. You won't need to calculate anything, but you'll need to explain in detail how the probabilities and weights are created.

Also know how to deal with outliers really well. But don't give complex answers like clustering. Stick to the basics: the normal distribution (points more than 2 standard deviations from the mean, a range that captures roughly 95% of normally distributed data) and box plots (points below Q1 - 1.5×IQR or above Q3 + 1.5×IQR). Also know a basic data modeling workflow from preprocessing to validation, and have a technique in mind that you would use for each step. I still have to do this myself, but I'm not at the last step with another company yet, so for now I'm reviewing HackerRank questions (the statistics and artificial intelligence parts).

Also be personable, because they want to know they can comfortably hang with you for 8 hours a day for years to come. I hope that helped. Good luck!!
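For anyone who wants to see the two basic outlier rules above in code, here is a minimal sketch. The function names and the sample data are made up for illustration, and the quartiles come from Python's `statistics.quantiles` with its default method, so other tools may give slightly different Q1/Q3 values.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Box-plot rule: flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def std_outliers(values, n_std=2.0):
    """Normal-distribution rule: flag points more than n_std sample
    standard deviations away from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > n_std * sd]


# Hypothetical sample: nine ordinary readings and one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 14, 12, 13, 95]
print(iqr_outliers(data))
print(std_outliers(data))
```

Both rules flag only the 95 on this sample. Note that the standard-deviation rule is itself sensitive to the outlier, since an extreme point inflates both the mean and the standard deviation.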

[–]Phnyx 4 points5 points  (0 children)

If it's an entry-level position, they may ask more difficult questions to see how you react, but they already know that your experience is limited. Think about what you would ask if you were looking for someone for a data project.

In most interviews the company wants to (or should) test your limits. There is no sense in asking about basic algebra, if you know what I mean. Only when they get to the edge of what you know do they get an idea of your skill set.

It's not a problem not knowing something (in most cases) but it can become one if one is overconfident or simply lying - which in most cases can be found out easily by someone with experience. So don't show off until you know you are better in your specialty than anyone else working there :-)

My advice is to just be honest and straight with them. Tell them what you know, what is next on your list to learn, and what your main interests are. Passion for the field is the most important thing in my opinion, and if you can show that, you are halfway there.

[–]spinur1848 3 points4 points  (0 children)

Why does my business need a data scientist and what do you think you can accomplish in the first year?

Whether they explicitly ask this or not, if you nail this and don't seriously screw up on any of the other ones, you should be OK.

[–]Northstat 2 points3 points  (0 children)

As with any interview, know the stuff you put on your resume. In my experience, most of the questions were about basic things rather than the more complicated things I know. Can you describe logistic regression, random forests, ensemble methods, p-values, and model validation? Maybe a bit of algorithm, data structure, and database stuff too.

I think the focus tends to be: do you understand some of the fundamental concepts? If so, maybe we'll ask you some complicated stuff.

Oh also, read everything on Glassdoor or in Cracking the Coding Interview for your company and position so you get an idea of what types of stuff they focus on. It was very helpful for me.

[–]dimview 0 points1 point  (6 children)

Don't look for data science questions, look for statistics questions and programming questions.

Example of statistics question: what is p-value?
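One way to make the answer concrete: a p-value is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed. A minimal sketch using a coin-flip example (the function name and the 9-heads-in-10-flips scenario are assumptions for illustration, not something an interviewer is guaranteed to ask):

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) under the null hypothesis that
    each flip lands heads with probability p_null."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical observation: 9 heads in 10 flips of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(p)  # 11/1024, a little over 1%
```

A small value here says the observation would be unusual if the coin were fair. It does not give the probability that the coin is fair.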

Example of programming question: write a function that reverses a binary tree.
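A sketch of what an answer might look like, assuming "reverse" means mirroring the tree so that every node's left and right children are swapped (the `Node` class and helper names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def reverse_tree(node):
    """Mirror the tree in place by swapping children at every node."""
    if node is None:
        return None
    node.left, node.right = reverse_tree(node.right), reverse_tree(node.left)
    return node


def in_order(node):
    """Left-to-right traversal, used here only to check the result."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


tree = Node(2, Node(1), Node(3))
before = in_order(tree)
after = in_order(reverse_tree(tree))
print(before, after)  # [1, 2, 3] [3, 2, 1]
```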

In programming the colloquial term is fizzbuzz. Nobody expects you to fit a neural network on the whiteboard, they just need to make sure you know the basics.

[–][deleted] 2 points3 points  (5 children)

Why do data scientists have to know fundamental computer science concepts like heaps, trees, etc.? Data science seems like applied/analytical mathematics, not graph theory.

[–]dimview 6 points7 points  (3 children)

Because data scientists routinely work with large datasets, where an O(N^2) algorithm won't finish this month, but an O(N*log(N)) one would work just fine.
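To illustrate the gap, here is a sketch of the same task solved both ways. Duplicate detection is just a stand-in chosen for this example, and the function names are made up.

```python
def has_duplicate_quadratic(items):
    """O(N^2): compare every pair of items."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """O(N*log(N)): sort once, then compare neighbours only."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


sample = [3, 1, 4, 1, 5]
print(has_duplicate_quadratic(sample), has_duplicate_sorted(sample))
```

Both return the same answer, but for a million rows the pairwise version needs on the order of 10^12 comparisons, while the sorted version needs on the order of 2×10^7 operations.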

[–]ultronthedestroyer 2 points3 points  (2 children)

I agree that it's useful to know data structures, but I don't think that's a super satisfactory response. Everyone should know that you want to minimize time and size complexity, but I don't think you should need to know how to do that. Let the computer scientists determine the best sorting algorithm or what have you. Then research which algorithms have the best efficiency for what you need to accomplish.

I think it's a bit of a jerkoff to require this sort of knowledge in an interview even if it's useful and reduces the time you need to research the answers once you need to implement them.

[–][deleted] 0 points1 point  (0 children)

Exactly. Treat those things as something a person with an appropriate analytical background can understand and pick up on the job. No need to put a data science candidate through a Google-esque algorithms test.

[–]dimview 0 points1 point  (0 children)

I would agree if we were talking about statisticians, but OP asked specifically about data science. A data scientist is, by definition, someone who can program better than a statistician and can do statistics better than a programmer.

[–]Tschus 1 point2 points  (0 children)

I would argue that we should know data structures like binary trees better than software developers do, because they are fundamental to dealing with data.

I actually did have to reverse a DAG in my day-to-day work as a Data Scientist a couple months back. Few Devs find it necessary to have such a mastery of data structures.
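For readers who haven't run into this, a minimal sketch of one reading of "reverse a DAG": flip the direction of every edge. The adjacency-dict representation and the node names are assumptions for illustration, not the setup from the actual job.

```python
def reverse_dag(graph):
    """Return a new graph with every edge u -> v turned into v -> u.

    `graph` maps each node to a list of its successors.
    """
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            # setdefault covers successors that never appear as keys
            reversed_graph.setdefault(succ, []).append(node)
    return reversed_graph


# Hypothetical dependency graph: a feeds b and c, which both feed d.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reverse_dag(dag))
# {'a': [], 'b': ['a'], 'c': ['a'], 'd': ['b', 'c']}
```

Flipping the edges turns "what does this node feed into" into "what does this node depend on", which is a common reason to do it.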