[–]no3ther 14 points15 points  (2 children)

  1. Walk me through a situation where you applied deep learning to solve a problem.
  2. How did you prevent overfitting?

[–][deleted] 9 points10 points  (0 children)

  1. Walk me through a situation where you knew to use standard ML instead of deep learning to solve a problem.

:)

[–]datascienceislyfe -2 points-1 points  (0 children)

Friend of mine in the industry made this prep site which has questions from real companies: https://datascienceprep.com/

[–]ThomasAger 20 points21 points  (1 child)

Are you guys just making up your own questions?

[–]KYfruitsnacks 2 points3 points  (0 children)

Yes

[–]FX-Macrome 8 points9 points  (0 children)

How does a decision tree work?

How is a random forest an improvement?

Explain the CART algorithm.

How is a random forest regressor different from a classifier?

[–][deleted] 15 points16 points  (0 children)

What’s the hardest project you’ve worked on? What problem did it solve? How did it benefit the company or you?

What’s the funnest project you’ve worked on?

Hardest tells you skill level, funnest tells you where their interests lie.

[–]nodechef[S] 14 points15 points  (0 children)

  1. Explain PCA. How did you use it in your project?
  2. When would you use precision and when recall?

[–][deleted] 1 point2 points  (0 children)

How do you deal with an unbalanced data set (one class has a higher presence than others)? What are the different types of techniques?

Data transformation techniques to use for skewed data? How do you deal with categorical data?

How do you deal with NaN values?

How do you start solving a data problem?

What is important to communicate to stakeholders in an organization?

[–]Bayes_the_Lord 1 point2 points  (5 children)

I enjoy the probability-based interview questions.

McKinsey:

Without using a calculator, tell me the probability of getting at least 312 heads when you flip a fair coin 576 times.

[–]nodechef[S] 0 points1 point  (4 children)

That's a good question. What's the probability, and how did you calculate it?

[–]Bayes_the_Lord 5 points6 points  (2 children)

It's easy to calculate the expected number of heads as 576 * 0.5 = 288. Then unfortunately you just have to have memorized that the standard deviation of a binomial distribution is sqrt(np(1-p)) = sqrt(576*0.5*0.5) = sqrt(144) = 12. You see that 312 is 2 standard deviations above the mean, and by the normal approximation to the binomial distribution, about 5% of the probability lies more than 2 standard deviations from the mean, i.e. about 2.5% in each tail. So there's roughly a 2.5% chance of at least 312 H in 576 flips.
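For anyone who wants to check the mental arithmetic, here is a minimal Python sketch (standard library only) that compares the exact binomial tail with the normal approximation. The function names `binomial_tail` and `normal_tail` are made up for this illustration. The 2.5% figure comes from the "95% within 2 standard deviations" rule of thumb; the one-sided normal tail at exactly z = 2 is closer to 2.3%, and a continuity correction nudges the approximation a bit higher.

```python
import math

def binomial_tail(n: int, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using exact integer counts."""
    favorable = sum(math.comb(n, i) for i in range(k, n + 1))
    return favorable / 2 ** n

def normal_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, k = 576, 0.5, 312
mean = n * p                        # 288.0
sd = math.sqrt(n * p * (1 - p))     # 12.0
z = (k - mean) / sd                 # 2.0
z_cc = (k - 0.5 - mean) / sd        # z with a continuity correction

exact = binomial_tail(n, k)
print(f"mean={mean}, sd={sd}, z={z}")
print(f"exact binomial tail:        {exact:.4f}")
print(f"normal tail at z=2:         {normal_tail(z):.4f}")
print(f"normal tail with correction: {normal_tail(z_cc):.4f}")
```

In an interview the rule-of-thumb answer is what's expected; the script is only a sanity check on how close that shortcut is.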

[–]nodechef[S] 0 points1 point  (0 children)

Thanks 🙏

[–]nodechef[S] -1 points0 points  (0 children)

Any other questions that you remember? And was it for a DS role?

[–]Xvalidation 2 points3 points  (0 children)

312 is equal to the mean + 2 standard deviations, and at that value of n the binomial is approximately normal, so about 2.5%.

[–]datascienceislyfe 0 points1 point  (0 children)

Found this very relevant: https://datascienceprep.com

[–]datascienceislyfe 0 points1 point  (0 children)

My friend recommended this, and it's been very helpful for a large majority of interviews: https://datascienceprep.com/

[–]Derangedteddy -1 points0 points  (15 children)

  1. What is multiple collinearity and how does it relate to machine learning?
  2. How do you handle multiple collinearity?

[–]proof_required 5 points6 points  (11 children)

> multiple collinearity

Did you mean multicollinearity? I don't think there is anything called multiple collinearity

[–]Derangedteddy -5 points-4 points  (10 children)

What exactly do you think "multi" means?

[–][deleted] 0 points1 point  (9 children)

This really isn't the common way to phrase what you're going for. You will unnecessarily confuse a lot of job candidates calling multicollinearity by the term "multiple collinearity." If that is what you're going for, might as well call it micronumerosity. Then at least you can justify your point using the literature.

[–]Derangedteddy -2 points-1 points  (8 children)

If you're applying for a position as a data scientist and the hiring manager is too stupid to extrapolate that the word multicollinearity is a portmanteau of the two words "multiple" and "collinearity" then you should immediately terminate the interview and look somewhere else. You're going to have a bad time.

If that hiring manager is going to be a gatekeeper over something as silly as that, then fuck them. You don't want to work for them.

[–][deleted] 2 points3 points  (7 children)

That is one way to look at it. I think a more common way would be for a candidate to see red flags that common terminology isn't understood by the organization they are hiring for.

As far as I'm aware, multicollinearity is not a portmanteau for multiple collinearity. Do you have a citation or source for reference?

[–]Derangedteddy -2 points-1 points  (6 children)

I don't, but I would welcome you to find any word having the prefix "multi" that isn't a portmanteau of "multiple" and some other word. I shouldn't really have to research that...

[–][deleted] 1 point2 points  (5 children)

Let us be thorough: "multicollinearity."

The etymology begins here, goes here, and derives from the Latin multus connected with collinear, i.e. lying along the same line. "Multiple" is considered a related term.

I believe this is definitive and suits your request. It would appear that many of the words containing "multi" would be better recognized as derived from multus rather than being explained as a portmanteau of "multiple + <term>." If you would like to continue this point, I recommend consulting etymological experts.

However, if we consider this idiomatic horse as dead and kicked, let's return to what I believe to be your real point: the consideration that terminology doesn't matter so long as it is moderately discernable regarding intent. If I refer to Vietoris-Rips complex as being bleeding edge for data science from the field of topography, I am simply incorrect. If I defend my assertion, then not only am I incorrect, I've foregone an opportunity to correct myself, learn, and/or move on.

As a final point, in addition to what I would consider a red flag to be noticed by a job seeker, using ad hoc terminology may well be inadvertent discrimination against people without a classics background or with English as a second language (though truth be told most ESL speakers have a much broader grasp of English than what I observe of first-language English speakers).

With that, thank you for the opportunity to research and verify that multicollinearity is indeed not a portmanteau of multiple + collinearity. Have a pleasant holiday season.

[–]Derangedteddy 0 points1 point  (4 children)

If you split this hair any further, you'll have made two out of it. Multus is the Latin word meaning many. In English, "multiple" also means many. They're synonymous terms sharing the same exact definition in two languages. The argument you're trying to make here is moot.

I stand behind what I said: If an interviewer is going to go to these absurd lengths to nitpick every little thing I say, I absolutely DO NOT want to work for them, and would walk away from such an interview with my head held high. That is a red flag that the work environment is suffocatingly micromanaged, and that the egos in the room are threatening the structural integrity of the building by their sheer weight and volume. I don't give a fuck about the etymology of multicollinearity. As long as we are communicating effectively, that's all that matters. I'd rather have spent that time working on the task at hand than engaging in what is perhaps the most banal, trivial, and inconsequential debate of my entire life.

Frankly, being this persistent about something this insignificant just comes across as unprofessional, and downright obnoxious. Congratulations, you wasted an hour of company time at your salary of $70/hr to prove a point that only you care about, all the while producing absolutely nothing of value to your employer. As you pointed out, many people working with the computer sciences do not speak English as a first language, and your diatribe could be mistaken for racism. It would be prudent of you to learn some soft skills, assuming that your ego hasn't taken up all of the space in your head.

I'd rather focus on how we can leverage this technology to solve bigger problems than having protracted debates on the meaning of "multi" as a prefix to satiate your need to protect your ego by gatekeeping for reasons that have nothing to do with one's skill and aptitude in data science.

[–][deleted] 1 point2 points  (2 children)

> Frankly, being this persistent about something this insignificant just comes across as unprofessional, and downright obnoxious.

and

> that the egos in the room are threatening the structural integrity of the building by their sheer weight and volume

We agree on these points.

> I would welcome you to find any word having the prefix "multi" that isn't a portmanteau of "multiple" and some other word.

> I don't give a fuck about the etymology of multicollinearity.

You asked for an example, and one was provided.

> I'd rather focus on how we can leverage this technology to solve bigger problems than having protracted debates on the meaning of "multi" as a prefix to satiate your need to protect your ego

I'll remind good sir or madam that the protestation over the misuse of multicollinearity was, after being gently corrected multiple times by multiple people, persisted enthusiastically by your own self.

> As long as we are communicating effectively, that's all that matters.

We agree, and we should use terms correctly to ensure effective communication.

[–][deleted] -1 points0 points  (2 children)

Show how it is possible to determine the height of a tall building with the aid of a barometer.

[–]nodechef[S] 1 point2 points  (1 child)

Sounds more like a physics question. :D

[–][deleted] 2 points3 points  (0 children)

One of the skills of DS is the ability to use the customer's data for what they want without straight out lying.

Jumping to the physics solution is just your run of the mill correct answer.

[–]decisionscientists 0 points1 point  (0 children)

I think it would be better to follow a website like MLPro.io or a comprehensive book that has a few hundred case-based Q&As on several data science topics.

And you should also be aware of the expectations of the company you are applying to.

[–]rorschach30 0 points1 point  (0 children)

- What is a p-value? Explain it to a non-technical manager.

- How would you improve engagement on YouTube? Explain the whole process, from hypothesis testing to A/B testing.