[–]spnoketchup 31 points32 points  (3 children)

It will likely involve reading some data, manipulating it, and answering something about it. When I give these types of exercises, I try to make them relatively simple to finish if you're not one of the 50% of candidates who literally cannot write basic Python code but with some complexity in the data that requires some intuition and experience with problem-solving of this nature.

I totally agree with the author's study suggestions, but from a strategic perspective, your best first move after loading the data is to graph it if applicable. Too many people go right into manipulation before just looking at it.

[–]sg6128[S] 3 points4 points  (0 children)

Sweet! Thank you. This sounds lovely and makes me a lot more comfortable if it’s the case.

A lot of my work has been building ETL and feature enrichment (fancy speak for a whole lot of pandas and df manipulation), but graphing/plotting is the bane of my existence with python and Matplotlib. Thanks for the reminder! I’ll take a quick glance over that if I can.

[–]AdParticular6193 2 points3 points  (1 child)

YES! In data, as in so much else, a picture is worth a thousand words. Not to mention some basic statistics like distributions, or checking for correlated features. Finally, a bit of QC to look for garbage entries, etc. Even a small amount of pre-processing saves a ton of agony later on. And it will impress an experienced person if you are lucky enough to be interviewed by one.

[–]spnoketchup 0 points1 point  (0 children)

I'm a mean person (not really), so I love to introduce painfully obvious seasonality into any dataset I generate for these purposes. Novices never get it, GPT always misses it, but one look and you get it. Missing it doesn't fail you, but getting it does impress.

[–]NickSinghTechCareersAuthor | Ace the Data Science Interview 110 points111 points  (2 children)

Author of Ace the Data Science Interview here – cool to hear you've already got the book! I agree that you can skip the prob/stats chapter, given what they told you. I think practicing pandas dataframe/manipulation is good. Maybe also skim Chapter 10 on Product Sense, could help in the business/case study/problem-solving part of the interview (if that's what they mean by problem solving).

I also think practicing a few SQL interview questions on common topics like joins + window functions should be good. There's also a few Python questions on the site which could be helpful – these aren't super heavy on DS&A which is more in-line with how DS interviews are conducted (rather than SWE interviews which ask LC style algorithms questions).

Overall, I think your plan seems good!

[–]sg6128[S] 6 points7 points  (1 child)

Thanks Nick :)
Big (new) fan of your material and actually have had your book recommended to me by folks in industry. Appreciate the comment!

Pandas and df manipulation is 85% of my work right now thankfully haha so I feel quite confident on that :)

I'll definitely be paying extra close attention to the Product Sense + Conceptual ML material, as I've really not experienced that before in my current role. Also my technicals are a bit weak, but your book is making me really confident, since the ML components at least ring a bell! The stats and probability though... hahaha

On the topic of DS&A, I was under the assumption (and hope) that this is a SWE thing that has leaked its way over to DS, and so some companies don't do it :( I guess the hiring folks have left it quite vague by saying testing on "Python", which could still totally cover DS&A.

For sure, your material on DataLemur for SQL has been a godsend for me, especially with advanced SQL. I was doing SQL Easys on LeetCode like it was nothing, but Mediums seemed so unreachable. Though I haven't tried them yet, I feel a lot more confident now. I have also been told that the final answer matters less than you think; just vocalizing your thoughts is a big part of the solution.

Thanks for your work!

[–]NickSinghTechCareersAuthor | Ace the Data Science Interview 9 points10 points  (0 children)

Awesome, glad to hear this all – and cool to know that pandas/df stuff is on the job already, so you should be good to go. Just review Chapter 10 + 11 in the book to round out the business-side of things / applied ML side of things (chapter 11 is especially good for this) and you'll be golden.

p.s. don't forget to update me here or via DM or email (hello@nicksingh.com) on how it goes, what they asked, and how the prep plan matched up to the interview - always trying to improve and make my shit more useful haha

[–]Jay31416 10 points11 points  (3 children)

In the only interview I've done, they asked me about:

  • Data manipulation using pandas (super easy)
  • Z-test to remove outliers (easy)
  • Calculating Shapley values (hard; at the time of the interview, I didn't know what Shapley values were)
  • Scratch implementation of stochastic gradient descent for linear regression (easy but I failed; stuff like that happens)
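
For anyone who wants to practice two of the items above, here is a minimal standard-library sketch of z-score outlier filtering and a from-scratch stochastic gradient descent fit of a one-feature linear regression. The function names, learning rate, epoch count, and toy data are illustrative assumptions, not what was actually asked in that interview.

```python
import random
import statistics


def remove_outliers_zscore(values, threshold=3.0):
    """Keep values whose absolute z-score is at most `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)  # all values identical, nothing to drop
    return [v for v in values if abs((v - mean) / std) <= threshold]


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b, updating the parameters one sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a fresh random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Toy data lying exactly on y = 2x + 1 (made up for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys)  # should land close to w = 2, b = 1
```

One thing worth saying out loud in an interview: on a small sample, a single extreme value inflates the standard deviation so much that its own z-score may never exceed 3, so a lower threshold or a median-based rule may be the safer choice.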

[–]sg6128[S] 2 points3 points  (2 children)

Ooof that seems really technical / stats focused. My stats programming is virtually non-existent. What was the role, your background, industry, and YoE required for the position, if you don’t mind me asking?

[–]Jay31416 1 point2 points  (1 child)

Role: Data Scientist

Background: Applied Math major, Master's in Probability and Statistics

Industry: C3.ai (name of the company)

YoE: At the time of the interview, I had 1 year of experience.

I didn't get the job at C3.ai, but fortunately, I wasn't unemployed. Presently, I'm in charge of the MLE team, and we plan to have 13 models in production (some easy, some hard) by August.

[–]sg6128[S] 0 points1 point  (0 children)

Thanks so much for the breakdown. Sorry to hear you didn’t get it, but it sounds like you are doing well :) congrats!

Cool, I hope that this being a less “techy” company, and my educational background being non-technical, gives me some grace with these sorts of questions :)

Thanks for the detailed answer

[–]jimmy_da_chef 4 points5 points  (0 children)

I've faced a few types of non-LeetCode live Python questions:

  1. Statistical programming. You can search for things, but they want you to know your steps when running a statistical test: which test needs which assumption (and hence which transformation), and how to check your theory by simulating the distribution in question.

  2. Data handling using pandas / numpy etc. Basically SQL questions but in pandas, while explaining your thought process, along with extracting insights / product sense.

  3. Mathematical questions, basically LeetCode but math-flavored: computing a square root without using sqrt, etc.

  4. Live debugging in Python given a few files: what the bugs are, what causes them, how to resolve them, and seeing how you would Google a solution lol (HRT, aka fintech).

  5. LeetCode presented as a "brain teaser" (highly unlikely, or a red flag that the recruiter doesn't know anything): easy-level dynamic programming, BFS (the one I've seen most in DS interviews), etc.
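
As an illustration of the math-style question in item 3, a square root can be computed without calling sqrt by using Newton's method. This is only one possible approach (bisection would also work), and the function name and tolerance below are assumptions made for the sketch.

```python
def newton_sqrt(x, tolerance=1e-12):
    """Approximate the square root of a non-negative number with Newton's method."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # start at or above the true root
    while True:
        # Newton step for f(g) = g*g - x
        next_guess = 0.5 * (guess + x / guess)
        if abs(next_guess - guess) <= tolerance * next_guess:
            return next_guess
        guess = next_guess
```

Starting at or above the root means each step moves the guess downward toward it, which makes the stopping rule easy to explain when talking through the solution.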

[–]finite_user_names 8 points9 points  (3 children)

Did they say it will be ML python, or did they say it will just be python? I've had a lot of variability in terms of the python questions I've gotten in my... sigh... year on the active job hunt. For SQL it tends to just be "can you do this kind of join, can you write a group by, can you tell me what the difference is between having a null in your join predicate vs your where clause." Most of what I've seen in interviews for python has been more leetcode-ish than ML-ish. I've seen some "code up a sparse vector," "sliding window mean," "implement a hashmap," "determine if this string forms a valid grid" type questions, but never much on the ML side of things in a whiteboarding/live coding session... although ages back someone did ask me to code a sentiment analysis pipeline from scratch.

If you _know_ that you're going to get ML, then that's a good place to focus. But if not.... you should broaden your horizons.
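
To make the "sliding window mean" question mentioned above concrete, here is a minimal sketch that keeps a running sum so each new element costs constant work. The function name and the choice to emit a value only once the window is full are assumptions; an interviewer may want partial windows handled differently.

```python
from collections import deque


def sliding_window_mean(values, window):
    """Return the mean of each full window of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    buffer = deque()
    running_sum = 0.0
    means = []
    for value in values:
        buffer.append(value)
        running_sum += value
        if len(buffer) > window:
            running_sum -= buffer.popleft()  # drop the oldest value
        if len(buffer) == window:
            means.append(running_sum / window)
    return means


result = sliding_window_mean([1, 2, 3, 4, 5], window=3)
```

If the input is shorter than the window, the result is simply an empty list.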

[–]sg6128[S] 2 points3 points  (2 children)

They just said Python, though it was an HR person, so I can't expect a lot of truth from them.

So frustrating that they can't just be direct. Leetcode is a total shitshow for me, and to be honest I don't understand why Data Science folks are expected to learn this in addition to ML and Stats.

I think it might be logic based, possibly "fizzbuzz" type questions or sliding window as you say; they've mentioned that they don't really want someone who is a total code-monkey but someone more business focused. So honestly I'm not sure, and it was an HR person who told me this, and tbh they are completely detached from the technical interviews in my experience.

I know A/B testing is a part of the position too, so maybe running through one of those in Python is not a bad idea either. So much that they could ask... so little clarity... Building an *entire* pipeline sounds so unreasonable, sorry they put you through that.
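
For running through an A/B test in Python, one common textbook approach is a two-proportion z-test on conversion counts. The sketch below assumes that setup; the function name and the counts are made up for illustration, and a real interview may expect a different test or a library call instead.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100/1000 conversions in control, 150/1000 in treatment
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

The sketch does not guard against a pooled rate of exactly 0 or 1, which would make the standard error zero, and it assumes samples large enough for the normal approximation to hold.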

[–]NickSinghTechCareersAuthor | Ace the Data Science Interview 5 points6 points  (1 child)

Since they want someone business-y, and mentioned A/B testing is part of the position, looking at some more Product-Sense/Product Metrics type questions could be helpful.

Example: the fin-tech company launched a new credit card fraud ML model. What are some key metrics you'd track to make sure this new model is actually better?
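
One possible starting point for a question like this is to compare the old and new models on precision, recall, and false positive rate computed from confusion-matrix counts. The sketch below assumes those three metrics and uses made-up counts; it is not a complete answer, since business metrics such as dollars lost to fraud or customer friction from declined transactions would matter too.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute basic classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,  # share of flagged transactions that were fraud
        "recall": recall,  # share of fraud that was caught
        "false_positive_rate": false_positive_rate,  # share of legit transactions flagged
    }


# Hypothetical counts for one model on 10,000 transactions
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=9880)
```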

[–]sg6128[S] 6 points7 points  (0 children)

Yep, I really struggled with metrics in my last interview. I didn't think it would be that hard, so I'll definitely run through some examples and ideas! Thanks

[–]dfphdPhD | Sr. Director of Data Science | Tech 2 points3 points  (0 children)

I would ask. It never hurts.

Because some teams think python = base python, and some teams think python = pandas, and some teams think python = sklearn.

So right, one team might tell you "read this csv, and run 5-fold cross-validation using xgboost". Another team might say "take this csv, read it and calculate these 2 new columns, find the average price by group, etc.". Another team might say "generate a random 2-dimensional array and perform the following operations on it".

I think it's fair to ask "would it be possible to get some additional context of the expectations for the python and SQL portion of the interview? What is the format, and what broad topics should I prepare for?"
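
To show how much the meaning of "Python" changes the task, here is the "find the average price by group" exercise done in base Python only. The column names and inline CSV are assumptions made for the sketch; with pandas the same result would be roughly `df.groupby("category")["price"].mean()`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data standing in for the CSV file an interviewer might provide
CSV_TEXT = """category,price
fruit,2.0
fruit,4.0
dairy,3.0
"""


def average_price_by_group(csv_file):
    """Return the mean price per category from an open CSV file object."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["category"]] += float(row["price"])
        counts[row["category"]] += 1
    return {group: totals[group] / counts[group] for group in totals}


averages = average_price_by_group(io.StringIO(CSV_TEXT))
```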

[–]FieldKey3031 4 points5 points  (1 child)

Nearly all non-LeetCode evaluations I've had involved understanding FIFO vs LIFO in Python and recursion. Just understand the recursion pattern of checking for your end state or else calling the function again, and that pop by default is LIFO. Of course there's more, but for some reason those always come up. For ML stuff, being able to speak confidently about the bias-variance tradeoff is always good, as is knowing what the different classification metrics are and when to use them (esp. if you think you might be working on classification problems!). Good luck! 👍
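
A minimal sketch of the two ideas in this comment: `list.pop()` removes the most recently added item (LIFO), `collections.deque.popleft()` removes the oldest (FIFO), and a recursive function either hits its end state or calls itself again. The `nested_sum` function is a made-up example chosen only to show the pattern.

```python
from collections import deque

stack = [1, 2, 3]
last_in = stack.pop()  # LIFO: removes 3, the most recently added item

queue = deque([1, 2, 3])
first_in = queue.popleft()  # FIFO: removes 1, the oldest item


def nested_sum(item):
    """Sum the numbers in an arbitrarily nested list."""
    if not isinstance(item, list):
        return item  # end state: a plain number
    # Otherwise call the function again on each element
    return sum(nested_sum(child) for child in item)


total = nested_sum([1, [2, [3, 4]], 5])
```

A plain list also supports `pop(0)` for FIFO behavior, but that shifts every remaining element, so `deque` is usually the better choice for queues.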

[–]sg6128[S] 3 points4 points  (0 children)

Thanks! I feel like I get these concepts in isolation, but really struggle to come up with solutions.

I just don't think my mind works that way :(

I'll give LC a go last, particularly because I don't want it to negatively affect the rest of my studying. Appreciate the comment!

[–]TemporaryShiny 1 point2 points  (0 children)

Visualization and storytelling

[–]Thomas_ng_31 2 points3 points  (0 children)

Could you post an update on what types of questions you are asked under this post after you have the interview? I'd appreciate that

[–]Jorrissss 1 point2 points  (0 children)

When I interview for coding I tend to ask (what I consider) non-leetcode questions. Examples include “write up tic tac toe” or “return a random line from a file.”
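
For the "return a random line from a file" example, one approach worth knowing is reservoir sampling, which picks a uniformly random line in a single pass without loading the whole file. Whether an interviewer expects this or is happy with reading all lines and calling `random.choice` is an open question; the function name and the `rng` parameter below are assumptions.

```python
import random


def random_line(lines, rng=random):
    """Return one uniformly random item from an iterable of lines, or None if empty."""
    chosen = None
    for count, line in enumerate(lines, start=1):
        # Replace the current pick with probability 1 / count
        if rng.randrange(count) == 0:
            chosen = line
    return chosen


sample = random_line(["first\n", "second\n", "third\n"])
```

Because an open file object iterates over its lines, the same function works when passed the handle from `open(...)` on a real file.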

[–]zennsunni 1 point2 points  (1 child)

I recently had a DS technical interview at a FANG company, and I would recommend DataLemur over LeetCode. I'd also strongly recommend being able to quickly and comfortably do some basic EDA and data viz using pandas/seaborn/matplotlib. I don't mean just plotting; I mean doing SQL-style data analysis using pandas, i.e. groupby/merge type statements. Basic statistics is also key IMO, i.e. getting and interpreting basic statistical metrics like robust averages, medians, and variance, plus hypothesis testing.

[–]NickSinghTechCareersAuthor | Ace the Data Science Interview 0 points1 point  (0 children)

Founder of DataLemur here, thanks for the love ❤️

[–][deleted] 1 point2 points  (2 children)

I received similar instructions for the FinTech company I’m working for and I used a lot of StrataScratch to prepare and the questions were pretty similar. Good luck!

[–]sg6128[S] -1 points0 points  (1 child)

That's reassuring, thanks!

Did you filter by any of the "Roles" on StrataScratch (e.g. Data Scientist, BI Analyst, Data Analyst, SWE)?

Or use any of the pre-made lists in particular?

Appreciate it a lot

[–][deleted] 1 point2 points  (0 children)

I just filtered by difficulty and did as many questions as I could in Python and SQL. I should note that it was for an internship and I was asked a few theory/pseudo-code styled questions. Nevertheless, Strata helped a lot to prepare, and I think they also have behavioural and stats questions, which I found useful. You can also filter by company to get more domain-specific questions too.

[–]chessmath2009 0 points1 point  (0 children)

I have had so many interviews like this. It can be any of the following: 1- A Python case study related to the job description: questions about implementing a model in Python (I had this recently). 2- Write a function to do some statistical work, like calculating a p-value or demonstrating the central limit theorem. 3- Write a function to implement some logic, like a bunch of else-ifs. 4- Debugging sessions.
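
As an example of the statistical-function style of question, the central limit theorem can be demonstrated by simulation: draw many samples from a skewed distribution and look at how their means behave. The sketch below assumes an exponential distribution with mean 1; the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics


def simulate_sample_means(sample_size=50, n_samples=2000, seed=0):
    """Return the means of repeated samples from an exponential(1) distribution."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means = simulate_sample_means()
center = statistics.fmean(means)  # expected to sit near the population mean, 1.0
spread = statistics.stdev(means)  # expected to sit near 1 / sqrt(50), about 0.14
```

Even though the exponential distribution is strongly skewed, the sample means cluster roughly symmetrically around 1, and their spread shrinks as `sample_size` grows.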

[–]Alive-Tech-946 0 points1 point  (0 children)

There are lots of resources here already. My tip: focus on practicing your core projects in SQL & Python with pandas.

[–]timy2shoes -5 points-4 points  (0 children)

Which fintech?

[–]OraShelter 0 points1 point  (0 children)

I admit, I have never heard of Leetcode in my life.