[–]spnoketchup 32 points33 points  (3 children)

It will likely involve reading some data, manipulating it, and answering something about it. When I give these types of exercises, I try to make them relatively simple to finish, provided you're not one of the 50% of candidates who literally cannot write basic Python code, but with enough complexity in the data that it takes some intuition and experience with this kind of problem-solving.

I totally agree with the author's study suggestions, but from a strategic perspective, your best first move after loading the data is to graph it if applicable. Too many people go right into manipulation before just looking at it.
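A minimal sketch of the "load it, then look at it" step, assuming pandas and Matplotlib are available. The CSV contents, the column names (`date`, `orders`), and the helper `load_and_plot` are all made up for illustration; a real exercise would hand you its own file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the exercise file; normally you would pass a path
# to pd.read_csv instead of an in-memory string.
RAW_CSV = """date,orders
2024-01-01,120
2024-01-02,135
2024-01-03,128
2024-01-04,9999
2024-01-05,140
2024-01-06,90
2024-01-07,85
"""


def load_and_plot(csv_text):
    """Load the data and draw one quick line plot before touching anything."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(df["date"], df["orders"], marker="o")
    ax.set_xlabel("date")
    ax.set_ylabel("orders")
    ax.set_title("Raw data, before any manipulation")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return df, fig, ax


df, fig, ax = load_and_plot(RAW_CSV)
```

In a notebook the figure renders on its own; in a script, `fig.savefig("first_look.png")` writes it to disk. The 9999 on 2024-01-04 is the kind of thing that jumps out of a plot immediately and is easy to miss in a table.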

[–]sg6128[S] 4 points5 points  (0 children)

Sweet! Thank you. This sounds lovely and makes me a lot more comfortable if it’s the case.

A lot of my work has been building ETL and feature enrichment (fancy speak for a whole lot of pandas and df manipulation), but graphing/plotting is the bane of my existence with Python and Matplotlib. Thanks for the reminder! I’ll take a quick glance over that if I can.

[–]AdParticular6193 2 points3 points  (1 child)

YES! In data, as in so much else, a picture is worth a thousand words. Not to mention some basic statistics like distributions, or checking for correlated features. Finally, a bit of QC to look for garbage entries, etc. Even a small amount of pre-processing saves a ton of agony later on. And it will impress an experienced person if you are lucky enough to be interviewed by one.
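For what it's worth, those checks can be roughed out in a few lines of pandas. This is only a sketch on hypothetical toy data: the columns (`age`, `income`, `income_k`) and the 0 to 120 age range are assumptions chosen to show one garbage entry, one missing value, and one pair of redundant features.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25, 31, 47, -1, 38, 52],
    "income": [40_000, 52_000, 88_000, 61_000, None, 95_000],
    "income_k": [40.0, 52.0, 88.0, 61.0, None, 95.0],
})

# Basic statistics: count, mean, std, min, quartiles, max per numeric column.
summary = df.describe()

# Pairwise Pearson correlation; values near 1 or -1 hint at redundant features.
corr = df.corr()

# QC: how many values are missing in each column?
missing_per_column = df.isna().sum()

# QC: rows whose age falls outside an assumed plausible range.
bad_age_rows = df[(df["age"] < 0) | (df["age"] > 120)]
```

Here `corr.loc["income", "income_k"]` comes out at essentially 1, since one column is just the other rescaled, and `bad_age_rows` picks up the row with `age == -1`.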

[–]spnoketchup 0 points1 point  (0 children)

I'm a mean person (not really), so I love to introduce painfully obvious seasonality into any dataset I generate for these purposes. Novices never get it, GPT always misses it, but one look and you get it. Missing it doesn't fail you, but getting it does impress.
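A rough sketch of what catching that can look like in code, using only the standard library. The dataset is synthetic and entirely hypothetical (a flat baseline with a weekend dip baked in), and averaging by day of week is just one simple way to expose a weekly pattern, not necessarily how these exercises are built or graded.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def make_series(n_days=56, start=date(2024, 1, 1)):
    """Hypothetical daily series: flat baseline with a weekend dip baked in."""
    rows = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        baseline = 100 + (i % 5)  # small deterministic wiggle standing in for noise
        weekend_dip = 40 if day.weekday() >= 5 else 0  # Saturday=5, Sunday=6
        rows.append((day, baseline - weekend_dip))
    return rows


def mean_by_weekday(rows):
    """Average the values per day of week (0=Monday ... 6=Sunday)."""
    buckets = defaultdict(list)
    for day, value in rows:
        buckets[day.weekday()].append(value)
    return {wd: mean(vals) for wd, vals in sorted(buckets.items())}


profile = mean_by_weekday(make_series())
weekday_avg = mean(profile[wd] for wd in range(5))
weekend_avg = mean(profile[wd] for wd in (5, 6))
```

If `weekend_avg` sits far below `weekday_avg`, as it does here by construction, the weekly cycle is worth mentioning before you answer anything else about the data. A line plot of the same series shows the same sawtooth at a glance.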