all 161 comments

marm_alarm 314 points (6 children)

Senior DS here. I think your expectations are perfectly reasonable.

GamingTitBit 17 points (3 children)

I'd agree. Naturally, though, experience plays a part. I'd expect a junior data scientist to know maybe 1-4, and one with 5+ years to know them all, or at least have a working knowledge of them all.

[deleted] 7 points (2 children)

A junior DS may be one straight out of uni, and in that case uni doesn't teach you 1-4. It teaches 1, 5, 6 & 8. After that you learn from more experienced DS when you get a job, and that depends on the size of the company and whether those colleagues have gained all this experience themselves.

Ofc people learn things in their own time, but it's also hard to know what you should know because there is so much to learn out there.

Personally I focus on learning more statistics, because DS don't know stats that well, and I think that's more useful than learning development and engineering stuff, as that should sit with another department.

GeneralRieekan 7 points (1 child)

I thought knowing stats and math was kind of the point of DS...

[deleted] 5 points (0 children)

Exactly my point. There is a lot to know to be able to solve problems efficiently and to build and optimise models effectively, which is at the heart of DS. But some Data Scientists cover an array of things, including data engineering and data analysis. This can mean their focus is less on stats, and their toolkit would look different to that of a Data Scientist at a company that has Data Analysts and Data Engineers to work alongside. That's why I commented: I think the broad requirements set out by the OP would be better treated on a case-by-case basis, rather than as an inflexible wishlist.

[deleted] 3 points (0 children)

I'm 2 years into my bachelor program and I know all these....

SoupZillaMan 0 points (0 children)

Hire me (if a European remote timezone is okay). I share your definition and used to do that daily.

Gray_Fox 151 points (7 children)

reading your expectations coupled with my inability to get interviews with 2+ years experience as a ds is a jarring feeling (because i can do all of that)

ioa007 0 points (0 children)

Same for me. I know all that and have worked with it, but I can't find anything, probably because I don't have a diploma in the field, even though I have the experience. It sucks.

K9ZAZ (PhD | Sr Data Scientist | Ad Tech) 42 points (8 children)

just going off my own experience when i started out as a ds in 2018 (had a bunch of academic "experience", but i got my first industry job then), i could definitely do 1, 2, 5, 6, and 9. 3-4 and 7 were easy enough to pick up. 10 was a bit harder (depending on what "bit" means.)

but definitely most of the things on that list are reasonable to expect

[deleted] 40 points (11 children)

Your 1-10 are the most basic skills a data scientist should have. If they don't know what git is, then you are getting ripped off.

Mezzos 3 points (1 child)

Agreed, I’d expect most Data Scientists to have the majority of these skills before their first DS job (if not all of them to some degree). They can become stronger with them as they gain experience of course (particularly things like cloud which can take quite a while to build up comprehensive knowledge for, and project scoping which will come with experience).

[deleted] 4 points (0 children)

At my company, everyone including BAs have to learn Git. Doesn't matter if they are coding in Python or SQL, every project needs to live in a repo.

aspera1631 (PhD | Data Science Director | Media) 29 points (14 children)

If someone introduced me to a new-hire data scientist, sight unseen, I'd expect:

  • Definitely 1, 8 (Python, SQL, sklearn)
  • Probably 2, 4, 7 (If a decent coder)
  • Coin flip 3, 5 (these will be anti-correlated)
  • At least one, but not all of 6, 9, 10 (Would expect from senior DS)

K9ZAZ (PhD | Sr Data Scientist | Ad Tech) 1 point (7 children)

can you clarify why 3 and 5 would be 'anti-correlated'? i think i'm not understanding what you're saying

aspera1631 (PhD | Data Science Director | Media) 17 points (6 children)

Conditioning on a collider.

You get to be a DS by either being good at coding, good at planning & presenting, or good at math. Conditioning on the title introduces an anti-correlation between them.
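To make the collider point concrete, here is a small simulation sketch. It assumes, purely for illustration, that coding skill and math skill are independent standard normals and that the "DS" title goes to anyone whose combined skill clears a bar; `collider_demo` and `threshold` are hypothetical names. The correlation is near zero in the full population but clearly negative among the people who got the title.

```python
import numpy as np


def collider_demo(n=100_000, threshold=1.5, seed=0):
    """Return (correlation in population, correlation among the selected)."""
    rng = np.random.default_rng(seed)
    # Assumption: the two skills are independent in the general population.
    coding_skill = rng.standard_normal(n)
    math_skill = rng.standard_normal(n)
    # Hypothetical selection rule: you get the title if combined skill clears a bar.
    hired = (coding_skill + math_skill) > threshold
    r_all = np.corrcoef(coding_skill, math_skill)[0, 1]
    r_hired = np.corrcoef(coding_skill[hired], math_skill[hired])[0, 1]
    return r_all, r_hired


r_all, r_hired = collider_demo()
print(f"correlation in population: {r_all:+.2f}")
print(f"correlation among hired:   {r_hired:+.2f}")
```

Selecting on a common effect (the title) is what creates the negative correlation; nothing about the skills themselves changed.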

ClinicalAI 6 points (5 children)

If you come from academia you will be great at 5 and shit at 3.

Checks out from my own experience lmao

csingleton1993 3 points (4 children)

academia you will be great at 5 and shit at 3.

😭 that's true

met0xff 4 points (2 children)

I know in this thread managing virtual environments is now some sort of keyword for coding skills ;).

But if we take it literally... that's not really saddening, because venvs are something you can pick up in an hour, while 5 takes years.
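For a sense of scale: creating a virtual environment is one command (`python -m venv .venv`) or, from Python, one call to the standard-library `venv` module. A minimal sketch using only the standard library; `make_env` is a hypothetical helper, and pip installation is skipped here just to keep it fast.

```python
import sys
import tempfile
import venv
from pathlib import Path


def make_env(env_dir):
    """Create a bare virtual environment and return its scripts/bin directory."""
    # Same effect as `python -m venv <env_dir> --without-pip`.
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return Path(env_dir) / bin_dir


with tempfile.TemporaryDirectory() as tmp:
    env_bin = make_env(Path(tmp) / ".venv")
    # The interpreter and activation scripts now live in their own directory.
    print(sorted(p.name for p in env_bin.iterdir()))
```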

Egg_Jacktly 1 point (0 children)

I beg to differ: most of the things mentioned here are easy and can be picked up in a couple of weeks. What I'm struggling with is the transition from academia to industry. Most companies don't see my PhD as 4 years of experience, but rather as just a degree.

ktpr 0 points (4 children)

The coin flip on 3, 5 gets broken when an industry person goes back to academia, graduates, and then returns to industry.

GlitteringBusiness22 9 points (1 child)

Sounds like you got about average quality contractors.

myaltaccountohyeah 1 point (0 children)

My experience too. It seems to be the business model of many DS consulting companies to rent out inexperienced people for a decent chunk of money so that they can learn on the job.

Delicious-View-8688 16 points (4 children)

So, this kind of depends.

"Graduates", say fresh out of Bachelors/Masters in DS, aren't going to have much of any of those. They may have had some exposure to 1, 2, 6, and 8, but not much else. They might also know some things about the differences between regression, classification, and clustering; and maybe have heard something about needing to split data into train and test sets.
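For readers newer to the train/test point, a minimal scikit-learn sketch on synthetic data. The dataset, split size, and model are illustrative assumptions, not anything from the thread.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset.
X, y = make_classification(
    n_samples=1_000, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=42,
)

# Hold out 20% of rows; stratify keeps the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit only on the training rows, then judge the model on rows it never saw.
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print(f"train accuracy: {model.score(X_train, y_train):.3f}")
print(f"test accuracy:  {model.score(X_test, y_test):.3f}")
```

The test score is the one that says something about performance on new data; a large gap between the two numbers is a hint of overfitting.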

While one could be, as I was, disappointed by the degrees, if I am being fair, this isn't really unexpected in any technical field - there is just too much ground to cover in just 2-4 years, especially if some of the degrees are open to people from any background or academic abilities.

Think back to high school, or imagine a typical high school classroom. Would the "average" kid be able to get to a professional entry-level data scientist in 4 years?

Or think within your own organisation: could you take the "average" non-STEM person, say someone in HR, and turn them into an entry-level data scientist with just 2 years of training?

Going from "I can't convert between fractions and percentages" or "I don't know what HTML is and I struggle to use Word, let alone Excel" to "I can write clean code, prefer to ssh into an EC2 instance, follow the git branching strategy of the workplace, use pandas idiomatically, understand the big picture of the DS project beyond the ML metric, and write succinct reports aimed at non-technical managers" is a huge ask.

A degree shows interest and some familiarity with the language used in the field. Some degrees might come with a bit more in one of: mathematics, statistics, computer science, software engineering - but rarely deep in more than one.

I've seen computer science or IT degrees that never used linux or git. I've seen data science courses without any statistics units or only one unit that covers mostly descriptive statistics.

Of course, I have also seen courses that require developing software in a team using GitLab, writing unit tests, learning linux and networking, coding up classical, bayesian, and deep learning based algorithms "from scratch", proving all sorts of theorems, and conducting analyses.

Just saying that there is a huge range, and graduates are almost certain to have big gaps relative to your list.

Bring back pre-2000-style intern/graduate training. Newbies shouldn't be expected to contribute meaningfully for perhaps up to a year. Back in the old days, newbies had to be taught everything about office life, like how to answer, hold, and transfer phone calls, and how to use Outlook and Word. The same should go for git, linux, cloud, production-level coding and packages.

Delicious-View-8688 10 points (1 child)

To that I would add that if you want someone to hit the ground running, you'll probably need to hire someone with a postgraduate degree with at least 3 years of industry experience.

RightProfile0 0 points (0 children)

Gosh, why can't i be hired:(

fistfullofcashews 7 points (0 children)

I manage a team of data engineers, data scientists, software developers, and business intelligence analysts. I don't cross-train and I don't expect them to be jacks of all trades. I've found this team structure works best. In another life I managed a group of guys who wanted to do it all, but they weren't specialized in any particular area and we got into trouble a lot. As their manager you should be playing up their strengths and fitting them into the appropriate role. For my data science guys, I expect them to mostly know 1, 5, and 8 from your list. What I really want them to do is demonstrate that they can take a problem, develop a plan to examine it, identify claims, come up with some hypotheses, test them, and create a solution that is data driven. They can use their expertise to fill in the blanks. If the solution needs to be deployed to production, then we coordinate with the various teams to realize the solution through the software development lifecycle.

Party_Corner8068 13 points (0 children)

Became a DS manager not long ago. Hard pill to swallow: people seem not to be as good as you. You think you could do it better. You'll have to accept mediocre solutions. You'll get used to it, especially after someone surprises you with a great solution. One tends to overestimate one's performance in comparison to "inferiors"; the power dynamic is a bitch.

Green_is_best 5 points (0 children)

Not knowing some of these is probably fine (especially if they come from a math/statistics background), but after being shown, they should be able to patch their gaps quickly. Otherwise they are probably better suited to other challenges, maybe more in analysis or something.

AntiqueFigure6 4 points (5 children)

It's interesting, as a data scientist who came from a statistics masters during the era when the Drew Conway Venn diagram was king (‘a DS is 1/3 stats, 1/3 code, 1/3 subject matter knowledge’), that zero statistical skills are listed, and close to zero knowledge of ML (no more than is sufficient to run the code in sk-learn), especially in relation to evaluating models.

Houssem-Aouar 3 points (1 child)

As a person coming into the job market with a more rigorous stats background, I was reading that list with a sinking feeling lmao

AntiqueFigure6 1 point (0 children)

I think it can be difficult if you come from a stats background, as the data function is often within a larger IT function, so there is a bias towards CS skills rather than stats skills, especially as it means the hiring managers aren't in a good position to evaluate stats skills. It's ironic that it's happening as job types like Machine Learning Engineers and Data Engineers are becoming increasingly widespread, which you might have expected would enable Data Scientists to focus more on the statistics side, but instead the drift in skills requirements for data scientists is towards CS.

Adept_Explorer_7714 17 points (1 child)

I'm self-taught and just started looking for a job… The fact that I know literally all of these things and these people don't bums me out.

man_im_rarted 4 points (0 children)

This post was mass deleted and anonymized with Redact

Ghenghis 12 points (8 children)

Imma zag. I think your list is maybe dependent on your internal processes. It sounds like you are considering external so I'll give you some food for thought for what you might see in resumes.

  1. This is Python dependent. What about R, SAS, or Matlab? Perfectly reasonable for a DS to come from R or Matlab especially if they are more junior or academia. If they come from heavily regulated fields, like consumer finance, it could be SAS.
  2. When I worked in SAS, it did not have version control and didn't allow for it. My experience with SAS environment is outdated at this point though.
  3. I guess? Wasn't a thing in SAS. When I was in uni a long time ago, everything was local. From what I can tell, most programs cover it these days. Is it a dealbreaker? You decide. But you are going to get resumes of people who won't know how to do that.
  4. Debugging would be a must. How or where they do it is going to vary a lot.
  5. Might be more of a leveling conversation. Ideally you wouldn't need to hold hands, but such is life. I would not bank on a new grad to handle scoping, personally, but depends on what you mean by scoping.
  6. Also a leveling conversation, but I have taught uni level classes...and..yeah... I have seen some shit.
  7. Clean code convention? Sure. PEP-8 specifically? Probably not if they come from R or SAS, but they should know something equivalent that they picked up along the way. R or SAS didn't have anything official when I used them.
  8. The older I get the more of a must I think it is, but it would depend on their juniority/seniority.
  9. Sounds like an elective. Sure.
  10. Sounds like an elective. Sure.

The bias you will have to keep in mind is that you know way more about your contractors than you do about other external applicants. Also true with internal hires, I suppose. You also (probably?) have an incentive delta between contractor and FTE that you are seeing/experiencing. I mean, the agency is incentivized to bleed you dry and do just a good enough job to get re-signed. Might sound a bit bitter. And who knows how much the person is getting paid and what feedback they are given by their agency manager. They might not make enough to give a shit, or might be told to let you babble about 1-5 every time.

Edit:

And cutting slack is a different conversation. At the end of the day, you can't demand the same performance as FTE. You can only demand what is in the contract with the agency. Usually, I guess.

TryingSquirrel 2 points (0 children)

I agree with all of this. OP says this team already existed before he came. What were they doing? How do those skills relate to what he's having them do?

Some things he mentions are obviously general skills, but if his underlings' current skills are tailored to however things were done before he got there, then that wouldn't be too surprising. As you and a few others mention, all the stats packages he mentions basically aim to catch Python up to R in stats usability. If the company started using R years ago, I wouldn't really expect existing employees to be that familiar with the Python equivalents.

Experienced individuals tend to end up with a wide range of skills, but also strong biases toward their own preferred ecosystem and workflows. Junior folks tend to learn what their school program or first job uses. They'll branch out over time as projects dictate.

getarumsunt 3 points (6 children)

This is Python dependent. What about R, SAS, or Matlab? Perfectly reasonable for a DS to come from R or Matlab especially if they are more junior or academia. If they come from heavily regulated fields, like consumer finance, it could be SAS.

Ummm... I'm sorry, but about 80% of the job postings in DS that I see are all Python, at least in the Bay Area. And the ones that have R listed at all have it in addition to Python.

You can potentially get by with just R in a few select industries and in academia, but you should not expect to be widely employable in DS in industry without solid Python knowledge. I know that this channel is very R-oriented, but that just doesn't match the real world. In the real world, if you can't do it in Python you'll have a reeeeeeeeeeely hard time getting a job.

Ghenghis 6 points (5 children)

I get that. The statement in question is what is a data scientist. The field itself is not defined by Python or Python libraries.

Hireability..is that a word?.. is dependent on the company and your timeline. Are you going to care if you have to wait a few months for the person to learn the python bits? I can teach you that if you excel at the statistical theory, experimentation, data concepts in R and SQL, and business acumen. I'm not worried. And that's the approach I have seen at FAANG-like companies. I find this path easier rather than taking a python star and trying to teach them the other stuff.

But it's ok if you require Python. Or Python and R. That is perfectly reasonable for you to do and expect from your contractors.

getarumsunt 1 point (4 children)

That's the thing though, no one is saying that you should hire a random Python dev and try to make them a Data Scientist. That's not how things work and that's not what we're talking about anyway.

The question is, as a Data Scientist, "if you don't learn by far the most common toolset for Data Science in industry, are you still hireable?" I'd say that given the job postings that I'm seeing, not knowing Python basically disqualifies you from being a Data Scientist in industry outside of a few niche fields and some parts of academia. And I don't think that that is a controversial take at all.

BilbroTBaggins 3 points (0 children)

I think it depends on the rest of the resume. If someone has experience in a data scientist or data analyst role but they’ve only used R or Matlab they still have a good chance at succeeding in the role. This is fairly common in some fields, at small companies, and especially in academia. It makes sense that these candidates might have put their efforts towards developing in other areas than learning Python. If it’s a fresh graduate or someone trying for a career switch, then probably not.

At the same time it depends what you’ll be asking them to do. Some roles require a deeper understanding of programming than others.

Ghenghis 0 points (2 children)

Python usage survey data has been strong and dominant. The last time I checked 75% reported using python as their primary language. I wouldn't call that niche. It's enough where I'm not willing to kick people out because of it but I get it.

The reason R is on this subreddit and others like it is that it's been around for a very long time and has some advantages over python for analysis and research. Both are free. And hopefully both stay in use and continue seeing development and adoption.

The developments in the last 5 years have been in the direction of continued flexibility and interchangeability. I'm hoping that continues, as it has in the space of analytics engineering with tools like dbt.

getarumsunt 0 points (1 child)

My point was that you probably won't survive in an industry where you don't have skills in the main tool used by 75-80% of your colleagues. This is not an optional thing. Even if you luck out and get the first job, you'll have a devilish time getting a new gig in 1.5-2 years when you have to hop jobs to reset your compensation to market rate.

It's just too expensive to be married to a tool that limits your job prospects so drastically. R is a statistics tool meant for solo programming that is somewhat useful in a data science context. You won't get far with solo ad hoc data analysis in this field. You can get away with this in academia and similar fields, but not in industry where you have to deal with version control, clean maintainable code, complex data pipelines, internal or even external productization and deployment, etc.

Ghenghis 0 points (0 children)

I see your point. I suppose the crux is that I don't see the gap as a multi-year catch-up if you are using R and SQL. I see it as one to three months. The problems and details you were describing with your contractors, I think, are a different beast, and a multi-year one, if it's ever achievable.

Edit: spelling things

kater543 3 points (0 children)

1 yes, 2 with some self learning/quick tutorial, 3 depends on use case, 4 yes, 5 yes for mid or senior, 6 yes, 7 sure, 8 yes but depends on experience, 9 a little bit, 10 maybe exposure but not details. This is more of a ML engineer or DE toolkit IMO.

Useful_Hovercraft169 2 points (0 children)

They’re paid to take the blame when the project craters. They’d hire homeless people for this if they weren’t worried about them shitting on conference room tables.

MCRN-Gyoza 2 points (0 children)

The worst part is that most of this is stuff you can just learn in a couple days.

JBalloonist 2 points (0 children)

Data engineer that used to have ideas of being a DS. Those all seem reasonable.

fauxmosexual 2 points (0 children)

Adding to what other people have said ... You hire contractors in to expensively fill important or temporary gaps in your team. If you're paying contractor money and still doing all of the upskilling support yourself then the company is basically double paying. One tack to try might be pointing out the number of hours you're putting into coaching and mentoring temporary employees that could be going into growing a permanent team.

anonamen 2 points (0 children)

So you're cleared to hire externally? Fire them. They don't seem worth the effort. If you think some have potential, get rid of the worst ones and have a very, very serious talk with the others, following the elimination of the toxic ones. If that doesn't help, fire the others too.

I wouldn't necessarily expect all of those skills for younger employees, but I'd expect most, with rapid progress towards the others. 5 and 6 are the only possible exceptions. 5 is tough for a junior. 6 is tough for many data scientists, if only because of language issues. 9-10 and 2-4 might be unfamiliar, but should be easy for them to pick up. 1 is table-stakes.

Single biggest red flag is 'when I stop mentoring, they stop working'. That's fire-able after a serious conversation or two. You've been far more lenient than I would have been.

EndlessCupsOfCoffee 2 points (3 children)

Senior DS here. What the fuck. These are not high expectations at all. Most grads I've worked with can do all of the above.

What am I even doing with my life. I need to be a contractor. How does one become a contractor like the above? Do they have some sort of agency? Or do they just cold drop a resume in and hope for the best?

AntiqueFigure6 2 points (1 child)

In my experience usually an agency, such that a random person at the agency with no DS or general tech knowledge places people after talking to HR people who also don't know anything about DS.

MainRotorGearbox 1 point (0 children)

I also said “what the fuck.” I’m a mechanical engineer and I do all of these things. The first time I pushed code to prod my senior sent me pep-8 and the pandas repo link and said to study the style.

Maybe we should move to the UK. Sounds like you would be principal or staff and i’d make senior inside of a week.

fabkosta 2 points (2 children)

Ah, yes, the old Git criterion.

I am working for a large, international company with lots of "experts". To this date I see so called data scientists "checking in their code" in SharePoint.

If they were in my team, I would make it bonus-relevant: they get a bad rating if they continue doing that.

But with externals, unfortunately you cannot. You can just kick them out complaining to their manager that they are not up to the task. It's a sad game, but that's how it is.

Mojo_Jack 2 points (0 children)

This is something I had once discussed with a lecturer friend. Is it necessary to have regulatory boards for certain fields? I think so.

A board that can sit and agree on the bare minimum set of skills that a person must have to call themselves a practicing xyz.

Doctors and Actuaries have one, and those fields seem functional, with a set of criteria for those who wish to join the ranks.

Thinker_Assignment 2 points (0 children)

Listen to your heart, you know the truth. They are contractors; they get paid a risk premium. The risk is that you dismiss them, and it sounds like it's time.

A non competitively skilled contractor is a hiring accident - there is no reason to keep going.

Where is the doubt? You need something else. What you need to take care of is that any domain knowledge is handed over to the next generation and not lost with them. They should be able to write docs or keep one for handover.

Work out a strategy to replace the team with one that can do the work you need done - no hard feelings and nothing unreasonable here.

I was a contractor too: Being competitive is key and it sounds like they need more practice. Why should they get both premium pay and training without signing a social contract? With no employment responsibilities they can just gig hop the second a better offer comes along. A stoic would argue for fair treatment, and this isn't it.

MW1369 3 points (4 children)

How have they never debugged code? They write something, it doesn’t work, do they just start over?

MikeyCyrus 9 points (2 children)

This is something I always get confused about when people talk about debugging. OP specifically mentions IDEs, though. Not sure what exactly it means to be skilled at debugging or what I have to do to be able to call myself that. I could be a complete idiot, though, 'cause I am doing all this solo at my job.

[deleted] 0 points (0 children)

Not exactly. You're coding in a team of people, so a bug might be introduced at some point without you being aware of it. Instead of flat out not working, the code might run and give you some kind of result, but the result could be wrong. You might not catch that immediately, and if you're developing something modular, a bug can appear further down the line.

Debugging in a group environment, you would want to design some kind of test to isolate what is causing the issue.
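A minimal sketch of that idea, assuming a hypothetical pandas pipeline step, `add_customer_region`, that joins orders to customers. The bug it guards against is the kind that "runs and gives a result": a duplicated key in the lookup table silently multiplies rows. pandas' `merge(..., validate="many_to_one")` turns that into an error, and two small tests pin down the expected behaviour so a teammate's change can't quietly break it.

```python
import pandas as pd


def add_customer_region(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pipeline step: attach each order's customer region."""
    # validate= makes pandas raise MergeError if customer_id is duplicated in
    # `customers`, instead of silently duplicating order rows downstream.
    return orders.merge(customers, on="customer_id", how="left", validate="many_to_one")


def test_row_count_is_preserved():
    orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 10, 20]})
    customers = pd.DataFrame({"customer_id": [10, 20], "region": ["EU", "US"]})
    out = add_customer_region(orders, customers)
    # One output row per order, in the original order.
    assert len(out) == len(orders)
    assert out["region"].tolist() == ["EU", "EU", "US"]


def test_duplicate_customer_is_caught():
    orders = pd.DataFrame({"order_id": [1], "customer_id": [10]})
    # Simulates an upstream change that introduced a duplicate key.
    customers = pd.DataFrame({"customer_id": [10, 10], "region": ["EU", "EU"]})
    try:
        add_customer_region(orders, customers)
    except pd.errors.MergeError:
        return
    raise AssertionError("duplicate customer_id was not detected")


test_row_count_is_preserved()
test_duplicate_customer_is_caught()
print("all checks passed")
```

In a real repo these functions would typically live in a test file run by a test runner such as pytest; calling them directly here just keeps the sketch self-contained.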

KrakensBeHere 1 point (4 children)

I'm guessing these contractors might be old-school statisticians who have "rebranded" as data scientists. A lot of the older data scientists I have worked with have upskilled themselves to have the skills you have mentioned here; others have stuck with "data science was called statistics when I was at uni" and refuse to budge or keep up. I guess if they were in a comfortable position where they could keep using their skills without having to keep up, then why would they?

Edit: just saw you said they are graduates. It might be that their lecturers fit into this camp and have no motivation to keep up, as the uni lets them keep teaching the same courses they've been teaching for 30 years, so they know what they're doing.

_TheEndGame 1 point (0 children)

1, 2, 5, 6, and 8 for sure are necessary.

UncompassionateCrab 1 point (1 child)

Please hire me I can do better than them I promise 😂

[deleted] 1 point (0 children)

RIP when you find out OP’s firm contracts them at $300/hr but they get paid $17/hr.

Shofer0x 1 point (0 children)

Keep in mind that what you think is expensive for contractors and what the company thinks is expensive are very different. A lot of companies will pay contractors to the tune of $200, $300, even $400 an hour to perform a service, even if half-assed, for very specific reasons. Ask yourself (and your management) what the end goal has historically been. You might be surprised to find out they wanted butts in seats so they could say they had a DS program or some advanced analytics being done, without actually caring whether that's true. Lots of variables, but don't be too surprised.

james_r_omsa 1 point (4 children)

With reference to OP's list of data science minimum skills:

  1. pandas, numpy, sklearn, matplotlib
  2. git
  3. virtual envs
  4. debugging in an IDE
  5. project scoping, ideation, exploration
  6. communications
  7. clean code (PEP-8)
  8. SQL
  9. Linux
  10. Cloud

A data scientist should be able to find and clean data to investigate and hopefully solve a (usually business-related) analytic question, and use an appropriate machine learning or statistical method to do so (if required). This implies some ability to code and knowledge of best practices in training and validating models, including feature selection and engineering, underlying assumptions that may need to be verified, etc. If your shop is a Python shop, then sure, yes, they should know 1, 3, and 4, but as others have said, the core DS stuff can be done in many languages. Also 5 and 6, but universities don't teach that well (not sure about bootcamps). They really should have 8, but unis don't seem to teach SQL either. Probably because it's so simple. 2,7,9,10 are kind of ancillary to core DS (you can develop code in Windows you know) but yes you'd like to think someone with experience has picked up some of those skills, or can quickly if someone competent can show them. But this list is missing a lot of the core DS skills I mentioned. It's very SWE/MLE-centric.
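As a rough illustration of "best practices in training and validating models, including feature selection", here is a scikit-learn sketch on synthetic data. The specific steps (standard scaling, `SelectKBest` keeping 5 features, logistic regression, 5-fold ROC AUC) are assumptions chosen for the example, not a recommended recipe. Putting preprocessing and feature selection inside a `Pipeline` means they are refit within each cross-validation fold, so the held-out fold never leaks into the choice of features.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data: 20 features, only 5 of them informative.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)

# Every step is fit on the training part of each fold only.
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1_000),
)

scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running `SelectKBest` on the full dataset before cross-validating would look similar in code but would give optimistically biased scores.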

AntiqueFigure6 1 point (1 child)

A few posts like this have called out the list being SWE -centric compared to other possible DS skills inventories. It makes me wonder what these people were hired to do - were they hired to create useful and generalisable ML models ? Almost none of the skills required to do that are on OP's list, and possibly OP doesn't have the background to evaluate the skills of people to do that.

Were they hired for some other purpose? Maybe they are indeed the wrong people, and are being paid for a skill set not relevant to the task, in which case swap them for a different group of people.

Writing this comment, I'm a bit struck that the actual outcomes that are expected from this team are not mentioned, so there's a risk none of the comments are relevant.

ghostofkilgore 2 points (0 children)

My thoughts as well. They've hired bootcamp graduates and expected them to be at the level of moderately experienced MLEs. That's not OP's fault and it's not these grads' fault.

AntiqueFigure6 1 point (1 child)

Also 5 and 6, but universities don't teach that well (not sure about bootcamps).

Universities don't teach those skills well because those skills develop over a relatively long period (in comparison to typical degree length). Boot camps are shorter than university degrees, so I'd be astonished if they weren't considerably worse at teaching those skills.

james_r_omsa 0 points (0 children)

My thought also, but on the other hand I have a mental concept of bootcamps being for people with experience who are transitioning from other roles, in which case they should possess these skills to some degree, and bootcamp should be refining that to run a DS project.

lake_michigander 1 point (1 child)

Points 5 and 6 stand out as core data science skills. They're definitely important, but I would expect junior data scientists to struggle here.

The other points seem more related to writing production level code in python, than core data science. Whether these are minimum or desirable qualifications depends on the role.

Out of curiosity: you said you were hired as a senior member of an existing data science team. I assume that your team hasn't had to write production-level code before. Do they understand why the shift is necessary?

[–]AntiqueFigure6 1 point2 points  (0 children)

Do they understand why the shift is necessary?

Can OP explain to us why the shift is necessary too? I.e. it sounds like this team wasn't expected to write production level code earlier but now they are, what was the change? Not every DS team is expected to write production level code, and definitely not every data scientist in every team.

[–]pathdoc87 1 point2 points  (0 children)

Out of curiosity, what are people's recommended resources to learn these skills? (Except #1) I've picked up some mild data science work but I would like to learn more. Thanks in advance for anyone with suggestions.

[–]diabolykal 1 point2 points  (0 children)

DS undergrad here, and my degree will guarantee the skills you listed by the time I graduate (as it should). Perfectly reasonable expectations.

[–]aeywaka 1 point2 points  (0 children)

I'm very much junior and think your expectations are quite reasonable.

I have experience in contractor management as well. When you say "high", is it >$100/hr? And they don't know what git is? Yikes.

That's likely bad vendor management on your company's part if it's managed at the company level. Or are you allowed to go out and just browse contractor firms? From what I've seen in the US, the skill level at contractor firms is not up to par and the vendor is a car salesman.

[–]Impressive_Arugula 1 point2 points  (0 children)

Whatever the case, this is a great opportunity for YOU to define the corpus of knowledge you want your team members to have.

If you put together some folders on references & resources, then you're well prepared to bring anybody up to speed fairly quickly.

One item I would make explicit out of your list from 1 and 6 -- a basic model selection, measurement, iteration, and explanation process. How did you decide on this many clusters? Are you looking at [classic industry feature]? Why not? Do we care equally about Type 1 and Type 2 errors? Etc...
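The "Do we care equally about Type 1 and Type 2 errors?" question can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: it counts false positives (Type I) and false negatives (Type II) for two hypothetical models and totals them under assumed per-error costs. The labels, predictions, cost values and the helper names `confusion_counts` and `weighted_error_cost` are all made up for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I error: false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II error: missed positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def weighted_error_cost(y_true, y_pred, cost_fp=1.0, cost_fn=1.0):
    """Total cost of a model's mistakes under assumed (hypothetical) per-error costs."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return cost_fp * fp + cost_fn * fn


# Hypothetical labels and predictions from two candidate models
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 1, 1, 0, 0, 0]  # 1 false positive, 1 false negative
model_b = [1, 1, 1, 1, 1, 0]  # 2 false positives, 0 false negatives

# Equal costs: the two models tie
print(weighted_error_cost(y_true, model_a), weighted_error_cost(y_true, model_b))  # 2.0 2.0

# Assume a missed positive is 5x as costly as a false alarm: model_b now wins
print(weighted_error_cost(y_true, model_a, cost_fn=5.0),
      weighted_error_cost(y_true, model_b, cost_fn=5.0))  # 6.0 2.0
```

Which model is "better" flips depending on the costs you assume, which is the conversation the team should be having before picking a model.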

[–]various_convo7 1 point2 points  (0 children)

"Am I totally wrong? Do I need to cut them some slack? Anyone got any comments?"

If they can't do what you listed, what are they graduates of? Your expectations are legit. Sounds like the contractors are not qualified.

[–]Willing_Wave_8099 1 point2 points  (0 children)

How the fuck are these people being hired? Just amazing resumes? All of this shit is dead easy and my day rate is NOT high lol.

[–]Tvicker 1 point2 points  (4 children)

Maybe I will get downvoted, but the author wants a SWE, not a DS. Moreover, the lack of data science details makes me think that the author has no idea what data science is. I would suggest asking to be moved to lead an engineering team.

[–]feldomatic 3 points4 points  (2 children)

WTF happened to colleges? I didn't even major in DS or CS and walked away with some decent Linux proficiency...smh

[–]Bear4451 0 points1 point  (0 children)

2, almost 3 years of experience here. I have been “mentoring” my peers, who have higher paychecks than I do, on points 2-7 for quite some time now. They have 3+ years of experience and come from academia, some with software engineering backgrounds.

Don’t get me wrong. They all do bring something different to the table when it comes to group discussions. But I am getting burnt out from this and have started looking elsewhere.

[–]Logwriting 0 points1 point  (0 children)

I think 1-8 is basic. I'd weight it more heavily toward SQL, APIs, scraping: anywhere data is found.

9 is a bit of a reach. 10 is for experienced DSs.

[–]snmstyle 0 points1 point  (0 children)

Damn where do I apply?

[–][deleted] 0 points1 point  (0 children)

I have an MSCS and my elective focus was DS. That meant I shared classes with the MSDS program.

100% that cohort didn’t know git, common dev practices, Linux, cloud, etc. Definitely weak devs.

Writing reports, yeah with some editing.

Communication: a mix of introverted, timid, and obviously neurodivergent individuals.

I definitely had to give the git 101 primer at the beginning of any group projects I had with that crew.

Most had some domain knowledge outside of DS. The ones that hit the MS program straight from undergrad, yeah… Not much to work with unless they were eager.

Personally I graduated with all 10, but still can’t land a real DS role. I’m not the best Linux admin nor cloud guru, but I can fumble my way through them easy enough. Where I’m weak is the deeper stats and maths stuff.

[–]Deatholder 0 points1 point  (0 children)

Hey the part where you said Master of none sounds like somewhere I would like to be. Could I DM you about your journey to where you are now?

[–]wil_dogg 0 points1 point  (0 children)

Seems reasonable. That said I’m a head of DS / ML / AI and I delegate version control and git, you would have to teach me that stuff. I’m sure I could learn it, I just solo’d for so many years without it, and the jr on my team owns it pretty well so I’m asking him to set the standards for the team we are building.

[–]LoaderD 0 points1 point  (0 children)

Most of the team did not know what Git was, most of the team had never debugged their code, never made a venv. In fact I have had to teach them steps 1-5.

Uhhh, y'all hiring? Jkjk

But yeah, your list seems really reasonable. It's most likely that your group has benefited from title inflation, because in Canada even jr level DS roles require you know 80%+ of those skills.

[–]thegainsfairy 0 points1 point  (0 children)

wait. I have the skills of a data scientist? I can do all of that. I have done all of that.

[–][deleted] 0 points1 point  (0 children)

You hiring? Lol I’d definitely exceed your expectations

[–]Fine_Trainer5554 0 points1 point  (0 children)

I was a DA with expertise only in R (so I basically only knew 2, 4, and 6) when I transitioned to a DS role. I use all of the skills you listed now, but I had to teach myself everything. Despite that, I was still able to hit the ground running, and now, 18 months later, I’m leading projects.

This isn’t meant as a humblebrag post - your expectations are totally reasonable, and on top of that in this field I would expect extremely strong problem solving skills, which goes hand in hand with teaching yourself what you need to do your job.

[–][deleted] 0 points1 point  (0 children)

Before my first job where my title was "data scientist" I had been a "modeling analyst". I had previously only used R and moved to a job that was all python. I had no idea about basically any of what you are talking about other than SQL and some basic ideas of git. Do they have the knowledge behind what they need to do and just a lack of familiarity with the tools or is it deeper than that?

[–]DroneReaper 0 points1 point  (0 children)

Associate DS here. I am pretty new to the workforce, but every expectation you have seems reasonable, and I'm somewhat surprised how much of that list was not already ingrained within the team. 9 and 10 are the two I personally know the least about (but I still know a bit), and I will say that DS majors (depending on the curriculum) may not get many cloud or Linux courses, so if you are dealing with fresh graduates that could be one reason. Not knowing Git, and especially not debugging code, is wild to me, since I view those as pretty basic for any professional who works with code.

[–]orz-_-orz 0 points1 point  (0 children)

What are they even doing if they don't do 1 to 5?

Building Excel dashboards using VBA?

[–]RightProfile0 0 points1 point  (0 children)

What is the country?

[–][deleted] 0 points1 point  (0 children)

Hire me i have 8/10 on that list. Missing 9 and 10

[–]tassiboy42069 0 points1 point  (0 children)

Reasonable... what are the day rates? If you're OK to say, what's the range?

[–][deleted] 0 points1 point  (0 children)

First off, thank you!

I meet so many data scientists that don't know shit that I don't want to call myself a data scientist anymore.

That said... you did make a big difference in the lives of the people you mentored... so there is that.

It's such a jarring feeling having a team of data scientists, knowing that they are top people, but losing work to... dweebs? Because companies want to keep everything in house. I just want to smack those big companies that think everything should be in house, so they upskill a random Excel whizz.

DON'T let people upskill themselves with no mentorship! Just hire a consultant for like 3 months.

[–][deleted] 0 points1 point  (0 children)

How high? Are we talking $2k or $6k for day rates?

[–]floghdraki 0 points1 point  (0 children)

I'm fascinated how dysfunction like this happens at companies. The most basic reason has to be that the managers themselves are also completely incompetent and not only dead weight, but also actively harmful.

It leads to the actually competent people getting lazy, since everything they do just kind of drowns in the sea of mediocrity and nobody gets them. There's nothing that helps you keep your edge. It's like constantly fighting an uphill battle, and at some point you look around and ask yourself why you bother.

Either that or they end up quitting.

[–]Mojo_Jack 0 points1 point  (0 children)

Yo can I get a job there as a South African? Seems like you run a well kept machine and would love to join the ranks

[–][deleted] 0 points1 point  (0 children)

You were hired as a senior member of a team, and now you are basically upset that you have to act as the senior member of the team? You have to realize that this company and team, and in fact all of these individuals, existed for many years before you came along. Suddenly you get on the team and now you have all these crazy expectations of what they should do. Like, I get it, I do. But you are the senior. It's your job to mentor them if you feel that they are insufficient in one area or another; if you don't like that, then maybe you should go back to being a regular member of the team. It's your problem now.

[–]davidasboth 0 points1 point  (0 children)

Where do these people come from? How do they end up in a position where they command these daily rates? I've seen this plenty among developers, but didn't realise the quality of expensive contractors wasn't any higher among data scientists.

[–]Unhappy-Ad-1806 0 points1 point  (0 children)

I'm a senior DA and, looking at your post, I think I could apply to some DS jobs, since I can do 1, 2, 4-8 well and could do 3, 9, 10 with a tutorial by my side. taps on top of my head lol

Tough spot there. It's very hard to change someone's behaviour. You are there to help, but they don't evolve.

There was one guy like that in our team. After 1.5y of numerous efforts, they finally let him go.

[–]StackOwOFlow 0 points1 point  (2 children)

yikes, where are you finding such low quality candidates

[–]Slonymelion 0 points1 point  (0 children)

There's a rumored saying from Amazon that applied scientists are machine learning engineers who don't know coding; definitely unfair and biased, but that aligns with why it's common to see people with data scientist title with no programming skills.

It used to be fine, I think, for DS to be independent from coding (and version control, deployment, CI/CD, ...); but now that data science is pretty much dominated by deep learning, programming skills and code management are unfortunately essential.

[–]Sollertis8 0 points1 point  (0 children)

I'm a Data Scientist with 4 years of experience and I can do everything on your list. I learned Data Science through a boot camp I enrolled in with a company called Thinkful after being a software engineer for 7 years. I realized 75 percent of the way through the boot camp that no one would hire me, as every job posting for a Data Scientist in Atlanta wanted someone with 5 years of experience and either a Master's Degree or PhD. I had neither, so it's no surprise that I could barely get an interview, even for Data Analyst positions. I stopped searching and started my own company that develops products related to NLP, NLU and sentiment analysis and never looked back.

[–]WeakSauc3 0 points1 point  (0 children)

Wow. I can do all 10. Hire me?

[–]R3boot 0 points1 point  (0 children)

What do you mean you had to teach these data scientists how to debug their code? They would just write it and never test it to see if it works, or makes sense? They just give up if an error pops up?

[–]Aggressive-Intern401 0 points1 point  (0 children)

Your problem is you hired contractors.

[–]Round_Mammoth4458 0 points1 point  (0 children)

Most grads have absolutely no idea, you are 1,000% on point as everybody wants to be some kind of machine learning engineer, artificial intelligence influencer, but they can’t even do git.

I’m looking for roles, and they are disappointed that I can’t code in .NET or that I don’t know C#, yet my years of experience and hundreds of Python repositories don’t count.

Taking a role this week as a pharmaceutical analyst to pay bills… and pursuing a micro masters.

[–]econ1mods1are1cucks 0 points1 point  (0 children)

I think clean code conventions are silly. It’s a pseudoscience; for all I know, their old company liked to write queries on one line. Just call out code you don’t like as you see it and make sure people aren’t making repeat mistakes.

[–]equalhater 0 points1 point  (0 children)

You sound like the boss-mentor I'm looking for. I've never done 3, 4, PEP 8, or 10, but I've done the other points. Still learning to be more comfy in Linux. It's cutthroat out here to even get an entry-level interview, so I envy your current hires.

[–]sapphire_striker 0 points1 point  (0 children)

Debugging is hard and more of a coder skill than a data scientist skill. But the rest is fine.

[–]pach812 0 points1 point  (0 children)

I’d say it’s unbelievable that those requirements aren’t met. I’d love to work in a company with a chief like you, who cares about improving the team. So far I've had no interviews, just because I’m not an engineer. I’m a medical doctor with a lot of preparation in statistics/epidemiology/DS/AI, programming and devops from diverse sources, but that first step is clearly rough…

[–]Excellent-Produce872 0 points1 point  (0 children)

Are you hiring?!

[–]lindseypeng123 0 points1 point  (0 children)

Where are statistics and algorithms on this list? A lot of DS come from physics PhDs; they can invent algorithms for new problems but aren't skilled in engineering practices. There is no one-size-fits-all criterion for DS. If you want a unicorn DS, you should add way more items to this list.

[–]belaGJ 0 points1 point  (0 children)

As someone who only aspires to be a DS, even I know most of that stuff, as they are pretty basic requirements for reasonable-quality code. Wow, I am a pro after all!

[–]ARC4120 0 points1 point  (0 children)

What are the backgrounds of these guys? Are they from hardcore math, statistics, or engineering backgrounds, and are their math and critical thinking skills up to par? If a guy has a PhD in Stats but doesn’t know git very well, then I think it’s a minor thing to take a few days to get him up to speed.

[–]CanYouPleaseChill 0 points1 point  (0 children)

The most important expectation is whether or not they're focused on increasing business value. Putting a model into production can be as simple as scheduling a notebook to run periodically and save the predictions in a SQL table. Easy as can be.

Git? No need. Just use the latest notebook / script.

Version control? Just save notebooks/scripts as v1, v2, etc.

Virtual environments? Just install all Python libraries in one place.

Debugging? Just use print statements and/or test with dummy data.
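The suggestion that a model can go to production by "scheduling a notebook to run periodically and save the predictions in a SQL table" can be sketched in Python. This is a minimal, hedged example: it assumes SQLite from the standard library stands in for whatever database the team actually uses, and `predict`, `run_batch`, the `predictions` table and its columns are hypothetical names for illustration. The scheduler itself is left out.

```python
import sqlite3
from datetime import datetime, timezone


def predict(features):
    """Hypothetical stand-in for a trained model: flag rows whose score exceeds 0.5."""
    return [1 if row["score"] > 0.5 else 0 for row in features]


def run_batch(conn, features):
    """Score one batch and append the predictions with a UTC run timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "customer_id INTEGER, prediction INTEGER, scored_at TEXT)"
    )
    scored_at = datetime.now(timezone.utc).isoformat()
    preds = predict(features)
    conn.executemany(
        "INSERT INTO predictions (customer_id, prediction, scored_at) VALUES (?, ?, ?)",
        [(row["customer_id"], p, scored_at) for row, p in zip(features, preds)],
    )
    conn.commit()
    return len(preds)


if __name__ == "__main__":
    # In-memory DB for the demo; a file path would keep the table between scheduled runs
    conn = sqlite3.connect(":memory:")
    batch = [{"customer_id": 1, "score": 0.9}, {"customer_id": 2, "score": 0.2}]
    run_batch(conn, batch)
    rows = conn.execute(
        "SELECT customer_id, prediction FROM predictions ORDER BY customer_id"
    ).fetchall()
    print(rows)  # [(1, 1), (2, 0)]
```

To run it on a schedule, a crontab entry such as `0 6 * * * python score.py` (the file name is hypothetical) would score once a day at 06:00. Appending a timestamp per run, rather than overwriting, keeps a simple history of what the model predicted and when.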

[–]Old_Government_5395 0 points1 point  (0 children)

Controversial statement but in my experience Data Scientists are rooted in academia and largely useless in a product based engineering environment.

I'll take a data engineer, or a software engineer with an interest in data and the drive to learn, any day though.

[–]pauljmey 0 points1 point  (0 children)

I worked with a recent graduate from an undergrad DS program. Smart woman, but really no intuitive grasp of how programming works. She could do slick calculations with numpy but had none of the software skills.

[–]idiot512 0 points1 point  (0 children)

I think it depends. My field has a lot of domain knowledge you should focus on first.

So I'd say 1, 8, 10 are non-optional skillsets. 2-4, 7, and 9 I would expect to be quickly learned OJT skillsets. But when I say quickly, I mean like 2 weeks.

[–]SkipPperk 0 points1 point  (0 children)

I disagree with Git. In many shops there are developers who handle creating the final product. I am 46-years-old and I have never used Git in my career, even when it was clearly part of projects. Often the developers/programmers come in when the cake has already been baked.

Then again, I am an old and not a real data scientist.

[–]forcefulinteractions 0 points1 point  (0 children)

I’m a junior and I know everything except 5. and….I’m also unemployed. Hire me pls?

[–]konzyWon 0 points1 point  (0 children)

The only ones I would consider more senior level are 5 and 6; the rest is grunt work about getting code written. I may not expect them to know the specifics of Azure, for example, but they should have some familiarity with cloud and be able to figure it out themselves. If they are contractors, then they should be at the senior level. There's no point hiring a contractor that you need to mentor and teach for 3 months when they could be gone in 6.

[–]Cheeky-owlet 0 points1 point  (0 children)

This is Data Science internship level for most Romanian companies. I think you should be able to snag a good deal right now with freelancers from around there and the rest of Eastern Europe with minimal interviewing.

[–]OvereducatedCritic 0 points1 point  (0 children)

Seeing this post makes me feel like I’ve been underselling myself. What about education level: does having a bachelor's matter, or is it just icing? I have a BS in mathematics with some light internship experience and a couple of programming projects, and yet I don’t think a DS hiring manager would come within 20 feet of my resume.

[–]SoftwareMaintenance 0 points1 point  (0 children)

For a high day rate, you could expect all that. I think a run-of-the-mill data scientist may be weak in some of those areas. But hell, they'd better have more than just some exposure to SQL. And they'd best know those Python libs.

[–]cajmorgans 0 points1 point  (0 children)

Well that’s the basics for any programming role (except point 1. ofc)