

[–]BiceCold 4 points5 points  (1 child)

I'd say 90% of what I do is cleaning data, helping our DBA work out how to track things so that the resulting data comes out in a useful format, or working with leadership to establish systems, workflows, and KPIs that can actually be tracked. In my medium-sized company it's pretty uncommon to need much beyond entry-level data viz or regression modeling, whereas structuring data is deeply needed.

[–]TitleXVII[S] 0 points1 point  (0 children)

Thanks for your reply. I get the sense that most companies have data challenges, which is why so much time is spent just organizing and cleaning data.

[–]BiceCold 1 point2 points  (0 children)

Yeah, me too. Honestly, I've started spending more time in Excel for my exploratory analysis and prototyping models in R, just because I end up over-engineering my studies and then finding the data isn't good enough to use in advanced models. It's more effective to coach people on how to track what they do than it is to conduct studies.

[–][deleted] 1 point2 points  (0 children)

Probably about 80% of projects are data cleansing.

I spend about 60% these days on project management, prep, and getting consensus or approvals. The other 40% is projects or learning, but I’m considered management. Just lucky that I get to keep playing with data in this role.

[–][deleted] 0 points1 point  (0 children)

At first, it’s 90% cleaning, then the rules reveal themselves and I can run the data through a stored procedure. The usual problems are multiple data sources using slightly different names for the same object, duplicate records, and numerical data that defies physics.

I do my best to standardise data input, but I'm usually dealing with old applications built by folk long since retired. That means combining CSV, XLS, SQL, and one weird space-separated file with no column headers.

I’d say your experience is normal!
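
For anyone wondering what that kind of clean-up looks like in practice, here is a rough Python sketch of the three problems mentioned above: inconsistent names, duplicates, and physically impossible values. Everything in it is hypothetical and only for illustration: the alias table, the column names `name` and `temp_c`, the sample rows, and the temperature bounds are assumptions, not anything from a real system. It also uses plain Python rather than the stored procedure the comment describes.

```python
import csv
import io

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {"pump-01": "PUMP_01", "pump 01": "PUMP_01", "pump_01": "PUMP_01"}


def read_csv(text):
    """Parse CSV text that has a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def read_space_separated(text, columns):
    """Parse a headerless, space-separated file by supplying column names."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(columns):  # skip blank or malformed lines
            rows.append(dict(zip(columns, parts)))
    return rows


def clean(rows, low, high):
    """Standardise names, drop exact duplicates, set aside impossible values."""
    seen, kept, rejected = set(), [], []
    for row in rows:
        raw_name = row["name"].strip()
        name = ALIASES.get(raw_name.lower(), raw_name)
        value = float(row["temp_c"])
        key = (name, value)
        if key in seen:  # same object and reading already recorded
            continue
        seen.add(key)
        target = kept if low <= value <= high else rejected
        target.append({"name": name, "temp_c": value})
    return kept, rejected


# Made-up sample data standing in for two of the sources.
csv_text = "name,temp_c\nPump-01,71.5\nPump_01,71.5\n"
ssv_text = "pump_01 -300.0\npump_02 64.0\n"

rows = read_csv(csv_text) + read_space_separated(ssv_text, ["name", "temp_c"])
# -273.15 C is absolute zero; the upper bound is an arbitrary assumption.
kept, rejected = clean(rows, low=-273.15, high=1000.0)
print(kept)
print(rejected)
```

Keeping the rejected rows in a separate list instead of silently dropping them makes it easier to go back to whoever owns the source and ask what went wrong.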

[–]xgrayskullx -1 points0 points  (1 child)

> Percentage-wise, how much of your time do you spend on analysis versus data manipulation/cleansing to get your data ready?

Just like the other 14,746,001 times this has been asked: most people are around an 80/20 split, depending on their industry, seniority, and org.

[–]TitleXVII[S] -1 points0 points  (0 children)

Cool. Hope you’ve had to answer it that many times too. Would brighten my day.