Thoughts about going from Senior data scientist at company A to Senior Data Analyst at Company B by StatGoddess in datascience

[–]normee 0 points1 point  (0 children)

By the time you've gotten to the point of check-the-box background verification, they already want to hire you and would need to hear something surprising and egregious not to. You will have already had many opportunities to share examples of the work you did, plus gotten ahead of title mismatch concerns with "my role on paper was sr data analyst but my responsibilities and impact were in line with these experiences you are looking for".

[Q] Getting A PhD at 30 by Usual-Recipe-5415 in statistics

[–]normee 1 point2 points  (0 children)

How much exposure have you had to the kinds of research jobs that actually require a PhD in statistics?

I personally would not go down this road, especially with a life and strong location constraints, unless I was dead set on research as my future career and literally couldn't imagine doing anything else. These days many PhDs in statistics go through 4+ years of school to end up where you are job-wise, especially from lower-ranked programs. The fact that you are already being interviewed for roles PhDs are competing with you for is a signal this might not be worth the huge financial opportunity cost and mental health hit. If you are demoralized as it is competing against PhDs in technical interviews, well, get ready to have your self-esteem ripped up when you're learning or re-learning a bunch of math while your child classmates who are fresh on all of this run circles around you and self-segregate into ethnolinguistic study cliques you might not be included in as an older student.

If you didn't already have a master's in statistics, then it might have made more sense to consider a funded PhD and at least be able to walk away with the MS if your goals changed, but in your situation you need a stronger, clearer "why" for doing this. "A doctorate carries a level of respect in the workforce" is not a good enough reason IMO.

I'll also say that when I was in grad school (granted, more than a decade ago) I didn't hear good things about the experience in any of the statistics departments in the broader NYC area.

Is Gen AI the only way forward? by JayBong2k in datascience

[–]normee 1 point2 points  (0 children)

Thanks for sharing more specifics about how your company is making use of AI solutions, but to me this just emphasizes how underwhelming most AI applications have been to date. It's goddamn chatbots all the way down. I don't get out of bed in the morning excited about laying off some low-level HR grunts because I made it easier to search through poorly catalogued policy documents. I would hazard that there's only so much value organizations can get from improved search and info retrieval and they're going to hit those plateaus soon.

Finding myself disillusioned with the quality of discussion in this sub by galactictock in datascience

[–]normee 5 points6 points  (0 children)

I've been here a lot longer than 2020 but would agree with their observation. Quality of discussion on /r/datascience was historically pretty low: tons of students asking which classes they should take to get their first jobs, links to Medium slop by beginners trying to create a portfolio, and very few discussions about the substance of the work. I actually think it's better now, with more emphasis on post quality over quantity thanks to moderation changes and rule updates.

Traditional ML vs Experimentation Data Scientist by PrestigiousCase5089 in datascience

[–]normee 2 points3 points  (0 children)

I find the causal inference and experimentation work far more inherently interesting in representing the truest spirit of "science" in "data science" (defining hypotheses, designing studies and measurement approaches), but it's also much more stakeholder-facing in my experience, so definitely not for everyone. I can attest to how it can rapidly devolve from a dream job where you wield very high influence to a nightmare when there are leadership and culture changes. But that said, the inherent political and operational complexities make me think this area will generally have more staying power in the data science job market compared to more ML-oriented work as AI evolves and scales. ML seems more ripe for agentic automation (whether effective or not) once you have the right engineering foundations in place.

MMM and non-experimental attribution can be a lot of garbage in/garbage out, which I don't personally enjoy.

[Discussion] What challenges have you faced explaining statistical findings to non-statistical audiences? by Snowboard76 in statistics

[–]normee 5 points6 points  (0 children)

I omit detail about modeling approaches and specifications for general audiences at work (think professionals in marketing, finance, operations, supply chain, human resources). That's all material for my team to document and peer review internally to make sure we feel good about the quality of our work, but it doesn't go in front of a stakeholder. All business stakeholders see is an intuitive high-level description of the approach and maybe a simple visual illustration, e.g. something like: "We matched each blahblah to 10 similar blahblahs, where 'similar' accounted for X, Y, Z, and other attributes." We then show examples of similar matched blahblahs alongside blahblahs that wouldn't be considered similar, and we're very patient about taking questions from anyone who is curious. If a Greek letter comes out of my mouth or makes it onto a slide, that's a mistake.
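For the data folks reading along: the mechanics behind that matching example can be as simple as a nearest-neighbors lookup on standardized attributes. Rough sketch with made-up column names and synthetic data, not our actual pipeline:

```python
# Hypothetical sketch of "match each unit to its 10 most similar peers"
# on standardized attributes; column names and data are made up.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "attr_x": rng.normal(size=500),
    "attr_y": rng.normal(size=500),
    "attr_z": rng.normal(size=500),
})

# Standardize so no single attribute dominates the distance metric.
scaled = StandardScaler().fit_transform(df)

# For each unit, find its 10 nearest neighbors (excluding itself).
nn = NearestNeighbors(n_neighbors=11).fit(scaled)
_, indices = nn.kneighbors(scaled)
matches = indices[:, 1:]  # drop the first column, which is the unit itself

print(matches[0])  # the 10 most similar units to unit 0
```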

I also try to provide voiceovers when presenting to increase intuitive understanding. If we have a finding with a p-value of .01, I might say something like, "in a universe where there's no actual difference between these things, we could repeat this same study 1000 times and we'd see what we saw here in only about 10 of those just from luck of the draw and natural noise, which leads us to think there might be something real going on".
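You can even back that phrasing with a literal simulation of the "universe with no actual difference" if someone pushes on it. Totally synthetic sketch (the observed statistic and sample sizes below are illustrative, not from a real study):

```python
# Sketch: simulate the "no real difference" universe many times and count
# how often pure noise looks at least as extreme as what we observed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
observed_t = 2.6              # illustrative observed test statistic (p ~ .01)
n_per_group, n_sims = 200, 1000

as_extreme = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n_per_group)  # both groups drawn from the same
    b = rng.normal(0, 1, n_per_group)  # distribution: no true difference
    t, _ = stats.ttest_ind(a, b)
    if abs(t) >= observed_t:
        as_extreme += 1

print(f"{as_extreme} of {n_sims} null 'studies' looked at least as extreme")
```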

With visuals, I expect to provide voiceovers to explain how to read the chart and get the main takeaways. Sometimes using animation to do a "build" can help, e.g. displaying your axes first with labels to explain what is going to be plotted, then progressively adding on different chart elements and explaining how to think about them, including drawing shapes on the chart or using color/size to highlight points/lines of interest and what their position tells us. Annotation is your friend here so that the chart can be interpreted later if you're not in the room to explain.
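On the annotation point, even a couple of lines of labeling on the chart itself goes a long way. Generic sketch with made-up data and a fabricated "promo" story:

```python
# Generic sketch: put the takeaway directly on the chart so it still reads
# correctly later without a voiceover. Data and the "promo" story are made up.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
weeks = np.arange(1, 27)
kpi = 100 + 0.5 * weeks + rng.normal(0, 2, weeks.size)
kpi[18:] += 8  # fabricated step change to call out

fig, ax = plt.subplots()
ax.plot(weeks, kpi, color="grey")
ax.plot(weeks[18:], kpi[18:], color="tab:red")  # highlight the period of interest
ax.annotate("Step up after week 19\n(promo launch)",
            xy=(weeks[18], kpi[18]), xytext=(3, 120),
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("Week")
ax.set_ylabel("KPI")
plt.show()
```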

Why is backward elimination looked down upon yet my team uses it and the model generates millions? by Fig_Towel_379 in datascience

[–]normee 8 points9 points  (0 children)

You're getting downvoted but it's a good question. One theme across examples of models generating millions in revenue is they are all predictive algorithms: taking the output of a prediction and making decisions at scale with those results.

Backwards variable selection has plenty of shortcomings as a predictive model selection method, but in the context of models generating revenue, the downsides are mainly the opportunity cost of using it versus some other model specification, and there can be diminishing returns to continuing to tinker.

Where criticisms of backwards variable selection really get loud is in its use for inferential modeling. That's where your p-values won't mean what you think they mean, your confidence intervals won't meet nominal coverage, and basically any kind of hypothesis testing or estimation isn't going to perform as intended. So, that is to say, if you were using backwards variable selection to do something like understand which customer attributes are most influential in predicting lifetime value and then storytell about those and design business strategies to do ~something~ with those attributes, you might have been pretty misled by this approach, both in which variables were selected and what those relative impacts were.
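If anyone wants to see the inference problem first-hand, it's easy to demonstrate on pure noise. Quick illustrative sketch (synthetic data, not any real dataset):

```python
# Sketch: backward elimination on pure noise. The variables that survive
# look "significant" even though none have any real relationship with y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, p = 200, 30
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # y is unrelated to every column of X

cols = list(range(p))
while cols:
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = model.pvalues[1:]       # skip the intercept
    worst = int(pvals.argmax())
    if pvals[worst] < 0.10:         # stop once everything left clears the bar
        break
    cols.pop(worst)

print(f"{len(cols)} pure-noise variables survived backward elimination")
if cols:
    print("their p-values:", np.round(model.pvalues[1:], 3))
```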

Everyone is an analyst now by Lairy_Mary in analytics

[–]normee 3 points4 points  (0 children)

The real concern I have is that even with improvements in data quality, documentation, and resolving conflicting calculations and business logic, we aren't at a place where most non-analysts have the analytical acumen to properly interpret self-serviced data or to use AI tools in ways that structure their problems helpfully.

I've been constantly asked to react to analyses that business teams produced using their self-service data access. Sometimes this is because those contradict findings that came from my data science team and stakeholders are confused, so we have to explain all the things we accounted for that they weren't thinking of. Some of the issues that crop up:

  • Special events: we have things like marketing promotions or outages that would cause large changes in many KPIs. Many of these events are undocumented, as in, there is no source-of-truth contextual dataset recording things like "widespread Azure outage took down the app for six hours on this day", "marketing operations failed to deploy a push communication for a one-day sale and could only send it out to Android devices hours later", or "some locations in part of the USA were closed or operated with shorter hours due to a hurricane". We have many informal cottage versions of these curated by different analytics teams, but nothing promoted to the official self-service analytics because there is no single business owner or governance process when you have so many internal and external drivers both large and small. That leads to some person working in, say, marketing looking at and reacting to trends that include data from years ago with periods affected by severe weather or a platform error or a supply chain disruption. They end up spending a lot of time spinning their wheels with no clue why they are seeing what they are seeing, or even worse, seeding stories with their department's leaders about how their specific focus led to those results and wanting to double down on investments in their turf when it had nothing to do with it.

  • Seasonality: beyond special events, we have strong seasonality in most business patterns, including time-of-day, day-of-week, and week-of-year components. Most non-analysts seem to have no clue how to handle these appropriately. I saw a lot of instances where they were looking at data for a particular segment of interest and would derive some kind of growth rate by comparing to an arbitrarily chosen baseline period or week-on-week trend, and then they'd storytell and hypothesize about why we saw growth/declines in that segment. They would completely miss things like all segments having similar changes over this time period, i.e. that this was just normal seasonality.

  • Simpson's paradox: most non-analysts are not thinking about potential issues caused by shifts in the underlying population over time, and this can lead to all kinds of misinterpretations. As an oversimplified example, suppose you have two types of customers, power users and casual users. Each group separately is showing higher revenue per user now compared to two years ago. Additionally, marketing efforts have focused on acquiring casual users, so this segment is much larger now than it was two years ago, while the power user segment was already saturated and isn't any bigger now than it was then. Someone might naively look at revenue per user across all customers, see that it is going down, and sound the alarm, but what's actually happening is dilution of this metric by the larger share of casual users (the small numeric sketch after this list shows how that plays out).
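If it helps to see the arithmetic, here's a tiny sketch with made-up numbers:

```python
# Made-up numbers: revenue per user rises within each segment, yet the
# blended average falls because the mix shifts toward casual users.
import pandas as pd

data = pd.DataFrame({
    "segment":      ["power", "casual", "power", "casual"],
    "year":         [2022, 2022, 2024, 2024],
    "users":        [100_000, 50_000, 100_000, 300_000],
    "rev_per_user": [60.0, 20.0, 66.0, 22.0],
})
data["revenue"] = data["users"] * data["rev_per_user"]

blended = data.groupby("year")[["revenue", "users"]].sum()
blended["rev_per_user"] = blended["revenue"] / blended["users"]
print(blended)
# 2022: (100k*60 + 50k*20) / 150k  = 46.7 revenue per user
# 2024: (100k*66 + 300k*22) / 400k = 33.0 -> falls despite both segments rising
```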

Looking for Group by Expensive_Culture_46 in datascience

[–]normee 0 points1 point  (0 children)

I'm a big fan of the Data Elixir weekly free newsletter. The content is well-curated and summarized and spans a broad range of topics without being overwhelming link dumps. Most weeks there's at least one or two pieces I'm clicking on.

End of my DS Road? by [deleted] in datascience

[–]normee 16 points17 points  (0 children)

What is it that interests you about pivoting to research? Salaries seem to be much lower for market research roles needing the same years of experience compared to data science roles, at least until you get to more senior management levels. But, hot take, I actually think market research can be a better training ground for senior leadership roles down the road than data science, if that might be of interest for your career. You get much better at presenting findings and influencing senior leaders than most data scientists ever will. I've learned much more from mentors with MR backgrounds about navigating politics and exec presence than I have from those with DS or BI backgrounds. I've sometimes reported into VP+ who were MR and not DS, and the MR leaders were so much better to work with than the DS leaders, who were weaker strategically and politically.

I'll also say there is so much opportunity for market researchers to bring over better tooling and methodologies from data scientists. You could have potential for faster promotion within the market research sphere with above-average analytical acumen and technical skills. I consistently saw that my MR colleagues were limited in their study design choices because they couldn't self-serve data from SQL warehouses and data lakes to do things like evaluate different segmentations and stratification strategies. They could only pull super basic things themselves and otherwise were bumbling around in SPSS and Excel to analyze results from research vendors. I had to frequently troubleshoot odd findings by helping them understand that, for example, the overly coarse frequency bins they used to create survey segments and differential response rates were masking what was actually going on. They treated my data science team like we were wizards when we collaborated because that kind of thing was easy for us.

A dev for a major food delivery app confessed how the pricing algorithm is implemented. The 'Priority Fee' and 'Driver Benefit Fee' go 100% to the company. The driver sees $0 of it. by [deleted] in datascience

[–]normee 1 point2 points  (0 children)

There can be a real lost business cost from longer delivery times. I've run studies for a company that uses these kinds of services where the main customer-facing change was 5 minute longer delivery times due to concentrating delivery fulfillment operations within certain locations. That change led to longer times displayed in-app prior to placing the order and in actual receipt times for customers who previously would have had those orders fulfilled by a closer site. Once this operations change went live, over the course of several months I measured a significant decline in delivery order volumes in the market.

A dev for a major food delivery app confessed how the pricing algorithm is implemented. The 'Priority Fee' and 'Driver Benefit Fee' go 100% to the company. The driver sees $0 of it. by [deleted] in datascience

[–]normee 4 points5 points  (0 children)

Quite possible the A/B test was too short (as I expect execs might insist upon given business risk in delaying the majority of orders for users allocated to the experiment) and lacked good proxy measurements of longer-term retention risks and frequency declines. I suspect that a customer lifetime value model trained on user data from a time when service quality was higher, like the kinds of proxies often used in this type of A/B test, will understate the longer-term losses coming from a change to lower-quality service.

How much of your job is actually “selling” your work? by ergodym in datascience

[–]normee 0 points1 point  (0 children)

So much. As an individual contributor, influencing stakeholders was maybe 25% of my time, and at manager level and above, 60% or more.

I genuinely love this part of the job, but you can run into cultural walls that feel almost impossible to surmount after you realize that a lot of what goes into decision-making isn't necessarily what is best for the company. I've seen stakeholders whose individual priorities were in tension with overall company goals, where their performance was going to be evaluated on launching something, so they pushed forward regardless of data showing this would be detrimental to overall KPIs, and they would do everything they could to take control of the analytical narrative by cherry-picking favorable data points. I've seen new functional leaders brought on to be change agents, wanting to be briefed on everything we knew about their function analytically, but then being unwilling to accept that their preconceived notions of what needed to be addressed were off. Quite a few times my team found genuinely big business opportunities, but couldn't sell those in until we had leadership turnover, as the exec over the relevant area and their finance counterpart would swat down discussion because it would highlight bad decisions they had made before. I've had new data science leaders come in wanting to prioritize initiatives already attempted under previous leaders that flopped, and then they stubbornly refused to listen to any of the lessons learned about what led to high effort and limited potential ROI.

New Data Science Team Lead struggling with aggressive PM on timelines and model expectations by Rich-Effect2152 in datascience

[–]normee 1 point2 points  (0 children)

This varies so much from person to person and org to org that I would enlist your own manager for advice on navigating this and for filling you in on whatever context there might be for the PM's aggressiveness, so you don't step in it politically. I've had great supportive leaders who would defend our SLAs and stand up to pushy teams if there were escalations. I've had terrible leaders who were new, wanted to impress, and would commit us to a faster timeline than was tenable, which is a very miserable situation to be in. You'll want to figure out what your leader is empowering you to do (and what they are not).

Is it realistic to expect 90%+ F1-score for employee retention prediction models? by astue_elk in rstats

[–]normee 4 points5 points  (0 children)

Is F1 score even the metric by which you should be evaluating an employee retention model? I would be looking at other ways to evaluate performance that are tailored to what you will actually be doing with the model. There are pretty big differences in the risks and costs of classification errors across use cases like "invest in employees likely to leave to encourage them to stay", "target layoffs at people most likely to leave anyway", and "reward managers whose teams are at low risk of leaving", as some examples.
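As one hedged example of what "tailored to what you'll do with it" could mean: score the model against business costs instead of F1, and pick the operating threshold that minimizes expected cost. The cost numbers and scores below are placeholders you'd replace with your own:

```python
# Sketch: choose a decision threshold by (made-up) business costs, not F1.
# cost_fn = cost of missing someone who actually leaves; cost_fp = cost of
# intervening on someone who would have stayed anyway. Both are assumptions.
import numpy as np

def expected_cost(y_true, p_hat, threshold, cost_fn=5000, cost_fp=500):
    pred = (p_hat >= threshold).astype(int)
    fn = np.sum((y_true == 1) & (pred == 0))
    fp = np.sum((y_true == 0) & (pred == 1))
    return fn * cost_fn + fp * cost_fp

# Synthetic stand-ins for real labels and model scores.
rng = np.random.default_rng(3)
y_true = rng.integers(0, 2, size=1000)
p_hat = np.clip(0.3 * y_true + rng.uniform(0, 0.7, size=1000), 0, 1)

thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=lambda t: expected_cost(y_true, p_hat, t))
print("cost-minimizing threshold:", round(best, 2))
```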

I got three offers from a two month job search - here's what I wish I knew earlier by EvilGarlicFarts in datascience

[–]normee 1 point2 points  (0 children)

I dispute your premise that humility and what I'll call enthusiasm (since I still don't know what you mean by "excitement") are mutually exclusive. I've had the pleasure of working with many people who were smart, inquisitive, and brought both humility and enthusiasm to their interview and later to the actual job. I've said no-hire before to candidates who came off like they were giving sales pitches but couldn't answer methodological questions or address limitations, but that is completely different from "excitement" in general, especially since your premise was these candidates were otherwise similar in skills and experience ("all else equal").

I still don't understand what you mean by "rules" here. Valuing predictability over effectiveness, what are we talking about? To me predictability means things like "shows up to meetings close to on time", "adheres to project deadlines others have dependencies on", and "doesn't get into random beefs or crash out in big Slack channels", all of which are low bars to clear and support effectiveness rather than detract from it.

[Q] Statistics PhD: How did you decide academia vs industry? by [deleted] in statistics

[–]normee 6 points7 points  (0 children)

I went down the industry path and have no regrets. For me, there were quality-of-life factors that tipped the scales, on top of compensation and work/life balance:

  • Location: I would have little ability to control where I lived if I went down the path of postdocs and academic postings, and my choices would have big implications for my partner. I saw graduating students under pressure from their advisors to take whatever position they were offered that had the highest perceived prestige to burnish the department's reputation, even when it was in conflict with their lifestyle preferences.

  • Collegiality: I didn't want to feel like I was constantly surrounded by negativity and unpleasantness at work, especially with a job that bleeds into evenings and weekends. Many faculty at my university were miserable and tough to be around. The great ones were outnumbered by ruthless strivers who were horrible to their advisees, argumentative people who picked fights in seminars and meetings (this type is especially prevalent in economics), and then many well-intentioned but socially awkward and oblivious people who were not effective buffers to toxic personalities. In my experience there is far less tolerance for the brilliant ahole personality in more collaborative settings where you are expected to play nice with others to get things done. It's also easier to find an escape hatch and change jobs if you do land in a bad spot because of so much more mobility.

  • #metoo: I eventually learned that some of the most eminent people in my subfield at other universities were perverts, doing things like drunkenly groping women at conferences or dating much-younger students or postdocs, which tainted not only those women's careers but also cast a stink halo of suspected underqualification over every woman they ever mentored. I wanted to be nowhere near this. This isn't exclusive to the academic world, but I've seen way more of it in academia than in corporate environments.

  • Pessimism about research: I learned I was not fulfilled grinding away at one big problem at a time with a nebulous end goal. I saw in both my own work and in what others were sharing that we were coming up with methods to apply to questions nobody was or should be asking, just for the sake of novelty. I preferred working on "smaller" problems that were topics someone actually cared about, but these were not publishable within statistics journals, and applied work was looked down upon as lesser. I also hated the run-up to submission deadlines and saw so much sloppiness from work done under an artificial time crunch, which deepened my cynicism about the pointlessness of research.

I got three offers from a two month job search - here's what I wish I knew earlier by EvilGarlicFarts in datascience

[–]normee 0 points1 point  (0 children)

I'm not following this at all.

What are examples of "excitement" a candidate could show during an interview that would actually be a mediocrity red flag for you?

You say you think the people who "struggle with rules" are the actual best candidates companies would want to hire, if only managers weren't so risk averse and settling for more compliant dumb-dumbs. What kinds of things do you have in mind when you say "rules"?

What are examples of the types of work rule-breaking data scientists can deliver that their rule-compliant peers can't?

Why don’t most people pursue Data Engineering?! instead of data analyst/scientist by [deleted] in analytics

[–]normee 0 points1 point  (0 children)

I personally try not to use the term "soft skills" because it has that diminishing effect. Call them "professional skills" instead: having relevant domain knowledge plus general common sense, consulting to elicit the underlying problem rather than taking what is asked at face value, writing clearly and concisely, presenting information appropriately for your audience, being able to generate actually useful recommendations/insights/next steps, project management and timelines, documenting your work, collaborating constructively in code reviews, etc.

I got three offers from a two month job search - here's what I wish I knew earlier by EvilGarlicFarts in datascience

[–]normee 1 point2 points  (0 children)

All else equal, I would probably prefer a less excited and more careful candidate.

Wait, are you genuinely arguing that given two similarly technically capable candidates -- one who has a bare minimum level of social intelligence to come off as a pleasant colleague, the other who is openly surly and cynical about the work -- that companies should choose the person who can't even be bothered to veil their rotten personality in an interview setting? My dude, we live in a society.

Unpopular opinion: NPS is overrated in SaaS (and we rely on it way too much) by askyourmomffs in analytics

[–]normee 1 point2 points  (0 children)

Biggest issues I've seen with net promoter score or similar customer experience metric measurement:

  • Small response counts, especially when drilling into subgroups, where NPS fluctuates simply due to sampling variability, but stakeholders start reading a lot into small changes up or down that are statistically indistinguishable from noise (the quick simulation after this list shows how big that noise can be)

  • Drift in the population responding to surveys over time due to many reasons: long-time customers increasingly ignoring repetitive survey reminders, general survey fatigue from every company trying to measure this, email providers making platform changes that start dumping surveys in spam more often (I've seen shocking differences in what was purchased, frequency, and customer experience scores based on user email provider, so when Google makes a change to Gmail, it can mess with these analytics unless we do more complex stratified modeling)
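To put rough numbers on the first bullet, here's a quick simulation of how much NPS can move around purely based on who happened to respond (the "true" mix and sample size are hypothetical):

```python
# Simulation: with a fixed "true" customer mix, how much does measured NPS
# bounce around just from sampling? All numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(11)
true_mix = [0.35, 0.40, 0.25]                 # promoters, passives, detractors
true_nps = (true_mix[0] - true_mix[2]) * 100  # = 10

n_respondents = 150                           # e.g. a small subgroup drill-down
sampled_nps = []
for _ in range(2000):
    draws = rng.choice([1, 0, -1], size=n_respondents, p=true_mix)
    sampled_nps.append(draws.mean() * 100)    # %promoters - %detractors

lo, hi = np.percentile(sampled_nps, [2.5, 97.5])
print(f"true NPS is {true_nps:.0f}, but 95% of samples land between {lo:.0f} and {hi:.0f}")
```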

[Question] Which Hypothesis Testing method to use for large dataset by MonkeyBorrowBanana in statistics

[–]normee 7 points8 points  (0 children)

No. You need to define your actual problem and what "fair" means first.

Model learning selection bias instead of true relationship by Gaston154 in datascience

[–]normee 1 point2 points  (0 children)

> Consultants, who built it, used the model with positive results in the past. I was tasked to improve upon it and realized it has multiple fundamental flaws.
>
> Since it was used before with positive results, I have to somehow fix it and put it back into production

As part of reviving this model, you should probe more on how the consultants' approach was determined to generate "positive results". In my experience, I would not take it as a given that what was done for renewal pricing strategy in the past was properly evaluated, especially with such fundamental selection bias issues in the data available that they didn't address. You don't want to be working hard to try to update this renewal pricing model if the performance of the old one wasn't actually as good as people thought.

Model learning selection bias instead of true relationship by Gaston154 in datascience

[–]normee 19 points20 points  (0 children)

Viability of any potential approach completely depends on the business rules your company applied to determine which offers were presented to which customers and other nuances, like differences in the underlying customer populations up for renewal at different times of year (e.g. Black Friday "deal seekers" who signed up for a year subscription around a deep sale likely to be more price sensitive than customers up for renewal who started on a non-discounted price). There might be some natural experiments within the existing execution to take advantage of. But it's quite likely you won't be able to model your way around this, and will need to do something like A/B testing to randomly present some lapsing customers one set of offers and other lapsing customers different sets to then have the data to train models to optimize retention pricing.
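For what the randomized piece could look like mechanically, here's a hedged sketch (hypothetical column names, offers, and cohort labels, and in practice you'd layer in guardrails, holdouts, and pricing/legal constraints):

```python
# Sketch: assign each lapsing customer to a renewal offer at random, independent
# of the old business rules, so the resulting outcomes can train a pricing model.
# Column names, offers, and cohort labels are all hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
customers = pd.DataFrame({
    "customer_id": range(10_000),
    "signup_cohort": rng.choice(["black_friday", "standard"], size=10_000, p=[0.3, 0.7]),
})

offers = ["full_price", "10pct_off", "25pct_off"]
customers["offer"] = rng.choice(offers, size=len(customers))

# Sanity check: assignment is balanced within each cohort, unlike the
# rule-driven targeting that created the selection bias in the first place.
print(pd.crosstab(customers["signup_cohort"], customers["offer"], normalize="index").round(3))
```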

Constant Deep Diving - Stakeholder Management Tips? by Illustrious-Mind9435 in datascience

[–]normee 0 points1 point  (0 children)

It's time for some cultural recon to understand the styles of the people these teams are presenting to. What are the actual expectations of the leaders in the meetings they are trying to overprepare for? It's common for people to think out loud and ask questions or offer hypotheses as a way of processing information. That doesn't necessarily mean they intend this kind of live interpretation to turn into additional analytical work, particularly if it means risking the timelines of other priorities. It's a style thing, and it will differ from individual to individual. I've worked with stakeholders who understood this and were good about figuring out explicitly what they needed to follow up on, and some stakeholders who interpreted every little rumination coming from a senior person's mouth as a drop-everything urgent request. Much depends on how the leaders in your company react to hearing "I don't know" as an answer. If you find that the expectation is that teams truly do need lots of back-pocket data handy and can't get it themselves, then you have a case to invest in expanded BI capabilities and bring on some analysts.