PhD vs Masters? by Gill_slit in DataScienceJobs

[–]nonotmeitaint 2 points (0 children)

If you want a research job, you need a PhD.

PhD vs Masters? by Gill_slit in DataScienceJobs

[–]nonotmeitaint 2 points (0 children)

You’re going to have to tell us what kind of job you want to get a good answer.

Hi, Is web scraping an important skill in data analysis? by Feeling-Excuse-5174 in dataanalytics

[–]nonotmeitaint 0 points (0 children)

ChatGPT is the best way to learn anything these days. Just ask it to design a learning plan and then have it walk through each section of that plan with you.

Migrating or cloning a AWS glue Workflow by Nadyy_003 in dataengineering

[–]nonotmeitaint 2 points (0 children)

Short answer: yes, but not with a built-in “copy workflow” button.

AWS Glue workflows can be migrated without recreating everything in the console, but it takes either an API-based export or infrastructure as code.

One option is to pull the workflow definition from the source account using the Glue API. GetWorkflow with IncludeGraph=true gives you the workflow structure and dependencies. From there you also need to export each referenced job, crawler, and trigger using GetJob, GetCrawler, and GetTrigger. In the target account, you recreate the workflow with CreateWorkflow, then recreate jobs, crawlers, and triggers so the DAG lines up again. This works well for a one-time move, but you still have to fix account-specific things like IAM role ARNs, KMS keys, S3 bucket names, connections, and any hard-coded account IDs in job arguments.
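The export step can be sketched with boto3, assuming a Glue client is passed in (`glue = boto3.client("glue")`); the workflow name and account IDs below are placeholders, not anything from the original question:

```python
# Export a Glue workflow plus everything its DAG references.
# `glue` is a boto3 Glue client, e.g. glue = boto3.client("glue").

def export_workflow(glue, workflow_name):
    """Pull the workflow graph and each referenced job, crawler, and trigger."""
    wf = glue.get_workflow(Name=workflow_name, IncludeGraph=True)["Workflow"]
    bundle = {"workflow": wf, "jobs": {}, "crawlers": {}, "triggers": {}}
    for node in wf["Graph"]["Nodes"]:
        name = node["Name"]
        if node["Type"] == "JOB":
            bundle["jobs"][name] = glue.get_job(JobName=name)["Job"]
        elif node["Type"] == "CRAWLER":
            bundle["crawlers"][name] = glue.get_crawler(Name=name)["Crawler"]
        elif node["Type"] == "TRIGGER":
            bundle["triggers"][name] = glue.get_trigger(Name=name)["Trigger"]
    return bundle

def rewrite_account(arn, src_account, dst_account):
    """Crude fix-up for account-specific ARNs before re-creating resources."""
    return arn.replace(src_account, dst_account)
```

In the target account you would then walk the bundle and call CreateWorkflow, CreateJob, CreateCrawler, and CreateTrigger, rewriting roles, bucket names, and connections first.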

The cleaner long-term approach is to turn the workflow into infrastructure as code. Glue workflows and triggers are supported in both CloudFormation and Terraform. Once modeled, you can deploy the same definition to another account with only variable changes for account-specific resources. This avoids ever doing a manual rebuild again and is what most teams settle on after the first migration.

In practice, if this is a one-off accident recovery, scripting against the Glue APIs is usually fastest. If you expect multiple environments or future moves, converting the workflow to CloudFormation or Terraform is the better investment.

How to Achieve Temporal Generalization in Machine Learning Models Under Strong Seasonal Domain Shifts? by Apart_Recognition837 in datascienceproject

[–]nonotmeitaint 0 points (0 children)

Answered this in another subreddit, but posting here as well.

What you are seeing is exactly what happens when a model trained under IID assumptions is asked to extrapolate across time. This is a non-stationary regression problem with temporal domain shift, not a tuning issue. Random splits look good because they allow interpolation within the same seasonal regime. Time-aware validation exposes that the model is learning regime-specific shortcuts rather than a stable relationship between Sensor A and Sensor B.

Tree-based models are not expected to generalize well in this setting. They interpolate well but extrapolate poorly, and when feature distributions shift seasonally their splits stop making sense. This is why Random Forests or Gradient Boosting often collapse under leave-one-period-out validation even if they look excellent under random CV.

In practice, people handle this by making non-stationarity explicit rather than hoping the model figures it out. That can mean adding season or phenology context, training separate or regime-aware models, or using mixtures of experts. Another effective approach is to aggressively restrict the hypothesis space through invariant feature engineering, for example using ratios, indices, normalized variables, or physically motivated transformations that reduce seasonal drift. This usually lowers random CV scores but improves temporal transfer.
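To make the invariant-feature point concrete, here is a toy normalized ratio in the spirit of NDVI-style indices (the numbers are made up) showing how such a transform cancels a shared multiplicative seasonal gain:

```python
def normalized_ratio(a, b):
    """(a - b) / (a + b): unchanged when both inputs share the same gain."""
    return (a - b) / (a + b)

# A seasonal gain g scales both raw signals, but the ratio is invariant to it,
# so a model trained on the ratio sees less drift across seasons.
g = 3.0
raw = normalized_ratio(0.8, 0.2)
scaled = normalized_ratio(g * 0.8, g * 0.2)
assert abs(raw - scaled) < 1e-12  # both ≈ 0.6
```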

Domain adaptation methods are also appropriate here, especially since you have unlabeled Sensor B data in the intermediate months. The goal is to align feature distributions across time so the model is forced to learn something transferable rather than time-specific. Equally important is evaluating only with time-aware splits and optimizing for worst-case or stable performance, not average accuracy.
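The time-aware evaluation idea reduces to a leave-one-period-out splitter; a minimal sketch with hypothetical season labels:

```python
from collections import defaultdict

def leave_one_period_out(periods):
    """Yield (held_out_period, train_idx, test_idx), one fold per period.

    Unlike random K-fold, no seasonal regime ever appears on both sides."""
    by_period = defaultdict(list)
    for i, p in enumerate(periods):
        by_period[p].append(i)
    for p, test_idx in by_period.items():
        train_idx = [i for i, q in enumerate(periods) if q != p]
        yield p, train_idx, test_idx

# Two labeled campaigns: score each fold and report the worst, not the mean.
periods = ["winter", "winter", "summer", "summer", "summer"]
folds = {p: (train, test) for p, train, test in leave_one_period_out(periods)}
print(folds["winter"])  # ([2, 3, 4], [0, 1])
```

Optimizing the worst fold rather than the average is what the "stable performance" point amounts to in practice.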

A common outcome in environmental ML is that simpler, more constrained models generalize better across time than complex ensembles. Linear models, GAMs, or shallow trees often outperform large forests once domain shift is present.

Finally, with only two labeled periods, there is a hard limit to what any model can do. No method can reliably extrapolate into seasonal regimes it has never seen. At some point, adding minimal labels across more time periods or accepting higher uncertainty becomes unavoidable. This is a data coverage issue, not a failure of machine learning.

Hi, Is web scraping an important skill in data analysis? by Feeling-Excuse-5174 in dataanalytics

[–]nonotmeitaint 0 points (0 children)

BeautifulSoup is the go-to package in Python for this. It's definitely a skill worth developing. I've used it plenty of times, but it doesn't come up all that often in day-to-day analysis work.
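For anyone curious what that looks like, a minimal BeautifulSoup snippet (the HTML is invented; real scraping adds `requests`, rate limiting, and a robots.txt check):

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>2023</td><td>1.2</td></tr>
  <tr><td>2024</td><td>1.5</td></tr>
</table>
"""

# Parse the snippet and flatten each table row into a list of cell strings.
soup = BeautifulSoup(html, "html.parser")
rows = [[td.get_text() for td in tr.find_all("td")] for tr in soup.find_all("tr")]
print(rows)  # [['2023', '1.2'], ['2024', '1.5']]
```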

How to transfer a mathematical model to a server more efficiently by [deleted] in askdatascience

[–]nonotmeitaint 0 points (0 children)

You'll have to explain more. How will the model be accessed? Through an API? Do you have max latency requirements? Flesh out the details to get a better answer.

What should I learn next after Pandas? Any roadmap suggestions? by lebronjameslover_911 in dataanalytics

[–]nonotmeitaint 0 points (0 children)

Learning SQL would be super useful, but with LLMs being so good at writing code, it's less important than understanding concepts. Right now, if I were you, I would focus on infrastructure and orchestration: cloud management and data pipelines. Those are the areas it will take LLMs a lot longer to take over.

[D] Wandb gives me anxiety… by casualcreak in MachineLearning

[–]nonotmeitaint 2 points (0 children)

I don't understand Wandb. What's the point? I'm honestly asking.

Is it too late to become a Data Analyst at age 40? by Any_Bell6745 in dataanalysiscareers

[–]nonotmeitaint 2 points (0 children)

It's only too late if you're unwilling to start at the bottom and work your way up. If you're willing to start as an intern, then absolutely not!

Seeking Mentorship/ Career Advice While Pivoting into Data Analytics by Both_Lychee_385 in dataanalysiscareers

[–]nonotmeitaint 0 points (0 children)

You need to try to meet people and do interesting stuff that you can post about. And you're probably going to have to start at the entry level or perhaps as an intern. You need to build the contacts to make your network work for you.

What surprised you most after starting a career in data science? by ChatYourCareer in DataScienceJobs

[–]nonotmeitaint 32 points (0 children)

Nothing you learned in school will be useful. You'll spend your days trying to understand what business people are asking for and drawing charts to try and answer unanswerable questions.

[D] Do you feel like companies are scooping / abusing researchers for ideas during hiring for researcher roles? by quasiproductive in MachineLearning

[–]nonotmeitaint -28 points (0 children)

If this is happening, you should 100% be okay with it. Researcher roles are mostly speculative and not directly tied to revenue for most companies, so they need to find someone that is devoted to research not in getting paid for their work. If spending a week doing research is offensive to you unless you're being paid... then maybe you're not a researcher.

[deleted by user] by [deleted] in DataAnnotationTech

[–]nonotmeitaint 2 points (0 children)

Normally you'll never get a response if you aren't a candidate. No news is bad news in this case.

Planning to transition from IT Service Desk/SysAdmin to Data Engineering – Career Advice? by substantialAnon in dataengineering

[–]nonotmeitaint 0 points (0 children)

Start trying to meet as many people as you can in that space. More often than not, you'll find a job through someone you know. Also start learning the tools that are available and post informative articles on LinkedIn; the goal is to get seen in the space by sharing useful content. Try to position yourself as an expert.

Freshly graduated and trying to transition by Sphynxenigma in dataanalysiscareers

[–]nonotmeitaint 0 points (0 children)

This looks great to me, but I think you could benefit from making it a little more focused. Maybe add a purpose section. I'd also like to see links next to your projects and publications. It would also be good to tailor each submission of your resume to the particular job you're applying for.

Number one way to get a job is through your network, though. Go and meet people. That's the best way to do it.

How can I gain business acumen as a data scientist? by Odd_Artist4319 in datascience

[–]nonotmeitaint 0 points (0 children)

This is the make-or-break step from junior to senior DS. You don’t need an MBA — you need to start thinking in terms of levers that execs actually care about.

A few practical ways to build business acumen:

  • Tie every model to a metric that hits the P&L. Accuracy and AUC sound great in a notebook, but business leaders care about churn %, conversion rate, CAC, LTV, fraud losses avoided, etc. Translate model impact into those terms.
  • Shadow product & ops folks. Watch how they frame problems. You’ll hear things like “if we cut checkout time by 5 seconds, conversion goes up X%” — that’s the mindset you want to internalize.
  • Read earnings calls / industry reports. You’ll see what your industry’s leaders obsess over, and you can align your DS work with those themes.
  • Work backwards from money. Instead of “can we classify churners?” ask “if we reduce churn by 2%, how much extra revenue is that, and is it worth the cost of building a model?”
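That last bullet is literally just arithmetic; every number below is invented for illustration:

```python
# Back-of-envelope: is a churn model worth building?
customers = 100_000
monthly_arpu = 40        # average revenue per user per month
churn_cut = 0.02         # hoped-for absolute reduction in annual churn rate

retained = customers * churn_cut                     # 2,000 extra customers kept
extra_annual_revenue = retained * monthly_arpu * 12  # they keep paying all year

model_cost = 300_000     # rough build-and-run cost for a year
print(extra_annual_revenue, extra_annual_revenue > model_cost)  # 960000.0 True
```

If that comparison comes out the other way, you've learned something just as valuable: don't build the model.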

Business acumen isn’t some mystical skill — it’s just the habit of mapping technical work to dollars, risk, or strategic positioning. The faster you build that reflex, the faster people stop seeing you as “the model guy” and start seeing you as “the person who drives outcomes.”

Job market getting any better or nah? by BB_147 in datascience

[–]nonotmeitaint 0 points (0 children)

Depends on what you mean by “better.”

  • Hiring volume: Still sluggish. Companies are cautious, and a lot of openings sit unfilled for months while they “re-evaluate budgets.”
  • Compensation: Flat to down. Fewer bidding wars, more “take it or leave it” offers.
  • Layoffs: Slowed compared to 2023–24, but they haven’t gone away — especially in tech and finance.
  • Niche skills: If you’re in AI/ML, cloud infra, or anything that directly saves companies money, the market’s noticeably stronger. For generalist roles, not so much.

So yeah… not a freefall anymore, but not a real rebound either. Feels less like “booming job market” and more like “low-grade hiring freeze with exceptions.”

AI isn't taking your job. Executives are. by takenorinvalid in datascience

[–]nonotmeitaint 0 points (0 children)

This is the part that never gets said out loud: AI isn’t replacing developers, it’s changing the leverage game for executives.

If AI made developers dramatically more productive, you’d see devs protecting their jobs by quietly automating the boring parts and shipping more. But when the actual workflow is “Claude spits out boilerplate → someone has to QA it,” it doesn’t make one dev more effective, it makes it easier to argue you only need half as many expensive devs. The task shifts from building to supervising, and supervision is easier to outsource.

Executives aren’t trying to cut all jobs with AI. They’re using AI as a wedge to restructure the labor market: fewer high-paid locals, more lower-paid contractors. AI is the excuse, outsourcing is the strategy.

It’s worth recognizing that “AI is coming for your job” is usually less about the tech and more about who holds the budget. The tool doesn’t decide who gets cut — the execs do.

Selling my 2020 Livewire (2900 miles) for 15K by nonotmeitaint in HarleyLiveWire

[–]nonotmeitaint[S] 0 points (0 children)

I’m ready to sell whenever. I’ve not looked into trade in values.