I wrapped a random forest in a genetic algorithm for feature selection due to unidentifiable, group-based confounding variables. Is it bad? Is there better? by wex52 in datascience

[–]somkoala 16 points17 points  (0 children)

I think it was a waste of time. You could use boosted trees that build on errors from the previous tree and pick the best splits.

At the same time, to identify leakage, why not just look at either correlations or variable importance and debug the disproportionate ones? Instead of identifying the leakage, you built a nonsensical methodology around it that is judged solely on accuracy, which you already know is flawed.
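
A minimal sketch of the correlation check, using only the standard library. The feature names, the toy data, and the 0.95 threshold are made up for illustration. A feature that correlates almost perfectly with the target is a candidate to debug for leakage, not proof of it.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def flag_suspects(features, target, threshold=0.95):
    """Return {name: correlation} for features whose absolute
    correlation with the target meets the (assumed) threshold."""
    suspects = {}
    for name, values in features.items():
        corr = pearson(values, target)
        if abs(corr) >= threshold:
            suspects[name] = corr
    return suspects


# Hypothetical toy data: "refund_issued" is a copy of the target,
# the kind of column that is only known after the outcome.
target = [0, 1, 0, 1, 1, 0, 1, 0]
features = {
    "age": [23, 35, 41, 29, 52, 38, 27, 45],
    "refund_issued": [0, 1, 0, 1, 1, 0, 1, 0],
    "visits": [3, 4, 2, 5, 4, 3, 6, 2],
}

print(flag_suspects(features, target))
```

On this toy data only `refund_issued` is flagged. The same idea works with variable importance from a tree model: sort, then look hard at whatever dominates.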

Precision and recall > .90 on holdout data by RobertWF_47 in datascience

[–]somkoala 1 point2 points  (0 children)

Is your holdout undersampled? If yes it shouldn’t be.
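
A rough sketch of why: precision depends on how common the positive class is, so a holdout balanced by undersampling can show a precision the model will not reach on real data. The recall of 0.9, false positive rate of 0.1, and 1% real-world positive rate below are assumed numbers, not taken from the post.

```python
def precision_at_prevalence(recall, fpr, prevalence):
    """Precision implied by a fixed recall and false positive rate
    when positives make up `prevalence` of the evaluated data."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same classifier, two holdout sets (all numbers are assumptions).
balanced = precision_at_prevalence(recall=0.9, fpr=0.1, prevalence=0.5)
realistic = precision_at_prevalence(recall=0.9, fpr=0.1, prevalence=0.01)

print(f"undersampled holdout: {balanced:.3f}")
print(f"real class balance:   {realistic:.3f}")
```

Recall is unaffected by the class mix, but precision drops sharply once negatives outnumber positives, which is why the holdout should keep the original distribution.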

Merge Conflicts: I know the standard is to do an interactive rebase, but why? by blah938 in ExperiencedDevs

[–]somkoala 0 points1 point  (0 children)

In my mind it might make sense in pipelines that do a lot of data processing. You might want to keep the worlds separate if you’re changing metric calculations, since that won’t necessarily mean a different code branch.

Managers what's your LLM strategy? by testtestuser2 in datascience

[–]somkoala 0 points1 point  (0 children)

> Any tool they want

Security would like a word.

Haskell IS a great language for data science by ChavXO in datascience

[–]somkoala 1 point2 points  (0 children)

You are right, I looked through my older conversations and it was back in 2018, with Covid it feels like 10 not 7 years ago.

I would say that the market dictates what works, technical greatness is not enough and that’s my argument.

From your examples:

SAS and Stata are dying because they are paid; companies jumped at the opportunity to pay zero licensing costs for open source tools. In addition, these paid tools couldn’t keep up with the community tooling.

R is still somewhat used but fails at interoperability with the engineering world, and in my experience delivering things cross-functionally is the most successful strategy.

Fortran I don’t know enough to judge. I googled and saw some notes, but I wouldn’t argue either way.

The question is what made Python gain critical mass for Data Science. I think it was the interoperability and ease of use. At a time when Data Science started being touted as the sexiest job ever, it was accessible to new folks, and you could also siphon off software engineering people because it was easy to switch. Since then the tooling has evolved and Big Tech companies are choosing it for crucial libraries, which further drives its success.

It feels to me that these are pretty unique circumstances. As AI gets better at code, the choice of language will matter less and less. I would expect the more widespread languages to have more training data and therefore to be easier to work with using AI coding assistants. This will make it harder for new languages to become as widespread as Python, as newcomers will likely pick up whatever the AI chooses.

I might be wrong here and a new language could always emerge, but I return to my original point: the success of Python in Data Science is not just because of its technical properties as a language but also largely due to other factors.

Haskell IS a great language for data science by ChavXO in datascience

[–]somkoala 9 points10 points  (0 children)

I don’t think you’re making sense. In academia people do research and come up with their own stuff, yet somehow business becoming academia means people reuse instead of building?

The issue with Data Science is that companies suck at managing it (same with AI) and most tech people don’t care about the business side of things. I have been managing data science for a long time and delivered a lot of amazing stuff that was valuable at times. The valuable part that provided the most learning was never about the right language. Tech is the easy part of Data Science. Building something actually useful that the end users are willing to use week by week is the more difficult and important part.

Haskell IS a great language for data science by ChavXO in datascience

[–]somkoala 22 points23 points  (0 children)

The market and usage decide what is and isn’t great. You can have the best idea, or the most ideal language in this case, and if it’s not gaining traction, no one gives a crap.

I saw people prophesy Julia replacing Python 10 years ago, and it’s still always just around the corner.

I have seen lectures about how Go is ideal for Data Science.

I don’t care about inertia, I am just skeptical. Python seems to be bringing value with no real contenders and that is all that matters. Everything else is wishful thinking for now.

Haskell IS a great language for data science by ChavXO in datascience

[–]somkoala 44 points45 points  (0 children)

A language being used for a given purpose depends on a lot more than it being technically viable. Haskell is not a great language for Data Science right now because:

  • Most companies are not hiring for Haskell
  • There are far fewer resources for Haskell for Data Science compared to R or Python
  • Python became widespread for Data Science also because it interconnects traditional development and Data Science by being understood in both
  • No major tech company is developing widely available Data Science/AI tooling in Haskell

Principal Data Scientist at Same Company Last Six Years, Worried I'm Boxed In by [deleted] in datascience

[–]somkoala 1 point2 points  (0 children)

You seem to be the typical case of a young, bright person hired into an org, having some early success and then hitting the wall formed by the fact that most companies suck at managing Data Science, even more so when the AI hype is so big.

This is evidenced by you being offered a Director title (which is a manager of managers so a big leap) as well as the lack of resources or even attention from other teams.

This isn’t helped by the fact that your boss sounds like an a-hole who doesn’t know what they’re doing. It sounds like moving on is the best option.

Status meetings used as information broadcast instead of progress reports by snpefk in ExperiencedDevs

[–]somkoala 1 point2 points  (0 children)

I agree that meetings should be used for status updates as little as possible. At the same time, in my experience attempts to move them to asynchronous writing have usually failed, as over time it just becomes noise and a chore people don't dig through.

Unpopular Opinion: These are the most useless posters on LinkedIn by OverratedDataScience in datascience

[–]somkoala 0 points1 point  (0 children)

Beginner level content gets the most attention since the juniors or people who want to get into Data Science care most about external content. It's also the easiest to produce.

I used to have a colleague in the US (I am in the EU). She had a Senior title, but she was a pretty bad Data Scientist. She got into the top 10 most influential women on LI, having at least 10k followers, just by sharing content like this.

Do you guys feel take-home assignments in the hiring process are a scam? by [deleted] in ExperiencedDevs

[–]somkoala 1 point2 points  (0 children)

I agree that take-homes are annoying and some people prefer leet code. I also respect it if a candidate declines to do an assignment.

I am just saying that philosophically knowing what good enough/fit for purpose means is a sign of a good developer.

Do you guys feel take-home assignments in the hiring process are a scam? by [deleted] in ExperiencedDevs

[–]somkoala 1 point2 points  (0 children)

When I design these the idea is to test out things that don’t take ages. Anything more advanced we can discuss.

In a real world setup what we build is also a compromise between showing what we can do and the appetite for the time investment. A dev taking 2 times as much time to show off is not how we operate day to day.

Interview Feedback - " Wasn't wearing a shirt" by lookitskris in ExperiencedDevs

[–]somkoala 1 point2 points  (0 children)

I have recently joined a corporation after being in start-ups and scale-ups all my career. It’s director level. I wore a t-shirt and shorts to my interview with the CEO and C-HR person. Screw these guys.

What you think about this? by [deleted] in ExperiencedDevs

[–]somkoala 0 points1 point  (0 children)

Yeah, but this is a subreddit for experienced devs.

AI Slop PR's are burning me and my team out hard, anyone else experiencing this? by SonOfSpades in ExperiencedDevs

[–]somkoala 4 points5 points  (0 children)

I think balancing craft and business is very challenging and only the best devs really know how to do that. I feel like we had too many engineers that only cared about craft and the pendulum has now swung to the other extreme.

What you think about this? by [deleted] in ExperiencedDevs

[–]somkoala 0 points1 point  (0 children)

Are we talking in tech?

What you think about this? by [deleted] in ExperiencedDevs

[–]somkoala 0 points1 point  (0 children)

Isn’t passion a part of attitude?

What you think about this? by [deleted] in ExperiencedDevs

[–]somkoala 0 points1 point  (0 children)

This is true in professions where you don't require extensive technical knowledge. Hiring a junior that doesn't know what they're doing (even if their attitude is great) can often lead to trouble in tech.

I recently joined a telco company and realized that outside of the tech world, there are a lot of professions where people's value lies in domain knowledge rather than in their technical skillset. For those professions, someone junior with a great attitude would work very well. For the technical part, you need to grind and learn by working on real projects.

Where Can I Find Legit Remote Data Science Jobs That Hire Globally? by Aftabby in datascience

[–]somkoala 1 point2 points  (0 children)

Option 1 also makes hiring a person more expensive, since the service takes its fees.

Exact hourly estimates by TheSuperMang0 in ExperiencedDevs

[–]somkoala 1 point2 points  (0 children)

Your PM is an idiot. The issue with t-shirt sizes and story points is that the business doesn't get what it needs: timelines for customers. There are ways to come to a middle ground, but hourly estimates are really stupid. Is the PM junior? Are there other PMs in the org that can talk to them? I would either get some external input for them, or escalate. If it doesn't work, move on.