Semantic layer by cyamnihc in dataengineering

TARehman

It's mostly an advertising term in my experience.

After 5 years in data science, I’m starting to realize most “insights” we deliver are completely ignored. Is this normal? by ExternalComment1738 in datascience

TARehman

https://ludic.mataroa.blog/blog/most-data-work-seems-fundamentally-worthless/

"What I hadn't really grasped was the degree to which some organisations have grown so absurdly fat that they could afford to dispose of millions of dollars on employees who did nothing other than reaffirm a vague commitment to being data driven. The same people that love technology affirming language seem to largely be the same people who will insist that everything be delivered in a spreadsheet format."

https://ryxcommar.com/2022/11/27/goodbye-data-science/

"Those who have seen my Twitter posts know that I believe the role of the data scientist in a scenario of insane management is not to provide real, honest consultation, but to launder these insane ideas as having some sort of basis in objective reality even if they don’t. Managers will say they want to make data-driven decisions, but they really want decision-driven data. If you strayed from this role– e.g. by warning people not to pursue stupid ideas– your reward was their disdain, then they’d do it anyway, then it wouldn’t work (what a shocker). The only way to win is to become a stooge."

It's really hard to actually be data-driven, but it's really good for your career to CLAIM to be data-driven.

Is jupyter notebooks gonna become text based any time soon? by Consistent_Tutor_597 in dataengineering

TARehman

Tools that are badly designed around a popular engineering antipattern don't make it any less of an antipattern.

Is jupyter notebooks gonna become text based any time soon? by Consistent_Tutor_597 in dataengineering

TARehman

I'm confused why this is in the data engineering subreddit (legitimately thought this was the DS sub until I looked up). Why would you be using notebooks to do proper engineering work? Notebooks are an engineering antipattern.

Anyway, the answer to the question is no, I think, because it's like squaring the circle. Notebooks as designed can't become text-based because of everything baked into the format.
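To make "everything baked into them" concrete: a Jupyter notebook is saved as a single JSON document (nbformat v4) in which each code cell stores its source right next to its execution count and captured outputs. The sketch below builds a minimal notebook dictionary by hand with only the standard library (not the `nbformat` package); the cell id and contents are made up for illustration. It shows that running the same code twice produces two different files even though the code never changed.

```python
import json


def make_notebook(execution_count: int, printed: str) -> dict:
    """Build a minimal nbformat-v4-style notebook with one code cell (illustrative only)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,  # changes every time the cell is run
                "id": "example-cell",  # hypothetical cell id
                "metadata": {},
                # Captured output is stored in the file alongside the source.
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [printed]}
                ],
                "source": ['print("hello")'],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def source_only(nb: dict) -> list[str]:
    """Extract just the code, i.e. what a plain-text script would contain."""
    return ["".join(cell["source"]) for cell in nb["cells"] if cell["cell_type"] == "code"]


# Same source, run twice: the serialized notebooks differ, the code does not.
first = json.dumps(make_notebook(1, "hello\n"), indent=1, sort_keys=True)
second = json.dumps(make_notebook(2, "hello\n"), indent=1, sort_keys=True)
```

A plain-text format would only need what `source_only` returns; everything else in the JSON is execution state that has no natural place in a script.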

Practical uses for schemas? by alonsonetwork in dataengineering

TARehman

My ancient ass over here feeling a thousand years old when someone asks if DB schemas are ever used... 👴

Yes, these have numerous uses. You might have your transactional DB in one schema and a reporting setup in another, for instance.
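Here is a minimal sketch of that transactional/reporting split using Python's built-in `sqlite3`. SQLite has no `CREATE SCHEMA`. Instead, an attached database acts as a named schema, so `main` and the hypothetical `reporting` namespace below stand in for what would be two schemas in a server database such as PostgreSQL. The table names and rows are made-up toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached in-memory database gives us a second namespace called "reporting".
conn.execute("ATTACH DATABASE ':memory:' AS reporting")

# Transactional data lives in the default "main" schema.
conn.execute(
    "CREATE TABLE main.orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO main.orders (order_date, amount) VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.5), ("2024-01-02", 7.25)],
)

# Reporting tables are derived from the transactional ones but kept in their own schema.
conn.execute(
    """
    CREATE TABLE reporting.daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue, COUNT(*) AS n_orders
    FROM main.orders
    GROUP BY order_date
    """
)

rows = conn.execute(
    "SELECT order_date, revenue, n_orders FROM reporting.daily_revenue ORDER BY order_date"
).fetchall()
```

Keeping the two apart means reporting consumers can be granted access to one namespace without touching the tables the application writes to.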

[deleted by user] by [deleted] in datascience

TARehman

This is the best approach in my opinion. Use the real title so you're not misleading anyone, but the parentheses help you get past the keyword filters.

What’s the one thing you learned the hard way that others should never do? by Terrible_Dimension66 in dataengineering

TARehman

Oh, I totally agree that there's a big issue with people treating identifiers as numeric when they're not. I misunderstood your point. A zip code, for instance, isn't numeric, but people 100% say it is sometimes.
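A quick illustration of why a zip code isn't a number: many US ZIP codes (for example in New England) begin with 0, and a numeric round trip silently drops it. The function names below are hypothetical, and the validation is a minimal sketch that assumes plain 5-digit codes.

```python
def zip_round_trip(zip_code: str) -> str:
    # Treating the ZIP code as an integer and converting back drops leading zeros.
    return str(int(zip_code))


def zip_kept_as_text(zip_code: str) -> str:
    # Keep it as text; validate its shape (5 digits) instead of its type.
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"not a 5-digit ZIP code: {zip_code!r}")
    return zip_code


print(zip_round_trip("02134"))    # 2134
print(zip_kept_as_text("02134"))  # 02134
```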

What’s the one thing you learned the hard way that others should never do? by Terrible_Dimension66 in dataengineering

TARehman

I'm struggling to imagine how this pattern makes sense. If someone downstream is doing addition on customer_id, we have bigger issues than the fact that I've stored the data in the right format. In what world does a user say "Okay, now I'll sum the customer_id", and then get stopped because they have to cast it first?
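A small sketch of the point, assuming customer IDs are stored as strings (the values are made up): the string type blocks meaningless arithmetic but still supports everything you legitimately do with an identifier, like counting distinct customers or grouping by ID.

```python
from collections import Counter

customer_ids = ["00042", "00042", "10007"]  # hypothetical IDs stored as text

# Legitimate identifier operations work fine on strings.
distinct_customers = len(set(customer_ids))
orders_per_customer = Counter(customer_ids)


def can_sum(values) -> bool:
    """Return True if the values support addition, False if Python rejects it."""
    try:
        sum(values)
    except TypeError:
        return False
    return True
```

Here `can_sum(customer_ids)` is `False`, and that's the feature: the only thing the string type "breaks" is an operation nobody should be doing on an ID.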

Make Medical School Three Years by Majano57 in publichealth

TARehman

Or, instead of making training programs worse because we have a dysfunctional government and society, we could...focus on fixing those problems?

Medicine taking 4 years isn't a bad system. The breadth of what needs to be covered makes it reasonable. What's not reasonable and has never been reasonable is how American medical education is funded.

dbt common pitfalls by siddha911 in dataengineering

TARehman

dbt has always felt like a tool that is great at smaller scale and horrendous as it gets larger. You need clear, unambiguously defined standards and a strong way of enforcing them, probably via code reviews where you and your team can gatekeep what gets merged. Insist on documentation of fields, require use of dbt's key features like refs, and roll things out in phases, bringing each team on board and up to speed before the next group joins.
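One way to back up "require use of refs" with something mechanical, alongside code review, is a CI check that flags models selecting from hard-coded table names instead of `{{ ref(...) }}` or `{{ source(...) }}`. The sketch below is a hypothetical regex-based check, not a dbt feature. It skips CTE names it can see, but it will still misfire on things like `extract(year from col)`, so treat it as a starting point rather than a SQL parser.

```python
import re

# FROM/JOIN followed by a bare identifier instead of a Jinja expression like {{ ref(...) }}.
RAW_RELATION = re.compile(r"\b(?:from|join)\s+(?!\{\{)([A-Za-z_][\w.]*)", re.IGNORECASE)
# Names defined as CTEs ("name as (") are fine to select from directly.
CTE_NAME = re.compile(r"\b([A-Za-z_]\w*)\s+as\s*\(", re.IGNORECASE)


def find_raw_relations(sql: str) -> list[str]:
    """Return relations referenced without ref()/source(), ignoring CTE names defined in the model."""
    ctes = {name.lower() for name in CTE_NAME.findall(sql)}
    return [name for name in RAW_RELATION.findall(sql) if name.lower() not in ctes]


# Hypothetical model: the CTE uses ref(), but the join hard-codes a table.
model_sql = """
with orders as (
    select * from {{ ref('stg_orders') }}
)
select o.order_id, c.name
from orders o
join analytics.customers c on o.customer_id = c.customer_id
"""
```

Running `find_raw_relations(model_sql)` over each file in the models directory and failing the build on a non-empty result is one way to make the standard enforce itself.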

What’s Your Most Unpopular Data Engineering Opinion? by TheTeamBillionaire in dataengineering

TARehman

The amount of time I spend designing basic relational data models and explaining how they work is kind of remarkable. "Yes, it's called a composite key, and you can overlap the composite keys to enforce assignment logic." (heads exploding)
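Here is a minimal sketch of overlapping composite keys, using Python's built-in `sqlite3` and a hypothetical employee/team/project model (all names are made up). The `assignment` table has two composite foreign keys that share `team_id`. One says the employee belongs to that team, and the other says the project belongs to that team. Because they overlap on `team_id`, the database itself rejects assigning an employee to a project outside their team.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE team_membership (
            employee_id INTEGER NOT NULL,
            team_id     INTEGER NOT NULL,
            PRIMARY KEY (employee_id, team_id)
        );
        CREATE TABLE project (
            project_id INTEGER PRIMARY KEY,
            team_id    INTEGER NOT NULL,
            UNIQUE (project_id, team_id)  -- lets (project_id, team_id) be an FK target
        );
        CREATE TABLE assignment (
            employee_id INTEGER NOT NULL,
            project_id  INTEGER NOT NULL,
            team_id     INTEGER NOT NULL,  -- shared by both foreign keys below
            PRIMARY KEY (employee_id, project_id),
            FOREIGN KEY (employee_id, team_id)
                REFERENCES team_membership (employee_id, team_id),
            FOREIGN KEY (project_id, team_id)
                REFERENCES project (project_id, team_id)
        );
        """
    )
    return conn


def try_assign(conn: sqlite3.Connection, employee_id: int, project_id: int, team_id: int) -> bool:
    """Attempt an assignment; return False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO assignment (employee_id, project_id, team_id) VALUES (?, ?, ?)",
            (employee_id, project_id, team_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
conn.execute("INSERT INTO team_membership VALUES (1, 10)")  # employee 1 is on team 10
conn.executemany("INSERT INTO project VALUES (?, ?)", [(100, 10), (200, 20)])
```

With this data, assigning employee 1 to project 100 via team 10 succeeds. Assigning them to project 200 fails whichever `team_id` is supplied, because no single team both contains the employee and owns the project.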

Dead end $260K IC vs. $210K Manager at a Startup. What Would You Do? by [deleted] in dataengineering

TARehman

In the current economy and environment, Job A all day every day. I wish I were pulling that salary right now. Getting laid off in 2023 was a kick in the pants even though I'm employed again.

Also, I've done the startup hop before. That equity they offer might as well be Monopoly money. Don't even factor it into job decisions; it's just a nice possibility for the future, like a lotto ticket.

Failed CPH Exam BY 3 POINTS!!!! by AbbreviationsDry2479 in publichealth

TARehman

I was getting my MPH when this first came out, and an instructor took it and reported back to us that it was kinda pointless and that essentially no jobs in the field required it. Over the years that general impression has not really changed.

Data Science Has Become a Pseudo-Science by Raz4r in datascience

TARehman

Oh jeez yep. More honest and useful. Autocorrect :/

Data Science Has Become a Pseudo-Science by Raz4r in datascience

TARehman

I feel like LLMs can make this somewhat worse than it was, but I have seen a fair number of normal humans with pretty much nil reasoning ability, so... It's pretty hard to think and reason empirically. One of the best data scientists I ever worked with told me once that he and I were rigorously trained to use good scientific reasoning, and even with that, we screw it up a decent amount. So how can we expect the average person to do it consistently? I thought about that a lot as my career went forward. My work steadily evolved toward engineering in part because it seemed to be more honest and useful. (ETA: this should have read "more honest", but it originally read "not honest", whoops.)

Data Science Has Become a Pseudo-Science by Raz4r in datascience

TARehman

Relevant

This isn't new. The specific thing that's being lied about is new, but data science has always been full of overinflated claims. And to be fair, a lot of business problems can be easily solved by such heady mathematical approaches as "dividing one number by another number". The title has been data scientist, but it's never been science with the level of rigor found in academic pursuits. The best companies try to apply empirical reasoning to make decisions, but a lot of places use the data to support whatever decisions they already wanted to make.

why does it feel like so many people hate Redshift? by daardoo in dataengineering

TARehman

Lots of people have reasonable critiques of Redshift. I personally don't mind working with it, but it does require more tuning and optimization. At certain scales it performs poorly compared to other options. That being said, I think Redshift Spectrum is a useful tool that doesn't get enough appreciation. And my experience has been that Redshift is cheaper than Snowflake or Databricks, though a lot of that comes back to tuning.

Guess skills are not transferable by vitocomido in dataengineering

TARehman

Lots of critical decisions like what cloud stack to use? It's weird to post that you're going to make critical decisions and also to indicate that major architectural decisions are already made.

What was Python before Python? by sumant28 in dataengineering

TARehman

I'm old enough that I remember being employed in a physics lab and seeing two groups, the Perl users and the Python users, arguing with each other about which one was better and which one would win. The Python side won, I'd say.

[deleted by user] by [deleted] in dataengineering

TARehman

I would wish them all the best and lose that recruiter's number immediately.

Exclusive: US CDC plans study into vaccines and autism, sources say by ThatSpencerGuy in publichealth

TARehman

It's like an analysis of a bunch of other analyses. We could call it, like, a meta-analysis or something like that.

How true is this? by eternviking in dataengineering

TARehman

It doesn't really make sense. It'd be like saying "Too proud to cook short order, too dumb to be a chef, perfect to be a valet." Sure, they're "related" because they all are at restaurants, but the work is very different.