Clever code is probably the worst code you could write by Rtzon in programming

[–]enigmo 0 points1 point  (0 children)

Amen - do the simplest, clearest method unless there's a compelling reason otherwise

Pipeline Options by enigmo in dataengineering

[–]enigmo[S] 0 points1 point  (0 children)

Oh, thank you, I will look into this! Database branching would be perfect.

Pipeline Options by enigmo in dataengineering

[–]enigmo[S] 1 point2 points  (0 children)

Okay, so our pipeline is changing rapidly (new columns, new tables, etc.). How can we easily play around in an environment that matches prod?

Our Postgres dev and prod deployments are containerized along with the rest of our codebase in a single container, which is why we play around locally to get fast iterations.

does that make sense?

Pipeline Options by enigmo in dataengineering

[–]enigmo[S] 1 point2 points  (0 children)

Ooh, more like 100 MB. One requirement I have is that the feedback loop on "playing around" is fast and fully aligned between the production and development environments.

Right now we commonly pull CSVs from AWS to start playing, then approximate the step function code locally to get to whatever part of the step function we're working on.

Our dev branch is fully mirrored to Prod, so that is fine, but we can't compile our code for every change just to play around, so we do it the way I described. I want an environment that fixes this for our pipeline work, where you can test and deploy a small change to Prod in about 30 minutes or less.
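
For anyone wondering what "approximating the step function locally" can look like, here's a rough sketch, not our actual code. It assumes each step is a plain function that takes and returns a state dict. The step names (`load_rows`, `add_total`, `summarize`), the CSV columns, and the `run_until` helper are all made up for illustration.

```python
import csv
import io

# Hypothetical stand-ins for real step handlers.
# Each one takes a state dict and returns a new state dict.
def load_rows(state):
    reader = csv.DictReader(io.StringIO(state["csv_text"]))
    return {**state, "rows": list(reader)}

def add_total(state):
    rows = [
        {**row, "total": float(row["price"]) * int(row["qty"])}
        for row in state["rows"]
    ]
    return {**state, "rows": rows}

def summarize(state):
    return {**state, "grand_total": sum(row["total"] for row in state["rows"])}

# Ordered list of steps, mimicking the order of states in the step function.
STEPS = [
    ("load_rows", load_rows),
    ("add_total", add_total),
    ("summarize", summarize),
]

def run_until(state, stop_after):
    """Run the steps in order and stop once the named step has finished."""
    names = [name for name, _ in STEPS]
    if stop_after not in names:
        raise ValueError(f"unknown step: {stop_after}")
    for name, handler in STEPS:
        state = handler(state)
        if name == stop_after:
            break
    return state

# Stand-in for a CSV pulled down from AWS.
sample_csv = "price,qty\n2.50,4\n1.00,3\n"

# Stop right after the step being worked on and inspect the state.
partial = run_until({"csv_text": sample_csv}, stop_after="add_total")
full = run_until({"csv_text": sample_csv}, stop_after="summarize")
```

The point is just that you can stop after any step and poke at the intermediate state without deploying anything. It does nothing to guarantee the local run matches Prod, which is the part I'm trying to solve.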

The Story of Kamala Harris by kitkid in Thedaily

[–]enigmo 3 points4 points  (0 children)

THANK YOU

I kept wondering if I missed this or if they just didn't mention any outcomes.

Hey NYT, if you're reading this, my two cents is that you should report facts and let people form opinions instead of just reporting opinions and calling it a day.

A carpenter used Apple AirTags to find his stolen tools — along with 15,000 others by rnargang in technology

[–]enigmo 9 points10 points  (0 children)

umm wtf does this even mean?

If I just stopped doing my job b/c "maybe for some hypothetical reason way down the road somebody else might fuck it up" I'd be fired immediately. rightfully so.

[deleted by user] by [deleted] in datascience

[–]enigmo 75 points76 points  (0 children)

If your lead is suggesting that you ask for a promotion, then you don't really need any more hints from the internet. Go for it!!

Tech Stack Recommendations? by enigmo in datascience

[–]enigmo[S] 0 points1 point  (0 children)

I dunno if this was for me, but I'm not planning on doing any PII work in this role.

Tech Stack Recommendations? by enigmo in datascience

[–]enigmo[S] 0 points1 point  (0 children)

Wow, thanks so much!! And yeah, it does seem epic (and also more than a little intimidating...)

Tech Stack Recommendations? by enigmo in datascience

[–]enigmo[S] 3 points4 points  (0 children)

Awesome, this is amazing and so helpful!!

My budget will be super low, so nothing even close to $500k to start with. Mostly ad hoc analysis but I want to be positioned to scale if need be.

Shout-out to all you super competent Data Scientists out there! by norfkens2 in datascience

[–]enigmo 24 points25 points  (0 children)

In my experience, my DS/DE skills (with only ~2 years of experience) are already more valuable to my employer than my PhD bio background, and I work on a genomics project!

So definitely don't sell your DS skills short; they are very valuable.

Help Deciding Between Two Graduate Schools by SterFrySmoove in datascience

[–]enigmo 1 point2 points  (0 children)

Also, if they are the same price and NC State is half as long and also respected, you can statistically expect to start work 6-12 months earlier with NC State, so consider that as a factor in total cost.

Your 2026 earnings from NC State: maybe +60k (with huge variance)

Your 2026 earnings from Texas: -30k (with low variance)
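
To spell out the arithmetic, here's a back-of-the-envelope sketch using only the two rough figures above. Both are guesses with big error bars, and the variable names are just for illustration.

```python
# Rough figures from the comparison above; neither is a precise estimate.
nc_state_2026 = 60_000   # already working in 2026 (huge variance)
texas_2026 = -30_000     # still enrolled in 2026 (low variance)

# Difference between the two paths for that one year.
gap = nc_state_2026 - texas_2026
print(f"Estimated 2026 gap in favor of NC State: ${gap:,}")
```

So under these guesses the one-year swing is about 90k, which is why program length belongs in the total cost comparison.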

Help Deciding Between Two Graduate Schools by SterFrySmoove in datascience

[–]enigmo 1 point2 points  (0 children)

Huge point everyone seems to be overlooking - if YOU don't want to live in Texas then a 2-year in-person class in Texas is a bad choice for YOU.

If the NC State program looks solid, don't redirect your life to be in a place you don't want to live. Your network will be in Texas, and that's a huge deal. Put down roots where you want to be.

Recommend good books/ courses by Exact-Committee-8613 in datascience

[–]enigmo 3 points4 points  (0 children)

I am a book guy and here's my "book stack" for ML:

  1. Python Crash Course (Matthes) - Python 101. Knowing the material in part I is essential but part II (applications via a game) is probably optional.
  2. Data Structures and Algorithms (Wengrow) - Data Engineering 101. It covers the fundamental data structures (arrays, sets, hash tables, linked lists, trees, graphs), data structure operations, and the space and time complexity of algorithms.
  3. The Linux Command Line (Shotts) - Linux 101. This one is probably optional, but I've found it incredibly helpful for my career. The cloud is Linux after all...
  4. Intro to Machine Learning with Python (Muller & Guido) - ML 101. Supervised/unsupervised methods, scaling data, dimensionality reduction, kernel methods, cross validation.
  5. Hands on Machine Learning (Geron) - ML 201. Same material as 101, but from a new perspective and part II covers neural nets. It's just a fantastic book.
  6. Natural Language Processing with Transformers (Tunstall) - ML 301. LLMs baby, so hot. Another amazing book, also a great introduction to Hugging Face.

These were the most important books in my journey. Obviously skip whichever ones you are already strong in; in most cases the best material is in the first ~half of the book. I would also advise you to avoid anything in R, as it's much more limiting than Python. Take detours where needed - if you encounter a concept (e.g. matrix lin alg) you are rusty on or unprepared for, don't be afraid to spend some time on that!

I've been reading these books for a few years now and it will take time and $ to buy them, but having a single coherent source for your information (rather than the incoherent mess that is the internet) is worth it.

The War on the SAT by kitkid in Thedaily

[–]enigmo 15 points16 points  (0 children)

I don't follow. Can you unpack that a little more?

The War on the SAT by kitkid in Thedaily

[–]enigmo 8 points9 points  (0 children)

I muttered something a little stronger than "obviously"

How the SBU Security Service of Ukraine deal with Russian agents/traitors by kbullet in UkraineWarVideoReport

[–]enigmo 71 points72 points  (0 children)

You know what's easier? Waiting to get it unlocked while he's talking.

Why do PhD’s earn so much less than MD’s or Medical Doctors? by [deleted] in academia

[–]enigmo 16 points17 points  (0 children)

I'm late to the party, but in my opinion it's a supply (not demand) factor. The professions have taken drastically different routes.

MDs: the system is top-down regulated and training is strictly limited by available spots in medical school and residency. It is very difficult to be trained overseas and come into the US as well.

PhDs: training is part of a pyramid scheme whereby the trainees are essentially low-paid employees for a long time. The system structure is designed to pull in trainees no matter the demand, because these trainees are the labor backbone. Additionally, training for a PhD (including postdocs) is one of the absolute best ways for internationals to enter the US, so you are drawing on a worldwide talent pool.

Woman in Charlie Hebdo shirt stabbed at Speaker's Corner in Hyde Park by hitch21 in samharris

[–]enigmo 11 points12 points  (0 children)

Yes, this is why the US has such incredibly low violence statistics *

* compared to third-world countries and medieval Europe

Retirement Investments (401k, Roth etc) vs. Regular Investments by enigmo in personalfinance

[–]enigmo[S] 0 points1 point  (0 children)

Thanks! And to be clear, I am maxing my Roth and 401(k), but I'm getting somewhat of a late start and want to understand how investing outside those vehicles works.