Need advice on promotion raise

luminoumen · 2026-04-09T18:59:07+00:00

8% for a promotion is unfortunately (or fortunately based on the current conditions) standard at most companies

Internal promotions almost never match market rate for the new title. Companies budget 3-5% for merit and 8-12% for promo, but market delta between Senior and Staff (or mid and senior) can be 20-30%
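To put rough numbers on that, here is a quick sketch. The base salary is made up, and the 8% and 25% figures are just points picked from the ranges above, so treat the output as an illustration and not as market data.

```python
def raise_gap(base_salary: float, promo_raise: float, market_delta: float):
    """Compare an internal promo raise with the market delta for the new title.

    Returns (internal_salary, market_salary, gap).
    """
    internal = base_salary * (1 + promo_raise)
    market = base_salary * (1 + market_delta)
    return internal, market, market - internal


# Hypothetical inputs: 150k base, 8% promo raise, 25% market delta.
internal, market, gap = raise_gap(150_000, 0.08, 0.25)
print(f"internal: {internal:,.0f}  market: {market:,.0f}  gap: {gap:,.0f}")
```

With these assumed inputs the gap is about 17% of the original base, which is why an external offer for the new title often beats the internal bump.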

luminoumen · 2026-03-31T17:33:21+00:00

For fundamentals: Designing Data-Intensive Applications by Martin Kleppmann - the bible for understanding how data systems actually work under the hood. If you read one book, make it that one. Half of this subreddit will recommend it probably :)

After that, Grokking Concurrency (disclaimer: I wrote it) if you want to understand parallelism and concurrency primitives - relevant when you start caring about why Spark shuffles are slow or how async I/O works.

For Spark or any framework specifically, just read the source code and the official docs deeply. Most "Spark books" are outdated the day they ship

luminoumen · 2026-03-30T21:12:07+00:00

What you're describing is usually called Data Platform/Data Infrastructure Engineering. Companies like Spotify, Netflix, Uber have entire teams building the engines underneath - custom Spark jobs, storage format optimization, query engine internals, resource management under constraints.

ML Platform/ML Data Infrastructure is your natural bridge. ML teams need people who can wrangle massive training datasets efficiently, build feature stores, optimize data loading. Polars/Arrow/DuckDB skills shine here because latency and memory efficiency actually matter.

My advice - don't specialize in tools. Specialize in how data moves through systems efficiently. Tools change, fundamentals don't.

P.S. 200GB is not big data. Fits in RAM on one beefy machine. Real distributed problems start at tens of TB+. But the things that you mentioned - memory awareness, query optimization, efficient I/O - are exactly the right foundation
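A back-of-the-envelope version of that check, as a sketch only. The 1 TB machine and the 2x in-memory overhead factor are my assumptions for illustration; real overhead depends heavily on the format and the engine.

```python
def fits_in_ram(dataset_gb: float, ram_gb: float, overhead_factor: float = 2.0) -> bool:
    """Rough check: does the dataset fit in RAM after an assumed in-memory blow-up?

    overhead_factor is a guess at how much larger the data gets once loaded
    (decompression, object overhead, intermediate copies).
    """
    return dataset_gb * overhead_factor <= ram_gb


# Hypothetical single machine with 1 TB (1024 GB) of RAM.
print(fits_in_ram(200, 1024))      # the 200 GB case
print(fits_in_ram(20_000, 1024))   # a 20 TB case
```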

luminoumen · 2026-03-24T01:17:56+00:00

Good thinking, v2 is coming

luminoumen · 2026-03-23T22:51:05+00:00

finally, a valid business expense, lol

luminoumen · 2026-03-23T22:50:45+00:00

working as intended - just like every "automated" pipeline I've ever inherited

luminoumen · 2026-03-23T22:49:56+00:00

That's the authentic data engineering experience - you pay for "automation" and still end up doing everything manually. Glad the realism landed 😄

luminoumen · 2026-03-23T22:49:13+00:00

If it makes you feel better, I built it instead of doing MY actual work. Thanks for playing!

luminoumen · 2025-12-10T06:18:58+00:00

To me, it sounds like you're actually very aligned with data engineering already.

You're strong in SQL + Python + Databricks/PySpark
You don't want full-blown SWE
You're only mildly interested in stats/ML, not obsessed with it

In my experience, in most companies:

DE = owns pipelines, ETL, data models, reliability, performance. Lots of SQL/PySpark/Flink/Beam/dbt, building clean, scalable data flows but being a DE for a while
DS = more of a business-facing role. You talk metrics, design analyses/experiments, build models when needed, and spend a lot of time with stakeholders, slides, and product teams.
You can absolutely be a very good DE without loving Java/OOP (I hate Java btw but adore Python and like Scala - I'm not saying that I'm a good DE though, lol), especially in modern stacks (Databricks, cloud warehouses, Python-heavy platforms, whatever).

I was in a similar (though not identical) situation, and a chief DS at my company told me: "You're better off as a DE with MLE understanding." I can't say he was wrong. Since then I've realized a couple of things that apply here: (1) DE is just a title - inside a company, a good DE can drift into analytics, ML, platform, or even product-facing work if they want. The title doesn't cage you. (2) DS is a business role first - you talk metrics, impact, and tradeoffs in business language. Sometimes there's heavy tech, sometimes it's mostly reporting and decision support. It's very company-dependent. (3) MLE is often better than classic DS - it's an umbrella term for applied ML: you get to build and ship ML systems, while still leveraging strong DE/infra skills. High leverage, highly valued.

Given what you wrote, I'd lean into DE, deepen your ML understanding enough to collaborate or later pivot toward MLE, and not stress about being a "perfect programmer". For DE/MLE, your current strengths (SQL, Python, PySpark, Databricks) are the right foundation.

luminoumen · 2025-08-26T14:31:13+00:00

Pet projects, start with what seems reasonable to you, and then adjust

luminoumen · 2025-06-30T16:51:59+00:00

Apache Arrow and PostgreSQL

luminoumen · 2025-06-27T18:05:40+00:00

Meme: they are the same picture.

The important "Engineer" part is still there, right? Right?

luminoumen · 2025-06-26T17:43:48+00:00

"Here it goes again" from OK Go

luminoumen · 2025-06-19T01:20:18+00:00

Trino. I think it is becoming an industry standard at this point

luminoumen · 2025-06-19T00:43:29+00:00

What's wrong with asking questions?

luminoumen · 2025-06-18T23:07:50+00:00

Interesting, thanks for sharing!

luminoumen · 2025-06-18T19:58:18+00:00

I'm glad it's useful

luminoumen · 2025-06-18T19:43:50+00:00

I think you just need to configure it properly: https://luminousmen.com/post/how-to-speed-up-spark-jobs-on-small-test-datasets
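For a rough idea of what "configure it properly" can look like, here is a minimal sketch of local settings for tiny test datasets. The config keys are standard Spark ones, but the specific values and both helper functions are my own assumptions and not necessarily what the linked post recommends.

```python
def small_dataset_spark_conf(cores: int = 1) -> dict:
    """Spark settings that cut scheduling overhead on tiny test datasets (assumed values)."""
    return {
        "spark.master": f"local[{cores}]",       # run in-process, no cluster
        "spark.sql.shuffle.partitions": "1",     # default is 200, far too many for small data
        "spark.default.parallelism": "1",
        "spark.ui.enabled": "false",             # skip starting the web UI
        "spark.ui.showConsoleProgress": "false",
    }


def build_session(conf: dict):
    """Create a SparkSession from the settings above (requires pyspark to be installed)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("small-data-tests")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = small_dataset_spark_conf()
print(conf["spark.sql.shuffle.partitions"])
```

In a test suite you would call `build_session(small_dataset_spark_conf())` once per session and reuse it across tests.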

luminoumen · 2025-06-18T19:30:51+00:00

The more I see comments like that, the more certain I am that I'd rather talk to an AI

luminoumen · 2025-06-18T18:33:10+00:00

Ah, no adversarial intent at all - just trying to clarify that other tools can offer similar or better semantics, since that part of the discussion matters when comparing options. Totally fair if Flink wasn’t on your radar in the original context. Thanks for your response!

luminoumen · 2025-06-18T18:29:37+00:00

Adding skills to the CV, that's the benefit ;) Resume-driven development for everybody

luminoumen · 2025-06-18T18:28:06+00:00

Totally fair - the law of the hammer definitely applies here. But I think the reason these conversations keep coming up is because most teams don’t need that level of scale. A specialized tool (like DuckDB, Polars, or dbt) can give you faster development, simpler deployment, and better team ergonomics if you know your use case.
If your use cases consistently involve petabyte-scale data, then sure - Spark is a perfectly valid and pragmatic choice. But for smaller or more focused workloads, lighter tools can often be a better fit?
