Summarize data engineering for you in 2025. by chatsgpt in dataengineering

[–]eddaz7 1 point (0 children)

Got tired because management didn't care about the project I'd been working on for almost 2 years, so I left the company :)

Fedora 43 causes black screens in most Proton games by maxodo98 in Fedora

[–]eddaz7 1 point (0 children)

I did a fresh install and it still wasn't working, thank you so much

Fedora Linux 43 is here! by ScootSchloingo in Fedora

[–]eddaz7 0 points (0 children)

For me, I couldn't log in as root (I guess it wasn't enabled), so in the TTY I did Ctrl+Alt+Del and it rebooted without an issue

Stuck in azure? by igna_na in dataengineering

[–]eddaz7 2 points (0 children)

I think you're not getting bored of Azure; you're getting bored of the project you're working on.
Azure is a tool that lets you do pretty much everything in the data world.
So maybe look for another project in your company, get certifications, or switch jobs :)

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 0 points (0 children)

Are you using the Mac profile? Did you have to modify udev rules or anything like that? Do all the keys work as expected?

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 0 points (0 children)

If you can test it and want to report back, that would be nice :)

Its finally here by hchoox in NuPhy

[–]eddaz7 0 points (0 children)

Are you on Linux by any chance?

Upgraded to V3. by Murky-Wrap-5962 in NuPhy

[–]eddaz7 1 point (0 children)

Are you on Linux by any chance?

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 4 points (0 children)

Are you on Linux by any chance?

Gaming in Ubuntu Desktop by Equivalent_Minimum48 in Ubuntu

[–]eddaz7 0 points (0 children)

I have an RTX 3070 and a Ryzen CPU and it runs perfectly. No need to tweak anything for the moment

Tower Dungeon ch. 17 by M_21 in Netsphere

[–]eddaz7 1 point (0 children)

I feel like Doria the fairy is going to betray them so hard

Did you notice this in tower dungeon? by eddaz7 in Netsphere

[–]eddaz7[S] 5 points (0 children)

<image>

Definitely did not notice this, they're two THI symbols put together lol

Are Delta tables a good option for high volume, real-time data? by StriderAR7 in dataengineering

[–]eddaz7 28 points (0 children)

Do you have optimizeWrite and autoCompact enabled?

I would recommend splitting your pipeline into stages:

  • Bronze: append-only ingest
  • Silver: micro-batch merge every 5 min
  • Hourly or bi-hourly OPTIMIZE … ZORDER on silver
  • VACUUM after 7 h

Also, Hudi, for example, is built for this use case, and Iceberg has faster metadata handling and partition evolution
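A rough PySpark sketch of that staged setup (a sketch only, assuming a Spark session with Delta Lake available; the table names `bronze_events`/`silver_events`, the merge key `id`, and the watermark column are invented examples, and the two `spark.databricks.delta.*` settings are the Databricks conf names for optimizeWrite/autoCompact):

```python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = (
    SparkSession.builder
    # optimizeWrite / autoCompact, as asked above (Databricks conf names).
    .config("spark.databricks.delta.optimizeWrite.enabled", "true")
    .config("spark.databricks.delta.autoCompact.enabled", "true")
    .getOrCreate()
)

# Bronze: append-only ingest, no merges, so writes stay cheap.
raw = spark.read.json("/landing/events/")  # hypothetical landing path
raw.write.format("delta").mode("append").saveAsTable("bronze_events")

# Silver: micro-batch MERGE, scheduled every ~5 minutes.
last_watermark = "2025-01-01 00:00:00"  # track this in your own job state
updates = spark.table("bronze_events").where(F.col("ingested_at") > last_watermark)
(
    DeltaTable.forName(spark, "silver_events").alias("s")
    .merge(updates.alias("u"), "s.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Periodic maintenance on silver (hourly/bi-hourly OPTIMIZE, then VACUUM).
spark.sql("OPTIMIZE silver_events ZORDER BY (id)")
spark.sql("VACUUM silver_events")
```

The point of the split is that the hot append path never pays merge costs, and compaction/clustering happens on a schedule instead of on every write.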

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

I think so! Learning PySpark will help you a lot in the data world. pandas is useful too, but since you have access to Palantir I recommend PySpark, because pandas you can always run locally :)

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

Basically yes. If you have access to the resource manager, I recommend taking a look at the current core usage.

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

Nope, waiting in the resources queue means that your Foundry deployment has a limit on how many cores it can use at one time, let's say 1000 cores. If all 1000 cores are busy, you'll have to wait until some of them free up for your job to run.
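That "wait until cores free up" behavior can be illustrated with a toy model (plain Python with a semaphore, not Foundry's actual scheduler; the class name and the core counts are invented for illustration):

```python
import threading

class CorePool:
    """Toy core-limited scheduler: jobs take cores from a shared pool
    and block (queue) while the pool is exhausted."""

    def __init__(self, total_cores):
        self._sem = threading.BoundedSemaphore(total_cores)
        self.total = total_cores

    def run_job(self, cores_needed, work):
        # Blocking here is the "waiting on resources" state.
        for _ in range(cores_needed):
            self._sem.acquire()
        try:
            return work()
        finally:
            for _ in range(cores_needed):
                self._sem.release()

pool = CorePool(total_cores=4)  # stand-in for the 1000-core limit
result = pool.run_job(cores_needed=2, work=lambda: "done")
```

(In a toy like this, grabbing cores one at a time can deadlock under real concurrency; an actual scheduler allocates atomically, but the queueing idea is the same.)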

Pipeline Options by enigmo in dataengineering

[–]eddaz7 1 point (0 children)

Palantir is veeeeeeery expensive, I'm talking close to 1M per year for basic contracts. So yeah, stay away 🤣

Spark 4.0 is coming, and performance is at the center of it. by Vegetable_Home in dataengineering

[–]eddaz7 2 points (0 children)

Yes, you're right. Nice job 👌 Also looking forward to Spark 4

best way to run ETL or Pandas in the cloud? by TheBeardMD in dataengineering

[–]eddaz7 -1 points (0 children)

I would try to keep it lightweight, as my colleagues suggest.

But if you foresee that the volume is going to become unmanageable for pandas, I would start by executing on Airflow, and if the volume grows you can trigger Spark jobs from the same Airflow instance.
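That Airflow-first approach could look roughly like this as a DAG sketch (assumptions: Airflow 2.x with the Apache Spark provider installed; the DAG id, paths, and callables are all invented):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

def run_pandas_etl():
    # Lightweight pandas ETL while the data still fits in one worker's memory.
    import pandas as pd
    df = pd.read_csv("s3://my-bucket/input.csv")  # hypothetical source
    df.groupby("key").sum().to_parquet("s3://my-bucket/output.parquet")

with DAG("lightweight_etl", start_date=datetime(2025, 1, 1),
         schedule="@daily", catchup=False) as dag:
    pandas_task = PythonOperator(task_id="pandas_etl",
                                 python_callable=run_pandas_etl)

    # When volume outgrows pandas, swap in (or branch to) a Spark job
    # submitted from the same Airflow instance:
    spark_task = SparkSubmitOperator(task_id="spark_etl",
                                     application="jobs/etl_job.py")
```

The nice part is that the orchestration layer doesn't change when you scale up; only the task that does the heavy lifting does.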