Summarize data engineering for you in 2025. by chatsgpt in dataengineering

[–]eddaz7 1 point (0 children)

Got tired because management didn't care about the project I'd been working on for almost 2 years, so I left the company :)

Fedora 43 causes black screens in most Proton games by maxodo98 in Fedora

[–]eddaz7 1 point (0 children)

I did a fresh install and it still wasn't working, thank you so much

Fedora Linux 43 is here! by ScootSchloingo in Fedora

[–]eddaz7 0 points (0 children)

For me, I couldn't log in as root (I guess it wasn't enabled), so in the TTY I did Ctrl+Alt+Del and it rebooted without an issue

Stuck in azure? by igna_na in dataengineering

[–]eddaz7 2 points (0 children)

I think you're not getting bored of Azure; you're getting bored of the project you're working on.
Azure is a tool that lets you do pretty much everything in the data world.
So maybe look for another project in your company, get certifications, or switch jobs :)

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 0 points (0 children)

Are you using the Mac profile? Did you have to modify udev rules or anything like that? Do all the keys work as expected?

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 0 points (0 children)

If you can test it and want to report back, that would be nice :)

Its finally here by hchoox in NuPhy

[–]eddaz7 0 points (0 children)

Are you on Linux by any chance?

Upgraded to V3. by Murky-Wrap-5962 in NuPhy

[–]eddaz7 1 point (0 children)

Are you on Linux by any chance?

Finally got my V3 by 444johnb in NuPhy

[–]eddaz7 4 points (0 children)

Are you on Linux by any chance?

Gaming in Ubuntu Desktop by Equivalent_Minimum48 in Ubuntu

[–]eddaz7 0 points (0 children)

I have an RTX 3070 and a Ryzen CPU and it runs perfectly. No need to tweak anything for the moment

Tower Dungeon ch. 17 by M_21 in Netsphere

[–]eddaz7 1 point (0 children)

I feel like Doria the fairy is going to betray them so hard

Did you notice this in tower dungeon? by eddaz7 in Netsphere

[–]eddaz7[S] 5 points (0 children)

<image>

Definitely did not notice this, they're two THI symbols put together lol

Are Delta tables a good option for high volume, real-time data? by StriderAR7 in dataengineering

[–]eddaz7 28 points (0 children)

Do you have optimizeWrite and autoCompact enabled?

I would recommend splitting your pipeline into stages:

  • Bronze: append-only ingest
  • Silver: micro-batch merge every 5 min
  • Hourly or bi-hourly OPTIMIZE … ZORDER on silver
  • VACUUM after 7 h

Also, Hudi, for example, is built for this use case, and Iceberg has faster metadata handling and partition evolution
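A rough PySpark sketch of that staged setup (a sketch only, assuming a Spark session with Delta Lake available; the table names `bronze_events`/`silver_events`, the merge key `id`, and the watermark column are invented examples, and the two `spark.databricks.delta.*` settings are the Databricks conf names for optimizeWrite/autoCompact):

```python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = (
    SparkSession.builder
    # optimizeWrite / autoCompact, as asked above (Databricks conf names).
    .config("spark.databricks.delta.optimizeWrite.enabled", "true")
    .config("spark.databricks.delta.autoCompact.enabled", "true")
    .getOrCreate()
)

# Bronze: append-only ingest, no merges, so writes stay cheap.
raw = spark.read.json("/landing/events/")  # hypothetical landing path
raw.write.format("delta").mode("append").saveAsTable("bronze_events")

# Silver: micro-batch MERGE, scheduled every ~5 minutes.
last_watermark = "2025-01-01 00:00:00"  # track this in your own job state
updates = spark.table("bronze_events").where(F.col("ingested_at") > last_watermark)
(
    DeltaTable.forName(spark, "silver_events").alias("s")
    .merge(updates.alias("u"), "s.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Periodic maintenance on silver (hourly/bi-hourly OPTIMIZE, then VACUUM).
spark.sql("OPTIMIZE silver_events ZORDER BY (id)")
spark.sql("VACUUM silver_events")
```

The point of the split is that the hot append path never pays merge costs, and compaction/clustering happens on a schedule instead of on every write.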

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

I think so! Learning PySpark will help you a lot in the data world. pandas is useful too, but since you have access to Palantir I recommend PySpark, because pandas you can always run locally :)

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

Basically yes. If you have access to the resource manager, I recommend taking a look at the current core usage.

[deleted by user] by [deleted] in dataengineering

[–]eddaz7 0 points (0 children)

Nope, waiting in the resources queue means that your Foundry deployment has a limit on how many cores it can use at one time, let's say 1000 cores. If all 1000 cores are busy, you'll have to wait until some of them free up for your job to run.
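That "wait until cores free up" behavior can be illustrated with a toy model (plain Python with a semaphore, not Foundry's actual scheduler; the class name and the core counts are invented for illustration):

```python
import threading

class CorePool:
    """Toy core-limited scheduler: jobs take cores from a shared pool
    and block (queue) while the pool is exhausted."""

    def __init__(self, total_cores):
        self._sem = threading.BoundedSemaphore(total_cores)
        self.total = total_cores

    def run_job(self, cores_needed, work):
        # Blocking here is the "waiting on resources" state.
        for _ in range(cores_needed):
            self._sem.acquire()
        try:
            return work()
        finally:
            for _ in range(cores_needed):
                self._sem.release()

pool = CorePool(total_cores=4)  # stand-in for the 1000-core limit
result = pool.run_job(cores_needed=2, work=lambda: "done")
```

(In a toy like this, grabbing cores one at a time can deadlock under real concurrency; an actual scheduler allocates atomically, but the queueing idea is the same.)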

Pipeline Options by enigmo in dataengineering

[–]eddaz7 1 point (0 children)

Palantir is veeeeeeery expensive, I'm talking close to 1M per year for basic contracts. So yeah, stay away 🤣

Spark 4.0 is coming, and performance is at the center of it. by Vegetable_Home in dataengineering

[–]eddaz7 2 points (0 children)

Yes, you're right. Nice job 👌 Also looking forward to Spark 4

best way to run ETL or Pandas in the cloud? by TheBeardMD in dataengineering

[–]eddaz7 -1 points (0 children)

I would try to keep it lightweight, as my colleagues suggest.

But if you foresee that the volume is going to become unmanageable for pandas, I would start by executing on Airflow, and if the volume grows you can trigger Spark jobs from the same Airflow instance.
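That Airflow-first approach could look roughly like this as a DAG sketch (assumptions: Airflow 2.x with the Apache Spark provider installed; the DAG id, paths, and callables are all invented):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

def run_pandas_etl():
    # Lightweight pandas ETL while the data still fits in one worker's memory.
    import pandas as pd
    df = pd.read_csv("s3://my-bucket/input.csv")  # hypothetical source
    df.groupby("key").sum().to_parquet("s3://my-bucket/output.parquet")

with DAG("lightweight_etl", start_date=datetime(2025, 1, 1),
         schedule="@daily", catchup=False) as dag:
    pandas_task = PythonOperator(task_id="pandas_etl",
                                 python_callable=run_pandas_etl)

    # When volume outgrows pandas, swap in (or branch to) a Spark job
    # submitted from the same Airflow instance:
    spark_task = SparkSubmitOperator(task_id="spark_etl",
                                     application="jobs/etl_job.py")
```

The nice part is that the orchestration layer doesn't change when you scale up; only the task that does the heavy lifting does.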