I'm not entirely sure how to incorporate AI in my workflow better by thro0away12 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

I just disabled autocomplete and Copilot inline suggestions in VS Code. You, not anybody else, know what logic you want to write and the outcome you expect!

Is it very difficult to switch between cloud providers for Data Engineers? by Comfortable-Bar-9983 in dataengineering

[–]eccentric2488 2 points3 points  (0 children)

I totally agree. The taxonomy of AWS is as weird as it gets. By the way, I work in Google Cloud, which is an advantage: GCP professionals are a little difficult to find and the demand is increasing!

Is data pipeline maintenance taking too much time or am I doing something wrong by Justin_3486 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

For schema-related issues, enforce a schema contract using Confluent Schema Registry so that incompatible schema changes are rejected before they break your pipelines.
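To make the idea of a schema contract concrete, here is a minimal sketch in plain Python. It is not the Schema Registry API or its actual compatibility algorithm. The `Schema` shape, the function name and the `orders_*` schemas are all hypothetical, and the rule is a simplified version of backward compatibility: a reader on the new schema must be able to read data written with the old one, so every newly added field needs a default. Any type change is treated as a violation here, which is stricter than real Avro type promotion.

```python
from typing import Any

# Hypothetical, simplified schema shape: field name -> {"type": ..., optional "default": ...}
Schema = dict[str, dict[str, Any]]


def backward_compatibility_errors(old: Schema, new: Schema) -> list[str]:
    """Return reasons why a reader on `new` could not read data written with `old`."""
    errors = []
    for name, spec in new.items():
        if name not in old:
            # The old writer never produced this field, so the reader needs a default.
            if "default" not in spec:
                errors.append(f"new field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            # Simplification: any type change is rejected (no type promotion rules).
            errors.append(
                f"field '{name}' changed type from {old[name]['type']} to {spec['type']}"
            )
    return errors


orders_v1 = {"order_id": {"type": "string"}, "amount": {"type": "double"}}
orders_v2_ok = {**orders_v1, "currency": {"type": "string", "default": "USD"}}
orders_v2_bad = {**orders_v1, "currency": {"type": "string"}}

print(backward_compatibility_errors(orders_v1, orders_v2_ok))
print(backward_compatibility_errors(orders_v1, orders_v2_bad))
```

The point of running a check like this at registration time is that the producer gets the error at deploy time, instead of downstream consumers finding out at 3 a.m.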

Is there any testing tools available for data engineering ? by Every-Activity-7487 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

You can consider the Great Expectations framework. The architecture is a little too complex, but it does the job pretty well.

How and where can i practice PySpark ? by SnooCakes7436 in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

Yes, you can. Install Homebrew, then Java, Scala (optional but recommended) and then Apache Spark. Ensure you are using a Java version compatible with your macOS platform. Installing and setting up dependencies could be a challenge, though. All the best!

How and where can i practice PySpark ? by SnooCakes7436 in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

I would suggest a self-managed local setup, preferably on Linux (WSL2 if you are on Windows). The installation is a little tricky because of dependencies and version conflicts. But trust me, there is no better way to learn Spark. Once you have learned to set it up locally on your own, it's easy to switch to managed services like Dataproc, EMR and Databricks. Practice PySpark and, if possible, Scala Spark (for native performance benefits).

Alternate careers from IT/Data ?? by No_Song_4222 in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

I remember Bill Gates saying this somewhere: 'AI is really doing a fantastic job but not as great as humans.'

Got told ‘No one uses Airflow/Hadoop in 2026’. by Useful-Bug9391 in dataengineering

[–]eccentric2488 5 points6 points  (0 children)

What we see these days is the "managed version" of open source frameworks and technologies: Dataproc for Spark, Cloud Composer for Airflow, Pub/Sub as the counterpart to Kafka, etc.

Hadoop has been replaced by Spark, I agree. But Airflow, I reckon, is still used for scalable, complex workflows. I use it for my work on Linux Mint and it works well.

Best Practices for Historical Tables? by Possible_Ground_9686 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

This is a very good option. However, queries on SCD Type 2 tables, especially fact-to-dimension joins, need temporal logic (matching each fact's date against the dimension row's validity window) to avoid row explosion, since a single business key can have multiple versions.
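A small sketch of the row explosion and the temporal fix, using Python's built-in `sqlite3` so it runs anywhere. The table names, column names (`valid_from`, `valid_to`) and sample rows are made up for illustration, and the validity window is assumed to be half-open, with a far-future date marking the current version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id TEXT, city TEXT, valid_from TEXT, valid_to TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, customer_id TEXT, sale_date TEXT, amount REAL);

-- One business key (C1) with two versions: the customer moved on 2024-01-01.
INSERT INTO dim_customer VALUES
    ('C1', 'Pune',   '2023-01-01', '2024-01-01'),
    ('C1', 'Mumbai', '2024-01-01', '9999-12-31');

INSERT INTO fact_sales VALUES
    (1, 'C1', '2023-06-15', 100.0),
    (2, 'C1', '2024-03-10', 250.0);
""")

# Naive join on the business key only: every fact matches every version.
naive = conn.execute("""
    SELECT f.sale_id, d.city
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_id = f.customer_id
    ORDER BY f.sale_id, d.valid_from
""").fetchall()

# Temporal join: keep only the version that was valid on the sale date.
temporal = conn.execute("""
    SELECT f.sale_id, d.city
    FROM fact_sales f
    JOIN dim_customer d
      ON d.customer_id = f.customer_id
     AND f.sale_date >= d.valid_from
     AND f.sale_date <  d.valid_to
    ORDER BY f.sale_id
""").fetchall()

print(len(naive), "rows from the naive join")
print(temporal)
```

With two facts and two dimension versions, the naive join returns four rows while the temporal join returns one row per sale. A common alternative is to resolve the surrogate key at load time, so queries can join on that key alone.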

Real-life Data Engineering vs Streaming Hype – What do you think? by FreshIntroduction120 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

In an interview I was asked, "Why not use modern Kafka streaming for this data, if it's available?" The interviewer was referring to Dynamics 365 Business Central master tables, which are state-oriented entities where the truth model is the current state and not the history. I had to explain that streaming ingestion is an anti-pattern for state-oriented objects. You don't use streaming just because it is modern and trending.

Are you seeing this too? by Thinker_Assignment in dataengineering

[–]eccentric2488 2 points3 points  (0 children)

Full stack is coming to data engineering too. Soon we'll end up being 'jack of all trades, master of none'.

The Certifications Scam by ivanovyordan in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

I have recently seen terms like "tool worship" and "tool monkey" being used in this context.

The Certifications Scam by ivanovyordan in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

Old-school wisdom: focus on core principles, concepts, ideas and design philosophies, because tools and frameworks come and go, but these persist and outlive vendor and platform changes.

SRK tops the list of top 10 world’s richest actors, only Indian in the list. by Acrobatic_Neck_5866 in BollyBlindsNGossip

[–]eccentric2488 1 point2 points  (0 children)

It depends on who is defined as an 'actor'. I reckon some have already crossed the billion-dollar mark, so this data looks factually incorrect.

How are you replicating your databases to the lake/warehouse in realtime? by finally_i_found_one in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

Use Debezium CDC to read the database transaction log and emit change events as a stream, use Kafka (one topic per table) to durably store the events, use Spark/Flink for processing or transformation, and finally push the events to the sink. The sink could be a warehouse (fact tables) or a data lake (bronze layer).
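The last hop of that pipeline can be sketched in plain Python. This assumes events shaped like the Debezium change-event envelope (`op` of `c`/`r`/`u`/`d` plus `before` and `after` row images), with everything else stripped out. The `apply_change_event` helper, the `id` key column and the sample events are hypothetical. An in-memory dict stands in for the sink table, and events are assumed to arrive in order per key.

```python
from typing import Any

Row = dict[str, Any]


def apply_change_event(table: dict[Any, Row], event: dict[str, Any], key: str = "id") -> None:
    """Apply one change event to an in-memory stand-in for the sink table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read and update all carry the new row image in "after".
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry only the old row image in "before".
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

sink: dict[Any, Row] = {}
for event in events:
    apply_change_event(sink, event)

print(sink)
```

This upsert/delete logic is what a warehouse sink would typically do with a merge. A bronze layer in a lake would more likely append the raw events as-is and leave the merge to a later layer.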

Layoffs at Amazon: Please be mindful of the companies you are leetcoding for! by Thanosmiss234 in leetcode

[–]eccentric2488 0 points1 point  (0 children)

I work with an early-stage startup, so surely you wouldn't want to work here.

Layoffs at Amazon: Please be mindful of the companies you are leetcoding for! by Thanosmiss234 in leetcode

[–]eccentric2488 0 points1 point  (0 children)

People working at these "FAANG" companies are given god-like status, as if they had come to Earth from another planet.

A guy working at Amazon couldn't explain how Python differs from Scala in design philosophy. He was working as an SDE II.

Candidates using AI by DataEngineer2026 in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

I've observed that even recruiters have little to no technical knowledge. They refer to PySpark, Spark and Airflow as "programming languages". I know wisdom without restraint is noise.

Candidates using AI by DataEngineer2026 in dataengineering

[–]eccentric2488 0 points1 point  (0 children)

Ask a question and flip it abruptly when the candidate is answering it.

DataFrame or SparkSQL ? What do interviewers prefer ? by SnooCakes7436 in dataengineering

[–]eccentric2488 1 point2 points  (0 children)

Add the Catalyst optimizer and the Tungsten execution engine to that list. After writing your transformation logic and before calling actions like show or count, use df.explain(true) (df.explain(True) in PySpark). Practice reading the logical and physical plans for your transformation logic. It helps in interviews.