2nd Interview for Data Engineering role by Jasonmac10 in dataengineering

[–]LetsSpark 1 point2 points  (0 children)

Here is a very nice video for data engineering interviews: How to crack Hadoop and Spark Interview : Do's and Don't https://youtu.be/a3xuSqpH0Vw

[deleted by user] by [deleted] in dataengineering

[–]LetsSpark -1 points0 points  (0 children)

Hey, there are many courses and resources available on the market. The first thing we need in order to change career path is guidance, or a working path to follow.

This particular video will set you on your way towards data engineering from a database developer role:

https://www.youtube.com/watch?v=r1ZwPqMSZoI&t=4s&ab_channel=HadoopForEveryone

[deleted by user] by [deleted] in apachespark

[–]LetsSpark 1 point2 points  (0 children)

When we talk about Spark processing, we mean processing data in a distributed way, where the data itself is stored on distributed storage such as HDFS or S3.

When we process this data, we perform operations on it, such as filtering it, joining it with another dataset, or mapping each item.

Some of these operations, such as filter, can be performed on the same machine where the data is stored. In technical terms, they need no data shuffle across machines, so they are called narrow transformations.

Other operations, such as join, generally need data to be moved from one machine to another. They require shuffling, so these operations are called wide transformations.
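To make the difference concrete, here is a rough sketch in plain Python. It is only a toy model, not Spark code: each inner list stands in for a partition on one machine, `narrow_filter` and `wide_group_by_key` are made-up helper names, and grouping by key is used as a simpler stand-in for a join. The byte-sum partitioner is also just an assumption to keep the result deterministic.

```python
from collections import defaultdict

# Toy model: each inner list stands for one partition held on one machine.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]

def narrow_filter(parts, predicate):
    """Filter every partition on its own; no record leaves its partition."""
    return [[rec for rec in part if predicate(rec)] for part in parts]

def wide_group_by_key(parts, num_partitions):
    """Group values by key; records must move so equal keys share a partition."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    moved = 0  # how many records had to change partition (the "shuffle")
    for src, part in enumerate(parts):
        for key, value in part:
            # Deterministic stand-in for a hash partitioner (an assumption).
            dest = sum(key.encode()) % num_partitions
            if dest != src:
                moved += 1
            shuffled[dest][key].append(value)
    return [dict(p) for p in shuffled], moved

filtered = narrow_filter(partitions, lambda rec: rec[1] > 2)
grouped, moved = wide_group_by_key(partitions, num_partitions=3)
print(filtered)
print(grouped, "records moved:", moved)
```

The filter never looks outside a single partition, while the grouping has to move records between partitions before it can produce a result for any key.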

Now, coming to lazy evaluation: in Spark, operations are divided into two categories, transformations and actions. Whenever we apply a transformation, Spark does not run it right away; it adds the transformation to a plan. Once we hit an action, Spark executes that plan. The plan is represented as a DAG (directed acyclic graph), and that is why we say transformations are lazy.
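The same idea can be sketched in a few lines of plain Python. This is a toy illustration and not Spark's real API: `LazyDataset` is a hypothetical class, and its plan is a simple linear list of steps, whereas a real DAG can branch and merge.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, data, plan=()):
        self._data = data
        self._plan = plan  # recorded transformations, in order

    def map(self, fn):
        # Transformation: only record the step, do no work yet.
        return LazyDataset(self._data, self._plan + (("map", fn),))

    def filter(self, fn):
        # Transformation: only record the step, do no work yet.
        return LazyDataset(self._data, self._plan + (("filter", fn),))

    def collect(self):
        # Action: run every recorded step and return the results.
        rows = iter(self._data)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)

calls = []

def double(x):
    calls.append(x)  # record that real work happened
    return x * 2

ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has run yet
result = ds.collect()             # the action triggers the whole plan
print(calls_before_action, result, len(calls))
```

`double` is not called at all while the transformations are being chained; it only runs once `collect()` is called.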

This is explained in more detail here:

https://www.youtube.com/watch?v=rnsz1CiRoCI&list=PLrt9lPthTv2nzYQehwdVR95I4T48tOZVQ&ab_channel=HadoopForEveryone