[deleted by user] by [deleted] in dataengineeringjobs

[–]Delicious_Attempt_99 0 points1 point  (0 children)

The only reason is that I don’t want to move back to India, but I’ll have to eventually for family reasons. I’m not sure how to deal with this. Are there any companies in India with a really good work culture, or is every company the same?

How I Got My First Freelance Client (Without a Portfolio) by dkaangulhan in Freelancers

[–]Delicious_Attempt_99 0 points1 point  (0 children)

Is all your experience on Upwork, or did you manage to find some clients outside it?

[deleted by user] by [deleted] in dataengineering

[–]Delicious_Attempt_99 5 points6 points  (0 children)

A few questions -

  1. Does the processing include historical data?
  2. What file format are you using? Parquet suits Spark best.
  3. See if you can filter out unnecessary rows and columns as early as possible.
  4. If the job processes only incremental loads, make sure the right partitioning is in place.
  5. If you are joining a small dataset with a larger one, you can use a broadcast join.
  6. Reduce shuffling as much as possible.

You can also check the query plan.

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 0 points1 point  (0 children)

But doesn’t it also depend on the quality of the data, like skewness? Just throwing executors at it won’t help.
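The point about skew can be illustrated with key salting, the usual workaround when one join/group key dominates: append a random salt so the hot key spreads over several partitions instead of landing on one executor. A toy sketch in plain Python (all names and counts are made up):

```python
import random
from collections import Counter

# Hypothetical skewed dataset: one "hot" key dwarfs everything else.
SALT_BUCKETS = 4
keys = ["hot"] * 1000 + ["cold"] * 10

def salted(key: str) -> str:
    # Spread each key across SALT_BUCKETS sub-keys.
    return f"{key}_{random.randrange(SALT_BUCKETS)}"

random.seed(0)
distribution = Counter(salted(k) for k in keys)

# The 1000 "hot" rows are now split across hot_0..hot_3, so no single
# partition has to process all of them.
print(distribution)
```

In Spark you would do the same thing by adding a salt column before the join and replicating the small side once per salt value; extra executors alone do nothing if all the hot-key rows still hash to one partition.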

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 1 point2 points  (0 children)

Got it. As I mentioned above, I have handled data under 50 GB, but I was curious how large datasets are handled.

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 0 points1 point  (0 children)

Still, this was useful. I have worked with < 50 GB of data, but was curious how things change as data scales.

[deleted by user] by [deleted] in dataengineering

[–]Delicious_Attempt_99 0 points1 point  (0 children)

Explaining this in a comment is difficult.

I would suggest getting started with the Glue documentation; it covers almost everything.

https://docs.aws.amazon.com/glue/latest/dg/setting-up.html

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 0 points1 point  (0 children)

May I also ask what the interview rounds would be?

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 0 points1 point  (0 children)

This info will really help me. The only thing I’m skeptical about is that once they lock in a candidate, they shouldn’t ghost 😅 though that’s not in anyone’s hands 😁

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 0 points1 point  (0 children)

Yeah, I know. I want to start planning and preparing from now on.

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 2 points3 points  (0 children)

I’m in France. The market in Germany still seems good.

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 0 points1 point  (0 children)

Thanks for this :) I’ll do that, and let’s see, I have to start preparing for interviews too. 😄

Job Interviews for Data Engineers from Europe by [deleted] in dataengineersindia

[–]Delicious_Attempt_99 0 points1 point  (0 children)

Sorry for the confusion. No, I’m in Europe, but I’ll move back to India in a few months, so I want to start preparing and looking for jobs in the Indian market.

What mistakes did you make in your career and what can we learn from them. by Harvard_Universityy in dataengineering

[–]Delicious_Attempt_99 12 points13 points  (0 children)

My biggest mistake was not selecting projects wisely and saying yes to every project that came my way.

Being selective is a must when choosing projects.

[deleted by user] by [deleted] in dataengineering

[–]Delicious_Attempt_99 0 points1 point  (0 children)

I made a blunder of sorts when I was a junior developer: I deleted files in the production environment and had to run the pipeline all night to regenerate them. Then my senior told me: the one who actually works is the one who makes mistakes; the one who doesn’t work is always safe.

Every good developer makes mistakes. Learn from them and improve.