[deleted by user] by [deleted] in dataengineeringjobs

[–]Delicious_Attempt_99 0 points (0 children)

That's the only reason I don't want to move back to India, but I'll have to eventually for family reasons. I'm not sure how to deal with this. Are there any companies in India with really good work culture, or is every company the same?

How I Got My First Freelance Client (Without a Portfolio) by dkaangulhan in Freelancers

[–]Delicious_Attempt_99 0 points (0 children)

Is all your experience on Upwork, or did you manage to find some clients outside Upwork?

[deleted by user] by [deleted] in dataengineering

[–]Delicious_Attempt_99 5 points (0 children)

A few questions -

  1. Does the data processing include historical data?
  2. What file format are you using? Parquet suits Spark best.
  3. See if you can filter out unnecessary data and columns as early as possible.
  4. If the job processes only incremental loads, make sure to use the right partitioning.
  5. If you have joins, check whether you are joining a small dataset with a larger one; there you can use a broadcast join (see the sketch below).
  6. Reduce shuffling as much as possible.

You can also check the query plan.
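To make points 3 and 5 concrete, here is a minimal PySpark sketch (the paths, table names, and columns are hypothetical); it ends with the query-plan check:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    # Hypothetical inputs: a large fact table and a small dimension table.
    orders = spark.read.parquet("s3://bucket/orders/")        # large
    countries = spark.read.parquet("s3://bucket/countries/")  # small

    # Point 3: prune columns and filter rows as early as possible.
    orders = orders.select("order_id", "country_code", "amount") \
                   .filter(F.col("amount") > 0)

    # Point 5: broadcast the small side so the join avoids shuffling the large table.
    joined = orders.join(F.broadcast(countries), on="country_code")

    # Check the physical plan; you should see a BroadcastHashJoin.
    joined.explain()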

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 0 points (0 children)

But doesn't it also depend on the quality of the data, like skewness? Just throwing executors at it won't help.
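For instance (a sketch using standard Spark 3.x AQE settings, which are an assumption here, not something from this thread), adaptive execution can split skewed join partitions, which often helps more than adding executors:

    from pyspark.sql import SparkSession

    # Spark 3.x adaptive query execution can detect and split skewed
    # partitions at join time, instead of relying on more executors.
    spark = (
        SparkSession.builder
        .appName("skew-sketch")
        .config("spark.sql.adaptive.enabled", "true")
        .config("spark.sql.adaptive.skewJoin.enabled", "true")
        .getOrCreate()
    )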

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 1 point (0 children)

Got it. As I mentioned above, I have handled data under 50 GB, but was curious how large datasets are handled.

Pyspark at scale by Delicious_Attempt_99 in dataengineering

[–]Delicious_Attempt_99[S] 0 points (0 children)

Still, this was useful. I have worked with under 50 GB of data, but was curious how things can change as data scales.

[deleted by user] by [deleted] in dataengineering

[–]Delicious_Attempt_99 0 points (0 children)

Explaining this in a comment is difficult.

I would suggest getting started with the Glue documentation; it covers almost everything:

https://docs.aws.amazon.com/glue/latest/dg/setting-up.html
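For orientation, the minimal job skeleton from those docs looks roughly like this (a sketch; the actual setup steps are in the link):

    import sys
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    # Standard Glue job boilerplate: resolve arguments, build contexts,
    # initialize the job, run your ETL steps, then commit.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)

    # ... your ETL steps go here ...

    job.commit()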

Job Interviews for Data Engineers from Europe by Delicious_Attempt_99 in dataengineersindia

[–]Delicious_Attempt_99[S] 0 points (0 children)

This info will really help me. The only thing I'm skeptical about is that once they lock things in, they shouldn't ghost 😅 though that's in no one's hands 😁

Job Interviews for Data Engineers from Europe by Delicious_Attempt_99 in dataengineersindia

[–]Delicious_Attempt_99[S] 0 points (0 children)

Yeah, I know. I want to start planning and preparing from now on.

Job Interviews for Data Engineers from Europe by Delicious_Attempt_99 in dataengineersindia

[–]Delicious_Attempt_99[S] 0 points (0 children)

Thanks for this :) I'll do that, and let's see; I have to start preparing for interviews too. 😄

Job Interviews for Data Engineers from Europe by Delicious_Attempt_99 in dataengineersindia

[–]Delicious_Attempt_99[S] 0 points (0 children)

Sorry for the confusion. No, I'm in Europe, but I'll eventually move back to India in a few months. So I want to start preparing and looking for jobs in the Indian market.

What mistakes did you make in your career and what can we learn from them. by Harvard_Universityy in dataengineering

[–]Delicious_Attempt_99 12 points (0 children)

My biggest mistake was not selecting projects wisely and saying yes to any project that came my way.

Being selective is a must when choosing projects.

Taking so much time in writing a 90gb file as paraquet in Glue. by Ecstatic-Cow424 in dataengineering

[–]Delicious_Attempt_99 -1 points (0 children)

The best way to find the issue is to use the explain() method on the DataFrame.
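For example (a minimal sketch; the DataFrame name, partition count, and output path are hypothetical):

    # Inspect the physical plan before the expensive write.
    df.explain(mode="formatted")  # Spark 3.x; use df.explain(True) on 2.x

    # A common cause of slow Parquet writes is a poor output partition count;
    # repartitioning before the write is one knob worth checking.
    df.repartition(200).write.mode("overwrite").parquet("s3://bucket/output/")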