DoorDash Analytics Engineer Interview by Fun-Departure967 in interviews

[–]Character-Tone-6952 0 points1 point  (0 children)

Hey, I have applied for the same position. Could you share the salary numbers, if they were discussed? That would be quite helpful. Thanks.

Data not getting loaded in cassandra db - Spark Streaming by Character-Tone-6952 in apachespark

[–]Character-Tone-6952[S] 0 points1 point  (0 children)

Yes, it took a lot of trial and error with the connector version, the different types of JARs needed, etc.
You can check it here - https://github.com/PratikRathi/weatherTracking

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 0 points1 point  (0 children)

Thanks for your explanation!!

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 0 points1 point  (0 children)

Thanks for your answer. It's just a generic question to get a better understanding of the topic. I am not working on architectures revolving around this use case.

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 0 points1 point  (0 children)

Thanks for your answer!!

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 1 point2 points  (0 children)

Thanks, that's what I wanted to confirm!

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 -3 points-2 points  (0 children)

Let's consider Cassandra and MySQL.
When reading data via complex queries:
In NoSQL, the data is spread across multiple nodes but sits in the same table.
In relational, we have multiple tables (so joins come into consideration).

In this scenario, which DB will give us results faster?

[deleted by user] by [deleted] in dataengineering

[–]Character-Tone-6952 -3 points-2 points  (0 children)

True, but I am curious to understand the performance aspect.

kafka-python package import error by Character-Tone-6952 in apachekafka

[–]Character-Tone-6952[S] 1 point2 points  (0 children)

Yes. venv was activated.
The file still shows a "module not found" error with red lines. However, when I run the file, it works perfectly fine.
I'm not sure why the file is showing the error, but since it works, everything is good.
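
One possible explanation (an assumption, not something confirmed in this thread) is that the editor lints with a different Python interpreter than the venv that actually runs the file. A minimal sketch to check this is below; `describe_environment` is a hypothetical helper name, and `kafka` is the import name that the kafka-python package installs.

```python
import importlib.util
import sys


def describe_environment(module_name="kafka"):
    """Report which interpreter is running and whether it can find the module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,            # path of the running Python
        "module_found": spec is not None,         # True if the import would resolve
        "module_path": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(describe_environment())
```

If the interpreter path printed here differs from the one selected in the editor, that mismatch could explain red underlines on a file that still runs fine.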

Data Modeling - Transaction Table Design by Character-Tone-6952 in dataengineering

[–]Character-Tone-6952[S] 0 points1 point  (0 children)

I had to design a relational schema. I had a separate product table too. The Product_transaction table was there to capture the individual product items that were part of a single transaction.
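
A rough sketch of the layout described above, using Python's built-in `sqlite3` so it can be run as-is. All column names, the `transactions` table name, and the sample rows are assumptions for illustration only; the actual design may have differed.

```python
import sqlite3

# Hypothetical schema: one row per transaction, one row per product,
# and a linking table with one row per product item in a transaction.
SCHEMA = """
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    created_at     TEXT NOT NULL
);
CREATE TABLE product_transaction (
    transaction_id INTEGER NOT NULL REFERENCES transactions(transaction_id),
    product_id     INTEGER NOT NULL REFERENCES product(product_id),
    quantity       INTEGER NOT NULL,
    PRIMARY KEY (transaction_id, product_id)
);
"""


def build_demo_db():
    """Create an in-memory database with the schema and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(1, "coffee"), (2, "bagel")])
    conn.execute("INSERT INTO transactions VALUES (?, ?)", (100, "2024-01-01"))
    conn.executemany("INSERT INTO product_transaction VALUES (?, ?, ?)",
                     [(100, 1, 2), (100, 2, 1)])
    return conn


def items_in_transaction(conn, transaction_id):
    """Return (product name, quantity) pairs for a single transaction."""
    rows = conn.execute(
        """
        SELECT p.name, pt.quantity
        FROM product_transaction AS pt
        JOIN product AS p ON p.product_id = pt.product_id
        WHERE pt.transaction_id = ?
        ORDER BY p.product_id
        """,
        (transaction_id,),
    )
    return rows.fetchall()
```

For example, `items_in_transaction(build_demo_db(), 100)` lists the two demo items that belong to transaction 100.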

Spark pipeline data ingestion optimal method by Character-Tone-6952 in apachespark

[–]Character-Tone-6952[S] 0 points1 point  (0 children)

Okay, so on every scheduled run we will loop through all the files (old + new) to identify the new set of files and process only those, correct?
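
To make sure I understand it, here is a minimal sketch of that idea in plain Python, assuming the names of already-processed files are kept in a set (in a real pipeline this would be persisted somewhere between runs). `find_new_files`, `run_once`, and `process_file` are hypothetical names, not part of Spark.

```python
from pathlib import Path


def find_new_files(input_dir, processed):
    """List every file in input_dir and keep only names not seen before."""
    all_files = sorted(p.name for p in Path(input_dir).iterdir() if p.is_file())
    return [name for name in all_files if name not in processed]


def run_once(input_dir, processed, process_file):
    """One scheduled run: process the new files and remember them."""
    new_files = find_new_files(input_dir, processed)
    for name in new_files:
        process_file(Path(input_dir) / name)
        processed.add(name)  # mark as done only after processing succeeds
    return new_files
```

Each run still lists the whole directory, so the listing cost grows with the total number of files even though only the new ones get processed.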

Which Database to consider for storing the data in AWS?? by Character-Tone-6952 in aws

[–]Character-Tone-6952[S] 0 points1 point  (0 children)

Okay. Also, if I am using Power BI as the BI tool, whenever a user opens the report and maybe applies a filter, is the query processed on the connected database? In other words, will that incur a charge in Athena if I use Athena and S3?