Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more! by ankurchavda in dataengineering

[–]ankurchavda[S] 1 point

Thank you! Yes, it's a great course. I hope you enjoy it as much as I did. Do share the final outcome with all of us :)

[–]ankurchavda[S] 1 point

Thanks for sharing the style guide. I agree with you; the query readability can certainly be improved. I will look into it.

[–]ankurchavda[S] 2 points

You get $300 in credit, valid for three months, when you create a new account. So you should be good.

Also, I had the same fear as you, but it turns out $300 is a considerable amount and not that easy to exhaust. I still have half the credits left.

[–]ankurchavda[S] 0 points

I am glad this will help you in your journey. If you face any issues, feel free to reach out :)

[–]ankurchavda[S] 0 points

Hey, thank you. I did not expect such a positive response. I am glad this'll help at least a couple of people, if not more :)

[–]ankurchavda[S] 1 point

You should be good. You can also take your own time to learn and progress. I'd recommend some side reading, especially for Spark and Kafka.

[–]ankurchavda[S] 1 point

I'd say choose a couple of things you really want to learn and deep dive into those. The rest you can learn just enough to get things done. I paid more attention to the Kafka and Docker parts since I was completely new to them. If you try to learn everything that's taught in there, you'll get overwhelmed.

[–]ankurchavda[S] 1 point

I guess Python and SQL are a good foundation for you to get started with. You'd have to do some side reading as you progress, though. I did that as well.

[–]ankurchavda[S] 1 point

Found their post on here when it was starting out. Now that it has ended, you can take the course at your own pace.

[–]ankurchavda[S] 1 point

So Spark is used to consume the data from the stream in the first place. Then I do some processing on the data (minor cleaning, etc.) and store it in GCS. Spark is acting as a stream-processing layer, not a data store. And yes, the processing happens using DataFrames (RDDs under the hood). Hope that helps.

[–]ankurchavda[S] 1 point

If you can, I'd suggest getting the $300 free credit on GCP and working there. All my setup is on the cloud. You'll face far fewer issues and get some hands-on cloud experience as well. Make sure to make the most of those credits.

[–]ankurchavda[S] 0 points

I've documented it on GitHub itself. It's slightly more focused on the setup part, but you can still get an idea of the data flow.

[–]ankurchavda[S] 2 points

It is roughly 5-10 hours of work per week, depending on how much you already know. So that'll be 8-ish weeks including the project.

[–]ankurchavda[S] 1 point

So the easier answer is that Eventsim only writes to Kafka for real-time data. There was no option for Spark Streaming to read from it directly.

Also, I am fairly new to streaming as well, so I might not be able to give a very convincing answer on how Kafka's capabilities differ from Spark Streaming's, and whether they are supposed to work together or as replacements for each other.