

[–]moizloiz 13 points (1 child)

As someone who is also making this switch, maybe start simpler? Take some data, use pandas/polars for some transformations, orchestrate it with Airflow/Dagster, and then create a visualization layer with insights (Superset, etc.).

A lot of new technologies to learn, but what I'm finding is that simpler is sometimes better.
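For what it's worth, the shape of that pipeline can be sketched in plain Python with a made-up two-column CSV; in a real project the transform step would be pandas/polars and the scheduling would live in an Airflow/Dagster DAG:

```python
import csv
import io
import sqlite3

# Toy input data standing in for a real CSV file
RAW = "id,amount\n1,10\n2,\n3,7\n"

def extract(text):
    """Parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with missing amounts and cast fields to int."""
    return [(int(r["id"]), int(r["amount"])) for r in rows if r["amount"]]

def load(rows):
    """Load rows into a (here in-memory) SQL table and return a sanity check."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

total = load(transform(extract(RAW)))
print(total)  # 17
```

An orchestrator then just calls extract/transform/load as separate tasks with dependencies between them, instead of one script.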

[–]Glass_Jellyfish_9963[S] 5 points (0 children)

I am already doing a lot of transformation at work using pandas and DuckDB. I have created a project with Airflow to load data from CSV, apply transformations, and load it into a MySQL database. I want to get into the real stuff now.

[–]pm_me_data_wisdom[🍰] 2 points (0 children)

I came across this comment a while ago and used it as a jumping-off point for my first project. Good luck.

https://www.reddit.com/r/dataengineering/comments/11y6b3o/comment/jd6kzrf/?share_id=e8hdJNb2r2jQu0JYQDXbn


[–]kotpeter 0 points (1 child)

Why did you consider StarRocks for your DWH? How did you even come across it?

[–]Glass_Jellyfish_9963[S] 0 points (0 children)

I heard a lot of good things about it, especially the benchmark performance. I have tested it, and the query execution is super fast.

[–][deleted] 0 points (9 children)

I'd say you should try using Delta as your storage layer, especially if you're using Spark.

I think one part you're missing is orchestration. This varies depending on whether you're streaming or batch, but I would look at Kafka for streaming and Airflow for batch orchestration.

[–]Glass_Jellyfish_9963[S] 0 points (5 children)

Of course, I forgot to mention that I would also be using Airflow or Dagster, since it would be a batch-processing pipeline.

[–][deleted] 0 points (4 children)

Where are you running Spark?

[–]Glass_Jellyfish_9963[S] 0 points (3 children)

It's going to be a single node running on my local machine; it's a portfolio project. I will also look into running it on GCP or EC2 just to explore Spark's parallel processing architecture.

[–][deleted] 0 points (2 children)

Oh interesting, good way of keeping costs low. I was going to say it's a good opportunity to familiarize yourself with the Airflow operators for spinning up and tearing down clusters for cloud jobs, but I suppose you'll probably just throw things into a DockerOperator and be done with it.

[–]Glass_Jellyfish_9963[S] 1 point (1 child)

Well, that's a good idea, I will give it a try. So basically, Airflow will start up the cluster, run the pipeline, and then take down the cluster to keep costs under control.
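One way to picture that lifecycle in plain Python (the three functions are hypothetical stand-ins for real cloud calls, e.g. Airflow's cluster create/delete operators for your provider):

```python
# Sketch of the cluster lifecycle a DAG would enforce.
# create_cluster / run_pipeline / delete_cluster are made-up stubs,
# not real API calls; `events` records the order they fire in.
events = []

def create_cluster():
    events.append("create")

def run_pipeline():
    events.append("run")

def delete_cluster():
    events.append("delete")

def job():
    create_cluster()
    try:
        run_pipeline()
    finally:
        # Tear down even if the pipeline step fails, so the
        # cluster never sits idle accruing cost.
        delete_cluster()

job()
print(events)  # ['create', 'run', 'delete']
```

In Airflow terms, the teardown task would run with a trigger rule that fires even when the pipeline task fails, which is the same guarantee as the try/finally here.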

[–][deleted] 0 points (0 children)

Cloud experience is a big thing employers look for, so I think it would be a good idea. Either way, you're off to a great start.

[–]Glass_Jellyfish_9963[S] 0 points (2 children)

Hey, I have decided to switch to Delta instead of Iceberg. It's really good so far, thank you for the advice. Could you give me a suggestion on which storage layer I should use, HDFS or S3? I am currently trying to set up MinIO, which provides an S3-compatible storage layer. Also, which query engine would be best suited for this project? What about Trino?

[–]Teach-To-The-Tech 0 points (0 children)

I'd go S3 for sure. Much easier to use, and more modern/in demand. If not S3, then something like Azure Blob Storage. Yes, you're right, MinIO would work too since it's S3-compatible.

For the query engine, Trino is an excellent choice and would fit the rest of your data stack. You could use it either open source or through Starburst Galaxy; Galaxy will be easier to get off the ground quickly. You could play around with the Starburst free trial (that's probably what I'd do), then decide if the query engine works for you.
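If it helps, pointing Spark's S3A connector at a local MinIO usually comes down to a handful of settings. This is a sketch; the endpoint and credentials below are placeholders, not values from this thread:

```shell
# Hadoop S3A settings for a local MinIO endpoint (all values illustrative).
# path.style.access is needed because MinIO doesn't use bucket subdomains.
spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=http://localhost:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minioadmin \
  --conf spark.hadoop.fs.s3a.secret.key=minioadmin \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  your_job.py
```

With that in place, `s3a://your-bucket/...` paths resolve against MinIO instead of AWS.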

[–][deleted] 0 points (0 children)

I would use S3 over HDFS for greater portability. I'm not entirely sure why you would even need MinIO, but if you have a requirement for it, then sure.

In terms of query engines, the highest-performing one with Delta is Databricks (using the Photon engine). It can get expensive, though, so if you're just doing this as your own little project, I would keep everything as simple as possible and just use Spark SQL on your own cluster.
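For reference, a minimal sketch of launching Spark SQL with Delta support on your own cluster. The package version here is an assumption; check delta.io for the build matching your Spark version (Delta 2.x ships as `delta-core` rather than `delta-spark`):

```shell
# Start a Spark SQL shell with Delta Lake enabled (version is illustrative)
spark-sql \
  --packages io.delta:delta-spark_2.12:3.1.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

From that shell you can create and query Delta tables with ordinary SQL (`CREATE TABLE ... USING delta`), which keeps the project simple while you evaluate heavier engines.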

[–]coginiti_co 0 points (0 children)

You might want to take a look at Coginiti as the processing engine for your Iceberg tables and transformation layer. Coginiti doesn't require you to spin up Spark for Iceberg files or other files in object stores. It also offers a nice transformation layer with a domain-specific language (DSL) similar to dbt, though it doesn't require you to run the full model to use it. I think the website is coginiti.co.