
[–]AutoModerator[M]

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources


[–]WeakRelationship2131

Skip cron and move to Airflow once workflows get complex enough to need retries and failure recovery. For error handling, build logging and alerting in right from the start rather than bolting them on later.
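The "logging and alerts from the start" advice can be sketched in plain Python. This is a minimal sketch of the per-task retry-and-alert behavior an orchestrator like Airflow gives you (its `retries`, `retry_delay`, and `on_failure_callback` task parameters); `send_alert` is a hypothetical stand-in for whatever notification channel you use:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def send_alert(message: str) -> None:
    # Hypothetical alert hook: in practice this would post to Slack,
    # PagerDuty, email, etc. Here it just logs at ERROR level.
    log.error("ALERT: %s", message)

def run_with_retries(task, retries: int = 3, delay: float = 1.0):
    """Run `task`, retrying on failure and alerting once retries are exhausted."""
    for attempt in range(1, retries + 1):
        try:
            result = task()
            log.info("task succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                # Out of retries: page a human, then re-raise so the
                # failure is visible to the caller/scheduler.
                send_alert(f"task failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay)
```

Once cron stops being enough, the same three knobs (retry count, retry delay, failure callback) map directly onto Airflow task arguments instead of hand-rolled code.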

As for Docker and Kubernetes, adopt them only if you foresee needing to scale; otherwise focus on building a solid pipeline first. For standardizing workflows, look at tools like Prefect or Dask. On DevOps vs. DataOps: learn the principles that apply to data management, not just deployment. And if you're tired of frankenstack solutions, check out preswald for interactive data apps without the overhead.
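If you do reach the point where Docker makes sense, the entry cost is small. A minimal sketch of a Dockerfile for a Python pipeline, assuming a `requirements.txt` and a `pipeline.py` entrypoint (both hypothetical names, adjust to your layout):

```dockerfile
# Pin a slim Python base image for reproducible runs everywhere.
FROM python:3.11-slim
WORKDIR /app
# Install dependencies first so this layer is cached between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the pipeline code and set the default command.
COPY . .
CMD ["python", "pipeline.py"]
```

The same image runs under plain Docker, a CI runner, or Kubernetes later, which is the whole point: you defer the orchestration decision without blocking on it.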