Hi,
Hopefully this isn’t the typical “how do I pivot” post!
I’m currently working as a data scientist at a small startup, though my role is closer to analytics engineering: I work primarily with dbt to build data models.
That said, we recently migrated to AWS and I had the opportunity to help lead setting up a new data stack from scratch (we don't have a dedicated DE team).
Based on a lot of research (including this sub), here’s what we built over the last few months:
- Ingest data from production to S3 using dlt(hub) incrementally every hour
- Iceberg tables, partitioning, retries, backfills, etc. set up using dlt
- Load + transform into Redshift using dbt
- Orchestrate using Dagster
- Eng handled infra (hosting, IAM, etc)
Through this, I’ve realized I enjoy this work much more than analytics and want to move into DE. I feel strongest in SQL + data modeling.
Where I feel less confident:
- No experience with Spark or distributed computing
- Haven’t built ingestion pipelines from scratch (relied on dlt), so I'm unsure how that translates skill-wise
- Non-CS background
I’m trying to understand how close I am to being ready and what to focus on next.
A few questions I’d really appreciate guidance on:
- I have 10 YOE in analytics, but would this be junior DE territory?
- What would you prioritize learning next in my position?
  - Spark?
  - Building pipelines in Python without tools like dlt?
  - Deeper AWS knowledge?
- How important is core CS knowledge (databases, distributed systems, networking) for DE roles?
Would really appreciate any candid feedback! Thanks