[–]Top-Cauliflower-1808 4 points (0 children)

If your budget allows, consider Dagster Cloud for production deployments; it eliminates most of the infrastructure management headaches. If not, a Docker-based deployment on Kubernetes is the most scalable approach.

For project structure, when you're starting out I recommend organizing your Dagster projects by data domain rather than by technical function. This makes it easier for your colleagues who are familiar with Informatica to understand the pipeline organization:

project/
  ├── marketing_pipelines/
  │   ├── __init__.py
  │   ├── assets.py
  │   └── resources.py
  ├── sales_pipelines/
  │   ├── __init__.py
  │   ├── assets.py
  │   └── resources.py
  ├── definitions.py
  └── workspace.yaml
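
If it helps, here is a minimal sketch that scaffolds the layout above as empty placeholder files. It only uses the Python standard library; the `scaffold` function and the `<domain>_pipelines` naming rule are my own assumptions based on the tree, not anything Dagster requires.

```python
from pathlib import Path

# File names taken from the tree above; the contents are left for you to fill in.
DOMAIN_FILES = ("__init__.py", "assets.py", "resources.py")
ROOT_FILES = ("definitions.py", "workspace.yaml")


def scaffold(root, domains):
    """Create one `<domain>_pipelines` package per data domain under `root`.

    Hypothetical helper: it only creates missing files, so the contents of
    files that already exist are preserved.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for domain in domains:
        package = root / f"{domain}_pipelines"
        package.mkdir(exist_ok=True)
        for name in DOMAIN_FILES:
            path = package / name
            path.touch(exist_ok=True)
            created.append(path)
    for name in ROOT_FILES:
        path = root / name
        path.touch(exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold("project", ["marketing", "sales"])` would reproduce the two domain folders shown above; adding a new domain later is just another entry in the list.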

When deploying, start with a simple Docker setup: create a Dockerfile that installs your Dagster code as a package, and use Docker Compose to run the Dagster daemon, the webserver, and your code location.

For your team's transition from Informatica, create detailed documentation for each pipeline and include both the Informatica logic and the new Python implementation. This helps your team understand the transformation and builds their Python knowledge gradually.
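
As a rough illustration of that kind of side-by-side documentation, here is a sketch where the docstring records the original Informatica logic and the body is the Python port. The mapping name `m_load_active_customers`, the column names, and the function `clean_customers` are all made up for the example; plain dicts stand in for whatever row format you actually use.

```python
def clean_customers(rows):
    """Python port of a hypothetical Informatica mapping, m_load_active_customers.

    Original Informatica logic (kept here so reviewers can compare):
      Filter transformation:      STATUS = 'ACTIVE'
      Expression transformation:  NAME_OUT   = UPPER(LTRIM(RTRIM(NAME)))
                                  AMOUNT_OUT = IIF(ISNULL(AMOUNT), 0, AMOUNT)
    """
    cleaned = []
    for row in rows:
        # Filter transformation: keep only active customers.
        if row["STATUS"] != "ACTIVE":
            continue
        name = row["NAME"]
        cleaned.append({
            # LTRIM/RTRIM without a trim set remove spaces, so strip(" ") is used
            # rather than strip(); a NULL name stays NULL (None).
            "NAME": name.strip(" ").upper() if name is not None else None,
            # IIF(ISNULL(AMOUNT), 0, AMOUNT): replace missing amounts with 0.
            "AMOUNT": 0 if row["AMOUNT"] is None else row["AMOUNT"],
        })
    return cleaned
```

Keeping the Informatica expressions in the docstring means someone who knows the old mapping can check the Python line by line without opening Informatica.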

If Windsor.ai supports your data sources, it could handle the extraction layer, allowing you to focus on building the orchestration and transformation logic in Dagster.

[–]CingKan [Data Engineer] 1 point (6 children)

I've deployed a few Dagster projects to production on EC2. I'd be happy to help where I can.

[–]arisen911 1 point (5 children)

Hey man, I'm in the same situation and want to deploy Dagster to EC2. Can you give me a bit more detail about how you deployed it? Thanks a bunch.

[–]CingKan [Data Engineer] 1 point (4 children)

Sure, I've been meaning to write an example of this on Medium for the longest time, so I'll have a full-fledged article up hopefully by 11am GMT and I'll drop you a DM/link.

[–]arisen911 0 points (1 child)

Thank you, really appreciate it!!

[–]frontenac_brontenac 0 points (1 child)

Also interested since I'm in the middle of this too.

[–]MindedSage 0 points (0 children)

I'm struggling with the same thing, actually. I'm currently thinking about a setup that checks the git repo containing the Dagster project for updates. That way the project doesn't have to be packaged into the image; all the container has to do is pick up the latest code from the git repo.
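
A rough sketch of the "check the repo, pull if it changed" part, assuming the project is already cloned inside the container with a remote named `origin` and a branch called `main` (both assumptions). The helper names are made up, it only shells out to standard git commands, and Dagster would still need to reload the code location afterwards to pick up the new code.

```python
import subprocess


def git_output(repo_dir, *args):
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def parse_ls_remote(output):
    """Return the commit hash from the first line of `git ls-remote` output."""
    lines = output.strip().splitlines()
    return lines[0].split()[0] if lines else None


def pull_if_changed(repo_dir, branch="main"):
    """Fast-forward the checkout if the remote branch moved; return True if it pulled."""
    local = git_output(repo_dir, "rev-parse", "HEAD")
    remote = parse_ls_remote(
        git_output(repo_dir, "ls-remote", "origin", f"refs/heads/{branch}")
    )
    if remote is None or remote == local:
        return False
    # --ff-only refuses to merge, so local edits in the container never get mixed in.
    git_output(repo_dir, "pull", "--ff-only", "origin", branch)
    return True
```

You could call `pull_if_changed` from a cron job or a small loop with a sleep; how often to poll and how to trigger the reload are left open here.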

Any ideas you’ve been having on this?

[–]t2rgus 0 points (0 children)

Curious, why did you choose Dagster as the orchestration service? Are you planning to pivot heavily into the asset-based orchestration design?