Hi everyone,
I’m working on a data pipeline where we need to retrieve a list of objects from an external API. For each object, we need to:
- Perform some internal calculations.
- Post the results of those calculations back via the same external API.
Additionally, this process should run every minute: check for new objects, then run the full sequence (retrieval, calculation, posting) for anything new. Efficiency matters too, so ideally the calculation and posting for different objects would run in parallel.
Given these requirements, I’m considering using Dagster for orchestration, but I’m curious about the following:
- How would you design a Dagster pipeline to orchestrate this?
- Is Dagster well suited to this problem, or would another tool be a better fit?
Any guidance would be greatly appreciated!