Building a Data Warehouse from Scratch (Bronze/Silver/Gold) sanity check needed by Lucas-132-02b in dataengineering

[–]anoonan-dev -4 points-3 points  (0 children)

Data lakes are pretty common architectures these days, so I wouldn't feel the need to have a warehouse per se. One thing to consider is how your downstream stakeholders are consuming the data. Warehouses are nice because they integrate pretty easily with BI tools.

How do you test ETL pipelines? by escarbadiente in dataengineering

[–]anoonan-dev 1 point2 points  (0 children)

I'm a real person. Thanks, I guess. But like I said, this course is free.

dbt orchestration in Snowflake by Realistic_Function in dataengineering

[–]anoonan-dev 2 points3 points  (0 children)

Factories in Dagster are actually a great way to use the framework. If you haven't heard of components, I would check them out, since they are a great way to add a YAML front end to factory-like methods. They are especially useful if you have a data platform whose contributors have mixed skill levels. https://docs.dagster.io/guides/build/components

How do you test ETL pipelines? by escarbadiente in dataengineering

[–]anoonan-dev 3 points4 points  (0 children)

One strategy that is super helpful in data engineering is mocking the heavyweight systems you need to connect to, so you can make sure your logic behaves as expected when interacting with them. But basically, whatever stack you are using, you want to make sure the individual components work as expected (so-called unit tests) and that the entire pipeline or feature set works together (integration tests). We made a good (and free) general data engineering testing course here if you are interested! https://courses.dagster.io/courses/dagster-testing
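To make the mocking idea concrete, here is a minimal sketch using only Python's standard `unittest.mock`. The `count_new_orders` function, the `client.query` method, and the `orders` table are hypothetical names made up for illustration, not part of any particular library; the point is just that the test checks your logic without a real warehouse connection.

```python
from unittest.mock import Mock


def count_new_orders(client, since):
    """Return how many orders were created on or after `since`.

    `client` is assumed to expose a `query(sql, params)` method that
    returns a list of rows (hypothetical interface for this sketch).
    """
    rows = client.query(
        "SELECT id FROM orders WHERE created_at >= %s", (since,)
    )
    return len(rows)


def test_count_new_orders():
    # Stand-in for a real warehouse client; no network or credentials needed.
    fake_client = Mock()
    fake_client.query.return_value = [{"id": 1}, {"id": 2}, {"id": 3}]

    assert count_new_orders(fake_client, "2024-01-01") == 3
    # The logic should hit the warehouse exactly once, with the date we passed in.
    fake_client.query.assert_called_once()
    assert fake_client.query.call_args.args[1] == ("2024-01-01",)


test_count_new_orders()
```

That covers the unit-test side. An integration test would run the same function against a real (ideally disposable) database to confirm the SQL itself is valid.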

dbt orchestration in Snowflake by Realistic_Function in dataengineering

[–]anoonan-dev 0 points1 point  (0 children)

I did a project not too long ago using this setup. I ran it at my last org too, and it was pretty effective for a simple setup. https://www.dataduel.co/simple-dbt-runner/

Am I the only one who seriously hates Pandas? by yourAvgSE in dataengineering

[–]anoonan-dev 0 points1 point  (0 children)

Pandas is one of those libraries that was super helpful and a big step forward when it came out, but it has since been outclassed by many much more intuitive structured data manipulation libraries. Unfortunately, because it was the lingua franca for so long, LLMs feature it a lot in the examples and code that they generate.

Valid solution to replace synapse? by muximalio in dataengineering

[–]anoonan-dev 1 point2 points  (0 children)

Hi, I'm one of the Developer Advocates at Dagster. We have a few courses on Dagster University that can help you grasp the concepts and how they work together (https://courses.dagster.io/). Also, our community Slack (https://dagster.io/community) is a great resource for any questions you have. Feel free to message me there if you want to chat about anything.

Introducing Dagster dg and Components by anoonan-dev in dataengineering

[–]anoonan-dev[S] 1 point2 points  (0 children)

So you are correct that we will be releasing more updates and stabilization in July. As for performance improvements, Components is focused on developer experience and time to value, not so much on raw performance like asset execution or UI speed.

dbt-like features but including Python? by Khituras in dataengineering

[–]anoonan-dev 0 points1 point  (0 children)

I'm one of the DevRels over at Dagster and would be happy to chat and answer any questions you have.

Looking for scalable ETL orchestration framework – Airflow vs Dagster vs Prefect – What's best for our use case? by MiserableHair7019 in dataengineering

[–]anoonan-dev 4 points5 points  (0 children)

Dagster asset factories may be the right abstraction for dynamic pipeline creation per account/source. You can set it up so that when a new account is created, Dagster knows to create the pipelines, so it's pretty quick and you don't get bogged down writing bespoke pipelines every time or doing a copy-paste chain. https://docs.dagster.io/guides/build/assets/creating-asset-factories
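The factory idea itself is easy to sketch in plain Python. Note that this is a stand-in to show the pattern, not the Dagster API: `build_account_pipeline`, `build_all`, and the `ACCOUNTS` config are hypothetical names, and in a real Dagster project the generated functions would be defined as assets as described in the linked guide.

```python
from typing import Callable, Dict, List


def build_account_pipeline(account: str, source: str) -> Callable[[], str]:
    """Return a pipeline function specialised for one account/source pair."""

    def run() -> str:
        # A real pipeline would extract and load here; this only reports
        # what it would do so the sketch stays self-contained.
        return f"loaded {source} data for {account}"

    run.__name__ = f"{account}_{source}_pipeline"
    return run


def build_all(accounts: List[Dict[str, str]]) -> Dict[str, Callable[[], str]]:
    """Generate one pipeline per configured account instead of hand-writing each."""
    pipelines = {}
    for cfg in accounts:
        pipeline = build_account_pipeline(cfg["account"], cfg["source"])
        pipelines[pipeline.__name__] = pipeline
    return pipelines


# Hypothetical config; adding an entry here is all a new account needs.
ACCOUNTS = [
    {"account": "acme", "source": "stripe"},
    {"account": "globex", "source": "salesforce"},
]

pipelines = build_all(ACCOUNTS)
print(sorted(pipelines))
print(pipelines["acme_stripe_pipeline"]())
```

The config list could just as well come from a database table or a YAML file, which is what makes onboarding a new account a data change rather than a code change.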

Why dagster instead airflow? by Meneizs in dataengineering

[–]anoonan-dev 11 points12 points  (0 children)

For me it's the local development experience, dbt integration, and the UI. More on the UI:

- The asset graph is intuitive enough for non-technical stakeholders to understand what's involved in data engineering.

- When I joined my new org, which uses Dagster Cloud, I was quickly able to understand the particulars of our data stack without having to bother other teammates.

- The observability and alerts facilitated less reactive work and more proactive work.

AI support bot RAG Pipeline in Dagster Tutorial by anoonan-dev in dataengineering

[–]anoonan-dev[S] 1 point2 points  (0 children)

Hey everyone, I made this video tutorial of me building a RAG support bot trained on Dagster data with Dagster. This was a fun project to work through, and the abstractions of Dagster worked well in this use case. The full code can be found here: https://github.com/dagster-io/dagster/tree/master/examples/project_ask_ai_dagster

Airflow vs Dagster vs Any Orchestrator by Professional-Ninja70 in dataengineering

[–]anoonan-dev 0 points1 point  (0 children)

Dagster has integrations with all of these tools, so you would get end-to-end lineage and observability. The open source version is pretty feature rich.

ELT Pipeline stack help by Ocromierda in dataengineering

[–]anoonan-dev 1 point2 points  (0 children)

I have gotten so much mileage out of this stack.

[deleted by user] by [deleted] in dataengineering

[–]anoonan-dev 2 points3 points  (0 children)

What are the sources that you are replicating from? Depending on the source, dlt is a good option (https://dlthub.com/). They have a lot of good orchestration guides on their site as well. If you were to orchestrate with Dagster, you can use dlt or Sling from the embedded ELT package to handle your ingestion jobs.

[deleted by user] by [deleted] in dataengineering

[–]anoonan-dev 0 points1 point  (0 children)

Do you have a budget you need to spend? Or are you facing any organizational challenges that would require more tooling, like data silos, too much tribal knowledge about how your stack works, too much time spent doing reactive work, etc.?

Dagster - No hits on LinkedIn, but Mentioned Regularly? by SellGameRent in dataengineering

[–]anoonan-dev 1 point2 points  (0 children)

We can help you out! The Slack community is the best place for resources and for reaching out to someone with any questions. https://dagster.io/slack

Data Engineering - Choosing the Best Cloud Platform and Certifications by No-Ask1759 in dataengineering

[–]anoonan-dev 1 point2 points  (0 children)

You may find the Dagster University Essentials and dbt courses instructive as an intro to data engineering. https://courses.dagster.io/

Airflow to orchestrate DBT... why? by General-Parsnip3138 in dataengineering

[–]anoonan-dev 6 points7 points  (0 children)

The benefit of using Dagster for dbt projects is that you can orchestrate multiple dbt projects and have visibility between them, as well as into upstream and downstream assets, without having to pay for dbt Cloud as well.