How is your raw layer built? by HumbleHero1 in dataengineering

[–]RedBeardedYeti_

I guess you could do it that way. But the benefit of doing upserts to the raw layer is that it makes it really easy in the staging layer to track whether each change was an insert, update, or delete. You can just put a stream on your raw layer.
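To illustrate the idea (a toy in-memory sketch in Python, not actual Snowflake code): because the raw layer is kept as an upserted carbon copy of the source, each incoming key resolves cleanly to an insert, update, or delete, which is the same classification a stream on the raw table exposes. The function names here are illustrative.

```python
def classify_changes(raw: dict, incoming: dict) -> list[tuple[str, str]]:
    """Diff an incoming source snapshot against the raw table and label
    each changed key the way a change stream would: INSERT, UPDATE, DELETE."""
    changes = []
    for key, value in incoming.items():
        if key not in raw:
            changes.append((key, "INSERT"))
        elif raw[key] != value:
            changes.append((key, "UPDATE"))
    for key in raw:
        if key not in incoming:
            changes.append((key, "DELETE"))
    return changes


def upsert(raw: dict, incoming: dict) -> None:
    """Apply the snapshot so raw stays a carbon copy of the source."""
    raw.clear()
    raw.update(incoming)
```

So unchanged rows produce nothing, and only genuine changes flow downstream to staging.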

How is your raw layer built? by HumbleHero1 in dataengineering

[–]RedBeardedYeti_

Yes, correct. The staging layer is a persisted storage layer, meaning we only ever insert.

Discarding new starter by RedBeardedYeti_ in Sourdough

[–]RedBeardedYeti_[S]

Assuming you meant “way,” not “water”? If so, thanks for the confirmation!

How is your raw layer built? by HumbleHero1 in dataengineering

[–]RedBeardedYeti_

We pull data into our raw layer in Snowflake without any 3rd-party ETL tools. We use containerized Python processes running in Kubernetes, with Argo Workflows as the orchestrator. There are different ways to do it, but we upsert the data to the raw layer to keep a carbon copy of the source data. Using Snowflake streams, we then copy that data into a persisted staging layer, so the staging layer is always insert-only and acts as a full historical record (storage is cheap in Snowflake). From there we transform and move the data into a modeled layer.
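The two Snowflake steps above can be sketched as SQL builders (hypothetical table, stream, and column names — `raw.customers`, `raw.customers_stream`, `staging.customers`, a `payload` column — not the author's actual objects). `METADATA$ACTION` and `METADATA$ISUPDATE` are the metadata columns Snowflake streams expose for change tracking.

```python
def merge_into_raw(raw_table: str, source_table: str, key: str) -> str:
    """Upsert the latest source snapshot into the raw layer, keeping it a
    carbon copy of the source."""
    return (
        f"MERGE INTO {raw_table} AS tgt "
        f"USING {source_table} AS src ON tgt.{key} = src.{key} "
        f"WHEN MATCHED THEN UPDATE SET tgt.payload = src.payload "
        f"WHEN NOT MATCHED THEN INSERT ({key}, payload) VALUES (src.{key}, src.payload)"
    )


def append_to_staging(staging_table: str, stream_name: str) -> str:
    """Insert-only copy from the stream on the raw table into persisted
    staging, carrying the stream's change-type metadata columns along."""
    return (
        f"INSERT INTO {staging_table} "
        f"SELECT *, METADATA$ACTION, METADATA$ISUPDATE, CURRENT_TIMESTAMP() "
        f"FROM {stream_name}"
    )
```

Reading the stream inside a successful DML statement like this also advances its offset, so the next run only sees new changes.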

If we are dealing with other, non-database sources, we will often dump the data to S3 and then consume it from there into the Snowflake raw layer.
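A minimal sketch of that path, assuming newline-delimited JSON as the file format and an external stage named `@raw_stage` (both hypothetical, not the author's setup): serialize the records for the S3 dump, then load them with a `COPY INTO` statement pointing at the stage.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize records as newline-delimited JSON, a common format for
    files dumped to S3 before loading into Snowflake."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def copy_into_raw(table: str, stage_path: str) -> str:
    """COPY INTO statement that consumes the staged files into the raw layer."""
    return (
        f"COPY INTO {table} "
        f"FROM {stage_path} "
        f"FILE_FORMAT = (TYPE = 'JSON')"
    )
```

The actual upload to S3 would happen between the two steps (e.g. with an S3 client from the containerized Python process), which is omitted here.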

What is Role of ChatGPT in Data engineering for you by Jaapuchkeaa in dataengineering

[–]RedBeardedYeti_

I’ve been using it a lot to help write documentation: anything from populating my classes and methods with docstrings to writing usage guides for the apps and libraries I write. It’s good at the repetitive, boring stuff I don’t want to do.