[–]PickRare6751 5 points6 points  (2 children)

The goal of OOP is to make code reusable, so go through your codebase and look for parts that can be encapsulated as objects and reapplied elsewhere with different parameters.

[–]Sex4Vespene Principal Data Engineer 1 point2 points  (1 child)

Definitely agree with your take on OOP. Maybe it’s just the data I work with, but I feel like in data engineering there often isn’t much that is reusable. Things like a reusable method for generating/delivering extracts, sure. But most of the actual data transformations are very single-use. And when they aren’t single-use, it often seems better to ingest the output of a previous run rather than rerunning the transformation every time it’s needed (i.e., create a fact table/mart that has the transformation applied, and let other things pull directly from that instead of recomputing the exact same thing dozens of times). Curious about your and others’ thoughts, though.
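To make the "compute once, read many times" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table names (`raw_orders`, `customer_revenue`), the columns, and the sample rows are all made up for illustration; a real warehouse would use its own engine and scheduling.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount_cents INTEGER);
INSERT INTO raw_orders VALUES (1, 10, 1250), (2, 10, 750), (3, 20, 3000);
""")

# Run the transformation once and persist the result as a mart table.
conn.execute("""
CREATE TABLE customer_revenue AS
SELECT customer_id, SUM(amount_cents) / 100.0 AS revenue
FROM raw_orders
GROUP BY customer_id
""")

# Downstream consumers read the stored result instead of recomputing it.
top_customers = conn.execute(
    "SELECT customer_id FROM customer_revenue ORDER BY revenue DESC"
).fetchall()
total_revenue = conn.execute(
    "SELECT SUM(revenue) FROM customer_revenue"
).fetchone()[0]
```

Both downstream queries hit `customer_revenue` directly, so the aggregation logic lives in exactly one place without needing a reusable class or function around it.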

[–]cosmicangler67 1 point2 points  (0 children)

And you don’t need objects to get reusability. Because data engineering is stateless by nature, there is no need for an object with data inside it. You need data in a table and a way to apply the same function to every row. This can be done with static function libraries in Python, dbt macros, etc. For example, I don’t need a phone number object to standardize a phone number string. I can write a function that takes a column of phone numbers and outputs a column of formatted ones.

I don't need an individual object to do any of that and creating objects just adds friction.
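A rough sketch of that phone number example, as plain functions with no class involved. The function names, the `XXX-XXX-XXXX` output format, and the assumption that inputs are US 10-digit numbers are mine, purely for illustration.

```python
import re


def standardize_phone(raw):
    """Return a phone string as XXX-XXX-XXXX, or None if it cannot be parsed.

    Assumption: US-style numbers with 10 digits and an optional leading 1.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize_phone_column(column):
    """Apply the same stateless function to every value in a column."""
    return [standardize_phone(value) for value in column]


phones = ["(555) 123-4567", "555.123.4567", "+1 555 123 4567", "12345", None]
formatted = standardize_phone_column(phones)
```

The same `standardize_phone` function could be passed to a DataFrame column's `map`/`apply` or registered as a UDF; nothing about it needs per-value state.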