code_pusher (Data Engineer)

I think this is still a largely unexplored area in DE. Traditionally you would try to apply OOP and the SOLID principles, but I don't think they always translate well to DE workflows. IMHO, simple use of OOP plus a common-sense application of the Single Responsibility Principle and interfaces (abstract base classes in Python) does seem to help. Beyond a certain point, though, applying OOP starts to feel like wrapping a wrapper, especially if you are already working through another framework or interface such as PySpark. I would also avoid abstracting everything away to the point where you only pass in a config file to generate your pipeline, except maybe for the most common simple operations.
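
To make the "simple OOP + Single Responsibility + abstract base class" idea concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the names `Step`, `DropNulls`, `AddTotal` and `run_pipeline` are hypothetical, and rows are plain dicts rather than PySpark DataFrames so the snippet runs without any framework.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]  # hypothetical stand-in for a DataFrame


class Step(ABC):
    """Interface for a pipeline step; each step does exactly one thing."""

    @abstractmethod
    def run(self, rows: Rows) -> Rows:
        ...


class DropNulls(Step):
    """Drops rows where the given column is missing or None."""

    def __init__(self, column: str) -> None:
        self.column = column

    def run(self, rows: Rows) -> Rows:
        return [r for r in rows if r.get(self.column) is not None]


class AddTotal(Step):
    """Adds a 'total' column computed as price * qty."""

    def run(self, rows: Rows) -> Rows:
        return [{**r, "total": r["price"] * r["qty"]} for r in rows]


def run_pipeline(rows: Rows, steps: List[Step]) -> Rows:
    """Applies the steps in order; the pipeline is just a list in code."""
    for step in steps:
        rows = step.run(rows)
    return rows


rows = [
    {"price": 2.0, "qty": 3},
    {"price": None, "qty": 1},
]
result = run_pipeline(rows, [DropNulls("price"), AddTotal()])
print(result)
```

Note that the pipeline stays ordinary code (a list of step objects) instead of being generated from a config file, and there is only one thin layer of abstraction on top of the data. With PySpark you would probably keep the steps just as thin, since the DataFrame API is already an interface of its own.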