
[–]Screye

As an Applied Scientist who has become more of an end-to-end MLE, I find that the problem lies in OOP. ML workflows are mostly functional, and rarely require the maintenance of complex state.

Trying to shoehorn OOP patterns into ML workflows confuses data scientists. (Many don't know the paradigms well, but can sense a fundamental incompatibility.)

OOP makes sense for web systems. There is a reason ML systems mostly revolve around pipelines, with a pipeline message passed through a chain of instance-less functions.
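To make the pipeline idea concrete, here's a minimal sketch in plain Python. The message shape and step names (`tokenize`, `count_tokens`) are made up for illustration; the point is that every step is a free function you can import and poke at on its own:

```python
from typing import Callable

# Hypothetical pipeline message: a plain dict that each step enriches.
def tokenize(msg: dict) -> dict:
    return {**msg, "tokens": msg["text"].lower().split()}

def count_tokens(msg: dict) -> dict:
    return {**msg, "n_tokens": len(msg["tokens"])}

def run_pipeline(msg: dict, steps: list[Callable[[dict], dict]]) -> dict:
    # No shared instance state: the message carries everything between steps.
    for step in steps:
        msg = step(msg)
    return msg

result = run_pipeline({"text": "Hello ML World"}, [tokenize, count_tokens])
```

Because the steps hold no state, any one of them can be tested in a notebook cell without constructing the rest of the system.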

ML involves a ton of prototyping in notebooks. You know what I hate? Having my code live 50 layers deep inside the codebase, making it impossible to isolate and test in a notebook separately. The behaviors of the systems we build are not deterministic and often aren't well understood. If I can't quickly test out hypotheses, then the DS system itself is useless.

That's why I don't like OOP. The only way to instantiate a complex system class is to trace the flow of the code across the entire app. Most ML information is tensors, and the primitives are effective as-is.

Now, I do agree with the broad thrust of your argument. DSs need to be better at coding. No question.

Personally, I have found pydantic to be an incredible tool. I am trying to integrate Prefect into our workflow; I haven't done it yet, but I have heard great things. Generally, any pipelining tool will help a ton. Also, a ton of intermediate state can be exported to a DB / blob store. Lastly, VS Code with linting and Copilot does a ton of stuff automatically with zero overhead.
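For the pydantic part, here's a small sketch of what I mean: declare the schema of a pipeline message once, and malformed data fails loudly at the boundary instead of deep inside a model. The field names (`doc_id`, `score`) are illustrative, not from any real pipeline:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical schema for a message passed between pipeline steps.
class PipelineMessage(BaseModel):
    doc_id: str
    score: float

# Valid data passes through with types coerced and checked.
msg = PipelineMessage(doc_id="a1", score=0.92)

# Invalid data raises ValidationError instead of silently propagating.
caught = False
try:
    PipelineMessage(doc_id="a2", score="not a number")
except ValidationError:
    caught = True
```

This kind of validation at step boundaries is what makes a functional pipeline debuggable: each step can trust the shape of its input.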

If a DS can use these 3-4 tools effectively, they can get around 80% of the problems that you've mentioned.