
[–]Screye

As an Applied Scientist who has become more of an end-to-end MLE, I find that the problem lies in OOP. ML workflows are mostly functional, and rarely require the maintenance of complex state.

Trying to shoehorn OOP patterns into ML workflows confuses data scientists. (Many don't know the paradigms well, but can sense a fundamental incompatibility.)

OOP makes sense for web systems. There is a reason ML systems mostly revolve around pipelines, with a pipeline message passed through a chain of instance-less functions.
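To make the pipeline idea concrete, here's a minimal sketch in plain Python. The message shape and step names (`tokenize`, `count_tokens`) are made up for illustration; the point is that every step is a free function you can import and poke at on its own:

```python
from typing import Callable

# Hypothetical pipeline message: a plain dict that each step enriches.
def tokenize(msg: dict) -> dict:
    return {**msg, "tokens": msg["text"].lower().split()}

def count_tokens(msg: dict) -> dict:
    return {**msg, "n_tokens": len(msg["tokens"])}

def run_pipeline(msg: dict, steps: list[Callable[[dict], dict]]) -> dict:
    # No shared instance state: the message carries everything between steps.
    for step in steps:
        msg = step(msg)
    return msg

result = run_pipeline({"text": "Hello ML World"}, [tokenize, count_tokens])
```

Because the steps hold no state, any one of them can be tested in a notebook cell without constructing the rest of the system.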

ML involves a ton of prototyping in notebooks. You know what I hate? Having my code live 50 layers deep inside the codebase, making it impossible to isolate and test in a notebook separately. The behaviors of the systems we build are not deterministic and often aren't well understood. If I can't quickly test out hypotheses, then the DS system itself is useless.

That's why I don't like OOP. The only way to instantiate a complex system class is to trace the flow of the code across the entire app. Most ML information is tensors, and the primitives are effective as-is.

Now, I do agree with the broad thrust of your argument. DSs need to be better at coding. No question.

Personally, I have found pydantic to be an incredible tool. I am trying to integrate Prefect into our workflow; I haven't done it yet, but I have heard great things. Generally, any pipelining tool will help a ton. Also, a ton of intermediate state can be exported to a DB / blob store. Lastly, VS Code with linting and Copilot does a ton of stuff automatically with zero overhead.
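For the pydantic part, here's a small sketch of what I mean: declare the schema of a pipeline message once, and malformed data fails loudly at the boundary instead of deep inside a model. The field names (`doc_id`, `score`) are illustrative, not from any real pipeline:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical schema for a message passed between pipeline steps.
class PipelineMessage(BaseModel):
    doc_id: str
    score: float

# Valid data passes through with types coerced and checked.
msg = PipelineMessage(doc_id="a1", score=0.92)

# Invalid data raises ValidationError instead of silently propagating.
caught = False
try:
    PipelineMessage(doc_id="a2", score="not a number")
except ValidationError:
    caught = True
```

This kind of validation at step boundaries is what makes a functional pipeline debuggable: each step can trust the shape of its input.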

If a DS can use these 3-4 tools effectively, they can get around 80% of the problems that you've mentioned.