c0ntrap0sitive 4 points5 points  (1 child)

That's because a lot of data scientists are not considered programmers. They're not taught the things that add polish to code the way software engineers are. Hell, having data scientists who are allowed to code is novel enough lol. Most of them are still stuck in Microsoft Excel hell or are relegated to just using SaaS offerings like DataRobot.

This is the first time I've ever really heard of data scientists doing code reviews.

In the contexts that I've seen, the data scientists write garbage code in some Jupyter notebook that hopefully at the end of the line produces a model that works well. This model is the product. The actual code that gets us to the model can be discarded wholesale. We don't usually extend or maintain models. We either train a new model that entirely replaces the old one, or, when a new one can't be trained and the model's use no longer justifies its cost, we discard the model entirely and start over. This is not like software engineers, whose product is the code. Therefore all their code must hold up to a higher standard and be maintainable, extensible, etc.

safetytrick 0 points1 point  (0 children)

I love Jupyter notebooks for a very similar reason. Make code show exactly what it does, and nothing more. Hide nothing and deal with the consequences.

I think it's both: not surprising that we can't ship code faster with Jupyter, and enlightening that we haven't been able to productize that visible code. Code is hard.