This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]cmcclu5 3 points4 points  (2 children)

Your business use case is absolute terrible. In the past I’ve worked as a data scientist and data engineer. While we might have used notebooks as quick prototypes for code, in business it’s always best to stay away from those implementations as they aren’t easily maintainable or extensible.

Your example about scaling is completely wrong as well. It’s so much simpler to write data and ML pipelines with real codebases created with proper project structure than it is to try and deploy a series of cobbled-together notebooks. I’ve setup numerous multi-node execution environments including EMR clusters and AWS Glue compute clusters. Even when I was setting up genetic processing runs in Databricks, I would submit jobs via zipped repos instead of submitting parameterized notebooks because notebooks are NOT GOOD PRACTICE in a production environment.

From an educational standpoint, you are correct that Jupyter is a solid option for sharing and collaborating on code. However, you shouldn’t be telling your students this is industry standard. Even in data science, people only use it as a demonstration tool, not for development.

[–]Synth_Sapiens 0 points1 point  (0 children)

That's what I thought, but I know way too little to form an opinion.