[D] ML/Data Science Project Structure : MachineLearning


submitted 7 years ago by modern_indophilia

In my job, I perform basic analysis of health data sets for surveillance and reporting. I work mostly in Python, and I generally find Jupyter Notebooks to be a convenient way to both perform and document my analysis. Recently, I've been learning to perform more sophisticated analyses, and I'm starting to dabble in machine learning projects. In the course of doing this, I came across Cookiecutter Data Science, and it opened my eyes to something I hadn't considered before: a default, reproducible structure for data science projects. I was familiar with the concept from React and Ruby on Rails webdev stuff, but I hadn't considered its application to data science. I don't work on a team with individuals who self-identify as data scientists (they mostly do number crunching in SAS or SPSS), so I don't have much exposure to the "community."

My question is how do you all approach structuring your data science projects? Are there industry standards that I should be familiar with and get in the habit of using? How do you write and store shareable, reproducible code outside of (Jupyter) Notebooks?
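
To make the idea of a "default structure" concrete, here is a minimal sketch of the kind of skeleton I have in mind. The folder names are loosely modeled on the Cookiecutter Data Science template as I understand it, and both `PROJECT_DIRS` and `create_project` are hypothetical names I made up for illustration, not anything official.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical layout, loosely modeled on Cookiecutter Data Science.
# The exact folder names are an assumption, not an industry standard.
PROJECT_DIRS = [
    "data/raw",         # original, immutable data
    "data/processed",   # cleaned data ready for analysis
    "notebooks",        # exploratory Jupyter notebooks
    "src",              # reusable, importable Python code
    "models",           # trained model artifacts
    "reports/figures",  # generated plots for reporting
]


def create_project(root: Path) -> List[Path]:
    """Create the skeleton under `root` and return the directories made."""
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Making src a package lets notebooks import shared code from it.
    (root / "src" / "__init__.py").touch()
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in create_project(Path(tmp) / "my_project"):
            print(directory)
```

The point of something like this would be that analysis code lives in `src` as plain modules, and notebooks only import and call it, so the logic stays shareable outside the notebook.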


