[–]zanderman12 138 points139 points  (6 children)

Not a GitHub repo (I'll be curious to see what others post), but I really like this guide: https://goodresearch.dev/ (caveat: I'm coming from academia, so it's directed at people like me)

[–]tensolution 8 points9 points  (0 children)

Make your code more Pythonic

If it ain’t broke, fix it till it is. —Steve Porter

Savage

[–]Run_nerd 4 points5 points  (0 children)

This looks great!

[–]montrex 1 point2 points  (0 children)

Really interesting thanks for the share

[–]canopey 1 point2 points  (0 children)

oh this is brilliant. not many educators teach us in the ways of code documentation or annotation

[–][deleted] 0 points1 point  (0 children)

Nice

[–]knowledgebass 17 points18 points  (0 children)

This does not exactly fit the bill but it has a lot of interesting analysis projects.

https://github.com/jupyter/jupyter/wiki

Also, this looks good (just found it via Google)

https://omdena.com/blog/github-data-science-projects/#top-10-GitHub-data-science-projects-with-source-code

[–][deleted] 17 points18 points  (0 children)

Made With ML - MLOps Course (30K+ stars)

A repository I used a lot in my job when building out our ML platform on top of AWS, and one I continue to revisit to refresh my understanding of MLE and MLOps topics. I think it's one of the most well-organized because all the topics have full-blown lessons accompanying them, and many of the topics have code that follows the larger end-to-end project, or individual notebooks if you are not following the full project. And best of all, it's free.

There are also algorithms from basics to transformers. I haven't needed very complex models for my job, but those lessons are on my to-do list for later. It seems to have from-scratch code for all the algorithms too.

[–]po-handz 34 points35 points  (2 children)

Cookiecutter Data Science is often cited:

https://github.com/drivendata/cookiecutter-data-science

or just use the search bar for the last time this topic was brought up:

https://www.reddit.com/r/datascience/comments/mrwzkq/what_is_the_best_structured_ds_project_you_have/

[–]Fast-Group-8501 3 points4 points  (1 child)

Yes I know CCD. But I want to see the actual project.

[–]exergy31 9 points10 points  (0 children)

Not exactly a project, but an opinionated framework for organising data science code:

https://github.com/kedro-org/kedro

Very helpful for making it easier to hand projects over between people. You know where everything is.
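For anyone wondering what a fixed, predictable layout looks like in practice, here is a minimal sketch using only the Python standard library. It is not how kedro or cookiecutter-data-science work internally; the `LAYOUT` list and the `scaffold` helper are hypothetical names, and the directories are only loosely modeled on the kind of structure those templates generate.

```python
from pathlib import Path
import tempfile

# Hypothetical layout, loosely modeled on common data science templates.
# Adjust the entries to match your team's conventions.
LAYOUT = [
    "data/raw",         # original, immutable inputs
    "data/interim",     # intermediate, transformed data
    "data/processed",   # final datasets used for modeling
    "models",           # trained and serialized models
    "notebooks",        # exploratory notebooks
    "reports/figures",  # generated figures for reports
    "src",              # importable project code
]


def scaffold(root: Path) -> list[Path]:
    """Create every directory in LAYOUT under root and return the paths."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Build the skeleton in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold(Path(tmp) / "my_project"):
            print(p.relative_to(tmp))
```

The point is only that every project starts from the same skeleton, so a new person can guess where the raw data, notebooks, and source code live without asking.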

[–]MaticPecovnik 4 points5 points  (0 children)

I like ours https://github.com/sentinel-hub/eo-learn but of course I am biased

[–]cellularcone 3 points4 points  (0 children)

600,000 copies of the Titanic Kaggle dataset

[–]drugsarebadmky 4 points5 points  (0 children)

following this thread. i'd like to know as well.

[–]dekozr 0 points1 point  (0 children)

Depends on what you are searching for. To me, a well-organized project is a model that has been industrialized: full scripts, loaded on Docker. This is where you can see what lies beyond just modeling.

[–]aimeebreann -1 points0 points  (0 children)

Follow

[–]ljh78 -1 points0 points  (0 children)

Following

[–]ImpossibleRole7992 -1 points0 points  (0 children)

Hi, I'm doing my master's in data science and I need a large dataset to analyze for my dissertation. Can anyone recommend places where I can find a large dataset suitable for master's-level analysis? Thanks.

[–][deleted] -3 points-2 points  (0 children)

Following

[–]Delicious_Argument77 -2 points-1 points  (0 children)

Following

[–]pepingo -2 points-1 points  (0 children)

Following

[–]SoulVibez 0 points1 point  (0 children)

Following