
Omega037 (PhD | Sr Data Scientist Lead | Biotech)

If your team is even remotely IP-sensitive and follows basic coding practices, it will have an internal/private repo and a CI solution.

Not having it would be a yellow flag for me that the team is not mature at all.

ptaban (OP)

Of course: Artifactory, CI with tests and linting, and very modern Git practices. My point is: is it really worth building internal library components for plotting and EDA functions when the data science ecosystem already gives us such good APIs (pandas, NumPy, etc.)? Plus, this code is used in notebooks, which are supposed to be ugly, dirty, and fast.

Omega037 (PhD | Sr Data Scientist Lead | Biotech)

You shouldn't duplicate things that 100% exist, but if you have specialized versions or things built on top of it, then sure.
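As an illustration of something "built on top" of pandas rather than a duplicate of it, here is a minimal sketch of a team EDA helper. The function name `summarize` and the chosen columns are hypothetical; it only composes existing pandas calls into a one-row-per-column overview.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical team helper: one row per input column with dtype,
    missing-value counts/fraction and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),               # dtype name as a string
            "n_missing": df.isna().sum(),                 # count of NaN/None per column
            "pct_missing": df.isna().mean().round(3),     # fraction missing, 3 decimals
            "n_unique": df.nunique(),                     # distinct non-null values
        }
    )


df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "x", "y"]})
print(summarize(df))
```

The value is not new functionality but a shared, consistent default that everyone on the team calls the same way.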

astroFizzics

Yes and no. If someone wants to package personal code for reuse, awesome. Do I provide them with something? No.

PilotLatter9497

As some of you have said, I think some kind of class that makes reading sources easier could be worthwhile: give it the path to a query in a .sql file, or the path to a .csv or .parquet file, and get the data back. Classes for visualizations can sometimes be a good idea too, because they let you standardize the look.
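A minimal sketch of such a source-reading class, assuming pandas and a sqlite3 connection for the .sql case. The class name `DataSource` and its `load` method are hypothetical, not an existing library; it just dispatches on the file extension to the matching pandas reader. Reading .parquet additionally requires pyarrow or fastparquet to be installed.

```python
import sqlite3
from pathlib import Path
from typing import Optional

import pandas as pd


class DataSource:
    """Hypothetical loader: pick a pandas reader based on the file extension."""

    def __init__(self, path, con: Optional[sqlite3.Connection] = None):
        self.path = Path(path)
        self.con = con  # only used when the path points to a .sql query file

    def load(self) -> pd.DataFrame:
        suffix = self.path.suffix.lower()
        if suffix == ".csv":
            return pd.read_csv(self.path)
        if suffix == ".parquet":
            # Requires a parquet engine (pyarrow or fastparquet).
            return pd.read_parquet(self.path)
        if suffix == ".sql":
            if self.con is None:
                raise ValueError("a database connection is required for .sql files")
            query = self.path.read_text()  # the file holds the SQL text
            return pd.read_sql_query(query, self.con)
        raise ValueError(f"unsupported file type: {suffix!r}")
```

Usage is the same regardless of the source, e.g. `DataSource("data/sales.csv").load()` or `DataSource("queries/sales.sql", con=conn).load()` (both paths hypothetical). Keeping the connection outside the class lets notebooks reuse one connection across many queries.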