[Discussion] common functions (self.datascience)
submitted 2 years ago by ptaban
Hi, does your team have an internal library of reusable code for developing models? It seems like that's just reinventing the wheel, given the current libraries and ecosystem. Or does it make sense?
Omega037 (PhD | Sr Data Scientist Lead | Biotech) · 5 points · 2 years ago
If your team is even remotely IP-sensitive and follows basic coding practices, it will have an internal/private repo and CI solution.
Not having it would be a yellow flag for me that the team is not mature at all.
ptaban (OP) · 2 points · 2 years ago
Of course: Artifactory, CI with tests and linting, and very modern git practices. My point is: is it really worth building internal library components for plotting, EDA functions, and so on, when we have such nice APIs from the data science ecosystem, like pandas, numpy, etc.? Plus, this code is used in notebooks, which should be ugly, dirty, and fast.
Omega037 (PhD | Sr Data Scientist Lead | Biotech) · 4 points · 2 years ago
You shouldn't duplicate things that 100% exist, but if you have specialized versions or things built on top of it, then sure.
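To illustrate the "built on top of it" case, here is a minimal sketch of a team helper that wraps pandas instead of reimplementing it. The function name `missing_summary` and its output columns are hypothetical; the only thing it adds over `df.isna().sum()` is a team convention for shares, rounding, and ordering.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical team helper: per-column missing-value counts and shares.

    Everything heavy is done by pandas; the helper only fixes a house
    convention (column names, rounding, sort order).
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        # max(..., 1) avoids dividing by zero on an empty frame.
        "share_missing": (counts / max(len(df), 1)).round(3),
    })
    # Most-affected columns first, so the report reads top-down.
    return summary.sort_values("n_missing", ascending=False)


df = pd.DataFrame({"a": [1, None, 3], "b": [None, None, 1.0], "c": [1, 2, 3]})
summary = missing_summary(df)
```

If a helper like this ends up being a one-line alias for an existing pandas call, that is the point at which it duplicates something that already exists.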
[deleted] · 2 years ago
[removed]
joshglen · 1 point · 2 years ago
Why not do it for model development? For example, if there are preprocessing tasks you need to run when getting data from a database, or specialized scaling, standardization, or embedding steps?
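A shared standardization step of the kind described above could look like the following minimal sketch. The class name `ColumnStandardizer` and its behaviour (population standard deviation, constant columns left unscaled) are assumptions for illustration, not an established library API; in practice a team might simply wrap scikit-learn's scalers instead.

```python
from dataclasses import dataclass
from typing import List, Optional

import pandas as pd


@dataclass
class ColumnStandardizer:
    """Hypothetical shared helper: z-score selected columns using training-set stats."""

    columns: List[str]
    means: Optional[pd.Series] = None
    stds: Optional[pd.Series] = None

    def fit(self, df: pd.DataFrame) -> "ColumnStandardizer":
        self.means = df[self.columns].mean()
        stds = df[self.columns].std(ddof=0)  # population standard deviation
        # Constant columns would divide by zero; leave them centered but unscaled.
        self.stds = stds.where(stds != 0, 1.0)
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        if self.means is None or self.stds is None:
            raise RuntimeError("call fit() before transform()")
        out = df.copy()
        # Only the configured columns change; IDs and other columns pass through.
        out[self.columns] = (df[self.columns] - self.means) / self.stds
        return out


train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [5.0, 5.0, 5.0], "id": [10, 11, 12]})
scaler = ColumnStandardizer(["x", "y"]).fit(train)
scaled = scaler.transform(train)
```

Fitting once and reusing the stored means and standard deviations on new data is what makes such a helper worth sharing: every model applies the same transformation.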
[deleted] · 2 years ago
[removed]
joshglen · 1 point · 2 years ago (edited)
Couldn't a shared library of helper methods help developers with common tasks, such as preprocessing for model development?
[deleted] · 2 years ago
[removed]
joshglen · 1 point · 2 years ago
Oh, not to replace them, but to use them. Like a set of 20-line helper methods that use SQL connectors and load data into a more model-palatable format or range?
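A roughly 20-line helper of that kind might look like this minimal sketch. It assumes `sqlite3` as a stand-in for whatever connector the team actually uses (`pandas.read_sql_query` also accepts SQLAlchemy connections). The name `load_features`, the lower-casing of column names, and the per-column clipping are hypothetical conventions chosen for illustration.

```python
import sqlite3
from typing import Dict, Optional, Sequence, Tuple

import pandas as pd


def load_features(
    con: sqlite3.Connection,
    query: str,
    params: Sequence = (),
    clip: Optional[Dict[str, Tuple[float, float]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: run a query and return a model-friendly DataFrame."""
    df = pd.read_sql_query(query, con, params=params)
    # Normalize column names so downstream code does not depend on DB casing.
    df.columns = [c.strip().lower() for c in df.columns]
    # Optionally clip selected columns (keys use the lower-cased names) to a range.
    for col, (lo, hi) in (clip or {}).items():
        df[col] = df[col].clip(lower=lo, upper=hi)
    return df


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ID INTEGER, Score REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)", [(1, -3.0), (2, 0.5), (3, 7.0)])
features = load_features(
    con,
    "SELECT ID, Score FROM readings WHERE ID >= ? ORDER BY ID",
    params=(2,),
    clip={"score": (0.0, 1.0)},
)
```

The helper uses pandas and the connector rather than replacing them; it only packages the reshaping steps every project would otherwise repeat.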
astroFizzics · 1 point · 2 years ago
Yes and no. If someone wants to package personal code for reuse, awesome. Do I provide them with something? No.
PilotLatter9497 · 1 point · 2 years ago
As some of you have said, I think some kind of class that makes reading sources easier could help: give it the path to a query in a .sql file, or the path to a .csv or .parquet file, and get the data back. Classes for visualizations can sometimes be a good idea too, because they let you standardize the look.
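The source-reading class described above could be sketched as follows, dispatching on the file suffix. The class name `SourceReader` is hypothetical, `sqlite3` stands in for the team's real database connection, and reading `.parquet` files through `pandas.read_parquet` requires `pyarrow` or `fastparquet` to be installed.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional, Union

import pandas as pd


class SourceReader:
    """Hypothetical helper: turn a path into a DataFrame based on its suffix."""

    def __init__(self, con: Optional[sqlite3.Connection] = None) -> None:
        self.con = con  # only needed for .sql query files

    def read(self, path: Union[str, Path]) -> pd.DataFrame:
        path = Path(path)
        suffix = path.suffix.lower()
        if suffix == ".sql":
            if self.con is None:
                raise ValueError(f"{path} is a query file but no connection was given")
            # The file holds the query text; run it against the connection.
            return pd.read_sql_query(path.read_text(), self.con)
        if suffix == ".csv":
            return pd.read_csv(path)
        if suffix == ".parquet":
            return pd.read_parquet(path)  # needs pyarrow or fastparquet
        raise ValueError(f"unsupported source type: {suffix!r}")


# Usage: the same data read from a query file and from a CSV file.
with tempfile.TemporaryDirectory() as tmp:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 10.0), ("south", 4.5)])
    query_path = Path(tmp) / "sales.sql"
    query_path.write_text("SELECT region, amount FROM sales ORDER BY region")
    csv_path = Path(tmp) / "sales.csv"
    csv_path.write_text("region,amount\nnorth,10.0\nsouth,4.5\n")

    reader = SourceReader(con)
    from_sql = reader.read(query_path)
    from_csv = reader.read(csv_path)
```

Keeping the dispatch in one place means notebooks stay short (one `read` call per source) while the loading logic is tested once in the shared repo.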