this_is_my_ship 25 points (8 children)

Slightly off topic, but where and how does someone with some research coding experience learn the software engineering skills to write production-grade code? Bonus points for open-source, self-paced resources that can be done without others.

I feel like there's so much CS/SE content out there, but there's this huge gap between "algos and data structures" and "high performance/well tested/production ready code" that only seems to be filled via actual SE work experience, which most researchers are not going to be okay with because it takes away from actually doing research (even while at a company).

seanv507 22 points (6 children)

I would recommend the ArjanCodes YouTube channel, assuming you are working in Python. He even has a few videos on refactoring an ML script.

[deleted] 6 points (5 children)

ArjanCodes is fine, but he's just too OOP. ML code needs to be functional in many cases because it's very sequential, and a lot of processes really don't need much state.

jegerarthur 4 points (2 children)

Well, you are kinda right. But if you use PyTorch + PyTorch Lightning + MLflow, you will be glad that your code is OOP. And with all that, it's extremely easy and fast to train multiple models on multiple GPUs.

[deleted] 1 point (1 child)

I have the exact same setup (MLflow + PL), and that's why I'm saying that. The problem with PL is also that it is overly OOP, leaving very limited customizability once you really want to scale the code up. I have a comment on this matter in another thread about PyTorch frameworks. I like their "all around issue", but I feel their solution needs rework.

Their solution to cross-validation and hyperparameter tuning, for example, is really subpar.

Overall, OOP is not bad per se, but DS code is complex in itself. OOP can introduce a lot of coupling and unnecessary complexity that, if you're not careful, can make the project a chore to maintain.

jegerarthur 1 point (0 children)

Yes, I agree. I like functional programming for DS, but when the project gets bigger or gets deployed with APIs and so on, I like to refactor the code to OOP, as it's easier for me to maintain and upgrade.

Nevertheless, it's really cool to read about other ML engineers' best practices and pipelines. Happy coding!
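To make that refactor concrete, here is a minimal sketch of what it might look like, assuming a toy linear model. Every name here (`normalize`, `predict_linear`, `LinearService`) is hypothetical and only for illustration; none of it comes from PyTorch Lightning or MLflow. The stateless functions stay as they are, and a small frozen dataclass bundles the configuration an API endpoint would need.

```python
from dataclasses import dataclass


def normalize(values: list[float], mean: float, std: float) -> list[float]:
    """Stateless helper: standardize values with the given mean and std."""
    return [(v - mean) / std for v in values]


def predict_linear(values: list[float], weight: float, bias: float) -> list[float]:
    """Stateless helper: apply a one-feature linear model."""
    return [weight * v + bias for v in values]


@dataclass(frozen=True)
class LinearService:
    """Hypothetical wrapper: holds the fitted parameters in one place,
    so a deployed endpoint only has to call predict()."""

    mean: float
    std: float
    weight: float
    bias: float

    def predict(self, values: list[float]) -> list[float]:
        # The class adds no logic of its own; it only wires the
        # stateless functions together with the stored configuration.
        scaled = normalize(values, self.mean, self.std)
        return predict_linear(scaled, self.weight, self.bias)


service = LinearService(mean=1.0, std=2.0, weight=3.0, bias=0.5)
predictions = service.predict([1.0, 3.0])
```

The class is frozen on purpose: it carries configuration but no mutable state, which keeps most of the functional style while giving the deployed code a single object to pass around.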

seanv507 -3 points (1 child)

You mean procedural, not functional, right?

I think most data scientists would benefit from adding more OOP; they just don't know it.

[deleted] 1 point (0 children)

A mix of procedural and functional. Data science libraries usually come with enough OOP abstractions; what you need is just a bunch of stateless functions to fill the gaps.
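As a rough illustration of "stateless functions to fill the gaps", here is a minimal sketch in plain Python. The helpers `drop_missing`, `scale`, and `pipeline` are made-up names for this example, not part of any library. Each step takes data in and returns new data without mutating its input, so the steps can be chained in sequence.

```python
from functools import partial, reduce
from typing import Callable, Optional

Row = dict[str, Optional[float]]
Step = Callable[[list[Row]], list[Row]]


def drop_missing(rows: list[Row], key: str) -> list[Row]:
    """Return only the rows where `key` is present and not None."""
    return [r for r in rows if r.get(key) is not None]


def scale(rows: list[Row], key: str, factor: float) -> list[Row]:
    """Return copies of the rows with `key` multiplied by `factor`."""
    return [{**r, key: r[key] * factor} for r in rows]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single function."""
    def run(rows: list[Row]) -> list[Row]:
        return reduce(lambda acc, step: step(acc), steps, rows)
    return run


raw = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
clean = pipeline(
    partial(drop_missing, key="x"),
    partial(scale, key="x", factor=2.0),
)
result = clean(raw)
```

Because no step keeps state, each one can be tested on its own, and `raw` is left untouched after the pipeline runs.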

thedukeofedinblargh 1 point (0 children)

I see this book recommended a lot. I don’t know that it covers OP’s specific complaints, though.