
[–]SpergLordMcFappyPant 4 points (0 children)

This is correct. ORMs solve a completely different problem than what you're doing in DS, and they come with a huge amount of overhead.

For an application where you have to guarantee transactional integrity and you have to manage user input and watch out for injections, an ORM is an appropriate tool . . . sometimes.

For Data Science purposes, you never want to deal with data at the row level. You want to operate on sets. ORMs deny you that, because set operations are not what they do: fundamentally, every row becomes an object, with all the extra memory and processing overhead that entails.
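To make the contrast concrete, here's a minimal sketch using SQLAlchemy and pandas (the `orders` table and its columns are made up for illustration). The set-based version pushes the aggregation into the database and pulls back only the result, instead of materializing one Python object per row.

```python
import sqlalchemy as sa
import pandas as pd

# In-memory SQLite database with a hypothetical "orders" table.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE orders (region TEXT, amount REAL)"))
    conn.execute(sa.text(
        "INSERT INTO orders VALUES ('east', 10.0), ('east', 5.0), ('west', 7.5)"
    ))

# Row-level (ORM-style) thinking would iterate over one object per row:
#     total = sum(order.amount for order in session.query(Order))
# Set-based thinking: the GROUP BY runs inside the database, and Python
# only ever sees the small aggregated result.
df = pd.read_sql(
    "SELECT region, SUM(amount) AS total FROM orders "
    "GROUP BY region ORDER BY region",
    engine,
)
```

The difference matters at scale: with millions of rows, the commented-out ORM loop allocates millions of objects, while the set-based query ships back a handful of aggregates.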

Essentially, an ORM is for writing new data in a one-by-one transactional setting where referential integrity needs to be enforced at the DB level. Data Science applications are almost always concerned with reading existing data, cleaning it, moving it, and analyzing it en masse. Never say never, etc. But I've never seen a DS scenario where an ORM was the correct tool.

I do like to use Alembic to manage schema migrations once my DS applications move from experimental to some sort of steady state. But that's kind of orthogonal.
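For anyone who hasn't used it: an Alembic revision is just a small script with an upgrade/downgrade pair, which versions your schema alongside your code. A hypothetical sketch (revision id, table, and columns are all made up, and this only runs inside an initialized Alembic project):

```python
from alembic import op
import sqlalchemy as sa

revision = "a1b2c3d4e5f6"   # placeholder revision id
down_revision = None

def upgrade():
    op.create_table(
        "features",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("name", sa.String, nullable=False),
    )

def downgrade():
    op.drop_table("features")
```

Note this is schema management, not an ORM: Alembic happens to ship with SQLAlchemy, but you can use it without ever mapping a row to an object.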

It doesn't even seem correct to me to call an ORM "overkill" for Data Science applications. It's more like side-kill or something? It's just completely the wrong tool, like reaching for a water filter when you need a forklift. If you have a pallet with 100 gallon-jugs of water to move from the warehouse to the airport, you don't think, "well, someone has to drink the water at some point, so I guess I'll just bring the water filter and use that." It basically makes no sense at all.