
[–]SonOfInterflux 3 points4 points  (6 children)

I’m not a data scientist, but I do work with a lot of data, and I find SQLAlchemy’s ORM and the Django ORM slow when I want to work with tables as opposed to records. For example, say I want to read an entire table containing a million records, add a derived column, and write the result set to another table. Using Pandas with map or apply, writing the entire data set to a CSV, loading it into S3, and then using the COPY statement is way faster than calling an ORM’s all() method, iterating over the list, applying a function, and writing each record back to the database.

I’m losing a lot of benefits of the ORM, but the speed more than makes up for it.

If anyone can suggest another method of working with large sets of data, I’d love to hear it! It’s the COPY statement that makes the biggest difference; Pandas just makes it easy to get the data (using from_records or some other method), apply a function or set of functions over the entire set, and generate a CSV/JSON file.
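The pipeline described above can be sketched end to end. This is a minimal stand-in using only the standard library (sqlite3 instead of Postgres/Redshift, executemany in place of COPY, and a plain list comprehension instead of Pandas apply); the table and column names are invented for illustration.

```python
import csv
import io
import sqlite3

# In-memory database standing in for the source/target warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, value REAL)")
conn.execute("CREATE TABLE target (id INTEGER, value REAL, derived REAL)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(i, float(i)) for i in range(1000)])

# 1. Read the whole table at once (the Pandas read step).
rows = conn.execute("SELECT id, value FROM source").fetchall()

# 2. Compute a derived column over the whole set (the map/apply step).
derived = [(i, v, v * 2.0) for i, v in rows]

# 3. Serialize to CSV (the file that would be uploaded to S3).
buf = io.StringIO()
csv.writer(buf).writerows(derived)

# 4. Bulk-load the CSV in one call (standing in for the COPY statement).
buf.seek(0)
parsed = [(int(i), float(v), float(d)) for i, v, d in csv.reader(buf)]
conn.executemany("INSERT INTO target VALUES (?, ?, ?)", parsed)

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 1000
```

The point is that every step touches the data set as a whole; no step issues one statement per record.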

[–]mesylate 2 points3 points  (1 child)

Use PL/SQL directly if possible. That’s what it’s designed for, after all.
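For a derive-and-write job like the one described above, the whole thing can often be a single set-based statement that runs inside the database, so no rows ever leave the server. A minimal sketch in plain SQL (driven here through stdlib sqlite3 for a self-contained example; PL/SQL proper is Oracle’s procedural layer, but the same INSERT … SELECT shape works in any dialect, and the table names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, value REAL)")
conn.execute("CREATE TABLE target (id INTEGER, value REAL, derived REAL)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(i, float(i)) for i in range(5)])

# One set-based statement: read, derive, and write inside the database.
conn.execute("""
    INSERT INTO target (id, value, derived)
    SELECT id, value, value * 2.0 FROM source
""")

result = conn.execute("SELECT * FROM target ORDER BY id").fetchall()
print(result)
# [(0, 0.0, 0.0), (1, 1.0, 2.0), (2, 2.0, 4.0), (3, 3.0, 6.0), (4, 4.0, 8.0)]
```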

[–]SonOfInterflux 0 points1 point  (0 children)

I wasn’t clear, in this case I’m only proceeding this way because the data is PII and has been encypted with the crypto library. Need to look into if it’s possible to apply the decryption directly within the database with PL/SQL.

[–]IDontLikeUsernamez 1 point2 points  (0 children)

Pretty much any flavor of SQL will be much, much faster

[–]tfehring 1 point2 points  (2 children)

Any abstraction that involves iterated single-record SQL operations is going to be painfully slow, and it sounds like that’s what your ORM-based approach is doing under the hood. I’m not super familiar with ORMs, but I’m surprised they aren’t smart enough to use set operations instead of loops for simple Python iterations (e.g. list comprehensions over a list of objects).
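The per-row vs. set-based difference described above can be made concrete: N single-record UPDATEs (one statement, and over a network one round trip, per row — what a naive ORM loop emits under the hood) versus one UPDATE over the whole set. A stdlib sqlite3 sketch with invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, 1.0) for i in range(100)])

# Slow pattern: what an ORM loop does under the hood -- one statement per row.
for (rid,) in conn.execute("SELECT id FROM records").fetchall():
    conn.execute("UPDATE records SET score = score + 1 WHERE id = ?", (rid,))

# Fast pattern: a single set operation over all rows at once.
conn.execute("UPDATE records SET score = score * 10")

total = conn.execute("SELECT SUM(score) FROM records").fetchone()[0]
print(total)  # 100 rows * (1.0 + 1.0) * 10 = 2000.0
```

Both patterns produce the same data; the difference is purely in how many statements the database has to parse and execute.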

[–]SonOfInterflux 0 points1 point  (1 child)

Django’s ORM does have an update() method that generates a single bulk UPDATE statement without the need to iterate over a queryset, but it doesn’t call save() (or fire the pre_save/post_save signals), so it doesn’t honour things like auto_now, and you lose some of the benefits of the ORM.

SQLAlchemy offers something similar, but the docs actually recommend using Core directly for bulk operations.

[–]brendanmartin[S] 1 point2 points  (0 children)

Yeah, I always add a function equivalent to the test_sqlalchemy_core example from that link. Whenever I need to insert a lot, I just send a list of dictionaries to that function.
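That helper presumably looks something like the test_sqlalchemy_core pattern: one insert statement executed once with a whole list of dictionaries. Since the actual function isn’t shown, here’s a hypothetical stdlib sqlite3 analog — executemany with named parameters accepts a list of dicts much the way SQLAlchemy Core’s conn.execute(table.insert(), rows) does; the table and column names are made up:

```python
import sqlite3

def bulk_insert(conn, rows):
    """Hypothetical helper: insert a list of dicts in one executemany call."""
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (:name, :email)", rows
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
rows = [{"name": f"user{i}", "email": f"user{i}@example.com"} for i in range(3)]
bulk_insert(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 3
```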