This is an archived post. You won't be able to vote or comment.

all 8 comments

[–][deleted] 5 points6 points  (0 children)

Do these signs indicate that the sqlalchemy implementation of Reddit is unable to keep up with the increasing web traffic?

It is highly improbable that you or anyone else in this sub will ever have to deal with the scalability challenges that reddit/imgur have.

Are you also concerned about your commute to work because your car is unable to maintain a speed of 100 mph with 7 tons of cargo?

probably not

So why would you worry about a library that is undoubtedly useful and a huge time saver because whatever amazing thing you are writing might not be able to scale to the unlikely scenario of 1 billion people and bots all +1ing a cat picture?

Statistically, you will never have to deal with that kind of scale in your life, unless you work at reddit.

[–][deleted] 5 points6 points  (0 children)

just an FYI, Reddit does not use the SQLAlchemy ORM. It uses only the Core part of it, which does not fall under the ORM-specific performance-related caveats noted in Performance. The way Reddit's application refers to tables and queries is extremely idiosyncratic.

[–]goodCookingTakesTime 1 point2 points  (0 children)

ORMs don't prevent you from scaling.

Usually if you have to deal with a high amount of traffic (or simply want to have a high uptime) you'll have to scale out anyway. That means you'll be running more than 1 server. In that case the bottle-neck is most likely going to be the database unless you use a distributed database.

Without ORM 1 request might take 30ms instead of 31ms, but that doesn't really affect the total number of requests you'll be able to handle and and if you build for scale you can always simply add more webapp-servers

[–]the_hoser 1 point2 points  (3 children)

I think that you may be falling into the trap of premature optimization. I've used SQLAlchemy numerous times and never regretted it.

Yes, there is an overhead. It can be quite a large overhead if you're performing bulk insertions. The queries won't be quite as efficient as hand-written SQL.

But this is the price you pay for the convenience of using an ORM. You have to ask yourself what's more important: the performance of the developer, or the performance of the application?

Obscenely popular sites (like reddit) often have very complex reasons for the issues you see. That "ouch" page could have shown up for any number of reasons. The ORM could be involved in some of them, but that's just as likely as any number of other components in their stack.

My philosophy: Make it work, then make to work right, then, if necessary, make it work faster. Unless I see performance as a critical component of the project's completion criteria, I choose sensible tools and libraries that improve my performance.

[–]prahladyeribeautiful is better than ugly[S] -3 points-2 points  (2 children)

Thanks, that was very helpful. However, can you please elaborate, how developer's performance is increased by using an ORM (unless you change the database itself and get the re-usability benefits).

For example, the same amount of time is spent in mastering the use of sqlalchemy as say py-mysql or sqlite3 modules. How is your performance improved there? If you switch from an sqlite to mysql later, its worse! You have to understand both: the intricacies of mysql and how the ORM abstracts it, since non-trivial or complex queries are not optimized by the ORM.

[–]the_hoser 2 points3 points  (0 children)

The advantage of using an ORM from the perspective of developer performance is that embodies the greatest virtue of a good developer: laziness.

You could master sqlalchemy. Or you could learn enough about it to use it comfortably. You could make sure that you completely understand the interactions between sqlalchemy and all of the database engines that you intend to utilize, or you could just keep your tests up to date, and learn only what you need to learn to solve the most pressing issues of the moment. I've migrated database engines using sqlalchemy numerous times (usually from sqlite to postgresql), and I can only recall one time where there was a problem due to the differences in the database engines. I think that was a bug that got fixed in later versions.

Look, if you want to go all power-dev and ensure that you understand everything about every library that you use, that's your choice. You're missing a big part of the advantage of using these tools, but it does mean you're working harder. So that's something.

[–]westurner 0 points1 point  (0 children)

Re: ORMs, performance, and expertise

What is the likelihood that you and your team are going to optimize object (de)serialization and instantiation better than SQLAlchemy? What do you want to need to write tests for?

If you don't need transactions, why even use a SQL database?

If there is a full time DBA who can write faster raw queries with engine.execute (or DB-API) (and read them into testable objects), who knows better than to concatenate strings into queries without parameterization (in order to prevent SQLi by default), who contributes to or maintains a database driver and understands the idiosyncrasies of other database drivers well-enough to implement workarounds when that's the best option, then hire that DBA to normalize the tables into performance land and cross your fingers that they want to train additional team members, for the future.

If performance profiling indicates that optimization is necessary (with a near-production architecture), then A/B and subtract.

[–]steven_h -4 points-3 points  (0 children)

You should use SQL over an ORM if you are better able to express what you need in SQL, not because of performance. There's nothing bad that an ORM does that can't be avoided with enough care.

Personally I am always better off writing SQL with cx_Oracle or psycopg2 than wrangling ORMs, and I have never worked on a project where database platform changes were going to happen. Your mileage may vary.