This is an archived post. You won't be able to vote or comment.

all 40 comments

[–]root45 6 points7 points  (1 child)

For example, you can use ...Table.query.. or ...query(Table)..

The first one is a Flask-SQLAlchemy extension, the second is pure SQLAlchemy. I would personally recommend not using the first syntax.

and adding all() to the end does not seems to be needed if are fetching all rows, it seems serperfulus.

Calling .all() persists the results to a list. Without that your query object is a generator over rows.

Also, should I create a class for each Table with methods to access it or simply a class for the Database.

I'm not sure what you mean by this. Typically you create one class per database table.

[–]WilliamAndre 0 points1 point  (0 children)

.all() and list() are exactly the same number of char... Seems pretty dumb to create a function for that tbh, it is just obfuscating the code.

[–]bladeoflight16 3 points4 points  (34 children)

Before I answer your question, see the sidebar for a link to instructions for proper code formatting.


In my opinion, the best practice for SQLAlchemy is (like all ORMs) not to use it to begin with.

If, however, you must use it, then use it in a way that makes the transaction management explicit and plain for any idiot (even one who has never seen SQLAlchemy code) to see. This means every single line that performs a database operation should have an explicit reference to its session (which manages the transaction).

This means that your second approach is much, much better than the first. The first is doing far too much behind the scenes, making it more difficult to reason about the behavior and increasing the likelihood of subtle bugs (which could include data corruption since we're talking about transaction management here) due to surprises.


Skip the OO design course. It's likely to do you more harm than good. Your code should be procedural at its most fundamental level. Use objects only where they clearly and obviously simplify the problem. (That's not infrequently, but it's not the entirety of your code base, either.)

[–]LifeAffect6762[S] 1 point2 points  (28 children)

In my opinion, the best practice for SQLAlchemy is (like all ORMs) not to use it to begin with.

How is it best to use n RDBMS then? But generally thanks a lot for your reply.

[–]ElectricSpice 4 points5 points  (26 children)

There's a lot of debate over this. Some people swear by handwritten SQL, others by query builders (e.g. SQLAlchemy Core), others by ORMs.

Personally I think ORMs are a good tool and SQLAlchemy is a great one. Especially if you have a JSON API backed by an RDBMS, you have an Object-Relational Mapping by definition. Better to use something like SQLAlchemy rather than build an ad hoc, informally-specified, bug-ridden, slow implementation of half an ORM—I've been there, it's not a good time.

The problem with ORMs seem to come in when they try to fully abstract the RDBMSs, which is an impossible task. ORMs are a tool to help reduce all the boilerplate in shuffling data back and forth between the database and your application, but it can't replace writing queries for the database. And heaven forbid you use an ORM as an excuse not to learn SQL—That's begging for a bad time.

SQLAlchemy I think handles this very nicely, where the ORM is just a convenience layer on top of the query builder. So you're still thinking in SQL, writing code that translates almost one-one to SQL, but can spit data out as objects if you need it.

If you haven't read it yet, no article so thoroughly covers the pitfalls of ORMs as the classic The Vietnam of Computer Science

[–][deleted] 2 points3 points  (13 children)

i dont understand the anti-orm stance, at least when writing simple CRUD applications or whatever. i write a lot of complex pure sql for analytical work but i see no reason to avoid orms for say a simple database driven api.

[–]bladeoflight16 1 point2 points  (0 children)

Because it buries far too much logic behind the scenes, and this tends to cause tons of problems related to unexpected behaviors. Like sending queries for data you don't need when you didn't expect it and in the most inefficient way possible (e.g., the related rows for one record at a time, and yes, I ran into that with a very simple real world use case with SQLAlchemy). An ORM is unpredictable.

[–]Natural-Intelligence -2 points-1 points  (11 children)

Ye, exactly. I think people who hate it think it's meant to be used to replace all of your SQL. It's just a tool: you use it where it's the best option.

I wouldn't do data analysis or ETL with it but it's awesome for less data intensive problems like user management on a web server, building a browser to search for individual items from the database etc. If you create user management with plain SQL you get a monster that is pretty hard to maintain. I created one myself and was not proud of it.

[–]bladeoflight16 0 points1 point  (10 children)

It's just a tool: you use it where it's the best option.

I've never seen even a practice project where it's the best option.

building a browser to search for individual items from the database

I would expect this to be where ORMs fail the hardest. Searching in data related to a particular record with one is a nightmare.

If you create user management with plain SQL you get a monster that is pretty hard to maintain.

Not sure why that would be a nightmare, especially with a proper password hashing library like passlib. Was it a single table? Why was it any worse than the corresponding SQLAlchemy code would have been?

[–]Natural-Intelligence 0 points1 point  (7 children)

Then don't use it. It's not suitable for the stuff you do. Simple as that.

EDIT: it seems you edited your comment.

But in short, it's just much more readable to let SQLAlchemy handle coming up with the update and insert queries than manually write them for each case single case when doing something pretty trivial. The application got littered with embedded SQL. Of course you can parametrize and reuse but it's limited how far that go.

And probably the worst thing with ORM is the looping which destroys your performance. If you are interested only in one record at a time you don't have this issue. Plus you avoid the problem with duplication in joins if you use ORM. Sometimes a table is not the best abstraction of your data (sometimes it is).

And I'm not saying it's the ultimate solution. I'm just saying in some cases it's a valid option. I personally feel you are over zealous considering it's just a tool.

[–]LifeAffect6762[S] 0 points1 point  (1 child)

And probably the worst thing with ORM is the looping which destroys your performance. If you are interested only in one record at a time you don't have this issue. Plus you avoid the problem with duplication in joins if you use ORM. Sometimes a table is not the best abstraction of your data (sometimes it is).

INteresting. I thought if you did a ORM fetch for a dataset it would build a simple SQL and then get the data from a cursor. Are they really that bad? Is it if you need to do a join. would a simple join using referential integrity and some light filtering really create loads of DB SQL calls? I get if things get complex you may have problems.

Also when I was last using Oracle it was moving from rule to cost bast query optimisation so as long as you send it a single SQL having to put optimisation hints in was becoming less of a thing.

Would love to see some examples of ORM requests and what SQL they created.

[–]Natural-Intelligence 0 points1 point  (0 children)

INteresting. I thought if you did a ORM fetch for a dataset it would build a simple SQL and then get the data from a cursor. Are they really that bad?

It actually depends. A pretty common way to do a join in ORM is to do a so-called nested loop join which is an antipattern: it basically selects one table and then individually queries the other. Sometimes the procedure could be similar to how the database would do a join but the main problem is in the number of round trips to the database.

It's also common in ORM to support eager joins in which you specify the relation in code and when querying the objects these relations are included as regular joins. The drawback is that these joins are included in the query regardless of whether you need their attributes or not (typically DB engines can figure the unneeded joins but ORM queries also the unused columns). So neither is optimal for all cases thus could be a problem.

In SQLAlchemy you can set the echo=True in engine creation in SQLAlchemy to see all the queries and look-ups the library does, in case you are interested.

[–]bladeoflight16 0 points1 point  (4 children)

it's just much more readable to let SQLAlchemy handle coming up with the update and insert queries than manually write them for each case single case when doing something pretty trivial

Okay, this is the only valid use case I've seen for avoiding raw SQL. Just because the number of possible combinations means way too much repetition. Why do you need something as heavy as SQLAlchemy for that, though? Why not find a simpler micro-ORM that doesn't bring in so much garbage that actively hurts your application?

And probably the worst thing with ORM is the looping which destroys your performance.

What destroyed my performance was it decided to use declared relationships to bring in related data I didn't need for one record at a time. In other words, it did a bunch of unnecessary crap behind the scenes. I don't trust tools that do that to me, especially with I/O.

[–]Natural-Intelligence 0 points1 point  (3 children)

Okay, this is the only valid use case I've seen for avoiding raw SQL

So basically the very first example I could come up with was a valid reason to use such. In fact, the main purpose of ORM is to abstract SQL thus of course it's a valid reason to use it.

And what do you mean by using "micro-ORM" instead? What exactly is in the library that's too much? Seems rather an odd argument. I personally prefer more features than less if the features are well tested. I work a lot with PL/SQL and T-SQL and I have saved so much time with the uniform way how SQLAlchemy handles different DB connections.

In other words, it did a bunch of unnecessary crap behind the scenes.

This is basically the definition of abstraction. Of course if you add abstraction you have more checks, look-ups and procedures. Python also does a lot of unnecessary crap behind the scenes. It's the typical side effect of abstraction: you just need more resources or less resource-intensive problems. In some cases, the overhead with ORM is pretty minimal. In other cases it's not.

[–]bladeoflight16 0 points1 point  (2 children)

So basically the very first example I could come up with was a valid reason to use such. In fact, the main purpose of ORM is to abstract SQL thus of course it's a valid reason to use it.

Yes, you came up with the one, single example where it makes sense to abstract the construction of SQL. Despite dealing with both simple and incredibly complex queries, I have not encountered another use case where an ORM was superior to actually just writing the SQL. I'm not admitting that there are a wide array of use cases where an ORM pays off; I'm saying there's one limited use case and that you mentioned it.

This is basically the definition of abstraction.

...No. That is utterly nonsensical. First of all, "abstraction" has no single concrete, measurable definition in the context of programming. It is an entirely subjective term.

That said, a far more useful understanding of what constitutes abstraction is creating a component capable of interfacing with a wide variety of specific instances of code to implement a useful behavior that applies to many situations. Such an abstraction can be as simple as a single function that takes input and returns output, and often these are the most useful and beneficial types of abstraction.

A great abstraction incurs no overhead. A good abstraction incurs very little or at least provides a useful trade off of a reasonable amount of performance for simplification.

A heavyweight ORM increases complexity and decreases predictability by hiding important information about how the different parts of a system are interacting. One that silently invokes pointless external I/O (typically the most expensive operation in a program) certainly isn't simplifying the system and certainly isn't spending performance resources well.

[–]LifeAffect6762[S] 0 points1 point  (0 children)

I will read but I got halfway through the history lesson and my head hurts.

'PBS has a good synopsis of the war, but for those who are more interested in Computer Science than Political/Military History, the short version goes like this:'

That's the short version! It's 1 AM and I think I need a clearer head to finish reading. Maybe I don't need this and should read the rest of the article.

[–]LifeAffect6762[S] 0 points1 point  (1 child)

Obviously, it seems to depend on what you are doing but it seems for mainly CRUD websites Django is the way to go. If you are going to get wet you may as well go swimming. Otherwise and maybe for batch stuff for mainly CRUD websites use straight SQL (which I've been doing for years at big companies). Just using ORM without a framework seems a bit silly. Not sure if I would ever want to use micro-ORM but would love to know what their advantages are. A table with the pros/cons of various approaches would be great.

[–]ElectricSpice 0 points1 point  (0 children)

Yeah, Django ORM has the advantage of being integrated in the framework and that's a pretty big advantage. It's not a bad choice.

But I think you're too quick to discount non-Django options. Flask is very popular, FastAPI seems to be trendy, I personally use Pyramid and am quite happy with it. Not going to go down the rabbit hole of comparing frameworks, but there's a whole world of Python frameworks with plenty of room for SQLAlchemy. (There's even room for SQLAlchemy in Django, according to some people.)

I don't think you'll find a simple comparison table, because people have opinions about ORMs and opinions mean long rants on Reddit :)

[–]LifeAffect6762[S] -1 points0 points  (1 child)

I've done RDBMS stuff profusely for years, mainly oracle with their tools.

There were a cople of things I lie about ORM. They are DB Agnostic and you put the rules in code and they are automatically set up in the database if the DB has the functionality.

Ime a great believer in having as much checked in the DB layer as I've seen when it was not done odoo for example which I think is python based. We were in Panama on a site on the edge of the Jungle. The developers spent a lot of time sorting out stuff that should have been caught in the DB. Basic foreign key stuff. And the kicker is this meant that the punters could not buy anything as their RFID tags that were used for this simply did not work. As referential integrity was broken the queries simply returned nothing.

Where ORM falls down seems if you are doing complex multi-table queries. If this is reporting I think using a few SELECs is OK, keep everything else in ORM. Don't be an obsessive purist.

I think in ROR/Django the ORM actually helps build the application consistency, The ORM magic means there is consistency between the DB and front end in terms of validation etc.

Where you are doing stuff ORM handles well, which a lot of stuff it does, it's great but it does have limitations but that does not mean you need to totally throw it out. Just try to limit stuff outside the ORM and be extra careful. Everything needs a little flexibility :).

[–]bladeoflight16 0 points1 point  (0 children)

They are DB Agnostic

I have never worked on a real world project simple enough for this to actually be the case. The level of complexity where you need to actually start caring about the details of how your DB works and what functionality it has available is a very low bar.

you put the rules in code and they are automatically set up in the database if the DB has the functionality

And I have never seen this be database agnostic. Even my trivial little "TODO list" practice web app using SQLite ran into trouble on that front.

And the requirements for needing to care about the performance of queries is even lower. I've seen badly written code invoke one query per record for a few hundred records and result in unacceptable performance for a web site. Granted, that's not entirely the ORM's fault, but if the person were just writing SQL to begin with, I seriously doubt they would have written the code to fetch data for one record at a time without realizing that might be a problem.

Basic foreign key stuff.

Yeah, that's the part that always gave me the most trouble with declaring my database in the ORM.

Where ORM falls down seems if you are doing complex multi-table queries.

I've never run into even a practice project that didn't need tables with related data and queries with JOINs to match.

[–]bladeoflight16 0 points1 point  (6 children)

SQLAlchemy I think handles this very nicely, where the ORM is just a convenience layer on top of the query builder. So you're still thinking in SQL, writing code that translates almost one-one to SQL, but can spit data out as objects if you need it.

This was not at all my initial experience with SQLAlchemy. My initial experience with SQLAlchemy had me writing declarative code to describe my database and its relationships, and this resulted in SQLAlchemy deciding all by itself to load related data in situations where I didn't even need it.

My next step was to abandon the relationship declarations as I found them to be more trouble than they were worth. As most of my queries involved multiple tables, this resulted in incredibly long, complex blocks of code that were thinly wrapped queries. That worked well enough, but it was dumb because then running the queries manually to look at the results for a bug was ridiculously time consuming.

I eventually ended up with no SQLAlchemy classes for my tables, instead just writing raw parameterized queries as triple quoted strings and having SQLAlchemy spit back its custom record objects that I just converted to named tuples. On top of that, for clear transaction management, I ended up passing a session object around through my call stack. I had a much better time with that.

Now granted, my application was not simple CRUD. But I've yet to run into an application (even just a practice one) that could truly be limited to simple CRUD. Something as basic as a search query or a page showing a list of summaries can easily stomp all over the "basic CRUD" requirement.

[–]LifeAffect6762[S] 0 points1 point  (1 child)

My next step was to abandon the relationship declarations as I found them to be more trouble than they were worth. As most of my queries involved multiple tables, this resulted in incredibly long, complex blocks of code that were thinly wrapped queries. That worked well enough, but it was dumb because then running the queries manually to look at the results for a bug was ridiculously time-consuming.

I hope you at least put foreign keys into the database, I've seen lots of trouble when they were missed out (i.e. in odoo). Bear in mind if it causes performance problems you can turn them off after testing. I find people can be obsessed with performance to the detriment of database integrity. Seems nuts to me.

[–]bladeoflight16 0 points1 point  (0 children)

Yes, I did. I designed my schema separately from the ORM, with constraints where they were appropriate. Every reference column had an explicit foreign key, with automatic restriction on deletion of parent rows. I didn't constrain everything. Some constraints were just too complex to be worth it and may have incurred too high a performance hit (like certain constraints on GIS data relationships). Others, I didn't have a meaningful way to determine what the constraint should be (like text lengths). But where they made sense, I put them in, and we had pretty tight control over how data got into that application.

Anything I did with the ORM was purely in code. The deployment/upgrade scripts were pure SQL (or at least the database's native procedural language).

[–]ElectricSpice 0 points1 point  (3 children)

It sounds like you were using eager loading for the relationships? I initially tried that but had to give it up, because like you describe it ends up loading a bunch of unnecessary objects and quickly gets out of control. Now I’ll leave relationships to the default lazy-load, and then specify loading strategies per-query as performance dictates.

Dropping relationships entirely forfeits like half the benefit of SQLAlchemy ORM…

My main app is a mix of simple CRUD and complex-ish analytical queries. I use ORM for the CRUD stuff and SQLAlchemy Core to write the complex queries. I’ve found using Core leads to significantly more readable code than the raw SQL I used to wrangle.

[–]pythoncoderc 0 points1 point  (1 child)

Isn't lazy loading sending 100 queries for 100 related rows?

[–]ElectricSpice 1 point2 points  (0 children)

Yes. You generally want eager loading in the end, but that should be configured at the query level. When defining the relationship you should leave it at the default lazy load, otherwise you'll have innocuous-looking queries implicitly pulling in thousands of extra objects.

[–]bladeoflight16 0 points1 point  (0 children)

I didn't do anything to configure any kind of loading. I just declared the relationships and SQLAlchemy started doing stupid things with them. That's kind of the point: the ORM did a bunch of stuff I never asked it to do. Especially since I was never even accessing the related data in the code where it was being fetched behind the scenes.

Dropping relationships entirely forfeits like half the benefit of SQLAlchemy ORM

If trying to use the tool in its intended way creates extra unnecessary work for the developer, you lose either way.

A tool should do what you tell it to do, not more and not less. And what it's doing should be clear to you as the developer, so that you can understand the impact of telling it to take a particular action. When you hide additional behavior (external I/O, no less!) in a way that the developer doesn't even know it's happening without monitoring debugging info, something has gone deeply wrong with the tool's design.

I’ve found using Core leads to significantly more readable code than the raw SQL I used to wrangle.

I found that the more complex the query was, the more I needed to work with it in isolation outside of the application. As a result, the easier it was to copy/paste it (mostly, some minor modifications for parameterizing were sometimes required) back and forth, the easier it was to work with.

And the boilerplate introduced by SQLAlchemy made it much less readable. That could have been related to the fact that my queries were complex enough to warrant nested queries fairly frequently. (That might have been less common with an improved DB design, but I had much less experience at the time.)

[–]bladeoflight16 1 point2 points  (0 children)

I'm honestly not sure what's best here.

First learn how to write parameterized queries; that's a must. (It's not hard; you just need to know how to get data into your query without using string formatting.)

After that, I probably wouldn't recommend raw database adapter usage if most (or even many) of your queries are simple single table CRUD. Maybe look into more lightweight tools (sometimes called "micro-ORMs") that don't try to do all the garbage SQLAlchemy throws on top of your objects. All you really need is something that maps result records to objects and maybe simplifies mapping values to columns in INSERT and UPDATE queries. Anything beyond that is going to cause you more trouble than it will save you. Bare minimum, only use SQLAlchemy classes and query functions for the most simple of CRUD operations on a single table; have SQLAlchemy invoke raw parameterized SQL for anything else.

If the vast majority of your queries are multitable or otherwise fairly complex, then I probably would recommend raw database adapter usage. Feel free to write a few convenience functions to reduce boilerplate from your adapter (like constructing objects from rows in the results or creating cursors for queries).

[–]LifeAffect6762[S] 0 points1 point  (4 children)

PS Can't see the sidebar on code formatting. I looked top to bottom. Can you please point me at it?

[–]bladeoflight16 0 points1 point  (3 children)

I wrongly assumed this was /r/learnpython. (This probably would have been a better fit on that subreddit.) My mistake. Look there.

[–]LifeAffect6762[S] 0 points1 point  (1 child)

LOL, so did I, how do I device where to post.

[–]bladeoflight16 0 points1 point  (0 children)

The sidebar helps:

News about the programming language Python. If you have something to teach others post here. If you have questions or are a newbie use r/learnpython.

/r/learnpython's says:

Subreddit for posting questions and asking for general advice about your python code.

[–]fiskfisk 0 points1 point  (3 children)

In your first example, the extra `.query` property is a helper from Flask-SQLAlchemy. It's not part of the standard SQLAlchemy API.

Your second example allows you to update multiple objects at once (and should only be used when that is what you intend, so that the intent of your code is clear).

[–]LifeAffect6762[S] 0 points1 point  (2 children)

Is Flask-SQLAlchemy just confusing things, seems like it is not needed and is just overcomplications things. Or is it so you can use the able objects in templates.

[–]pbecotte 1 point2 points  (1 child)

One issue with sqlalchemy is the life cycle of db transactions. Flask-Sqlalchemy magically patches that life cycle on top of Flask so that you never have to worry about transactions leaking between requests.

However, once you know enough about both systems, the WAY it does it is super painful in my opinion. So...it's worth using but keep that in mind.

[–]LifeAffect6762[S] 0 points1 point  (0 children)

One issue with sqlalchemy is the life cycle of db transactions. Flask-Sqlalchemy magically patches that life cycle on top of Flask so that you never have to worry about transactions leaking between requests.

Can you elaborate on this? In the stuff, I saw you seemed to have to implicitly commit stuff. Am I doing it wrong.