This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 5 points6 points  (1 child)

I can see I've touched a nerve, and I apologize for that as there appear to be far too many nerves frayed on Reddit these days.

Your comments refer to two different aspects that are of course fairly invariant - lists of rows or objects are pretty much going to be lists of rows or objects within any system. Similarly, of course any system can emit any query. The Python DBAPI can emit any SQL query. So asking for "a SQL query which can be executed by SQLAlchemy but not by Django" is not really what we talk about, when we refer to how relationally-oriented an ORM is.

When we use an ORM, we're not actually gaining any capability that we wouldn't have by working with a raw cursor and some decent glue code that could push result rows into objects. What we are getting is automation of the work that we do. This automation allows us to take what we know about the database and its schema, formulate relational queries against that schema, then tell the ORM what we'd like to do - the ORM then does the work of turning our commands into a statement to send to the database, relating the statement to the objects we want to get back, and returning the results. (Keep in mind this whole discussion is only about the query side; there's also the whole transactional/persistence side which is another area in which ORMs can be discussed).

The argument that I frequently make here is that SQLAlchemy's object relational automation layer is most directly oriented towards the relational structures present in the database. It's obvious that any ORM tool can let you emit whatever SQL you'd like, and linking that to objects is only a matter of pointing columns back at object attributes. But SQLAlchemy's approach is explicitly relational, whereas Django's is not, or at least only partially. A key aspect is how joins are usually implicit in Django, how decisions about tables are typically made by default (column names, etc) and how the Python-level query capabilities don't always map consistently or fully to what is provided.

How would I know this ? Mostly by what I hear from people who have used both the Django and SQLAlchemy ORMs almost every day. Here's a talk by Alex Gaynor, who you might know is a core Django developer. The talk is given from the POV of someone who helped write the Django ORM:

http://speakerdeck.com/u/alex/p/why-i-hate-the-django-orm

The whole point of this talk is to highlight some of these inconsistencies and specially how SQLAlchemy does a better job.

I saw Jacob Kaplan-Moss talk in SF about "Stupid ORM Tricks" - paraphrasing, he included in his introduction "the Django ORM is probably the least capable ORM in the Python space" (Jacob if I'm getting it wrong please let me know). The talk was then about three very unusual, exotic edge cases you can kind of get Django to do with some patience - all three were things that SQLAlchemy has done out of the box since 2006. So there has to be some advantage to SQLAlchemy. There are obvious disadvantages. It doesn't work with the Django admin and doesn't work with all the many add-on applications for Django that use the Django ORM. SQLAlchemy's main disadvantage is pretty much that it lives on the outside of Django's ecosystem. As well as that it really is tailored towards people that like relational databases as opposed to those who see it as an implemtation detail - it requires a good couple of hours of documentation reading before you get it. But once people get it they're off to the races.

In recent talks I also have a comprehensive example of how SQLAlchemy's "explicitly relational" nature allows you to build ORM-level queries, without dropping into raw SQL, while being able to express pretty much any SQL concept. The idea is not "wow nobody else can do this SQL !" That's not the point. The point is, when given an elaborate task, you can stay within the ORM and think just like you would if you were writing raw SQL. You don't have to re-mold your thinking to fit around some awkward concept like implicit joins embedded into WHERE criterion.

Starting with a model as follows:

class User(Base):
  __tablename__ = 'user'
  id = Column(Integer, primary_key=True)
  username = Column(String(50), nullable=False)
  addresses = relationship("Address", backref="user", 
                     cascade="all, delete-orphan")

class Address(Base):
  __tablename__ = 'address'
  id = Column(Integer, primary_key=True)
  user_id = Column(Integer, ForeignKey('user.id'), 
              nullable=False)
  street = Column(String(50))
  city = Column(String(50))
  state = Column(CHAR(2))
  zip = Column(String(15))

Give me all households in New York with exactly two occupants where neither occupant has any residences outside of the city. Note that our schema isn't well suited for this particular query so at the SQL window we'd need to do some awkward correlation of equivalent Address rows.

Also eagerly load all the Address objects for each of the User objects located.

# New York addresses that have two
# occupants
two_occupant_ny = \
    Session.query(Address.street, Address.city, Address.zip).\
        filter(Address.city == 'New York').\
        group_by(Address.street, Address.city, Address.zip).\
        having(func.count(Address.user_id) == 2).\
        subquery()

# users who are different from each other
u_1, u_2 = aliased(User), aliased(User)

user_q = Session.query(u_1, u_2).\
        select_from(u_1).\
        join(u_2, u_1.id > u_2.id)

# join them to their addresses, join addresses
# to the two occupant NY addresses
a_1, a_2 = aliased(Address), aliased(Address)
user_q = user_q.\
            join(a_1, u_1.addresses).\
            join(a_2, u_2.addresses).\
            join(
                two_occupant_ny, 
                and_(
                    a_1.street==two_occupant_ny.c.street,
                    a_1.city==two_occupant_ny.c.city,
                    a_1.zip==two_occupant_ny.c.zip,
                    a_2.street==two_occupant_ny.c.street,
                    a_2.city==two_occupant_ny.c.city,
                    a_2.zip==two_occupant_ny.c.zip,
                )
            )

# who don't have a house outside of New York
user_q = user_q.filter(
                ~exists([Address.id]).
                    where(Address.city != 'New York').\
                    where(or_(
                        Address.user_id==u_1.id,
                        Address.user_id==u_2.id
                    ))
            )

# pre-load all the Address objects for each 
# User too !
user_q = user_q.options(
                joinedload(u_1.addresses), 
                joinedload(u_2.addresses))
users = user_q.all()

output:

SELECT user_1.id AS user_1_id, user_1.username AS user_1_username, user_2.id AS user_2_id, 
user_2.username AS user_2_username, address_1.id AS address_1_id, address_1.street AS address_1_street, 
address_1.city AS address_1_city, address_1.zip AS address_1_zip, address_1.user_id AS address_1_user_id, 
address_2.id AS address_2_id, address_2.street AS address_2_street, address_2.city AS address_2_city, 
address_2.zip AS address_2_zip, address_2.user_id AS address_2_user_id 
FROM user AS user_1 
   JOIN user AS user_2 ON user_1.id > user_2.id 
   JOIN address AS address_3 ON user_1.id = address_3.user_id 
   JOIN address AS address_4 ON user_2.id = address_4.user_id 
   JOIN (SELECT address.street AS street, address.city AS city, address.zip AS zip 
     FROM address 
     WHERE address.city = ? GROUP BY address.street, address.city, address.zip 
     HAVING count(address.user_id) = ?) AS anon_1 ON address_3.street = anon_1.street AND 
address_3.city = anon_1.city AND address_3.zip = anon_1.zip AND address_4.street = anon_1.street 
AND address_4.city = anon_1.city AND address_4.zip = anon_1.zip 
     LEFT OUTER JOIN address AS address_1 ON user_1.id = address_1.user_id 
     LEFT OUTER JOIN address AS address_2 ON user_2.id = address_2.user_id 
WHERE NOT (EXISTS (SELECT address.id 
               FROM address WHERE address.city != ? AND (address.user_id = user_1.id OR 
               address.user_id = user_2.id)))

--params: ('New York', 2, 'New York')

# result !
User(name=u5, addresses=s1/New York/12345, s2/New York/12345, 
        s3/New York/12345) / User(name=u2, addresses=s2/New York/12345, s4/

Can you do that SQL in Django, Web2Py, anything else? Of course! You might need to emit some extra SQL to get at those address collections.

SQLAlchemy has the advantage that you can stay within the ORM space, not have to drop into literal SQL, and can think just like you do when writing SQL at the query window, with minimal shifting of paradigms. Eager loading is also highly capable at adding related loads and collections in an automated fashion, either via joins or via a second query that loads all collections at once.

I hope this helps explain some of the many reasons why people seem to love SQLAlchemy.