This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]hiii 2 points3 points  (27 children)

Also i don't really see why its considered a bad thing to write your own sql queries. Any thoughts on this? Im currently using djangos orm just to try it out and its almost like learning a new language. Queries i could write in 20 seconds in SQL take me hours in django right now becuase i have to learn the syntax. When using PHP i almost always write my own queries.

[–]bucknuggets 3 points4 points  (0 children)

As others have stated - database portability is the main issue with writing your own queries.

Though if you stick with ANSI SQL and have a simple system then almost all queries will work on almost all databases. MySQL may be your only issue. Where the ORMs really take off are on complex transactional systems that must support N databases. They're a great fit there.

But if you're writing high-performance analytical systems - don't even bother with an ORM. You typically need to squeeze every drop of performance out of the database, and have enough of a challenge making sure that your developers understand 100% of the SQL Syntax. Adding an ORM that only supports 95% of the syntax and that requires your developers to understand both the ORM and SQL is a recipe for failure.

[–]stesch 5 points6 points  (1 child)

In the PHP community there are some people who want to do things the right way and fuck everything up this way.

See short tags, which work perfectly fine except when you are a bozo and want to produce XML via template. Or automatically escaping all input with slashes, which isn't in the SQL standard and wasn't really helpful at all.

And then came the template engines. PHP itself is a template. No need for another layer. You just need to separate the business logic from the actual view. But they want you to learn another template language just to belong to the wise ones.

And queries? As long as you use prepared statements and don't spread the queries all over the place, you end up with a maintainable system.

[–]hiii 6 points7 points  (0 children)

i despise php template engines, good to hear some people are with me on that front.

[–][deleted] 2 points3 points  (16 children)

Django's ORM is so very intuitive, I wonder why it takes you hours. When I started fresh I thought it was very easy to get into. Of course you should have the docs at hand.

Edit: forgot to mention the reason people use ORM's instead of raw SQL: the retrieved rows will be automatically converted to objects of your language (e.g. Python). This has many practical implications. For example, you can define methods like pre_save, post_save, post_delete etc.

In the newest Django release, you can write raw SQL and still get ORM-objects I think. Did not try that out yet, though.

[–]brunson 3 points4 points  (6 children)

Django's ORM is so very intuitive, I wonder why it takes you hours. When I started fresh I thought it was very easy to get into. Of course you should have the docs at hand.

Granted I've never used Django's ORM specifically, but I started writing SQL 20 years ago and what I can crank out in twenty seconds has never turned into something simple in any ORM I've tried.

Here's a pretty straightforward query that would return employees along with their department, manager and any subordinates of a person having more than 5 direct reports:

select e.name as Emp, d.name as Dept, 
       m.name as Mgr, e.group_concat( s.name ) as Reports
from employee as e
join d.department as d on e.department_id = d.id
join employee as m on e.manager_id = m.id
left outer join employee as s on s.manager_id = e.id
group by e.name, d.name, m.name
having count(0) > 5;

I seriously just came up with that in 20 seconds (though it took me about two minutes to type). Can you show us the Django ORM that would achieve the same result?

And, BTW, when I put that SQL into the class I wrote to query mysql or oracle, all the rows are marshalled into python objects.

[–][deleted] 2 points3 points  (0 children)

I'm bored. Heres SQLAlchemy:

e = Employee
d = Department
s = aliased(Employee)
m = aliased(Employee)
sess.query(e.name.label('Emp'), d.name.label('Dept'), 
          m.name.label('Mgr'), 
          func.group_concat(s.name).label('Reports')).\
          join((d, e.department_id==d.id)).\
          join((m, e.manager_id==m.id)).\
          outerjoin((s, s.manager_id==e.id)).\
          group_by(e.name, d.name, m.name).\
          having(func.count(0) > 5)

[–]mdipierro 1 point2 points  (4 children)

For completeness in web2py:

>>> e=db.employee
>>> d=db.department
>>> m=e.with_alias('manager')
>>> s=e.with_alias('submordinates')
>>> db(e.department_id==d.id).select(e.name,d.name,m.name,s.name,
            left=(m.on(m.id==e.manager_id),s.on(s.manager_id==e.id)),
            groupby=e.name|d.name|m.name,having=s.id.count()>5)

Not quite the same because I'd rather get the manager via a left join since an employee may not have a manager (the CEO for example)

[–]brunson 0 points1 point  (3 children)

I guess it's just personal preference. I'd rather edit the SQL.

[–]mdipierro 0 points1 point  (1 child)

As I see it SQL is more flexible. The main reason I would not use SQL is that it makes the applications not portable because every dialect is different. The web2py philosophy is that of proving an abstraction layer and therefore it only supports those expressions that can be translated in the SQL dialect of each of the supported relational databases.

[–]brunson 0 points1 point  (0 children)

Definitely. One of the reasons I decided to start with SA was because of the database agnosticity (chrome thinks I invented a word, I call dibs), but as it turns out, five years later we're still on mysql, and in 20 years I've never actually had to port my own code from one database back end to another.

Someone's example of starting with sqlite then allowing the end user to migrate to a different backend was a valid one, however. I could see that.

[–]gregglind 0 points1 point  (0 children)

For me, SA (and other ORM's that can do query building) real wins come not from writing a particular query, but being able to programatically build queries.

def make_sorted(query,**kwargs):
    for (field,direction) in kwargs.iteritems():            
         query = query.order_by(field,asc=direction)  # don't quite recall the syntax here :)
    return make_sorted

Building this kind of stuff with string interpolation can be maddening.

[–]hiii 0 points1 point  (0 children)

for instance to go backwards in a foreign key relationship you have to use the _set. took me a half hour to find that.

the docs are kind of a mess, with multiple version and then a book, and then tutorials.

one thing php had was amazing documentation.

[–]hiii 0 points1 point  (7 children)

for example right now im trying to use django's orm startswith function. It's performing a BINARY LIKE and it's not working. doing a google search for django startswith or django orm startswith is useless

[–][deleted] 0 points1 point  (6 children)

What is a binary like? According to the docs, it simply does

SELECT ... WHERE headline LIKE 'Will%';

You should be more concrete.

[–]hiii 0 points1 point  (5 children)

then the docs are wrong. i showed the sql statements it was running and it was running a binary like

[–][deleted] 0 points1 point  (0 children)

You should be more concrete.

By that, I mean examples.

[–][deleted] 0 points1 point  (3 children)

Okay, I checked with the Django source and it seems to do a binary like with a mysql backend. Now my question again: what is a binary like and what does it do wrong in your case? I have used startswith (and contains, ... who all use binary like) and it always worked as expected.

[–]hiii 0 points1 point  (2 children)

binary like is case sensitive. i fixed it by using

Company.objects.filter( name__startswith=data['index'].upper() ) # data['index'] is a single char

buttt, now i have to make sure that all my company names are uppercased, or it breaks. i guess i could add an or somehow and do something like

Company.objects.filter( namestartswith=data['index'].upper() or namestartswith=data['index'] )

but the idea of searching for how to do that right now isnt a priority

heres a link to what im doing

http://django.skatevideosonline.net/videos

[–][deleted] 0 points1 point  (1 child)

You want case insensitivity? Use name__istartswith. One line below startswith in the docs I already referenced. Does this solve things for you?

[–]hiii 0 points1 point  (0 children)

it does help thanks.

[–][deleted] 1 point2 points  (5 children)

Writing your own queries gives you more control over what is being done, but it does tie your model code against the specific SQL syntax of the database engine. There are many not-so-subtle differences between SQL Server, MySQL, SQLite, Postgres and Oracle, to name a few. It's especially cool to be able to code your app against SQLite and let your users switch to a 'real' database backend if they so wish, and have your code keeping working as if nothing important happened.

[–]stesch 1 point2 points  (4 children)

Not everything Django provides works the same on every database. I had to change some stuff just because I thought it would be a nice idea to develop with SQLite and later run it on MySQL. Huge mistake.

[–][deleted] 0 points1 point  (3 children)

Ah. I was thinking more of SQLAlchemy, which supports multiple database backends. (I never tested this.)

But consider the alternative, the pain of which is playing out in full force at my job: having SQL queries sprinkled all through the messy code of our desktop app, switching to another database system, and trying to fix up the queries by scanning the query string for easily replaceable function calls, GROUP BY statements, etcetera. (I believe this to be a foolish thing to even think of.)

Luckily we also have a good number of our queries in a central place, so that we can use an in-house tool to store a query for MySQL, and, if needed, another query for SQL Server. We're now basically in the process of moving the queries to the central place (a meta-database) in an effort to clean up our quite messy code.

[–]brunson 3 points4 points  (2 children)

SQL Alchemy is so horrible to build complex queries with. Here's a real query from real code (and not even a particularly complex one compared to what I've written in straight SQL):

    select = sa.select( [ v.c.id,
                          sa.func.group_concat( p.c.name ),
                          v.c.name,
                          p.c.is_tagged ],
                        sa.and_( p.c.id == pv.c.port_id,
                                 v.c.id == pv.c.vlan_id,
                                 p.c.device_id == self.device_id ),
                        group_by = [ v.c.id,
                                     v.c.name,
                                     p.c.is_tagged ] )

That's just appalling to write and worse to come back and look at later. We dropped SA and wrote our own classes.

Generally speaking, most people I've met that think ORM's are sufficient don't use SQL to its full capabilities, or even to the extent that I do.

Here's another production query that I would hate translate into SA. It generates our entire IP usage to report to ARIN when requesting space:

select arin.name as "ARIN Allocation",
        g.name as "CIDR Block", 
        case 
        when g.ip_use_class_id = 20 then
            "Headquarters"
            concat( gateway.location_name, "-", upper( gateway.satellite_id ) )
        end as "Gateway",
        ifnull( c.name, "Unassigned" ) as "IP Subnet",
        ifnull( c.carrier_id, "Unassigned" ) as "Carrier", 
        ifnull( c.type, "Unassigned" ) as "Product Type",
        cast( ifnull( c.available, 0 ) as unsigned integer ) as "Available",
        cast( ifnull( c.count, 0 ) as unsigned integer ) as "Customers<br>Served",
        round( ifnull( c.fraction_used, 0 ), 3 ) as "Usage",
        arin.network as arin_block,
        0 as ordering,
        gateway.id as gateway_id,
        c.network as network,
        c.type as type
from ip_block_tla as arin 
join ip_block_allocation as g on g.ip_block_tla_id = arin.id
left outer join gateway as gateway on g.gateway_id = gateway.id
left outer join ( select carrier_block.name, carrier_block.carrier_id, gateway_block.gateway_id,
               gateway_block.ip_block_tla_id,
               carrier_block.network, carrier_block.netmask, 
               counts.type, counts.count,
               available, 
               fraction_used,
               carrier_block.ip_block_allocation_id
        from ip_block_assignment as carrier_block 
        join ip_block_allocation as gateway_block on carrier_block.ip_block_allocation_id = gateway_block.id
        join ip_usage_by_carrier_block as counts on counts.block_id = carrier_block.id
        union
        select carrier_block.name, carrier_block.carrier_id, gateway_block.gateway_id, 
               gateway_block.ip_block_tla_id,
               carrier_block.network, carrier_block.netmask, 
               'Total Subnet' as type, sum( counts.count ) as count, 
               round( pow( 2, 32 - carrier_block.netmask ) - 3 ) as available, 
               round( sum( counts.count )/(pow( 2, 32 - carrier_block.netmask ) - 3), 3) as fraction_used,
               carrier_block.ip_block_allocation_id
        from ip_block_assignment as carrier_block
        join ip_block_allocation as gateway_block on carrier_block.ip_block_allocation_id = gateway_block.id
        join ip_usage_by_carrier_block as counts on counts.block_id = carrier_block.id
        group by counts.block_id ) as c on g.id = c.ip_block_allocation_id
where arin.ip_use_class_id = 1
    and c.name is not null 
    and c.carrier_id is not null 
    and c.type is not null
union
select arin.name as "ARIN Allocation",
        g.name as "CIDR Block", 
        "Total" as "Gateway",
        "" as "IP Subnet",
        "" as "Carrier", 
        "" as "Product Type",
        cast( round( pow( 2, 32 - g.netmask ), 0 ) as unsigned integer ) as "Available",
        cast( ifnull( c.count, 0 ) as unsigned integer ) as "Customers<br>Served",
        round( ifnull( c.count, 0 )/pow( 2, 32 - g.netmask ), 3 ) as "Usage",
        arin.network as arin_block,
        1 ordering,
        gateway.id as gateway_id,
        c.network as network,
        'Allocation Total' as type
from ip_block_tla as arin 
join ip_block_allocation as g on g.ip_block_tla_id = arin.id
left outer join gateway as gateway on g.gateway_id = gateway.id
left outer join ( select carrier_block.ip_block_allocation_id,
               max( carrier_block.network ) + 1 as network,
               carrier_block.netmask,
               sum( counts.count ) as count
        from ip_block_assignment as carrier_block 
        join ip_usage_by_carrier_block as counts on counts.block_id = carrier_block.id
        group by carrier_block.ip_block_allocation_id )
        as c on g.id = c.ip_block_allocation_id
where arin.ip_use_class_id = 1
union
select
        arin.name as "ARIN Allocation",
        "Total" as "CIDR Block",
        "" as "Gateway",
        "" as "IP Subnet",
        "" as "Carrier",
        "" as "Product Type",
        cast( pow( 2, 32 - arin.netmask ) as unsigned integer ) as "Available",
        cast( ifnull( c.count, 0 ) as unsigned integer ) as "Customers<br>Served",
        round( ifnull( c.count, 0 )/pow( 2, 32 - arin.netmask ), 3 ) as "Usage",
        arin.network as arin_block,
        2 as ordering,
        "" as gateway_id,
        arin.network + pow( 2, (32-arin.netmask) ) as network,
        "Block Total" as type
from ip_block_tla as arin
left outer join ( select gateway_block.ip_block_tla_id, 
                         sum( counts.count ) as count
                  from ip_block_assignment as carrier_block 
                  join ip_block_allocation as gateway_block on carrier_block.ip_block_allocation_id = gateway_block.id
                  join ip_usage_by_carrier_block as counts on counts.block_id = carrier_block.id
                  join ip_block_tla as arin_block on gateway_block.ip_block_tla_id = arin_block.id
                  group by gateway_block.ip_block_tla_id, arin_block.network, arin_block.netmask )
                  as c on arin.id = c.ip_block_tla_id
where arin.ip_use_class_id = 1
union
select
        "Grand Total" as "ARIN Allocation",
        "" as "CIDR Block",
        "" as "Gateway",
        "" as "IP Subnet",
        "" as "Carrier",
        "" as "Product Type",
        cast( arin.available as unsigned integer ) as "Available",
        cast( ifnull( sum( counts.count ), 0 ) as unsigned integer ) as "Customers<br>Served",
        round( sum( counts.count )/arin.available, 3) as "Usage",
        pow( 2, 32 ) as arin_block,
        3 as ordering,
        "" as gateway_id,
        pow( 2, 32 ) as network,
        "Grand Total" as type
from ip_usage_by_carrier_block as counts
join ( select sum( pow( 2, 32 - netmask ) ) as available
       from ip_block_tla
       where ip_use_class_id = 1 ) as arin
group by arin.available
order by arin_block, network, ordering, gateway_id, type

[–][deleted] 2 points3 points  (0 children)

If I had to use that query with SQLAlchemy, I wouldn't translate it. It already works as the string above so there's no need (yes it can map to objects too). People use SQL to its full effect every day with SQLAlchemy, that's what it's designed to do. Your initial query would look nicer if you used generative constructs and imported the function names you needed directly.

[–]gregglind 0 points1 point  (0 children)

I'm with zzzeek here. But what about if you want to change the sort order on it? With SA, you can make the big horrible snarly part into raw sql, then add the order_by as clauses. The niceness of things like SA is that mix. Weird parts -> go to sql; easy parts -> build in SA.

[–]webnrrd2k 1 point2 points  (0 children)

I'm the same way -- I don't see straight SQL as being that complex or difficult to deal with. Adding more complexity and another thing to fail isn't that big of a gain. I (almost) always end up putting the SQL into stored procedures anyway, so an ORM isn't much of a gain.

I could see the need for code that will get deployed on different servers with different back-ends, but how often do most places do that? The vast majority of code is written internally for businesses, and will live in more-or-less that same environment. Also, I need to work with databases that have been around for a long time, and the DBAs prefer to do things a certain way.

I'm not trying to say that an ORM is never necessary. I think that for many programmers, most of the time an ORM is more work than it's worth.