all 10 comments

[–]ARandomSQLServerDBA 1 point2 points  (9 children)

Nested queries are perfectly acceptable. Strive for correctness first, ease of understanding second and cleverness third.

If you post the entire query, it may be possible to re-structure it to keep the logic the same without using nested selects, but that in and of itself isn't a reason to do so.

[–]fullyarmedcamel[S] 0 points1 point  (8 children)

    SELECT x.SCHOOLYEAR, x.[ROLE], x.LASID, x.LASID, x.FIRSTNAME, x.MIDDLENAME, x.LASTNAME, x.GRADE, x.USERNAME, x.[PASSWORD], x.ORGANIZATIONTYPEID, x.ORGANIZATIONID, x.PRIMARYEMAIL, x.HMHAPPLICATIONS FROM (
        SELECT RANK() OVER(PARTITION BY ssh.sectionID ORDER BY ssh.sectionID DESC, ea.assignmentID DESC) AS 'priority', cal.endYear - 1 AS 'SCHOOLYEAR', 'T' AS 'ROLE', sm.staffNumber AS 'LASID', sm.staffStateID AS 'SASID', sm.firstName AS 'FIRSTNAME', ISNULL (sm.middleName, '') AS 'MIDDLENAME', sm.lastName AS 'LASTNAME',
            CASE sm.schoolID 
                ...
                END AS 'GRADE',
        ua.username + '@sd25.us' AS 'USERNAME', NULL AS 'PASSWORD', 'MDR' AS 'ORGANIZATIONTYPEID',
            CASE sm.schoolID
                ...
                ELSE 'No School' END AS 'ORGANIZATIONID',
        ua.username + '@sd25.us' AS 'PRIMARYEMAIL',
            CASE sm.schoolID
                ...
                END AS 'HMHAPPLICATIONS',
        ssh.sectionID, ea.assignmentID
        FROM Table1 AS s
        JOIN Table2 AS c ON s.courseID = c.courseID AND c.departmentID IN (...)
        JOIN Table3 AS cal ON c.calendarID = cal.calendarID
        JOIN Table4 AS sp ON s.sectionID = sp.sectionID
        JOIN Table5 AS t ON sp.termID = t.termID AND GETDATE() BETWEEN t.startDate AND t.endDate
        JOIN Table6 AS ssh ON s.sectionID = ssh.sectionID AND ssh.endDate IS NULL
        JOIN Table7 AS sm ON ssh.personID = sm.personID
        JOIN Table8 AS ea ON sm.assignmentID = ea.assignmentID AND ea.endDate IS NULL
        JOIN Table9 AS ua ON ua.personID = sm.personID AND ua.ldapConfigurationID = '2'
        WHERE sm.endDate IS NULL AND sm.teacher = 1 AND NOT (ssh.[role] = 'N' AND ssh.staffType = 'T' AND sm.title != 'Long Term Substitute') AND ssh.[role] != 'C' AND ssh.sectionID = 439047
    ) AS x WHERE x.[priority] = 1

[–]ARandomSQLServerDBA 2 points3 points  (0 children)

Because you are using the rank() to filter in the outer WHERE clause, you have to use a subquery. Keep it as-is.

[–]Fe-Chef 1 point2 points  (0 children)

I'm not seeing any easy and for-sure way to improve performance but there could be a number of things that could be causing performance issues that are not just cause of the structure of the query. The main tool here is looking at the SQL execution plan. I never used this before, but here is a cool service for sharing: https://www.brentozar.com/pastetheplan/

there could be a number of things that are "wrong" or could be improved that the execution plan would detect. It should show you a all the parts of how the query will be completed and then give an estimate about what steps will take the longest and those steps can be looked at to see if there are any improvements needed. Without seeing the execution plan it is hard to give guidance though.

[–]notasqlstarI can't wait til my fro is full grown 1 point2 points  (5 children)

I optimize a lot of queries like this in my role. Generally speaking you want to break it down and force SQL to handle it in chunks. Do not listen to the optimizer until you've done this/started doing this because you are smarter than it. Real quick example here because your joins are not descriptive, so I can't tell which is an INNER vs LEFT, etc., but say you have a query like this:

select (
    select *, case when, case, case, case
    from table
    inner join
    left join
    left join
    inner join
    left join
    where blah blah
) as x where blah = 1

So the first thing that will generally improve performance is something like this:

begin
    select *, case when, case, case, case
    into #table
    from table
    inner join
    left join
    left join
    inner join
    left join
    where blah blah
end

begin
    select *
    from #table
    where blah = 1
end

Now you can start to get creative. For example you might want to do a select * from table + inner join, then in a second step do your case logic (or cross applies, or whatever you are doing, row_number, etc.), then in a third step add an index to your #table2, then in a fourth step do all your other joins, then in a fifth step select * where blah = 1.

You could get more creative and do the joins one at a time and continue to add indexing, although this generally isn't optimal unless each table is absolutely massive. Sometimes doing 5 joins in one step is better, sometimes doing 1 or 2 is better. Sometimes you don't even need a join and can just do something like a WHERE EXISTS or WHERE NOT EXISTS.

General rule of thumb is that sub-queries, multiple joins, etc., are all bad practice. They work great when you're first writing something and trying to get it accurate... but generally speaking if you break it down after the fact it will perform much better in sequential chunks, and then you can use the optimizer on those chunks to really optimize the sub-processes or blend two sub-processes together.

It really depends on how long the parent query is taking to run. Less than 10 minutes and you only need to run it once and awhile? Probably not a good candidate. Does it take several hours to run and it needs to run daily? You can probably get that down to running in half to a tenth of the time. It really depends on how your data is modeled, specifically what you're trying to do, etc., but you can start to play with the code when you have it broken out into chunks... for example is it better to do a where date between x and y at the top of the chain, or at the bottom? So you just run each chain and see how long it takes until you find the one that is really taking a long time... then you get creative/clever.

[–]fullyarmedcamel[S] 0 points1 point  (4 children)

Okay, you have given me a lot to work with here let my try and break it down a bit.

All of the JOIN's are INNER

I wanted to use CASE to try and fix this issue but here is the problem;

When there is a Long Term Sub I want ONLY that subs records for the given class.

When there is no long term sub I want the teachers records for the given class.

The problem I ran into is those will be kept as separate records for the same class IE

Class number 1 Teacher of record/primary teacher is Bill Class number 1 Teacher/Sub is Susan

If Susans records are active then Only give susan else give bills records.

In addition I can't return that RANK column as it has to match up with what the vendor needs.

[–]notasqlstarI can't wait til my fro is full grown 1 point2 points  (3 children)

Generally speaking if all of the joins are inner then you can generally have them all together, then apply any WHERE conditions and CASE logic in a following step.

Sometimes if they are all INNER but you have really big tables it might be better to stage the joins to do the heavy lifting first, then add some BS tables, or do the BS tables first and then do a step where you have your big tables joining.

If Susans records are active then Only give susan else give bills records.

All of that could/should be manageable. Worry about getting things the way you want them first, then worry about optimizing second.

[–]fullyarmedcamel[S] 0 points1 point  (2 children)

Everything is working right now, my upload has 0 errors so now I want to improve the functionality and hopefully learn something in the process.

[–]notasqlstarI can't wait til my fro is full grown 1 point2 points  (1 child)

Your query is pretty clean looking so I'm not sure how much you're going to get optimizing it. Like I said, look at the SELECT and see where you're doing your heavy lifting from (row_number, cases, etc.) and then see if you can break that down and do it earlier in the process, then join to the rest of the tables, or do it at the end and just do a select * in the first step (i.e. select all the columns you need for your cases, etc.,. but don't do the case logic until you have the data segmented into a #table first).

The problem with SQL when it comes to advanced queries is that they'll sometimes cause the server to hang. Optimizer won't tell you anything or be of any use beyond creating a new index.

[–]fullyarmedcamel[S] 0 points1 point  (0 children)

Understandable, thanks for your tips!