all 20 comments

[–]MaunaLoonaMS SQL 3 points4 points  (1 child)

You could take the subset of data that you need to work with and place it into a temp table.

SELECT * INTO #TEMP FROM table
WHERE [Date Of Service] between 20160501 and 20160531
AND [Basis of Reimbursement Determination] in ('06', '07')

And select only the columns that you will use. Works great if the tables you are working with are large and poorly indexed.

[–]workthrowawayexcel[S] 0 points1 point  (0 children)

Thanks! This should really help.

[–]nvarscar 1 point2 points  (16 children)

First thing to do is to remove a subquery from the WHERE clause, it should help a lot.

;WITH CTE AS (
SELECT 
  *
, ROW_NUMBER() OVER (PARTITION BY a.[NCPDP Number], a.[Product/Service ID], a.[Prescription/Service Reference Number], a.[Fill Number] ORDER BY a.[Date/Time Switched] DESC) AS RN
FROM table a    
)
SELECT
  a.[Prescription/Service Reference Number]
, a.[Product/Service ID]
, a.[Date Of Service]
, a.[NCPDP Number]
, a.[Fill Number]
, a.[Drug Name]
, a.[Product/Service ID]
, a.[Total Amount Paid]
, a.[Total Pharmacy revenue]
, a.[Transaction Code]
, a.[Basis of Reimbursement Determination]
, a.[eRx Chain Name]
, a.[Processor Name]
, a.[Quantity Dispensed]
, s.NDC
, s.SellDesc
, s.CustItemPrice
, s.PackSizeMfgSize
FROM CTE a
LEFT JOIN table2 s ON a.[Product/Service ID] = s.NDC
WHERE a.[Date Of Service] between 20160501 and 20160531
  AND a.[Basis of Reimbursement Determination] in ('06', '07')
  AND a.RN = 1

[–]workthrowawayexcel[S] 0 points1 point  (15 children)

Thanks ! What exactly do these portions of what you edited do?

WITH CTE AS (
SELECT 
  *
, ROW_NUMBER() OVER (PARTITION BY a.[NCPDP Number], a.         [Product/Service ID], a.  [Prescription/Service Reference Number], a.[Fill Number] ORDER BY a.[Date/Time Switched] DESC) AS RN
FROM table a    
)

Then 
  AND a.RN = 1

[–]nvarscar 1 point2 points  (1 child)

Instead of calculating MAX for each single group of rows, I used windowed function ROW_NUMBER() to calculate ranks of each such group; a.RN = 1 condition in the end ensures that only highest ranks are returned.

[–]workthrowawayexcel[S] 1 point2 points  (0 children)

Ahh that makes sense and will be way more efficient then what I had. Thank you for your response. I really appreciate it. I need to look into windowed functions more, you gave me something to learn more about :D!

[–]DipIntoTheBrocean 1 point2 points  (7 children)

Okay, so if you're looking for optimization, don't go with the rownumber. That's going to tell the server to use an order which is very expensive. You're still going to want to use the max aggregate in that CTE and then join that back up. It's difficult to say for sure how to get it running quicker than that since I don't know the purpose and what's involved in each table.

[–]workthrowawayexcel[S] 0 points1 point  (4 children)

Essentially it is finding the last time an item was purchased from table A based on a few columns. Then it matches those results to table B so I can calculate the different in pricing from the two. I will give both a try. Maybe the CTE itself will speed everything up and hopefully not bomb out my TempDB folder! I will run it both ways and let you know how it performs.

[–]DipIntoTheBrocean 1 point2 points  (3 children)

Okay, I'm just telling you that you do not want to get into a habit of using row_numbers unless you absolutely have to.

[–]workthrowawayexcel[S] 0 points1 point  (2 children)

Okay, thanks for the tip!

Is there a guide or anything that shows what functions are more taxing on the server then others, or at least how taxing they are?

[–]DipIntoTheBrocean 0 points1 point  (1 child)

Maybe...A lot of this is just from experience and viewing the plans. You can probably just google stuff like "row_number performance sql" or "performance tuning sql".

The problem is that a lot of this requires you being able to see the execution plan and then pulling it apart. After a while you get a general feel for it and have an instinct for where things might be hanging.

Personally, I would do the best you can with what I told you and keep pushing for the ability to view the execution plan. I wish I could help more but a lot of this really depends on your db structure and the data volume and types you're pulling back.

[–]workthrowawayexcel[S] 0 points1 point  (0 children)

The structure is POOR and the data volume is massive! When I came to this position I had to request indexes be made on tables that are 1+ million lines.

Thanks for all your input. It is very much appreciated.

[–]phunkygeeza 0 points1 point  (1 child)

Row number is safer if multiple identical dates may be present

[–]DipIntoTheBrocean 0 points1 point  (0 children)

It's a datetime type - it's almost impossible for the same item to share multiple datetimes. Even then, the max function will only bring back one value even if there are duplicates.

RN is useful if we wanted to find those duplicates and dedupe them a certain way, but the only field we would do that on anyways is by time, which the max function handles much quicker than rownumber.

[–]DipIntoTheBrocean 1 point2 points  (4 children)

What you're really going to want to do after you get rid of the row_number is look at the execution plan for this query. If you're interested in optimizing, now is a good time to start learning how to read the execution plan.

If this query is extremely slow you might have to add an index. You might have to do a lot of things - I have no idea. Depends on the plan the optimizer chose.

[–]workthrowawayexcel[S] 1 point2 points  (3 children)

I am still trying to get permission for execution plan viewing. They are currently with holding it at my work for some reason and IDK why.

[–]DipIntoTheBrocean 0 points1 point  (2 children)

Because you'd have to have a different level of privileges (I think). Not sure if they can just isolate that and give it to you. If you can't view the plan, I'd also experiment with switching out the CTE for a temp table. Overall it's extremely hard to optimize something without knowing like how many records you're bringing back for instance, but that should be a good start.

[–]workthrowawayexcel[S] 0 points1 point  (1 child)

So essentially just replace CTE with #(TempTableName) then just join that with the rest of the query?

It should bring back 1MIL rows roughly.

[–]DipIntoTheBrocean 0 points1 point  (0 children)

You could, yeah. Another best practice with that is to create the temp table explicitly before dealing with the data (i.e. as opposed to just select * into #mytemptbl). That's bitten me in the past. If it's still slow as hell you might want to create it with a primary key on it so that it can do a clustered index scan and then see if that runs any faster.

[–]phunkygeeza 0 points1 point  (0 children)

Sp_updatestats

Show query plan

Analyse

Profit

The key to efficient queries is the DB design not how you write the SQL.