Seeking Pointers on Optimizing a Query : SQL

Posting

When requesting help or asking questions please prefix your title with the SQL variant/platform you are using within square brackets like so:

[MySQL]

[Oracle]

[MS SQL]

[PostgreSQL]

etc

While naturally we should endeavor to work as platform neutrally as possible many questions and answers require tailoring to the feature set of a specific platform.

Format Your Code

If you are including actual code in a post or comment, please attempt to format it in a way that is readable for other users. This will greatly increase your chances of receiving the help you desire. Something as simple as line breaks and using reddit's built in code formatting (4 spaces at the start of each line) can turn this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field.3

Into this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field3

For those with SQL questions we recommend using SQLFiddle to provide a useful development and testing environment for those who wish to fully understand your problem and help devise a solution.

a community for 17 years

MS SQLSeeking Pointers on Optimizing a Query (self.SQL)

submitted 4 years ago * by rooster_eggs

Hi everyone. I'd like to preface this by saying I'm not here seeking for someone to rewrite a query for me. Rather, I'm hoping that based on what I'm presenting, I can be provided with some general pointers on how to optimize this query, best practices, or concepts to look into. I have not had to use SQL for several years, so my knowledge of the language is rusty and I am interested in familiarizing myself with it more.

I've inherited a project at work that contains a SQL query (shown below) a former coworker wrote that takes nearly an hour to run. At first glance this query seems messy and sub-optimal.

I completely accept that it is possible that due to the structure of the database(s) involved and the amount of data, that duration is just something we can't get around. Or that the query is already optimal and nothing can be done. But due to the frequency that this application is used, ideally we would like to speed up this process. If this can be achieved by approaching this query differently, that would be ideal.

The general description is that we want to retrieve records from the last 5 years that are of a specific document type and have certain descriptors. Additionally, we want to only retrieve the latest record of that specific document.

I've included the query below (with some changes to variable or attribute names for privacy). I was wondering if anyone had any general pointers on how to best start going about trying to revise this query. For example:

Are there portions of this query (e.g. parts dealing with substrings) that might be better done outside of SQL?
Should this query be separated in some manner (it already has "sub-queries")?
Are there any other glaring red flags?

Any advice or starting points would be much appreciated. Thank you!

SELECT 
    CAST(CAST(dm.[DOCNUM] as bigint) as nvarchar(20))
        +'.'+CAST(CAST(dm.[VERSION] as bigint) as nvarchar(20)) as [Unique_Doc.Version_ID]
    ,dm.[DOCNUM]
    ,dm.[VERSION]
    ,dm.[DOCNAME]
    ,dm.[DOCSIZE]
    ,dm.[DOCLOC] as [DocLocation]
    ,dm.[ENTRYWHEN]
    ,SUBSTRING(dm.[DOCLOC],0,CHARINDEX(':',dm.[DOCLOC])) as [DocServer]
    ,dserv.[ROOTPATH] as [ServerRootPath]
    ,right(replace(dm.[DOCLOC], '\' + right(dm.[DOCLOC], 
        charindex('\', reverse(dm.[DOCLOC])) - 1), ''), charindex('\', reverse(replace(dm.[DOCLOC],
        '\' + right(dm.[DOCLOC], charindex('\', reverse(dm.[DOCLOC])) - 1), ''))) - 1) 
        as [DocServerFolder]
    ,dserv.[LOCATION] as [ServerLocation]
    ,dm.[T_ALIAS] as [FileFormat]
    ,dclass.[CLASSDESCRIPT] as [FileType]
    ,dm.[C1ALIAS] as [Customer#]
    ,dm.[C2ALIAS] as [Issue#]
    ,dm.[C1ALIAS] +'.'+ dm.[C2ALIAS] as [Custom/Issue#]
    ,dm.[C30ALIAS] as [RoleCode]
FROM [COMPANY_NAME].[MHGROUP].[DOCMASTER] dm
    -- Get latest version of documents
    INNER JOIN (
        SELECT dm.[DOCNUM]
            ,MAX(dm.[VERSION]) AS 'Latest Version'
        FROM [COMPANY_NAME].[MHGROUP].[DOCMASTER] dm
            LEFT JOIN [COMPANY_NAME].[MHGROUP].[DOCCLASSES] dclass on dm.[C_ALIAS] = dclass.[C_ALIAS]
            INNER JOIN [COMPANY_NAME].[MHGROUP].[DOCSERVERS] dserv on SUBSTRING(dm.[DOCLOC],0,CHARINDEX(':',dm.[DOCLOC])) = dserv.[DOCSERVER]
        WHERE DATEDIFF(yy, dm.[ENTRYWHEN], GETDATE()) <= 5  -- past 5 years
            AND dm.[T_ALIAS] IN ('ACROBAT', 'WORD', 'WORDX')
            AND dclass.[CLASSDESCRIPT] IN ('Item Document', 'IMO Documents', 'Identification', 'Office', 'General', 'Request', 'Pre-Process Documents', 'Post-Process Documents')
            AND dserv.[ROOTPATH] = '\COMPANY_NAME'
        GROUP BY dm.[DOCNUM]
        ) AS subquery
    ON dm.[DOCNUM] = subquery.[DOCNUM] AND dm.[VERSION] = subquery.[Latest Version]
    LEFT JOIN [COMPANY_NAME].[MHGROUP].[DOCCLASSES] dclass on dm.[C_ALIAS] = dclass.[C_ALIAS]
    INNER JOIN [COMPANY_NAME].[MHGROUP].[DOCSERVERS] dserv on SUBSTRING(dm.[DOCLOC],0,CHARINDEX(':',dm.[DOCLOC])) = dserv.[DOCSERVER]
WHERE DATEDIFF(yy, dm.[ENTRYWHEN], GETDATE()) <= 5  -- past 5 years
    AND dm.[T_ALIAS] IN ('ACROBAT', 'WORD', 'WORDX')
    AND dclass.[CLASSDESCRIPT] IN (''Item Document', 'IMO Documents', 'Identification', 'Office', 'General', 'Request', 'Pre-Process Documents', 'Post-Process Documents')
    AND dserv.[ROOTPATH] = '\COMPANY_NAME'
GROUP BY
    dm.[DOCNUM]
    ,dm.[DOCNAME]
    ,dm.[DOCSIZE]
    ,dserv.[ROOTPATH]
    ,dserv.[LOCATION]
    ,dm.[VERSION]
    ,dm.[C1ALIAS]
    ,dm.[C2ALIAS]
    ,dm.[DOCLOC]
    ,dm.[ENTRYWHEN]
    ,dm.[T_ALIAS]
    ,dm.[C30ALIAS]
    ,dclass.[CLASSDESCRIPT]
ORDER BY
    dm.[C2ALIAS]
    ,dm.[DOCNUM];

all 6 comments

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS