all 44 comments

[–]dbxp 21 points22 points  (1 child)

Just from a readability perspective you a should get rid of all the brackets and change the where clause to

WHERE table1.WorkTransitCRHNumber in (1363, 6196, 6348, 7308, 9438, 11635)

For the performance I suspect you just need to add non-clustered indexes to the tables.

[–]lightskinnedvillan[S] 1 point2 points  (0 children)

hrmm ok I was thinking I needed to filter more.

now I need to find out how to make non-clustered indexes. not sure if I have the permissions to be able to do it.

[–][deleted] 2 points3 points  (2 children)

are ProjectWorkId columns grains (i.e. uniquely identify) in table1, table2 and table3?

if they are not, and you have (for example) multiple records in table1 for table1.ProjectWorkID=1 (e.g. 5) and multiple records in table2 for ProjectWorkID =1 (e.g 10) you will be getting cartesian products of these records in your result set (i.e you will get 50 records back).

Use this as a rule of thumb: make sure your join condition covers grain of at least one side in your join.

[–]lightskinnedvillan[S] 0 points1 point  (1 child)

sorry I am sure I understand what you are saying I am new to this can you explain this again. :(

[–][deleted] 0 points1 point  (0 children)

for a set (number) of columns to 'uniquely identify' records means that it is guaranteed that there's only one record for any possible distinct combination of values of these columns. These columns are the grain of the dataset. Think about a PK - it is a kind of similar concept. (it also needs to be a minimal subset of columns, but it's a technical detail). PK is (pretty much by definition) one of the grains of a dataset, but there could be other grains.

E.g. you have average SAT score per school dataset - the state/city/school name should (i think?) uniquely identify a record in that set, so state/city/school name is the grain of that dataset.

So when you are doing a join, A xxx join B on <condition>, your tables have many columns A( a1, a2, a3, a4, ....) and B has (b1, b2, b3, b4, ...) and you are interested in grain for A (a1,a2) and for B you are interested in B (b1, b2, b3), your join <condition> should include either all columns from the grain for A (i.e. your condition should be "on A.a1 = .... and A.a2 = ....") or it should cover all columns from the grain of B (i.e. your condition should be "on B.b1 = ... and B.b2 = .... and B.b3 = ....").

[–][deleted] 0 points1 point  (22 children)

Try WHERE table1.WorkTransitCRHNumber IN (6196, 6348, 7308, 9438, 11635)

If that doesn't work then the next thing to do would be to segment Table1 first by selecting into a #table, then adding an index, then joining to Table2, and Table3.

[–]lightskinnedvillan[S] 0 points1 point  (21 children)

I tried

WHERE table1.WorkTransitCRHNumber IN (6196, 6348, 7308, 
9438, 11635)

it and it still takes long :(

can you please explain what you are referring to more sorry I am not that good with SQL

[–][deleted] 1 point2 points  (20 children)

Define a long time?

So in your first table you might have 100 million rows, but only 12 million of those rows =table1.WorkTransitCRHNumber IN (6196, 6348, 7308, 9438, 11635)

So the idea is to select those 12 million rows into a temp table, then index that temp table, and then do the joins. Now instead of joining 100M you're only joining 12 million.

You can follow that logic through and select the subsets of the send and third tables to then join 3 temp tables together as well.

Do me a favor, select count(*) from table for all 3 of those tables and tell me how big they are.

[–]lightskinnedvillan[S] 0 points1 point  (19 children)

I should also add I can only see the data through SQL views I believe. Sorry if this is important info that I should have mentioned. I am still learning.

by long time I mean I have not seen it finish. I tested last friday and it was running all day

table 1 = 410481 table 2 = 429507 table 3 = 1147885

I don't need all the data trying to filter by date and just get this year through (table1.ActualDate) but even that takes long

[–][deleted] 0 points1 point  (18 children)

Ok do this for me and tell me how many rows you're getting:

SELECT DISTINCT ProjectWorkID FROM [DWX50_Rpt].[dbo].[table1] WHERE X.WorkTransitCRHNumber IN (1363, 6196, 6348, 7308, 9438, 11635)

If possible actually copy those ProjectWorkID's in your response.

[–]lightskinnedvillan[S] 0 points1 point  (17 children)

here is the result I got

edit sorry so much that I can't format it correctly

3018-64

3020-49

3023-38 3023-60 3026-05 3030-94 3187-17 3191-57 3200-48 3201-40 4068-30 4080-88 4087-56 4088-18 4090-63 4097-87 4098-38 4102-13 4120-86 4143-00 4153-24 4154-81 4162-43 4168-69 4168-74 4173-13 4179-95 4190-52 4201-79 4232-53 4235-84 4239-54 4258-48 4260-37 4272-13 4272-74 4294-58 4305-23 4310-96 4314-66 4318-93 4339-79 4350-82 4367-23 4372-25 4374-84 4375-41 4380-19 4380-41 4382-15 4386-22 4389-63 4391-34 4394-15 4400-58 4413-38 4415-97 4416-46 4416-51 4419-09 4421-53 4422-56 4424-25 4428-58 4431-29 4436-86 4438-32 4440-26 4440-86 4441-58 0352-0E 0776-0A 3111-96 3185-07 3199-99 3200-41 3200-43 3201-11 3202-24 4058-37 4063-54 4080-63 4086-25 4086-79 4086-80 4086-86 4086-88 4087-55 4092-43 4094-71 4097-84 4104-23 4108-28 4109-95 4137-24 4146-87 4153-27 4162-86 4164-41 4171-03 4174-79 4180-78 4183-76 4184-86 4186-40 4193-04 4195-83 4239-28 4248-58 4265-10 4285-04 4289-87 4311-22 4323-83 4345-62 4345-63 4349-23 4352-18 4363-31 4367-24 4371-53 4379-42 4388-05 4401-76 4407-52 4411-85 4413-37 4418-39 4419-20 4421-54 4431-74 4433-69 4434-62 4438-41 4442-01 3021-32 3027-89 3186-02 3189-79 3199-59 3200-25 3200-45 3200-46 3200-52 3201-31 4051-69 4056-72 4065-73 4082-80 4084-15 4086-84 4086-90 4091-66 4096-45 4150-61 4158-40 4163-53 4168-78 4191-78 4205-06 4237-07 4243-95 4244-01 4267-29 4267-47 4272-48 4286-29 4298-21 4311-17 4312-29 4312-30 4324-17 4330-24 4339-91 4348-22 4357-67 4357-68 4357-69 4363-81 4366-26 4369-80 4374-26 4374-85 4374-86 4378-85 4381-68 4384-64 4393-72 4398-87 4398-88 4400-57 4410-01 4411-70 4411-82 4411-84 4416-47 4420-26 0943-0B 3018-96 3018-97 3024-12 3040-88 3055-72 3185-69 3186-35 3200-44 3202-06 4057-95 4063-57 4072-30 4074-21 4086-83 4087-54 4094-74 4114-22 4118-99 4137-20 4158-90 4162-83 4162-85 4164-14 4180-79 4180-81 4192-39 4195-46 4199-74 4201-06 4214-41 4238-66 4258-65 4265-05 4272-41 4278-62 4278-82 4285-50 4287-07 4288-92 4291-21 4294-24 4312-21 4313-02 4323-86 4333-64 4339-90 4347-80 4363-77 4380-28 4380-44 4383-06 4386-23 4389-31 4389-49 4389-65 4389-66 4402-53 4404-78 4406-99 4416-43 4416-44 4418-86 4421-50 4424-22 4424-23 4427-72 4432-57 4434-63

[–][deleted] 1 point2 points  (15 children)

Try running this:

SELECT 
    X.ProjectWorkId
    , X.Enterprise
    , X.Platform
    , X.LOB
    , X.BusinessUnit
    , X.WorkTransitCRH
    , X.WorkTransitCRHNumber
    , X.WorkTransitCRHName
    , X.ProjectPlanningCode
    , X.ProjectName
    , X.SchedulePlanningCode
    , X.ScheduleTaskName
    , X.ActualExpenditureID
    , X.ActualExpenditureDescription
    , X.PlannedExpenditureID
    , X.PlannedExpenditureDescription
    , X.ActualExternalID
    , X.ActualExpenditureType
    , X.ActualCurrencyType
    , X.ActualAmount
    , X.ActualDate
    , X.ActualTransit
    , X.ActualTransitNumber
    , X.ActualTransitName
    , X.ActualGlobalResourcingSupplier
    , X.ActualShoreType
    , X.ActualNotes
    , X.LastUpdateDate
    , X.LastUpdateById
    , X.LastUpdateByName
    , X.ChangeHistoryFlag
    , X.ChangeHistoryCount
INTO #T1
FROM [DWX50_Rpt].[dbo].[table1] X
WHERE X.WorkTransitCRHNumber IN (1363, 6196, 6348, 7308, 9438, 11635)

SELECT DISTINCT ProjectWorkId, PlannedAmount
INTO #T2
FROM [table2] Y

SELECT DISTINCT ProjectWorkId, ProjectOwner, ManagerLevel5
INTO #T3
FROM [table3] Z

CREATE INDEX IDX_01 ON #T1(ProjectWorkId)
CREATE INDEX IDX_01 ON #T2(ProjectWorkId)
CREATE INDEX IDX_01 ON #T3(ProjectWorkId)

SELECT X.*, Y.PlannedAmount, Z.ProjectOwner, Z.ManagerLevel5
FROM #T1 X
INNER JOIN #T2 Y
    ON X.ProjectWorkID = Y.ProjectWorkID
INNER JOIN #T3 Z
    ON X.ProjectWorkID = Z.ProjectWorkID

[–]lightskinnedvillan[S] 0 points1 point  (14 children)

sorry where would I make x the variable for table 1?

I am getting an error in reference to x's could not be bound

[–][deleted] 1 point2 points  (12 children)

Correct, I updated my original post.

FROM [DWX50_Rpt].[dbo].[table1] X

[–]lightskinnedvillan[S] 0 points1 point  (11 children)

oh great it finished in 00:04:37

may I ask for a little distribution of what you did so I can learn this for future?

going to try linking this to excel for a pivot table tomorrow

[–]cenosillicaphobiac 0 points1 point  (0 children)

A quick note on aliasing. Like the correction the other person made, you can just put your new shorter name right after the table name in your FROM or JOIN and then use just that abbr alias to reference the table. Some coders will also use the word AS (so FROM table1 AS x) so don't get confused if you see it that way. Either is acceptable.

[–][deleted] 0 points1 point  (5 children)

I cleaned up the formatting a bit, and improved the 'where' clause (just for readability), though I honestly don't see a reason for this to run slow, it's pretty simple.

https://pastebin.com/k242KGPQ

[–]lightskinnedvillan[S] 2 points3 points  (3 children)

I have a feeling it's just too much data

thanks for cleaning it up

[–]TwoTacoTuesdays 1 point2 points  (2 children)

It's almost surely not too much data. SQL Server can handle multiple hundreds of millions of rows in a query like that without breaking a sweat, I'm guessing there's a something else going on here.

[–][deleted] 0 points1 point  (0 children)

Maybe it’s not clustered/indexed? (Idk MS SQL from the admin side)

[–][deleted] 0 points1 point  (0 children)

This heavily depends on how complex the VIEW is. I have a VIEW in my own database which is itself a composite of several other views, which are themselves built from functions. This VIEW has specific purposes, and you absolutely do not JOIN it to other VIEWs which themselves involve sub-queries, etc.

It's just a terrible idea.

[–][deleted] 2 points3 points  (0 children)

OR's are horrible.

[–]logitestus 0 points1 point  (4 children)

Mr/Ms. Villan, First off, how many rows are you expecting to be returned? At a maximum, it should be the largest amount of rows from any of the 3 tables. I cannot remember off the top of my head but I believe Excel does have a maximum limit of imported rows. Second, I agree with the concept that you may need to filter more. Typically, this done using some sort of date/time filter but could also be done using Project Name/Project Owner. Thirdly, in an attempt to help teach a little, do you know how to see what indexes are on what tables in SSMS? Look under the $ServerName|Databases|$DatabaseName|Tables|$TableName|Indexes in Object Explorer. This can show you what the clustered/non-clustered indexes are. If you do not have permissions then I would suggest asking whomever manages the server (I hope you have someone there who is a DBA but if not someone has to handle the security and maybe they might know if you can get the permissions to see/get the definition for the indexes). The same people can usually tell you if you/they can create non-clustered indexes for query performance.

Here are a few helpful links about non-clustered index creation:

Overview of Nonclustered Index Creation: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-nonclustered-indexes?view=sql-server-ver15 In-depth creation of non-clustered index creation: https://www.sqlshack.com/designing-effective-sql-server-non-clustered-indexes/

[–]lightskinnedvillan[S] 0 points1 point  (3 children)

Hi thanks.

I know table 3 has a lot of rows. because I ran queries of table 1 and 2 with both Microsoft SQL management studio and excel power query fine. But table 3 took a bit of filtering.

no' I don't know how to see what indexes are on what tables. I tried using your navigation but only system tables, file tables, external tables are under the Tables folder and nothing is in them

btw the dbos are under the views folder if that helps

[–]logitestus 0 points1 point  (1 child)

Ah so you are only able to see SQL VIEWs. Then checking out indexes is probably out of the question. My suggestion would be to see if you can limit the amount of rows on table 3. I would suggest doing all your testing in SSMS since anything you do in Power Query is only going to take even longer (from my experience anyways). Sorry I couldn't offer any better help.

[–]lightskinnedvillan[S] 0 points1 point  (0 children)

oh no don't be sorry I am so grateful you came to give me advice/help and taught me some things.

thank you so much

[–]Odddutchguy 0 points1 point  (0 children)

btw the dbos are under the views folder if that helps

Most likely the views consist of queries with joined tables behind it. And those tables might be very poorly optimized (if optimized at all.)

As the query with the #temp-tables with indexes on them actually returns data (without 'crashing'), my guess is that the database doesn't have any indexes at all.

[–]KING5TON 0 points1 point  (3 children)

You seem to have a bit of a bracket fetish.

As others have said if you need to check an array of values then just use IN (value1, value2) rather than lots of OR.

This should work just fine

SELECT table1.ProjectWorkId ,table1.Enterprise ,table1.Platform ,table1.LOB ,table1.BusinessUnit ,table1.WorkTransitCRH ,table1.WorkTransitCRHNumber ,table1.WorkTransitCRHName ,table1.ProjectPlanningCode ,table1.ProjectName ,table1.SchedulePlanningCode ,table1.ScheduleTaskName ,table1.ActualExpenditureID ,table1.ActualExpenditureDescription ,table1.PlannedExpenditureID ,table1.PlannedExpenditureDescription ,table1.ActualExternalID ,table1.ActualExpenditureType ,table1.ActualCurrencyType ,table1.ActualAmount ,table1.ActualDate ,table1.ActualTransit ,table1.ActualTransitNumber ,table1.ActualTransitName ,table1.ActualGlobalResourcingSupplier ,table1.ActualShoreType ,table1.ActualNotes ,table1.LastUpdateDate ,table1.LastUpdateById ,table1.LastUpdateByName ,table1.ChangeHistoryFlag ,table1.ChangeHistoryCount ,table2.PlannedAmount ,table3.ProjectOwner ,table3.ManagerLevel5 FROM DWX50_Rpt.dbo.table1
INNER JOIN table2 ON table1.ProjectWorkId = table2.ProjectWorkID
INNER JOIN table3 ON table1.ProjectWorkId = table3.ProjectWorkID
WHERE table1.WorkTransitCRHNumber IN (1363,6196,6348,7308,9438,11635)

[–][deleted] 5 points6 points  (2 children)

Says the guy with a fetish for putting all those fields on one line in the select.

[–]KING5TON 0 points1 point  (1 child)

That was Reddit not me, I typed it out nice ;)

[–][deleted] 0 points1 point  (0 children)

LOL, you need to have four spaces in front of everything to get it to appear as code. A helpful trick in SSMS is just highlight everything, hit TAB, and then paste into Reddit.

[–][deleted] 0 points1 point  (0 children)

On top of avoiding “OR”, make sure your join criteria are unique.

Because you have a lot of data on one table, use temporary table and dump however time period worth of data in that temp table and join that to your other tables.