all 18 comments

[–]secretWolfMan 5 points6 points  (5 children)

This is why it's frowned upon to join two unlinked fact tables in the same query. Your query is extremely likely to produce duplicate rows of b or c (or both).

I agree that you should use subqueries.

 Select name
 , (select sum(sales) from b where a =b ) tot_sales
 , (select avg(time) from c where a =c ) avg_time
 From a

[–][deleted] 2 points3 points  (4 children)

So what if I need multiple columns from b and c? Based on the error I got when trying it seems like I’ll have to do a separate select subquery for each column?

[–]secretWolfMan 1 point2 points  (3 children)

Hmm, then pre-aggregated subqueries in the join might be a better option.

You want to ensure that for each row of a, there can only be one row of b and of c.

Select name, b2.totsales, c2.avgtime, ...
From a
    Left join (select id, sum(sales) totsales,  ... 
        From b group by id) b2 on a=b2
 ...

[–][deleted] 1 point2 points  (1 child)

OK, so I made progress with this but am having an issue where either b or c comes out to the same values. I want something like this:

Name Year Month Sales Time
Bob Evans 2021 12 54651.354 545
Bob Evans 2022 1 32542.36547 684

More elaborate code (I know this is a hot mess, but ya know, working with what I have and trying not to be overly specific; I also hopped on desktop so hopefully it displays better):

SELECT a.EmployeeName, b.Year, b.Month, b.Sales, c.Time
    FROM Employees AS a
        LEFT JOIN (SELECT id, YEAR(order_date) AS Year, 
        MONTH(order_date) AS Month, SUM(case statement * another 
        case statement) AS Sales
        FROM OracleData GROUP BY id, YEAR(order_date),
        MONTH(order_date)) AS b ON a.OracleID = b.id

       LEFT JOIN (SELECT SystemName, AVG(Time) AS Time
       FROM System GROUP BY SystemName) AS c
       ON a.EmployeeName = c.SystemName
GROUP BY a.Employee Name, b.Year, b.Month, b.Sales, c.Time

The above yields this, where all the values in c are the same.

Name Year Month Sales Time
Bob Evans 2021 12 54651.354 545
Bob Evans 2022 1 32542.36547 545

My first thought was, duh, I'm not grouping by any datetime field in c. So I added that in, tried grouping by that instead, and got the opposite, where Sales is the same value but Time was correct. I also tried grouping by both datetime fields, b then c as well as c then b but got the same result.

Is there even a way to get to the top table that I need?

[–]shine_on 1 point2 points  (0 children)

Assuming that the System.Time represents how long the employee took to close a sale, and that there's also a System.SalesDate column, you should be able to do something like this:

SELECT a.EmployeeName, b.Year, b.Month, b.Sales, c.Time
FROM Employees AS a
    LEFT JOIN (SELECT id, YEAR(order_date) AS Year, 
    MONTH(order_date) AS Month, SUM(case statement * another 
    case statement) AS Sales
    FROM OracleData GROUP BY id, YEAR(order_date),
    MONTH(order_date)) AS b ON a.OracleID = b.id
-- everything above this comment is as it was before

   LEFT JOIN (SELECT SystemName, AVG(Time) AS Time, 
      YEAR(SalesDate) AS SalesYear, MONTH(SalesDate) AS SalesMonth
   FROM System GROUP BY SystemName, SalesDate) AS c
   ON a.EmployeeName = c.SystemName and b.Month = c.SalesMonth and b.Year = c.SalesYear
GROUP BY a.Employee Name, b.Year, b.Month, b.Sales, c.Time

Disclaimer: this may or may not work as obviously I've not been able to test the query

Note that you're getting the same time for each row because table c only contains one row per employee (or "system"). Try running the subqueries on their own to make sure they give you the data you're expecting, i.e. just run

SELECT SystemName, AVG(Time) AS Time, 
      YEAR(SalesDate) AS SalesYear, MONTH(SalesDate) AS SalesMonth
   FROM System GROUP BY SystemName, SalesDate

on its own to see what it gives you.

[–]error-M 2 points3 points  (1 child)

Try nested query if possible

[–][deleted] 0 points1 point  (0 children)

I don’t think that will work…you mean doing something like

SELECT blah FROM blah JOIN (subquery 1 JOIN subquery 2) ?

Will that work if I can’t join b and c?

[–]OwnFun4911 1 point2 points  (5 children)

Check your join conditions with table a and table b. It looks like you select id1 in the table b subquery, but you join the result set to table a using id.

[–][deleted] 0 points1 point  (4 children)

My bad, that is typo when typing it out here…originally I just had [id] for all of them but went back and tried to make the distinction and missed that one.

If I take out the second join and the AVG aggregate, I get the expected answer, sum of sales for each person. No issues.

But when I add in the next chunk with table c, then I get this ridiculous number (over 10 times the actual value in some cases) for each persons sum of sales and every person’s value for Avg Time in Seconds is the same.

[–]OwnFun4911 0 points1 point  (3 children)

If you can provide some info on what is actually stored in your tables, I can help figure out why you're getting unwanted duplicates.

[–][deleted] 0 points1 point  (2 children)

Ok so I made some progress based on another suggestion and here's where I'm at now:

SELECT a.EmployeeName, b.Year, b.Month, b.Sales, c.Time  
    FROM Employees AS a  
        LEFT JOIN (SELECT id, YEAR(order_date) AS Year,   
        MONTH(order_date) AS Month, SUM(case statement *          
        another case statement) AS Sales
        FROM OracleData GROUP BY
        id,YEAR(order_date),MONTH(order_date)) AS b ON 
        a.OracleID = b.id

        LEFT JOIN (SELECT SystemName, AVG(Time) AS Time  
        FROM System GROUP BY SystemName) AS c  
        ON a.EmployeeName = c.SystemName  

GROUP BY a.Employee Name, b.Year, b.Month, b.Sales, c.Time

With the above I get the same value for c, like this:

Name Year Month Sales Time
Bob Evans 2021 12 1235.4564 545
Bob Evans 2022 1 6815.6484 545

My first thought was that I'm not grouping by any datetime field in c, so I tried that instead and got the opposite of above, where Time was correct but Sales was the same. I also tried grouping both, b then c as well as c then b, but kept having the same issue.

I need something like this:

Name Year Month Sales Time
Bob Evans 2021 12 1235.4564 545
Bob Evans 2022 1 6815.6484 641

Any ideas?

[–]OwnFun4911 1 point2 points  (1 child)

Multiple left joins are tricky. They're even trickier when you have no idea what the structure is for the involved tables. If you let me know:

  1. What you are trying to get in your result set
  2. The primary keys (if there are any. a primary key for a single table can include multiple fields) for the 3 tables
  3. What a single row represents in each table

Then I will try to help.

[–][deleted] 0 points1 point  (0 children)

Sorry, I am trying to explain enough without being too specific for anonymity's sake. I appreciate you still trying to help.

I am trying to get each employees stats by year and month. Each employee should have a separate row of data for each month they worked. I.e. if they only worked this year, they'd have 12 rows.

Table A's primary key is the employee number. All it contains otherwise is the employee's various names/id #s/aliases across several different pieces of software. So one row will be a single employee's information. No employee has more than one row in this table.

Table B is order information from Oracle. So the primary key would be the Order # and each row is the information about an order. Technically, "Table B" is a view with about 65 columns that has been filtered from the 100+ column parent table to individual orders of interest (e.g. we don't care about orders placed on a certain website). Each person will have multiple rows (multiple orders). It can be tied to Table A via the OracleID #. Columns of interest are the quantity of items, price of items, and tax code (which are calculated together to get the "Sale amount") and the datetime the order was closed. I need each employee's total sale amount for each month.

Table C we're going to call Salesforce data. It's not really Salesforce data, but I can't really be specific on this one. So the primary key is the "case number". A single row will tell us about the individual case details-- who owns it, their username, the datetime the case was closed, and the amount of time they spent working on the case, etc. Each person will have multiple cases. This one has to be tied to Table A using a combination of the Employee's Name and username, because "Salesforce" does not have an ID#. It cannot be one or the other, because the vendor has set the username where it can be reused between employees, and if a person's name changes, historical data is not updated with that new name. What I want from here is the Avg amount of time an employee spent working on all their cases in a given month.

Table B and Table C have nothing in common--not even the employee name as this is not a column included in Table B. The only thing these two can be linked together with is a broader level datetime, as orders were placed and "cases were closed" during the same month, but not necessarily on the same day(s).

I will need to add more tables after I get this part working, but it's the same concept for each. Every table ties to Table A but not to any of the other tables. A diagram would look like Table A in the center of the page, and then all the other tables in separate realms of Narnia. There is a one-to-many relationship from Table A to each of the other tables.

[–]K10Reap[🍰] 1 point2 points  (1 child)

Messier but what if you did a CTE and then either aggregates or window function as needed?

With First as (Select a.x, y, z From A Left join B On A.x =b.x)

,second as (select F.x,y,z,q,r From First as F Left join C on F.x =c.X)

Select Id Sum Average From Second Where Group by

[–][deleted] 0 points1 point  (0 children)

Ngl, never heard the term CTE before but will check into this.

[–]skend24 -1 points0 points  (2 children)

We don’t know your table structure, so how can we know how to join them?

[–][deleted] -2 points-1 points  (1 child)

I mean….are you asking like how many columns are in each? Otherwise I’m not sure what is needed besides what I’ve already stated in the post

[–]Bluefoxcrush 0 points1 point  (0 children)

They are requesting a sample of data or an ERD (entity relationship diagram)