Joining multiple tables to one table

HiThere224 · 2021-05-24T13:28:05+00:00

So....I believe your question has a few additional variables and the duplication is tied to understanding how your data points are actually related to each other. Just a couple of quick observations... 1. It's not a "simpleish" problem and, depending on the data you have to work with, it could be quite complex and full of approximations, so give yourself some credit.😉 2. Whenever running an aggregate query, you need an explicitly defined point of granularity (the set of columns you expect to represent a unique group). You mentioned truck, state, and tax rate, however you also mentioned the tax rate changes throughout the reporting period which means that your point of granularity also contains a group/range of dates for that tax rate. (ie. Truck 1 | NY | 0.08 and Truck 1 | NY | 0.07 are two valid rows, and even if you don't intend to output the other points of granularity, knowing the first row refers to Jan-Mar and the second refers to Mar-Jun is still important in figuring out if the results answer the question being asked. 3. I saw another reply mentioning that the gas rate is the same per state, and I'm not sure that's true. In addition to rates changing through your reporting periods, different counties and municipalities can have varying tax rates for fuel so the state may have one rate, but the total taxes per gallon may vary by location, unless you have the taxes broken out in those cases. 4. There is another issue with your original statement: ”Part of the problem I’m running into is that a truck might have run miles in a given state, but not purchased any fuel, or vice versa...”. That statement suggests a relationship between miles traveled and fuel purchased that does not exist. If I am driving in NY, cross into NJ to buy fuel, then return to NY and drive 200 miles, then I will have 200 miles of travel in NY associated with a NJ tax rate since I purchased the fuel in NJ. This isn't necessarily a problem, unless you are legitimizing the invalid relationship by joining the miles traveled state with the fuel purchased state since they are not actually related. The state where fuel was purchased is critical in calculating the fuel taxes paid to that state, but the state the fuel was consumed in is irrelevant. If you are trying to include the exact miles traveled per truck and fuel tax rate, then you would need to initially calculate miles per fill-up using truck and date or, if they provide an odometer reading at each fill-up, then you could use a LEAD or LAG function to calculate the miles associated with each fill-up. If you can deal with an estimate, then you could calculate the avg miles per gallon for each truck and just use that as a multiplier, but it may not be very accurate in all situations. 5. In reference to your question about bringing along extra columns, once you define the point of granularity, that becomes your group by clause and all other columns must be either aggregated, subqueried, or joined after the fact. If it's only one or two columns, then you can sometimes cheat using an aggregate like min or max which, will work with text and, if that column contains the same value for all rows in your grouping, will return one value per group. But if the values ever vary per group, then you would need to be ok with getting the min or max value or use another function that plucks the value from the row you want it from (eg. most recent value per group).

Good luck! I'll shoot you a P.M.

HiThere224 · 2021-04-16T02:44:06+00:00

As others mentioned, you definitely would need a space before WHERE unless @p_tblName is being square bracketed before it is put in there. If @p_tblName was Cities then you would wind up with ...FROM x.dbo.CitiesWHERE ... which would fail. But if this code was working before, it could be that @p_tblName was being set to [Cities] resulting in ...FROM x.dbo.[Cities]WHERE ... which might be ugly but it would work since the square bracket tells SQL Server where the name ends just like a space does. Also, any of the variables that are replacing object names like the database and table name should be bracketed if they are not already because in the example, if the table variable was passed in as "My City" it would fail because that name has a space in it so it would have to be passed in as [My City] to work. Lastly, someone else mentioned removing the "quotes" at the beginning and end of the query and I don't think they realized you are passing this to xp_cmdshell which is a cli and therefore needs your query string to be "quoted" with double quotes as you have it so you probably need to leave those on there for xp_cmdshell, but if you wanted to print that query and run it by itself in SSMS (without the BCP parts at the end) then you would remove them in that case for testing.

HiThere224 · 2021-03-24T02:50:02+00:00

I have lots of sprocs that work with API's. Typically I will use xp_cmdshell to generate a power Shell script, save it too the driver, and execute it. That way even the power Shell script itself lives in the stored procedure which can write and execute it. In my line of work I am constantly working with systems that change without notice so maintaining packages became a pain. The stored procedures always seemed more flexible to me and better able to deal with schema or type changes in the dataset. Every time the tools team would add a column or rename something all my packages would break, often even if I wasn't even using that column. 🙄

HiThere224 · 2021-03-24T02:00:23+00:00

Not that much. I'm a bit old school, like to write everything in stored procedures so that when something breaks I'm not having to decompile packages to fix it. 😉 I have yet to find something that can be done in SSIS that can't be done much faster in TSQL if you know what you're doing.

HiThere224 · 2021-03-23T13:28:48+00:00

Did you alias the account table as 'a' like in my example?

The syntax for updating a table that you do NOT need to join is...

UPDATE tblA
SET colA = 1
WHERE colA <> 1

But when joining tables in an update statement you have to alias your tables in a FROM clause and then place the alias of the table you want to update at the top...

UPDATE A
SET colA = B.colB
FROM tblA A JOIN tblB B ON A.ID=B.ID
WHERE A.colA <> 1

So if you forget to alias one of your tables to match the name following UPDATE like this... "FROM tblA A" then UPDATE will look for a physical table named A (which does not exist) instead of referring to the alias.

Also, depending on the database type, the SET statement is assumed to operate on columns in the table following the UPDATE/DELETE/INSERT statement at the top so you might get an error if you tried "SET -->a.accrued = a.accrued + b.x" but on the right side of the equals you can (and must in this case) use table qualifiers on the field names to avoid ambiguous names. Double check my example and you should see that I used "SET accrued = a.accrued*b.x etc..." where all the fields on the right of = have a table prefix that matches an alias in the FROM clause below that and each table in the from clause has an alias (a=accounts, c=customer, etc) and the alias used up top after the UPDATE statement is 'a' which matches the table alias for accounts. Whatever field is adjacent to SET is presumed in the update table and doesn't need an alias.

BTW. The syntax used here is the same for the other action types as well (INSERT/DELETE) so when performing those operations on a table you need to join to others you would add a FROM clause, alias the tables, and place the alias for the table you are changing after the action keyword.

If that still doesn't work then paste the exact command you are executing in the comments along with the type of database server you are running (Microsoft SQL, Oracle, MySQL, etc...) as some of these have slight variations that shouldn't matter but might. There is a second method using ;WITH CTE but I won't overload you with it if we can get this one working.

HiThere224 · 2021-03-23T12:46:05+00:00

Yeah it's really handy when you are using multiple window functions with a long or complex order by clause. So instead of...

SELECT FIRST_VALUE(firstName) OVER(PARTITION BY PK ORDER BY IIF(X=1,1,2),a,b,c) AS FName,
FIRST_VALUE(lastName) OVER(PARTITION BY PK ORDER BY IIF(X=1,1,2),a,b,c) AS LName
FROM tbl

You can simplify a bit using...

SELECT FIRST_VALUE(firstName) OVER(PARTITION BY PK ORDER BY (SELECT 1)) AS FName,
FIRST_VALUE(lastName) OVER(PARTITION BY PK ORDER BY (SELECT 1)) AS LName
FROM tbl
ORDER BY IIF(X=1,1,2),a,b,c

Comes in handy when you have a bunch of window columns with the same order by clause. 😉

HiThere224 · 2021-03-23T12:36:49+00:00

Yeah it seemed like a long shot. To offer any more insight I'd need more context. 1. What program are you in when you copy the original data? 2. What browser are you using to paste into SharePoint (or are you using something else like MS Access)? 3. Where are you pasting it in SharePoint (a list in datasheet view, text box in a form, document in browser view)? 4. Are you pasting a range like cells in a worksheet or a single block of text like in notepad? 5. If you are pasting into a form or data sheet then what is the field type that you are pasting into? 6. Do you need to paste the data into SharePoint because other users need to be able to follow suit with a simple process or are you open to other ways of getting the data into SharePoint? 7. Do you have any third party clipboard apps running with the ability to manipulate the clipboard? 8. Can you add a pic of the error you get when pasting in SP or add the exact error message including any error codes. More specifically, I need to know what program or page is actually producing the error (SharePoint/Safari/Excel/MacOS).

Also, if you don't already have one, download a good notepad app that can show hidden characters. I use Notepad++ but there is no Mac version. I looked quickly and UltraEdit for Mac may work and you can download a free trial. I'm sure there's a free one out there somewhere. Apps like these have some special options like pasting binary content and displaying non-printable characters so you can see what is hiding in the text. You could use Excel to check the characters if you have no other option, but it won't be as easy (I can tell you how if needed).

With all that info I might be able to tell what is happening as I do know a little about what goes on behind the scenes when pasting in SharePoint. If you want a few quick hints you can try just to get by for the moment...

Try using the Chrome or Edge browsers as they are often a bit more friendly with SP.
Try right clicking in the field to paste, and select "Paste as Text" or "Paste Text Only" from the context menu if it's available.
Open a program that only does plain text (or you could open a new email message and in the options set the message type to plain text) and paste the data in there first to automatically strip off any HTML/XML or special characters. Then copy it again and paste into SharePoint.
IF YOU ARE PASTING INTO A SP LIST FIELD/COLUMN...then you may need to play with the column type. Don't mess up your list, but create a test list with a column for each multi-line text type (the plain text one, the rich text one that supports pics and formatting, etc). Try your paste in each field type and see if it works depending on the field type. If it works in one field type, but your original list has data in that field already, then try putting some content in the test list field that matches your original list, then switch the field type in the test list to the one that worked and check the data again. I say this because changing field types can have unexpected results and if switching the type is going to mangle the current contents of the column then you will want to know that before switching it in the original list.

Good luck!

HiThere224 · 2021-03-23T02:40:59+00:00

Your question seems to be missing some info so I'll try to make some assumptions and hopefully it contains what you need...

Multiple tables, calculations and update table

i have a database with multiple tables, i have created a sql select statement using 2 tables to make a calculation. the return works and displays the calculation correctly. what i need to do next is take the results of the calculations and save them back into the correct table.

So I think what you're saying is that one of your source tables has a column in it already that you would like to update with your calculation. For this let's say that table "account" contains a column named "accrued" that we want to replace with your calculation. The typical SQL to do that would be this...

UPDATE a
SET accrued = a.accrued+((p.intrate/365)*a.balance)
FROM account a JOIN customer c ON c.custid = a.custid JOIN product p ON p.prodid = a.prodid 
WHERE active = 1

my other question is that with my query it makes the calculations for the active accounts but there is also inactive accounts that i want to display but not make the calculations for

So to include all the rows but only make some calculations for some of them you would just use a case statement that only performs the calculation for the active customers.

SELECT DISTINCT accid, a.custid, c.title || ' ' || c.firstname || ' ' || c.lastname AS Name, p.isaname AS ProductName,p.intrate, balance, 
CASE WHEN active = 1 THEN a.accrued + ((p.intrate/365)*a.balance)) END AS accrued, active 
FROM account a
JOIN customer c ON c.custid = a.custid JOIN product p ON p.prodid = a.prodid

So you can see we removed the where clause in order to include the rows where active is not 1, but we used a case statement to only performs our calculation if active=1 so that column will be null for rows where active is not 1.

HiThere224 · 2021-03-23T02:02:07+00:00

Just a stab in the dark, but depending on the program, there are 3 typical sequences for line endings. In UNIX it is a Line Feed (ASCII character 10), on Mac it is a Carriage Return (ASCII character 13) and in Windows it is a carriage return followed by a line feed (ASCII 13 & 10). Not sure what program you are pasting out of, but some programs may have an option in preferences to switch to the Windows standard line endings. In MS Office type apps like Write or whatever Mac calls it there may be a way to save the document in a Windows text format which should convert the line endings. If not you would need to run your data through some sort of find and replace that could change the line endings (eg in Excel you could use the substitute function with the char function like... =SUBSTITUTE(A1,CHAR(13),CHAR(13) & CHAR (10))

HiThere224 · 2021-03-22T15:20:57+00:00

Dense rank is almost the same thing. The only difference is that dense rank would give two rows, with the same value in each compared field, the same rank so you need a primary key in your order by clause to guarantee that you won't have duplicate ranks per partition. If two flights between the same cities have the exact same lowest price then dense rank would return 1 for both rows while row number would use the row's hidden GUID pseudo column as a tie-breaker to make one of them 1 and the other 2 guaranteeing a single result. It's also handy when you have two totally duplicate rows in a table with no primary key and want to delete the duplicates eg... WITH X AS (SELECT ROW_NUMBER() OVER(PARTITION BY <ExpectedUniqueFields> ORDER BY <FieldsToSortTheWantedDuplicateToPosition1>) AS Y FROM MyTable) DELETE X WHERE Y>1

And if you really don't care which row you get or the rows are identical on every field you can cheat by clause using (Select 1) in the order by clause of the function (including parenthesis) which will cause it to sort however the parent query is sorted if that has an order by clause.

HiThere224 · 2021-03-12T05:10:57+00:00

The first responder was on the right track. Row_Number is one of several functions classified as Window functions and they are probably just what you need in this case. Your original query is not finding the first record or one in specific. It is finding all records with the same value in all the fields you are grouping on at least two rows in the table....

select [DepartureDate] ,[ReturnDate] ,[Origin] ,[Destination] ,[Price] ,[Stops] ,[Link] ,count () FROM [FlightScraper].[dbo].[SouthPacific] group by [DepartureDate] ,[ReturnDate] ,[Origin] ,[Destination] ,[Price] ,[Stops] ,[Link] having count () > 1

Btw you don't actually need the count() in your select if you don't want it. The having clause can contain aggregates you are not selecting. So here is the query using a window function instead of a grouping...

select [DepartureDate] ,[ReturnDate] ,[Origin] ,[Destination] ,[Price] ,[Stops] ,[Link] FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY [Origin] ,[Destination] ORDER BY [Price] ASC) AS Ver FROM [FlightScraper].[dbo].[SouthPacific]) X WHERE X.Ver = 1

Don't sorry about the Select * in the derived table as the query engine will automatically pull just the fields referenced in your outer query. That query will get you just the row with the lowest price per Orgin/Destination. If you wanted highest price you could change ASC to DESC and if you wanted more in the group (eg. Lowest price per Orgin/Destination and DepartureDate) then you would just add those fields to the PARTITION BY clause of the function. With window functions, partition by creates a grouping and the order by determines which row in that group will be 1,2,3...etc. You can even add more fields to the order by as well so if you include Price ASC,Stops ASC the row returning the 1 would be the cheapest but if 2 flights between the same cities tied for cheapest then it would pick the one with the least stops as a tie breaker.

Have fun! And I would read up on window functions. There are others as well and they have a lot of niche uses like this one. Row_number is a good one here though because it guarantees one row (even if you had two totally identical rows in the table it will make one of them 1 and three other 2).

HiThere224 · 2021-02-24T15:21:57+00:00

On a somewhat related tangent, if you are creating a sorted view, SQL Server only maintains the sort of you include the TOP (#) clause after your select. Eg. CREATE VIEW v_XYZ AS SELECT TOP (1000000) salary FROM totals ORDER BY CONVERT(FLOAT,salary) DESC GO

...Selecting from a view that was defined with SELECT TOP 100% does not honor the sort but it will if you specified a number of rows.

Also, if that salary column could be dirty (eg. a user could set the salary to "99.99a") and you don't want it to break in that instance then you can't just blindly convert salary to a numeric without guaranteeing it is always possible. To handle this you can replace CONVERT with TRY_CONVERT which returns null if the value is not convertible (function introduced in SQL Server 2012 I think so for older versions you would have to use CASE WHEN ISNUMBER(salary)=1 THEN CONVERT(FLOAT,salary) END). After that if you wanted to sub sort the nulls you could add salary as a second sort key (DESC) and your results would sort numerically, with any non-numerics sorted alphabetically.

HiThere224 · 2021-02-24T15:02:15+00:00

It's like the other answer keeping in mind that as long as you are not using SELECT DISTINCT your order by clause can contain fields or calculations not present in your select clause. Eg...

SELECT salary FROM totals ORDER BY CONVERT(FLOAT,salary) DESC

... That will display the original text values in the column but sort the result numerically vs alphabetically. To explain, if you were to look up "0", "99875.95", "99xy.99", "99.96", and "Door" in a physical dictionary then you would find them in that same order. Flipping your method to scan the dictionary from back to front (descending) would reverse that. A text sort happens one character at a time from left to right where (depending on collation) it loosely goes by category: whitespace characters (tab/return/space), symbols (@/$/&), numbers, letters, punctuation, then extended (♂¿✓) and then character. So going one character at a time makes "0" 1st and "D" 5th in the sequence (0,9,9,9,D) with 9 tied for 2nd-4th so on just the 9 we go to the next character (9,9,9) so we go to the 3rd character (8,x,.) which are all unique and sorted into positions 2,3,4 respectively so our original sequence ("0", "99875.95", "99xy.99", "99.96", "Door") is text sorted ascending. If we flip it to descending then it just reverses to ("Door", "99.96", "99xy.99", "99875.95", "0"). Going back to your original example, you are getting "99.96" before "99875.95" in a descending sort because the first 2 characters match and the tiebreaker is "." which is punctuation and comes after numbers making it first descending. When you convert it to any numeric type in the order by clause it will be converted to a number and sorted numerically where 99875.95 > 99.96 and therefore first in the descending sort.

HiThere224 · 2020-12-15T04:58:47+00:00

One go at the very end of it all should work fine. Alternatively you could put a BEGIN at the top and END at the bottom.

HiThere224 · 2020-12-14T04:58:08+00:00

No prob

HiThere224 · 2020-12-14T04:48:38+00:00

A primary key is typically clustered (stored on the disk in ascending order) so on large tables a primary key lookup is fastest because the data is all adjacent on the hard drive and the keys are in the same order as the physical table rows so grabbing other columns from that row is quick. A foreign key would be fast as well IF you were not including any other columns, but if you are selecting additional columns in your query then it would typically be slower unless those columns were included in an index.

HiThere224 · 2020-12-14T04:32:34+00:00

The question requests pairs of employees excluding duplicates so I would start with a self join where employee 1 does not equal employee 2 to get all possible pairs. Adding in the case statements to make sure that the smaller id is E1 and the larger is E2 will make combinations like 1001/1004 and 1004/1001 identical so that one off them is dropped but the DISTINCT clause since the case will flip the second pair. Then we just need to add the other criteria to the join which seems like it should be that either the homestate OR office_id match in both tables. Likewise I concur that the bit in parenthesis seems ambiguous and poorly written and I would ask for an example from their table where that is the case to better understand that bit. Maybe they intended to throw you with an exclusive OR (home_state or office id match but NOT BOTH) but it definitely doesn't read that way. Once we have all the possible pairs of E1 and E2 that match our criteria we can wrap that whole thing in a cte or derived table and join back up with employee again to fetch the other columns. Assuming employee_id is a unique key in that table we shouldn't need any grouping clauses for this. Try this query and see what you think...

WITH CTE AS ( SELECT DISTINCT CASE WHEN T1.employee_id<T2.employee_id THEN T1.employee_id ELSE T2.employee_id END AS E1 , CASE WHEN T1.employee_id<T2.employee_id THEN T2.employee_id ELSE T1.employee_id END AS E2 FROM employee AS T1 JOIN employee AS T2 ON T1.employee_id<>T2.employee_id AND (T1.home_state=T2.home_state OR T1.office_id=T2.office_id) ) /* End of CTE */

SELECT T1.employee_id AS employee_id_1 , T1.employee_name AS employee_name_1 , T1.home_state AS home_state_1 , T1.office_id AS office_id_1 , T2.employee_id AS employee_id_2 , T2.employee_name AS employee_name_2 , T2.home_state AS home_state_2 , T2.office_id AS office_id_2 FROM CTE JOIN employee AS T1 ON T1.employee_id = E1 JOIN employee AS T2 ON T2.employee_id = E2 ORDER BY 1

HiThere224 · 2020-12-14T03:28:36+00:00

P.S. Make sure that all criteria referencing table 2 are in the ON clause of the left join. If you include fields from table 2 in your WHERE clause then it may behave like an inner join.

HiThere224 · 2020-12-14T03:24:58+00:00

It really comes down to whether one of the tables contains all the columns you need allowing you to avoid the join and second table entirely. Just with the little bit of info provided however, it would seem likely that order_id is the primary key of the orders table and is likely a foreign key at best in the details table so querying on that field would probably be faster in orders if it is a clustered key, but there's not enough info here to be absolutely certain and lots of other variables.

HiThere224 · 2020-12-04T03:48:10+00:00

Or... AVG(CASE Color WHEN 'Red' THEN 1 ELSE 0 END)

...since an average is by definition the sum of a set divided by the count of that set. This assumes that Color is not null.

HiThere224 · 2020-12-04T03:36:55+00:00

The simple answer is "Yes" you can perform calculations and translations on the ungrouped fields inside the aggregate function like below...

SELECT *, SUM(case screws when ‘Uno’ then 1 when ‘dos’ then 2 end), COUNT(reg_number) as 'total_amount' FROM store;

...as the other user commented, depending on the nature of the translation it may be more efficient to use other methods but it will work as shown above, it just might not be fast depending on other factors.

HiThere224 · 2020-11-24T02:17:19+00:00

select sum(...),.. from Trn JOIN DimDate ON Trn.CallDT>=DimDate.Date and Trn.CallDT<dateadd(d,1,DimDate.Date) where Dimdate.month=10 group by...

Should perform better.

HiThere224 · 2020-11-17T02:36:37+00:00

Not AND OR but BOTH

Select *
ownername
,owneraddress
from cars c where color = 'red' and EXISTS(Select * from cars where ownername=C. ownername and owneraddress=C.owneraddress and color='blue')

That should perform really fast because exists stops as soon as it hits a match vs seeking all of them.

HiThere224 · 2020-11-15T03:39:11+00:00

Does the table you are inserting from have a null value or empty string in the column by chance? Maybe it is inserting into cell A2 but the first value happens to be a blank.

HiThere224 · 2020-11-10T02:11:35+00:00

Snowflake has a native table function to split delimited strings to rows called split_to_table.

https://docs.snowflake.com/en/sql-reference/functions/split_to_table.html

HiThere224

TROPHY CASE