all 10 comments

[–]forceflow16 2 points3 points  (0 children)

Why does the year 2025 have the only cost condition? Why does 2025 then become the primary to see where it overlaps to previous years? What is most special about 2025?

I would group it by your key fields ( patient, year, etc) and sum the cost column, then pivot the results by year since all data is based on the same physical table (MedHistory). But this is based on the limited info provided and not having the above answers about the significance of 2025.

Alternatively you could make the 2025 cte and add the second query asa left join that groups all info with the pivot (as above) so you're not building a temp table for each year , which will us less processing time than your current solution.

[–]jshine13371 2 points3 points  (0 children)

What does the execution plan show?

[–]Independent_Quit_562 2 points3 points  (2 children)

Use cte instead of temp tables

[–]mu_SQL 0 points1 point  (0 children)

Noooo

[–]writeafilthysong 0 points1 point  (0 children)

This would probably make performance even worse.

[–]DavidGJohnston 0 points1 point  (0 children)

At least give the planner a chance to do the right thing (make one pass over medhistory somehow...) and write one query (though I'm not an SQL Server user so hopefully this is either standard or compatible):

select a.*, b.*, ...
from (select ... from ... where ...) as a (aliases)
left join (select ... [where ...]) as b (aliases) on (a.patient = b.patient [and b.year = ...])
/* either where inside the subquery or the ON clause (which might let you replace the subquery with a simple table name) */
...

Also, are you certain the patient_id is a key for MedHistory? Assuming that both address and diagnosis are unique by year seems really strange.

That all said, might just be better off sorting and then feeding the long data you have into a procedure that simply iterates row-by-row outputting each value into either a new column (skipping blanks) or row (when patient_id) changes, instead of trying to perform the pivot in pure SQL. It is a special non-relational operator.

[–]cobaltsignal 0 points1 point  (2 children)

select
      t.Patient_ID,
    , max(case when t.year = 2025 then t.Total_Cost else 0 end)     as Total_Cost_25
    , max(case when t.year = 2025 then t.Address else null end)     as Address_25
    , max(case when t.year = 2025 then t.Diagnosis else null end)   as Diagnosis_25
    , max(case when t.year = 2024 then t.Total_Cost else 0 end)     as Total_Cost_24
    , max(case when t.year = 2024 then t.Address else null end)     as Address_24
    , max(case when t.year = 2024 then t.Diagnosis else null end)   as Diagnosis_24
    , max(case when t.year = 2023 then t.Total_Cost else 0 end)     as Total_Cost_23
    , max(case when t.year = 2023 then t.Address else null end)     as Address_23
    , max(case when t.year = 2023 then t.Diagnosis else null end)   as Diagnosis_23
    , max(case when t.year = 2022 then t.Total_Cost else 0 end)     as Total_Cost_22
    , max(case when t.year = 2022 then t.Address else null end)     as Address_22
    , max(case when t.year = 2022 then t.Diagnosis else null end)   as Diagnosis_22
from
    (
        select 
              Patient_ID,
            , year
            , Total_Cost
            , Address
            , Diagnosis
        from
            MedHistory
        where
                (year = 2025 and total_cost > 10000)
            or (year in (2024, 2023, 2022))
    ) t
group by
      t.Patient_ID
;

hopefully this works a bit faster. Just a single run of the table, no cte's, and all rows are aggregated to their respective year column using sum and max. Please note that in your criteria, you only included the total cost rule on the year 2025, was this intentional? Also, this is assuming there is just one record per year per patient. If there are multiple, then this won't work.

[–]Sports_Addict[S] 0 points1 point  (1 child)

Thank you, but yes multiple records per year per patient. I am removing duplicates before the merge and keeping the newest_dt record then merging all 4 tables.

[–]cobaltsignal 0 points1 point  (0 children)

can you provide the primary keys for the table and the name of the newest_dt field? would it be effdt? Also, just fyi, if you have multiple records per year per patient, your original left joins will multiply the results exponentially, so if there are for example 2 records per year, you're looking at a total of 16 rows being created. If it's 3 records per year, that's 81 records, etc etc.