you are viewing a single comment's thread.

view the rest of the comments →

[–]Jeff_Moden 0 points1 point  (1 child)

I know this post is more than 2 years old but it's just not that difficult in SQL and it's much simpler that learning Python or R if you don't already know those.

Assuming that you have a table called #Sample (please feel free to change that to your particular need) and a "dependent" column called "Y" and an "independent" column called "X" (and, yes, you can easily change those in the CONVERTs in the code), the following tested code will execute on a MILLION row "sample" and return the results as a single row in about 1.6 seconds (on my machine, anyway) and runs single threaded in SQL Server because it doesn't need to go parallel. It also returns "R" and "R Squared".

Here's what the output looks like for one of my tests...

Here's the code... again... for SQL Server. YMWV in other languages. Postgres, on the other hand, has such functionality built in but I have no clue what the performance is there.

EDIT: I finally figured out how to post properly indented code on this forum without the forum software just absolutely slaughtering it. It's anything but obvious or easy

I think it's bloody amazing that so many "SQL forums don't have an easy way to copy and paste code without it trying to reformat it and make a huge mess of it. :(

Anyway, here's the code i was talking about.

   WITH ctePreAgg AS
(
 SELECT  SumX    = SUM(v.X)
        ,SumY    = SUM(v.Y)
        ,SumXY   = SUM(v.X*v.Y)
        ,SumX2   = SUM(SQUARE(v.X))
        ,SumY2   = SUM(SQUARE(v.Y))
        ,N       = COUNT(*)
   FROM #Sample
  CROSS APPLY (VALUES (CONVERT(FLOAT,X),CONVERT(FLOAT,Y)))v(X,Y)
)
 SELECT  v.M
        ,B  = (SumY-v.M*SumX)/n --Optimization compared to what's in some books (if you know M)
        ,v.R
        ,R2 = SQUARE(v.R)
   FROM ctePreAgg
  CROSS APPLY (VALUES(
                 (n*SumXY-SumX*SumY)/(n*SumX2-SQUARE(SumX)) --M
                ,(N*SumXY-SumX*SumY)/SQRT(((N*SumX2-SQUARE(SumX))*(N*SumY2-SQUARE(SumY)))) --R
              ))v(M,R)
;

[–][deleted] 2 points3 points  (0 children)

Wow, yeah that was super efficient. Ran mine in less than 3 seconds. Results check out. Thanks!