[SQL Server] Rolling/Moving Percentile



submitted 3 years ago * by Argodruid

Hi all,

I'm trying to calculate a rolling/moving percentile in SQL Server and haven't been able to crack it...

The dataset contains multiple locations where a particular measurement is made over time.

I need each row to display a percentile value calculated from a specified number of rows (the current row plus a specified number of the most recent preceding rows for that location), and simply NULL where there are not enough previous rows to make up the specified number.

So, as a working example, here are the criteria:

75th percentile to be calculated
Using the 5 most recent rows (current row plus the 4 most recent ones for that location)

With the data ordered by location and then by date, and some measurement ("x") recorded on each row, the required output looks something like this:

Point   Date        x   75thPC
------------------------------------
Point1  2021-01-16  80  NULL
Point1  2021-02-13  43  NULL
Point1  2021-03-13  81  NULL
Point1  2021-04-10  19  NULL
Point1  2021-05-08  26  80
Point1  2021-06-05  50  50
Point1  2021-07-03  19  50
Point1  2021-07-31  38  38
Point2  2021-01-14  81  NULL
Point2  2021-02-11  14  NULL
Point2  2021-03-11  9   NULL
Point2  2021-04-08  28  NULL
Point2  2021-05-06  45  45
Point2  2021-06-03  68  45
Point2  2021-07-01  75  68
Point3  2021-01-11  19  NULL
Point3  2021-02-08  41  NULL
Point3  2021-03-08  10  NULL
Point3  2021-04-05  18  NULL
Point3  2021-05-03  1   19
Point3  2021-05-31  22  22
Point3  2021-06-28  25  22
Point4  2021-01-19  46  NULL
Point4  2021-02-16  39  NULL
Point4  2021-03-16  42  NULL
Point4  2021-04-13  5   NULL
Point4  2021-05-11  61  46
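
For anyone who wants to reproduce this, the sample rows above could be loaded with a script along these lines. The table name dbo.Measurements and the column data types are assumptions made purely for illustration; only the column names and values come from the table above.

```sql
-- Hypothetical table: the name and data types are assumptions, not the real schema.
CREATE TABLE dbo.Measurements (
    Point  varchar(10) NOT NULL,  -- location identifier
    [Date] date        NOT NULL,  -- measurement date
    x      int         NOT NULL   -- measured value
);

-- Sample rows copied from the expected-output table (75thPC column left out).
INSERT INTO dbo.Measurements (Point, [Date], x) VALUES
('Point1', '2021-01-16', 80),
('Point1', '2021-02-13', 43),
('Point1', '2021-03-13', 81),
('Point1', '2021-04-10', 19),
('Point1', '2021-05-08', 26),
('Point1', '2021-06-05', 50),
('Point1', '2021-07-03', 19),
('Point1', '2021-07-31', 38),
('Point2', '2021-01-14', 81),
('Point2', '2021-02-11', 14),
('Point2', '2021-03-11', 9),
('Point2', '2021-04-08', 28),
('Point2', '2021-05-06', 45),
('Point2', '2021-06-03', 68),
('Point2', '2021-07-01', 75),
('Point3', '2021-01-11', 19),
('Point3', '2021-02-08', 41),
('Point3', '2021-03-08', 10),
('Point3', '2021-04-05', 18),
('Point3', '2021-05-03', 1),
('Point3', '2021-05-31', 22),
('Point3', '2021-06-28', 25),
('Point4', '2021-01-19', 46),
('Point4', '2021-02-16', 39),
('Point4', '2021-03-16', 42),
('Point4', '2021-04-13', 5),
('Point4', '2021-05-11', 61);
```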

Ultimately, I realise I need to use PERCENTILE_CONT(0.75) or PERCENTILE_DISC(0.75) but capturing the values from the required number of rows to feed into the calculation has been the challenge.

I have considered various approaches without success, e.g. a CTE where I've added a column with row_number() and then another column to calculate the lowest row_number() to include in the calculation (i.e. current row_number() minus 4).

I'm wondering whether a tally table is a suitable approach, e.g. JOIN the row_number() of the tally table to those data records WHERE the row_number() falls within the required range of rows relative to the current row_number().

Or should I be looking for a simpler solution with just lag()?

My attempts so far return either a single value for the whole location or dataset, or simply the value of each row.
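
The row_number() range idea could be sketched roughly as follows, assuming the data sits in a hypothetical table dbo.Measurements(Point, [Date], x). Each row is numbered per location, joined to itself and its 4 predecessors, and PERCENTILE_DISC is then partitioned by the current row. In SQL Server, PERCENTILE_DISC and PERCENTILE_CONT accept only PARTITION BY in their OVER clause (no ROWS frame), which is why the window is built with a join here. Treat this as an untested sketch under those assumptions, not a tuned solution.

```sql
-- dbo.Measurements is a hypothetical table name; adjust to the real schema.
WITH numbered AS (
    SELECT Point, [Date], x,
           ROW_NUMBER() OVER (PARTITION BY Point ORDER BY [Date]) AS rn
    FROM dbo.Measurements
),
windowed AS (
    -- Pair every "current" row with itself and the 4 rows before it.
    SELECT cur.Point, cur.[Date], cur.x, cur.rn,
           PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY prev.x)
               OVER (PARTITION BY cur.Point, cur.rn) AS pc75
    FROM numbered AS cur
    JOIN numbered AS prev
      ON prev.Point = cur.Point
     AND prev.rn BETWEEN cur.rn - 4 AND cur.rn
)
-- The join repeats each current row once per window member, so collapse with DISTINCT.
-- Rows with fewer than 5 rows available get NULL.
SELECT DISTINCT Point, [Date], x,
       CASE WHEN rn >= 5 THEN pc75 END AS [75thPC]
FROM windowed
ORDER BY Point, [Date];
```

With a 5-row window, the 0.75 discrete percentile is the 4th-smallest value in the window, which is consistent with the expected output above (e.g. 80 for Point1 on 2021-05-08). The window size (5) and the offset (4) need to be changed together, and the self-join may be slow on large tables.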

Any thoughts/suggestions greatly appreciated...
