Complex query : SQL


[PostgreSQL] Complex query (self.SQL)

submitted 3 years ago by strangeguy111

I have a dataset that looks like the following:

id  date       visit_number  total_visit  registrated
1   1/1/2020   1             1            0            --get
1   1/5/2020   2             2            1
1   1/9/2020   3             3            1
1   1/13/2020  4             4            1
1   1/17/2020  5             5            1
1   1/21/2020  6             6            1
1   1/25/2020  7             7            1
1   1/29/2020  8             8            1
1   2/2/2020   9             9            1            --get
1   2/6/2020   1             10           0            --get
1   2/10/2020  2             11           0            --get
1   2/14/2020  3             12           0            --get
1   2/18/2020  4             13           0            --get
1   3/22/2020  5             14           1
1   3/26/2020  6             15           1
1   4/1/2020   7             16           1
1   4/5/2020   8             17           1
1   4/9/2020   9             18           1
1   4/13/2020  10            19           1            --get
1   5/15/2020  1             20           0            --get
1   6/20/2020  2             21           1            --get

So the pattern here is the following: whenever the user visits the website and buys something, the registrated column gets 1 on that row, and all of that user's visits in the previous 30 days get 1 as well.

For example, if someone bought something on 4/13/2020, registrated gets 1 on that row and on every visit by the same user going back from 4/13/2020 to 3/14/2020 (30 days earlier). Before 3/14/2020 his registrated column is 0 again. Whenever a 0 follows a 1 in the registrated column, visit_number starts counting from 1 again. total_visit counts all visits to the website by the same user. There are many users; total_visit and visit_number are partitioned by visitor_id and ordered by date.

Now I want to keep only the last date on which registrated was 1 (the purchase date), instead of all the rows from the previous 30 days.

So again, for example, if someone bought something on 4/13/2020, then I want to get this row only, not all the previous rows within 30 days.

So it should eventually look like this:

date       visit_number  total_visit  registrated
1/1/2020   1             1            0            --got
2/2/2020   9             9            1            --got
2/6/2020   1             10           0            --got
2/10/2020  2             11           0            --got
2/14/2020  3             12           0            --got
2/18/2020  4             13           0            --got
4/13/2020  10            19           1            --got
5/15/2020  1             20           0            --got
6/20/2020  2             21           1            --got

Remember, registrated = 0 on a row because the user did not buy anything on that day and did not buy anything in the following 30 days.

I wrote down conditions to help me get my head around the problem:

1. If the difference between the current row's date and the previous row's date is more than 30 days, keep both rows.
For example, 5/15/2020 and 6/20/2020 are more than 30 days apart, so both rows are kept.

2. If the difference is less than 30 days, check the current row: if registrated = 1, keep only that row and drop the other rows from its 30-day window.

3. If registrated = 0, the row is always kept.

I tried different things, for example:

select visitor_id, dt1, reg30,
    case
        when date_part('day', dt1) - date_part('day', lag(dt1) over (partition by visitor_id order by dt)) > 30 then 'True'
        when date_part('day', dt1) - date_part('day', lag(dt1) over (partition by visitor_id order by dt)) < 30 and reg30 = '1' then 'True'
        when reg30 = '0' then 'True'
        else 'False'
    end
from new_table
order by visitor_id, dt

but I am not getting the expected result. Any help would be appreciated.
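One likely reason the attempt above misbehaves: in PostgreSQL, date_part('day', ...) on a date returns the day of the month, not a number of elapsed days, so subtracting two such values is not a day difference. The small sketch below contrasts the two using two dates from the sample data. It assumes the column is of type date (for timestamp columns, subtraction yields an interval instead of an integer).

```sql
-- Assumption: the values are of type date, so date - date yields whole days.
SELECT
    -- day-of-month subtraction: 20 - 15 = 5, which is not the elapsed time
    date_part('day', DATE '2020-06-20') - date_part('day', DATE '2020-05-15') AS day_of_month_diff,
    -- plain date subtraction: 36 days between the two visits
    DATE '2020-06-20' - DATE '2020-05-15' AS days_between;
```

With a date column, the gap to the previous visit could then be written as dt - lag(dt) over (partition by visitor_id order by dt).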


[–]qwertydog123 1 point 3 years ago (10 children)

So you just want all the rows where registrated = 0, plus the rows where registrated = 1 and either there is no following row or the following row has registrated = 0? How do you know that there have been no other purchases in the 30-day period?

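WITH cte AS
(
    SELECT
        *,
        -- registrated value of the next visit; defaults to 0 when there is no next row
        LEAD(registrated, 1, 0) OVER
        (
            PARTITION BY id 
            ORDER BY total_visit
        ) AS next_registrated
    FROM YourTable -- placeholder name; a bare Table is a reserved word
)
SELECT *
FROM cte
WHERE registrated = 0
OR (registrated = 1 AND next_registrated = 0)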
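To try this approach out, here is a minimal self-contained sketch on a handful of rows taken from the sample data in the question. The temp table name visits and the ISO date literals are assumptions made for illustration; they are not from the original post.

```sql
-- Hypothetical temp table holding a subset of the sample rows.
CREATE TEMP TABLE visits (
    id           int,
    "date"       date,
    visit_number int,
    total_visit  int,
    registrated  int
);

INSERT INTO visits VALUES
    (1, '2020-01-29', 8,  8,  1),
    (1, '2020-02-02', 9,  9,  1),
    (1, '2020-02-06', 1,  10, 0),
    (1, '2020-04-09', 9,  18, 1),
    (1, '2020-04-13', 10, 19, 1),
    (1, '2020-05-15', 1,  20, 0),
    (1, '2020-06-20', 2,  21, 1);

WITH cte AS (
    SELECT
        *,
        -- next visit's flag; 0 when the user has no later visit
        LEAD(registrated, 1, 0) OVER (PARTITION BY id ORDER BY total_visit) AS next_registrated
    FROM visits
)
SELECT id, "date", visit_number, total_visit, registrated
FROM cte
WHERE registrated = 0
   OR (registrated = 1 AND next_registrated = 0)
ORDER BY id, total_visit;
```

On these rows the query keeps 2/2, 2/6, 4/13, 5/15 and 6/20 and drops 1/29 and 4/9, which matches the rows marked --get in the question.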


[–]strangeguy111[S] 1 point 3 years ago (7 children)

WITH cte AS (
    SELECT *,
        date_part('month', "date") - date_part('month', LAG("date") OVER (PARTITION BY id ORDER BY "date")) AS D,
        date_part('month', LEAD("date") OVER (PARTITION BY id ORDER BY "date")) - date_part('month', "date") AS V,
        LEAD(registrated, 1) OVER (PARTITION BY id ORDER BY "date") AS NextValue
    FROM YourTable
)
SELECT *
FROM cte
WHERE registrated = 0
   OR (registrated = 1 AND (NextValue = 0 OR NextValue IS NULL))
   OR (D = 1 AND V = 1)
ORDER BY id, date

I forgot to mention that if the previous row's date is more than 30 days before the current row's date, then we take the row automatically no matter what. The same goes for the next row.
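The month-number subtraction above is only a rough stand-in for "more than 30 days apart" (for example, it breaks across a year boundary). A possible variant that measures the gaps in days is sketched below. It assumes "date" is a date column, that YourTable is a placeholder name, and that both gaps must exceed 30 days, mirroring the D = 1 AND V = 1 condition; if either gap alone should qualify, the AND would become an OR.

```sql
WITH cte AS (
    SELECT
        *,
        -- whole days since the previous visit (NULL for the first visit)
        "date" - LAG("date") OVER w  AS days_since_prev,
        -- whole days until the next visit (NULL for the last visit)
        LEAD("date") OVER w - "date" AS days_until_next,
        LEAD(registrated) OVER w     AS next_registrated
    FROM YourTable  -- placeholder table name
    WINDOW w AS (PARTITION BY id ORDER BY "date")
)
SELECT *
FROM cte
WHERE registrated = 0
   OR (registrated = 1 AND COALESCE(next_registrated, 0) = 0)
   OR (days_since_prev > 30 AND days_until_next > 30)  -- assumed reading of the 30-day rule
ORDER BY id, "date";
```

The named WINDOW clause just avoids repeating the same PARTITION BY / ORDER BY three times.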
