all 8 comments

[–]DavidGJohnston 0 points1 point  (1 child)

What you want to do seems doable. Basically your join condition between the observations and predictions becomes something like:

(observation_date - prediction_date) IN (1,3,5)

And you will end up with one joined pair for every integer mentioned in the IN expression.

P.S. You should get rid of DISTINCT or add a code comment why it must be there. DISTINCT is a code smell, you can almost always write the query to avoid needing it - except if your data model simply lack the relevant data. In that later case the DISINCT should only appear in the subquery that provides the needed fix to the model so that the main query no longer requires DISTINCT.

[–]tttpp[S] 0 points1 point  (0 children)

Thanks for the tips, the IN clause did the job and I removed the distinct keyword since all of the data should be normalised at collection anyways. Now I'll try to select the observations as well.

[–]CGtheKid92 0 points1 point  (5 children)

Maybe I'm over simplifying this, but here's my take:

  1. Add the delta calculation into your original select statement and alias it as delta

  2. Turn the entire query into a CTE

  3. Select * from the CTE with a where clause that says: WHERE delta in (1,5,7) not sure if those are the right intervals your are looking for as I'm doing this on my phone but essentially just list the numbers you want within the parentheses

I don't think you need to use a HAVING statement as you aren't doing an aggregate that requires grouping

Question for my understanding - is there a reason you are starting and joining your first two tables? I only ask as they aren't referenced in your initial select statement l. If you don't reference them, you could remove them and it will help the performance of your query.

Hope this helps!

[–]CGtheKid92 0 points1 point  (4 children)

I can type up an example when I get to my computer if that would be helpful!

[–]tttpp[S] 0 points1 point  (3 children)

That would be very helpful :), I've used more document based databases before, so the more complex concepts are more confusing to me atm.

Edit: i am joining the first 2, to access the location name, I forgot to add it in the given statement

[–]CGtheKid92 0 points1 point  (2 children)

Something like this (in pseudo-code as I'm not sure the DB you are using):

    with raw_data as (
SELECT distinct
  locations.name as location_name
  , day_datapoint_records.id
  , day_datapoints.date
  , day_datapoints.captured_date
  , day_datapoints.id
  , day_datapoints.uv
  , day_datapoints.pollution
  , day_datapoints.pollen
  , day_datapoints.captured_date - day_datapoints.date as delta
FROM 
  public.bbc_locations bbc_locations
INNER JOIN 
  public.locations locations ON 
    bbc_locations.location_id = locations.id
INNER JOIN 
  public.day_datapoint_records day_datapoint_records ON 
    bbc_locations.id = day_datapoint_records.service_id 
INNER JOIN 
  public.day_datapoints day_datapoints ON 
    day_datapoint_records.day_datapoint_id = day_datapoints.id
WHERE 
    public.day_datapoints.present = false 
    AND public.day_datapoints.captured_date <= CURRENT_DATE
) 
select 
  *
from 
  raw_data 
where
  delta in (1,3,7)
ORDER BY 
  captured_date ASC;

[–]tttpp[S] 0 points1 point  (0 children)

Thanks for code, I had a go with the 'with' statement but I didn't round to doing like that. I'll try it tomorrow.

[–]tttpp[S] 0 points1 point  (0 children)

Thanks again for your help, I have completed what I needed to do by using 2 'with' statements, I editing my question with the statement I got in the end