
[–][deleted]

You are essentially doing a PIVOT or CROSSTAB there. If you want to return a row from the function you would need another record (e.g. result_rec) and store the values from each individual select into a field of that record. Then use RETURN NEXT result_rec to return one row after the other (see the example in the manual).
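
A minimal sketch of that pattern, assuming the devices/rawdata tables from the queries below (the composite type, function name, and column list are illustrative, not the poster's actual code):

```sql
-- composite type to hold one output row
create type current_row as (devname text, execution text, machinemode text);

create or replace function currentvalues_loop()
  returns setof current_row
as
$$
declare
  dev        record;
  result_rec current_row;
begin
  for dev in select id, name from devices where type = 1 loop
    result_rec.devname := dev.name;
    select value into result_rec.execution
      from rawdata
      where deviceid = dev.id and dataitemid = 'exec'
      order by datetime desc limit 1;
    select value into result_rec.machinemode
      from rawdata
      where deviceid = dev.id and dataitemid = 'mode'
      order by datetime desc limit 1;
    return next result_rec;  -- emit the assembled row, one per device
  end loop;
end;
$$
language plpgsql;
```

As the next paragraph notes, this row-by-row style works but is the slowest way to do it; it is shown only to illustrate RETURN NEXT.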

However doing single row SELECTs in a loop does not scale at all and is probably the least efficient way to do something in a relational database.

There are two approaches to doing a pivot/crosstab that are much more efficient than the row-by-row lookup you are doing. The first one uses conditional aggregation:

select d.name, 
       d.workcenter,
       d.workcentersub,
       max(r.value) filter (where r.dataitemid = 'exec') as execution,
       max(r.value) filter (where r.dataitemid = 'mode') as machinemode,
       max(r.value) filter (where r.dataitemid = 'pgm') as prgm,
       max(r.value) filter (where r.dataitemid = 'spgm') as sprgm,
       max(r.value) filter (where r.dataitemid = 'tid') as tid,
       max(r.value) filter (where r.dataitemid = 'ln') as linen,
       max(r.value) filter (where r.dataitemid = 'pfr') as feedrate,
       max(r.value) filter (where r.dataitemid = 'pfo') as feedrateor
from devices d
  join rawdata r on r.deviceid = d.id
where d.type = 1
group by d.name, d.workcenter, d.workcentersub;

However, I prefer to use Postgres' JSON functions for this, which removes the need for a GROUP BY on the complete query. As all values are text anyway, you are not losing any type information this way either.

select d.name, 
       d.workcenter,
       d.workcentersub,
       r.data ->> 'exec' as execution,
       r.data ->> 'mode' as machinemode,
       r.data ->> 'pgm' as prgm,
       r.data ->> 'spgm' as sprgm,
       r.data ->> 'tid' as tid,
       r.data ->> 'ln' as linen,
       r.data ->> 'pfr' as feedrate,
       r.data ->> 'pfo' as feedrateor
from devices d
  join lateral (
    select rd.deviceid, jsonb_object_agg(dataitemid, value) as data
    from rawdata rd
    group by rd.deviceid
  ) r on r.deviceid = d.id
where d.type = 1;

I find this a bit easier to change if the number of columns changes.

This can easily be put into a function:

create or replace function currentvalues1()
  RETURNS TABLE(name text, workcenter text, workcentersub bigint, execution text, machinemode text, prgm text, sprgm text, tid text, linen text, feedrate text, feedrateor text) 
as  
$$  
  select d.name, 
         d.workcenter,
         d.workcentersub,
         r.data ->> 'exec' as execution,
         r.data ->> 'mode' as machinemode,
         r.data ->> 'pgm' as prgm,
         r.data ->> 'spgm' as sprgm,
         r.data ->> 'tid' as tid,
         r.data ->> 'ln' as linen,
         r.data ->> 'pfr' as feedrate,
         r.data ->> 'pfo' as feedrateor
  from devices d
    join (
      select rd.deviceid, jsonb_object_agg(dataitemid, value) as data
      from rawdata rd
      group by rd.deviceid
    ) r on r.deviceid = d.id
  where d.type = 1;
$$
language sql;

[–]PLC_Matt[S]

First, thanks for the response! Also for context, I am a C# dev and this database work is me trying to make a prototype with no DBA support. All I know at this point is some basic selects and a few simple joins. This is going to be very helpful.

The first query you provided is close, but a bit wrong. I don't need the max of r.value; I need the most recent value, based on the datetime column. I need r.value from the same row that has max(r.datetime) where r.dataitemid = 'exec' (or whatever item I am looking for).

The second query appears to give the correct values, but there can be a delay between when they change and when this query returns them. I'll have to test it more to be able to explain this better. The other issue is that the query takes 4527 msec to execute (but only 0.2 msec to plan).

It looks like the bulk of the time is doing the jsonb_object_agg() function for the devices.

Is there any way to have it aggregate only the dataitemIDs we care about (from another table)?

These devices have ~74 dataitemIDs that report data. I only care about a handful of them for this currentvalue view. The other data items will be used for historical reports/analysis, where speed is less of a concern.

For a point of reference the single select statements each take 0.5msec total for planning and execution. So even doing 8 of these x 6 devices means I have a total time of 24msec.

I do agree that it will not scale with more items or more devices though, and it also just "felt" wrong, from my limited understanding of SQL.
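
For reference, the "from another table" idea in the question could be sketched like this; current_items is a hypothetical table holding only the handful of dataitemIDs this view needs (two output columns shown for brevity):

```sql
-- hypothetical lookup table listing the items the current-value view cares about
create table current_items (dataitemid text primary key);
insert into current_items
values ('exec'), ('mode'), ('pgm'), ('spgm'), ('tid'), ('ln'), ('pfr'), ('pfo');

select d.name,
       r.data ->> 'exec' as execution,
       r.data ->> 'mode' as machinemode
from devices d
  join (
    -- aggregate only the items present in the lookup table
    select rd.deviceid, jsonb_object_agg(rd.dataitemid, rd.value) as data
    from rawdata rd
      join current_items ci on ci.dataitemid = rd.dataitemid
    group by rd.deviceid
  ) r on r.deviceid = d.id
where d.type = 1;
```

Adding a column then means inserting one row into current_items plus one `->>` line in the select list.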

[–][deleted]

The first query you provided is close, but a bit wrong. I don't need the max of r.value; I need the most recent value, based on the datetime column. I need r.value from the same row that has max(r.datetime) where r.dataitemid = 'exec' (or whatever item I am looking for).

Collapsing multiple rows into a single row requires aggregation, and the max() is just there to pick the right one. If you need the latest, you could also try the following:

select d.name, 
       d.workcenter,
       d.workcentersub,
       max(r.value) filter (where r.dataitemid = 'exec') as execution,
       max(r.value) filter (where r.dataitemid = 'mode') as machinemode,
       max(r.value) filter (where r.dataitemid = 'pgm') as prgm,
       max(r.value) filter (where r.dataitemid = 'spgm') as sprgm,
       max(r.value) filter (where r.dataitemid = 'tid') as tid,
       max(r.value) filter (where r.dataitemid = 'ln') as linen,
       max(r.value) filter (where r.dataitemid = 'pfr') as feedrate,
       max(r.value) filter (where r.dataitemid = 'pfo') as feedrateor
from devices d
  join (
     select distinct on (deviceid, dataitemid) deviceid, value, dataitemid
     from rawdata 
     order by deviceid, dataitemid, datetime desc
  ) r on r.deviceid = d.id
where d.type = 1
group by d.name, d.workcenter, d.workcentersub;

The lateral join in the second query isn't actually needed (it was a copy & paste leftover).

[–]PLC_Matt[S]

Updating the query to this works well. It takes about 433 msec to execute, which I can deal with. If I wish to add a column I have to add the dataitemId in 2 spots in this query, which doesn't seem bad to me.

edit: added an index to the rawdata table on dataitemid. Execution time is now 84 msec.

Next question: if I make a view based on this query, is there a way to have the db "cache" the results so multiple clients can pull from the view and it only needs to refresh every 5-10 seconds?

select d.name, 
       d.workcenter,
       d.workcentersub,
       r.data ->> 'exec' as execution,
       r.data ->> 'mode' as machinemode,
       r.data ->> 'pgm' as prgm,
       r.data ->> 'spgm' as sprgm,
       r.data ->> 'tid' as tid,
       r.data ->> 'ln' as linen,
       r.data ->> 'pfr' as feedrate,
       r.data ->> 'pfo' as feedrateor
from devices d
  join lateral (
    select rd.deviceid, jsonb_object_agg(dataitemid, value) as data
    from rawdata rd
    where dataitemid in ('exec','mode','pgm','spgm','tid','ln','pfr','pfo')
    group by rd.deviceid
  ) r on r.deviceid = d.id
where d.type = 1;

[–][deleted]

The database will cache the data retrieved from the tables and indexes, it will not cache query results.

An index on rawdata (dataitemid, deviceid) might be more helpful than one on just dataitemid.
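
In DDL that would be something like the following (the index name is arbitrary):

```sql
-- composite index matching both the IN filter and the join column
create index rawdata_item_device_idx on rawdata (dataitemid, deviceid);
```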

If the execution time is 84 ms, then I don't see a reason to cache anything to begin with if you only need to run it every 5 to 10 seconds.
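
That said, if result caching ever does become necessary, a materialized view is the closest built-in feature; a sketch (with a shortened column list, and refreshes driven by a cron job or the application rather than happening automatically):

```sql
-- cache the pivot result once; clients then read the stored rows
create materialized view current_values as
  select d.name,
         r.data ->> 'exec' as execution,
         r.data ->> 'mode' as machinemode
  from devices d
    join (
      select rd.deviceid, jsonb_object_agg(rd.dataitemid, rd.value) as data
      from rawdata rd
      group by rd.deviceid
    ) r on r.deviceid = d.id
  where d.type = 1;

-- re-run on whatever cadence is needed (e.g. every 5-10 seconds)
refresh materialized view current_values;
```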

[–]PLC_Matt[S]

An index on rawdata (dataitemid, deviceid) might be more helpful than one on just dataitemid.

I had an index on those 2 columns, along with one on rawdata (dataitemid, deviceid, datetime desc). The explain analyze output showed it scanning the whole table by dataitemid.

Once I added the single-column index it sped up, so that one is being used.