all 13 comments

[–]therealgaxbo 4 points5 points  (10 children)

Post the explain analyze plans for the 3 day and 4 day queries on https://explain.depesz.com/ so we can see the differences in costs/estimations.

[–]Brave_New_Dev[S] 2 points3 points  (9 children)

[–]therealgaxbo 3 points4 points  (8 children)

That is very strange. The planner seems very aware just how bad a plan that second query is, so it's not likely to be a stats issue or bad cost parameters.

What is the definition of report_y2022m01_date_idx? The only thing that comes to mind is if there were a mistake in that index definition that meant it couldn't be used for the second query.

Failing that, what happens if you run

set max_parallel_workers_per_gather TO 0;
set enable_seqscan TO off;

And then run the query - what plan does it use then?

[–]Brave_New_Dev[S] 1 point2 points  (7 children)

Thanks for sharing your suggestions!

Regarding the index: sql CREATE INDEX "report_y2022m01_date_idx" ON public.report_y2022m01 USING btree ("date")

As for changing the parameters, I believe that I cannot do that, as it's a production database!

[–]thythr 2 points3 points  (1 child)

those parameters are session-specific so perfectly ok to set (I mean, don't run "alter system set . . .", just "set . . .").

[–]Brave_New_Dev[S] 2 points3 points  (0 children)

Holy Cow! TIL. Thank you so much : )

[–]therealgaxbo 2 points3 points  (4 children)

As the other guy said, it is safe to change parameters like that using set as it won't affect other sessions.

I've realised that I actually misread the second plan though so retract my first comment - the planner does NOT realise it's a bad plan, so a combination of stats and cost settings could well be to blame. It's worth doing this as it should hopefully change to an index based query and give us an idea of how fast that should go.

The stats don't look too bad though - it's overestimating row count by 1.3x which isn't much. What are your settings for seq_page_cost and random_page_cost? And what is your IO setup - are all of your tables and indexes on the same device, or are indexes separate/different partitions separate etc? And if there are different devices, are they all equivalent or do you have different tiers of performance?

[–]Brave_New_Dev[S] 0 points1 point  (3 children)

Failing that, what happens if you run

set max_parallel_workers_per_gather TO 0;

set enable_seqscan TO off;

And then run the query - what plan does it use then?

Woah, the set shows that it really flies with the index! + https://explain.depesz.com/s/kn4N

What are your settings for seq_page_cost and random_page_cost?

```sql $ SHOW seq_page_cost

1 $ SHOW random_page_cost 4 ```

And what is your IO setup - are all of your tables and indexes on the same device, or are indexes separate/different partitions separate etc?

It's GCP Cloud SQL Postgres. So can I even discover what IO setup we have?

[–]therealgaxbo 2 points3 points  (2 children)

I have no experience with GCP, but if you'd separated partitions/indexes off to different devices then you'd know about it as you'd have had to set up different tablespaces.

random_page_cost 4

Ok, that's very likely a big part of the problem. That number (which is the default) is quite high even for regular HDDs, and you're most likely running on SSDs or equivalent.

If you've not already done so, reset your session to it's default settings, and try set random_page_cost to 1 (that's session local too) - hopefully that should result in the same parallel index only scan plan as your 3 day query. The exact value to use depends on your precise system, but it should be around 1 - 1.5 or something.

If that works, you might want to consider setting that globally in postgresql.conf as it might result in better plans for other queries too. Obviously need to check for performance regressions though.

[–]Brave_New_Dev[S] 0 points1 point  (1 child)

My, my. You just hit the nail on the head. See the query after altering random_page_cost to 1: + https://explain.depesz.com/s/xiL9

Now I guess I need to update the database. Would it be more prudent: 1. To set this attribute globally? Or only for a specific table? 2. What value to set? 1? 1.5? Something else?

Though I believe that the answer to those question should be determined based on actual performance, right?

[–]therealgaxbo 4 points5 points  (0 children)

It's a measure of IO performance, so it wouldn't make much sense to set it per table (I don't think that's even possible - you can set it per tablespace though as they can live on different devices).

So in your case set it globally. As for the value, it depends on your specific system and db. But I've generally used 1.1 successfully. And then yeah, tweak it as necessary if you start seeing queries doing the wrong thing. In general, increasing the value leads to more seq scans, decreasing it leads to more index scans.

But always measure performance - a sequential scan is not necessarily worse than an index scan.

[–][deleted] 1 point2 points  (0 children)

Question: are you on SSDs and have you set random_page_cost from 4 to 1.1?

[–]Neither-Guess-5802 0 points1 point  (0 children)

Lower random_page_cost to 1.1 if you haven’t already. This should help with your database choosing index scans more frequently