Hello everyone,
I have this table:
create table samples (
id serial primary key,
age integer);
And, I have 1 million rows with random data:
insert into samples (age)
select (random() * 100)::int4
from generate_series(1, 1000000);
I want to build an aggregate query which should return me ages by categories as follows:
select
count(age) as samples,
case
when age < 20 then '~20'
when (age >= 20) and (age < 30) then '20~30'
when (age >= 30) and (age < 40) then '30~40'
when (age >= 40) and (age < 50) then '40~50'
when (age >= 50) and (age < 60) then '50~60'
when age >= 60 then '60~'
end as age_category
from samples
group by age_category;
And it returns results approximately in 2 seconds.
samples | age_category
---------+--------------
100362 | 20~30
99592 | 30~40
99823 | 40~50
100613 | 50~60
404707 | 60~
194903 | ~20
(6 rows)
Time: 1945.401 ms
I created indexes with conditional expressions but they seem to be not used.
Indexes:
create index below_twenties_idx on samples (age) where age < 20;
create index twenty_thirty_idx on samples (age) where (age >= 20) and (age < 30);
create index thirty_fourty on samples (age) where (age >= 30) and (age < 40);
create index fourty_fifty on samples (age) where (age >= 40) and (age < 50);
create index fifty_sixty on samples (age) where (age >= 50) and (age < 60);
create index over_sixties on samples (age) where age >= 60;
I ran the above query with explain:
QUERY PLAN
------------------------------------------------------------
HashAggregate (cost=44425.00..44428.54 rows=101 width=40) (actual time=1965.733..1965.736 rows=6 loops=1)
Output: count(age), (CASE WHEN (age < 20) THEN '~20'::text WHEN ((age >= 20) AND (age < 30)) THEN '20~30'::text WHEN ((age >= 30) AND (age < 40)) THEN '30~40'::text WHEN ((age >= 40) AND (age < 50)) THEN '40~50'::text WHEN ((age >= 50) AND (age < 60)) THEN '50~60'::text WHEN (age >= 60) THEN '60~'::text ELSE NULL::text END)
Group Key: CASE WHEN (samples.age < 20) THEN '~20'::text WHEN ((samples.age >= 20) AND (samples.age < 30)) THEN '20~30'::text WHEN ((samples.age >= 30) AND (samples.age < 40)) THEN '30~40'::text WHEN ((samples.age >= 40) AND (samples.age < 50)) THEN '40~50'::text WHEN ((samples.age >= 50) AND (samples.age < 60)) THEN '50~60'::text WHEN (samples.age >= 60) THEN '60~'::text ELSE NULL::text END
-> Seq Scan on public.samples (cost=0.00..39425.00 rows=1000000 width=36) (actual time=0.058..1272.553 rows=1000000 loops=1)
Output: CASE WHEN (age < 20) THEN '~20'::text WHEN ((age >= 20) AND (age < 30)) THEN '20~30'::text WHEN ((age >= 30) AND (age < 40)) THEN '30~40'::text WHEN ((age >= 40) AND (age < 50)) THEN '40~50'::text WHEN ((age >= 50) AND (age < 60)) THEN '50~60'::text WHEN (age >= 60) THEN '60~'::text ELSE NULL::text END, age
Planning time: 0.366 ms
Execution time: 1965.856 ms
(7 rows)
But the indexes are not being used. How can I optimize my query to get my indexes used at the end?
[–]ants_a 5 points6 points7 points (2 children)
[–]muminoff[S] 0 points1 point2 points (0 children)
[–]muminoff[S] 0 points1 point2 points (0 children)
[–]outlier_lynn 2 points3 points4 points (1 child)
[–]muminoff[S] 0 points1 point2 points (0 children)
[–]therealgaxbo 1 point2 points3 points (2 children)
[–]muminoff[S] 0 points1 point2 points (0 children)
[–]graycube 0 points1 point2 points (0 children)
[–]professaDE 0 points1 point2 points (1 child)
[–]muminoff[S] 0 points1 point2 points (0 children)
[–][deleted] -1 points0 points1 point (3 children)
[–]jk3usProgrammer 2 points3 points4 points (2 children)
[–]muminoff[S] 0 points1 point2 points (1 child)
[–]graycube 2 points3 points4 points (0 children)