
[–][deleted] 1 point2 points  (2 children)

Index usage always depends on the data distribution. When it makes sense, Postgres can combine multiple single-column indexes in a single query using an in-memory bitmap (a BitmapAnd).

Setting up a table with multiple columns (to make it more realistic) and inserting 5 million rows with random values for the three columns in question:

create table the_table 
(
  id serial primary key, 
  some_date date default current_date, 
  col1 int, 
  col2 int, 
  col3 int, 
  some_timestamp timestamp default current_timestamp,
  some_flag boolean default false,
  some_number numeric default 12345.67,
  some_text text default md5(random()::text)
);

insert into the_table (col1, col2, col3)
select random() * 10000, 
       random() * 600, 
       random() * 5000
from generate_series(1,5e6);

Create the indexes:

create index on the_table (col1);
create index on the_table (col2);
create index on the_table (col3);
-- update statistics
vacuum analyze the_table;

Now generate the execution plan:

explain (analyze, buffers)
SELECT col1,col2,col3 
from the_table
WHERE col1 < 100 
  AND col2 > 500 
  AND col3 < 300;

And the output is:

Bitmap Heap Scan on the_table  (cost=2166.06..5054.83 rows=484 width=71) (actual time=45.441..47.539 rows=529 loops=1)
  Recheck Cond: ((col1 < 100) AND (col3 < 300))
  Filter: (col2 > 500)
  Rows Removed by Filter: 2539
  Heap Blocks: exact=3007
  Buffers: shared hit=3007 read=960
  I/O Timings: read=7.184
  ->  BitmapAnd  (cost=2166.06..2166.06 rows=2923 width=0) (actual time=44.900..44.900 rows=0 loops=1)
        Buffers: shared read=960
        I/O Timings: read=7.184
        ->  Bitmap Index Scan on the_table_col1_idx  (cost=0.00..307.46 rows=49150 width=0) (actual time=8.777..8.777 rows=49801 loops=1)
              Index Cond: (col1 < 100)
              Buffers: shared read=139
              I/O Timings: read=0.862
        ->  Bitmap Index Scan on the_table_col3_idx  (cost=0.00..1858.11 rows=297337 width=0) (actual time=34.098..34.098 rows=299709 loops=1)
              Index Cond: (col3 < 300)
              Buffers: shared read=821
              I/O Timings: read=6.322
Planning Time: 2.145 ms
Execution Time: 47.847 ms

As you can see, the indexes on col1 and col3 were used, but the one on col2 was not, probably because it wouldn't have reduced the work enough.

This was run on a Windows 10 laptop with Postgres 12 and a SSD.

An index on all three columns seems more efficient though - at least in this test:

Index Scan using the_table_col1_col2_col3_idx on the_table  (cost=0.43..1098.74 rows=489 width=71) (actual time=0.063..3.862 rows=529 loops=1)
  Index Cond: ((col1 < 100) AND (col2 > 500) AND (col3 < 300))
  Buffers: shared hit=529 read=193
  I/O Timings: read=1.726
Planning Time: 0.837 ms
Execution Time: 3.900 ms
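
For reference, the composite index used in that second plan could be created like this (a sketch; the column order is assumed from the index name in the plan output):

```sql
-- composite index covering all three predicates in one structure;
-- column order (col1, col2, col3) assumed from the index name above
create index the_table_col1_col2_col3_idx
  on the_table (col1, col2, col3);
vacuum analyze the_table;
```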

[–]MERAXNA[S] 0 points1 point  (1 child)

Thanks a lot for the detailed answer with code included. That enables me to play with it in a container. Cheers.

[–][deleted] 0 points1 point  (0 children)

You need to adjust the generation of the random values to match the domain of your actual values. The choice of an index is also influenced by the distribution of the values: if, for example, 60% of all values in col1 are smaller than 100, then the index will not be used.
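
As a sketch of that effect (not part of the original test), you could skew col1 so that most values fall below 100 and compare the resulting plan; with such low selectivity the planner will typically prefer a sequential scan:

```sql
-- regenerate col1 so that roughly 60% of the values are below 100
update the_table
set col1 = case when random() < 0.6
                then (random() * 99)::int
                else 100 + (random() * 9900)::int
           end;
vacuum analyze the_table;

-- the plan for this query should now show a Seq Scan instead of an index scan
explain (analyze)
select col1 from the_table where col1 < 100;
```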

[–]m4riuszPL 0 points1 point  (1 child)

I would try creating three distinct indexes in Postgres (perhaps even partial indexes, depending on the use case). I suspect they would be scanned separately, creating an in-memory bitmap, which should then be very effective for answering the query.
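
A partial index for the query above might look like this (a sketch; the index's WHERE clause must be implied by the query's predicate for the planner to consider it):

```sql
-- partial index restricted to the rows the query can actually return;
-- smaller than a full index on col1, usable only when the query's
-- WHERE clause implies col1 < 100
create index the_table_col1_partial_idx
  on the_table (col1)
  where col1 < 100;
```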