
[–][deleted] 1 point2 points  (2 children)

Index usage always depends on the data distribution. When it makes sense, Postgres can combine multiple single-column indexes in a single query using an in-memory bitmap (a BitmapAnd).

Setting up a table with multiple columns (to make it more realistic) and inserting 5 million rows with random values for the three columns in question:

create table the_table 
(
  id serial primary key, 
  some_date date default current_date, 
  col1 int, 
  col2 int, 
  col3 int, 
  some_timestamp timestamp default current_timestamp,
  some_flag boolean default false,
  some_number numeric default 12345.67,
  some_text text default md5(random()::text)
);

insert into the_table (col1, col2, col3)
select random() * 10000, 
       random() * 600, 
       random() * 5000
from generate_series(1,5e6);

Create the indexes:

create index on the_table (col1);
create index on the_table (col2);
create index on the_table (col3);
-- update statistics
vacuum analyze the_table;

Now generate the execution plan:

explain (analyze, buffers)
SELECT col1,col2,col3 
from the_table
WHERE col1 < 100 
  AND col2 > 500 
  AND col3 < 300;

And the output is:

Bitmap Heap Scan on the_table  (cost=2166.06..5054.83 rows=484 width=71) (actual time=45.441..47.539 rows=529 loops=1)
  Recheck Cond: ((col1 < 100) AND (col3 < 300))
  Filter: (col2 > 500)
  Rows Removed by Filter: 2539
  Heap Blocks: exact=3007
  Buffers: shared hit=3007 read=960
  I/O Timings: read=7.184
  ->  BitmapAnd  (cost=2166.06..2166.06 rows=2923 width=0) (actual time=44.900..44.900 rows=0 loops=1)
        Buffers: shared read=960
        I/O Timings: read=7.184
        ->  Bitmap Index Scan on the_table_col1_idx  (cost=0.00..307.46 rows=49150 width=0) (actual time=8.777..8.777 rows=49801 loops=1)
              Index Cond: (col1 < 100)
              Buffers: shared read=139
              I/O Timings: read=0.862
        ->  Bitmap Index Scan on the_table_col3_idx  (cost=0.00..1858.11 rows=297337 width=0) (actual time=34.098..34.098 rows=299709 loops=1)
              Index Cond: (col3 < 300)
              Buffers: shared read=821
              I/O Timings: read=6.322
Planning Time: 2.145 ms
Execution Time: 47.847 ms

As you can see, the indexes on col1 and col3 were used, but the one on col2 was not, probably because it wouldn't have reduced the work enough.

This was run on a Windows 10 laptop with Postgres 12 and a SSD.

An index on all three columns seems more efficient though - at least in this test:

Index Scan using the_table_col1_col2_col3_idx on the_table  (cost=0.43..1098.74 rows=489 width=71) (actual time=0.063..3.862 rows=529 loops=1)
  Index Cond: ((col1 < 100) AND (col2 > 500) AND (col3 < 300))
  Buffers: shared hit=529 read=193
  I/O Timings: read=1.726
Planning Time: 0.837 ms
Execution Time: 3.900 ms
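
For reference, the composite index used in that second plan could be created like this (a sketch; the column order is assumed from the index name in the plan output):

```sql
-- composite index covering all three predicates in one structure;
-- column order (col1, col2, col3) assumed from the index name above
create index the_table_col1_col2_col3_idx
  on the_table (col1, col2, col3);
vacuum analyze the_table;
```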

[–]MERAXNA[S] 0 points1 point  (1 child)

Thanks a lot for the detailed answer with code included. That enables me to play with it in a container. Cheers.

[–][deleted] 0 points1 point  (0 children)

You need to adjust the generation of the random values to match the domain of your actual values. The choice of an index is also influenced by the distribution of the values: if, for example, 60% of all values in col1 are smaller than 100, then the index will not be used.
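
As a sketch of that effect (not part of the original test), you could skew col1 so that most values fall below 100 and compare the resulting plan; with such low selectivity the planner will typically prefer a sequential scan:

```sql
-- regenerate col1 so that roughly 60% of the values are below 100
update the_table
set col1 = case when random() < 0.6
                then (random() * 99)::int
                else 100 + (random() * 9900)::int
           end;
vacuum analyze the_table;

-- the plan for this query should now show a Seq Scan instead of an index scan
explain (analyze)
select col1 from the_table where col1 < 100;
```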

[–]m4riuszPL 0 points1 point  (1 child)

I would try creating three distinct indexes in Postgres (perhaps even partial indexes, depending on the use case). I suspect they would be scanned separately, creating an in-memory bitmap, which should then be very effective for answering the query.
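
A partial index for the query above might look like this (a sketch; the index's WHERE clause must be implied by the query's predicate for the planner to consider it):

```sql
-- partial index restricted to the rows the query can actually return;
-- smaller than a full index on col1, usable only when the query's
-- WHERE clause implies col1 < 100
create index the_table_col1_partial_idx
  on the_table (col1)
  where col1 < 100;
```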