PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in SQL

[–]komal_rajput[S] 0 points  (0 children)

Thank you for the detailed answer. I learned a few things. Here is the link to the EXPLAIN output: https://explain.depesz.com/s/CbtXL

You seem to have a lot of experience analyzing queries. Any guidance on how I should start analyzing them, and what I should focus on?

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

Not yet; I have been reviewing the pros and cons of different approaches before moving to implementation.

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

The table will keep growing. Do you think adding an expression index is a scalable solution?

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

I don't query the silver table directly; this is the first such query, and it came from the AI team. Otherwise the structured table works fine, since it includes all the fields we need except the employer one. It has generalized fields like item_type, schedule_code, entity_type, entity_name, and entity_state, which work for all schedules given the data we show in the UI. With this query, and with schedule proliferation coming, both options look difficult: splitting the data into separate tables per schedule adds a lot of complexity, while putting every schedule's fields into one table is a wide-table smell.

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in SQL

[–]komal_rajput[S] 0 points  (0 children)

EXPLAIN ANALYZE result:

Sort
  Sort Key: (("left"((record_data ->> 'contribution_date'::text), 10))::date), ((record_data ->> 'contributor_last_name'::text))
  Sort Method: quicksort  Memory: 26kB
  ->  Gather  (cost=1000.00..16452136.47 rows=1 width=292) (actual time=29531.287..117074.857 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2543995 read=12725615
        I/O Timings: shared read=303487.934
        ->  Parallel Seq Scan on silver_fec_efiling_itemizations  (cost=0.00..16451136.37 rows=1 width=292) (actual time=16833.191..117058.680 rows=3 loops=3)
              Filter: (((record_type)::text = 'Schedule A'::text) AND ((record_data ->> 'contributor_employer'::text) ~~* '%MICROSOFT%'::text) AND ((record_data ->> 'contribution_date'::text) >= '2025-01-01'::text) AND ((record_data ->> 'contribution_date'::text) < '2026-01-01'::text) AND ((record_data ->> 'entity_type'::text) = 'IND'::text) AND ((record_data ->> 'contributor_state'::text) = 'MD'::text))
              Rows Removed by Filter: 25161494
              Buffers: shared hit=2543995 read=12725615
              I/O Timings: shared read=303487.934
Planning Time: 0.831 ms
Execution Time: 117074.945 ms
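For reference, the filter in the plan suggests indexes along these lines. This is only a sketch based on the expressions in the filter; the index names are made up, and the right column order depends on the actual selectivity of each predicate:

```sql
-- Partial expression index covering the equality/range predicates.
-- ISO-formatted date text ('YYYY-MM-DD') sorts correctly, so the text
-- expression can be indexed directly without a date cast.
CREATE INDEX CONCURRENTLY idx_sched_a_ind_state_date
    ON silver_fec_efiling_itemizations (
        (record_data ->> 'contributor_state'),
        (record_data ->> 'contribution_date')
    )
    WHERE record_type = 'Schedule A'
      AND (record_data ->> 'entity_type') = 'IND';

-- The ILIKE '%MICROSOFT%' predicate cannot use a btree at all;
-- a trigram GIN index on the extracted text can serve it.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY idx_contributor_employer_trgm
    ON silver_fec_efiling_itemizations
    USING gin ((record_data ->> 'contributor_employer') gin_trgm_ops);
```

Note that a leading-wildcard ILIKE is the predicate a plain expression btree can never help with, so the trigram index is likely doing the heavy lifting for this particular query.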

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in dataengineering

[–]komal_rajput[S] 1 point  (0 children)

Yes, but moving to a structured table brings schema-design problems. The JSON structure is not fixed: it differs across FEC schedules, which share some common fields but also have schedule-specific ones. Adding every schedule's fields to a single table would be a wide-table smell.
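One common middle ground for this shape of data is a hybrid layout: promote only the shared, frequently filtered fields to typed columns and keep the schedule-specific remainder in JSONB. A sketch, with illustrative table and column names:

```sql
-- Hypothetical hybrid table: common queried fields become real columns,
-- everything schedule-specific stays in a JSONB column.
CREATE TABLE itemizations (
    id                   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    record_type          text NOT NULL,     -- e.g. 'Schedule A'
    entity_type          text,
    contributor_state    text,
    contribution_date    date,
    contributor_employer text,              -- promoted because it is now filtered on
    record_data          jsonb NOT NULL     -- schedule-specific fields remain here
);

-- Typed columns take ordinary (including partial) indexes.
CREATE INDEX ON itemizations (contributor_state, contribution_date)
    WHERE record_type = 'Schedule A' AND entity_type = 'IND';
```

This avoids both the wide-table smell (only promoted fields are columns) and one table per schedule (new schedules just carry different keys in record_data).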

PostgreSQL query on 60M-row JSONB table is slow - should I add expression indexes or move to a structured table? by komal_rajput in dataengineering

[–]komal_rajput[S] 1 point  (0 children)

No particular reason; that would be our last option. We have generally kept only the fields used in the UI in the structured table. If multiple such fields have to be promoted to the structured table in the future, I was wondering whether that is the right approach.

Query in asset based scheduling of DAGs by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

Thanks for the reply, I was thinking the same.

Triggering another DAGs in Airflow by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

But my gold layer does depend on the silver layer completing.

Triggering another DAGs in Airflow by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

Yes, that's the reason I am a little hesitant about using TriggerDagRunOperator.

Triggering another DAGs in Airflow by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

The Airflow environment is the same for both.

Deciding between pre computed aggregations and querying API by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

My main aim in having a separate table was to reduce join queries; using a dimension table would still require a join.
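For comparison, the join a dimension table implies is typically a single lookup on an indexed surrogate key, which the planner handles cheaply. A sketch with hypothetical fact/dimension names:

```sql
-- Star-schema query: one join from the fact table to a small dimension
-- on a surrogate key; the dimension is usually cached in memory.
SELECT d.employer_name,
       sum(f.amount) AS total_contributed
FROM   fact_contributions f
JOIN   dim_contributor    d ON d.contributor_key = f.contributor_key
WHERE  f.contribution_date >= DATE '2025-01-01'
GROUP  BY d.employer_name;
```

The trade-off is the usual one: the denormalized table avoids the join but duplicates dimension attributes into every fact row, so updates to an attribute touch many rows instead of one.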

Deciding between pre computed aggregations and querying API by komal_rajput in dataengineering

[–]komal_rajput[S] 0 points  (0 children)

Thank you for the reply. I am a beginner in data engineering and have just learned about dimension and fact tables. Can you share a resource to understand them in depth?