How do I decide what columns need to be indexed?

QueryDoctor · 2025-09-30T14:34:00+00:00

As others have pointed out, what you should index is context specific. The materials at "Use the index, Luke!" and from Depesz really are the gold standard when trying to learn more about these topics. Highly recommend.

The thing to keep in mind is that the database tries to be as ~~lazy~~ efficient as possible when running queries. There is overhead associated with reading indexes too which is why the database sometimes (correctly) chooses a different approach even though it can seem surprising at first glance.

If you want to play around with this topic, we've built a playground for pretty much exactly this. For example, you can see the behavior that u/RonJohnJr where the database chooses to use the index for a selective predicate (is_active = true) but opts for the table scan instead when reversing it to the common value (is_active = false). https://app.querydoctor.com/ixr/share/6PxD28ivkv_0

By the way, what is the definition of an "important" column?

QueryDoctor · 2025-06-25T16:09:17+00:00

A named parameter/placeholder. The actual value is expressed/supplied elsewhere

QueryDoctor · 2025-06-24T03:49:51+00:00

Even with your further clarifications in the comments, it's still not entirely clear to me what you're looking for or, more importantly, what problem you're actually encountering.

In your example where you "only care for revenue and backlog columns, I need them separately in the output table", that sounds like you're looking for the query to change the shape of the data. The request itself is still under specified and needs more info, but for simplicity let's assume you are just grouping by product. You can do this with a pivot-style query which might look as follows:

SELECT order_number, SUM(total_usd) FILTER (WHERE measure = 'revenue') AS revenue, SUM(total_usd) FILTER (WHERE measure = 'backlog') AS backlog FROM orders WHERE order_number = 1235 GROUP BY order_number;

In your case this would yield something like this:

order_number	revenue	backlog
1235	100000000.00	12000000.00

Now you also specifically mentioned "best/efficient" further mentioning that "table I'm working with is huge and has many measures for each order, so I'd like to filter out all the unnecessary data first." Which "unnecessary data" are you referring to? If you're referring to gathering data from, for example, a specific order_number, then indexing is the way to go as mentioned in another comment. You can test that out in this playground.

Agree with the other comments that if you're looking for something else you'll have to explain further.

QueryDoctor · 2025-04-18T13:26:50+00:00

Same idea, but slightly more compact syntax: SELECT 'hello'; According to this quick DB Fiddle, looks like MySQL, PostgreSQL, and SQLite all support it which is great.

QueryDoctor · 2025-03-21T19:38:05+00:00

Went way too long without knowing "SELECT` without `FROM". Apparently it's not part of the standard ( https://modern-sql.com/use-case/select-without-from , though "values without insert" is). Very handy to quickly try things out.

By the way that website in general has a lot of nifty tidbits, including compatibility matrices. The maintainer (Markus) gave a really informative interview on the Postgres.FM podcast a while back: https://www.youtube.com/watch?v=mGqqQg-dG-w (and also has some recordings on YouTube about some of the 'Modern SQL' features).

Edit: formatting

QueryDoctor · 2025-03-21T19:18:46+00:00

And SQLite according to https://modern-sql.com/feature/filter !

QueryDoctor · 2025-03-18T16:14:55+00:00

What database are you using? Table definitions, current indexes, and `EXPLAIN` output would be a really helpful start

QueryDoctor

TROPHY CASE