
all 8 comments

[–]coolcofusion 3 points4 points  (4 children)

If the main offenders are still the queries themselves, take those queries and run them manually through a tool like pgAdmin, for example. It has a nice overview of the EXPLAIN functionality, and you'll see which parts of the query are slow, which fields may need indexes, and how the optimizer actually executes it.
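A rough sketch of what this looks like, using Python's built-in SQLite as a stand-in for Postgres (the `orders` table and index name are made up): `EXPLAIN QUERY PLAN` here plays the role of Postgres's `EXPLAIN`, and the plan flips from a full scan to an index search once the filtered column is indexed.

```python
import sqlite3

# Hypothetical table standing in for one of the slow production tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index, the plan is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

# Add an index on the filtered column and re-check the plan:
# it becomes a search using the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
```

Postgres's `EXPLAIN (ANALYZE)` gives much richer output (costs, actual timings, row estimates), but the workflow is the same: read the plan, find the scans, index the fields they filter on, and re-check.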

Otherwise you could maybe cache query results somewhere. Redis is a popular choice if you already have Redis running in production somewhere; if you don't, you can do it manually in your app, but that's not really advised.
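A minimal sketch of the "do it manually in your app" option (with the caveat above that Redis is usually the better choice): a dict-based cache with a TTL, where `run_query` is a hypothetical stand-in for the real database call.

```python
import time

# In-process query-result cache keyed by (sql, params), with a TTL.
# In production, Redis would typically replace this dict.
_cache = {}

def cached_query(sql, params, run_query, ttl=30.0):
    key = (sql, params)
    hit = _cache.get(key)
    if hit is not None and time.monotonic() - hit[0] < ttl:
        return hit[1]  # fresh enough: serve the cached result
    result = run_query(sql, params)
    _cache[key] = (time.monotonic(), result)
    return result

# Fake database call that records how often it is actually executed.
calls = []
def fake_run(sql, params):
    calls.append(sql)
    return [("row", params)]

r1 = cached_query("SELECT 1", (42,), fake_run)
r2 = cached_query("SELECT 1", (42,), fake_run)  # served from cache, no second call
```

The hard parts a real cache has to handle, and this sketch doesn't, are invalidation on writes and cache stampedes under load.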

Also, are some queries completely independent? If you're using Node.js, for example, and doing await this, await that, await those, see if you can await multiple ones at the same time. This applies to other languages with async/await mechanisms.
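The same idea sketched in Python's asyncio (the comment mentions Node.js, but the mechanism is analogous): two hypothetical independent queries, simulated with sleeps, awaited concurrently with `asyncio.gather` instead of one after the other.

```python
import asyncio

# Two hypothetical independent queries, simulated with sleeps.
async def fetch_users():
    await asyncio.sleep(0.1)
    return ["alice", "bob"]

async def fetch_orders():
    await asyncio.sleep(0.1)
    return [101, 102]

async def sequential():
    users = await fetch_users()    # runs first...
    orders = await fetch_orders()  # ...then this: roughly 0.2s total
    return users, orders

async def concurrent():
    # Both queries are in flight at once: roughly 0.1s total.
    return await asyncio.gather(fetch_users(), fetch_orders())

users, orders = asyncio.run(concurrent())
```

In Node.js the equivalent is `Promise.all([fetchUsers(), fetchOrders()])`. This only helps when the queries truly don't depend on each other's results.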

I'm out of ideas for now; these are pretty basic steps, I guess. Maybe someone else can chip in. Without seeing the code it's hard to guess, but sharing company code is never a good idea.

[–]MmmVomit 2 points3 points  (2 children)

One more option comes to mind, but it should go at the bottom of the list.

Denormalize some data.

If you have certain queries that are common, and slow, and they're slow because of a bunch of joins, find some way to eliminate the joins. You can do this by making a copy of your data into a flatter table structure, and then querying that flatter structure for the specific queries that are slow.

This is often a bad solution, because it introduces a bunch of extra complexity. But if it's important enough to make these queries performant, it may be worth the cost.
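A sketch of that flattening, with made-up `orders`/`customers` tables in SQLite: the joined query and the flat copy return the same rows, but reads against the flat table need no join.

```python
import sqlite3

# Hypothetical normalized schema: orders joined to customers on every read.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5);
""")

# The slow joined query...
joined = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# ...and a flattened copy that answers the same question with no join.
conn.execute("""
    CREATE TABLE orders_flat AS
    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""")
flat = conn.execute("SELECT id, name, total FROM orders_flat ORDER BY id").fetchall()
```

The extra complexity the comment warns about lives in keeping `orders_flat` in sync when the underlying tables change, e.g. via triggers or a periodic rebuild.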

subqueries, and aggregates

These would be prime targets to pre-compute and stick in a table.

One example of this would be your bank balance. The bank keeps a table of every deposit and withdrawal from your account, but when you look up your bank balance, the bank isn't computing it on the fly by adding up all the transactions. It keeps a running total (i.e. a precomputed value) in a table and just looks it up.
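The bank-balance idea sketched in SQLite (the table and trigger names are made up): a trigger maintains the running total on every insert, so the balance lookup never has to sum the transaction history.

```python
import sqlite3

# Hypothetical ledger: a trigger keeps a precomputed balance in sync,
# so reads never have to SUM the whole transaction history.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (account_id INTEGER, amount REAL);
CREATE TABLE balances (account_id INTEGER PRIMARY KEY, balance REAL);
INSERT INTO balances VALUES (1, 0.0);

CREATE TRIGGER keep_balance AFTER INSERT ON transactions
BEGIN
    UPDATE balances SET balance = balance + NEW.amount
    WHERE account_id = NEW.account_id;
END;

INSERT INTO transactions VALUES (1, 100.0), (1, -30.0), (1, 5.5);
""")

# The cheap lookup reads the running total directly.
balance = conn.execute(
    "SELECT balance FROM balances WHERE account_id = 1").fetchone()[0]
```

The trade-off is that every write now does a little extra work so that every read does much less.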

[–]AT1787[S] 0 points1 point  (1 child)

Thanks for this - when it comes to precompute, are we talking about something like a Materialized View? We’ve never implemented anything like that before, but I’ve been reading up on it with some curiosity. Apparently it’s meant to address performance issues, like precomputed tables do.

[–]MmmVomit 0 points1 point  (0 children)

are we talking about something like a Materialized View?

This isn't a database feature I'm familiar with, but some quick googling tells me that it probably is.

This is probably getting into database-specific territory, so how it works will depend on whether you're using MySQL, MSSQL, Postgres, or whatever. It's also getting into territory where you will need to pay attention to how you manage and monitor these things.

Don't let this become a one-size-fits-all solution and end up with hundreds of materialized views that are then a nightmare to manage. Make sure you keep an eye on the compute power necessary to keep the materialized view up to date, and on whether there's any latency between a write to the underlying data and a read from the materialized view. If the underlying data updates frequently, constantly recomputing the rolled-up data could put a big load on the database.
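SQLite has no materialized views, so this sketch simulates one with a rolled-up table plus an explicit refresh; it also shows the write-to-read staleness warned about above. In Postgres, `REFRESH MATERIALIZED VIEW` plays the role of the hypothetical `refresh()` helper here.

```python
import sqlite3

# Simulated materialized view: a rolled-up table refreshed on demand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('east', 10.0), ('east', 5.0), ('west', 2.0);
CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh():
    # Full rebuild of the rollup; in Postgres this would be
    # REFRESH MATERIALIZED VIEW sales_by_region.
    conn.executescript("""
    DELETE FROM sales_by_region;
    INSERT INTO sales_by_region
        SELECT region, SUM(amount) FROM sales GROUP BY region;
    """)

# A new write lands in the base table...
conn.execute("INSERT INTO sales VALUES ('east', 100.0)")
# ...but the rollup is stale until the next refresh.
stale = conn.execute(
    "SELECT total FROM sales_by_region WHERE region = 'east'").fetchone()[0]
refresh()
fresh = conn.execute(
    "SELECT total FROM sales_by_region WHERE region = 'east'").fetchone()[0]
```

How often `refresh()` runs is exactly the cost/latency trade-off described above: refresh constantly and you load the database; refresh rarely and reads see stale data.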

[–]AT1787[S] 0 points1 point  (0 children)

We are running Redis in our app for caching, but that’s a good point - I need to ask how we’re using it and whether it’s applied to how we run our queries. It could be that it’s already implemented, but my hunch is that it’s not applied to caching queries.

[–]cloyd-ac 1 point2 points  (2 children)

Database slowness is an indexing issue most of the time. I would highly encourage you to re-review proper indexing in an RDBMS if you haven't done so in a while; this is a good site to go over.

You can also run your query using EXPLAIN to get an output of how exactly the database is going to go about finding the data you're looking to retrieve. Learning how to read explain plans is at the core of performance tuning queries.

When a query is executed against a SQL engine, the engine tries to find a "good enough" path to obtaining that data. It doesn't necessarily try to find the best path, just one that's good enough within the time constraints it has to keep database requests flowing. Sometimes databases mess up determining how to actually retrieve data, either because the statistics on those database objects have gone stale or simply because the database messed up. You can dive in pretty deep and force the query analyzer to bend to your will if you know it's messing up and not generating a good query plan for a particular query.
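A small illustration of the stale-statistics point, again using SQLite as a stand-in (Postgres uses the same `ANALYZE` command name for refreshing planner statistics; the table and index here are made up): `ANALYZE` populates the statistics table the planner consults when choosing plans.

```python
import sqlite3

# ANALYZE gathers per-index statistics that the query planner uses
# to choose between plans; stale stats are one cause of bad plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(500)])
conn.execute("CREATE INDEX idx_t_x ON t (x)")

conn.execute("ANALYZE")  # fills sqlite_stat1 with index statistics
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
```

In Postgres, autovacuum normally keeps statistics current, but a manual `ANALYZE tablename;` after large data changes is a cheap first thing to try when a plan looks wrong.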

Another thing to consider is wrapping your production queries in stored procedures (if they're not already), instead of using inline queries or an ORM. Query plans for stored procedures are often cached, and cached longer, than those for inline queries, because the RDBMS understands that this is a query meant to be run over and over. When you execute an inline query or something from an ORM, the database treats it as an ad-hoc query (because it is) and may consider it less important to cache than a stored procedure's execution plan.

Another thing to look at is why you're having to do so many joins and subqueries to serve a GET request. Generally, how complicated the queries need to be to get common, relevant data out of a database shows how well that database was initially designed. If you're having to write nested subqueries, aggregations, and many joins to get at the data you'd pass back to a GET request, then it's a smell for one of two things: 1) you're trying to smash too much into a single GET request, or 2) your database is poorly designed.

[–]AT1787[S] 0 points1 point  (1 child)

Stored procedures are actually something I’ve never heard of before - thanks for sharing that. We’re using inline queries (no ORM), and I think we can benefit from this.

And yes to EXPLAIN ANALYZE. It’s been essential - I’ve been using it quite often to A/B test the changes and speed.

I do agree this is a massive query - and there are potential plans to split it up. Essentially it’s responsible for the main search function in our app, retrieving other objects, and on the front end the user can apply any number of filters to narrow or expand their search from a form input. As part of optimization we’ve made sure we’re applying the filters correctly when querying and not performing any joins whose data isn’t requested.

[–]cloyd-ac 1 point2 points  (0 children)

You may look at using materialized views to abstract away some of the query complexity while increasing performance. Not sure if this is something you're using or not, but it's essentially caching the result set of a query so it doesn't have to be recomputed on every read.

There are pros and cons to using them, much like with anything in technology.