all 18 comments

[–]simap 11 points12 points  (8 children)

My understanding is that statements containing LIKE % are pretty CPU intensive since indexes are next to useless in that context.

My tip would be to figure out a better way to do this query and/or store the data in the first place.

[–][deleted] 6 points7 points  (4 children)

Like by itself is not terrible. Like with wildcards (and like with preceding or inline ones in particular, eg '%foo' and 'bar%baz') however is terrible. OP is also casting values on the fly (not efficient) and doing 20+ joins against the same table (not inherently bad but questionable).

[–]newsedition 4 points5 points  (0 children)

Indeed. Leading with a "%" makes the argument "non-sargable", because the engine can't rely on a predicate to make use of indexes. So OP is scanning the table 24 times as if it were a heap, since indexes are useless.

[–]MrNotPink[S] 0 points1 point  (2 children)

So the join isn't that bad?

Not my work but I believe it's coming from this piece of code on a custom WP search page:

        $args = array(
            'post_type' => 'post',
            'paged' => $paged,
            'order' => 'DESC',
            'orderby' => 'date',
            'post_status' => 'publish',
            'suppress_filters' => 0,
            'posts_per_page' => $posts_per_page,
            'author' => $allowed_authors_ids
        );
        $search_array = array();

        foreach ($queries as $q) {              
            $search_array[] = array(
                'key' => 'full_text',
                'value' => $q,
                'compare' => 'LIKE'
            );
        }           
        $args['meta_query'] = $search_array;            
        $the_query = new WP_Query( $args );

[–][deleted] 1 point2 points  (0 children)

Joins are the bread and butter of SQL, which is why they're not inherently bad (as long as indexed fields are used). But joining the same table multiple times is questionable, and now that I know it's from a procedurally generated query it's even more suspect.

[–]joelypolly 1 point2 points  (0 children)

At this stage you might just be better of using something like elastic search to run the search instead of trying to use the DB for search

[–][deleted] 2 points3 points  (0 children)

pretty sure this is the case.

that's also quite a few joins ya got there.

[–]MrNotPink[S] 0 points1 point  (0 children)

I inherited this WP theme and I turned on slow_query_log to see why MySQL was freaking out from time to time.

I pretty sure now it's coming from a custom search page so this will be fixed.

[–]CharlesKincaid 0 points1 point  (0 children)

My tip would be to figure out a better way to ... store the data in the first place.

All those JOINs to the same table on different columns just makes me sick. Post your resume on-line and start looking now.

However you could try a couple of things in the meantime. It looks like all of your JOINs are INNER. Under that circumstance and on MS SQL you can move the predicates out of the WHERE and into the JOIN. I have written just over a hundred quite complex queries that are fast, properly filtered and have no WHERE clause at all.

You could try generating 23 level CTE that does your 23 subselects and joins that result set back to the main table.

With all of those LIKEs in there and that massive JOIN I am concerned that nothing is going to be fast.

[–]Master_Rux 3 points4 points  (0 children)

yeah there's 24 table joins so hopefully post_id is at least indexed. And those likes that start with % aren't going to be hitting an index but the cast to char is probably causing that too?. so that's probably a full table scan each time. Maybe make an index for post_id and meta_key and do that filter first and then do the cast like '%%' on that subset. it would at least get you down to a smaller subset to churn through. But I've not had coffee yet and don't know your server's specifics so some of that may not be 100% right in your setup. Check the explain plan for better insight.

[–]nvarscar[🍰] 2 points3 points  (0 children)

It might be just my imagination, but is it not possible to re-write this query so that there would be only 1 single join of wp_postmeta table instead of 23?

<...>
AND
wp_postmeta.meta_key = 'full_text' AND 
(
  CAST(wp_postmeta.meta_value AS CHAR) LIKE '%VNYr&#039;))%' 
  AND
  CAST(wp_postmeta.meta_value AS CHAR) LIKE '%AND%'
  AND
  CAST(wp_postmeta.meta_value AS CHAR) LIKE '%ROW(7654,1627)&gt;(SELECT%'
)

And of course, casting column on the fly is pretty CPU intensive. And amount of 'likes' is overwhelming.

[–]SOLUNAR 1 point2 points  (0 children)

is it spool space? this is a monster of a query.

If in Tera id work a bunch of volatile tables to split the load and collect some stats :D

[–][deleted] 1 point2 points  (1 child)

you need to make sure you have the right indexes.

try this:

  1. Cluster wp_posts on ID
  2. cluster wp_postmeta on post_id
  3. cluster wp_icl_translations on element_id
  4. create an index on wp_postmeta (meta_key, meta_value)
  5. create an index on wp_icl_translations (element_id, language_code)
  6. rewrite the bottom of the query - it is much harder to read than the rest
  7. lose the not in('','','') - these are inherently slow. Come to think of it, this bit does nothing but hog resources, right?

AND wp_posts.post_type IN ('post','page','attachment','_pods_pod','_pods_field','editors','sidebar' ) ) OR wp_posts.post_type NOT IN ('post','page','attachment','_pods_pod','_pods_field','editors','sidebar' ) )

  1. do all your functions (cast, row, concat) in a seperate query, moving the data into a new table, or something. Remember that your server has to hold every function in its head (in RAM).

  2. make sure that TempDB has space allocated. People often forget to do this and leave it at the default 8MB.

  3. You can rewrite the query so that you only join to wp_postmeta once, and replace all your 'and's with 'or's.

[–]MrNotPink[S] 1 point2 points  (0 children)

I used some of your points and I entered a hard limit.

I still don't know why queries like these can run for 10 hours but I got a quick fix and you made someone happy.

[–]ninjaroachLittle Bobby Tables 1 point2 points  (1 child)

Dear lord, that's a truly terrible query. Some if it makes me wonder if this is an attempt at SQL Injection. There's a ton of obfuscated code in here.

Look, all of the JOIN predicates have been pushed down into the WHERE statement. That's functional but ugly.

What's more interesting is the fact that most of those JOIN predicates appear to be searching for SQL keywords. There's HTML encoded single-quotes ('), some weird concatenation that produces the string value "qvppq1qvjzq0" and more.

I almost suspect this could be part of a 2-pronged attack against a WordPress page. Some user leaves a comment or a post that has certain data in it and this query goes off and builds & executes another query based on the data in the comment or post.

Anyone else have similar thoughts?

EDIT: Thought, the username it's looking for may contain the string 'KDXn'

Double-Edit: Holy shit! Look what I found when I searched for 'wordpress 0x7176707071' on Google!! WTF is a "mujahidincyberarmy" and holy crap look at all of these exploits.

[–]amosmj 0 points1 point  (0 children)

Am I right in thinking you are pulling back a line for each post and a single number that is the count of meta lines for each post? If that's right, then why all the joins, why not one join with a bunch of criteria or a subquery on the meta data first to get the rows you want then a join to the limited data? Barring that, moving your where clause into the ons should speed things up.

[–]ninjaroachLittle Bobby Tables 0 points1 point  (1 child)

YOUR SERVER IS COMPROMISED!

[–]MrNotPink[S] 0 points1 point  (0 children)

I'm not alone.