PDF (A4) - 40.5Mb
Man Pages (TGZ) - 259.5Kb
Man Pages (Zip) - 366.7Kb
Info (Gzip) - 4.1Mb
Info (Zip) - 4.1Mb
Certain optimizations are applicable to comparisons that use theIN (or=ANY) operator to test subquery results. This section discusses these optimizations, particularly with regard to the challenges thatNULL values present. The last part of the discussion suggests how you can help the optimizer.
Consider the following subquery comparison:
outer_expr IN (SELECTinner_expr FROM ... WHEREsubquery_where) MySQL evaluates queries“from outside to inside.” That is, it first obtains the value of the outer expressionouter_expr, and then runs the subquery and captures the rows that it produces.
A very useful optimization is to“inform” the subquery that the only rows of interest are those where the inner expressioninner_expr is equal toouter_expr. This is done by pushing down an appropriate equality into the subquery'sWHERE clause to make it more restrictive. The converted comparison looks like this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where ANDouter_expr=inner_expr)After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.
More generally, a comparison ofN values to a subquery that returnsN-value rows is subject to the same conversion. Ifoe_i andie_i represent corresponding outer and inner expression values, this subquery comparison:
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_N FROM ... WHEREsubquery_where)Becomes:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where ANDoe_1 =ie_1 AND ... ANDoe_N =ie_N)For simplicity, the following discussion assumes a single pair of outer and inner expression values.
The“pushdown” strategy just described works if either of these conditions is true:
outer_exprandinner_exprcannot beNULL.You need not distinguish
NULLfromFALSEsubquery results. If the subquery is a part of anORorANDexpression in theWHEREclause, MySQL assumes that you do not care. Another instance where the optimizer notices thatNULLandFALSEsubquery results need not be distinguished is this construct:... WHEREouter_expr IN (subquery)In this case, the
WHEREclause rejects the row whetherIN (returnssubquery)NULLorFALSE.
Suppose thatouter_expr is known to be a non-NULL value but the subquery does not produce a row such thatouter_expr =inner_expr. Then evaluates as follows:outer_expr IN (SELECT ...)
In this situation, the approach of looking for rows with is no longer valid. It is necessary to look for such rows, but if none are found, also look for rows whereouter_expr =inner_exprinner_expr isNULL. Roughly speaking, the subquery can be converted to something like this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where AND (outer_expr=inner_expr ORinner_expr IS NULL)) The need to evaluate the extraIS NULL condition is why MySQL has theref_or_null access method:
mysql> EXPLAIN SELECTouter_expr IN (SELECT t2.maybe_null_key FROM t2, t3 WHERE ...) FROM t1;*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1...*************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_nullpossible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index... Theunique_subquery andindex_subquery subquery-specific access methods also have“orNULL” variants.
The additionalOR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.
The situation is much worse whenouter_expr can beNULL. According to the SQL interpretation ofNULL as“unknown value,”NULL IN (SELECT should evaluate to:inner_expr ...)
For proper evaluation, it is necessary to be able to check whether theSELECT has produced any rows at all, so cannot be pushed down into the subquery. This is a problem because many real world subqueries become very slow unless the equality can be pushed down.outer_expr =inner_expr
Essentially, there must be different ways to execute the subquery depending on the value ofouter_expr.
The optimizer chooses SQL compliance over speed, so it accounts for the possibility thatouter_expr might beNULL:
If
outer_exprisNULL, to evaluate the following expression, it is necessary to execute theSELECTto determine whether it produces any rows:NULL IN (SELECTinner_expr FROM ... WHEREsubquery_where)It is necessary to execute the original
SELECThere, without any pushed-down equalities of the kind mentioned previously.On the other hand, when
outer_expris notNULL, it is absolutely essential that this comparison:outer_expr IN (SELECTinner_expr FROM ... WHEREsubquery_where)Be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where ANDouter_expr=inner_expr)Without this conversion, subqueries are slow.
To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within“trigger” functions. Thus, an expression of the following form:
outer_expr IN (SELECTinner_expr FROM ... WHEREsubquery_where)Is converted into:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where AND trigcond(outer_expr=inner_expr))More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1, ...,oe_N) IN (SELECTie_1, ...,ie_N FROM ... WHEREsubquery_where)And converts it to this expression:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where AND trigcond(oe_1=ie_1) AND ... AND trigcond(oe_N=ie_N) ) Eachtrigcond( is a special function that evaluates to the following values:X)
Xwhen the“linked” outer expressionoe_iis notNULLTRUEwhen the“linked” outer expressionoe_iisNULL
Trigger functions arenot triggers of the kind that you create withCREATE TRIGGER.
Equalities that are wrapped withintrigcond() functions are not first class predicates for the query optimizer. Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume anytrigcond( to be an unknown function and ignore it. Triggered equalities can be used by those optimizations:X)
Reference optimizations:
trigcond(can be used to constructX=Y[ORYIS NULL])ref,eq_ref, orref_or_nulltable accesses.Index lookup-based subquery execution engines:
trigcond(can be used to constructX=Y)unique_subqueryorindex_subqueryaccesses.Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.
When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. This fallback strategy is always the same: Do a full table scan. InEXPLAIN output, the fallback shows up asFull scan on NULL key in theExtra column:
mysql> EXPLAIN SELECT t1.col1, t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ...*************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquerypossible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key If you runEXPLAIN followed bySHOW WARNINGS, you can see the triggered condition:
*************************** 1. row *************************** Level: Note Code: 1003Message: select `test`.`t1`.`col1` AS `col1`, <in_optimizer>(`test`.`t1`.`col1`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2 on key1 checking NULL where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)` from `test`.`t1` The use of triggered conditions has some performance implications. ANULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not. This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).
For multiple-table subqueries, execution ofNULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression isNULL. It assumes that subquery evaluations withNULL on the left side are very rare, even if there are statistics that indicate otherwise. On the other hand, if the outer expression might beNULL but never actually is, there is no performance penalty.
To help the query optimizer better execute your queries, use these suggestions:
Declare a column as
NOT NULLif it really is. This also helps other aspects of the optimizer by simplifying condition testing for the column.If you need not distinguish a
NULLfromFALSEsubquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:outer_expr [NOT] IN (SELECTinner_expr FROM ...)with this expression:
(outer_expr IS NOT NULL) AND (outer_expr [NOT] IN (SELECTinner_expr FROM ...))Then
NULL IN (SELECT ...)is never evaluated because MySQL stops evaluatingANDparts as soon as the expression result is clear.Another possible rewrite:
[NOT] EXISTS (SELECTinner_expr FROM ... WHEREinner_expr=outer_expr)
Thesubquery_materialization_cost_based flag of theoptimizer_switch system variable enables control over the choice between subquery materialization andIN-to-EXISTS subquery transformation. SeeSection 10.9.2, “Switchable Optimizations”.
PDF (A4) - 40.5Mb
Man Pages (TGZ) - 259.5Kb
Man Pages (Zip) - 366.7Kb
Info (Gzip) - 4.1Mb
Info (Zip) - 4.1Mb