@@ -7,7 +7,7 @@ actual output plan, the /path code generates all possible ways to join the
77tables, and /prep handles special cases like inheritance. /util is utility
88stuff. /geqo is the separate "genetic optimization" planner --- it does
99a semi-random search through the join tree space, rather than exhaustively
10- considering all possible join trees. (But each join considered by geqo
10+ considering all possible join trees. (But each join considered by/ geqo
1111is given to /path to create paths for, so we consider all possible
1212implementation paths for each specific join even in GEQO mode.)
1313
@@ -40,7 +40,7 @@ the WHERE clause "tab1.col1 = tab2.col1" generates a JoinInfo for tab1
4040listing tab2 as an unjoined relation, and also one for tab2 showing tab1
4141as an unjoined relation.
4242
43- If we have only a single base relation in the query, we are donehere .
43+ If we have only a single base relation in the query, we are donenow .
4444Otherwise we have to figure out how to join the base relations into a
4545single join relation.
4646
@@ -225,5 +225,185 @@ way, the next level up will have the maximum freedom to build mergejoins
225225without sorting, since it can pick from any of the paths retained for its
226226inputs.
227227
228- See path/pathkeys.c for an explanation of the PathKeys data structure that
229- represents what is known about the sort order of a particular Path.
228+
229+ PathKeys
230+ --------
231+
232+ The PathKeys data structure represents what is known about the sort order
233+ of a particular Path.
234+
235+ Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
236+ the sort order of the result generated by the Path. The n'th sublist
237+ represents the n'th sort key of the result.
238+
239+ In single/base relation RelOptInfo's, the Paths represent various ways
240+ of scanning the relation and the resulting ordering of the tuples.
241+ Sequential scan Paths have NIL pathkeys, indicating no known ordering.
242+ Index scans have Path.pathkeys that represent the chosen index's ordering,
243+ if any. A single-key index would create a pathkey with a single sublist,
244+ e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
245+ per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
246+ shows major sort by indexkey1 (ordering by sortop1) and minor sort by
247+ indexkey2 with sortop2.
248+
249+ Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
250+ we can say nothing about the overall order of its result. Also, an
251+ indexscan on an unordered type of index generates NIL pathkeys. However,
252+ we can always create a pathkey by doing an explicit sort. The pathkeys
253+ for a Sort plan's output just represent the sort key fields and the
254+ ordering operators used.
255+
256+ Things get more interesting when we consider joins. Suppose we do a
257+ mergejoin between A and B using the mergeclause A.X = B.Y. The output
258+ of the mergejoin is sorted by X --- but it is also sorted by Y. We
259+ represent this fact by listing both keys in a single pathkey sublist:
260+ ( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
261+ sort order of the Path can be taken to be *either* A.X or B.Y.
262+ They are equal, so they are both primary sort keys. By doing this,
263+ we allow future joins to use either var as a pre-sorted key, so upper
264+ Mergejoins may be able to avoid having to re-sort the Path. This is
265+ why pathkeys is a List of Lists.
266+
267+ We keep a sortop associated with each PathKeyItem because cross-data-type
268+ mergejoins are possible; for example int4 = int8 is mergejoinable.
269+ In this case we need to remember that the left var is ordered by int4lt
270+ while the right var is ordered by int8lt. So the different members of
271+ each sublist could have different sortops.
272+
273+ Note that while the order of the top list is meaningful (primary vs.
274+ secondary sort key), the order of each sublist is arbitrary. Each sublist
275+ should be regarded as a set of equivalent keys, with no significance
276+ to the list order.
277+
278+ With a little further thought, it becomes apparent that pathkeys for
279+ joins need not only come from mergejoins. For example, if we do a
280+ nestloop join between outer relation A and inner relation B, then any
281+ pathkeys relevant to A are still valid for the join result: we have
282+ not altered the order of the tuples from A. Even more interesting,
283+ if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
284+ and A.X was a pathkey for the outer relation A, then we can assert that
285+ B.Y is a pathkey for the join result; X was ordered before and still is,
286+ and the joined values of Y are equal to the joined values of X, so Y
287+ must now be ordered too. This is true even though we used neither an
288+ explicit sort nor a mergejoin on Y.
289+
290+ More generally, whenever we have an equijoin clause A.X = B.Y and a
291+ pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
292+ relation the pathkey is for, *no matter how we formed the join*. It works
293+ as long as the clause has been applied at some point while forming the
294+ join relation. (In the current implementation, we always apply qual
295+ clauses as soon as possible, ie, as far down in the plan tree as possible.
296+ So we can always make this deduction. If we postponed filtering by qual
297+ clauses then we'd not be able to assume pathkey equivalence until after
298+ the equality check(s) had been applied.)
299+
300+ In short, then: when producing the pathkeys for a merge or nestloop join,
301+ we can keep all of the keys of the outer path, since the ordering of the
302+ outer path will be preserved in the result. Furthermore, we can add to
303+ each pathkey sublist any inner vars that are equijoined to any of the
304+ outer vars in the sublist; this works regardless of whether we are
305+ implementing the join using that equijoin clause as a mergeclause,
306+ or merely enforcing the clause after-the-fact as a qpqual filter.
307+
308+ Although Hashjoins also work only with equijoin operators, it is *not*
309+ safe to consider the output of a Hashjoin to be sorted in any particular
310+ order --- not even the outer path's order. This is true because the
311+ executor might have to split the join into multiple batches. Therefore
312+ a Hashjoin is always given NIL pathkeys. (Also, we need to use only
313+ mergejoinable operators when deducing which inner vars are now sorted,
314+ because a mergejoin operator tells us which left- and right-datatype
315+ sortops can be considered equivalent, whereas a hashjoin operator
316+ doesn't imply anything about sort order.)
317+
318+ Pathkeys are also useful to represent an ordering that we wish to achieve,
319+ since they are easily compared to the pathkeys of a potential candidate
320+ path. So, SortClause lists are turned into pathkeys lists for use inside
321+ the optimizer.
322+
323+ OK, now for how it *really* works:
324+
325+ We did implement pathkeys just as described above, and found that the
326+ planner spent a huge amount of time comparing pathkeys, because the
327+ representation of pathkeys as unordered lists made it expensive to decide
328+ whether two were equal or not. So, we've modified the representation
329+ as described next.
330+
331+ If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
332+ during planner startup, we can construct lists of equivalent pathkey items
333+ for the query. There could be more than two items per equivalence set;
334+ for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
335+ equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
336+ Any pathkey item that belongs to an equivalence set implies that all the
337+ other items in its set apply to the relation too, or at least all the ones
338+ that are for fields present in the relation. (Some of the items in the
339+ set might be for as-yet-unjoined relations.) Furthermore, any multi-item
340+ pathkey sublist that appears at any stage of planning the query *must* be
341+ a subset of one or another of these equivalence sets; there's no way we'd
342+ have put two items in the same pathkey sublist unless they were equijoined
343+ in WHERE.
344+
345+ Now suppose that we allow a pathkey sublist to contain pathkey items for
346+ vars that are not yet part of the pathkey's relation. This introduces
347+ no logical difficulty, because such items can easily be seen to be
348+ irrelevant; we just mandate that they be ignored. But having allowed
349+ this, we can declare (by fiat) that any multiple-item pathkey sublist
350+ must be "equal()" to the appropriate equivalence set. In effect,
351+ whenever we make a pathkey sublist that mentions any var appearing in an
352+ equivalence set, we instantly add all the other vars equivalenced to it,
353+ whether they appear yet in the pathkey's relation or not. And we also
354+ mandate that the pathkey sublist appear in the same order as the
355+ equivalence set it comes from. (In practice, we simply return a pointer
356+ to the relevant equivalence set without building any new sublist at all.
357+ Each equivalence set becomes a "canonical pathkey" for all its members.)
358+ This makes comparing pathkeys very simple and fast, and saves a lot of
359+ work and memory space for pathkey construction as well.
360+
361+ Note that pathkey sublists having just one item still exist, and are
362+ not expected to be equal() to any equivalence set. This occurs when
363+ we describe a sort order that involves a var that's not mentioned in
364+ any equijoin clause of the WHERE. We could add singleton sets containing
365+ such vars to the query's list of equivalence sets, but there's little
366+ point in doing so.
367+
368+ By the way, it's OK and even useful for us to build equivalence sets
369+ that mention multiple vars from the same relation. For example, if
370+ we have WHERE A.X = A.Y and we are scanning A using an index on X,
371+ we can legitimately conclude that the path is sorted by Y as well;
372+ and this could be handy if Y is the variable used in other join clauses
373+ or ORDER BY. So, any WHERE clause with a mergejoinable operator can
374+ contribute to an equivalence set, even if it's not a join clause.
375+
376+ As sketched so far, equijoin operators allow us to conclude that
377+ A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
378+ datatypes are involved. What is not immediately obvious is that to use
379+ the "canonical pathkey" representation, we *must* make this deduction.
380+ An example (from a real bug in Postgres 7.0) is a mergejoin for a query
381+ like
382+ SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
383+ The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
384+ (ie, both appear in the same canonical pathkey set). If we sort t1
385+ and then apply a mergejoin, we *must* filter the t1 tuples using the
386+ implied qualification f1 = f2, because otherwise the output of the sort
387+ will be ordered by f1 or f2 (whichever we sort on) but not both. The
388+ merge will then fail since (depending on which qual clause it applies
389+ first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
390+ actual output of the sort has neither of these orderings. The best fix
391+ for this is to generate all the implied equality constraints for each
392+ equijoin set and add these clauses to the query's qualification list.
393+ In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
394+ clause. The constraint will be applied as a qpqual to the output of the
395+ scan on t1, resulting in sort output that is indeed ordered by both vars.
396+ This approach provides more information to the selectivity estimation
397+ code than it would otherwise have, and reduces the number of tuples
398+ processed in join stages, so it's a win to make these deductions even
399+ if we weren't forced to.
400+
401+ Yet another implication of all this is that mergejoinable operators
402+ must form closed equivalence sets. For example, if "int2 = int4"
403+ and "int4 = int8" are both marked mergejoinable, then there had better
404+ be a mergejoinable "int2 = int8" operator as well. Otherwise, when
405+ we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
406+ while trying to create a representation of the implied clause
407+ int2var = int8var.
408+
409+ -- bjm & tgl