24

Oracle SQL can do hierarchical queries since v2, using their proprietary CONNECT BY syntax. In their latest 11g release 2, they added recursive subquery factoring, also known as the recursive with clause. This is the ANSI standard, and if I understand correctly, this one has been implemented by other RDBMS vendors as well.

When comparing the connect-by with the recursive with, I noticed a difference in the result set when using cycle detection. The connect by results are more intuitive to me, so I'm wondering if Oracle's implementation contains a bug, or if this is standard ANSI and expected behaviour. Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others. Provided those databases support the recursive with clause of course.

Here is how it works on Oracle 11.2.0.1.0

SQL> select *  2    from t  3  /        ID  PARENT_ID---------- ----------         1          2         2          12 rows selected.

The query using CONNECT BY syntax:

SQL>  select id  2        , parent_id  3        , connect_by_iscycle  4     from t  5  connect by nocycle parent_id = prior id  6    start with id = 1  7  /        ID  PARENT_ID CONNECT_BY_ISCYCLE---------- ---------- ------------------         1          2                  0         2          1                  12 rows selected.

Which looks intuitive to me. However, using the new ANSI syntax it returns one more row:

SQL> with tr (id,parent_id) as  2  ( select id  3         , parent_id  4      from t  5     where id = 1  6     union all  7    select t.id  8         , t.parent_id  9      from t 10           join tr on t.parent_id = tr.id 11  ) cycle id set is_cycle to '1' default '0' 12  select id 13       , parent_id 14       , is_cycle 15    from tr 16  /        ID  PARENT_ID I---------- ---------- -         1          2 0         2          1 0         1          2 13 rows selected.

This is the script you can use to check:

create table t( id        number, parent_id number);insert into t values (1, 2);insert into t values (2, 1);commit;with tr (id,parent_id) as( select id       , parent_id    from t   where id = 1   union all  select t.id       , t.parent_id    from t         join tr on t.parent_id = tr.id) cycle id set is_cycle to '1' default '0'select id     , parent_id     , is_cycle  from tr;
askedNov 13, 2009 at 20:59
Rob van Wijk's user avatar
1
  • The only DBs I'm aware of that support theWITH clause are Oracle 9i+, SQL Server 2005+, and DB2 (dunno version). MySQL definitely does not support theWITH clause - the request has been in since 2006 :/CommentedNov 13, 2009 at 23:08

7 Answers7

14

From documentation onCONNECT_BY_ISCYCLE:

TheCONNECT_BY_ISCYCLE pseudocolumn returns1 if the current row has a child which is also its ancestor

and that onCYCLE:

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

In your example, row2 does have a child which is also its ancestor, but itsid has not been returned yet.

In other words,CONNECT_BY_ISCYCLE checks thechildren (which are yet to be returned), whileCYCLE checks thecurrent row (which is already returned).

CONNECT BY is row based, while recursiveCTE's are set-based.

Note that Oracle's documentation onCYCLE mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursiveCTE. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.

Since recursiveCTE's areusually used to build hierarchy trees,Oracle decided to add a cycle check. But due the set-based way the recursiveCTE's operate, it's generally impossible to tell will the next step generate a cycle or not, because without a clear definition of the "ancestor row" cycle condition cannot be defined either.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation.

It's not a problem if the current set always consists of a single row (like inCONNECT BY), but it is a problem if the recursive operation defined on a set as a whole.

Didn't look intoOracle 11 yet, butSQL Server implements recursiveCTE's by just hiding aCONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. It does not have any means to detect cycles, though, because cycles are not defined in the first place.

As was mentioned before,MySQL does not implementCTE's at all (it does not implementHASH JOIN's orMERGE JOINs as well, only the nested loops, so don't be surprised much).

Ironically, I received a letter today on this very subject, which I will cover in my blog.

Update:

RecursiveCTE's inSQL Server are no more thanCONNECT BY in disguise. See this article in my blog for shocking details:

answeredNov 18, 2009 at 18:02
Quassnoi's user avatar
Sign up to request clarification or add additional context in comments.

10 Comments

Good explanation, Quassnoi. Thanks. So the recursive with clause works as expected. It's just different than the connect-by. It's nice to be aware of these little nuances between the two.
This explanation is nice, but I believe it is incorrect. Specifically, what is not clear is the definition of "ancestor". If the definition was set-based as you suggest, then we should detect a cycle with the following data and recursive query. Yet if you look at the output,CYCLE is 0 in all rows. If your explanation was correct, one of the (1, 3) rows should haveCYCLE = 1 - indeed, the values in both columns are the same, and one row is generated in the anchor member while the other is from the recursive member. One is the ancestor of the other, if the definition wasset based.
with t(parent, child) as ( select 1, 2 from dual union all select 1, 3 from dual union all select 2, 3 from dual ), r (parent, child) as ( select parent, child from t where parent = 1 union all select r.parent, t.child from t join r on t.parent = r.child ) cycle parent, child set cycle to '1' default '0' select * from r;
@mathguy: I explicitly mentioned that yes, it's not clear, and no, I "didn't look into Oracle 11 yet" in the post. Oracle requires thatthe recursive member must follow the anchor member and must referencequery_name exactly once. I don't have an installation handy, but I believe the other limitations, which effectively make the recursive CTE row-based, do apply as well. If you define the cycle column explicitly like you do in your example, Oracle uses it; if you don't, it uses the subset of the anchor part columns used in the join to the recursive part.
@mathguy: PostgreSQL, as I said, does not have a built-in cycle detector, does not limit the recursive CTE complexity in any way and does not have any objections against running infinite recursions
|
6

PostgreSQL supports WITH-style hierarchical queries, but doesn't have any automatic cycle detection. This means that you need to write your own and the number of rows returned depends on the way you specify join conditions in the recursive part of the query.

Both examples use an array if IDs (called all_ids) to detect loops:

WITH recursive tr (id, parent_id, all_ids, cycle) AS (    SELECT id, parent_id, ARRAY[id], false    FROM t    WHERE id = 1    UNION ALL    SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)    FROM t    JOIN tr ON t.parent_id = tr.id AND NOT cycle)SELECT id, parent_id, cycleFROM tr; id | parent_id | cycle----+-----------+-------  1 |         2 | f  2 |         1 | f  1 |         2 | tWITH recursive tr (id, parent_id, all_ids, cycle) AS (    SELECT id, parent_id, ARRAY[id], false    FROM t    WHERE id = 1    UNION ALL    SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))    FROM t    JOIN tr ON t.parent_id = tr.id    WHERE NOT t.id = ANY(all_ids))SELECT id, parent_id, cycleFROM tr; id | parent_id | cycle----+-----------+-------  1 |         2 | f  2 |         1 | t
answeredNov 14, 2009 at 1:06
Aleksander Kmetec's user avatar

2 Comments

Very interesting queries. Oracle doesn't support that array syntax, but it's nice to see that the results are the same as in Oracle's ANSI syntax. It leads me to think that the ANSI query maybe doesn't contain a bug after all. Thanks, Aleksander.
Late to the party, but nowadays PostgreSQL also allows you to useUNION instead ofUNION ALL to remove cycles automatically.
3

AFAIK:

  • MySQL doesn't support recursive CTE's
  • SQL Sever does not support cycledetection in recursive CTE's
answeredNov 13, 2009 at 23:03
Andomar's user avatar

3 Comments

MySQL doesn't support CTEs at all.
That's clear, thanks. So MySQL and SQL Server cannot be used to test this script.
Indeed, MS SQL Server has a maximum recursion limit, something like 100 by default.
3

Results of the connect by may not always be intuitive.

Below queries demonstrate different approaches to detect cycles starting withid = 3 for the graph on the picture.

create table graph (id, id_parent) as(select 2, 1 from dualunion all select 3, 1 from dualunion all select 4, 3 from dualunion all select 5, 4 from dualunion all select 3, 5 from dual)

enter image description here

SQL> select level lvl, graph.*, connect_by_iscycle cycle  2    from graph  3   start with id = 3  4  connect by nocycle prior id = id_parent;       LVL         ID  ID_PARENT      CYCLE---------- ---------- ---------- ----------         1          3          1          0         2          4          3          0         3          5          4          1         1          3          5          0         2          4          3          0         3          5          4          16 rows selected.SQL> select level lvl, graph.*, connect_by_iscycle cycle  2    from graph  3   start with id = 3  4  connect by nocycle prior id = id_parent  5         and prior id_parent is not null;       LVL         ID  ID_PARENT      CYCLE---------- ---------- ---------- ----------         1          3          1          0         2          4          3          0         3          5          4          0         4          3          5          1         1          3          5          0         2          4          3          0         3          5          4          17 rows selected.SQL> with t(id, id_parent) as  2   (select *  3      from graph  4     where id = 3  5    union all  6    select g.id, g.id_parent  7      from t  8      join graph g  9        on t.id = g.id_parent) 10  search depth first by id set ord 11  cycle id set cycle to 1 default 0 12  select * from t;        ID  ID_PARENT        ORD C---------- ---------- ---------- -         3          1          1 0         4          3          2 0         5          4          3 0         3          5          4 1         3          5          5 0         4          3          6 0         5          4          7 0         3          5          8 18 rows selected.

Node withid = 3 has two parents so Oracle traverses two cycles in this example.

(1, 3) -> (3, 4) -> (4, 5) -> (5, 3)

and

(5, 3) -> (3, 4) -> (4, 5)

Edge (5, 3) is missing from the result of the first query and first cycle.At the same time edge (5, 3) appears in the result for the third query and second cycle twice.

Why so? You can check description of the cycle detection logic in the answer provided by Quassnoi. In plain English it means that

(1) connect by detects a cycle ifchild ID for current row is part of IDs visited so far

(2) rec with detects a cycle ifID for current row is part of IDs visited so far

Result of the second query looks the most natural although there is additional predicateand prior id_parent is not null. In this case

(3) it detects a cycle ifID for current row is part ofparent IDs visited so far

All this conditions are implemented in columns cnt1, cnt2, cnt3 in below query.

SQL> with t(id, id_parent, path_id, path_id_parent, cnt1, cnt2, cnt3) as  2   (select g.*,  3           cast('->' || g.id as varchar2(4000)),  4           cast('->' || g.id_parent as varchar2(4000)),  5           0,  6           0,  7           0  8      from graph g  9     where id = 3 10    union all 11    select g.id, 12           g.id_parent, 13           t.path_id || '->' || g.id, 14           t.path_id_parent || '->' || g.id_parent, 15           regexp_count(t.path_id || '->', '->' || 16            (select id from graph c where c.id_parent = g.id) || '->'), 17           regexp_count(t.path_id || '->', '->' || g.id || '->'), 18           regexp_count(t.path_id_parent || '->', '->' || g.id || '->') 19      from t 20      join graph g 21        on t.id = g.id_parent 22    -- and t.cnt1 = 0 23    -- and t.cnt2 = 0 24    -- and t.cnt3 = 0 25    ) 26  search depth first by id set ord 27  cycle id set cycle to 1 default 0 28  select * from t;        ID  ID_PARENT PATH_ID         PATH_ID_PARENT  CNT1 CNT2 CNT3        ORD C---------- ---------- --------------- --------------- ---- ---- ---- ---------- -         3          1 ->3             ->1                0    0    0          1 0         4          3 ->3->4          ->1->3             0    0    0          2 0         5          4 ->3->4->5       ->1->3->4          1    0    0          3 0         3          5 ->3->4->5->3    ->1->3->4->5       1    1    1          4 1         3          5 ->3             ->5                0    0    0          5 0         4          3 ->3->4          ->5->3             0    0    0          6 0         5          4 ->3->4->5       ->5->3->4          1    0    1          7 0         3          5 ->3->4->5->3    ->5->3->4->5       1    1    1          8 18 rows selected.

If you uncomment filter by cnt1/cnt2/cnt3 and remove "cycle id set cycle to 1 default 0" then query will return result as corresponding query above. In other wordsyou can avoidcycle clause and implement whatever cycle detection logic you find more intuitive.

Additional details about traversing hierarchies and cycle detection can be found in the bookOracle SQL Revealed.

answeredNov 20, 2018 at 22:25
Dr Y Wit's user avatar

Comments

1

MySQL Server version 5.0.45 didn't likewith:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'with tr (id, parent_id) as (select id, parent_id from t where id = 1 union all s' at line 1.

Amarnath Balasubramanian's user avatar
Amarnath Balasubramanian
9,5028 gold badges38 silver badges62 bronze badges
answeredNov 13, 2009 at 21:08
wallyk's user avatar

2 Comments

Thanks for trying, wallyk. But does MySQL support something similar and this just isn't the right syntax, or doesn't MySQL doesn't support recursive queries at all?
I don't think it does. It was only a few years ago that it did not support stored procedures, so it has evolved rather quickly. I wouldn't be surprised if such things are in the latest version, or a upcoming version.
0
WITH RECURSIVE s (master, slave, all_ids, cycle) AS(    SELECT master, slave, ARRAY[master], false FROM binding WHERE master=3477    UNION ALL    SELECT d.master, d.slave, all_ids || d.master, d.slave = ANY(all_ids)    FROM        binding AS d    JOIN        s    ON (d.master = s.slave)    WHERE NOT d.master = ANY(all_ids))SELECT *FROM s;

I think better is this conditiond.slave = ANY(all_ids)

Fractaliste's user avatar
Fractaliste
5,99511 gold badges47 silver badges88 bronze badges
answeredJan 26, 2014 at 14:50
user3237729's user avatar

Comments

0

"Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others"

MariaDB 10.5.2 and newer support cycle detection:

WITH

The CYCLE clause enables CTE cycle detection, avoiding excessive or infinite loops, MariaDB supports a relaxed, non-standard grammar.

WITH RECURSIVE ... ( ...)CYCLE <cycle column list> RESTRICT

Example:

CREATE TABLE t(id INT, parent_id INT);INSERT INTO t(id, parent_id) VALUES (1, NULL),(2,1),(3,2),(1,3);WITH RECURSIVE cte AS (  SELECT id, parent_id, 0 lvl   FROM t WHERE parent_id IS NULL  UNION ALL  SELECT t.id, t.parent_id, lvl + 1 AS lvl  FROM cte c1  JOIN t ON c1.id = t.parent_id)CYCLE id, parent_id RESTRICT SELECT * FROM cte ORDER BY lvl;

db<>fiddle demo

answeredJun 28, 2020 at 12:20
Lukasz Szozda's user avatar

Comments

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.