Movatterモバイル変換


[0]ホーム

URL:


Skip to content
DEV Community
Log in Create account

DEV Community

Cover image for Trees in SQL

Business applications often need to represent and store information such as a company's organizational hierarchy, a country's administrative divisions, an inventory structure, or simply a set of folders and files. Despite their apparent differences, these examples share a common feature— they are all trees. Not the green, wood-made kind that provide pleasant shade on a sunny day, but, mathematically speaking, connected acyclic graphs. In computer science, the links between nodes are described in terms of parent-child relationships. Each node in the tree, except the root, has exactly one parent and zero, one, or many children.

Tree example

The figure above shows an example of a tree. Vertex A is called the root of the tree. Nodes without subtrees — C, D, F, G, and H — are called leaves. Edges of the tree show parent-child relationships between nodes. A rooted tree is k-ary if each node has no more than k children. A binary tree is a well-known particular case of a k-ary tree where k = 2.

One of the common tasks performed on tree data is the retrieval of all child nodes at any nesting level, given a parent node. For example, calculating the total turnover of an account involves determining the turnovers of all its subaccounts, or compiling a list of goods belonging to a given goods group includes lists from all the nested subgroups.

In imperative programming languages, we usually use a graph traversal algorithm to find all subnodes. Although most modern database servers support SQL queries with recursive common table expressions, using the recursive part inside complex queries on large amounts of data can significantly degrade performance.

The reason is that SQL deals primarily with sets, not graphs, so one should implement dedicated data structures to represent nodes and edges as nested sets for efficient data selection.

In this article, we will consider four ways of storing tree-like structures inside a relational database and evaluate them in terms of query performance and the space needed for storing the underlying data.

Link to the parent

Perhaps the most intuitive and easiest way to represent tree-like data is to use the adjacency matrix in a relational table. Every record in the table corresponds to a node in the tree and holds the unique ID of the node (primary key) and the ID of its parent (for root nodes, the parent is null).

Below is a DDL query that creates thetest1 table:

CREATETABLEtest1(idINTEGERNOTNULL,parentINTEGER,nameVARCHAR(60)NOTNULL,PRIMARYKEY(id),FOREIGNKEY(parent)REFERENCEStest1(id)ONUPDATECASCADEONDELETECASCADE)
Enter fullscreen modeExit fullscreen mode

Pay attention to theON DELETE CASCADE clause of theFOREIGN KEY constraint declared on thePARENT field. It guarantees that the entire subtree will be deleted at once if the parent node is deleted. To protect the subtree from accidental deletion, the rule should be changed fromCASCADE toRESTRICT.

Corresponding data for the tree shown at the beginning of the article looks like:

idparentname
1NULLA
21B
31E
41H
52C
62D
73F
83G

Operations on tree data

Addition of a new tree node is done by theINSERT INTO operator:

INSERTINTOtest1(id,parent)VALUES(<uniqueidofthenode>,<idoftheparentnodeorNULLforarootnode>)
Enter fullscreen modeExit fullscreen mode

To move the node from one parent to another, a value of thePARENT field has to be changed:

UPDATEtest1SETparent=<idofthenewparent>WHEREid=<idofthenodebeingmoved>
Enter fullscreen modeExit fullscreen mode

Obviously, to delete the node, one should invokeDELETE FROM operator providing ID of the node.

DELETEFROMtest1WHEREid=<idofthenode>
Enter fullscreen modeExit fullscreen mode

Selecting whole subtree of the given node

With such an organization of the data, a simpleSELECT operator allows us to fetch only one level of nesting of node's descendants.

SELECTidFROMtest1WHEREparent=:P
Enter fullscreen modeExit fullscreen mode

So, what should we do if the task is to extract all subnodes, regardless of how deep they lie in the subtree?

An unconventional solution would be to write a query for every possible level of nesting and then merge their outputs using theUNION operator. Fortunately, we can use either recursive stored procedures or recursive common table expressions to assist us.

The procedure below returns the IDs of all subnodes of a given node. The second parameter specifies whether the parent node will be included in the resulting set.

CREATEPROCEDURERecTest1(AnIDINTEGER,SelfINTEGER)RETURNS(IDINTEGER)ASBEGINIF(:Self<>0)THENBEGINID=:AnID;SUSPEND;ENDFORSELECTidFROMtest1WHEREparent=:AnIDINTO:IDDOFORSELECTidFROMRecTest1(:ID,1)INTO:IDDOSUSPEND;END;
Enter fullscreen modeExit fullscreen mode

We can use the procedure on its own orJOIN its output with the tree data to select the required columns:

SELECTt.*FROMtest1tJOINRecTest1(:P,0)rONt.id=r.id
Enter fullscreen modeExit fullscreen mode

If a specific SQL server supports recursive common table expressions, we can write a query that selects all nested subnodes of the given node:

WITHRECURSIVEtreeAS(SELECTid,parentFROMtest1WHEREparent=<idofthegivennode>UNIONALLSELECTg.id,g.parentFROMtest1gJOINtreehONg.parent=h.id)SELECTgt.id,gt.parentFROMtreegt
Enter fullscreen modeExit fullscreen mode

Storing auxiliary links

In the previous example, we placed an id-parent link in the table with tree data. Since we only know about the relationship between a given node and its children, we need recursive queries to scan the tree in depth, which is not efficient when working with large amounts of data.

To overcome this limitation, an auxiliary table should be introduced. Such a table holds pairs of connected nodes along with the distance between them, where a distance of 1 indicates a parent-child relationship, and distances greater than 1 correspond to deeper levels of ancestry.

Image description

CREATETABLEtest2(idINTEGERNOTNULL,nameVARCHAR(60)NOTNULL,PRIMARYKEY(id));CREATETABLEtest2_link(id_fromINTEGERNOTNULL,id_toINTEGERNOTNULL,distanceINTEGERNOTNULL,PRIMARYKEY(id_from,id_to),FOREIGNKEY(id_from)REFERENCEStest2(id)ONUPDATECASCADEONDELETECASCADE,FOREIGNKEY(id_to)REFERENCEStest2(id)ONUPDATECASCADEONDELETECASCADE);
Enter fullscreen modeExit fullscreen mode

Here is data for the tree:

id_fromid_todistance
BA1
CB1
DB1
CA2
DA2
EA1
FE1
GE1
FA2
GA2
HA1

Let's get back to our business. Selection of the subtree now done much easier. No recursion required.

SELECTt.*FROMtest2tJOINtest2_linklONl.id_from=t.idWHEREl.id_to=:P
Enter fullscreen modeExit fullscreen mode

Usingdistance field, we can arrange the resulting set by node's closeness to the given root node.

Manipulating Tree Data

While we have managed to avoid recursion when retrieving data, this has significantly complicated the processes of inserting and modifying records.

Adding a Node

To add a node, you need to perform two operations: insert into the tree nodes table and the links table.

For example, to add a child node with an identifier2 to a parent node with an identifier1, execute the following commands:

INSERTINTOtest2(id)VALUES(2);INSERTINTOtest2_link(id_from,id_to,distance)VALUES(2,1,1);
Enter fullscreen modeExit fullscreen mode

Creating Triggers

To automate the maintenance of tree relationships, it's advisable to create two triggers.

The first trigger fires after inserting a record into thetest2 table and adds a self-referential link with a distance of0 intotest2_link:

CREATETRIGGERtest2_aiFORtest2AFTERINSERTPOSITION32000ASBEGININSERTINTOtest2_link(id_from,id_to,distance)VALUES(NEW.id,NEW.id,0);END;
Enter fullscreen modeExit fullscreen mode

The second trigger fires after inserting a record into thetest2_link table.

Its purpose is to create links between the new node and its descendants (if adding a subtree) and all ancestors of the parent node:

CREATETRIGGERtest2_link_aiFORtest2_linkAFTERINSERTPOSITION32000ASDECLAREVARIABLEAnIDINTEGER;DECLAREVARIABLEADistanceINTEGER;BEGINIF(NEW.distance=1)THENBEGININSERTINTOtest2_link(id_from,id_to,distance)SELECTdown.id_from,up.id_to,down.distance+up.distance+1FROMtest2_linkup,test2_linkdownWHEREup.id_from=NEW.id_toANDdown.id_to=NEW.id_fromANDdown.distance+up.distance>0;ENDEND;
Enter fullscreen modeExit fullscreen mode

Moving a Node

To move a node (or subtree) from one parent to another, follow these steps:

  1. Remove the link to the current parent, detaching the subtree:
DELETEFROMtest2_linkWHEREid_from=<IDofthenodeorsubtree>ANDid_to=<IDofthecurrentparent>ANDdistance=1;
Enter fullscreen modeExit fullscreen mode
  1. Add the subtree to the new parent:
INSERTINTOtest2_link(id_from,id_to,distance)VALUES(<IDofthenodeorsubtree>,<IDofthenewparent>,1);
Enter fullscreen modeExit fullscreen mode

The previously created insert trigger will handle the new relationships automatically.

Create a trigger that, upon deleting a link with a distance of1, removes all links between the nodes of the subtree and the nodes along the path from the parent node to the tree root:

CREATETRIGGERtest2_link_bdFORtest2_linkBEFOREDELETEPOSITION32000ASDECLAREVARIABLEAnIDFromINTEGER;DECLAREVARIABLEAnIDToINTEGER;BEGINIF(OLD.distance=1)THENBEGINFORSELECTdown.id_from,up.id_toFROMtest2_linkdownJOINtest2_linkupONup.id_from=OLD.id_fromANDup.distance+down.distance>1ANDdown.id_to=OLD.id_fromINTO:AnIDFrom,:AnIDToDODELETEFROMtest2_linkWHEREid_from=:AnIDFromANDid_to=:AnIDTo;ENDEND;
Enter fullscreen modeExit fullscreen mode

Deleting a Node

Deleting a node doesn't require special handling, as foreign keys in thetest2_link table are configured for cascading deletions.

When a node is deleted from thetest2 table, all related records intest2_link are automatically removed:

DELETEFROMtest2WHEREid=<NodeID>;
Enter fullscreen modeExit fullscreen mode

Ensuring Data Integrity

To maintain logical data integrity and adherence to our model, create two additional triggers.

The first trigger prevents editing records in thetest2_link table:

CREATEEXCEPTIONtree_e_incorrect_operation'Incorrect operation';CREATETRIGGERtest2_link_auFORtest2_linkBEFOREUPDATEPOSITION32000ASBEGINEXCEPTIONtree_e_incorrect_operation;END;
Enter fullscreen modeExit fullscreen mode

The second trigger deletes a node if a record with a distance of0 is removed from the links table:

CREATETRIGGERtest2_link_adFORtest2_linkAFTERDELETEPOSITION32000ASBEGINIF(OLD.distance=0)THENBEGINDELETEFROMtest2WHEREid=OLD.id_from;ENDEND;
Enter fullscreen modeExit fullscreen mode

These measures will help maintain data integrity and ensure the correct tree structure within the database.

Composite Path

It's not necessary to use a separate table to store the list of nodes leading to the root of a tree. Instead, you can concatenate the ordered list of node identifiers into a string, using a chosen delimiter, and store it directly within the node records table. For example:

CREATETABLEtest3(idINTEGERNOTNULL,parentINTEGER,pathVARCHAR(60)NOTNULL,PRIMARYKEY(id),FOREIGNKEY(parent)REFERENCEStest3(id)ONUPDATECASCADEONDELETECASCADE);
Enter fullscreen modeExit fullscreen mode

To optimize retrieval operations using theSTARTING WITH operator, it's essential to create the following index to enhance performance:

CREATEINDEXtest3_x_pathONtest3(path);
Enter fullscreen modeExit fullscreen mode

Image description

If you choose a forward slash (/) as the delimiter, thetest3 table for the tree depicted in the figure above would be populated as follows:

idparentpath
ANULL/
BA/A/
CB/A/B/
DB/A/B/
EA/A/
FE/A/E/
GE/A/E/
HA/A/

Retrieving Subtree Nodes

To retrieve all nodes in the subtree rooted at nodeP, execute the following query:

SELECT*FROMtest3WHEREpathSTARTINGWITH(SELECTpathFROMtest3WHEREid=:P);
Enter fullscreen modeExit fullscreen mode

If you wish to exclude the root node itself from the results, modify the query accordingly:

SELECT*FROMtest3WHEREpathSTARTINGWITH(SELECTpathFROMtest3WHEREid=:P)ANDid<>:P;
Enter fullscreen modeExit fullscreen mode

Manipulating Tree Data

Similar to the simplest method of organizing hierarchical data discussed earlier, adding, moving, or deleting a node can be accomplished using standard SQL commands:INSERT,UPDATE, orDELETE.

The automatic generation of the path string can be managed by implementing the following triggers:

CREATETRIGGERtest3_biFORtest3BEFOREINSERTPOSITION32000ASDECLAREVARIABLEPINTEGER;BEGINNEW.path='';P=NEW.parent;WHILE(NOT:PISNULL)DOBEGINNEW.path=CAST(:PASVARCHAR(60))||'/'||NEW.path;SELECTparentFROMtest3WHEREid=:PINTO:P;ENDNEW.path='/'||NEW.path;END;
Enter fullscreen modeExit fullscreen mode
CREATEEXCEPTIONtree_e_incorrect_operation2'Incorrect operation';CREATETRIGGERtest3_buFORtest3BEFOREUPDATEPOSITION32000ASDECLAREVARIABLEPINTEGER;DECLAREVARIABLETINTEGER;DECLAREVARIABLENPVARCHAR(60);DECLAREVARIABLEOPVARCHAR(60);BEGINIF((NEW.parent<>OLD.parent)OR(NEW.parentISNULLANDNOTOLD.parentISNULL)OR(NOTNEW.parentISNULLANDOLD.parentISNULL))THENBEGINNEW.path='';P=NEW.parent;WHILE(NOT:PISNULL)DOBEGINNEW.path=CAST(:PASVARCHAR(60))||'/'||NEW.path;SELECTparentFROMtest3WHEREid=:PINTO:P;ENDNEW.path='/'||NEW.path;/* Check for circular references */IF(POSITION(('/'||CAST(NEW.idASVARCHAR(60))||'/')INNEW.path)>0)THENEXCEPTIONtree_e_incorrect_operation2;NP=NEW.path||CAST(NEW.idASVARCHAR(60))||'/';OP=OLD.path||CAST(OLD.idASVARCHAR(60))||'/';T=LENGTH(OP)+1;UPDATEtest3SETpath=:NP||SUBSTRING(pathFROM:TFOR60)WHEREpathSTARTINGWITH:OP;ENDEND;
Enter fullscreen modeExit fullscreen mode

Storing the composite path from the tree root to the current node imposes a limitation on the maximum tree depth, approximately equal toL div (K + 1), whereL is the length of the string field used to store the path, andK is the average length of the string representation of the record key.

Go to thepart 2...

Top comments(0)

Subscribe
pic
Create template

Templates let you quickly answer FAQs or store snippets for re-use.

Dismiss

Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment'spermalink.

For further actions, you may consider blocking this person and/orreporting abuse

Read next

yujin profile image

Creating a Synchronized Scrolling Two-Column Layout

Yujin -

majeedatwahab profile image

Building Responsive Card Layouts with HTML and CSS Flexbox

Majeedat Abdulwahab -

mayumi_okamoto_6ce103fea7 profile image

Mock Smarter, Not Harder: Boost Your Frontend Efficiency

Mubarak -

truongpx396 profile image

🐹 Common Design Patterns In Golang Projects 🧩

Truong Phung -

Solution and software architect. Hardcore developer. Mentor. Project manager. CEO and co-founder.
  • Location
    Poland
  • Education
    Physics
  • Work
    GS
  • Joined

More fromAndrej Kirejeŭ

DEV Community

We're a place where coders share, stay up-to-date and grow their careers.

Log in Create account

[8]ページ先頭

©2009-2025 Movatter.jp