
Business applications often need to represent and store information such as a company's organizational hierarchy, a country's administrative divisions, an inventory structure, or simply a set of folders and files. Despite their apparent differences, these examples share a common feature— they are all trees. Not the green, wood-made kind that provide pleasant shade on a sunny day, but, mathematically speaking, connected acyclic graphs. In computer science, the links between nodes are described in terms of parent-child relationships. Each node in the tree, except the root, has exactly one parent and zero, one, or many children.
The figure above shows an example of a tree. Vertex A is called the root of the tree. Nodes without subtrees — C, D, F, G, and H — are called leaves. Edges of the tree show parent-child relationships between nodes. A rooted tree is k-ary if each node has no more than k children. A binary tree is a well-known particular case of a k-ary tree where k = 2.
One of the common tasks performed on tree data is the retrieval of all child nodes at any nesting level, given a parent node. For example, calculating the total turnover of an account involves determining the turnovers of all its subaccounts, or compiling a list of goods belonging to a given goods group includes lists from all the nested subgroups.
In imperative programming languages, we usually use a graph traversal algorithm to find all subnodes. Although most modern database servers support SQL queries with recursive common table expressions, using the recursive part inside complex queries on large amounts of data can significantly degrade performance.
The reason is that SQL deals primarily with sets, not graphs, so one should implement dedicated data structures to represent nodes and edges as nested sets for efficient data selection.
In this article, we will consider four ways of storing tree-like structures inside a relational database and evaluate them in terms of query performance and the space needed for storing the underlying data.
Link to the parent
Perhaps the most intuitive and easiest way to represent tree-like data is to use the adjacency matrix in a relational table. Every record in the table corresponds to a node in the tree and holds the unique ID of the node (primary key) and the ID of its parent (for root nodes, the parent is null).
Below is a DDL query that creates thetest1
table:
CREATETABLEtest1(idINTEGERNOTNULL,parentINTEGER,nameVARCHAR(60)NOTNULL,PRIMARYKEY(id),FOREIGNKEY(parent)REFERENCEStest1(id)ONUPDATECASCADEONDELETECASCADE)
Pay attention to theON DELETE CASCADE
clause of theFOREIGN KEY
constraint declared on thePARENT
field. It guarantees that the entire subtree will be deleted at once if the parent node is deleted. To protect the subtree from accidental deletion, the rule should be changed fromCASCADE
toRESTRICT
.
Corresponding data for the tree shown at the beginning of the article looks like:
id | parent | name |
---|---|---|
1 | NULL | A |
2 | 1 | B |
3 | 1 | E |
4 | 1 | H |
5 | 2 | C |
6 | 2 | D |
7 | 3 | F |
8 | 3 | G |
Operations on tree data
Addition of a new tree node is done by theINSERT INTO
operator:
INSERTINTOtest1(id,parent)VALUES(<uniqueidofthenode>,<idoftheparentnodeorNULLforarootnode>)
To move the node from one parent to another, a value of thePARENT
field has to be changed:
UPDATEtest1SETparent=<idofthenewparent>WHEREid=<idofthenodebeingmoved>
Obviously, to delete the node, one should invokeDELETE FROM
operator providing ID of the node.
DELETEFROMtest1WHEREid=<idofthenode>
Selecting whole subtree of the given node
With such an organization of the data, a simpleSELECT
operator allows us to fetch only one level of nesting of node's descendants.
SELECTidFROMtest1WHEREparent=:P
So, what should we do if the task is to extract all subnodes, regardless of how deep they lie in the subtree?
An unconventional solution would be to write a query for every possible level of nesting and then merge their outputs using theUNION
operator. Fortunately, we can use either recursive stored procedures or recursive common table expressions to assist us.
The procedure below returns the IDs of all subnodes of a given node. The second parameter specifies whether the parent node will be included in the resulting set.
CREATEPROCEDURERecTest1(AnIDINTEGER,SelfINTEGER)RETURNS(IDINTEGER)ASBEGINIF(:Self<>0)THENBEGINID=:AnID;SUSPEND;ENDFORSELECTidFROMtest1WHEREparent=:AnIDINTO:IDDOFORSELECTidFROMRecTest1(:ID,1)INTO:IDDOSUSPEND;END;
We can use the procedure on its own orJOIN
its output with the tree data to select the required columns:
SELECTt.*FROMtest1tJOINRecTest1(:P,0)rONt.id=r.id
If a specific SQL server supports recursive common table expressions, we can write a query that selects all nested subnodes of the given node:
WITHRECURSIVEtreeAS(SELECTid,parentFROMtest1WHEREparent=<idofthegivennode>UNIONALLSELECTg.id,g.parentFROMtest1gJOINtreehONg.parent=h.id)SELECTgt.id,gt.parentFROMtreegt
Storing auxiliary links
In the previous example, we placed an id-parent link in the table with tree data. Since we only know about the relationship between a given node and its children, we need recursive queries to scan the tree in depth, which is not efficient when working with large amounts of data.
To overcome this limitation, an auxiliary table should be introduced. Such a table holds pairs of connected nodes along with the distance between them, where a distance of 1 indicates a parent-child relationship, and distances greater than 1 correspond to deeper levels of ancestry.
CREATETABLEtest2(idINTEGERNOTNULL,nameVARCHAR(60)NOTNULL,PRIMARYKEY(id));CREATETABLEtest2_link(id_fromINTEGERNOTNULL,id_toINTEGERNOTNULL,distanceINTEGERNOTNULL,PRIMARYKEY(id_from,id_to),FOREIGNKEY(id_from)REFERENCEStest2(id)ONUPDATECASCADEONDELETECASCADE,FOREIGNKEY(id_to)REFERENCEStest2(id)ONUPDATECASCADEONDELETECASCADE);
Here is data for the tree:
id_from | id_to | distance |
---|---|---|
B | A | 1 |
C | B | 1 |
D | B | 1 |
C | A | 2 |
D | A | 2 |
E | A | 1 |
F | E | 1 |
G | E | 1 |
F | A | 2 |
G | A | 2 |
H | A | 1 |
Let's get back to our business. Selection of the subtree now done much easier. No recursion required.
SELECTt.*FROMtest2tJOINtest2_linklONl.id_from=t.idWHEREl.id_to=:P
Usingdistance
field, we can arrange the resulting set by node's closeness to the given root node.
Manipulating Tree Data
While we have managed to avoid recursion when retrieving data, this has significantly complicated the processes of inserting and modifying records.
Adding a Node
To add a node, you need to perform two operations: insert into the tree nodes table and the links table.
For example, to add a child node with an identifier2
to a parent node with an identifier1
, execute the following commands:
INSERTINTOtest2(id)VALUES(2);INSERTINTOtest2_link(id_from,id_to,distance)VALUES(2,1,1);
Creating Triggers
To automate the maintenance of tree relationships, it's advisable to create two triggers.
The first trigger fires after inserting a record into thetest2
table and adds a self-referential link with a distance of0
intotest2_link
:
CREATETRIGGERtest2_aiFORtest2AFTERINSERTPOSITION32000ASBEGININSERTINTOtest2_link(id_from,id_to,distance)VALUES(NEW.id,NEW.id,0);END;
The second trigger fires after inserting a record into thetest2_link
table.
Its purpose is to create links between the new node and its descendants (if adding a subtree) and all ancestors of the parent node:
CREATETRIGGERtest2_link_aiFORtest2_linkAFTERINSERTPOSITION32000ASDECLAREVARIABLEAnIDINTEGER;DECLAREVARIABLEADistanceINTEGER;BEGINIF(NEW.distance=1)THENBEGININSERTINTOtest2_link(id_from,id_to,distance)SELECTdown.id_from,up.id_to,down.distance+up.distance+1FROMtest2_linkup,test2_linkdownWHEREup.id_from=NEW.id_toANDdown.id_to=NEW.id_fromANDdown.distance+up.distance>0;ENDEND;
Moving a Node
To move a node (or subtree) from one parent to another, follow these steps:
- Remove the link to the current parent, detaching the subtree:
DELETEFROMtest2_linkWHEREid_from=<IDofthenodeorsubtree>ANDid_to=<IDofthecurrentparent>ANDdistance=1;
- Add the subtree to the new parent:
INSERTINTOtest2_link(id_from,id_to,distance)VALUES(<IDofthenodeorsubtree>,<IDofthenewparent>,1);
The previously created insert trigger will handle the new relationships automatically.
Create a trigger that, upon deleting a link with a distance of1
, removes all links between the nodes of the subtree and the nodes along the path from the parent node to the tree root:
CREATETRIGGERtest2_link_bdFORtest2_linkBEFOREDELETEPOSITION32000ASDECLAREVARIABLEAnIDFromINTEGER;DECLAREVARIABLEAnIDToINTEGER;BEGINIF(OLD.distance=1)THENBEGINFORSELECTdown.id_from,up.id_toFROMtest2_linkdownJOINtest2_linkupONup.id_from=OLD.id_fromANDup.distance+down.distance>1ANDdown.id_to=OLD.id_fromINTO:AnIDFrom,:AnIDToDODELETEFROMtest2_linkWHEREid_from=:AnIDFromANDid_to=:AnIDTo;ENDEND;
Deleting a Node
Deleting a node doesn't require special handling, as foreign keys in thetest2_link
table are configured for cascading deletions.
When a node is deleted from thetest2
table, all related records intest2_link
are automatically removed:
DELETEFROMtest2WHEREid=<NodeID>;
Ensuring Data Integrity
To maintain logical data integrity and adherence to our model, create two additional triggers.
The first trigger prevents editing records in thetest2_link
table:
CREATEEXCEPTIONtree_e_incorrect_operation'Incorrect operation';CREATETRIGGERtest2_link_auFORtest2_linkBEFOREUPDATEPOSITION32000ASBEGINEXCEPTIONtree_e_incorrect_operation;END;
The second trigger deletes a node if a record with a distance of0
is removed from the links table:
CREATETRIGGERtest2_link_adFORtest2_linkAFTERDELETEPOSITION32000ASBEGINIF(OLD.distance=0)THENBEGINDELETEFROMtest2WHEREid=OLD.id_from;ENDEND;
These measures will help maintain data integrity and ensure the correct tree structure within the database.
Composite Path
It's not necessary to use a separate table to store the list of nodes leading to the root of a tree. Instead, you can concatenate the ordered list of node identifiers into a string, using a chosen delimiter, and store it directly within the node records table. For example:
CREATETABLEtest3(idINTEGERNOTNULL,parentINTEGER,pathVARCHAR(60)NOTNULL,PRIMARYKEY(id),FOREIGNKEY(parent)REFERENCEStest3(id)ONUPDATECASCADEONDELETECASCADE);
To optimize retrieval operations using theSTARTING WITH
operator, it's essential to create the following index to enhance performance:
CREATEINDEXtest3_x_pathONtest3(path);
If you choose a forward slash (/
) as the delimiter, thetest3
table for the tree depicted in the figure above would be populated as follows:
id | parent | path |
---|---|---|
A | NULL | / |
B | A | /A/ |
C | B | /A/B/ |
D | B | /A/B/ |
E | A | /A/ |
F | E | /A/E/ |
G | E | /A/E/ |
H | A | /A/ |
Retrieving Subtree Nodes
To retrieve all nodes in the subtree rooted at nodeP
, execute the following query:
SELECT*FROMtest3WHEREpathSTARTINGWITH(SELECTpathFROMtest3WHEREid=:P);
If you wish to exclude the root node itself from the results, modify the query accordingly:
SELECT*FROMtest3WHEREpathSTARTINGWITH(SELECTpathFROMtest3WHEREid=:P)ANDid<>:P;
Manipulating Tree Data
Similar to the simplest method of organizing hierarchical data discussed earlier, adding, moving, or deleting a node can be accomplished using standard SQL commands:INSERT
,UPDATE
, orDELETE
.
The automatic generation of the path string can be managed by implementing the following triggers:
CREATETRIGGERtest3_biFORtest3BEFOREINSERTPOSITION32000ASDECLAREVARIABLEPINTEGER;BEGINNEW.path='';P=NEW.parent;WHILE(NOT:PISNULL)DOBEGINNEW.path=CAST(:PASVARCHAR(60))||'/'||NEW.path;SELECTparentFROMtest3WHEREid=:PINTO:P;ENDNEW.path='/'||NEW.path;END;
CREATEEXCEPTIONtree_e_incorrect_operation2'Incorrect operation';CREATETRIGGERtest3_buFORtest3BEFOREUPDATEPOSITION32000ASDECLAREVARIABLEPINTEGER;DECLAREVARIABLETINTEGER;DECLAREVARIABLENPVARCHAR(60);DECLAREVARIABLEOPVARCHAR(60);BEGINIF((NEW.parent<>OLD.parent)OR(NEW.parentISNULLANDNOTOLD.parentISNULL)OR(NOTNEW.parentISNULLANDOLD.parentISNULL))THENBEGINNEW.path='';P=NEW.parent;WHILE(NOT:PISNULL)DOBEGINNEW.path=CAST(:PASVARCHAR(60))||'/'||NEW.path;SELECTparentFROMtest3WHEREid=:PINTO:P;ENDNEW.path='/'||NEW.path;/* Check for circular references */IF(POSITION(('/'||CAST(NEW.idASVARCHAR(60))||'/')INNEW.path)>0)THENEXCEPTIONtree_e_incorrect_operation2;NP=NEW.path||CAST(NEW.idASVARCHAR(60))||'/';OP=OLD.path||CAST(OLD.idASVARCHAR(60))||'/';T=LENGTH(OP)+1;UPDATEtest3SETpath=:NP||SUBSTRING(pathFROM:TFOR60)WHEREpathSTARTINGWITH:OP;ENDEND;
Storing the composite path from the tree root to the current node imposes a limitation on the maximum tree depth, approximately equal toL div (K + 1)
, whereL
is the length of the string field used to store the path, andK
is the average length of the string representation of the record key.
Go to thepart 2...
Top comments(0)
For further actions, you may consider blocking this person and/orreporting abuse
Read next

Creating a Synchronized Scrolling Two-Column Layout
Yujin -

Building Responsive Card Layouts with HTML and CSS Flexbox
Majeedat Abdulwahab -

Mock Smarter, Not Harder: Boost Your Frontend Efficiency
Mubarak -

🐹 Common Design Patterns In Golang Projects 🧩
Truong Phung -