Anatomy of the DOM
The DOM represents an XML or HTML document as a tree. This page introduces the basic structure of the DOM tree and the various properties and methods used to navigate it.
To begin with, we need to introduce some concepts related to trees. A tree is a data structure made up of nodes. Each node holds some data. The nodes are organized hierarchically: every node has a single parent node (except for the root node, which has no parent) and an ordered list of zero or more child nodes. Now we can define the following:
- A node with no parent is called the root of the tree.
- A node with no children is called a leaf.
- Nodes that share the same parent are called siblings. Siblings belong to the same child node list of their parent, so they have a well-defined order.
- If we can go from node A to node B by repeatedly following parent links, then A is a descendant of B, and B is an ancestor of A.
- Nodes in a tree are listed in tree order by first listing the node itself, then recursively listing each of its child nodes in order (preorder, depth-first traversal).
And here are a few important properties of trees:
- Every node is associated with a unique root node.
- If node A is the parent of node B, then node B is a child of node A.
- Cycles are not allowed: no node can be an ancestor or descendant of itself.
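The definitions above can be sketched with plain JavaScript objects. This is only an illustration under stated assumptions: `makeNode`, `treeOrder`, and `rootOf` are hypothetical helpers, and the objects are not real DOM nodes.

```js
// Hypothetical helper: build a plain-object node (not a real DOM Node).
function makeNode(data, children = []) {
  const node = { data, parent: null, children };
  for (const child of children) child.parent = node;
  return node;
}

// Yield the node itself, then each child's subtree in order
// (preorder, depth-first traversal).
function* treeOrder(node) {
  yield node;
  for (const child of node.children) yield* treeOrder(child);
}

// Follow parent links upward; the node without a parent is the root.
function rootOf(node) {
  while (node.parent !== null) node = node.parent;
  return node;
}

const d = makeNode("D");
const tree = makeNode("A", [makeNode("B", [d, makeNode("E")]), makeNode("C")]);

console.log([...treeOrder(tree)].map((n) => n.data).join(" ")); // A B D E C
console.log(rootOf(d).data); // A
```

Here `D` and `E` are siblings and leaves, `A` is the root, and `A` and `B` are the ancestors of `D`.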
The Node interface and its subclasses
All nodes in the DOM are represented by objects that implement the `Node` interface. The `Node` interface embodies many of the previously defined concepts:
- The `parentNode` property returns the parent node, or `null` if the node has no parent.
- The `childNodes` property returns a `NodeList` of the child nodes. The `firstChild` and `lastChild` properties return the first and last elements of this list, respectively, or `null` if there are no children.
- The `getRootNode()` method returns the root of the tree that contains the node, by repeatedly following parent links.
- The `hasChildNodes()` method returns `true` if the node has any child nodes, i.e., it is not a leaf.
- The `previousSibling` and `nextSibling` properties return the previous and next sibling nodes, respectively, or `null` if there is no such sibling.
- The `contains()` method returns `true` if a given node is a descendant of the node (or is the node itself).
- The `compareDocumentPosition()` method compares two nodes by tree order. The Comparing nodes section discusses this method in more detail.
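As a rough illustration of how these members relate, the following sketch derives the ancestor list and a `contains()`-like check purely from `parentNode`. The helper names `ancestorsOf` and `containsNode` are hypothetical, and plain objects stand in for DOM nodes so the code can run outside a browser.

```js
// Collect ancestors by repeatedly following parentNode, nearest first.
function ancestorsOf(node) {
  const result = [];
  for (let p = node.parentNode; p !== null; p = p.parentNode) result.push(p);
  return result;
}

// Rough equivalent of ancestor.contains(node) for non-null nodes:
// true for the node itself or any of its descendants.
function containsNode(ancestor, node) {
  return node === ancestor || ancestorsOf(node).includes(ancestor);
}

// Mock nodes (assumption): only the properties used above are provided.
const html = { nodeName: "HTML", parentNode: null };
const body = { nodeName: "BODY", parentNode: html };
const para = { nodeName: "P", parentNode: body };

console.log(ancestorsOf(para).map((n) => n.nodeName)); // [ 'BODY', 'HTML' ]
console.log(containsNode(html, para)); // true
console.log(containsNode(para, html)); // false
```

In a browser, the built-in `contains()` and `getRootNode()` methods are preferable; the sketch only shows that both follow from parent links.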
You rarely work with plain `Node` objects. Instead, all objects in the DOM implement one of the interfaces that inherit from `Node`, which represent additional semantics in the document. The node type restricts what data a node contains and which child types are valid. Consider how the following HTML document is represented in the DOM:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>This is a paragraph.</p>
  </body>
</html>
```

The rest of this section walks through the DOM tree that this document produces.
The root of this DOM tree is a `Document` node, which represents the entire document. This node is exposed globally as the `document` variable. This node has two important child nodes:
- An optional `DocumentType` node representing the doctype declaration. In our case, there is one. This node is also accessible via the `doctype` property of the `Document` node.
- An optional `Element` node representing the root element. For HTML documents (such as our case), this is typically the `HTMLHtmlElement`. For SVG documents, this is typically the `SVGSVGElement`. This node is also accessible via the `documentElement` property of the `Document` node.
The `DocumentType` node is always a leaf node. The `Element` node is where most of the document content is represented. Each element under it, such as `<head>`, `<body>`, and `<p>`, is also represented by an `Element` node. In fact, each is a subclass of `Element` specific to that tag name, defined in the HTML specification, such as `HTMLHeadElement` and `HTMLBodyElement`, with additional properties and methods to represent the semantics of that element, but here we focus on the common behaviors of the DOM.

`Element` nodes can have other `Element` nodes as children, representing nested elements. For example, the `<head>` element has three children: two `<meta>` elements and a `<title>` element. Additionally, elements can also have `Text` nodes and `CDATASection` nodes as children, representing text content. For example, the `<p>` element has a single child, a `Text` node containing the string "This is a paragraph.". `Text` nodes and `CDATASection` nodes are always leaf nodes.

All nodes that can have children (`Document`, `DocumentFragment`, and `Element`) also allow two other types of children: `Comment` and `ProcessingInstruction` nodes. These nodes are always leaf nodes.

Each element, in addition to having child nodes, can also have attributes, represented as `Attr` nodes. `Attr` extends the `Node` interface, but `Attr` nodes are not part of the main tree structure, because they are not the child of any node and their parent node is `null`. Instead, they are stored in a separate named node map, accessible via the `attributes` property of the `Element` node.

The `Node` interface defines a `nodeType` property that indicates the type of the node. To summarize, we introduced the following node types:
| Node type | `nodeType` value | Valid children (besides `Comment` and `ProcessingInstruction`) |
|---|---|---|
| `Document` | `Node.DOCUMENT_NODE` (9) | `DocumentType`, `Element` |
| `DocumentType` | `Node.DOCUMENT_TYPE_NODE` (10) | None |
| `Element` | `Node.ELEMENT_NODE` (1) | `Element`, `Text`, `CDATASection` |
| `Text` | `Node.TEXT_NODE` (3) | None |
| `CDATASection` | `Node.CDATA_SECTION_NODE` (4) | None |
| `Comment` | `Node.COMMENT_NODE` (8) | None |
| `ProcessingInstruction` | `Node.PROCESSING_INSTRUCTION_NODE` (7) | None |
| `Attr` | `Node.ATTRIBUTE_NODE` (2) | None |

Note: You may notice we skipped some node types here. The `Node.ENTITY_REFERENCE_NODE` (5), `Node.ENTITY_NODE` (6), and `Node.NOTATION_NODE` (12) values are no longer used, while the `Node.DOCUMENT_FRAGMENT_NODE` (11) value will be introduced in Building and updating the DOM tree.
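One common use of `nodeType` is to branch on the kind of node. The sketch below maps the numeric values from the table to interface names; `nodeTypeName` and `canHaveChildren` are hypothetical helpers, and the numeric values are written out so the code also runs where the `Node` global is unavailable.

```js
// Numeric nodeType values from the table above, keyed to interface names.
const NODE_TYPE_NAMES = {
  1: "Element",
  2: "Attr",
  3: "Text",
  4: "CDATASection",
  7: "ProcessingInstruction",
  8: "Comment",
  9: "Document",
  10: "DocumentType",
  11: "DocumentFragment",
};

// Works with anything that exposes a numeric nodeType property.
function nodeTypeName(node) {
  return NODE_TYPE_NAMES[node.nodeType] ?? "Unknown";
}

// Only Element (1), Document (9), and DocumentFragment (11) may have children.
function canHaveChildren(node) {
  return [1, 9, 11].includes(node.nodeType);
}

console.log(nodeTypeName({ nodeType: 9 })); // Document
console.log(canHaveChildren({ nodeType: 3 })); // false
```

In a browser you would typically compare against the named constants, such as `node.nodeType === Node.ELEMENT_NODE`, rather than the raw numbers.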
Data of each node
Each node type has its own way of representing the data it holds. The `Node` interface itself defines three properties related to data, summarized in the following table:

| Node type | `nodeName` | `nodeValue` | `textContent` |
|---|---|---|---|
| `Document` | `"#document"` | `null` | `null` |
| `DocumentType` | Its `name` (e.g., `"html"`) | `null` | `null` |
| `Element` | Its `tagName` (e.g., `"HTML"`, `"BODY"`) | `null` | Concatenation of all its text node descendants in tree order |
| `Text` | `"#text"` | Its `data` | Its `data` |
| `CDATASection` | `"#cdata-section"` | Its `data` | Its `data` |
| `Comment` | `"#comment"` | Its `data` | Its `data` |
| `ProcessingInstruction` | Its `target` | Its `data` | Its `data` |
| `Attr` | Its `name` | Its `value` | Its `value` |
Document
The `Document` node does not hold any data itself, so its `nodeValue` and `textContent` are always `null`. Its `nodeName` is always `"#document"`.

The `Document` does define some metadata about the document, coming from the environment (for example, the HTTP response that served the document):
- The `URL` and `documentURI` properties return the document's URL.
- The `characterSet` property returns the character encoding used by the document, such as `"UTF-8"`.
- The `compatMode` property returns the rendering mode of the document, either `"CSS1Compat"` (standards mode) or `"BackCompat"` (quirks mode).
- The `contentType` property returns the media type of the document, such as `"text/html"` for HTML documents.
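A minimal sketch of reading these properties follows. `summarizeDocument` is a hypothetical helper, and `mockDocument` (including its example URL) is an assumption that stands in for the global `document` so the code can run outside a browser.

```js
// Summarize the metadata properties listed above for any Document-like object.
function summarizeDocument(doc) {
  return {
    url: doc.URL,
    encoding: doc.characterSet,
    mediaType: doc.contentType,
    quirksMode: doc.compatMode === "BackCompat",
  };
}

// Hypothetical mock; in a browser, pass the global `document` instead.
const mockDocument = {
  URL: "https://example.com/index.html",
  documentURI: "https://example.com/index.html",
  characterSet: "UTF-8",
  compatMode: "CSS1Compat",
  contentType: "text/html",
};

console.log(summarizeDocument(mockDocument));
```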
DocumentType
A `DocumentType` in the document looks like this:

```html
<!doctype name PUBLIC "publicId" "systemId">
```

There are three parts you can specify, which correspond to the three properties of the `DocumentType` node: `name`, `publicId`, and `systemId`. For HTML documents, the doctype is normally `<!doctype html>`, in which case the `name` is `"html"` and both `publicId` and `systemId` are empty strings.
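To show how the three properties map back to the markup, here is a rough sketch that rebuilds a doctype string from a `DocumentType`-like object. `doctypeToString` is a hypothetical helper, not a DOM API, and the objects passed to it are plain mocks. The second mock uses the legacy HTML 4.01 Strict identifiers as an example of a doctype with non-empty IDs.

```js
// Rebuild doctype markup from name, publicId, and systemId.
// Empty strings mean the corresponding part was not specified.
function doctypeToString(doctype) {
  let result = `<!DOCTYPE ${doctype.name}`;
  if (doctype.publicId) {
    result += ` PUBLIC "${doctype.publicId}"`;
    if (doctype.systemId) result += ` "${doctype.systemId}"`;
  } else if (doctype.systemId) {
    result += ` SYSTEM "${doctype.systemId}"`;
  }
  return result + ">";
}

// Mock DocumentType objects (assumption): only the three properties are set.
const modern = { name: "html", publicId: "", systemId: "" };
const legacy = {
  name: "html",
  publicId: "-//W3C//DTD HTML 4.01//EN",
  systemId: "http://www.w3.org/TR/html4/strict.dtd",
};

console.log(doctypeToString(modern)); // <!DOCTYPE html>
console.log(doctypeToString(legacy));
```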
Element
An `Element` in the document looks like this:

```html
<p>This is a paragraph.</p>
```

In addition to the contents, there are two parts you can specify: the tag name and the attributes. The tag name corresponds to the `tagName` property of the `Element` node, which is `"P"` in this case (note that it is always in uppercase for HTML elements in HTML documents). The attributes correspond to the `Attr` nodes stored in the `attributes` property of the `Element` node. We will discuss attributes in more detail in the Element and its attributes section.

The `Element` node does not hold any data itself, so its `nodeValue` is always `null`. Its `textContent` is the concatenation of all its text node descendants in tree order, which is `"This is a paragraph."` in this case. For the following element:

```html
<div>Hello, <span>world</span>!</div>
```

The `textContent` is `"Hello, world!"`, concatenating the text node `"Hello, "`, the text node `"world"` inside the `<span>` element, and the text node `"!"`.
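The concatenation rule can be sketched as a recursive walk over `childNodes`. This is a simplified illustration, not the browser's implementation: `descendantText`, `text`, and `element` are hypothetical helpers, and the nodes are plain mock objects.

```js
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

// Concatenate the data of all Text/CDATASection descendants in tree order.
// Other leaf nodes (such as comments) contribute nothing.
function descendantText(node) {
  let result = "";
  for (const child of node.childNodes) {
    if (child.nodeType === TEXT_NODE || child.nodeType === CDATA_SECTION_NODE) {
      result += child.data;
    } else {
      result += descendantText(child);
    }
  }
  return result;
}

// Mock node builders (assumption): plain objects, not real DOM nodes.
const text = (data) => ({ nodeType: 3, data, childNodes: [] });
const element = (tagName, ...childNodes) => ({ nodeType: 1, tagName, childNodes });

// Mirrors <div>Hello, <span>world</span>!</div>
const div = element("DIV", text("Hello, "), element("SPAN", text("world")), text("!"));

console.log(descendantText(div)); // Hello, world!
```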
CharacterData
`Text`, `CDATASection`, `Comment`, and `ProcessingInstruction` all inherit from the `CharacterData` interface, which is a subclass of `Node`. The `CharacterData` interface defines the `data` property, which holds the text content of the node. The `data` property is also used to implement the `nodeValue` and `textContent` properties of these nodes.

For `Text` and `CDATASection`, the `data` property holds the text content of the node. In the following document (note that we use an SVG document, because HTML does not allow CDATA sections):

```xml
<text>Some text</text>
<style><![CDATA[h1 { color: red; }]]></style>
```

The text node inside the `<text>` element has `"Some text"` as `data`, and the CDATA section inside the `<style>` element has `"h1 { color: red; }"` as `data`.

For `Comment`, the `data` property holds the content of the comment, starting after the `<!--` and ending before the `-->`. For example, in the following document:

```html
<!-- This is a comment -->
```

The comment node has `" This is a comment "` as `data`.

For `ProcessingInstruction`, the `data` property holds the content of the processing instruction, starting after the target and ending before the `?>`. For example, in the following document:

```xml
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
```

The processing instruction node has `'type="text/xsl" href="style.xsl"'` as `data`, and `"xml-stylesheet"` as its `target`.

In addition, the `CharacterData` interface defines the `length` property, which returns the length of the `data` string, and the `substringData()` method, which returns a substring of the `data`.
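As a small illustration of `length` and `substringData()`, the sketch below truncates a node's data for display. `previewData` is a hypothetical helper, and `MockCharacterData` is an assumed stand-in that mimics only the members used here so the code runs outside a browser.

```js
// Hypothetical stand-in for a CharacterData node (Text, Comment, and so on).
class MockCharacterData {
  constructor(data) {
    this.data = data;
  }
  get length() {
    return this.data.length;
  }
  // Simplified: the real method throws if offset is greater than length.
  substringData(offset, count) {
    return this.data.slice(offset, offset + count);
  }
}

// Return at most `max` characters of a node's data, marking truncation.
function previewData(node, max) {
  if (node.length <= max) return node.data;
  return node.substringData(0, max) + "...";
}

const comment = new MockCharacterData(" This is a comment ");
console.log(JSON.stringify(previewData(comment, 8))); // " This is..."
```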
Attr
For the following element:
```html
<p class="note" id="intro">This is a paragraph.</p>
```

The `<p>` element has two attributes, represented by two `Attr` nodes. Each attribute consists of a name and a value, corresponding to the `name` and `value` properties. The first attribute has `"class"` as `name` and `"note"` as `value`, while the second attribute has `"id"` as `name` and `"intro"` as `value`.
Element and its attributes
As previously mentioned, the attributes of an `Element` node are represented by `Attr` nodes, which are stored in a separate named node map, accessible via the `attributes` property of the `Element` node. This `NamedNodeMap` interface defines three important members:
- The `length` property, which returns the number of attributes.
- The `item()` method, which returns the `Attr` at a given index.
- The `getNamedItem()` method, which returns the `Attr` with a given name.

The `Element` interface also defines several methods to work with attributes directly, without needing to access the named node map:
- `element.getAttribute(name)` is equivalent to `element.attributes.getNamedItem(name).value` if the attribute exists, and returns `null` otherwise.
- `element.getAttributeNode(name)` is equivalent to `element.attributes.getNamedItem(name)`.
- `element.hasAttribute(name)` is equivalent to `element.attributes.getNamedItem(name) !== null`.
- `element.getAttributeNames()` returns an array of all attribute names.
- `element.hasAttributes()` is equivalent to `element.attributes.length > 0`.

You can also access the owner element of an attribute via the `ownerElement` property of the `Attr` node.

There are two special attributes, `id` and `class`, which have their own properties on the `Element` interface: `id` and `className`, which reflect the value of the corresponding attribute. In addition, the `classList` property returns a `DOMTokenList` representing the list of classes in the `class` attribute.
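The relationship between the named node map and the convenience methods can be sketched as follows. `attributesToObject`, `getAttributeValue`, and `mockAttributes` are hypothetical names; the mock mimics only `length`, `item()`, and `getNamedItem()` so the code can run outside a browser.

```js
// Walk the attribute map by index and collect name/value pairs.
function attributesToObject(element) {
  const result = {};
  const map = element.attributes;
  for (let i = 0; i < map.length; i++) {
    const attr = map.item(i);
    result[attr.name] = attr.value;
  }
  return result;
}

// Rough equivalent of element.getAttribute(name), built on getNamedItem().
function getAttributeValue(element, name) {
  const attr = element.attributes.getNamedItem(name);
  return attr === null ? null : attr.value;
}

// Hypothetical stand-in for a NamedNodeMap.
function mockAttributes(pairs) {
  const attrs = pairs.map(([name, value]) => ({ name, value }));
  return {
    length: attrs.length,
    item: (index) => attrs[index] ?? null,
    getNamedItem: (name) => attrs.find((attr) => attr.name === name) ?? null,
  };
}

const paragraph = {
  tagName: "P",
  attributes: mockAttributes([["class", "note"], ["id", "intro"]]),
};

console.log(attributesToObject(paragraph)); // { class: 'note', id: 'intro' }
console.log(getAttributeValue(paragraph, "id")); // intro
console.log(getAttributeValue(paragraph, "title")); // null
```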
Working with the element tree
Because `Element` nodes form the backbone of the document structure, you can specifically traverse the element nodes, skipping other nodes (such as `Text` and `Comment`).
- For all nodes, the `parentElement` property returns the parent node if it is an `Element`, or `null` if the parent is not an `Element` (for example, if the parent is a `Document`). This is in contrast to `parentNode`, which returns the parent node regardless of its type.
- For `Document`, `DocumentFragment`, and `Element`, the `children` property returns an `HTMLCollection` of only the child `Element` nodes. This is in contrast to `childNodes`, which returns all child nodes. The `firstElementChild` and `lastElementChild` properties return the first and last elements of this collection, respectively, or `null` if there are no child elements. The `childElementCount` property returns the number of child elements.
- For `Element` and `CharacterData`, the `previousElementSibling` and `nextElementSibling` properties return the previous and next sibling node that is an `Element`, or `null` if there is no such sibling. This is in contrast to `previousSibling` and `nextSibling`, which may return any type of sibling node.
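The element-only properties can be thought of as the node-level properties with non-element nodes filtered out. The following sketch illustrates that idea; `elementChildren`, `nextElementSiblingOf`, and `mockNode` are hypothetical helpers operating on plain mock objects rather than real DOM nodes.

```js
const ELEMENT_NODE = 1;

// Rough equivalent of node.children: keep only child nodes that are elements.
function elementChildren(node) {
  return Array.from(node.childNodes).filter((child) => child.nodeType === ELEMENT_NODE);
}

// Rough equivalent of node.nextElementSibling: skip non-element siblings.
function nextElementSiblingOf(node) {
  let sibling = node.nextSibling;
  while (sibling !== null && sibling.nodeType !== ELEMENT_NODE) {
    sibling = sibling.nextSibling;
  }
  return sibling;
}

// Mock node builder (assumption): wires up parent and sibling links.
function mockNode(nodeType, nodeName, childNodes = []) {
  const node = { nodeType, nodeName, parentNode: null, previousSibling: null, nextSibling: null, childNodes };
  childNodes.forEach((child, i) => {
    child.parentNode = node;
    child.previousSibling = childNodes[i - 1] ?? null;
    child.nextSibling = childNodes[i + 1] ?? null;
  });
  return node;
}

const h1 = mockNode(1, "H1");
const para = mockNode(1, "P");
const body = mockNode(1, "BODY", [h1, mockNode(3, "#text"), mockNode(8, "#comment"), para]);

console.log(body.childNodes.length); // 4
console.log(elementChildren(body).map((n) => n.nodeName)); // [ 'H1', 'P' ]
console.log(nextElementSiblingOf(h1).nodeName); // P
```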
Comparing nodes
There are three important methods that compare nodes: `isEqualNode()`, `isSameNode()`, and `compareDocumentPosition()`.

The `isSameNode()` method is legacy. It now behaves like the strict equality operator (`===`), returning `true` if and only if the two nodes are the same object.

The `isEqualNode()` method compares two nodes structurally. Two nodes are considered equal if they have the same type, the same data, and their child nodes are also equal at each index. In the Data of each node section, we already defined the data relevant for each node type:
- For `Document`, there is no data, so only the child nodes need to be compared.
- For `DocumentType`, the `name`, `publicId`, and `systemId` properties need to be compared.
- For `Element`, the `tagName` (more accurately, the `namespaceURI`, `prefix`, and `localName`; we will introduce these in the XML namespaces guide) and the attributes need to be compared.
- For `Attr`, the `name` (more accurately, the `namespaceURI`, `prefix`, and `localName`; we will introduce these in the XML namespaces guide) and `value` properties need to be compared.
- For all `CharacterData` nodes (`Text`, `CDATASection`, `Comment`, and `ProcessingInstruction`), the `data` property needs to be compared. For `ProcessingInstruction`, the `target` property also needs to be compared.
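The recursive shape of structural comparison can be sketched as below. This is a deliberately simplified illustration, not the real `isEqualNode()` algorithm: it compares only `nodeType`, `nodeName`, `nodeValue`, and children, and it ignores attributes, namespaces, and the doctype's `publicId` and `systemId`. `nodesEqual`, `text`, and `element` are hypothetical helpers working on plain mock objects.

```js
// Simplified structural comparison (assumption: attributes and namespaces are ignored).
function nodesEqual(a, b) {
  if (a.nodeType !== b.nodeType) return false;
  // nodeName covers the tag name or target; nodeValue covers character data.
  if (a.nodeName !== b.nodeName || a.nodeValue !== b.nodeValue) return false;
  if (a.childNodes.length !== b.childNodes.length) return false;
  // Children must be equal at each index.
  for (let i = 0; i < a.childNodes.length; i++) {
    if (!nodesEqual(a.childNodes[i], b.childNodes[i])) return false;
  }
  return true;
}

// Mock node builders (assumption): plain objects, not real DOM nodes.
const text = (data) => ({ nodeType: 3, nodeName: "#text", nodeValue: data, childNodes: [] });
const element = (name, ...childNodes) => ({ nodeType: 1, nodeName: name, nodeValue: null, childNodes });

const first = element("P", text("This is a paragraph."));
const second = element("P", text("This is a paragraph."));
const third = element("P", text("Something else."));

console.log(first === second); // false (different objects)
console.log(nodesEqual(first, second)); // true (same structure)
console.log(nodesEqual(first, third)); // false (different text data)
```

The first two lines of output mirror the difference between `isSameNode()` and `isEqualNode()`: identity versus structure.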
The `a.compareDocumentPosition(b)` method compares two nodes by tree order. It returns a bitmask describing the position of `b` relative to `a`. The possible cases are:
- Returns `0` if `a` and `b` are the same node.
- If the two nodes are both attributes of the same element node, returns `Node.DOCUMENT_POSITION_PRECEDING | Node.DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC` (34) if `b` precedes `a` in the attribute list, or `Node.DOCUMENT_POSITION_FOLLOWING | Node.DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC` (36) if `b` follows `a`. Otherwise, if either node is an attribute, its owner element is used for further comparisons.
- If the two nodes don't have the same root node, returns either `Node.DOCUMENT_POSITION_DISCONNECTED | Node.DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC | Node.DOCUMENT_POSITION_PRECEDING` (35) or `Node.DOCUMENT_POSITION_DISCONNECTED | Node.DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC | Node.DOCUMENT_POSITION_FOLLOWING` (37). Which one is returned is implementation-specific.
- If `b` is an ancestor of `a` (including when `a` is an attribute of `b`), returns `Node.DOCUMENT_POSITION_CONTAINS | Node.DOCUMENT_POSITION_PRECEDING` (10).
- If `b` is a descendant of `a` (including when `b` is an attribute of `a`), returns `Node.DOCUMENT_POSITION_CONTAINED_BY | Node.DOCUMENT_POSITION_FOLLOWING` (20).
- If `b` precedes `a` in tree order, returns `Node.DOCUMENT_POSITION_PRECEDING` (2).
- If `b` follows `a` in tree order, returns `Node.DOCUMENT_POSITION_FOLLOWING` (4).

Bitmask values are used, so you can use a bitwise AND operation to check for specific relationships. For example, to check if `b` precedes `a`, you can do:

```js
if (a.compareDocumentPosition(b) & Node.DOCUMENT_POSITION_PRECEDING) {
  // b precedes a
}
```

This accounts for the cases of `a` and `b` being attributes of the same element, `b` being an ancestor of `a`, and `b` preceding `a` in tree order. Note that the bit may also be set when the two nodes are disconnected, so check `Node.DOCUMENT_POSITION_DISCONNECTED` first if that matters.
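To make a returned bitmask easier to read, it can be decoded into the names of the flags that are set. The sketch below is one way to do that; `describePosition` is a hypothetical helper, and the numeric flag values are written out (they match the `Node.DOCUMENT_POSITION_*` constants) so the code also runs where the `Node` global is unavailable.

```js
// Flag names and values of the Node.DOCUMENT_POSITION_* constants.
const POSITION_FLAGS = [
  ["DISCONNECTED", 1],
  ["PRECEDING", 2],
  ["FOLLOWING", 4],
  ["CONTAINS", 8],
  ["CONTAINED_BY", 16],
  ["IMPLEMENTATION_SPECIFIC", 32],
];

// Return the names of all flags set in the bitmask, lowest bit first.
function describePosition(mask) {
  return POSITION_FLAGS.filter(([, bit]) => (mask & bit) !== 0).map(([name]) => name);
}

console.log(describePosition(20)); // [ 'FOLLOWING', 'CONTAINED_BY' ]
console.log(describePosition(35)); // [ 'DISCONNECTED', 'PRECEDING', 'IMPLEMENTATION_SPECIFIC' ]
console.log(describePosition(0)); // []
```

In a browser, the argument would typically be the result of `a.compareDocumentPosition(b)`.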
Summary
Here are all the features we've introduced so far. There are a lot, but they are all useful in different scenarios.
- All nodes in the DOM implement the `Node` interface.
- To navigate around the DOM tree: `parentNode`, `childNodes`, `firstChild`/`lastChild`, `hasChildNodes()`, `getRootNode()`, `previousSibling`/`nextSibling`.
- To navigate around the element tree: `parentElement`, `children`, `firstElementChild`/`lastElementChild`, `childElementCount`, `previousElementSibling`/`nextElementSibling`.
- The `nodeType` property indicates the type of the node. The `nodeName`, `nodeValue`, and `textContent` properties provide the data held by the node.
- The `Document` node and its two important children: `doctype` and `documentElement`.
- The `DocumentType` node and its three properties: `name`, `publicId`, and `systemId`.
- The `Element` node and its properties: `tagName`, `attributes`.
- The `Attr` node and its properties: `name` and `value`.
- The `CharacterData` interface and its property: `data`.
- The four `CharacterData` subclasses: `Text`, `CDATASection`, `Comment`, and `ProcessingInstruction`. `ProcessingInstruction` also has the `target` property.
- The various ways to work with attributes, including the `id`, `className`, and `classList` properties.
- The three methods to compare nodes: `isEqualNode()`, `isSameNode()`, and `compareDocumentPosition()`.