Jennifer Lyn Parsons

Posted on • Originally published at aquantityofstuff.com

XPath for you and me

I learned XPath a few years ago and always found its documentation frustrating. A few basic concepts kept tripping me up as I learned it. My hope with this brief article is to make things a little easier for the next person who needs to learn this small but mighty tool.

What is XPath?

XPath stands for XML Path Language.

XPath is designed to point to parts of an XML document. We use it to match patterns against DOM nodes. It is used in XSLT, Selenium, and other places where DOM navigation is useful.

When looking at the syntax of an XPath query, view the DOM as a file hierarchy that we are navigating, similar to URL paths. It makes more intuitive sense that way. Each parent element is a "folder" that can contain other folders (child elements).
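As a rough sketch of the folder analogy, the snippet below walks a small document one "folder" at a time. It uses Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath and only accepts relative paths (hence the leading "."). The sample document and variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<html>
  <body>
    <div>
      <p>first</p>
      <p>second</p>
    </div>
    <p>outside the div</p>
  </body>
</html>"""

root = ET.fromstring(DOC)  # the <html> element plays the role of the top folder

# Read the path like a directory listing: from <html>, enter <body>,
# then <div>, then list every <p> found directly inside it.
paragraphs = root.findall("./body/div/p")
texts = [p.text for p in paragraphs]
print(texts)  # ['first', 'second']
```

The `<p>` that sits directly under `<body>` is not returned, just as a file in a neighboring folder would not show up in a listing of `body/div`.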

The general syntax also shares some ideas with regex and CSS selectors.

XPath query structure

XPath queries are made up of four parts.

  • The prefix determines the starting point of the query.
  • The axis describes the relationship between the context node and the nodes we want to reach from it.
  • The step identifies the element we’re referencing, which in turn becomes the context node for the next step.
  • The predicate makes the step more specific.

Note: The less specific an XPath query is, the more expensive it becomes, performance-wise. As with CSS selectors, there is a balance to strike between specificity, flexibility, and performance.

Parts of an XPath query
//ul/a[@id='link']

Part      Role
//        Prefix
ul        Step
/         Axis
a[@id='link']   Step with predicate
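To see the four parts working together, here is a minimal sketch that assembles the query above from its pieces and runs it. It assumes Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath and requires relative paths, so the prefix is written ".//" rather than "//". The markup is hypothetical and shaped only so that the query has something to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented so the article's query has something to match.
DOC = """<div>
  <ul>
    <a id="link">target</a>
    <a id="other">not this one</a>
  </ul>
  <a id="link">same id, but not inside a ul</a>
</div>"""

root = ET.fromstring(DOC)

# The four parts, named as in the article. ElementTree only accepts
# relative paths, so the prefix is ".//" instead of "//".
prefix = ".//"
step = "ul"
axis = "/"
step_with_predicate = "a[@id='link']"

query = prefix + step + axis + step_with_predicate
matches = root.findall(query)
match_texts = [m.text for m in matches]
print(query, match_texts)
```

Only the `<a>` that is a direct child of the `<ul>` and carries the matching id is selected; the second `<a id="link">` is skipped because the step and axis rule it out before the predicate is ever checked.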

Axis selector examples

Axis selectors allow us to "drill down" into the structure we're processing to access the node we're looking for.

Axis selector   Example            Context
//              /section/div//a    Anywhere in the document when used as a prefix. Between steps, it selects any descendant element.
./              ./a                Child relative to the current node.
/               /html/body/div     Start at the root when used as a prefix. Between steps, it selects child elements.
.               .                  Self node.
..              ..                 Parent node.
*               ./*                Any element node.
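The selectors in the table can be tried out with a short sketch like the one below. It assumes Python's standard-library xml.etree.ElementTree, which covers only part of XPath and does not accept absolute paths, so every query here starts from a context element with ".". The document itself is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """<html>
  <body>
    <section>
      <div><a>one</a></div>
      <div><p><a>two</a></p></div>
    </section>
    <a>three</a>
  </body>
</html>"""

root = ET.fromstring(DOC)
body = root.find("./body")  # use <body> as the context node

# "." is the context node itself.
self_node = body.find(".")

# "./a" selects only <a> elements that are direct children of <body>.
child_links = [a.text for a in body.findall("./a")]

# ".//a" selects <a> elements at any depth below <body>.
all_links = [a.text for a in body.findall(".//a")]

# "./*" selects every child element, whatever its tag.
child_tags = [el.tag for el in body.findall("./*")]

# ".." steps up to the parent: here, the parent of each <a> under <section>.
parent_tags = [el.tag for el in body.findall("./section//a/..")]

print(child_links, all_links, child_tags, parent_tags)
```

Comparing `child_links` with `all_links` shows the difference between a single slash (children only) and a double slash (any descendant).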

Navigation

XPath also allows you to navigate up and down the hierarchy of the DOM, just like with folder navigation.

Selectors can be chained and can include some limited logic. They are based on various pattern matching criteria, similar to regex.

  • relationship (child, sibling, preceding, self)
  • attributes (id, class name, href)
  • order (first, last)
  • content (contains string “xyz”)

Selector examples

Example                            Context
//ul/li/a                          Relationship selector; each / matches a direct child relationship.
//input[@type="submit"]            Attribute selector.
//ul/li[2]                         Order selector; selects the second <li> child of each <ul>. Note: positions start at 1, not 0.
//button[contains(text(),"Go")]    Contains text; in this case, matches a substring.
//a[@name or @href]                Or logic.
//ul/li/../../.                    Selects the parent of the <ul> (for example, the <div> in a <div><ul><li> </li></ul></div> structure).
//h1[not(@id)]                     Not selector; this example selects any <h1> without an id.
./a[1][@href='/']                  An example of chaining. Predicates apply left to right, so this takes the first <a> child of the current context and keeps it only if its href is "/".
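Several of these selectors can be checked with a small sketch. It assumes Python's standard-library xml.etree.ElementTree, which supports attribute and position predicates but not `contains()`, `or`, or `not()`, so only a subset of the table is shown. Queries are written relative to a context element because ElementTree does not accept absolute paths. The document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """<div id="wrapper">
  <nav>
    <a href="/">Home</a>
    <a href="/about">About</a>
  </nav>
  <ul>
    <li>first</li>
    <li>second</li>
    <li>third</li>
  </ul>
  <form>
    <input type="text"/>
    <input type="submit"/>
  </form>
</div>"""

root = ET.fromstring(DOC)

# Attribute selector: only the <input> whose type is "submit".
submit_inputs = root.findall(".//input[@type='submit']")

# Order selector: positions start at 1, so [2] is the second <li>.
second_item = root.find(".//ul/li[2]").text

# Parent navigation: climb from each <li> to the <ul>, then to its parent.
ul_parents = [el.get("id") for el in root.findall(".//ul/li/../..")]

# Chaining: take the first <a> child of <nav>, keep it only if href is "/".
nav = root.find("./nav")
first_home = [a.text for a in nav.findall("./a[1][@href='/']")]

# The same chain matches nothing when the first <a> has a different href.
other_nav = ET.fromstring(
    '<nav><a href="/about">About</a><a href="/">Home</a></nav>'
)
no_match = other_nav.findall("./a[1][@href='/']")

print(second_item, ul_parents, first_home, no_match)
```

The last two queries show why predicate order matters in a chain: the position is checked first, and the attribute test only filters what the position already selected.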

A note about contains(): this selector is rather loose and will select any text that contains the string passed to it. This can cause unexpected results. In the example above, any button with the string Go in it will be selected, so Go Home and Go to Next Page would both match. Combining the various selectors can narrow the results to what you are looking for.
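The looseness is easy to see in a quick sketch. Python's standard-library xml.etree.ElementTree does not implement `contains()`, so the substring test below is a plain-Python stand-in for it rather than a real XPath call; the exact-text predicate `[.='Go']` is something ElementTree does support (Python 3.7 and later). The buttons are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical buttons, invented for this illustration.
DOC = """<form>
  <button>Go</button>
  <button>Go Home</button>
  <button>Go to Next Page</button>
  <button>Cancel</button>
</form>"""

root = ET.fromstring(DOC)
buttons = root.findall(".//button")

# ElementTree has no contains(), so this comprehension stands in for
# //button[contains(text(),"Go")]: a plain substring test on the text.
loose = [b.text for b in buttons if "Go" in (b.text or "")]

# An exact-text predicate (supported by ElementTree) is much narrower.
exact = [b.text for b in root.findall(".//button[.='Go']")]

print(loose)  # ['Go', 'Go Home', 'Go to Next Page']
print(exact)  # ['Go']
```

With a full XPath engine, the same narrowing could be done by combining `contains()` with other predicates, such as an attribute or position test.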

Resources

MDN
https://developer.mozilla.org/en-US/docs/Web/XPath

Devhints.io
https://devhints.io/xpath

Scrapy documentation
https://doc.scrapy.org/en/xpath-tutorial/topics/xpath-tutorial.html

paladin. writer. gen-x. comic geek. tea snob. vegetarian. au‍tistic. | lunastationquarterly.com | selfcare.tech
  • Location
    New Jersey
  • Work
    Software Engineer at Truss