Processing XML

version 5.0.2

1. Parsing XML

1.1. XmlParser and XmlSlurper

The most commonly used approach for parsing XML with Groovy is to use one of:

  • groovy.xml.XmlParser

  • groovy.xml.XmlSlurper

Both take the same approach to parsing XML. Both come with a number of overloaded parse methods plus some special methods such as parseText, parseFile and others. For the next example we will use the parseText method. It parses an XML String and recursively converts it to a list or map of objects.

XmlSlurper
    import groovy.xml.XmlSlurper

    def text = '''
        <list>
            <technology>
                <name>Groovy</name>
            </technology>
        </list>
    '''

    def list = new XmlSlurper().parseText(text) // (1)

    assert list instanceof groovy.xml.slurpersupport.GPathResult // (2)
    assert list.technology.name == 'Groovy' // (3)
(1) Parsing the XML and returning the root node as a GPathResult
(2) Checking we’re using a GPathResult
(3) Traversing the tree in a GPath style
XmlParser
    import groovy.xml.XmlParser

    def text = '''
        <list>
            <technology>
                <name>Groovy</name>
            </technology>
        </list>
    '''

    def list = new XmlParser().parseText(text) // (1)

    assert list instanceof groovy.util.Node // (2)
    assert list.technology.name.text() == 'Groovy' // (3)
(1) Parsing the XML and returning the root node as a Node
(2) Checking we’re using a Node
(3) Traversing the tree in a GPath style
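
Besides parseText, both classes accept other sources through their overloaded parse methods. The following is a minimal sketch rather than part of the original guide: it writes the same document to a temporary file (the 'technologies' file name prefix is made up for illustration) and reads it back once from a File and once from an InputStream.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Temporary file used only for this illustration
def file = File.createTempFile('technologies', '.xml')
file.deleteOnExit()
file.text = '<list><technology><name>Groovy</name></technology></list>'

// parse(File) with XmlSlurper returns a GPathResult
def fromFile = new XmlSlurper().parse(file)

// parse(InputStream) with XmlParser returns a Node
def fromStream = file.withInputStream { stream -> new XmlParser().parse(stream) }

assert fromFile.technology.name.text() == 'Groovy'
assert fromStream.technology.name.text() == 'Groovy'
```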

Let’s see the similarities between XmlParser and XmlSlurper first:

  • Both are based on SAX, so both have a low memory footprint

  • Both can update/transform the XML

But they have key differences:

  • XmlSlurper evaluates the structure lazily. So if you update the XML you’ll have to evaluate the whole tree again.

  • XmlSlurper returns GPathResult instances when parsing XML

  • XmlParser returns Node objects when parsing XML

When to use one or the other?

There is a discussion at StackOverflow. The conclusions written here are based partially on this entry.

  • If you want to transform an existing document to another then XmlSlurper will be the choice.

  • If you want to update and read at the same time then XmlParser is the choice.

The rationale behind this is that every time you create a node with XmlSlurper it won’t be available until you parse the document again with another XmlSlurper instance.

  • If you just have to read a few nodes XmlSlurper should be your choice, since it will not have to create a complete structure in memory.

In general both classes perform in a similar way. Even the way of using GPath expressions with them is the same (both use breadthFirst() and depthFirst() expressions). So the choice mostly depends on the write/read frequency.
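
To make the last point concrete, here is a small sketch (not from the original guide) that parses the same document with both classes and walks it with depthFirst(). The document text is made up for illustration; the only difference visible here is the type of the result.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

def text = '<list><technology><name>Groovy</name></technology></list>'

def slurped = new XmlSlurper().parseText(text)   // GPathResult
def parsed = new XmlParser().parseText(text)     // Node

// The same traversal call works on both results
def slurperNames = slurped.depthFirst().collect { it.name() }
def parserNames = parsed.depthFirst().collect { it.name() }

assert slurperNames == ['list', 'technology', 'name']
assert parserNames == ['list', 'technology', 'name']
```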

1.2. DOMCategory

There is another way of parsing XML documents with Groovy: groovy.xml.dom.DOMCategory, a category class which adds GPath style operations to Java’s DOM classes.

Java has built-in support for DOM processing of XML using classes representing the various parts of XML documents, e.g. Document, Element, NodeList, Attr etc. For more information about these classes, refer to the respective JavaDocs.

Having an XML like the following:

    static def CAR_RECORDS = '''
    <records>
      <car name='HSV Maloo' make='Holden' year='2006'>
        <country>Australia</country>
        <record type='speed'>Production Pickup Truck with speed of 271kph</record>
      </car>
      <car name='P50' make='Peel' year='1962'>
        <country>Isle of Man</country>
        <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
      </car>
      <car name='Royale' make='Bugatti' year='1931'>
        <country>France</country>
        <record type='price'>Most Valuable Car at $15 million</record>
      </car>
    </records>
    '''

You can parse it using groovy.xml.DOMBuilder and groovy.xml.dom.DOMCategory.

    import groovy.xml.DOMBuilder
    import groovy.xml.dom.DOMCategory

    def reader = new StringReader(CAR_RECORDS)
    def doc = DOMBuilder.parse(reader) // (1)
    def records = doc.documentElement

    use(DOMCategory) { // (2)
        assert records.car.size() == 3
    }
(1) Parsing the XML
(2) Creating a DOMCategory scope to be able to use helper method calls

2. GPath

The most common way of querying XML in Groovy is using GPath:

GPath is a path expression language integrated into Groovy which allows parts of nested structured data to be identified. In this sense, it has similar aims and scope as XPath does for XML. The two main places where you use GPath expressions are when dealing with nested POJOs and when dealing with XML.

It is similar to XPath expressions and you can use it not only with XML but also with POJO classes. As an example, you can specify a path to an object or element of interest:

  • a.b.c → for XML, yields all the <c> elements inside <b> inside <a>

  • a.b.c → for POJOs, yields the c properties for all the b properties of a (sort of like a.getB().getC() in JavaBeans)
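
The POJO case can be sketched as follows. The classes Library, Book and Author are hypothetical and exist only for this illustration; the point is that a property path applied to a list is spread over every element of that list.

```groovy
// Hypothetical classes, defined only for this illustration
class Author { String name }
class Book { String title; Author author }
class Library { List<Book> books }

def library = new Library(books: [
    new Book(title: 'Don Quixote', author: new Author(name: 'Miguel de Cervantes')),
    new Book(title: 'Alice in Wonderland', author: new Author(name: 'Lewis Carroll'))
])

// Property access on a List is applied to each element, so the path
// books.author.name yields one name per book
def names = library.books.author.name
def titles = library.books.title

assert names == ['Miguel de Cervantes', 'Lewis Carroll']
assert titles == ['Don Quixote', 'Alice in Wonderland']
```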

For XML, you can also specify attributes, e.g.:

  • a["@href"] → the href attribute of all the a elements

  • a.'@href' → an alternative way of expressing this

  • a.@href → an alternative way of expressing this when using XmlSlurper

Let’s illustrate this with an example:

    static final String books = '''
        <response version-api="2.0">
            <value>
                <books>
                    <book available="20" id="1">
                        <title>Don Quixote</title>
                        <author id="1">Miguel de Cervantes</author>
                    </book>
                    <book available="14" id="2">
                        <title>Catcher in the Rye</title>
                        <author id="2">JD Salinger</author>
                    </book>
                    <book available="13" id="3">
                        <title>Alice in Wonderland</title>
                        <author id="3">Lewis Carroll</author>
                    </book>
                    <book available="5" id="4">
                        <title>Don Quixote</title>
                        <author id="4">Miguel de Cervantes</author>
                    </book>
                </books>
            </value>
        </response>
    '''

2.1. Simply traversing the tree

The first thing we could do is get a value using POJO notation. Let’s get the first book’s author’s name:

Getting node value
    def response = new XmlSlurper().parseText(books)
    def authorResult = response.value.books.book[0].author

    assert authorResult.text() == 'Miguel de Cervantes'

First we parse the document with XmlSlurper and then we have to consider the returned value as the root of the XML document, which in this case is "response".

That’s why we start traversing the document from response and then value.books.book[0].author. Note that in XPath node arrays start at [1] instead of [0], but because GPath is Java-based it begins at index 0.

In the end we’ll have the instance of the author node, and because we wanted the text inside that node we call the text() method. The author node is an instance of the GPathResult type and text() is a method giving us the content of that node as a String.

When using GPath with XML parsed with XmlSlurper we’ll get a GPathResult object as a result. GPathResult has many other convenient methods to convert the text inside a node to any other type such as:

  • toInteger()

  • toFloat()

  • toBigInteger()

  • …​

All these methods try to convert a String to the appropriate type.
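
A short sketch of these conversions follows. The XML fragment, including the pages element and the price attribute, is made up for illustration and is not part of the books document used elsewhere in this chapter.

```groovy
import groovy.xml.XmlSlurper

// Illustrative fragment; the element and attribute names are assumptions
def bookXml = '<book available="20" price="9.5"><pages>863</pages></book>'
def book = new XmlSlurper().parseText(bookXml)

// Element content converted from its text
def pages = book.pages.toInteger()
def pagesAsBigInteger = book.pages.toBigInteger()

// Attribute values are converted the same way
def available = book.@available.toInteger()
def price = book.@price.toFloat()

assert pages == 863
assert pagesAsBigInteger == 863G
assert available == 20
assert price == 9.5f
```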

If we were using XML parsed with XmlParser we would be dealing with instances of type Node. But all the actions applied to GPathResult in these examples could be applied to a Node as well. The creators of both parsers took GPath compatibility into account.
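
As a rough illustration of that compatibility, the sketch below repeats the same kind of navigation on a tree produced by XmlParser. The small document is made up for this example; note that text() and attribute access return plain Strings here.

```groovy
import groovy.xml.XmlParser

// Illustrative document, not the books document used in this chapter
def booksXml = '''<books>
    <book id="1"><author>Miguel de Cervantes</author></book>
    <book id="2"><author>JD Salinger</author></book>
</books>'''

def root = new XmlParser().parseText(booksXml)
def firstBook = root.book[0]

assert firstBook instanceof groovy.util.Node
// Same path style as with a GPathResult
assert firstBook.author.text() == 'Miguel de Cervantes'
// Attribute access with the @ notation, or with Node.attribute(...)
assert firstBook.@id == '1'
assert firstBook.attribute('id') == '1'

def authors = root.book.author*.text()
assert authors == ['Miguel de Cervantes', 'JD Salinger']
```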

The next step is to get some values from a given node’s attribute. In the following sample we want to get the first book’s id. We’ll be using two different approaches. Let’s see the code first:

Getting an attribute’s value
    def response = new XmlSlurper().parseText(books)

    def book = response.value.books.book[0] // (1)
    def bookAuthorId1 = book.@id // (2)
    def bookAuthorId2 = book['@id'] // (3)

    assert bookAuthorId1 == '1' // (4)
    assert bookAuthorId1.toInteger() == 1 // (5)
    assert bookAuthorId1 == bookAuthorId2
(1) Getting the first book node
(2) Getting the book’s id attribute with @id
(3) Getting the book’s id attribute with map notation ['@id']
(4) Getting the value as a String
(5) Getting the value of the attribute as an Integer

As you can see there are two types of notation for getting attributes:

  • direct notation with @nameoftheattribute

  • map notation using ['@nameoftheattribute']

Both of them are equally valid.

2.2. Flexible navigation with children (*), depthFirst (**) and breadthFirst

If you have ever used XPath, you may have used expressions like:

  • /following-sibling::othernode : Look for a node "othernode" in the same level

  • // : Look everywhere

More or less we have their counterparts in GPath with the shortcuts * (aka children()) and ** (aka depthFirst()).

The first example shows a simple use of *, which only iterates over the direct children of the node.

Using *
    def response = new XmlSlurper().parseText(books)

    // .'*' could be replaced by .children()
    def catcherInTheRye = response.value.books.'*'.find { node ->
        // node.@id == 2 could be expressed as node['@id'] == 2
        node.name() == 'book' && node.@id == '2'
    }

    assert catcherInTheRye.title.text() == 'Catcher in the Rye'

This test searches for any child nodes of the "books" node matching the given condition. In a bit more detail, the expression says: Look for any node with a tag name equal to 'book' having an id with a value of '2' directly under the 'books' node.

This operation roughly corresponds to the breadthFirst() method, except that it stops at one level instead of continuing to the inner levels.

What if we would like to look for a given value without having to know exactly where it is? Let’s say that the only thing we know is the name of the author, "Lewis Carroll". How are we going to find the id of that book? Using ** is the solution:

Using **
    def response = new XmlSlurper().parseText(books)

    // .'**' could be replaced by .depthFirst()
    def bookId = response.'**'.find { book ->
        book.author.text() == 'Lewis Carroll'
    }.@id

    assert bookId == 3

** is the same as looking for something everywhere in the tree from this point down. In this case, we’ve used the method find(Closure cl) to find just the first occurrence.

What if we want to collect all the books’ titles? That’s easy, just use findAll:

    def response = new XmlSlurper().parseText(books)

    def titles = response.'**'.findAll { node -> node.name() == 'title' }*.text()

    assert titles.size() == 4

In the last two examples, ** is used as a shortcut for the depthFirst() method. It goes as far down the tree as it can while navigating down the tree from a given node. The breadthFirst() method finishes off all nodes on a given level before traversing down to the next level.

The following example shows the difference between these two methods:

depthFirst() vs breadthFirst()
    def response = new XmlSlurper().parseText(books)
    def nodeName = { node -> node.name() }
    def withId2or3 = { node -> node.@id in [2, 3] }

    assert ['book', 'author', 'book', 'author'] ==
            response.value.books.depthFirst().findAll(withId2or3).collect(nodeName)
    assert ['book', 'book', 'author', 'author'] ==
            response.value.books.breadthFirst().findAll(withId2or3).collect(nodeName)

In this example, we search for any nodes with an id attribute with value 2 or 3. There are both book and author nodes that match those criteria. The different traversal orders will find the same nodes in each case but in different orders corresponding to how the tree was traversed.

It is worth mentioning again that there are some useful methods for converting a node’s value to an integer, float, etc. Those methods could be convenient when doing comparisons like this:

helpers
    def response = new XmlSlurper().parseText(books)

    def titles = response.value.books.book.findAll { book ->
        /* You can use toInteger() over the GPathResult object */
        book.@id.toInteger() > 2
    }*.title

    assert titles.size() == 2

In this case the number 2 has been hardcoded but imagine that value could have come from any other source (database…​ etc.).

3. Creating XML

The most commonly used approach for creating XML with Groovy is to use a builder, i.e. one of:

  • groovy.xml.MarkupBuilder

  • groovy.xml.StreamingMarkupBuilder

3.1. MarkupBuilder

Here is an example of using Groovy’s MarkupBuilder to create a new XML file:

Creating Xml with MarkupBuilder
    import groovy.xml.MarkupBuilder
    import groovy.xml.XmlSlurper

    def writer = new StringWriter()
    def xml = new MarkupBuilder(writer) // (1)

    xml.records() { // (2)
        car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
            country('Australia')
            record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
        }
        car(name: 'Royale', make: 'Bugatti', year: 1931) {
            country('France')
            record(type: 'price', 'Most Valuable Car at $15 million')
        }
    }

    def records = new XmlSlurper().parseText(writer.toString()) // (3)

    // name is an attribute of <car>, so it is read with the @ notation
    assert records.car.first().@name.text() == 'HSV Maloo'
    assert records.car.last().@name.text() == 'Royale'
(1) Create an instance of MarkupBuilder
(2) Start creating the XML tree
(3) Create an instance of XmlSlurper to traverse and test the generated XML

Let’s take a closer look:

Creating XML elements
    def xmlString = "<movie>the godfather</movie>" // (1)

    def xmlWriter = new StringWriter() // (2)
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie("the godfather") // (3)

    assert xmlString == xmlWriter.toString() // (4)
(1) We’re creating a reference string to compare against
(2) The xmlWriter instance is used by MarkupBuilder to convert the XML representation to a String instance eventually
(3) The xmlMarkup.movie(…​) call will create an XML node with a tag called movie and with content the godfather.
(4) Comparing the generated XML with the reference string
Creating XML elements with attributes
    def xmlString = "<movie id='2'>the godfather</movie>"

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie(id: "2", "the godfather") // (1)

    assert xmlString == xmlWriter.toString()
(1) This time, in order to create both attributes and node content, you can create as many map entries as you like and finally add a value to set the node’s content
The value could be any Object; it will be serialized to its String representation.
Creating XML nested elements
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie(id: 2) { // (1)
        name("the godfather")
    }

    def movie = new XmlSlurper().parseText(xmlWriter.toString())

    assert movie.@id == 2
    assert movie.name.text() == 'the godfather'
(1) A closure represents the child elements of a given node. Notice that this time, instead of using a String for the attribute, we’re using a number.

Sometimes you may want to use a specific namespace in your xml documents:

Namespace aware
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup
            .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') { // (1)
        'x:movie'(id: 1, 'the godfather')
        'x:movie'(id: 2, 'ronin')
    }

    def movies =
            new XmlSlurper() // (2)
                    .parseText(xmlWriter.toString())
                    .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.last().@id == 2
    assert movies.'x:movie'.last().text() == 'ronin'
(1) Creating a node with a given namespace xmlns:x
(2) Creating an XmlSlurper and registering the namespace to be able to test the XML we just created

What about a more meaningful example? We may want to generate more elements, or to have some logic when creating our XML:

Mix code
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup
            .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
        (1..3).each { n -> // (1)
            'x:movie'(id: n, "the godfather $n")
            if (n % 2 == 0) { // (2)
                'x:movie'(id: n, "the godfather $n (Extended)")
            }
        }
    }

    def movies =
            new XmlSlurper()
                    .parseText(xmlWriter.toString())
                    .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.size() == 4
    assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }
(1) Generating elements from a range
(2) Using a conditional for creating a given element

Of course the instance of a builder can be passed as a parameter to refactor/modularize your code:

Mix code
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    // (1)
    Closure<MarkupBuilder> buildMovieList = { MarkupBuilder builder ->
        (1..3).each { n ->
            builder.'x:movie'(id: n, "the godfather $n")
            if (n % 2 == 0) {
                builder.'x:movie'(id: n, "the godfather $n (Extended)")
            }
        }

        return builder
    }

    xmlMarkup.'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
        buildMovieList(xmlMarkup) // (2)
    }

    def movies =
            new XmlSlurper()
                    .parseText(xmlWriter.toString())
                    .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.size() == 4
    assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }
(1) In this case we’ve created a Closure to handle the creation of a list of movies
(2) Just using the buildMovieList function when necessary

3.2. StreamingMarkupBuilder

The class groovy.xml.StreamingMarkupBuilder is a builder class for creating XML markup. This implementation uses a groovy.xml.streamingmarkupsupport.StreamingMarkupWriter to handle output.

Using StreamingMarkupBuilder
    import groovy.xml.StreamingMarkupBuilder
    import groovy.xml.XmlSlurper

    def xml = new StreamingMarkupBuilder().bind { // (1)
        records {
            car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
                country('Australia')
                record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
            }
            car(name: 'P50', make: 'Peel', year: 1962) {
                country('Isle of Man')
                record(type: 'size', 'Smallest Street-Legal Car at 99cm wide and 59 kg in weight')
            }
            car(name: 'Royale', make: 'Bugatti', year: 1931) {
                country('France')
                record(type: 'price', 'Most Valuable Car at $15 million')
            }
        }
    }

    def records = new XmlSlurper().parseText(xml.toString()) // (2)

    assert records.car.size() == 3
    assert records.car.find { it.@name == 'P50' }.country.text() == 'Isle of Man'
(1) Note that StreamingMarkupBuilder.bind returns a Writable instance that may be used to stream the markup to a Writer
(2) We’re capturing the output in a String to parse it again and check the structure of the generated XML with XmlSlurper.
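
Since bind returns a Writable, the markup does not have to go through toString(). The sketch below, which is illustrative and not part of the original guide, streams it into a StringWriter with writeTo; any other java.io.Writer (a file or response writer, for example) could stand in its place.

```groovy
import groovy.xml.StreamingMarkupBuilder

def markup = new StreamingMarkupBuilder().bind {
    records {
        car(name: 'P50', make: 'Peel', year: 1962)
    }
}

assert markup instanceof Writable

// Stream the markup into a Writer instead of building a String first
def out = new StringWriter()
markup.writeTo(out)

def streamed = out.toString()
assert streamed.contains('<records>')
assert streamed.contains('P50')
```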

3.3. MarkupBuilderHelper

The groovy.xml.MarkupBuilderHelper is, as its name reflects, a helper for groovy.xml.MarkupBuilder.

This helper normally can be accessed from within an instance of class groovy.xml.MarkupBuilder or an instance of groovy.xml.StreamingMarkupBuilder.

This helper could be handy in situations when you may want to:

  • Produce a comment in the output

  • Produce an XML processing instruction in the output

  • Produce an XML declaration in the output

  • Print data in the body of the current tag, escaping XML entities

  • Print data in the body of the current tag

In both MarkupBuilder and StreamingMarkupBuilder this helper is accessed by the property mkp:

Using MarkupBuilder’s 'mkp'
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter).rules {
        mkp.comment('THIS IS THE MAIN RULE') // (1)
        rule(sentence: mkp.yield('3 > n')) // (2)
    }

    // (3)
    assert xmlWriter.toString().contains('3 &gt; n')
    assert xmlWriter.toString().contains('<!-- THIS IS THE MAIN RULE -->')
(1) Using mkp to create a comment in the XML
(2) Using mkp to generate an escaped value
(3) Checking both assumptions were true

Here is another example showing the mkp property, accessible from within the bind method scope when using StreamingMarkupBuilder:

Using StreamingMarkupBuilder’s 'mkp'
    def xml = new StreamingMarkupBuilder().bind {
        records {
            car(name: mkp.yield('3 < 5')) // (1)
            car(name: mkp.yieldUnescaped('1 < 3')) // (2)
        }
    }

    assert xml.toString().contains('3 &lt; 5')
    assert xml.toString().contains('1 < 3')
(1) Generating an escaped value for the name attribute with mkp.yield
(2) Generating an unescaped value with mkp.yieldUnescaped; both values are checked afterwards in the generated String
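
The remaining helper methods from the list above (XML declaration, processing instruction, comment) can be sketched with MarkupBuilder as follows. The stylesheet name and the comment text are made up for illustration, and the assertions only check for fragments of the output rather than its exact layout.

```groovy
import groovy.xml.MarkupBuilder

def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

// XML declaration and a processing instruction before the root element
builder.mkp.xmlDeclaration(version: '1.0', encoding: 'UTF-8')
builder.mkp.pi('xml-stylesheet': [href: 'style.css', type: 'text/css'])

builder.records {
    mkp.comment('illustrative data only')
    car(name: 'P50')
}

def output = writer.toString()
assert output.contains('<?xml version=')
assert output.contains('<?xml-stylesheet')
assert output.contains('illustrative data only')
assert output.contains('<records>')
```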

3.4. DOMToGroovy

Suppose we have an existing XML document and we want to automate generation of the markup without having to type it all in. We just need to use groovy.xml.DomToGroovy (located at org.codehaus.groovy.tools.xml.DomToGroovy in older Groovy versions) as shown in the following example:

Building MarkupBuilder from DomToGroovy
    import groovy.xml.DomToGroovy
    import groovy.xml.XmlSlurper

    def songs = """
        <songs>
          <song>
            <title>Here I go</title>
            <band>Whitesnake</band>
          </song>
        </songs>
    """

    def builder =
            javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()

    def inputStream = new ByteArrayInputStream(songs.bytes)
    def document = builder.parse(inputStream)
    def output = new StringWriter()
    def converter = new DomToGroovy(new PrintWriter(output)) // (1)

    converter.print(document) // (2)

    String xmlRecovered =
            new GroovyShell()
                    .evaluate("""
               def writer = new StringWriter()
               def builder = new groovy.xml.MarkupBuilder(writer)
               builder.${output}

               return writer.toString()
            """) // (3)

    assert new XmlSlurper().parseText(xmlRecovered).song.title.text() == 'Here I go' // (4)
(1) Creating a DomToGroovy instance
(2) Converting the XML to MarkupBuilder calls, which are available in the output StringWriter
(3) Using the output variable to create the whole MarkupBuilder
(4) Back to an XML string

4. Manipulating XML

In this chapter you’ll see the different ways of adding / modifying / removing nodes using XmlSlurper or XmlParser. The XML we are going to be handling is the following:

    def xml = """
    <response version-api="2.0">
        <value>
            <books>
                <book>
                    <title>Don Quixote</title>
                    <author>Miguel de Cervantes</author>
                </book>
            </books>
        </value>
    </response>
    """

4.1. Adding nodes

The main difference between XmlSlurper and XmlParser is that when the former creates nodes they won’t be available until the document has been evaluated again, so you have to parse the transformed document again in order to see the new nodes. Keep that in mind when choosing between the two approaches.

If you need to see a node right after creating it then XmlParser should be your choice, but if you’re planning to make many changes to the XML and send the result to another process, XmlSlurper may be more efficient.

You can’t create a new node directly using the XmlSlurper instance, but you can with XmlParser. The way of creating a new node from XmlParser is through its method createNode(..)

    import groovy.namespace.QName
    import groovy.xml.XmlParser

    def parser = new XmlParser()
    def response = parser.parseText(xml)
    def numberOfResults = parser.createNode(
            response,
            new QName("numberOfResults"),
            [:]
    )

    numberOfResults.value = "1"
    assert response.numberOfResults.text() == "1"

The createNode() method receives the following parameters:

  • parent node (could be null)

  • The qualified name for the tag (in this case we only use the local part without any namespace). We’re using an instance of groovy.namespace.QName

  • A map with the tag’s attributes (none in this particular case)

Anyway, you won’t normally be creating a node from the parser instance but from the parsed XML instance, that is, from a Node or a GPathResult instance.

Take a look at the next example. We are parsing the XML with XmlParser and then creating a new node from the parsed document’s instance (notice the method here is slightly different in the way it receives the parameters):

    def parser = new XmlParser()
    def response = parser.parseText(xml)

    response.appendNode(
            new QName("numberOfResults"),
            [:],
            "1"
    )

    assert response.numberOfResults.text() == "1"

When using XmlSlurper, GPathResult instances don’t have a createNode() method.
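
A GPathResult does, however, offer appendNode, which accepts a closure written in builder style. The sketch below is an illustration under that assumption rather than an excerpt from the guide; it uses a shortened document and shows that the appended node only becomes visible after the result is serialized and parsed again.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

// Shortened, illustrative document
def source = '<response><value><books/></value></response>'
def response = new XmlSlurper().parseText(source)

// Append a node using builder syntax
response.appendNode {
    numberOfResults('1')
}

// Lazy evaluation: the new node is not visible yet
def visibleBefore = response.numberOfResults.size()

// Serialize and parse again to see the change
def serialized = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
def reparsed = new XmlSlurper().parseText(serialized)

assert visibleBefore == 0
assert reparsed.numberOfResults.text() == '1'
```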

4.2. Modifying / Removing nodes

We know how to parse the document and add new nodes; now we want to change a given node’s content. Let’s start with XmlParser and Node. This example replaces the first book’s information with that of another book.

    def response = new XmlParser().parseText(xml)

    /* Use the same syntax as groovy.xml.MarkupBuilder */
    response.value.books.book[0].replaceNode {
        book(id: "3") {
            title("To Kill a Mockingbird")
            author(id: "3", "Harper Lee")
        }
    }

    def newNode = response.value.books.book[0]

    assert newNode.name() == "book"
    assert newNode.@id == "3"
    assert newNode.title.text() == "To Kill a Mockingbird"
    assert newNode.author.text() == "Harper Lee"
    assert newNode.author.@id.first() == "3"

When using replaceNode() the closure we pass as parameter should follow the same rules as if we were using groovy.xml.MarkupBuilder.

Here’s the same example using XmlSlurper:

    def response = new XmlSlurper().parseText(xml)

    /* Use the same syntax as groovy.xml.MarkupBuilder */
    response.value.books.book[0].replaceNode {
        book(id: "3") {
            title("To Kill a Mockingbird")
            author(id: "3", "Harper Lee")
        }
    }

    assert response.value.books.book[0].title.text() == "Don Quixote"

    /* That mkp is a special namespace used to escape away from the normal building mode
       of the builder and get access to helper markup methods
       'yield', 'pi', 'comment', 'out', 'namespaces', 'xmlDeclaration' and
       'yieldUnescaped' */
    def result = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
    def changedResponse = new XmlSlurper().parseText(result)

    assert changedResponse.value.books.book[0].title.text() == "To Kill a Mockingbird"

Notice how with XmlSlurper we have to parse the transformed document again in order to find the created nodes. In this particular example that can be a little annoying.
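
Removing a node is not shown above, so here is a minimal sketch using XmlParser, assuming a small two-book document made up for this illustration. With a Node tree the parent’s remove method takes the child out immediately.

```groovy
import groovy.xml.XmlParser

// Illustrative document with two books
def source = '''<response><value><books>
  <book id="1"><title>Don Quixote</title></book>
  <book id="2"><title>Catcher in the Rye</title></book>
</books></value></response>'''

def response = new XmlParser().parseText(source)

def booksNode = response.value.books[0]                  // the <books> Node
def toRemove = booksNode.book.find { it.@id == '1' }     // the child to drop

booksNode.remove(toRemove)

// The change is visible right away
assert booksNode.book.size() == 1
assert booksNode.book[0].title.text() == 'Catcher in the Rye'
```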

Finally, both parsers use the same approach for adding a new attribute to a given node, and in both cases the new attribute is available right away. First XmlParser:

    def parser = new XmlParser()
    def response = parser.parseText(xml)

    response.@numberOfResults = "1"

    assert response.@numberOfResults == "1"

And XmlSlurper:

    def response = new XmlSlurper().parseText(xml)

    response.@numberOfResults = "2"

    assert response.@numberOfResults == "2"

When using XmlSlurper, adding a new attribute does not require you to perform a new evaluation.

4.3. Printing XML

4.3.1. XmlUtil

Sometimes it is useful to get not only the value of a given node but the node itself (for instance to add this node to another XML document).

For that you can use the groovy.xml.XmlUtil class. It has several static methods to serialize an XML fragment from several types of sources (Node, GPathResult, String…​)

Getting a node as a string
    import groovy.xml.XmlParser
    import groovy.xml.XmlUtil

    def response = new XmlParser().parseText(xml)
    def nodeToSerialize = response.'**'.find { it.name() == 'author' }
    def nodeAsText = XmlUtil.serialize(nodeToSerialize)

    assert nodeAsText ==
            XmlUtil.serialize('<?xml version="1.0" encoding="UTF-8"?><author>Miguel de Cervantes</author>')
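
The same serialize call also accepts other source types. The following sketch, using a short document made up for illustration, serializes a plain String and a GPathResult produced by XmlSlurper; the assertions check only for fragments of the output because the exact indentation is left to the serializer.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Illustrative document
def source = '<books><book><title>Don Quixote</title></book></books>'

// From a String
def fromString = XmlUtil.serialize(source)

// From a GPathResult
def fromGPath = XmlUtil.serialize(new XmlSlurper().parseText(source))

assert fromString.startsWith('<?xml')
assert fromString.contains('<title>Don Quixote</title>')
assert fromGPath.contains('<title>Don Quixote</title>')
```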
Last updated 2025-10-15 09:28:07 +1000
