1. Parsing XML

1.1. XmlParser and XmlSlurper

The most commonly used approach for parsing XML with Groovy is to use one of:

groovy.xml.XmlParser
groovy.xml.XmlSlurper

Both take the same approach to parsing XML. Both come with a number of overloaded parse methods plus some special methods such as parseText, parseFile and others. The next examples use the parseText method. It parses an XML String and recursively converts it to a list or map of objects.
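As a minimal sketch of one of the overloaded parse methods, the snippet below feeds the parser from a java.io.Reader instead of a String. It assumes Groovy 3 or later, where the class lives in the groovy.xml package; the sample document and variable names are made up for illustration.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical sample document, used only for this sketch
def reader = new StringReader('<list><technology><name>Groovy</name></technology></list>')

// parse(Reader) is one of the overloaded parse methods
def list = new XmlSlurper().parse(reader)

assert list.name() == 'list'
assert list.technology.name.text() == 'Groovy'
```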

XmlSlurper

def text = '''
    <list>
        <technology>
            <name>Groovy</name>
        </technology>
    </list>
'''

def list = new XmlSlurper().parseText(text)                      // (1)

assert list instanceof groovy.xml.slurpersupport.GPathResult     // (2)
assert list.technology.name == 'Groovy'                          // (3)

1	Parsing the XML and returning the root node as a GPathResult
2	Checking we’re using a GPathResult
3	Traversing the tree in a GPath style

XmlParser

def text = '''
    <list>
        <technology>
            <name>Groovy</name>
        </technology>
    </list>
'''

def list = new XmlParser().parseText(text)                // (1)

assert list instanceof groovy.util.Node                   // (2)
assert list.technology.name.text() == 'Groovy'            // (3)

1	Parsing the XML and returning the root node as a Node
2	Checking we’re using a Node
3	Traversing the tree in a GPath style

Let’s see the similarities between XmlParser and XmlSlurper first:

Both are based on SAX, so both have a low memory footprint
Both can update/transform the XML

But they have key differences:

XmlSlurper evaluates the structure lazily. So if you update the xml you’ll have to evaluate the whole tree again.
XmlSlurper returns GPathResult instances when parsing XML
XmlParser returns Node objects when parsing XML

When to use one or the other?

There is a discussion at StackOverflow. The conclusions written here are based partially on this entry.

If you want to transform an existing document to another then XmlSlurper will be the choice
If you want to update and read at the same time then XmlParser is the choice.

The rationale behind this is that every time you create a node with XmlSlurper it won’t be available until you parse the document again with another XmlSlurper instance.

If you just have to read a few nodes, XmlSlurper should be your choice, since it will not have to create a complete structure in memory.

In general both classes perform in a similar way. Even the way of using GPath expressions with them is the same (both use breadthFirst() and depthFirst() expressions). So the choice mostly depends on the write/read frequency.
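To illustrate that the same GPath expression works with both parsers, here is a small sketch. It assumes Groovy 3 or later (classes in groovy.xml); the sample document and variable names are invented for this example.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Hypothetical sample document
def text = '''
<list>
    <technology><name>Groovy</name></technology>
    <technology><name>Java</name></technology>
</list>
'''

def slurped = new XmlSlurper().parseText(text)   // root is a GPathResult
def parsed = new XmlParser().parseText(text)     // root is a groovy.util.Node

// The same depthFirst() expression is applied to both results
def slurpedNames = slurped.depthFirst().findAll { it.name() == 'name' }*.text()
def parsedNames = parsed.depthFirst().findAll { it.name() == 'name' }*.text()

assert slurpedNames == ['Groovy', 'Java']
assert parsedNames == ['Groovy', 'Java']
```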

1.2. DOMCategory

There is another way of parsing XML documents with Groovy: groovy.xml.dom.DOMCategory, a category class which adds GPath style operations to Java’s DOM classes.

Java has in-built support for DOM processing of XML using classes representing the various parts of XML documents, e.g. Document, Element, NodeList, Attr etc. For more information about these classes, refer to the respective JavaDocs.

Having an XML like the following:

static def CAR_RECORDS = '''
<records>
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
    <record type='speed'>Production Pickup Truck with speed of 271kph</record>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
    <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
  </car>
  <car name='Royale' make='Bugatti' year='1931'>
    <country>France</country>
    <record type='price'>Most Valuable Car at $15 million</record>
  </car>
</records>
'''

You can parse it using groovy.xml.DOMBuilder and groovy.xml.dom.DOMCategory.

def reader = new StringReader(CAR_RECORDS)
def doc = DOMBuilder.parse(reader)          // (1)
def records = doc.documentElement

use(DOMCategory) {                          // (2)
    assert records.car.size() == 3
}

1	Parsing the XML
2	Creating a `DOMCategory` scope to be able to use helper method calls
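Inside the DOMCategory scope, attributes and child elements can be read in GPath style as well. The following is only a sketch under the assumption that the '@attribute' and text() helpers of DOMCategory behave as described in its API documentation; the shortened document is a made-up subset of the car records.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Hypothetical, shortened version of the car records
def xml = '''<records>
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
  </car>
</records>'''

def records = DOMBuilder.parse(new StringReader(xml)).documentElement

def firstMake
def firstCountry
use(DOMCategory) {
    def firstCar = records.car[0]
    firstMake = firstCar.'@make'             // attribute access
    firstCountry = firstCar.country.text()   // text of the child element
}

assert firstMake == 'Holden'
assert firstCountry == 'Australia'
```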

2. GPath

The most common way of querying XML in Groovy is using GPath:

GPath is a path expression language integrated into Groovy which allows parts of nested structured data to be identified. In this sense, it has similar aims and scope as XPath does for XML. The two main places where you use GPath expressions are when dealing with nested POJOs or when dealing with XML.

It is similar to XPath expressions and you can use it not only with XML but also with POJO classes. As an example, you can specify a path to an object or element of interest:

a.b.c → for XML, yields all the <c> elements inside <b> inside <a>
a.b.c → for POJOs, yields the <c> properties for all the <b> properties of <a> (sort of like a.getB().getC() in JavaBeans)
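The POJO case can be sketched as follows. The classes A and B are hypothetical and exist only to mirror the a.b.c path used above.

```groovy
// Hypothetical classes, named after the a.b.c path
class B { String c }
class A { List<B> b }

def a = new A(b: [new B(c: 'first'), new B(c: 'second')])

// a.b is a List<B>; asking the list for 'c' collects that property from every element
def allC = a.b.c

assert allC == ['first', 'second']
```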

For XML, you can also specify attributes, e.g.:

a["@href"] → the href attribute of all the a elements
a.'@href' → an alternative way of expressing this
a.@href → an alternative way of expressing this when using XmlSlurper

Let’s illustrate this with an example:

static final String books = '''
    <response version-api="2.0">
        <value>
            <books>
                <book available="20" id="1">
                    <title>Don Quixote</title>
                    <author id="1">Miguel de Cervantes</author>
                </book>
                <book available="14" id="2">
                    <title>Catcher in the Rye</title>
                    <author id="2">JD Salinger</author>
                </book>
                <book available="13" id="3">
                    <title>Alice in Wonderland</title>
                    <author id="3">Lewis Carroll</author>
                </book>
                <book available="5" id="4">
                    <title>Don Quixote</title>
                    <author id="4">Miguel de Cervantes</author>
                </book>
            </books>
        </value>
    </response>
'''

2.1. Simply traversing the tree

The first thing we could do is to get a value using POJO notation. Let’s get the first book’s author’s name.

Getting node value

def response = new XmlSlurper().parseText(books)
def authorResult = response.value.books.book[0].author

assert authorResult.text() == 'Miguel de Cervantes'

First we parse the document with XmlSlurper and then we have to consider the returned value as the root of the XML document, which in this case is "response".

That’s why we start traversing the document from response and then value.books.book[0].author. Note that in XPath node arrays start at [1] instead of [0], but because GPath is Java-based it begins at index 0.

In the end we’ll have the instance of the author node, and because we wanted the text inside that node we call the text() method. The author node is an instance of the GPathResult type and text() is a method giving us the content of that node as a String.

When using GPath with an XML parsed with XmlSlurper we’ll have as a result a GPathResult object. GPathResult has many other convenient methods to convert the text inside a node to any other type such as:

toInteger()
toFloat()
toBigInteger()
…

All these methods try to convert a String to the appropriate type.
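A short sketch of these conversions, here applied to attribute values. The document, the price attribute and the variable names are assumptions made for this example only.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical stock document
def stock = new XmlSlurper().parseText('''
<books>
    <book available="20" price="9.5"/>
    <book available="14" price="7.25"/>
</books>
''')

def first = stock.book[0]
assert first.@available.toInteger() == 20
assert first.@price.toFloat() == 9.5f
assert first.@available.toBigInteger() == 20G

// Converted values can be used in ordinary arithmetic
def total = stock.book.collect { it.@available.toInteger() }.sum()
assert total == 34
```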

If we were using an XML parsed with XmlParser we would be dealing with instances of type Node. But still all the actions applied to GPathResult in these examples could be applied to a Node as well. Creators of both parsers took GPath compatibility into account.
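For comparison, the same kind of traversal with XmlParser might look like the sketch below. The shortened document is an assumption; note that an attribute read from a Node is a plain String.

```groovy
import groovy.xml.XmlParser

// Hypothetical, shortened version of the books document
def library = new XmlParser().parseText('''
<response>
    <value>
        <books>
            <book id="1">
                <title>Don Quixote</title>
                <author>Miguel de Cervantes</author>
            </book>
        </books>
    </value>
</response>
''')

def book = library.value.books.book[0]

assert book instanceof groovy.util.Node
assert book.author.text() == 'Miguel de Cervantes'
assert book.@id == '1'
assert book.@id.toInteger() == 1
```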

The next step is to get some values from a given node’s attribute. In the following sample we want to get the first book’s id. We’ll be using two different approaches. Let’s see the code first:

Getting an attribute’s value

def response = new XmlSlurper().parseText(books)

def book = response.value.books.book[0]     // (1)
def bookAuthorId1 = book.@id                // (2)
def bookAuthorId2 = book['@id']             // (3)

assert bookAuthorId1 == '1'                 // (4)
assert bookAuthorId1.toInteger() == 1       // (5)
assert bookAuthorId1 == bookAuthorId2

1	Getting the first book node
2	Getting the book’s id attribute `@id`
3	Getting the book’s id attribute with map notation `['@id']`
4	Getting the value as a String
5	Getting the value of the attribute as an `Integer`

As you can see there are two types of notation to get attributes:

direct notation with @nameoftheattribute
map notation using ['@nameoftheattribute']

Both of them are equally valid.

2.2. Flexible navigation with children (*), depthFirst (**) and breadthFirst

If you ever have used XPath, you may have used expressions like:

/following-sibling::othernode : Look for a node "othernode" in the same level
// : Look everywhere

More or less we have their counterparts in GPath with the shortcuts * (aka children()) and ** (aka depthFirst()).

The first example shows a simple use of *, which only iterates over the direct children of the node.

Using *

def response = new XmlSlurper().parseText(books)

// .'*' could be replaced by .children()
def catcherInTheRye = response.value.books.'*'.find { node ->
    // node.@id == 2 could be expressed as node['@id'] == 2
    node.name() == 'book' && node.@id == '2'
}

assert catcherInTheRye.title.text() == 'Catcher in the Rye'

This test searches for any child nodes of the "books" node matching the given condition. In a bit more detail, the expression says: Look for any node with a tag name equal to 'book' having an id with a value of '2' directly under the 'books' node.

This operation roughly corresponds to the breadthFirst() method, except that it stops at one level instead of continuing to the inner levels.

What if we would like to look for a given value without having to know exactly where it is? Let’s say that the only thing we know is the name of the author, "Lewis Carroll". How are we going to be able to find the id of that book? Using ** is the solution:

Using **

def response = new XmlSlurper().parseText(books)

// .'**' could be replaced by .depthFirst()
def bookId = response.'**'.find { book ->
    book.author.text() == 'Lewis Carroll'
}.@id

assert bookId == 3

** is the same as looking for something everywhere in the tree from this point down. In this case, we’ve used the method find(Closure cl) to find just the first occurrence.

What if we want to collect all the books' titles? That’s easy, just use findAll:

def response = new XmlSlurper().parseText(books)
def titles = response.'**'.findAll { node -> node.name() == 'title' }*.text()

assert titles.size() == 4

In the last two examples, ** is used as a shortcut for the depthFirst() method. It goes as far down the tree as it can while navigating down the tree from a given node. The breadthFirst() method finishes off all nodes on a given level before traversing down to the next level.

The following example shows the difference between these two methods:

depthFirst() vs breadthFirst()

def response = new XmlSlurper().parseText(books)
def nodeName = { node -> node.name() }
def withId2or3 = { node -> node.@id in [2, 3] }

assert ['book', 'author', 'book', 'author'] ==
        response.value.books.depthFirst().findAll(withId2or3).collect(nodeName)
assert ['book', 'book', 'author', 'author'] ==
        response.value.books.breadthFirst().findAll(withId2or3).collect(nodeName)

In this example, we search for any nodes with an id attribute with value 2 or 3. There are both book and author nodes that match those criteria. The different traversal orders will find the same nodes in each case but in different orders corresponding to how the tree was traversed.

It is worth mentioning again that there are some useful methods converting a node’s value to an integer, float, etc. Those methods could be convenient when doing comparisons like this:

helpers

def response = new XmlSlurper().parseText(books)

def titles = response.value.books.book.findAll { book ->
    /* You can use toInteger() over the GPathResult object */
    book.@id.toInteger() > 2
}*.title

assert titles.size() == 2

In this case the number 2 has been hardcoded, but imagine that value could have come from any other source (database… etc.).

3. Creating XML

The most commonly used approach for creating XML with Groovy is to use a builder, i.e. one of:

groovy.xml.MarkupBuilder
groovy.xml.StreamingMarkupBuilder

3.1. MarkupBuilder

Here is an example of using Groovy’s MarkupBuilder to create a new XML file:

Creating Xml with MarkupBuilder

def writer = new StringWriter()
def xml = new MarkupBuilder(writer)                                 // (1)

xml.records() {                                                     // (2)
    car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
        country('Australia')
        record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
    }
    car(name: 'Royale', make: 'Bugatti', year: 1931) {
        country('France')
        record(type: 'price', 'Most Valuable Car at $15 million')
    }
}

def records = new XmlSlurper().parseText(writer.toString())         // (3)

// name is an attribute of <car>, so it is read with the @ notation
assert records.car.first().@name.text() == 'HSV Maloo'
assert records.car.last().@name.text() == 'Royale'

1	Create an instance of `MarkupBuilder`
2	Start creating the XML tree
3	Create an instance of `XmlSlurper` to traverse and test the generated XML

Let’s take a closer look:

Creating XML elements

def xmlString = "<movie>the godfather</movie>"      // (1)
def xmlWriter = new StringWriter()                  // (2)
def xmlMarkup = new MarkupBuilder(xmlWriter)

xmlMarkup.movie("the godfather")                    // (3)

assert xmlString == xmlWriter.toString()            // (4)

1	We’re creating a reference string to compare against
2	The `xmlWriter` instance is used by `MarkupBuilder` to convert the xml representation to a String instance eventually
3	The `xmlMarkup.movie(…)` call will create an XML node with a tag called `movie` and with content `the godfather`.
4	Checking that the generated XML matches the reference string

Creating XML elements with attributes

def xmlString = "<movie id='2'>the godfather</movie>"
def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

xmlMarkup.movie(id: "2", "the godfather")           // (1)

assert xmlString == xmlWriter.toString()

1	This time, in order to create both attributes and node content, you can create as many map entries as you like and finally add a value to set the node’s content

The value could be any Object; it will be serialized to its String representation.
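As a small sketch of that last point, the node content below is an Integer rather than a String. The value 1972 is just an illustrative choice.

```groovy
import groovy.xml.MarkupBuilder

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

// An Integer is passed as the node content instead of a String
xmlMarkup.movie(id: 2, 1972)

def generated = xmlWriter.toString()
assert generated == "<movie id='2'>1972</movie>"
```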

Creating XML nested elements

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

xmlMarkup.movie(id: 2) {                            // (1)
    name("the godfather")
}

def movie = new XmlSlurper().parseText(xmlWriter.toString())

assert movie.@id == 2
assert movie.name.text() == 'the godfather'

1	A closure represents the children elements of a given node. Notice this time instead of using a String for the attribute we’re using a number.

Sometimes you may want to use a specific namespace in your xml documents:

Namespace aware

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

xmlMarkup
        .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {      // (1)
    'x:movie'(id: 1, 'the godfather')
    'x:movie'(id: 2, 'ronin')
}

def movies =
        new XmlSlurper()                                            // (2)
                .parseText(xmlWriter.toString())
                .declareNamespace(x: 'http://www.groovy-lang.org')

assert movies.'x:movie'.last().@id == 2
assert movies.'x:movie'.last().text() == 'ronin'

1	Creating a node with a given namespace `xmlns:x`
2	Creating an `XmlSlurper` and registering the namespace to be able to test the XML we just created

What about a more meaningful example? We may want to generate more elements, or to have some logic when creating our XML:

Mix code

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

xmlMarkup
        .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
    (1..3).each { n ->                                              // (1)
        'x:movie'(id: n, "the godfather $n")
        if (n % 2 == 0) {                                           // (2)
            'x:movie'(id: n, "the godfather $n (Extended)")
        }
    }
}

def movies =
        new XmlSlurper()
                .parseText(xmlWriter.toString())
                .declareNamespace(x: 'http://www.groovy-lang.org')

assert movies.'x:movie'.size() == 4
assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }

1	Generating elements from a range
2	Using a conditional for creating a given element

Of course the instance of a builder can be passed as a parameter to refactor/modularize your code:

Mix code

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter)

Closure<MarkupBuilder> buildMovieList = { MarkupBuilder builder ->  // (1)
    (1..3).each { n ->
        builder.'x:movie'(id: n, "the godfather $n")
        if (n % 2 == 0) {
            builder.'x:movie'(id: n, "the godfather $n (Extended)")
        }
    }

    return builder
}

xmlMarkup.'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
    buildMovieList(xmlMarkup)                                       // (2)
}

def movies =
        new XmlSlurper()
                .parseText(xmlWriter.toString())
                .declareNamespace(x: 'http://www.groovy-lang.org')

assert movies.'x:movie'.size() == 4
assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }

1	In this case we’ve created a Closure to handle the creation of a list of movies
2	Just using the`buildMovieList` function when necessary

3.2. StreamingMarkupBuilder

The class groovy.xml.StreamingMarkupBuilder is a builder class for creating XML markup. This implementation uses a groovy.xml.streamingmarkupsupport.StreamingMarkupWriter to handle output.

Using StreamingMarkupBuilder

def xml = new StreamingMarkupBuilder().bind {                       // (1)
    records {
        car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
            country('Australia')
            record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
        }
        car(name: 'P50', make: 'Peel', year: 1962) {
            country('Isle of Man')
            record(type: 'size', 'Smallest Street-Legal Car at 99cm wide and 59 kg in weight')
        }
        car(name: 'Royale', make: 'Bugatti', year: 1931) {
            country('France')
            record(type: 'price', 'Most Valuable Car at $15 million')
        }
    }
}

def records = new XmlSlurper().parseText(xml.toString())            // (2)

assert records.car.size() == 3
assert records.car.find { it.@name == 'P50' }.country.text() == 'Isle of Man'

1	Note that `StreamingMarkupBuilder.bind` returns a `Writable` instance that may be used to stream the markup to a Writer
2	We’re capturing the output in a String to parse it again and check the structure of the generated XML with `XmlSlurper`.
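Streaming the Writable to a Writer, as mentioned in the first callout, could look like the following sketch. The StringWriter stands in for any java.io.Writer (a file or socket writer, for instance), and the variable names are illustrative.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

def markup = new StreamingMarkupBuilder().bind {
    records {
        car(name: 'P50', make: 'Peel')
    }
}
assert markup instanceof Writable

// Write the markup to a Writer instead of calling toString()
def out = new StringWriter()
markup.writeTo(out)

def records = new XmlSlurper().parseText(out.toString())
assert records.car.@name == 'P50'
```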

3.3. MarkupBuilderHelper

The groovy.xml.MarkupBuilderHelper is, as its name reflects, a helper for groovy.xml.MarkupBuilder.

This helper normally can be accessed from within an instance of class groovy.xml.MarkupBuilder or an instance of groovy.xml.StreamingMarkupBuilder.

This helper could be handy in situations when you may want to:

Produce a comment in the output
Produce an XML processing instruction in the output
Produce an XML declaration in the output
Print data in the body of the current tag, escaping XML entities
Print data in the body of the current tag

In both MarkupBuilder and StreamingMarkupBuilder this helper is accessed by the property mkp:

Using MarkupBuilder’s 'mkp'

def xmlWriter = new StringWriter()
def xmlMarkup = new MarkupBuilder(xmlWriter).rules {
    mkp.comment('THIS IS THE MAIN RULE')                            // (1)
    rule(sentence: mkp.yield('3 > n'))                              // (2)
}

// (3)
assert xmlWriter.toString().contains('3 &gt; n')
assert xmlWriter.toString().contains('<!-- THIS IS THE MAIN RULE -->')

1	Using `mkp` to create a comment in the XML
2	Using `mkp` to generate an escaped value
3	Checking both assumptions were true

Here is another example to show the use of the mkp property accessible from within the bind method scope when using StreamingMarkupBuilder:

Using StreamingMarkupBuilder’s 'mkp'

def xml = new StreamingMarkupBuilder().bind {
    records {
        car(name: mkp.yield('3 < 5'))                               // (1)
        car(name: mkp.yieldUnescaped('1 < 3'))                      // (2)
    }
}

assert xml.toString().contains('3 &lt; 5')
assert xml.toString().contains('1 < 3')

1	Generating an escaped value for the name attribute with `mkp.yield`
2	Generating an unescaped value with `mkp.yieldUnescaped`; both values are checked afterwards against the generated String
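The XML declaration from the list of helper uses above can be sketched in the same way. This is a minimal illustration assuming the mkp.xmlDeclaration() helper of StreamingMarkupBuilder; the element names and the comment text are made up.

```groovy
import groovy.xml.StreamingMarkupBuilder

def xml = new StreamingMarkupBuilder().bind {
    mkp.xmlDeclaration()                        // <?xml ... ?> prolog
    mkp.comment('generated for illustration')   // XML comment
    rules {
        rule { mkp.yield('3 > n') }             // escaped body text
    }
}.toString()

assert xml.startsWith('<?xml')
assert xml.contains('generated for illustration')
assert xml.contains('3 &gt; n')
```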

3.4. DomToGroovy

Suppose we have an existing XML document and we want to automate generation of the markup without having to type it all in. We just need to use org.codehaus.groovy.tools.xml.DomToGroovy as shown in the following example:

Building MarkupBuilder from DomToGroovy

def songs = """
    <songs>
      <song>
        <title>Here I go</title>
        <band>Whitesnake</band>
      </song>
    </songs>
"""

def builder =
        javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()

def inputStream = new ByteArrayInputStream(songs.bytes)
def document = builder.parse(inputStream)
def output = new StringWriter()
def converter = new DomToGroovy(new PrintWriter(output))            // (1)

converter.print(document)                                           // (2)

String xmlRecovered =
        new GroovyShell()
                .evaluate("""
           def writer = new StringWriter()
           def builder = new groovy.xml.MarkupBuilder(writer)
           builder.${output}

           return writer.toString()
        """)                                                        // (3)

assert new XmlSlurper().parseText(xmlRecovered).song.title.text() == 'Here I go'    // (4)

1	Creating a `DomToGroovy` instance
2	Converts the XML to `MarkupBuilder` calls which are available in the output `StringWriter`
3	Using the `output` variable to create the whole MarkupBuilder
4	Back to XML string

4. Manipulating XML

In this chapter you’ll see the different ways of adding / modifying / removing nodes using XmlSlurper or XmlParser. The xml we are going to be handling is the following:

def xml = """
<response version-api="2.0">
    <value>
        <books>
            <book>
                <title>Don Quixote</title>
                <author>Miguel de Cervantes</author>
            </book>
        </books>
    </value>
</response>
"""

4.1. Adding nodes

The main difference between XmlSlurper and XmlParser is that when the former creates nodes they won’t be available until the document has been evaluated again, so you should parse the transformed document again in order to see the new nodes. Keep that in mind when choosing between the two approaches.

If you need to see a node right after creating it then XmlParser should be your choice, but if you’re planning to make many changes to the XML and send the result to another process, XmlSlurper may be more efficient.

You can’t create a new node directly using the XmlSlurper instance, but you can with XmlParser. The way of creating a new node from XmlParser is through its method createNode(..)

def parser = new XmlParser()
def response = parser.parseText(xml)
def numberOfResults = parser.createNode(
        response,
        new QName("numberOfResults"),
        [:]
)

numberOfResults.value = "1"
assert response.numberOfResults.text() == "1"

The createNode() method receives the following parameters:

parent node (could be null)
The qualified name for the tag (in this case we only use the local part without any namespace). We’re using an instance of groovy.namespace.QName
A map with the tag’s attributes (none in this particular case)

Anyway, you won’t normally be creating a node from the parser instance but from the parsed XML instance, that is, from a Node or a GPathResult instance.

Take a look at the next example. We are parsing the xml with XmlParser and then creating a new node from the parsed document’s instance (notice the method here is slightly different in the way it receives the parameters):

def parser = new XmlParser()
def response = parser.parseText(xml)

response.appendNode(
        new QName("numberOfResults"),
        [:],
        "1"
)

assert response.numberOfResults.text() == "1"

When using XmlSlurper, GPathResult instances don’t have a createNode() method.
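With XmlSlurper a node can still be appended through the parsed result, but it only becomes visible after the document is serialized and parsed again. The sketch below assumes GPathResult's appendNode method accepts a builder-style closure; the shortened document and variable names are made up for illustration.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

// Hypothetical, shortened document
def response = new XmlSlurper().parseText('''
<response>
    <books>
        <book><title>Don Quixote</title></book>
    </books>
</response>
''')

// Same closure syntax as groovy.xml.MarkupBuilder
response.books.appendNode {
    book {
        title('To Kill a Mockingbird')
    }
}

// Serialize the modified tree and parse it again to see the new node
def result = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
def changed = new XmlSlurper().parseText(result)

assert changed.books.book.size() == 2
assert changed.books.book[1].title.text() == 'To Kill a Mockingbird'
```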

4.2. Modifying / Removing nodes

We know how to parse the document and add new nodes; now we want to change a given node’s content. Let’s start using XmlParser and Node. This example replaces the first book’s information with that of another book.

def response = new XmlParser().parseText(xml)

/* Use the same syntax as groovy.xml.MarkupBuilder */
response.value.books.book[0].replaceNode {
    book(id: "3") {
        title("To Kill a Mockingbird")
        author(id: "3", "Harper Lee")
    }
}

def newNode = response.value.books.book[0]

assert newNode.name() == "book"
assert newNode.@id == "3"
assert newNode.title.text() == "To Kill a Mockingbird"
assert newNode.author.text() == "Harper Lee"
assert newNode.author.@id.first() == "3"

When using replaceNode() the closure we pass as parameter should follow the same rules as if we were using groovy.xml.MarkupBuilder.

Here’s the same example using XmlSlurper:

def response = new XmlSlurper().parseText(books)

/* Use the same syntax as groovy.xml.MarkupBuilder */
response.value.books.book[0].replaceNode {
    book(id: "3") {
        title("To Kill a Mockingbird")
        author(id: "3", "Harper Lee")
    }
}

assert response.value.books.book[0].title.text() == "Don Quixote"

/* That mkp is a special namespace used to escape away from the normal building mode
   of the builder and get access to helper markup methods
   'yield', 'pi', 'comment', 'out', 'namespaces', 'xmlDeclaration' and
   'yieldUnescaped' */
def result = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
def changedResponse = new XmlSlurper().parseText(result)

assert changedResponse.value.books.book[0].title.text() == "To Kill a Mockingbird"

Notice how, using XmlSlurper, we have to parse the transformed document again in order to find the created nodes. In this particular example that could be a little bit annoying.

Finally, both parsers also use the same approach for adding a new attribute to a given node. This time again the difference is whether you want the new nodes to be available right away or not. First XmlParser:

def parser = new XmlParser()
def response = parser.parseText(xml)

response.@numberOfResults = "1"

assert response.@numberOfResults == "1"

And XmlSlurper:

def response = new XmlSlurper().parseText(books)
response.@numberOfResults = "2"

assert response.@numberOfResults == "2"

When using XmlSlurper, adding a new attribute does not require you to perform a new evaluation.

4.3. Printing XML

4.3.1. XmlUtil

Sometimes it is useful to get not only the value of a given node but the node itself (for instance to add this node to another XML).

For that you can use the groovy.xml.XmlUtil class. It has several static methods to serialize the xml fragment from several types of sources (Node, GPathResult, String…)

Getting a node as a string

def response = new XmlParser().parseText(xml)
def nodeToSerialize = response.'**'.find { it.name() == 'author' }
def nodeAsText = XmlUtil.serialize(nodeToSerialize)

assert nodeAsText ==
        XmlUtil.serialize('<?xml version="1.0" encoding="UTF-8"?><author>Miguel de Cervantes</author>')
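Since GPathResult is one of the supported sources, a node obtained from XmlSlurper can be serialized in the same way. A minimal sketch with a made-up, shortened document follows; it only checks that the fragment appears in the output, because the serialized String also carries an XML declaration.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Hypothetical, shortened document
def response = new XmlSlurper().parseText(
        '<response><book><author>Miguel de Cervantes</author></book></response>')

def serialized = XmlUtil.serialize(response.book.author)

assert serialized.contains('<author>Miguel de Cervantes</author>')
```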

Processing XML