The most commonly used approach for parsing XML with Groovy is to use one of:
groovy.xml.XmlParser
groovy.xml.XmlSlurper
Both take the same approach to parsing XML. Both come with a set of overloaded parse methods (accepting, for example, a File, an InputStream or a Reader) plus the special parseText method. For the next example we will use the parseText method. It parses an XML String and recursively converts it into a tree of objects that can be navigated like nested lists or maps.
    import groovy.xml.XmlSlurper

    def text = '''
        <list>
            <technology>
                <name>Groovy</name>
            </technology>
        </list>
    '''

    def list = new XmlSlurper().parseText(text) // (1)

    assert list instanceof groovy.xml.slurpersupport.GPathResult // (2)
    assert list.technology.name == 'Groovy' // (3)

| 1 | Parsing the XML and returning the root node as a GPathResult |
| 2 | Checking we’re using a GPathResult |
| 3 | Traversing the tree in a GPath style |
    import groovy.xml.XmlParser

    def text = '''
        <list>
            <technology>
                <name>Groovy</name>
            </technology>
        </list>
    '''

    def list = new XmlParser().parseText(text) // (1)

    assert list instanceof groovy.util.Node // (2)
    assert list.technology.name.text() == 'Groovy' // (3)

| 1 | Parsing the XML and returning the root node as a Node |
| 2 | Checking we’re using a Node |
| 3 | Traversing the tree in a GPath style |
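Besides parseText, both parsers accept other sources through their overloaded parse methods. The following is a minimal sketch, assuming Groovy 3 or later (where both parsers live in the groovy.xml package); the sample document and variable names are illustrative only. It parses the same document once from a Reader and once from an InputStream:

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Illustrative sample document, not taken from the examples above
def text = '<list><technology><name>Groovy</name></technology></list>'

// parse(Reader): XmlSlurper returns a GPathResult
def fromReader = new XmlSlurper().parse(new StringReader(text))
assert fromReader.technology.name.text() == 'Groovy'

// parse(InputStream): XmlParser returns a Node
def fromStream = new XmlParser().parse(new ByteArrayInputStream(text.getBytes('UTF-8')))
assert fromStream.technology.name.text() == 'Groovy'
```

The navigation code is the same in both cases; only the type of the returned root differs.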
Let’s see the similarities between XmlParser and XmlSlurper first:
Both are based on SAX, so both have a low memory footprint
Both can update/transform the XML
But they have key differences:
XmlSlurper evaluates the structure lazily. So if you update the XML you’ll have to evaluate the whole tree again.
XmlSlurper returns GPathResult instances when parsing XML
XmlParser returns Node objects when parsing XML
When to use one or the other?
There is a discussion at StackOverflow. The conclusions written here are based partially on this entry.
If you want to transform an existing document to another then XmlSlurper will be the choice
If you want to update and read at the same time then XmlParser is the choice.
The rationale behind this is that every time you create a node with XmlSlurper it won’t be available until you parse the document again with another XmlSlurper instance.
If you just have to read a few nodes, XmlSlurper should be your choice, since it will not have to create a complete structure in memory.
In general both classes perform in a similar way. Even the way of using GPath expressions with them is the same (both use breadthFirst() and depthFirst() expressions). So the choice mostly depends on the write/read frequency.
There is another way of parsing XML documents with Groovy: groovy.xml.dom.DOMCategory, a category class which adds GPath style operations to Java’s DOM classes.
Java has in-built support for DOM processing of XML using classes representing the various parts of XML documents, e.g. Document, Element, NodeList, Attr etc. For more information about these classes, refer to the respective JavaDocs.
Having an XML like the following:
    static def CAR_RECORDS = '''
    <records>
      <car name='HSV Maloo' make='Holden' year='2006'>
        <country>Australia</country>
        <record type='speed'>Production Pickup Truck with speed of 271kph</record>
      </car>
      <car name='P50' make='Peel' year='1962'>
        <country>Isle of Man</country>
        <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
      </car>
      <car name='Royale' make='Bugatti' year='1931'>
        <country>France</country>
        <record type='price'>Most Valuable Car at $15 million</record>
      </car>
    </records>
    '''

You can parse it using groovy.xml.DOMBuilder and groovy.xml.dom.DOMCategory.

    import groovy.xml.DOMBuilder
    import groovy.xml.dom.DOMCategory

    def reader = new StringReader(CAR_RECORDS)
    def doc = DOMBuilder.parse(reader) // (1)
    def records = doc.documentElement

    use(DOMCategory) { // (2)
        assert records.car.size() == 3
    }

| 1 | Parsing the XML |
| 2 | Creating a DOMCategory scope to be able to use helper method calls |
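Inside the DOMCategory scope, attributes and child text can be read with the same GPath style. The following is a small sketch under the assumption of Groovy 3 or later; the shortened document and the variable names are illustrative and not part of the original example:

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Shortened, illustrative version of the car records document
def carRecords = '''
<records>
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
  </car>
</records>'''

def records = DOMBuilder.parse(new StringReader(carRecords)).documentElement

def makes = []
def firstCountry = null
use(DOMCategory) {
    // Attributes are read with the '@name' notation, child text with text()
    makes = records.car.collect { it.'@make' }
    firstCountry = records.car[0].country.text()
}

assert makes == ['Holden', 'Peel']
assert firstCountry == 'Australia'
```

The GPath style helpers are only available inside the use(DOMCategory) block, which is why the values are captured in variables declared outside it.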
The most common way of querying XML in Groovy is using GPath:
GPath is a path expression language integrated into Groovy which allows parts of nested structured data to be identified. In this sense, it has similar aims and scope as XPath does for XML. The two main places where you use GPath expressions are when dealing with nested POJOs or when dealing with XML.
It is similar to XPath expressions and you can use it not only with XML but also with POJO classes. As an example, you can specify a path to an object or element of interest:
a.b.c → for XML, yields all the <c> elements inside <b> inside <a>
a.b.c → for POJOs, yields the c properties for all the b properties of a (sort of like a.getB().getC() in JavaBeans)
For XML, you can also specify attributes, e.g.:
a["@href"] → the href attribute of all the a elements
a.'@href' → an alternative way of expressing this
a.@href → an alternative way of expressing this when using XmlSlurper
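To make the notations above concrete, here is a minimal sketch, assuming Groovy 3 or later. The Author and Book classes and the sample link element are hypothetical and exist only for this illustration:

```groovy
import groovy.xml.XmlParser

// Hypothetical classes, used only to illustrate GPath on plain objects
class Author { String name }
class Book { String title; Author author }

def library = [
    new Book(title: 'Don Quixote', author: new Author(name: 'Miguel de Cervantes')),
    new Book(title: 'Alice in Wonderland', author: new Author(name: 'Lewis Carroll'))
]

// library.author.name collects the name of the author of every book
assert library.author.name == ['Miguel de Cervantes', 'Lewis Carroll']

// The three attribute notations applied to a parsed XML node
def a = new XmlParser().parseText('<a href="http://groovy-lang.org">Groovy</a>')
assert a['@href'] == 'http://groovy-lang.org'
assert a.'@href' == 'http://groovy-lang.org'
assert a.@href == 'http://groovy-lang.org'
```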
Let’s illustrate this with an example:
    static final String books = '''
        <response version-api="2.0">
            <value>
                <books>
                    <book available="20" id="1">
                        <title>Don Quixote</title>
                        <author id="1">Miguel de Cervantes</author>
                    </book>
                    <book available="14" id="2">
                        <title>Catcher in the Rye</title>
                        <author id="2">JD Salinger</author>
                    </book>
                    <book available="13" id="3">
                        <title>Alice in Wonderland</title>
                        <author id="3">Lewis Carroll</author>
                    </book>
                    <book available="5" id="4">
                        <title>Don Quixote</title>
                        <author id="4">Miguel de Cervantes</author>
                    </book>
                </books>
            </value>
        </response>
    '''

The first thing we could do is to get a value using POJO notation. Let’s get the first book’s author’s name:
    def response = new XmlSlurper().parseText(books)
    def authorResult = response.value.books.book[0].author

    assert authorResult.text() == 'Miguel de Cervantes'

First we parse the document with XmlSlurper and then we have to consider the returned value as the root of the XML document, which in this case is "response".
That’s why we start traversing the document from response and then value.books.book[0].author. Note that in XPath node arrays start at [1] instead of [0], but because GPath is Java-based it begins at index 0.
In the end we’ll have the instance of the author node, and because we wanted the text inside that node we call the text() method. The author node is an instance of GPathResult, and text() is a method giving us the content of that node as a String.
When using GPath with XML parsed with XmlSlurper we’ll have a GPathResult object as a result. GPathResult has many other convenient methods to convert the text inside a node to any other type, such as:
toInteger()
toFloat()
toBigInteger()
…
All these methods try to convert a String to the appropriate type.
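As a brief sketch of these conversions (assuming Groovy 3 or later; the small stock document and its price elements are made up for this illustration), both attribute and element text can be turned into numbers before comparing or aggregating:

```groovy
import groovy.xml.XmlSlurper

// Illustrative document; the 'available' attribute and 'price' element hold numeric text
def stock = new XmlSlurper().parseText('''
<books>
  <book available="20"><title>Don Quixote</title><price>9.50</price></book>
  <book available="14"><title>Catcher in the Rye</title><price>7.25</price></book>
</books>''')

// Attribute text converted to an Integer
assert stock.book[0].@available.toInteger() == 20

// Element text converted to a BigDecimal
assert stock.book[1].price.toBigDecimal() == 7.25

// Conversions make numeric aggregation straightforward
def totalAvailable = stock.book.collect { it.@available.toInteger() }.sum()
assert totalAvailable == 34
```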
If we were using XML parsed with XmlParser we would be dealing with instances of type Node. Still, all the actions applied to GPathResult in these examples could be applied to a Node as well. The creators of both parsers took GPath compatibility into account.
The next step is to get some values from a given node’s attribute. In the following sample we want to get the first book’s id. We’ll be using two different approaches. Let’s see the code first:

    def response = new XmlSlurper().parseText(books)

    def book = response.value.books.book[0] // (1)
    def bookAuthorId1 = book.@id // (2)
    def bookAuthorId2 = book['@id'] // (3)

    assert bookAuthorId1 == '1' // (4)
    assert bookAuthorId1.toInteger() == 1 // (5)
    assert bookAuthorId1 == bookAuthorId2

| 1 | Getting the first book node |
| 2 | Getting the book’s id attribute with @id |
| 3 | Getting the book’s id attribute with map notation ['@id'] |
| 4 | Getting the value as a String |
| 5 | Getting the value of the attribute as an Integer |
As you can see there are two types of notations to get attributes, the
direct notation with @nameoftheattribute
map notation using ['@nameoftheattribute']
Both of them are equally valid.
If you ever have used XPath, you may have used expressions like:
/following-sibling::othernode : Look for a node "othernode" in the same level
// : Look everywhere
More or less we have their counterparts in GPath with the shortcuts * (aka children()) and ** (aka depthFirst()).
The first example shows a simple use of *, which only iterates over the direct children of the node.

    def response = new XmlSlurper().parseText(books)

    // .'*' could be replaced by .children()
    def catcherInTheRye = response.value.books.'*'.find { node ->
        // node.@id == 2 could be expressed as node['@id'] == 2
        node.name() == 'book' && node.@id == '2'
    }

    assert catcherInTheRye.title.text() == 'Catcher in the Rye'

This test searches for any child nodes of the "books" node matching the given condition. In a bit more detail, the expression says: Look for any node with a tag name equal to 'book' having an id with a value of '2' directly under the 'books' node.
This operation roughly corresponds to the breadthFirst() method, except that it stops after one level instead of continuing to the inner levels.
What if we would like to look for a given value without having to know exactly where it is? Let’s say that the only thing we know is the name of the author, "Lewis Carroll". How are we going to be able to find that book’s id? Using ** is the solution:

    def response = new XmlSlurper().parseText(books)

    // .'**' could be replaced by .depthFirst()
    def bookId = response.'**'.find { book ->
        book.author.text() == 'Lewis Carroll'
    }.@id

    assert bookId == 3

** is the same as looking for something everywhere in the tree from this point down. In this case, we’ve used the method find(Closure cl) to find just the first occurrence.
What if we want to collect all the books’ titles? That’s easy, just use findAll:

    def response = new XmlSlurper().parseText(books)
    def titles = response.'**'.findAll { node -> node.name() == 'title' }*.text()

    assert titles.size() == 4

In the last two examples, ** is used as a shortcut for the depthFirst() method. It goes as far down the tree as it can while navigating down the tree from a given node. The breadthFirst() method finishes off all nodes on a given level before traversing down to the next level.
The following example shows the difference between these two methods:
    def response = new XmlSlurper().parseText(books)
    def nodeName = { node -> node.name() }
    def withId2or3 = { node -> node.@id in [2, 3] }

    assert ['book', 'author', 'book', 'author'] ==
            response.value.books.depthFirst().findAll(withId2or3).collect(nodeName)
    assert ['book', 'book', 'author', 'author'] ==
            response.value.books.breadthFirst().findAll(withId2or3).collect(nodeName)

In this example, we search for any nodes with an id attribute with value 2 or 3. There are both book and author nodes that match those criteria. The different traversal orders will find the same nodes in each case but in different orders corresponding to how the tree was traversed.
It is worth mentioning again that there are some useful methods converting a node’s value to an integer, float, etc. Those methods could be convenient when doing comparisons like this:

    def response = new XmlSlurper().parseText(books)

    def titles = response.value.books.book.findAll { book ->
        /* You can use toInteger() over the GPathResult object */
        book.@id.toInteger() > 2
    }*.title

    assert titles.size() == 2

In this case the number 2 has been hardcoded but imagine that value could have come from any other source (database… etc.).
The most commonly used approach for creating XML with Groovy is to use a builder, i.e. one of:
groovy.xml.MarkupBuilder
groovy.xml.StreamingMarkupBuilder
Here is an example of using Groovy’s MarkupBuilder to create a new XML file:
    import groovy.xml.MarkupBuilder
    import groovy.xml.XmlSlurper

    def writer = new StringWriter()
    def xml = new MarkupBuilder(writer) // (1)

    xml.records() { // (2)
        car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
            country('Australia')
            record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
        }
        car(name: 'Royale', make: 'Bugatti', year: 1931) {
            country('France')
            record(type: 'price', 'Most Valuable Car at $15 million')
        }
    }

    def records = new XmlSlurper().parseText(writer.toString()) // (3)

    // name is generated as an attribute of car, so it is read with @name
    assert records.car.first().@name.text() == 'HSV Maloo'
    assert records.car.last().@name.text() == 'Royale'

| 1 | Create an instance of MarkupBuilder |
| 2 | Start creating the XML tree |
| 3 | Create an instance of XmlSlurper to traverse and test the generated XML |
Let’s take a closer look:

    def xmlString = "<movie>the godfather</movie>" // (1)

    def xmlWriter = new StringWriter() // (2)
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie("the godfather") // (3)

    assert xmlString == xmlWriter.toString() // (4)

| 1 | We’re creating a reference string to compare against |
| 2 | The xmlWriter instance is used by MarkupBuilder to convert the XML representation to a String instance eventually |
| 3 | The xmlMarkup.movie(…) call will create an XML node with a tag called movie and with content the godfather. |
| 4 | Checking that the generated markup matches the reference string |
    def xmlString = "<movie id='2'>the godfather</movie>"

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie(id: "2", "the godfather") // (1)

    assert xmlString == xmlWriter.toString()

| 1 | This time, in order to create both attributes and node content, you can create as many map entries as you like and finally add a value to set the node’s content |

The value could be any Object; it will be serialized to its String representation.

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup.movie(id: 2) { // (1)
        name("the godfather")
    }

    def movie = new XmlSlurper().parseText(xmlWriter.toString())

    assert movie.@id == 2
    assert movie.name.text() == 'the godfather'

| 1 | A closure represents the children elements of a given node. Notice this time instead of using a String for the attribute we’re using a number. |
Sometimes you may want to use a specific namespace in your xml documents:
    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup
        .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') { // (1)
            'x:movie'(id: 1, 'the godfather')
            'x:movie'(id: 2, 'ronin')
        }

    def movies =
        new XmlSlurper() // (2)
            .parseText(xmlWriter.toString())
            .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.last().@id == 2
    assert movies.'x:movie'.last().text() == 'ronin'

| 1 | Creating a node with a given namespace xmlns:x |
| 2 | Creating an XmlSlurper registering the namespace to be able to test the XML we just created |
What about a more meaningful example? We may want to generate more elements, or to have some logic when creating our XML:

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    xmlMarkup
        .'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
            (1..3).each { n -> // (1)
                'x:movie'(id: n, "the godfather $n")
                if (n % 2 == 0) { // (2)
                    'x:movie'(id: n, "the godfather $n (Extended)")
                }
            }
        }

    def movies =
        new XmlSlurper()
            .parseText(xmlWriter.toString())
            .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.size() == 4
    assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }

| 1 | Generating elements from a range |
| 2 | Using a conditional for creating a given element |
Of course the instance of a builder can be passed as a parameter to refactor/modularize your code:

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter)

    // (1)
    Closure<MarkupBuilder> buildMovieList = { MarkupBuilder builder ->
        (1..3).each { n ->
            builder.'x:movie'(id: n, "the godfather $n")
            if (n % 2 == 0) {
                builder.'x:movie'(id: n, "the godfather $n (Extended)")
            }
        }

        return builder
    }

    xmlMarkup.'x:movies'('xmlns:x': 'http://www.groovy-lang.org') {
        buildMovieList(xmlMarkup) // (2)
    }

    def movies =
        new XmlSlurper()
            .parseText(xmlWriter.toString())
            .declareNamespace(x: 'http://www.groovy-lang.org')

    assert movies.'x:movie'.size() == 4
    assert movies.'x:movie'*.text().every { name -> name.startsWith('the') }

| 1 | In this case we’ve created a Closure to handle the creation of a list of movies |
| 2 | Just using the buildMovieList function when necessary |
The class groovy.xml.StreamingMarkupBuilder is a builder class for creating XML markup. This implementation uses a groovy.xml.streamingmarkupsupport.StreamingMarkupWriter to handle output.

    import groovy.xml.StreamingMarkupBuilder
    import groovy.xml.XmlSlurper

    def xml = new StreamingMarkupBuilder().bind { // (1)
        records {
            car(name: 'HSV Maloo', make: 'Holden', year: 2006) {
                country('Australia')
                record(type: 'speed', 'Production Pickup Truck with speed of 271kph')
            }
            car(name: 'P50', make: 'Peel', year: 1962) {
                country('Isle of Man')
                record(type: 'size', 'Smallest Street-Legal Car at 99cm wide and 59 kg in weight')
            }
            car(name: 'Royale', make: 'Bugatti', year: 1931) {
                country('France')
                record(type: 'price', 'Most Valuable Car at $15 million')
            }
        }
    }

    def records = new XmlSlurper().parseText(xml.toString()) // (2)

    assert records.car.size() == 3
    assert records.car.find { it.@name == 'P50' }.country.text() == 'Isle of Man'

| 1 | Note that StreamingMarkupBuilder.bind returns a Writable instance that may be used to stream the markup to a Writer |
| 2 | We’re capturing the output in a String to parse it again and check the structure of the generated XML with XmlSlurper. |
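Since bind returns a Writable, the markup does not have to be turned into a String first. The following is a minimal sketch, assuming Groovy 3 or later, in which a StringWriter stands in for a file or network writer; the variable names are illustrative:

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

// bind() returns a Writable rather than a String
Writable markup = new StreamingMarkupBuilder().bind {
    records {
        car(name: 'P50', make: 'Peel', year: 1962) {
            country('Isle of Man')
        }
    }
}

// Write the markup to any Writer; a StringWriter is used here
// so that the result can be checked afterwards
def out = new StringWriter()
markup.writeTo(out)

def records = new XmlSlurper().parseText(out.toString())
assert records.car.@name == 'P50'
assert records.car.country.text() == 'Isle of Man'
```

Writing to the target Writer directly avoids holding the whole document in an intermediate String, which is the main reason to prefer this builder for large outputs.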
The groovy.xml.MarkupBuilderHelper is, as its name reflects, a helper for groovy.xml.MarkupBuilder.
This helper can normally be accessed from within an instance of class groovy.xml.MarkupBuilder or an instance of groovy.xml.StreamingMarkupBuilder.
This helper could be handy in situations when you may want to:
Produce a comment in the output
Produce an XML processing instruction in the output
Produce an XML declaration in the output
Print data in the body of the current tag, escaping XML entities
Print data in the body of the current tag
In both MarkupBuilder and StreamingMarkupBuilder this helper is accessed by the property mkp:

    def xmlWriter = new StringWriter()
    def xmlMarkup = new MarkupBuilder(xmlWriter).rules {
        mkp.comment('THIS IS THE MAIN RULE') // (1)
        rule(sentence: mkp.yield('3 > n')) // (2)
    }

    // (3)
    assert xmlWriter.toString().contains('3 &gt; n')
    assert xmlWriter.toString().contains('<!-- THIS IS THE MAIN RULE -->')

| 1 | Using mkp to create a comment in the XML |
| 2 | Using mkp to generate an escaped value |
| 3 | Checking both assumptions were true |
Here is another example to show the use of the mkp property accessible from within the bind method scope when using StreamingMarkupBuilder:

    def xml = new StreamingMarkupBuilder().bind {
        records {
            car(name: mkp.yield('3 < 5')) // (1)
            car(name: mkp.yieldUnescaped('1 < 3')) // (2)
        }
    }

    assert xml.toString().contains('3 &lt; 5')
    assert xml.toString().contains('1 < 3')

| 1 | Generating an escaped value with mkp.yield |
| 2 | Generating an unescaped value with mkp.yieldUnescaped |

The assertions then check both values in the generated markup.
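The helper list above also mentions producing an XML declaration, which the previous examples do not show. A minimal sketch, assuming Groovy 3 or later (the element names and comment text are illustrative only):

```groovy
import groovy.xml.StreamingMarkupBuilder

def markup = new StreamingMarkupBuilder().bind {
    mkp.xmlDeclaration()            // emits the <?xml ...?> declaration first
    mkp.comment('generated file')   // emits an XML comment
    records {
        car(name: 'P50')
    }
}.toString()

assert markup.startsWith('<?xml')
assert markup.contains('<!--')
assert markup.contains('P50')
```

The exact text of the declaration (for instance whether an encoding is included) depends on how the builder is configured, so the assertions only check for its presence.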
Suppose we have an existing XML document and we want to automate generation of the markup without having to type it all in. We just need to use org.codehaus.groovy.tools.xml.DomToGroovy as shown in the following example:

    def songs = """
        <songs>
          <song>
            <title>Here I go</title>
            <band>Whitesnake</band>
          </song>
        </songs>
    """

    def builder = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()

    def inputStream = new ByteArrayInputStream(songs.bytes)
    def document = builder.parse(inputStream)
    def output = new StringWriter()
    def converter = new DomToGroovy(new PrintWriter(output)) // (1)

    converter.print(document) // (2)

    String xmlRecovered =
        new GroovyShell()
        .evaluate("""
           def writer = new StringWriter()
           def builder = new groovy.xml.MarkupBuilder(writer)
           builder.${output}

           return writer.toString()
        """) // (3)

    assert new XmlSlurper().parseText(xmlRecovered).song.title.text() == 'Here I go' // (4)

| 1 | Creating a DomToGroovy instance |
| 2 | Converts the XML to MarkupBuilder calls which are available in the output StringWriter |
| 3 | Using the output variable to create the whole MarkupBuilder |
| 4 | Back to XML string |
In this chapter you’ll see the different ways of adding / modifying / removing nodes using XmlSlurper or XmlParser. The XML we are going to be handling is the following:

    def xml = """<response version-api="2.0">
        <value>
            <books>
                <book>
                    <title>Don Quixote</title>
                    <author>Miguel de Cervantes</author>
                </book>
            </books>
        </value>
    </response>
    """

The main difference between XmlSlurper and XmlParser is that when the former creates nodes they won’t be available until the document has been evaluated again, so you should parse the transformed document again in order to be able to see the new nodes. Keep that in mind when choosing between the two approaches.
If you need to see a node right after creating it then XmlParser should be your choice, but if you’re planning to make many changes to the XML and send the result to another process, XmlSlurper may be more efficient.
You can’t create a new node directly using the XmlSlurper instance, but you can with XmlParser. The way of creating a new node from XmlParser is through its method createNode(..)

    import groovy.namespace.QName
    import groovy.xml.XmlParser

    def parser = new XmlParser()
    def response = parser.parseText(xml)
    def numberOfResults = parser.createNode(
        response,
        new QName("numberOfResults"),
        [:]
    )

    numberOfResults.value = "1"
    assert response.numberOfResults.text() == "1"

The createNode() method receives the following parameters:
parent node (could be null)
The qualified name for the tag (in this case we only use the local part without any namespace). We’re using an instance of groovy.namespace.QName
A map with the tag’s attributes (None in this particular case)
In practice you won’t normally be creating a node from the parser instance but from the parsed XML instance, that is, from a Node or a GPathResult instance.
Take a look at the next example. We are parsing the XML with XmlParser and then creating a new node from the parsed document’s instance (notice the method here is slightly different in the way it receives the parameters):

    def parser = new XmlParser()
    def response = parser.parseText(xml)

    response.appendNode(
        new QName("numberOfResults"),
        [:],
        "1"
    )

    assert response.numberOfResults.text() == "1"

When using XmlSlurper, GPathResult instances don’t have a createNode() method.
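Although GPathResult has no createNode() method, it does provide appendNode(), which accepts a closure written in builder style. The following is a sketch under the assumption of Groovy 3 or later, using a shortened copy of the document and illustrative variable names; the result is serialized and parsed again before the new node is checked:

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Shortened, illustrative copy of the document used in this chapter
def xml = '''<response version-api="2.0">
  <value>
    <books>
      <book><title>Don Quixote</title></book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// The closure describes the new child in builder-style syntax
response.value.books.appendNode {
    book {
        title('To Kill a Mockingbird')
    }
}

// Serialize and parse again so that the appended node becomes visible
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(response))

assert reparsed.value.books.book.size() == 2
assert reparsed.value.books.book[1].title.text() == 'To Kill a Mockingbird'
```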
We know how to parse the document and add new nodes; now we want to change a given node’s content. Let’s start using XmlParser and Node. This example replaces the first book’s information with that of another book.

    def response = new XmlParser().parseText(xml)

    /* Use the same syntax as groovy.xml.MarkupBuilder */
    response.value.books.book[0].replaceNode {
        book(id: "3") {
            title("To Kill a Mockingbird")
            author(id: "3", "Harper Lee")
        }
    }

    def newNode = response.value.books.book[0]

    assert newNode.name() == "book"
    assert newNode.@id == "3"
    assert newNode.title.text() == "To Kill a Mockingbird"
    assert newNode.author.text() == "Harper Lee"
    assert newNode.author.@id.first() == "3"

When using replaceNode() the closure we pass as a parameter should follow the same rules as if we were using groovy.xml.MarkupBuilder.
Here’s the same example using XmlSlurper:

    def response = new XmlSlurper().parseText(books)

    /* Use the same syntax as groovy.xml.MarkupBuilder */
    response.value.books.book[0].replaceNode {
        book(id: "3") {
            title("To Kill a Mockingbird")
            author(id: "3", "Harper Lee")
        }
    }

    assert response.value.books.book[0].title.text() == "Don Quixote"

    /* That mkp is a special namespace used to escape away from the normal building mode
       of the builder and get access to helper markup methods
       'yield', 'pi', 'comment', 'out', 'namespaces', 'xmlDeclaration' and
       'yieldUnescaped' */
    def result = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
    def changedResponse = new XmlSlurper().parseText(result)

    assert changedResponse.value.books.book[0].title.text() == "To Kill a Mockingbird"

Notice how with XmlSlurper we have to parse the transformed document again in order to find the created nodes. In this particular example that extra step is a little inconvenient.
Finally both parsers also use the same approach for adding a new attribute to a given node. This time again the difference is whether you want the new nodes to be available right away or not. First XmlParser:

    def parser = new XmlParser()
    def response = parser.parseText(xml)

    response.@numberOfResults = "1"

    assert response.@numberOfResults == "1"

And XmlSlurper:

    def response = new XmlSlurper().parseText(books)
    response.@numberOfResults = "2"

    assert response.@numberOfResults == "2"

When using XmlSlurper, adding a new attribute does not require you to perform a new evaluation.
Sometimes it is useful to get not only the value of a given node but the node itself (for instance to add this node to another XML document).
For that you can use the groovy.xml.XmlUtil class. It has several static methods to serialize an XML fragment from several types of sources (Node, GPathResult, String…)

    import groovy.xml.XmlUtil

    def response = new XmlParser().parseText(xml)
    def nodeToSerialize = response.'**'.find { it.name() == 'author' }
    def nodeAsText = XmlUtil.serialize(nodeToSerialize)

    assert nodeAsText ==
        XmlUtil.serialize('<?xml version="1.0" encoding="UTF-8"?><author>Miguel de Cervantes</author>')
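The same call works when the fragment comes from XmlSlurper. Here is a short sketch, assuming Groovy 3 or later, with a shortened copy of the document and illustrative variable names; instead of comparing exact strings, it parses the serialized fragment again:

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Shortened, illustrative copy of the document used in this chapter
def xml = '''<response version-api="2.0">
  <value>
    <books>
      <book>
        <title>Don Quixote</title>
        <author>Miguel de Cervantes</author>
      </book>
    </books>
  </value>
</response>'''

def response = new XmlSlurper().parseText(xml)

// Find the author element anywhere in the tree (a GPathResult)
def author = response.'**'.find { it.name() == 'author' }

// Serialize just that fragment to a String
def authorAsText = XmlUtil.serialize(author)

// The fragment is itself a well-formed document that can be parsed again
def reparsedAuthor = new XmlSlurper().parseText(authorAsText)
assert reparsedAuthor.name() == 'author'
assert reparsedAuthor.text() == 'Miguel de Cervantes'
```

Parsing the result again sidesteps differences in whitespace or declaration formatting that a direct string comparison would be sensitive to.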