
I have started learning XML parsing in Java (currently reading *Java & XML* by Brett McLaughlin). Here is my first program that parses XML using a SAX parser. Please review the code and let me know whether this is the correct way to parse books.xml and store it in a `List<Book>`. (The code works fine; I just want to make sure I'm doing it the right way, according to industry standards.)

books.xml (only part of the XML is shown):

```xml
<?xml version="1.0"?>
<catalog>
   <book>
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book>
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
```

Content Handler:

```java
public class MyFirstContentHandler implements ContentHandler {

    private List<Book> books = new ArrayList<Book>();
    private Book book = null;
    private String elementName = null;

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // TODO Auto-generated method stub
        String value = new String(ch, start, length);
        if(null != elementName) {
            switch(elementName) {
            case "author":
                book.setAuthor(value);
                elementName = null;
                break;
            case "title":
                book.setTitle(value);
                elementName = null;
                break;
            case "genre":
                book.setGenre(value);
                elementName = null;
                break;
            case "price":
                book.setPrice(Double.parseDouble(value));
                elementName = null;
                break;
            case "publish_date":
                DateFormat dateFormat = new SimpleDateFormat("yyyy-mm-dd");
                try {
                    book.setPublishDate(dateFormat.parse(value));
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
                elementName = null;
                break;
            case "description":
                book.setDescription(value);
                elementName = null;
                break;
            }
        }
    }

    @Override
    public void endDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // TODO Auto-generated method stub
        if(localName.equals("book")) {
            books.add(book);
            book = null;
        }
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        // TODO Auto-generated method stub
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void startDocument() throws SAXException {
        // TODO Auto-generated method stub
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // TODO Auto-generated method stub
//      System.out.println(localName);
//      System.out.println(atts.getValue(atts.getLocalName(0)));
        elementName = localName;
        if(localName.equals("book"))
            book = new Book(atts.getValue(atts.getLocalName(0)));
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        // TODO Auto-generated method stub
    }

    public List<Book> getBooks() {
        return books;
    }
}
```

Main program (I iterate over the list and print the book details):

```java
public class MyFirstSAXParser {

    public static void main(String args[]) {
        try {
            // Create instance (XML reader) needed for parsing
            XMLReader xmlReader = XMLReaderFactory.createXMLReader();
            // Register content handler
            MyFirstContentHandler contentHandler = new MyFirstContentHandler();
            xmlReader.setContentHandler(contentHandler);
            // Register error handler
            // Parse
            InputSource inputSource = new InputSource(new FileReader("books.xml"));
            xmlReader.parse(inputSource);
            List<Book> books = contentHandler.getBooks();
            for(Book book: books) {
                System.out.println("Book:");
                System.out.println("\tId: "+book.getId());
                System.out.println("\tAuthor: "+book.getAuthor());
                System.out.println("\tTitle: "+book.getTitle());
                System.out.println("\tGenre: "+book.getGenre());
                System.out.println("\tPrice: "+book.getPrice());
                System.out.println("\tPublish Date: "+book.getPublishDate());
                System.out.println("\tDescription: "+book.getDescription());
            }
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
asked Jul 7, 2015 at 0:48 by JavaHopper (edited by Jamal)
  • Welcome to CR! Are you on Java 8? (commented Jul 7, 2015 at 1:32)

1 Answer

Your code is, in general, well structured and neat.

You have a lot of auto-generated content, such as comments and TODO items. Remove those to show that they have been handled.
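Most of those stubs exist only because the class implements `ContentHandler` directly. A common alternative is to extend `org.xml.sax.helpers.DefaultHandler`, which provides empty implementations of every `ContentHandler` method (as well as `ErrorHandler`, `DTDHandler` and `EntityResolver`), so you override only the callbacks you need. Below is a minimal sketch. The `BookCounter` class and the inline XML are hypothetical, and the sketch uses the JAXP `SAXParserFactory` rather than `XMLReaderFactory`. Namespace awareness is switched on because, with JAXP's default non-namespace-aware factory, `localName` may be empty.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: DefaultHandler already implements every ContentHandler
// method as a no-op, so only the callbacks we care about are overridden.
class BookCounter extends DefaultHandler {

    private int bookCount = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        if ("book".equals(localName)) {
            bookCount++;
        }
    }

    int getBookCount() {
        return bookCount;
    }

    // Parses an XML string and returns how many <book> elements it contains.
    static int countBooks(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be filled in
            BookCounter handler = new BookCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getBookCount();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBooks("<catalog><book/><book/></catalog>")); // prints 2
    }
}
```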

As for the SAX parsing, you have done pretty well, though there are some bugs and issues you need to address, and they all come down to the `characters` method.

The `characters` method can be called multiple times inside any element. Typically the SAX parser reads the input in chunks, and if a chunk ends halfway through the text of an element, you may get one `characters` call for the last part of one chunk and another for the first part of the next chunk.

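This makes it very hard to put decision logic inside the `characters` method, as you have done: your code sets `elementName = null` after the first call, so any later chunk of the same element's text is silently dropped.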

Instead, put the logic inside the `startElement()` and `endElement()` methods, and only do simple string concatenation inside the `characters` method.
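To see why accumulating is safer, here is a small sketch (the `ChunkDemo` class is hypothetical) that records every `characters` call for a `<title>` element and also appends each one to a buffer. How the text gets split depends on the parser and its buffering, so the chunk list printed may differ between runtimes. Entity references such as `&amp;` are one place where a parser may report the text in more than one call. The concatenated buffer holds the full text either way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: record each characters() chunk inside <title>.
class ChunkDemo extends DefaultHandler {

    final List<String> chunks = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(localName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            chunks.add(new String(ch, start, length)); // one entry per callback
            text.append(ch, start, length);            // accumulate, decide later
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(localName)) {
            inTitle = false;
        }
    }

    String title() {
        return text.toString();
    }

    // Parses the XML and returns the handler so callers can inspect it.
    static ChunkDemo parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            ChunkDemo handler = new ChunkDemo();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ChunkDemo demo = parse("<book><title>Tom &amp; Jerry</title></book>");
        System.out.println("chunks: " + demo.chunks);  // split is parser-dependent
        System.out.println("title:  " + demo.title()); // Tom & Jerry
    }
}
```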

Consider something like:

```java
private static final String[] dataTags = {"author", "title", "genre", "price", "publish_date", "description"};
private static final Set<String> dataTagSet = new HashSet<>(Arrays.asList(dataTags));
```

That sets up a set of data tags. Next, we create an instance-level `StringBuilder` to cache the characters as they come in:

```java
private final StringBuilder characterCache = new StringBuilder(256);
```

Then we use that `StringBuilder` to cache the characters by appending them, taking care to reset it when needed:

```java
@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    characterCache.append(ch, start, length);
}
```

That makes `characters` easy to handle. Now `startElement` is also easy:

```java
@Override
public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
    if("book".equals(localName)) {
        book = new Book(atts.getValue(atts.getLocalName(0)));
    }
    // every element resets the characterCache.
    characterCache.setLength(0);
}
```

As an aside, note how I wrote the check as `if("book".equals(localName))` instead of `if(localName.equals("book"))`. Putting the constant first is an easy habit that prevents null-pointer exceptions, because a string literal is never null, so calling `equals` on it is always safe. In this case `localName` will never be null, but if you always code the constant first, you never risk calling a method on a null reference.
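A tiny illustration of the difference (the `ConstantFirstDemo` class and its helpers are hypothetical). When both sides might be null, `java.util.Objects.equals(a, b)` (Java 7+) is the usual null-safe alternative.

```java
// Hypothetical helpers contrasting the two comparison orders.
class ConstantFirstDemo {

    // Safe: the literal is never null, so equals() can always be called on it.
    static boolean isBookConstantFirst(String name) {
        return "book".equals(name);
    }

    // Unsafe: throws NullPointerException when name is null.
    static boolean isBookVariableFirst(String name) {
        return name.equals("book");
    }

    public static void main(String[] args) {
        System.out.println(isBookConstantFirst(null)); // false
        try {
            isBookVariableFirst(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```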

OK, so the `characters` and `startElement` methods are now simpler. Let's put the logic in `endElement` (and use a single `DateFormat` instance rather than creating one for every `publish_date` element, which is faster):

```java
// MM is month-of-year; lowercase "mm" would parse the middle field as minutes.
private final DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if(book != null && dataTagSet.contains(localName)) {
        // we have a data item for our current book.
        String value = characterCache.toString();
        switch(localName) {
            case "author":
                book.setAuthor(value);
                break;
            case "title":
                book.setTitle(value);
                break;
            case "genre":
                book.setGenre(value);
                break;
            case "price":
                book.setPrice(Double.parseDouble(value));
                break;
            case "publish_date":
                try {
                    book.setPublishDate(dateFormat.parse(value));
                } catch (ParseException e) {
                    e.printStackTrace();
                }
                break;
            case "description":
                book.setDescription(value);
                break;
        }
    } else if("book".equals(localName)) {
        books.add(book);
        book = null;
    }
    characterCache.setLength(0);
}
```
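For reference, here is one way these pieces might fit together as a single runnable class. It is a sketch that rests on a few assumptions. `Book` is a minimal stand-in for the question's class, with fields instead of setters. The handler extends `DefaultHandler` so the empty callbacks disappear. `atts.getValue("id")` assumes the book id lives in an attribute named `id`; the partial XML in the question has none, so the id is simply `null` there. The date pattern is `"yyyy-MM-dd"`: in `SimpleDateFormat`, `MM` is the month and `mm` is minutes, so the `"yyyy-mm-dd"` pattern in the question's handler would read `2000-10-01` as 00:10 on January 1, 2000.

```java
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Minimal stand-in for the question's Book class (hypothetical fields).
class Book {
    final String id;
    String author, title, genre, description;
    double price;
    Date publishDate;
    Book(String id) { this.id = id; }
}

class BookHandler extends DefaultHandler {

    private static final Set<String> DATA_TAGS = new HashSet<>(Arrays.asList(
            "author", "title", "genre", "price", "publish_date", "description"));

    private final List<Book> books = new ArrayList<>();
    private final StringBuilder characterCache = new StringBuilder(256);
    // MM = month ("mm" would be minutes); one instance per handler, used by one thread.
    private final DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
    private Book book = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("book".equals(localName)) {
            book = new Book(atts.getValue("id")); // assumed attribute name; null if absent
        }
        characterCache.setLength(0); // every element start resets the cache
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        characterCache.append(ch, start, length); // may run several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (book != null && DATA_TAGS.contains(localName)) {
            String value = characterCache.toString();
            switch (localName) {
                case "author": book.author = value; break;
                case "title": book.title = value; break;
                case "genre": book.genre = value; break;
                case "price": book.price = Double.parseDouble(value); break;
                case "publish_date":
                    try {
                        book.publishDate = dateFormat.parse(value);
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                    break;
                case "description": book.description = value; break;
                default: break;
            }
        } else if ("book".equals(localName)) {
            books.add(book);
            book = null;
        }
        characterCache.setLength(0);
    }

    List<Book> getBooks() { return books; }

    // Parses an XML string; namespace awareness makes the parser fill in localName.
    static List<Book> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            BookHandler handler = new BookHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getBooks();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><author>Ralls, Kim</author><title>Midnight Rain</title>"
                + "<price>5.95</price><publish_date>2000-12-16</publish_date></book></catalog>";
        for (Book b : parse(xml)) {
            System.out.println(b.title + " by " + b.author + ", " + b.price + ", " + b.publishDate);
        }
    }
}
```

Because the handler only buffers text in `characters` and acts on it in `endElement`, it should produce the same result however the parser happens to split the character data.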
answered Jul 7, 2015 at 3:34 by rolfl
  • Thank you!! Apart from this, I had another doubt. I am using the Eclipse IDE and the code ran fine without including the Apache Xerces libraries. I was wondering why `XMLReader xmlReader = XMLReaderFactory.createXMLReader();` didn't throw an error, because the author says a vendor-dependent `XMLReader` gets instantiated, whereas I don't believe I had any vendor, since no Apache jars were included in lib. (commented Jul 7, 2015 at 12:43)
  • The Xerces XML parser (a slightly modified, forked version) is effectively embedded inside the standard Java runtime, and has been for a few versions. Brett's book is somewhat dated in that regard. Brett contributed a lot of insight and code to the development of JDOM (a Java XML library that I maintain), and you will likely find JDOM easier to work with than raw SAX for many XML tasks in Java (shameless plug: jdom.org). (commented Jul 7, 2015 at 12:50)
  • Hmmm... found it. The Xerces parser is in the rt.jar file in Java's core libraries; in Eclipse, for example, you can browse the jar to the package `com.sun.org.apache.xerces.internal` to find it. There are some other differences in addition to the changed package name. (commented Jul 7, 2015 at 13:03)
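To check which parser the runtime picks when no Xerces jar is on the classpath, you can print the implementation class names. This is a sketch, and the `WhichParser` class is hypothetical. The exact names depend on the JDK version and vendor, so the code only prints them; on a stock Oracle or OpenJDK runtime they are typically under `com.sun.org.apache.xerces.internal`. Note that `XMLReaderFactory` has been deprecated since Java 9, and the JAXP `SAXParserFactory` route shown alongside it is the usual replacement.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper: reports which XMLReader implementations the JDK supplies.
class WhichParser {

    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static String saxFactoryReaderClass() {
        try {
            return XMLReaderFactory.createXMLReader().getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String jaxpReaderClass() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XMLReaderFactory: " + saxFactoryReaderClass());
        System.out.println("SAXParserFactory: " + jaxpReaderClass());
    }
}
```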
