Thursday, 13th October 2011

Parsing an XML file and getting info from it via XPath

From WikiJava

This article shows how to open an XML file and extract data from it using XPath.

The example uses only libraries contained in the standard API (JDK 6), so no additional libraries are required.

the article

You can download the complete code of this article from the Subversion repository at this link

Use the username readonly and the password readonly.

See the "using the SVN repository" instructions page for more help with this.

Before retrieving information from an XML file using XPath, you must create an org.w3c.dom.Document from it.

In the example, this is done by the code below.

	    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
		    .newInstance();
	    documentBuilderFactory.setNamespaceAware(true);
	    DocumentBuilder builder = documentBuilderFactory
		    .newDocumentBuilder();
	    Document doc = builder.parse(file);


This opens the XML file and parses it into an org.w3c.dom.Document object, which holds the whole XML document in memory as a tree of nodes.
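To make the tree picture concrete, here is a minimal sketch that parses a small XML string and walks the children of the root element by hand. The class name DomWalkSketch, the helper childElementNames and the sample XML are assumptions made for illustration; they are not part of the article's code.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration class, not part of the article's repository.
class DomWalkSketch {

    /** Parses the XML text and returns the names of the root's child elements. */
    static List<String> childElementNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        // The root element is the top of the tree; its children are plain nodes.
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();

        List<String> names = new ArrayList<String>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text nodes (such as whitespace between tags) and comments.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Sample document, assumed for this sketch.
        String xml = "<library><book>Dune</book><book>Emma</book><dvd>Alien</dvd></library>";
        System.out.println(childElementNames(xml)); // prints [book, book, dvd]
    }
}
```

Walking the tree this way works, but every extra level of nesting needs another loop and another node-type check.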

Once you have the Document object holding your XML as a tree, you could explore it manually, node by node, like any other tree. Alternatively, you can use XPath to extract data from the Document with a single query.

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
 
XPathExpression expr = xpath.compile(query);
// org.w3c.dom.NodeList is the standard type for NODESET results
NodeList nodeListResult = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

The snippet above obtains an XPath object from the XPathFactory.

The query string is then compiled into an XPathExpression via the compile(String) method. Next, evaluate(Object, QName) is called on the expression to run the XPath query against the Document that was loaded before.

The second argument of evaluate(Object, QName) is the expected result type of the query, for example XPathConstants.NODESET. Remember that the method returns an Object, which must be cast explicitly to the matching type.
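The link between the requested result type and the cast can be sketched as follows. The class XPathResultTypesSketch, its helper methods and the sample XML are hypothetical names chosen for this illustration. The sketch casts node-set results to the standard org.w3c.dom.NodeList interface, string results to String and number results to Double.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class, not part of the article's repository.
class XPathResultTypesSketch {

    /** Parses XML text into a DOM Document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** NODESET: the result is cast to NodeList; returns how many nodes matched. */
    static int countNodes(Document doc, String query) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.compile(query).evaluate(doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    /** STRING: the result is cast to String (text of the first matching node). */
    static String firstText(Document doc, String query) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.compile(query).evaluate(doc, XPathConstants.STRING);
    }

    /** NUMBER: the result is cast to Double. */
    static double number(Document doc, String query) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.compile(query).evaluate(doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Sample document, assumed for this sketch.
        Document doc = parse("<library><book>Dune</book><book>Emma</book></library>");
        System.out.println(countNodes(doc, "/library/book"));    // prints 2
        System.out.println(firstText(doc, "/library/book"));     // prints Dune
        System.out.println(number(doc, "count(/library/book)")); // prints 2.0
    }
}
```

Requesting one type and casting to another compiles without complaint and only fails at run time with a ClassCastException, so the constant and the cast should always be changed together.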

XMLReader.java

package org.wikijava.examples.xpath;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XMLReader {

    /**
     * Runs an XPath query against an XML file and prints the text content
     * of every matching node.
     *
     * @param args args[0] is the path of the XML file, args[1] is the XPath query
     * @throws Exception if arguments are missing, or the file or query cannot be processed
     */
    public static void main(String[] args) throws Exception {

        if (args.length < 2) {
            throw new Exception("error, check the command line");
        }
        // Command-line arguments are zero-based
        String file = args[0];
        String query = args[1];

        List<String> resultList = new ArrayList<String>();
        try {

            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
                    .newInstance();
            documentBuilderFactory.setNamespaceAware(true);
            DocumentBuilder builder = documentBuilderFactory
                    .newDocumentBuilder();
            // parse(String) expects a URI, so wrap the path in a File
            Document doc = builder.parse(new File(file));

            if (doc == null) {
                throw new Exception("unable to load Document");
            }

            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();

            XPathExpression expr = xpath.compile(query);
            // A NODESET result is a standard org.w3c.dom.NodeList
            NodeList nodeListResult = (NodeList) expr.evaluate(doc,
                    XPathConstants.NODESET);

            for (int i = 0; i < nodeListResult.getLength(); i++) {
                String result = nodeListResult.item(i).getTextContent();
                resultList.add(result);
            }

        } catch (XPathExpressionException e) {
            throw new Exception(e);
        } catch (SAXException e) {
            throw new Exception(e);
        } catch (IOException e) {
            throw new Exception(e);
        } catch (ParserConfigurationException e) {
            throw new Exception(e);
        }

        // Print the text of each matching node, one per line
        for (String result : resultList) {
            System.out.println(result);
        }
    }

}

Comments from the users


I Can't compile it

When I try to compile this code I get this error:

C:\Documents and Settings\...\Desktop>javac XMLReader.java

XMLReader.java:20: warning: com.sun.org.apache.xpath.internal.NodeSet is Sun proprietary API and may be removed in a future release
import com.sun.org.apache.xpath.internal.NodeSet;
                                        ^
XMLReader.java:54: warning: com.sun.org.apache.xpath.internal.NodeSet is Sun proprietary API and may be removed in a future release
            NodeSet nodeSetResult = (NodeSet) expr.evaluate(doc,
            ^
XMLReader.java:54: warning: com.sun.org.apache.xpath.internal.NodeSet is Sun proprietary API and may be removed in a future release
            NodeSet nodeSetResult = (NodeSet) expr.evaluate(doc,
                                     ^
3 warnings

What could I do?

--78.13.58.249 09:54, 16 December 2008 (UTC)

Hi, what you get is not an error but just a warning. The compilation succeeds despite those warnings, which you can ignore unless you plan to compile this program with other JDKs in the far future. After compilation you do get the compiled XMLReader.class file.
If you want to run this program from the command line, you should remove the package declaration at the top of the file (it makes the command line more complex to write). Then you can compile (ignore the warnings) and execute it with: java XMLReader. Keep posting if you run into more trouble with it. --DonGiulio 10:48, 16 December 2008 (UTC)

Code Update

By the way, I fixed a couple of things; the new code is in the repository. Just details, though.

--DonGiulio 11:34, 16 December 2008 (UTC)

