This document describes lxml's SAX support. lxml can produce SAX events for an ElementTree or Element, and it can also turn SAX events into an ElementTree. The SAX API used by lxml is compatible with that in the Python core (xml.sax), so it is useful for interfacing lxml with code that uses the Python core SAX facilities.
First of all, lxml has support for building a new tree given SAX events. To do this, we use the special SAX content handler defined by lxml, named lxml.sax.ElementTreeContentHandler:

  >>> import lxml.sax
  >>> handler = lxml.sax.ElementTreeContentHandler()
Now let's fire some SAX events at it:

  >>> handler.startElementNS((None, 'a'), 'a', {})
  >>> handler.startElementNS((None, 'b'), 'b', {(None, 'foo'): 'bar'})
  >>> handler.characters('Hello world')
  >>> handler.endElementNS((None, 'b'), 'b')
  >>> handler.endElementNS((None, 'a'), 'a')

This constructs an equivalent tree. You can access it through the etree property of the handler:

  >>> tree = handler.etree
  >>> lxml.etree.tostring(tree.getroot())
  b'<a><b foo="bar">Hello world</b></a>'

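Because the handler follows the standard ContentHandler interface, it can also be driven by a parser from the Python core instead of by hand-written calls. The following is a minimal sketch under that assumption; the names `core_parser` and `sax_handler` are illustrative only, and namespace processing is switched on because the handler is fed through `startElementNS()`:

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

import lxml.etree
import lxml.sax

# The lxml handler that will collect the SAX events into a tree.
sax_handler = lxml.sax.ElementTreeContentHandler()

# A parser from the Python core, switched to namespace-aware mode so
# that it calls startElementNS()/endElementNS() on the handler.
core_parser = xml.sax.make_parser()
core_parser.setFeature(feature_namespaces, True)
core_parser.setContentHandler(sax_handler)
core_parser.parse(io.BytesIO(b'<a><b foo="bar">Hello world</b></a>'))

# The events have been turned into an lxml tree.
root = sax_handler.etree.getroot()
print(lxml.etree.tostring(root))
```

The resulting tree is an ordinary lxml tree, so it can be serialised or searched like any other.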
By passing a makeelement function to the constructor of ElementTreeContentHandler, e.g. the one of a parser you configured, you can determine which element class lookup scheme should be used.
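As an illustration of the makeelement option, the sketch below configures a parser with a default class lookup and hands its `makeelement` method to the handler. The class name `MyElement` and the variable names are hypothetical and only serve the example:

```python
import lxml.etree
import lxml.sax

# Hypothetical custom element class, used only for this example.
class MyElement(lxml.etree.ElementBase):
    pass

# Configure a parser so that elements are created as MyElement.
lookup = lxml.etree.ElementDefaultClassLookup(element=MyElement)
custom_parser = lxml.etree.XMLParser()
custom_parser.set_element_class_lookup(lookup)

# Hand the parser's factory function to the SAX content handler.
custom_handler = lxml.sax.ElementTreeContentHandler(
    makeelement=custom_parser.makeelement)
custom_handler.startElementNS((None, 'a'), 'a', {})
custom_handler.endElementNS((None, 'a'), 'a')

custom_root = custom_handler.etree.getroot()
print(type(custom_root).__name__)
```

The root element is created by the configured parser, so it should follow that parser's lookup scheme rather than the default one.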
Let's make a tree we can generate SAX events for:

  >>> from io import StringIO
  >>> f = StringIO('<a><b>Text</b></a>')
  >>> tree = lxml.etree.parse(f)

To see whether the correct SAX events are produced, we'll write a custom content handler:

  >>> from xml.sax.handler import ContentHandler
  >>> class MyContentHandler(ContentHandler):
  ...     def __init__(self):
  ...         self.a_amount = 0
  ...         self.b_amount = 0
  ...         self.text = None
  ...
  ...     def startElementNS(self, name, qname, attributes):
  ...         uri, localname = name
  ...         if localname == 'a':
  ...             self.a_amount += 1
  ...         if localname == 'b':
  ...             self.b_amount += 1
  ...
  ...     def characters(self, data):
  ...         self.text = data

Note that it only defines the startElementNS() method and not startElement(). The SAX event generator in lxml.sax currently only supports namespace-aware processing.
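To make the namespace-aware behaviour concrete, here is a small sketch that records the `name` argument passed to `startElementNS()`. The handler class `NameRecorder` and the namespace URI are made up for the example:

```python
from xml.sax.handler import ContentHandler

import lxml.etree
import lxml.sax

class NameRecorder(ContentHandler):
    """Collects the (uri, localname) tuple of every start event."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attributes):
        self.names.append(name)

# One element in a namespace, one element without a namespace.
ns_root = lxml.etree.fromstring(
    '<x:a xmlns:x="http://example.org/ns"><b/></x:a>')

recorder = NameRecorder()
lxml.sax.saxify(ns_root, recorder)
print(recorder.names)
```

Each name arrives as a `(uri, localname)` tuple, with `None` as the URI for elements that are not in a namespace. This is why the handler above unpacks `name` before comparing the local name.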
To test the content handler, we can produce SAX events from the tree:

  >>> handler = MyContentHandler()
  >>> lxml.sax.saxify(tree, handler)

This is what we expect:

  >>> handler.a_amount
  1
  >>> handler.b_amount
  1
  >>> handler.text
  'Text'

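The two directions can also be combined. As a rough sketch, the events produced by `saxify()` can be sent straight into an `ElementTreeContentHandler` to rebuild a tree from an existing one; the variable names here are illustrative only:

```python
import lxml.etree
import lxml.sax

source_root = lxml.etree.fromstring('<a><b foo="bar">Text</b></a>')

# Generate SAX events from the source tree and rebuild a new tree
# from those events.
builder = lxml.sax.ElementTreeContentHandler()
lxml.sax.saxify(source_root, builder)
rebuilt_root = builder.etree.getroot()

print(lxml.etree.tostring(rebuilt_root))
```

The rebuilt tree is a separate object, so changes to it do not affect the source tree.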
lxml.sax is a simple way to interface with the standard XML support in the Python library. Note, however, that this is a one-way solution, as Python's DOM implementation cannot generate SAX events from a DOM tree.
You can use xml.dom.pulldom to build a minidom from lxml:

  >>> from xml.dom.pulldom import SAX2DOM
  >>> handler = SAX2DOM()
  >>> lxml.sax.saxify(tree, handler)

PullDOM makes the result available through the document attribute:

  >>> dom = handler.document
  >>> print(dom.firstChild.localName)
  a