Authors: Stefan Behnel, Holger Joukl
lxml supports an alternative API similar to the Amara bindery or gnosis.xml.objectify through a custom Element implementation. The main idea is to hide the usage of XML behind normal Python objects, sometimes referred to as data-binding. It allows you to use XML as if you were dealing with a normal Python object hierarchy.
Accessing the children of an XML element uses object attribute access. If there are multiple children with the same name, slicing and indexing can be used. Python data types are extracted from XML content automatically and made available to the normal Python operators.
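As a minimal sketch of this idea, the following shows child access, indexing and automatic type conversion on a small document. The tag names (order, item, quantity) are made up for illustration and are not part of lxml.

```python
from lxml import objectify

# Hypothetical document; the tag names are illustrative only.
root = objectify.fromstring(
    "<order><item>apple</item><item>pear</item><quantity>2</quantity></order>"
)

first_item = str(root.item)      # attribute access returns the first <item>
second_item = str(root.item[1])  # indexing selects among same-named children
item_count = len(root.item)      # number of <item> children
total = root.quantity + 1        # numeric content takes part in arithmetic

print(first_item, second_item, item_count, total)
```

Each of these operations is described in more detail in the sections that follow.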
To set up and use objectify, you need both the lxml.etree module and lxml.objectify:
    >>> from lxml import etree
    >>> from lxml import objectify
The objectify API is very different from the ElementTree API. If it is used, it should not be mixed with other element implementations (such as trees parsed with lxml.etree), to avoid non-obvious behaviour.
The benchmark page has some hints on performance optimisation of code using lxml.objectify.
To make the doctests in this document look a little nicer, we also use this:
    >>> import lxml.usedoctest
Imported from within a doctest, this relieves us from caring about the exact formatting of XML output.
In lxml.objectify, element trees provide an API that models the behaviour of normal Python object trees as closely as possible.
The main idea behind the objectify API is to hide XML element access behind the usual object attribute access pattern. Asking an element for an attribute will return the sequence of children with corresponding tag names:
    >>> root = objectify.Element("root")
    >>> b = objectify.SubElement(root, "b")
    >>> print(root.b[0].tag)
    b
    >>> root.index(root.b[0])
    0

    >>> b = objectify.SubElement(root, "b")
    >>> print(root.b[0].tag)
    b
    >>> print(root.b[1].tag)
    b
    >>> root.index(root.b[1])
    1
For convenience, you can omit the index '0' to access the first child:
    >>> print(root.b.tag)
    b
    >>> root.index(root.b)
    0
    >>> del root.b
Iteration and slicing also obey the requested tag:
    >>> x1 = objectify.SubElement(root, "x")
    >>> x2 = objectify.SubElement(root, "x")
    >>> x3 = objectify.SubElement(root, "x")

    >>> [el.tag for el in root.x]
    ['x', 'x', 'x']
    >>> [el.tag for el in root.x[1:3]]
    ['x', 'x']
    >>> [el.tag for el in root.x[-1:]]
    ['x']

    >>> del root.x[1:2]
    >>> [el.tag for el in root.x]
    ['x', 'x']
If you want to iterate over all children or need to provide a specific namespace for the tag, use the iterchildren() method. Like the other methods for iteration, it supports an optional tag keyword argument:
    >>> [el.tag for el in root.iterchildren()]
    ['b', 'x', 'x']
    >>> [el.tag for el in root.iterchildren(tag='b')]
    ['b']
    >>> [el.tag for el in root.b]
    ['b']
XML attributes are accessed as in the normal ElementTree API:
    >>> c = objectify.SubElement(root, "c", myattr="someval")
    >>> print(root.c.get("myattr"))
    someval

    >>> root.c.set("c", "oh-oh")
    >>> print(root.c.get("c"))
    oh-oh
In addition to the normal ElementTree API for appending elements to trees, subtrees can also be added by assigning them to object attributes. In this case, the subtree is automatically deep copied and the tag name of its root is updated to match the attribute name:
    >>> el = objectify.Element("yet_another_child")
    >>> root.new_child = el
    >>> print(root.new_child.tag)
    new_child
    >>> print(el.tag)
    yet_another_child

    >>> root.y = [objectify.Element("y"), objectify.Element("y")]
    >>> [el.tag for el in root.y]
    ['y', 'y']
The latter is a short form for operations on the full slice:
    >>> root.y[:] = [objectify.Element("y")]
    >>> [el.tag for el in root.y]
    ['y']
You can also replace children that way:
    >>> child1 = objectify.SubElement(root, "child")
    >>> child2 = objectify.SubElement(root, "child")
    >>> child3 = objectify.SubElement(root, "child")

    >>> el = objectify.Element("new_child")
    >>> subel = objectify.SubElement(el, "sub")

    >>> root.child = el
    >>> print(root.child.sub.tag)
    sub

    >>> root.child[2] = el
    >>> print(root.child[2].sub.tag)
    sub
Note that special care must be taken when changing the tag name of an element:
    >>> print(root.b.tag)
    b
    >>> root.b.tag = "notB"
    >>> root.b
    Traceback (most recent call last):
      ...
    AttributeError: no such child: b
    >>> print(root.notB.tag)
    notB
As with lxml.etree, you can either create an objectify tree by parsing an XML document or by building one from scratch. To parse a document, just use the parse() or fromstring() functions of the module:
    >>> from io import StringIO
    >>> fileobject = StringIO('<test/>')

    >>> tree = objectify.parse(fileobject)
    >>> print(isinstance(tree.getroot(), objectify.ObjectifiedElement))
    True

    >>> root = objectify.fromstring('<test/>')
    >>> print(isinstance(root, objectify.ObjectifiedElement))
    True
To build a new tree in memory, objectify replicates the standard factory function Element() from lxml.etree:
    >>> obj_el = objectify.Element("new")
    >>> print(isinstance(obj_el, objectify.ObjectifiedElement))
    True
After creating such an Element, you can use the usual API of lxml.etree to add SubElements to the tree:
    >>> child = objectify.SubElement(obj_el, "newchild", attr="value")
New subelements will automatically inherit the objectify behaviour from their tree. However, all independent elements that you create through the Element() factory of lxml.etree (instead of objectify) will not support the objectify API by themselves:
    >>> subel = objectify.SubElement(obj_el, "sub")
    >>> print(isinstance(subel, objectify.ObjectifiedElement))
    True

    >>> independent_el = etree.Element("new")
    >>> print(isinstance(independent_el, objectify.ObjectifiedElement))
    False
To simplify the generation of trees even further, you can use the E-factory:
    >>> E = objectify.E
    >>> root = E.root(
    ...   E.a(5),
    ...   E.b(6.21),
    ...   E.c(True),
    ...   E.d("how", tell="me")
    ... )

    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
      <a py:pytype="int">5</a>
      <b py:pytype="float">6.21</b>
      <c py:pytype="bool">true</c>
      <d py:pytype="str" tell="me">how</d>
    </root>
This allows you to write up a specific language in tags:
    >>> ROOT = objectify.E.root
    >>> TITLE = objectify.E.title
    >>> HOWMANY = getattr(objectify.E, "how-many")

    >>> root = ROOT(
    ...   TITLE("The title"),
    ...   HOWMANY(5)
    ... )

    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:py="http://codespeak.net/lxml/objectify/pytype">
      <title py:pytype="str">The title</title>
      <how-many py:pytype="int">5</how-many>
    </root>
objectify.E is an instance of objectify.ElementMaker. By default, it creates pytype annotated Elements without a namespace. You can switch off the pytype annotation by passing False to the annotate keyword argument of the constructor. You can also pass a default namespace and an nsmap:
    >>> myE = objectify.ElementMaker(annotate=False,
    ...           namespace="http://my/ns", nsmap={None: "http://my/ns"})

    >>> root = myE.root(myE.someint(2))

    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns="http://my/ns">
      <someint>2</someint>
    </root>
During tag lookups, namespaces are handled mostly behind the scenes. If you access a child of an Element without specifying a namespace, the lookup will use the namespace of the parent:
    >>> root = objectify.Element("{http://ns/}root")
    >>> b = objectify.SubElement(root, "{http://ns/}b")
    >>> c = objectify.SubElement(root, "{http://other/}c")

    >>> print(root.b.tag)
    {http://ns/}b
Note that the SubElement() factory of lxml.etree does not inherit any namespaces when creating a new subelement. Element creation must be explicit about the namespace, and is simplified through the E-factory as described above.
Lookups, however, inherit namespaces implicitly:
    >>> print(root.b.tag)
    {http://ns/}b

    >>> print(root.c)
    Traceback (most recent call last):
      ...
    AttributeError: no such child: {http://ns/}c
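To make the contrast between creation and lookup concrete, here is a small sketch that creates one child without a namespace and one with an explicit namespace. The variable names are illustrative.

```python
from lxml import objectify

root = objectify.Element("{http://ns/}root")

# No namespace given: the child does NOT inherit the parent's namespace.
plain = objectify.SubElement(root, "b")
# Namespace spelled out explicitly in the tag name.
qualified = objectify.SubElement(root, "{http://ns/}b")

child_tags = [child.tag for child in root.iterchildren()]

# Attribute lookup applies the parent's namespace, so it skips the
# unqualified child and finds the qualified one.
looked_up_tag = root.b.tag

print(child_tags, looked_up_tag)
```

The unqualified child is still part of the tree; it just is not found by a plain attribute lookup on a namespaced parent.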
To access an element in a different namespace than its parent, you can use getattr():
    >>> c = getattr(root, "{http://other/}c")
    >>> print(c.tag)
    {http://other/}c
For convenience, there is also a quick way through item access:
    >>> c = root["{http://other/}c"]
    >>> print(c.tag)
    {http://other/}c
The same approach must be used to access children with tag names that are not valid Python identifiers:
    >>> el = objectify.SubElement(root, "{http://ns/}tag-name")
    >>> print(root["tag-name"].tag)
    {http://ns/}tag-name

    >>> new_el = objectify.Element("{http://ns/}new-element")
    >>> el = objectify.SubElement(new_el, "{http://ns/}child")
    >>> el = objectify.SubElement(new_el, "{http://ns/}child")
    >>> el = objectify.SubElement(new_el, "{http://ns/}child")

    >>> root["tag-name"] = [new_el, new_el]
    >>> print(len(root["tag-name"]))
    2
    >>> print(root["tag-name"].tag)
    {http://ns/}tag-name

    >>> print(len(root["tag-name"].child))
    3
    >>> print(root["tag-name"].child.tag)
    {http://ns/}child
    >>> print(root["tag-name"][1].child.tag)
    {http://ns/}child
or for names that have a special meaning in lxml.objectify:
    >>> root = objectify.XML("<root><text>TEXT</text></root>")

    >>> print(root.text.text)
    Traceback (most recent call last):
      ...
    AttributeError: 'NoneType' object has no attribute 'text'

    >>> print(root["text"].text)
    TEXT
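The same pattern applies to any child whose name collides with an existing Element attribute. The following sketch uses a made-up document with a hyphenated tag and a child literally named "tag"; both names are assumptions chosen for illustration.

```python
from lxml import objectify

# Hypothetical document: <first-name> is not a valid Python identifier,
# and <tag> collides with the Element attribute of the same name.
root = objectify.fromstring(
    "<root><first-name>Ada</first-name><tag>label</tag></root>"
)

first_name = root["first-name"].text  # item access works for any tag name
own_tag = root.tag                    # the element's own tag name
child_text = root["tag"].text         # the <tag> child element

print(first_name, own_tag, child_text)
```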
When dealing with XML documents from different sources, you will often require them to follow a common schema. In lxml.objectify, this directly translates to enforcing a specific object tree, i.e. expected object attributes are ensured to be there and to have the expected type. This can easily be achieved through XML Schema validation at parse time. Also see the documentation on validation on this topic.
First of all, we need a parser that knows our schema, so let's say we parse the schema from a file-like object (or file or filename):
    >>> f = StringIO('''\
    ...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ...     <xsd:element name="a" type="AType"/>
    ...     <xsd:complexType name="AType">
    ...       <xsd:sequence>
    ...         <xsd:element name="b" type="xsd:string" />
    ...       </xsd:sequence>
    ...     </xsd:complexType>
    ...   </xsd:schema>
    ... ''')
    >>> schema = etree.XMLSchema(file=f)
When creating the validating parser, we must make sure it returns objectify trees. This is best done with the makeparser() function:
    >>> parser = objectify.makeparser(schema=schema)
Now we can use it to parse a valid document:
    >>> xml = "<a><b>test</b></a>"
    >>> a = objectify.fromstring(xml, parser)

    >>> print(a.b)
    test
Or an invalid document:
    >>> xml = b"<a><b>test</b><c/></a>"
    >>> a = objectify.fromstring(xml, parser)  # doctest: +ELLIPSIS
    Traceback (most recent call last):
    lxml.etree.XMLSyntaxError: Element 'c': This element is not expected...
Note that the same works for parse-time DTD validation, except that DTDs do not support any data types by design.
For both convenience and speed, objectify supports its own path language, represented by the ObjectPath class:
    >>> root = objectify.Element("{http://ns/}root")
    >>> b1 = objectify.SubElement(root, "{http://ns/}b")
    >>> c = objectify.SubElement(b1, "{http://ns/}c")
    >>> b2 = objectify.SubElement(root, "{http://ns/}b")
    >>> d = objectify.SubElement(root, "{http://other/}d")

    >>> path = objectify.ObjectPath("root.b.c")
    >>> print(path)
    root.b.c
    >>> path.hasattr(root)
    True
    >>> print(path.find(root).tag)
    {http://ns/}c

    >>> find = objectify.ObjectPath("root.b.c")
    >>> print(find(root).tag)
    {http://ns/}c

    >>> find = objectify.ObjectPath("root.{http://other/}d")
    >>> print(find(root).tag)
    {http://other/}d

    >>> find = objectify.ObjectPath("root.{not}there")
    >>> print(find(root).tag)
    Traceback (most recent call last):
      ...
    AttributeError: no such child: {not}there

    >>> find = objectify.ObjectPath("{not}there")
    >>> print(find(root).tag)
    Traceback (most recent call last):
      ...
    ValueError: root element does not match: need {not}there, got {http://ns/}root

    >>> find = objectify.ObjectPath("root.b[1]")
    >>> print(find(root).tag)
    {http://ns/}b

    >>> find = objectify.ObjectPath("root.{http://ns/}b[1]")
    >>> print(find(root).tag)
    {http://ns/}b
Apart from strings, ObjectPath also accepts lists of path segments:
    >>> find = objectify.ObjectPath(['root', 'b', 'c'])
    >>> print(find(root).tag)
    {http://ns/}c

    >>> find = objectify.ObjectPath(['root', '{http://ns/}b[1]'])
    >>> print(find(root).tag)
    {http://ns/}b
You can also use relative paths starting with a '.' to ignore the actual root element and only inherit its namespace:
    >>> find = objectify.ObjectPath(".b[1]")
    >>> print(find(root).tag)
    {http://ns/}b

    >>> find = objectify.ObjectPath(['', 'b[1]'])
    >>> print(find(root).tag)
    {http://ns/}b

    >>> find = objectify.ObjectPath(".unknown[1]")
    >>> print(find(root).tag)
    Traceback (most recent call last):
      ...
    AttributeError: no such child: {http://ns/}unknown

    >>> find = objectify.ObjectPath(".{http://other/}unknown[1]")
    >>> print(find(root).tag)
    Traceback (most recent call last):
      ...
    AttributeError: no such child: {http://other/}unknown
For convenience, a single dot represents the empty ObjectPath (identity):
    >>> find = objectify.ObjectPath(".")
    >>> print(find(root).tag)
    {http://ns/}root
ObjectPath objects can be used to manipulate trees:
    >>> root = objectify.Element("{http://ns/}root")

    >>> path = objectify.ObjectPath(".some.child.{http://other/}unknown")
    >>> path.hasattr(root)
    False
    >>> path.find(root)
    Traceback (most recent call last):
      ...
    AttributeError: no such child: {http://ns/}some

    >>> path.setattr(root, "my value")  # creates children as necessary
    >>> path.hasattr(root)
    True
    >>> print(path.find(root).text)
    my value
    >>> print(root.some.child["{http://other/}unknown"].text)
    my value

    >>> print(len(path.find(root)))
    1
    >>> path.addattr(root, "my new value")
    >>> print(len(path.find(root)))
    2
    >>> [el.text for el in path.find(root)]
    ['my value', 'my new value']
As with attribute assignment, setattr() accepts lists:
    >>> path.setattr(root, ["v1", "v2", "v3"])
    >>> [el.text for el in path.find(root)]
    ['v1', 'v2', 'v3']
Note, however, that indexing is only supported in this context if the children exist. Indexing of non-existing children will not extend or create a list of such children but raise an exception:
    >>> path = objectify.ObjectPath(".{non}existing[1]")
    >>> path.setattr(root, "my value")
    Traceback (most recent call last):
      ...
    TypeError: creating indexed path attributes is not supported
It is worth noting that ObjectPath does not depend on the objectify module or the ObjectifiedElement implementation. It can also be used in combination with Elements from the normal lxml.etree API.
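As a brief sketch of that last point, the following applies ObjectPath to a tree parsed with plain lxml.etree. It assumes the default etree parser is in use (no custom class lookup installed) and that find() accepts an optional default value as its second argument, which is returned when the path does not match.

```python
from lxml import etree, objectify

# Parsed with the normal etree parser, so these are plain Elements.
root = etree.fromstring("<root><b><c>text</c></b></root>")

path = objectify.ObjectPath("root.b.c")
exists = path.hasattr(root)
found = path.find(root)
is_objectified = isinstance(found, objectify.ObjectifiedElement)

# With a default value, a missing path returns the default instead of raising.
missing = objectify.ObjectPath("root.x.y").find(root, None)

print(exists, found.text, is_objectified, missing)
```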
The objectify module knows about Python data types and tries its best to let element content behave like them. For example, they support the normal math operators:
    >>> root = objectify.fromstring(
    ...     "<root><a>5</a><b>11</b><c>true</c><d>hoi</d></root>")
    >>> root.a + root.b
    16
    >>> root.a += root.b
    >>> print(root.a)
    16

    >>> root.a = 2
    >>> print(root.a + 2)
    4
    >>> print(1 + root.a)
    3

    >>> print(root.c)
    True
    >>> root.c = False
    >>> if not root.c:
    ...     print("false!")
    false!

    >>> print(root.d + " test !")
    hoi test !
    >>> root.d = "%s - %s"
    >>> print(root.d % (1234, 12345))
    1234 - 12345
However, data elements continue to provide the objectify API. This means that sequence operations such as len(), slicing and indexing (e.g. of strings) cannot behave as the Python types. Like all other tree elements, they show the normal slicing behaviour of objectify elements:
    >>> root = objectify.fromstring("<root><a>test</a><b>toast</b></root>")
    >>> print(root.a + ' me')  # behaves like a string, right?
    test me
    >>> len(root.a)  # but there's only one 'a' element!
    1
    >>> [a.tag for a in root.a]
    ['a']
    >>> print(root.a[0].tag)
    a

    >>> print(root.a)
    test
    >>> [str(a) for a in root.a[:1]]
    ['test']
If you need to run sequence operations on data types, you must ask the API for the real Python value. The string value is always available through the normal ElementTree .text attribute. Additionally, all data classes provide a .pyval attribute that returns the value as plain Python type:
    >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
    >>> root.a.text
    'test'
    >>> root.a.pyval
    'test'

    >>> root.b.text
    '5'
    >>> root.b.pyval
    5
Note, however, that both attributes are read-only in objectify. If you want to change values, just assign them directly to the attribute:
    >>> root.a.text = "25"
    Traceback (most recent call last):
      ...
    TypeError: attribute 'text' of 'StringElement' objects is not writable

    >>> root.a.pyval = 25
    Traceback (most recent call last):
      ...
    TypeError: attribute 'pyval' of 'StringElement' objects is not writable

    >>> root.a = 25
    >>> print(root.a)
    25
    >>> print(root.a.pyval)
    25
In other words, objectify data elements behave like immutable Python types. You can replace them, but not modify them.
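A short sketch may help to tie these points together: sequence operations go through the plain Python value, while changes are made by replacing the child on its parent. The variable names are illustrative.

```python
from lxml import objectify

root = objectify.fromstring("<root><a>test</a><b>5</b></root>")

element_count = len(root.a)      # objectify API: one <a> element
text_length = len(root.a.text)   # plain str: four characters
upper = root.a.pyval.upper()     # str methods work on the real value

# Data elements are immutable, so compute a new value and assign it
# to the parent attribute, which replaces the <b> element's content.
root.b = root.b + 1
new_value = root.b.pyval

print(element_count, text_length, upper, new_value)
```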
To see the data types that are currently used, you can call the module level dump() function that returns a recursive string representation for elements:
    >>> root = objectify.fromstring("""
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...   <a attr1="foo" attr2="bar">1</a>
    ...   <a>1.2</a>
    ...   <b>1</b>
    ...   <b>true</b>
    ...   <c>what?</c>
    ...   <d xsi:nil="true"/>
    ... </root>
    ... """)

    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * attr1 = 'foo'
          * attr2 = 'bar'
        a = 1.2 [FloatElement]
        b = 1 [IntElement]
        b = True [BoolElement]
        c = 'what?' [StringElement]
        d = None [NoneElement]
          * xsi:nil = 'true'
You can freely switch between different types for the same child:
    >>> root = objectify.fromstring("<root><a>5</a></root>")
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 5 [IntElement]

    >>> root.a = 'nice string!'
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 'nice string!' [StringElement]
          * py:pytype = 'str'

    >>> root.a = True
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = True [BoolElement]
          * py:pytype = 'bool'

    >>> root.a = [1, 2, 3]
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * py:pytype = 'int'
        a = 2 [IntElement]
          * py:pytype = 'int'
        a = 3 [IntElement]
          * py:pytype = 'int'

    >>> root.a = (1, 2, 3)
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * py:pytype = 'int'
        a = 2 [IntElement]
          * py:pytype = 'int'
        a = 3 [IntElement]
          * py:pytype = 'int'
Normally, elements use the standard string representation for str() that is provided by lxml.etree. You can enable a pretty-print representation for objectify elements like this:
    >>> objectify.enable_recursive_str()

    >>> root = objectify.fromstring("""
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...   <a attr1="foo" attr2="bar">1</a>
    ...   <a>1.2</a>
    ...   <b>1</b>
    ...   <b>true</b>
    ...   <c>what?</c>
    ...   <d xsi:nil="true"/>
    ... </root>
    ... """)

    >>> print(str(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * attr1 = 'foo'
          * attr2 = 'bar'
        a = 1.2 [FloatElement]
        b = 1 [IntElement]
        b = True [BoolElement]
        c = 'what?' [StringElement]
        d = None [NoneElement]
          * xsi:nil = 'true'
This behaviour can be switched off in the same way:
    >>> objectify.enable_recursive_str(False)
Objectify uses two different types of Elements. Structural Elements (or tree Elements) represent the object tree structure. Data Elements represent the data containers at the leaves. You can explicitly create tree Elements with the objectify.Element() factory and data Elements with the objectify.DataElement() factory.
When Element objects are created, lxml.objectify must determine which implementation class to use for them. This is relatively easy for tree Elements and less so for data Elements. The algorithm is as follows:
You can change the default classes for tree Elements and empty data Elements at setup time. The ObjectifyElementClassLookup() call accepts two keyword arguments, tree_class and empty_data_class, that determine the Element classes used in these cases. By default, tree_class is a class called ObjectifiedElement and empty_data_class is a StringElement.
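As a rough sketch of such a setup, the following passes a custom tree_class to the lookup and installs it on a parser. The class Node and its child_tags() helper are hypothetical names invented for this example.

```python
from lxml import etree, objectify


class Node(objectify.ObjectifiedElement):
    """Hypothetical tree class that adds one helper method."""

    def child_tags(self):
        return [child.tag for child in self.iterchildren()]


# Use Node for tree Elements; data Elements keep their default classes.
lookup = objectify.ObjectifyElementClassLookup(tree_class=Node)
parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

root = objectify.fromstring("<root><a>1</a><b>two</b></root>", parser)
uses_custom_class = isinstance(root, Node)
tags = root.child_tags()

print(uses_custom_class, tags)
```

Because Node inherits from ObjectifiedElement, the usual attribute access (root.a, root.b) keeps working alongside the extra method.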
The "type hint" mechanism uses an XML attribute defined as lxml.objectify.PYTYPE_ATTRIBUTE. It may contain any of the following string values: int, long, float, str, unicode, NoneType:
    >>> print(objectify.PYTYPE_ATTRIBUTE)
    {http://codespeak.net/lxml/objectify/pytype}pytype
    >>> ns, name = objectify.PYTYPE_ATTRIBUTE[1:].split('}')

    >>> root = objectify.fromstring("""\
    ... <root xmlns:py='%s'>
    ...   <a py:pytype='str'>5</a>
    ...   <b py:pytype='int'>5</b>
    ...   <c py:pytype='NoneType' />
    ... </root>
    ... """ % ns)

    >>> print(root.a + 10)
    510
    >>> print(root.b + 10)
    15
    >>> print(root.c)
    None
Note that you can change the name and namespace used for this attribute through the set_pytype_attribute_tag(tag) module function, in case your application ever needs to. There is also a utility function annotate() that recursively generates this attribute for the elements of a tree:
    >>> root = objectify.fromstring("<root><a>test</a><b>5</b></root>")
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 'test' [StringElement]
        b = 5 [IntElement]

    >>> objectify.annotate(root)

    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 'test' [StringElement]
          * py:pytype = 'str'
        b = 5 [IntElement]
          * py:pytype = 'int'
A second way of specifying data type information uses XML Schema types as element annotations. Objectify knows those that can be mapped to normal Python types:
    >>> root = objectify.fromstring('''\
    ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    ...          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ...      <d xsi:type="xsd:double">5</d>
    ...      <i xsi:type="xsd:int"   >5</i>
    ...      <s xsi:type="xsd:string">5</s>
    ...    </root>
    ...    ''')
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        d = 5.0 [FloatElement]
          * xsi:type = 'xsd:double'
        i = 5 [IntElement]
          * xsi:type = 'xsd:int'
        s = '5' [StringElement]
          * xsi:type = 'xsd:string'
Again, there is a utility function xsiannotate() that recursively generates the "xsi:type" attribute for the elements of a tree:
    >>> root = objectify.fromstring('''\
    ...    <root><a>test</a><b>5</b><c>true</c></root>
    ...    ''')
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 'test' [StringElement]
        b = 5 [IntElement]
        c = True [BoolElement]

    >>> objectify.xsiannotate(root)

    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 'test' [StringElement]
          * xsi:type = 'xsd:string'
        b = 5 [IntElement]
          * xsi:type = 'xsd:integer'
        c = True [BoolElement]
          * xsi:type = 'xsd:boolean'
Note, however, that xsiannotate() will always use the first XML Schema datatype that is defined for any given Python type, see also Defining additional data classes.
The utility function deannotate() can be used to get rid of 'py:pytype' and/or 'xsi:type' information:
    >>> root = objectify.fromstring('''\
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    ...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ...   <d xsi:type="xsd:double">5</d>
    ...   <i xsi:type="xsd:int"   >5</i>
    ...   <s xsi:type="xsd:string">5</s>
    ... </root>''')
    >>> objectify.annotate(root)
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        d = 5.0 [FloatElement]
          * py:pytype = 'float'
          * xsi:type = 'xsd:double'
        i = 5 [IntElement]
          * py:pytype = 'int'
          * xsi:type = 'xsd:int'
        s = '5' [StringElement]
          * py:pytype = 'str'
          * xsi:type = 'xsd:string'
    >>> objectify.deannotate(root)
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        d = 5 [IntElement]
        i = 5 [IntElement]
        s = 5 [IntElement]
You can control which type attributes should be de-annotated with the keyword arguments 'pytype' (default: True) and 'xsi' (default: True). deannotate() can also remove 'xsi:nil' attributes by setting 'xsi_nil=True' (default: False):
    >>> root = objectify.fromstring('''\
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    ...       xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    ...   <d xsi:type="xsd:double">5</d>
    ...   <i xsi:type="xsd:int"   >5</i>
    ...   <s xsi:type="xsd:string">5</s>
    ...   <n xsi:nil="true"/>
    ... </root>''')
    >>> objectify.annotate(root)
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        d = 5.0 [FloatElement]
          * py:pytype = 'float'
          * xsi:type = 'xsd:double'
        i = 5 [IntElement]
          * py:pytype = 'int'
          * xsi:type = 'xsd:int'
        s = '5' [StringElement]
          * py:pytype = 'str'
          * xsi:type = 'xsd:string'
        n = None [NoneElement]
          * py:pytype = 'NoneType'
          * xsi:nil = 'true'
    >>> objectify.deannotate(root, xsi_nil=True)
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        d = 5 [IntElement]
        i = 5 [IntElement]
        s = 5 [IntElement]
        n = '' [StringElement]
Note that deannotate() does not remove the namespace declarations of the pytype namespace by default. To remove them as well, and to generally clean up the namespace declarations in the document (usually when done with the whole processing), pass the option cleanup_namespaces=True. This option is new in lxml 2.3.2. In older versions, use the function lxml.etree.cleanup_namespaces() instead.
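A small sketch of the cleanup option, assuming lxml 2.3.2 or later: the tree is annotated, serialised, then de-annotated with cleanup_namespaces=True and serialised again, so the two byte strings can be compared.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root><a>test</a><b>5</b></root>")

# Add py:pytype attributes (and the namespace declarations they need).
objectify.annotate(root)
annotated = etree.tostring(root)

# Remove the type attributes and the now unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root)

print(annotated)
print(cleaned)
```

After the cleanup, element types are again determined from the text content alone.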
For convenience, the DataElement() factory creates an Element with a Python value in one step. You can pass the required Python type name or the XSI type name:
    >>> root = objectify.Element("root")
    >>> root.x = objectify.DataElement(5, _pytype="int")
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        x = 5 [IntElement]
          * py:pytype = 'int'

    >>> root.x = objectify.DataElement(5, _pytype="str", myattr="someval")
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        x = '5' [StringElement]
          * myattr = 'someval'
          * py:pytype = 'str'

    >>> root.x = objectify.DataElement(5, _xsi="integer")
    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        x = 5 [IntElement]
          * py:pytype = 'int'
          * xsi:type = 'xsd:integer'
XML Schema types reside in the XML Schema namespace, thus DataElement() tries to correctly prefix the xsi:type attribute value for you:
    >>> root = objectify.Element("root")
    >>> root.s = objectify.DataElement(5, _xsi="string")

    >>> objectify.deannotate(root, xsi=False)
    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <s xsi:type="xsd:string">5</s>
    </root>
DataElement() uses a default nsmap to set these prefixes:
    >>> el = objectify.DataElement('5', _xsi='string')
    >>> namespaces = list(el.nsmap.items())
    >>> namespaces.sort()
    >>> for prefix, namespace in namespaces:
    ...     print("%s - %s" % (prefix, namespace))
    py - http://codespeak.net/lxml/objectify/pytype
    xsd - http://www.w3.org/2001/XMLSchema
    xsi - http://www.w3.org/2001/XMLSchema-instance

    >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
    xsd:string
While you can set custom namespace prefixes, it is necessary to provide valid namespace information if you choose to do so:
    >>> el = objectify.DataElement('5', _xsi='foo:string',
    ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema'})
    >>> namespaces = list(el.nsmap.items())
    >>> namespaces.sort()
    >>> for prefix, namespace in namespaces:
    ...     print("%s - %s" % (prefix, namespace))
    foo - http://www.w3.org/2001/XMLSchema
    py - http://codespeak.net/lxml/objectify/pytype
    xsi - http://www.w3.org/2001/XMLSchema-instance

    >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
    foo:string
Note how lxml chose a default prefix for the XML Schema Instance namespace. We can override it as in the following example:
    >>> el = objectify.DataElement('5', _xsi='foo:string',
    ...          nsmap={'foo': 'http://www.w3.org/2001/XMLSchema',
    ...                 'myxsi': 'http://www.w3.org/2001/XMLSchema-instance'})
    >>> namespaces = list(el.nsmap.items())
    >>> namespaces.sort()
    >>> for prefix, namespace in namespaces:
    ...     print("%s - %s" % (prefix, namespace))
    foo - http://www.w3.org/2001/XMLSchema
    myxsi - http://www.w3.org/2001/XMLSchema-instance
    py - http://codespeak.net/lxml/objectify/pytype

    >>> print(el.get("{http://www.w3.org/2001/XMLSchema-instance}type"))
    foo:string
Care must be taken if different namespace prefixes have been used for the same namespace. Namespace information gets merged to avoid duplicate definitions when adding a new sub-element to a tree, but this mechanism does not adapt the prefixes of attribute values:
    >>> root = objectify.fromstring("""<root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>""")
    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:schema="http://www.w3.org/2001/XMLSchema"/>

    >>> s = objectify.DataElement("17", _xsi="string")
    >>> print(etree.tostring(s, pretty_print=True))
    <value xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</value>

    >>> root.s = s
    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
      <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="xsd:string">17</s>
    </root>
It is your responsibility to fix the prefixes of attribute values if you choose to deviate from the standard prefixes. A convenient way to do this for xsi:type attributes is to use the xsiannotate() utility:
    >>> objectify.xsiannotate(root)
    >>> print(etree.tostring(root, pretty_print=True))
    <root xmlns:schema="http://www.w3.org/2001/XMLSchema">
      <s xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" py:pytype="str" xsi:type="schema:string">17</s>
    </root>
Of course, it is discouraged to use different prefixes for one and the same namespace when building up an objectify tree.
You can plug additional data classes into objectify that will be used in exactly the same way as the predefined types. Data classes can either inherit from ObjectifiedDataElement directly or from one of the specialised classes like NumberElement or BoolElement. The numeric types require an initial call to the NumberElement method self._setValueParser(function) to set their type conversion function (string -> numeric Python type). This call should be placed into the element _init() method.
The registration of data classes uses the PyType class:
    >>> class ChristmasDate(objectify.ObjectifiedDataElement):
    ...     def call_santa(self):
    ...         print("Ho ho ho!")

    >>> def checkChristmasDate(date_string):
    ...     if not date_string.startswith('24.12.'):
    ...         raise ValueError  # or TypeError

    >>> xmas_type = objectify.PyType('date', checkChristmasDate, ChristmasDate)
The PyType constructor takes a string type name, an (optional) callable type check and the custom data class. If a type check is provided it must accept a string as argument and raise ValueError or TypeError if it cannot handle the string value.
PyTypes are used if an element carries a py:pytype attribute denoting its data type or, in absence of such an attribute, if the given type check callable does not raise a ValueError/TypeError exception when applied to the element text.
If you want, you can also register this type under an XML Schema type name:
    >>> xmas_type.xmlSchemaTypes = ("date",)
XML Schema types will be considered if the element has an xsi:type attribute that specifies its data type. The line above binds the XSD type date to the newly defined Python type. Note that this must be done before the next step, which is to register the type. Then you can use it:
    >>> xmas_type.register()

    >>> root = objectify.fromstring(
    ...     "<root><a>24.12.2000</a><b>12.24.2000</b></root>")
    >>> root.a.call_santa()
    Ho ho ho!
    >>> root.b.call_santa()
    Traceback (most recent call last):
      ...
    AttributeError: no such child: call_santa
If you need to specify dependencies between the type check functions, you can pass a sequence of type names through the before and after keyword arguments of the register() method. The PyType will then try to register itself before or after the respective types, as long as they are currently registered. Note that this only impacts the currently registered types at the time of registration. Types that are registered later on will not care about the dependencies of already registered types.
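To illustrate why the ordering can matter, here is a separate, self-contained sketch that does not build on the ChristmasDate example. It registers a made-up data class for five-digit postal codes before the built-in 'int' type, since otherwise text like "12345" would be claimed by the integer check first. The names ZipCode, check_zip_code and "zipcode" are hypothetical, and the sketch assumes that the predefined integer type is registered under the name 'int'.

```python
from lxml import objectify


class ZipCode(objectify.ObjectifiedDataElement):
    """Hypothetical data class for five-digit postal codes."""


def check_zip_code(text):
    # Type checks signal "not my type" by raising ValueError or TypeError.
    if not (len(text) == 5 and text.isdigit()):
        raise ValueError("not a five-digit code")


zip_type = objectify.PyType("zipcode", check_zip_code, ZipCode)
# Run this check before the 'int' check, which would also accept "12345".
zip_type.register(before=("int",))
try:
    root = objectify.fromstring("<root><zip>12345</zip><n>42</n></root>")
    zip_is_custom = isinstance(root.zip, ZipCode)
    n_is_int = isinstance(root.n, objectify.IntElement)
finally:
    # Registration is global, so undo it when done.
    zip_type.unregister()

print(zip_is_custom, n_is_int)
```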
If you provide XML Schema type information, this will override the type check function defined above:
    >>> root = objectify.fromstring('''\
    ...    <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...      <a xsi:type="date">12.24.2000</a>
    ...    </root>
    ...    ''')
    >>> print(root.a)
    12.24.2000
    >>> root.a.call_santa()
    Ho ho ho!
To unregister a type, call its unregister() method:
    >>> root.a.call_santa()
    Ho ho ho!
    >>> xmas_type.unregister()
    >>> root.a.call_santa()
    Traceback (most recent call last):
      ...
    AttributeError: no such child: call_santa
Be aware, though, that this does not immediately apply to elements to which there already is a Python reference. Their Python class will only be changed after all references are gone and the Python object is garbage collected.
In some cases, the normal data class setup is not enough. Being based on lxml.etree, however, lxml.objectify supports very fine-grained control over the Element classes used in a tree. All you have to do is configure a different class lookup mechanism (or write one yourself).
The first step for the setup is to create a new parser that builds objectify documents. The objectify API is meant for data-centric XML (as opposed to document XML with mixed content). Therefore, we configure the parser to let it remove whitespace-only text from the parsed document if it is not enclosed by an XML element. Note that this alters the document infoset, so if you consider the removed spaces as data in your specific use case, you should go with a normal parser and just set the element class lookup. Most applications, however, will work fine with the following setup:
    >>> parser = objectify.makeparser(remove_blank_text=True)
Internally, this does the following:
    >>> parser = etree.XMLParser(remove_blank_text=True)

    >>> lookup = objectify.ObjectifyElementClassLookup()
    >>> parser.set_element_class_lookup(lookup)
If you want to change the lookup scheme, say, to get additional support for namespace specific classes, you can register the objectify lookup as a fallback of the namespace lookup. In this case, however, you have to take care that the namespace classes inherit from objectify.ObjectifiedElement, not only from the normal lxml.etree.ElementBase, so that they support the objectify API. The above setup code then becomes:
    >>> lookup = etree.ElementNamespaceClassLookup(
    ...     objectify.ObjectifyElementClassLookup())
    >>> parser.set_element_class_lookup(lookup)
See the documentation on class lookup schemes for more information.
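As a sketch of how such a combined lookup might be used, the following registers a namespace-specific class and lets all other elements fall back to the objectify lookup. The namespace URI, the Invoice class and its total() method are hypothetical and only serve the illustration.

```python
from lxml import etree, objectify

NS = "http://example.com/invoice"  # hypothetical namespace


class Invoice(objectify.ObjectifiedElement):
    """Hypothetical namespace class; inherits the objectify API."""

    def total(self):
        # Iterating over self.item yields all <item> siblings.
        return sum(item.pyval for item in self.item)


lookup = etree.ElementNamespaceClassLookup(
    objectify.ObjectifyElementClassLookup())
# Map the <invoice> tag in this namespace to the custom class.
lookup.get_namespace(NS)["invoice"] = Invoice

parser = etree.XMLParser(remove_blank_text=True)
parser.set_element_class_lookup(lookup)

root = objectify.fromstring(
    '<invoice xmlns="%s"><item>2</item><item>3</item></invoice>' % NS,
    parser)
is_invoice = isinstance(root, Invoice)
total = root.total()

print(is_invoice, total)
```

The <item> children are not registered in the namespace lookup, so they are resolved by the objectify fallback and become ordinary data elements.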
Such a different Element API obviously implies some side effects to the normal behaviour of the rest of the API.