Benchmarks and Speed

Author:	Stefan Behnel

lxml.etree is a very fast XML library. Most of this is due to the speed of libxml2, e.g. the parser and serialiser, or the XPath engine. Other areas of lxml were specifically written for high performance in high-level operations, such as the tree iterators.

On the other hand, the simplicity of lxml sometimes hides internal operations that are more costly than the API suggests. If you are not aware of these cases, lxml may not always perform as you expect. A common example in the Python world is the Python list type. New users often expect it to be a linked list, while it actually is implemented as an array, which results in a completely different complexity for common operations.

Similarly, the tree model of libxml2 is more complex than what lxml's ElementTree API projects into Python space, so some operations may show unexpected performance. Rest assured that most lxml users will not notice this in real life, as lxml is very fast in absolute numbers. It is definitely fast enough for most applications, so lxml is probably somewhere between 'fast enough' and 'the best choice' for yours. Read some messages from happy users to see what we mean.

This text describes where lxml.etree (abbreviated to 'lxe') excels, gives hints on some performance traps and compares the overall performance to the original ElementTree (ET) and cElementTree (cET) libraries by Fredrik Lundh. The cElementTree library is a fast C-implementation of the original ElementTree.

General notes

First thing to say: there is an overhead involved in having a DOM-like C library mimic the ElementTree API. As opposed to ElementTree, lxml has to generate Python representations of tree nodes on the fly when asked for them, and the internal tree structure of libxml2 results in a higher maintenance overhead than the simpler top-down structure of ElementTree. What this means is: the more of your code runs in Python, the less you can benefit from the speed of lxml and libxml2. Note, however, that this is true for most performance critical Python applications. No one would implement Fourier transformations in pure Python when you can use NumPy.

The up side then is that lxml provides powerful tools like tree iterators, XPath and XSLT, that can handle complex operations at the speed of C. Their pythonic API in lxml makes them so flexible that most applications can easily benefit from them.

How to read the timings

The statements made here are backed by the (micro-)benchmark scripts bench_etree.py, bench_xpath.py and bench_objectify.py that come with the lxml source distribution. They are distributed under the same BSD license as lxml itself, and the lxml project would like to promote them as a general benchmarking suite for all ElementTree implementations. New benchmarks are very easy to add as tiny test methods, so if you write a performance test for a specific part of the API yourself, please consider sending it to the lxml mailing list.

The timings presented below compare lxml 3.1.1 (with libxml2 2.9.0) to the latest released versions of ElementTree (with cElementTree as accelerator module) in the standard library of CPython 3.3.0. They were run single-threaded on a 2.9GHz 64bit double core Intel i7 machine under Ubuntu Linux 12.10 (Quantal). The C libraries were compiled with the same platform specific optimisation flags. The Python interpreter was also manually compiled for the platform. Note that many of the following ElementTree timings are therefore better than what a normal Python installation with the standard library (c)ElementTree modules would yield. Note also that CPython 2.7 and 3.2+ come with a newer ElementTree version, so older Python installations will not perform as well for (c)ElementTree, and sometimes substantially worse.

The scripts run a number of simple tests on the different libraries, using different XML tree configurations: different tree sizes (T1-4), with or without attributes (-/A), with or without ASCII string or unicode text (-/S/U), and either against a tree or its serialised XML form (T/X). In the result extracts cited below, T1 refers to a 3-level tree with many children at the third level, T2 is swapped around to have many children below the root element, T3 is a deep tree with few children at each level and T4 is a small tree, slightly broader than deep. If repetition is involved, this usually means running the benchmark in a loop over all children of the tree root, otherwise, the operation is run on the root node (C/R).

As an example, the character code (SATR T1) states that the benchmark was running for tree T1, with plain string text (S) and attributes (A). It was run against the root element (R) in the tree structure of the data (T).

Note that very small operations are repeated in integer loops to make them measurable. It is therefore not always possible to compare the absolute timings of, say, a single access benchmark (which usually loops) and a 'get all in one step' benchmark, which already takes enough time to be measurable and is therefore measured as is. An example is the index access to a single child, which cannot be compared to the timings for getchildren(). Take a look at the concrete benchmarks in the scripts to understand how the numbers compare.
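
To make the character codes easier to read, here is a small helper that spells one out. It is only an illustrative sketch: the function name decode_benchmark_code and the wording of the descriptions are made up for this example and are not part of the benchmark scripts. The position of each flag is inferred from the (SATR T1) example above.

```python
def decode_benchmark_code(code):
    """Spell out a benchmark code such as '(SATR T1)' (hypothetical helper)."""
    # Drop the surrounding parentheses and padding, then split flags from trees.
    flags, _, trees = code.strip("() ").partition(" ")
    text, attrs, source, target = flags  # four one-character flags
    return {
        "text": {"-": "none", "S": "ASCII string", "U": "unicode"}[text],
        "attributes": attrs == "A",
        "input": {"T": "tree", "X": "serialised XML"}[source],
        "target": {"R": "root element", "C": "children of the root"}[target],
        # Some benchmarks name two trees, e.g. 'T1,T2'.
        "trees": trees.replace(",", " ").split(),
    }


info = decode_benchmark_code("(SATR T1)")
print(info)
```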

Parsing and Serialising

Serialisation is an area where lxml excels. The reason is that it executes entirely at the C level, without any interaction with Python code. The results are rather impressive, especially for UTF-8, which is native to libxml2. While 20 to 40 times faster than (c)ElementTree 1.2 (which was part of the standard library before Python 2.7/3.2), lxml is still more than 10 times as fast as the much improved ElementTree 1.3 in recent Python versions:

    lxe: tostring_utf16  (S-TR T1)    7.9958 msec/pass
    cET: tostring_utf16  (S-TR T1)   83.1358 msec/pass
    lxe: tostring_utf16  (UATR T1)    8.3222 msec/pass
    cET: tostring_utf16  (UATR T1)   84.4688 msec/pass
    lxe: tostring_utf16  (S-TR T2)    8.2297 msec/pass
    cET: tostring_utf16  (S-TR T2)   87.3415 msec/pass
    lxe: tostring_utf8   (S-TR T2)    6.5677 msec/pass
    cET: tostring_utf8   (S-TR T2)   76.2064 msec/pass
    lxe: tostring_utf8   (U-TR T3)    1.1952 msec/pass
    cET: tostring_utf8   (U-TR T3)   22.0058 msec/pass

The difference is somewhat smaller for plain text serialisation:

    lxe: tostring_text_ascii     (S-TR T1)    2.7738 msec/pass
    cET: tostring_text_ascii     (S-TR T1)    4.7629 msec/pass
    lxe: tostring_text_ascii     (S-TR T3)    0.8273 msec/pass
    cET: tostring_text_ascii     (S-TR T3)    1.5273 msec/pass
    lxe: tostring_text_utf16     (S-TR T1)    2.7659 msec/pass
    cET: tostring_text_utf16     (S-TR T1)   10.5038 msec/pass
    lxe: tostring_text_utf16     (U-TR T1)    2.8017 msec/pass
    cET: tostring_text_utf16     (U-TR T1)   10.5207 msec/pass

The tostring() function also supports serialisation to a Python unicode string object, which is currently faster in ElementTree under CPython 3.3:

    lxe: tostring_text_unicode   (S-TR T1)    2.6896 msec/pass
    cET: tostring_text_unicode   (S-TR T1)    1.0056 msec/pass
    lxe: tostring_text_unicode   (U-TR T1)    2.7366 msec/pass
    cET: tostring_text_unicode   (U-TR T1)    1.0154 msec/pass
    lxe: tostring_text_unicode   (S-TR T3)    0.7997 msec/pass
    cET: tostring_text_unicode   (S-TR T3)    0.3154 msec/pass
    lxe: tostring_text_unicode   (U-TR T4)    0.0048 msec/pass
    cET: tostring_text_unicode   (U-TR T4)    0.0160 msec/pass
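
For readers who have not used the different serialisation modes, the following sketch shows the kinds of tostring() calls that the benchmark names above suggest. The sample document is invented, and the exact calls used in bench_etree.py may differ. The import falls back to the standard library ElementTree, which accepts the same arguments for these calls.

```python
try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

root = etree.fromstring("<doc><p>Hello <b>fast</b> world</p></doc>")

# XML serialisation to an encoded byte string.
xml_bytes = etree.tostring(root, encoding="utf-8")

# Plain text serialisation: only the text content, as encoded bytes.
text_bytes = etree.tostring(root, method="text", encoding="utf-8")

# Plain text serialisation to a Python unicode string (str).
text_str = etree.tostring(root, method="text", encoding="unicode")

print(xml_bytes)
print(text_str)
```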

For parsing, lxml.etree and cElementTree compete for the medal. Depending on the input, either of the two can be faster. The (c)ET libraries use a very thin layer on top of the expat parser, which is known to be very fast. Here are some timings from the benchmarking suite:

    lxe: parse_bytesIO   (SAXR T1)   13.0246 msec/pass
    cET: parse_bytesIO   (SAXR T1)    8.2929 msec/pass
    lxe: parse_bytesIO   (S-XR T3)    1.3542 msec/pass
    cET: parse_bytesIO   (S-XR T3)    2.4023 msec/pass
    lxe: parse_bytesIO   (UAXR T3)    7.5610 msec/pass
    cET: parse_bytesIO   (UAXR T3)   11.2455 msec/pass

And another couple of timings from a benchmark that Fredrik Lundh used to promote cElementTree, comparing a number of different parsers. First, parsing a 274KB XML file containing Shakespeare's Hamlet:

    xml.etree.ElementTree.parse done in 0.017 seconds
    xml.etree.cElementTree.parse done in 0.007 seconds
    xml.etree.cElementTree.XMLParser.feed(): 6636 nodes read in 0.007 seconds
    lxml.etree.parse done in 0.003 seconds
    drop_whitespace.parse done in 0.003 seconds
    lxml.etree.XMLParser.feed(): 6636 nodes read in 0.004 seconds
    minidom tree read in 0.080 seconds

And a 3.4MB XML file containing the Old Testament:

    xml.etree.ElementTree.parse done in 0.038 seconds
    xml.etree.cElementTree.parse done in 0.030 seconds
    xml.etree.cElementTree.XMLParser.feed(): 25317 nodes read in 0.030 seconds
    lxml.etree.parse done in 0.016 seconds
    drop_whitespace.parse done in 0.015 seconds
    lxml.etree.XMLParser.feed(): 25317 nodes read in 0.022 seconds
    minidom tree read in 0.288 seconds

Here are the same benchmarks again, but including the memory usage of the process in KB before and after parsing (using os.fork() to make sure we start from a clean state each time). For the 274KB hamlet.xml file:

    Memory usage: 7284
    xml.etree.ElementTree.parse done in 0.017 seconds
    Memory usage: 9432 (+2148)
    xml.etree.cElementTree.parse done in 0.007 seconds
    Memory usage: 9432 (+2152)
    xml.etree.cElementTree.XMLParser.feed(): 6636 nodes read in 0.007 seconds
    Memory usage: 9448 (+2164)
    lxml.etree.parse done in 0.003 seconds
    Memory usage: 11032 (+3748)
    drop_whitespace.parse done in 0.003 seconds
    Memory usage: 10224 (+2940)
    lxml.etree.XMLParser.feed(): 6636 nodes read in 0.004 seconds
    Memory usage: 11804 (+4520)
    minidom tree read in 0.080 seconds
    Memory usage: 12324 (+5040)

And for the 3.4MB Old Testament XML file:

    Memory usage: 10420
    xml.etree.ElementTree.parse done in 0.038 seconds
    Memory usage: 20660 (+10240)
    xml.etree.cElementTree.parse done in 0.030 seconds
    Memory usage: 20660 (+10240)
    xml.etree.cElementTree.XMLParser.feed(): 25317 nodes read in 0.030 seconds
    Memory usage: 20844 (+10424)
    lxml.etree.parse done in 0.016 seconds
    Memory usage: 27624 (+17204)
    drop_whitespace.parse done in 0.015 seconds
    Memory usage: 24468 (+14052)
    lxml.etree.XMLParser.feed(): 25317 nodes read in 0.022 seconds
    Memory usage: 29844 (+19424)
    minidom tree read in 0.288 seconds
    Memory usage: 28788 (+18368)

As can be seen from the sizes, both lxml.etree and cElementTree are rather memory friendly compared to the pure Python libraries ElementTree and (especially) minidom. Compared to older CPython versions, the memory footprint of the minidom library was considerably reduced in CPython 3.3, by about a factor of 4 in this case.

For plain parser performance, lxml.etree and cElementTree tend to stay rather close to each other, usually within a factor of two, with winners well distributed over both sides. Similar timings can be observed for the iterparse() function:

    lxe: iterparse_bytesIO   (SAXR T1)   17.9198 msec/pass
    cET: iterparse_bytesIO   (SAXR T1)   14.4982 msec/pass
    lxe: iterparse_bytesIO   (UAXR T3)    8.8522 msec/pass
    cET: iterparse_bytesIO   (UAXR T3)   12.9857 msec/pass

However, if you benchmark the complete round-trip of a serialise-parse cycle, the numbers will look similar to these:

    lxe: write_utf8_parse_bytesIO   (S-TR T1)   19.8867 msec/pass
    cET: write_utf8_parse_bytesIO   (S-TR T1)   80.7259 msec/pass
    lxe: write_utf8_parse_bytesIO   (UATR T2)   23.7896 msec/pass
    cET: write_utf8_parse_bytesIO   (UATR T2)   98.0766 msec/pass
    lxe: write_utf8_parse_bytesIO   (S-TR T3)    3.0684 msec/pass
    cET: write_utf8_parse_bytesIO   (S-TR T3)   24.6122 msec/pass
    lxe: write_utf8_parse_bytesIO   (SATR T4)    0.3495 msec/pass
    cET: write_utf8_parse_bytesIO   (SATR T4)    1.9610 msec/pass
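
As a rough illustration of what a serialise-parse round trip looks like in code, consider the sketch below. The function name roundtrip and the sample tree are assumptions made for this example; the actual benchmark code in bench_etree.py may be structured differently.

```python
from io import BytesIO

try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls


def roundtrip(root):
    """Serialise a tree to UTF-8 bytes and parse it back from a file-like object."""
    data = etree.tostring(root, encoding="utf-8")
    return etree.parse(BytesIO(data)).getroot()


root = etree.fromstring("<root><a>1</a><b>2</b></root>")
reparsed = roundtrip(root)
print([child.tag for child in reparsed])
```

Both halves of the cycle count towards the total, which is why a large serialisation advantage can outweigh a small parsing disadvantage.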

For applications that require a high parser throughput of large files, and that do little to no serialization, both cET and lxml.etree are a good choice. The cET library is particularly fast for iterparse applications that extract small amounts of data or aggregate information from large XML data sets that do not fit into memory. When it comes to round-trip performance, however, lxml is multiple times faster in total. So, whenever the input documents are not considerably larger than the output, lxml is the clear winner.

Regarding HTML parsing, Ian Bicking has done some benchmarking on lxml's HTML parser, comparing it to a number of other famous HTML parser tools for Python. lxml wins this contest by quite a length. To give an idea, the numbers suggest that lxml.html can run a couple of parse-serialise cycles in the time that other tools need for parsing alone. The comparison even shows some very favourable results regarding memory consumption.

Liza Daly has written an article that presents a couple of tweaks to get the most out of lxml's parser for very large XML documents. She quite favourably positions lxml.etree as a tool for high-performance XML parsing.

Finally, xml.com has a couple of publications about XML parser performance. Farwick and Hafner have written two interesting articles that compare the parser of libxml2 to some major Java based XML parsers. One deals with event-driven parser performance, the other one presents benchmark results comparing DOM parsers. Both comparisons suggest that libxml2's parser performance is largely superior to all commonly used Java parsers in almost all cases. Note that the C parser benchmark results are based on xmlbench, which uses a simpler setup for libxml2 than lxml does.

The ElementTree API

Since all three libraries implement the same API, their performance is easy to compare in this area. A major disadvantage for lxml's performance is the different tree model that underlies libxml2. It allows lxml to provide parent pointers for elements and full XPath support, but also increases the overhead of tree building and restructuring. This can be seen from the tree setup times of the benchmark (given in seconds):

    lxe:       --     S-     U-     -A     SA     UA
         T1: 0.0299 0.0343 0.0344 0.0293 0.0345 0.0342
         T2: 0.0368 0.0423 0.0418 0.0427 0.0474 0.0459
         T3: 0.0088 0.0084 0.0086 0.0251 0.0258 0.0261
         T4: 0.0002 0.0002 0.0002 0.0005 0.0006 0.0006
    cET:       --     S-     U-     -A     SA     UA
         T1: 0.0050 0.0045 0.0093 0.0044 0.0043 0.0043
         T2: 0.0073 0.0075 0.0074 0.0201 0.0075 0.0074
         T3: 0.0033 0.0213 0.0032 0.0034 0.0033 0.0035
         T4: 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

The timings are somewhat close to each other, although cET can be several times faster than lxml for larger trees. One of the reasons is that lxml must encode incoming string data and tag names into UTF-8, and additionally discard the created Python elements after their use, when they are no longer referenced. ElementTree represents the tree itself through these objects, which reduces the overhead in creating them.

Child access

The same tree overhead makes operations like collecting children as in list(element) more costly in lxml. Where cET can quickly create a shallow copy of their list of children, lxml has to create a Python object for each child and collect them in a list:

    lxe: root_list_children        (--TR T1)    0.0038 msec/pass
    cET: root_list_children        (--TR T1)    0.0010 msec/pass
    lxe: root_list_children        (--TR T2)    0.0455 msec/pass
    cET: root_list_children        (--TR T2)    0.0050 msec/pass

This handicap is also visible when accessing single children:

    lxe: first_child               (--TR T2)    0.0424 msec/pass
    cET: first_child               (--TR T2)    0.0384 msec/pass
    lxe: last_child                (--TR T1)    0.0477 msec/pass
    cET: last_child                (--TR T1)    0.0467 msec/pass

... unless you also add the time to find a child index in a bigger list. ET and cET use Python lists here, which are based on arrays. The data structure used by libxml2 is a linked tree, and thus, a linked list of children:

    lxe: middle_child              (--TR T1)    0.0710 msec/pass
    cET: middle_child              (--TR T1)    0.0420 msec/pass
    lxe: middle_child              (--TR T2)    1.7393 msec/pass
    cET: middle_child              (--TR T2)    0.0396 msec/pass
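
A practical consequence of the linked child list is that a loop over indices has to locate each child again, whereas plain iteration simply follows the sibling links. The sketch below contrasts the two styles; the function names are made up for this illustration, and both functions return the same result.

```python
try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls


def texts_by_index(parent):
    # Each parent[i] lookup may need to walk the child list in lxml.
    return [parent[i].text for i in range(len(parent))]


def texts_by_iteration(parent):
    # Iteration moves from sibling to sibling without index lookups.
    return [child.text for child in parent]


root = etree.Element("root")
for i in range(3):
    etree.SubElement(root, "child").text = str(i)

print(texts_by_index(root), texts_by_iteration(root))
```

For a handful of children the difference is negligible; it becomes relevant for broad trees like T2.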

Element creation

As opposed to ET, libxml2 has a notion of documents that each element must be in. This results in a major performance difference for creating independent Elements that end up in independently created documents:

    lxe: create_elements           (--TC T2)    1.0045 msec/pass
    cET: create_elements           (--TC T2)    0.0753 msec/pass

Therefore, it is always preferable to create Elements for the document they are supposed to end up in, either as SubElements of an Element or using the explicit Element.makeelement() call:

    lxe: makeelement               (--TC T2)    1.0586 msec/pass
    cET: makeelement               (--TC T2)    0.1483 msec/pass
    lxe: create_subelements        (--TC T2)    0.8826 msec/pass
    cET: create_subelements        (--TC T2)    0.0827 msec/pass

So, if the main performance bottleneck of an application is creating large XML trees in memory through calls to Element and SubElement, cET is the best choice. Note, however, that the serialisation performance may even out this advantage, especially for smaller trees and trees with many attributes.
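
The three ways of creating a child that are compared above look like this in code. This is a minimal sketch with an invented tag name; the empty dict passed to makeelement() is there because the standard library version requires the attribute argument.

```python
try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

root = etree.Element("root")

# 1. Independent Element, appended afterwards. Under lxml it starts out
#    in its own document and has to be moved into the target tree.
loose = etree.Element("item")
root.append(loose)

# 2. SubElement: created directly as a child of its parent.
child = etree.SubElement(root, "item")

# 3. makeelement(): created for the parent's document, then appended.
made = root.makeelement("item", {})
root.append(made)

print(len(root))
```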

Merging different sources

A critical action for lxml is moving elements between document contexts. It requires lxml to do recursive adaptations throughout the moved tree structure.

The following benchmark appends all root children of the second tree to the root of the first tree:

    lxe: append_from_document      (--TR T1,T2)    1.0812 msec/pass
    cET: append_from_document      (--TR T1,T2)    0.1104 msec/pass
    lxe: append_from_document      (--TR T3,T4)    0.0155 msec/pass
    cET: append_from_document      (--TR T3,T4)    0.0060 msec/pass

Although these are fairly small numbers compared to parsing, this easily shows the different performance classes for lxml and (c)ET. Where the latter do not have to care about parent pointers and tree structures, lxml has to deep traverse the appended tree. The performance difference therefore increases with the size of the tree that is moved.

This difference is not always as visible, but applies to most parts of the API, like inserting newly created elements:

    lxe: insert_from_document         (--TR T1,T2)    3.9763 msec/pass
    cET: insert_from_document         (--TR T1,T2)    0.1459 msec/pass

or replacing the child slice by a newly created element:

    lxe: replace_children_element   (--TC T1)    0.0749 msec/pass
    cET: replace_children_element   (--TC T1)    0.0081 msec/pass

as opposed to replacing the slice with an existing element from the same document:

    lxe: replace_children           (--TC T1)    0.0052 msec/pass
    cET: replace_children           (--TC T1)    0.0036 msec/pass

While these numbers are too small to provide a major performance impact in practice, you should keep this difference in mind when you merge very large trees. Note that Elements have a makeelement() method that allows you to create an Element within the same document, thus avoiding the merge overhead when inserting it into that tree.
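
In code, the operation measured by append_from_document amounts to something like the following sketch (the two small documents are invented for illustration). Note the list() call: under lxml, append() moves each element out of its old parent, so the loop iterates over a snapshot of the children. Under the standard library ElementTree, the elements simply stay referenced by both parents.

```python
try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

first = etree.fromstring("<a><x/><y/></a>")
second = etree.fromstring("<b><z/><w/></b>")

# Append all root children of the second tree to the root of the first.
for child in list(second):
    first.append(child)

print([el.tag for el in first])
```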

deepcopy

Deep copying a tree is fast in lxml:

    lxe: deepcopy_all              (--TR T1)    3.1650 msec/pass
    cET: deepcopy_all              (--TR T1)   53.9973 msec/pass
    lxe: deepcopy_all              (-ATR T2)    3.7365 msec/pass
    cET: deepcopy_all              (-ATR T2)   61.6267 msec/pass
    lxe: deepcopy_all              (S-TR T3)    0.7913 msec/pass
    cET: deepcopy_all              (S-TR T3)   13.6220 msec/pass

So, for example, if you have a database-like scenario where you parse in a large tree and then search and copy independent subtrees from it for further processing, lxml is by far the best choice here.
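
Such a search-and-copy scenario could be sketched as follows. The record structure and the id attribute are invented for this example; the point is only that copy.deepcopy() yields a subtree that can be changed without touching the original.

```python
import copy

try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

root = etree.fromstring(
    "<db>"
    "<rec id='1'><name>a</name></rec>"
    "<rec id='2'><name>b</name></rec>"
    "</db>"
)

# Find a subtree in the large tree, then copy it for independent processing.
record = root.find("rec[@id='2']")
clone = copy.deepcopy(record)
clone.find("name").text = "changed"

print(record.find("name").text, clone.find("name").text)
```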

Tree traversal

Another important area in XML processing is iteration for tree traversal. If your algorithms can benefit from step-by-step traversal of the XML tree and especially if few elements are of interest or the target element tag name is known, the .iter() method is a good choice:

    lxe: iter_all             (--TR T1)    1.0529 msec/pass
    cET: iter_all             (--TR T1)    0.2635 msec/pass
    lxe: iter_islice          (--TR T2)    0.0110 msec/pass
    cET: iter_islice          (--TR T2)    0.0050 msec/pass
    lxe: iter_tag             (--TR T2)    0.0079 msec/pass
    cET: iter_tag             (--TR T2)    0.0112 msec/pass
    lxe: iter_tag_all         (--TR T2)    0.1822 msec/pass
    cET: iter_tag_all         (--TR T2)    0.5343 msec/pass

This translates directly into similar timings for Element.findall():

    lxe: findall              (--TR T2)    1.7176 msec/pass
    cET: findall              (--TR T2)    0.9973 msec/pass
    lxe: findall              (--TR T3)    0.3967 msec/pass
    cET: findall              (--TR T3)    0.2525 msec/pass
    lxe: findall_tag          (--TR T2)    0.2258 msec/pass
    cET: findall_tag          (--TR T2)    0.5770 msec/pass
    lxe: findall_tag          (--TR T3)    0.1085 msec/pass
    cET: findall_tag          (--TR T3)    0.1919 msec/pass

Note that all three libraries currently use the same Python implementation for .findall(), except for their native tree iterator (element.iter()). In general, lxml is very fast for iteration, but loses ground against cET when many Elements are found and need to be instantiated. So, the more selective your search is, the faster lxml will run.
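
To illustrate what "selective" means here, the sketch below compares an unfiltered iteration with a tag-filtered one and with the equivalent findall() call. The tiny sample document is made up; with lxml, the filtered variants only need to create Python objects for the matching elements.

```python
try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

root = etree.fromstring(
    "<bible><b><c>"
    "<v>In the beginning</v><v>And Adam begat Seth</v>"
    "</c></b><note>x</note></bible>"
)

# Unfiltered: every element in the tree is visited and instantiated.
all_tags = [el.tag for el in root.iter()]

# Filtered by tag name: only the <v> elements are handed to Python.
verses = [v.text for v in root.iter("v")]

# The same selection expressed as an ElementPath expression.
verses_findall = [v.text for v in root.findall(".//v")]

print(all_tags)
print(verses)
```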

XPath

The following timings are based on the benchmark script bench_xpath.py.

This part of lxml does not have an equivalent in ElementTree. However, lxml provides more than one way of accessing it and you should take care which part of the lxml API you use. The most straightforward way is to call the xpath() method on an Element or ElementTree:

    lxe: xpath_method         (--TC T1)    0.3982 msec/pass
    lxe: xpath_method         (--TC T2)    7.8895 msec/pass
    lxe: xpath_method         (--TC T3)    0.0477 msec/pass
    lxe: xpath_method         (--TC T4)    0.3982 msec/pass

This is well suited for testing and when the XPath expressions are as diverse as the trees they are called on. However, if you have a single XPath expression that you want to apply to a larger number of different elements, the XPath class is the most efficient way to do it:

    lxe: xpath_class          (--TC T1)    0.0713 msec/pass
    lxe: xpath_class          (--TC T2)    1.1325 msec/pass
    lxe: xpath_class          (--TC T3)    0.0215 msec/pass
    lxe: xpath_class          (--TC T4)    0.0722 msec/pass

Note that this still allows you to use variables in the expression, so you can parse it once and then adapt it through variables at call time. In other cases, where you have a fixed Element or ElementTree and want to run different expressions on it, you should consider the XPathEvaluator:

    lxe: xpath_element        (--TR T1)    0.1101 msec/pass
    lxe: xpath_element        (--TR T2)    2.0473 msec/pass
    lxe: xpath_element        (--TR T3)    0.0267 msec/pass
    lxe: xpath_element        (--TR T4)    0.1087 msec/pass

While it looks slightly slower, creating an XPath object for each of the expressions generates a much higher overhead here:

    lxe: xpath_class_repeat           (--TC T1   )    0.3884 msec/pass
    lxe: xpath_class_repeat           (--TC T2   )    7.6182 msec/pass
    lxe: xpath_class_repeat           (--TC T3   )    0.0465 msec/pass
    lxe: xpath_class_repeat           (--TC T4   )    0.3877 msec/pass

Note that tree iteration can be substantially faster than XPath if your code short-circuits after the first couple of elements were found. The XPath engine will always return the complete result set, regardless of how much of it will actually be used.

Here is an example where only the first matching element is being searched, a case for which XPath has syntax support as well:

    lxe: find_single                (--TR T2)    0.0184 msec/pass
    cET: find_single                (--TR T2)    0.0052 msec/pass
    lxe: iter_single                (--TR T2)    0.0024 msec/pass
    cET: iter_single                (--TR T2)    0.0007 msec/pass
    lxe: xpath_single               (--TR T2)    0.0033 msec/pass

When looking for the first two elements out of many, the numbers explode for XPath, as restricting the result subset requires a more complex expression:

    lxe: iterfind_two               (--TR T2)    0.0184 msec/pass
    cET: iterfind_two               (--TR T2)    0.0062 msec/pass
    lxe: iter_two                   (--TR T2)    0.0029 msec/pass
    cET: iter_two                   (--TR T2)    0.0017 msec/pass
    lxe: xpath_two                  (--TR T2)    0.2768 msec/pass
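
The three access paths discussed above can be put side by side as follows. This sketch requires lxml, since the standard library has no XPath engine; the sample document, the attribute name n and the expression are invented for illustration.

```python
from lxml import etree

root = etree.fromstring("<root><v n='1'>a</v><v n='2'>b</v></root>")
expression = "//v[@n = $n]/text()"

# 1. xpath() method: convenient for one-off queries.
by_method = root.xpath(expression, n="2")

# 2. XPath class: compile the expression once, then call it on any
#    number of elements, adapting it through variables at call time.
find_text = etree.XPath(expression)
by_class = find_text(root, n="2")

# 3. XPathEvaluator: bind to one element or tree, then run any number
#    of different expressions against it.
evaluate = etree.XPathEvaluator(root)
by_evaluator = evaluate(expression, n="2")

print(by_method, by_class, by_evaluator)
```

All three calls return the same result; they differ only in which part of the setup work is reused between calls.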
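
Short-circuiting an iterator needs no special syntax in Python. The sketch below stops after the first match and after the first two matches respectively; it is an illustration of the idea and not necessarily the code that the iter_single and iter_two benchmarks run.

```python
from itertools import islice

try:
    from lxml import etree  # preferred if installed
except ImportError:
    import xml.etree.ElementTree as etree  # same API for these calls

root = etree.Element("root")
for i in range(1000):
    etree.SubElement(root, "v").text = str(i)

# Stop at the first match; the remaining elements are never visited.
first = next(root.iter("v"), None)

# Stop after the first two matches.
first_two = list(islice(root.iter("v"), 2))

print(first.text, [v.text for v in first_two])
```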

A longer example

... based on lxml 1.3.

A while ago, Uche Ogbuji posted a benchmark proposal that would read in a 3MB XML version of the Old Testament of the Bible and look for the word begat in all verses. Apparently, it is contained in 120 out of almost 24000 verses. This is easy to implement in ElementTree using findall(). However, the fastest and most memory friendly way to do this is obviously iterparse(), as most of the data is not of any interest.

Now, Uche's original proposal was more or less the following:

    from xml.etree import ElementTree

    def bench_ET():
        tree = ElementTree.parse("ot.xml")
        result = []
        for v in tree.findall("//v"):
            text = v.text
            if 'begat' in text:
                result.append(text)
        return len(result)

which takes about one second on my machine today. The faster iterparse() variant looks like this:

    def bench_ET_iterparse():
        result = []
        for event, v in ElementTree.iterparse("ot.xml"):
            if v.tag == 'v':
                text = v.text
                if 'begat' in text:
                    result.append(text)
                v.clear()
        return len(result)

The improvement is about 10%. At the time I first tried (early 2006), lxml didn't have iterparse() support, but the findall() variant was already faster than ElementTree. This changes immediately when you switch to cElementTree. The latter only needs 0.17 seconds to do the trick today and only some impressive 0.10 seconds when running the iterparse version. And even back then, it was quite a bit faster than what lxml could achieve.

Since then, lxml has matured a lot and has gotten much faster. The iterparse variant now runs in 0.14 seconds, and if you remove the v.clear(), it is even a little faster (which isn't the case for cElementTree).

One of the many great tools in lxml is XPath, a Swiss army knife for finding things in XML documents. It is possible to move the whole thing to a pure XPath implementation, which looks like this:

    from lxml import etree

    def bench_lxml_xpath_all():
        tree = etree.parse("ot.xml")
        result = tree.xpath("//v[contains(., 'begat')]/text()")
        return len(result)

This runs in about 0.13 seconds and is about the shortest possible implementation (in lines of Python code) that I could come up with. Now, this is already a rather complex XPath expression compared to the simple "//v" ElementPath expression we started with. Since this is also valid XPath, let's try this instead:

    def bench_lxml_xpath():
        tree = etree.parse("ot.xml")
        result = []
        for v in tree.xpath("//v"):
            text = v.text
            if 'begat' in text:
                result.append(text)
        return len(result)

This gets us down to 0.12 seconds, thus showing that a generic XPath evaluation engine cannot always compete with a simpler, tailored solution. However, since this is not much different from the original findall variant, we can remove the complexity of the XPath call completely and just go with what we had in the beginning. Under lxml, this runs in the same 0.12 seconds.

But there is one thing left to try. We can replace the simple ElementPath expression with a native tree iterator:

    def bench_lxml_getiterator():
        tree = etree.parse("ot.xml")
        result = []
        for v in tree.getiterator("v"):
            text = v.text
            if 'begat' in text:
                result.append(text)
        return len(result)

This implements the same thing, just without the overhead of parsing and evaluating a path expression. And this makes it another bit faster, down to 0.11 seconds. For comparison, cElementTree runs this version in 0.17 seconds.

So, what have we learned?

- Python code is not slow. The pure XPath solution was not even as fast as the first shot Python implementation. In general, a few more lines in Python make things more readable, which is much more important than the last 5% of performance.
- It's important to know the available options - and it's worth starting with the most simple one. In this case, a programmer would then probably have started with getiterator("v") or iterparse(). Either of them would already have been the most efficient, depending on which library is used.
- It's important to know your tool. lxml and cElementTree are both very fast libraries, but they do not have the same performance characteristics. The fastest solution in one library can be comparatively slow in the other. If you optimise, optimise for the specific target platform.
- It's not always worth optimising. After all that hassle we got from 0.12 seconds for the initial implementation to 0.11 seconds. Switching over to cElementTree and writing an iterparse() based version would have given us 0.10 seconds - not a big difference for 3MB of XML.
- Take care what operation is really dominating in your use case. If we split up the operations, we can see that lxml is slightly slower than cElementTree on parse() (both about 0.06 seconds), but more visibly slower on iterparse(): 0.07 versus 0.10 seconds. However, tree iteration in lxml is incredibly fast, so it can be better to parse the whole tree and then iterate over it rather than using iterparse() to do both in one step. Or, you can just wait for the lxml developers to optimise iterparse in one of the next releases...

lxml.objectify

The following timings are based on the benchmark script bench_objectify.py.

Objectify is a data-binding API for XML based on lxml.etree, that was added in version 1.1. It uses standard Python attribute access to traverse the XML tree. It also features ObjectPath, a fast path language based on the same meme.

Just like lxml.etree, lxml.objectify creates Python representations of elements on the fly. To save memory, the normal Python garbage collection mechanisms will discard them when their last reference is gone. In cases where deeply nested elements are frequently accessed through the objectify API, the create-discard cycles can become a bottleneck, as elements have to be instantiated over and over again.

ObjectPath

ObjectPath can be used to speed up the access to elements that are deep in the tree. It avoids step-by-step Python element instantiations along the path, which can substantially improve the access time:

    lxe: attribute                  (--TR T1)    4.1828 msec/pass
    lxe: attribute                  (--TR T2)   17.3802 msec/pass
    lxe: attribute                  (--TR T4)    3.8657 msec/pass
    lxe: objectpath                 (--TR T1)    0.9289 msec/pass
    lxe: objectpath                 (--TR T2)   13.3109 msec/pass
    lxe: objectpath                 (--TR T4)    0.9289 msec/pass
    lxe: attributes_deep            (--TR T1)    6.2900 msec/pass
    lxe: attributes_deep            (--TR T2)   20.4713 msec/pass
    lxe: attributes_deep            (--TR T4)    6.1679 msec/pass
    lxe: objectpath_deep            (--TR T1)    1.3049 msec/pass
    lxe: objectpath_deep            (--TR T2)   14.0815 msec/pass
    lxe: objectpath_deep            (--TR T4)    1.3051 msec/pass

Note, however, that parsing ObjectPath expressions is not for free either, so this is most effective for frequently accessing the same element.
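
The difference between the two access styles might look like this in practice. The sketch requires lxml; the document and the element names a, b and c are invented for the example.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b><c>42</c></b></a></root>")

# Step-by-step attribute access: creates a Python object for each of
# the intermediate elements a and b on the way to c.
value_by_attributes = root.a.b.c.text

# ObjectPath: parse the path once, keep it around and reuse it.
path_to_c = objectify.ObjectPath("root.a.b.c")
value_by_path = path_to_c(root).text

print(value_by_attributes, value_by_path)
```

Keeping the ObjectPath object in a variable or at module level is what avoids the repeated cost of parsing the path expression.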

Caching Elements

A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed. Just create a cache dictionary and run:

    cache[root] = list(root.iter())

after parsing and:

    del cache[root]

when you are done with the tree. This will keep the Python element representations of all elements alive and thus avoid the overhead of repeated Python object creation. You can also consider using filters or generator expressions to be more selective. By choosing the right trees (or even subtrees and elements) to cache, you can trade memory usage against access speed:

    lxe: attribute_cached           (--TR T1)    3.1357 msec/pass
    lxe: attribute_cached           (--TR T2)   15.8911 msec/pass
    lxe: attribute_cached           (--TR T4)    2.9194 msec/pass
    lxe: attributes_deep_cached     (--TR T1)    3.8984 msec/pass
    lxe: attributes_deep_cached     (--TR T2)   16.8300 msec/pass
    lxe: attributes_deep_cached     (--TR T4)    3.6936 msec/pass
    lxe: objectpath_deep_cached     (--TR T1)    0.7496 msec/pass
    lxe: objectpath_deep_cached     (--TR T2)   12.3763 msec/pass
    lxe: objectpath_deep_cached     (--TR T4)    0.7427 msec/pass

Things to note: you cannot currently use weakref.WeakKeyDictionary objects for this as lxml's element objects do not support weak references (which are costly in terms of memory). Also note that new element objects that you add to these trees will not turn up in the cache automatically and will therefore still be garbage collected when all their Python references are gone, so this is most effective for largely immutable trees. You should consider using a set instead of a list in this case and add new elements by hand.

Further optimisations

Here are some more things to try if optimisation is required:

- A lot of time is usually spent in tree traversal to find the addressed elements in the tree. If you often work in subtrees, do what you would also do with deep Python objects: assign the parent of the subtree to a variable or pass it into functions instead of starting at the root. This allows accessing its descendants more directly.
- Try assigning data values directly to attributes instead of passing them through DataElement.
- If you use custom data types that are costly to parse, try running objectify.annotate() over read-only trees to speed up the attribute type inference on read access.

Note that none of these measures is guaranteed to speed up your application. As usual, you should prefer readable code over premature optimisations and profile your expected use cases before bothering to apply optimisations at random.
