[Python-Dev] Fixing the XML batteries

Calvin Spealman ironfroggy at gmail.com
Mon Feb 6 13:48:00 CET 2012


On Dec 9, 2011 3:04 AM, "Stefan Behnel" <stefan_ml at behnel.de> wrote:
>
> Hi everyone,
>
> I think Py3.3 would be a good milestone for cleaning up the stdlib
> support for XML. Note upfront: you may or may not know me as the
> maintainer of lxml, the de-facto non-stdlib standard Python XML tool.
> This (lengthy) post was triggered by the following kind of
> conversation that I keep having with new XML users in Python (mostly
> on c.l.py), which hints at some serious flaw in the stdlib.
>
> User: I'm trying to do XML stuff XYZ in Python and have problem ABC.
> Me: What library are you using? Could you show us some code?
> User: My code looks like this snippet: ...
> Me: You are using minidom which is known to be hard to use, slow and
> uses lots of memory. Use the xml.etree.ElementTree package instead, or
> rather its C implementation cElementTree, also in the stdlib.
> User (coming back after a while): thanks, that was exactly what [I
> didn't know] I was looking for.
>
> What does this tell us?
>
> 1) MiniDOM is what new users find first. It's highly visible because
> there are still lots of ancient "Python and XML" web pages out there
> that date back from the time before Python 2.5 (or rather something
> like 2.2), when it was the only XML tree library in the stdlib. It's
> also the first hit from the top when you search for "XML" on the
> stdlib docs page and contains the (to some people) familiar word
> "DOM", which lets users stop their search and start writing code, not
> expecting to find a separate alternative in the same stdlib, way
> further down. And the description as "mini", "simple" and
> "lightweight" suggests to users that it's going to be easy to use and
> efficient.
>
> 2) MiniDOM is not what users want. It leads to complicated,
> unpythonic code and lots of problems. It is neither easy to use, nor
> efficient, nor "lightweight", "simple" or "mini", not in absolute
> numbers (see http://bugs.python.org/issue11379#msg148584 and following
> for a recent discussion).
> It's also badly maintained in the sense that its performance
> characteristics could likely be improved, but no-one is seriously
> interested in doing that, because it would not lead to something that
> actually *is* fast or memory friendly compared to any of the 'real'
> alternatives that are available right now.
>
> 3) ElementTree is what users should use, MiniDOM is not. ElementTree
> was added to the stdlib in Py2.5 on popular demand, exactly because it
> is very easy to use, very fast, and very memory friendly. And because
> users did not want to use MiniDOM any more. Today, ElementTree has a
> rather straight upgrade path towards lxml.etree if more XML features
> like validation or XSLT are needed. MiniDOM has nothing like that to
> offer. It's a dead end.
>
> 4) In the stdlib, cElementTree is independent of ElementTree, but
> totally hidden in the documentation. In conversations like the above,
> it's unnecessarily complex to explain to users that there is
> ElementTree (which is documented in the stdlib), but that what they
> want to use is really cElementTree, which has the same API but does
> not have a stdlib documentation page that I can send them to. Note
> that the other Python implementations simply provide cElementTree as
> an alias for ElementTree. That leaves CPython as the only Python
> implementation that really has these two separate modules.
>
> So, there are many problems here. And I think they make it
> unnecessarily complicated for users to process XML in Python and that
> the current situation helps in turning away new users from Python as a
> language for XML processing. Python does have impressively great tools
> for working with XML. It's just that the stdlib and its documentation
> do not reflect or even appreciate that.
>
> What should change?
>
> a) The stdlib documentation should help users to choose the right tool
> right from the start.
> Instead of using the totally misleading wording that it uses now, it
> should be honest about the performance characteristics of MiniDOM and
> should actively suggest that those who don't know what to choose (or
> even *that* they can choose) should not use MiniDOM in the first
> place. I created a ticket (issue11379) for a minor step in this
> direction, but given the responses, I'm rather convinced that there's
> a lot more that can be done and should be done, and that it should be
> done now, right for the next release.
>
> b) cElementTree should finally lose its "special" status as a
> separate library and disappear as an accelerator module behind
> ElementTree. This has been suggested a couple of times already, and
> AFAIR, there was some opposition because 1) ET was maintained outside
> of the stdlib and 2) the APIs of both were not identical. However,
> getting ET 1.3 into Py2.7 and 3.2 was a U-turn. Today, ET is *only*
> being maintained in the stdlib by Florent Xicluna (who is doing a good
> job with it), and ET 1.3 has basically made the APIs of both
> implementations compatible again. So, 3.3 would be the right milestone
> for fixing the "two libs for one" quirk.
>
> Given that this is the third time during the last couple of years
> that I'm suggesting to finally fix the stdlib and its documentation, I
> won't provide any further patches before it has finally been accepted
> that a) this is a problem and b) it should be fixed, thus allowing the
> patches to actually serve a purpose. If we can agree on that, I'll
> happily help in making this change happen.
>
> Stefan
>

This gets a strong +1 from me and, I suspect, anyone else who spends a
significant amount of time in any of the python support communities
(python-list, #python, etc).
Defaults exist not only in our code, but also in our documentation and
presentation, and those defaults are wrong here.
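
The contrast drawn in points 2) and 3) of the quoted post can be
illustrated with a small sketch. The sample document and the function
names titles_minidom and titles_etree are made up for illustration and
do not come from the thread; both functions pull the same data out of
the same input using only stdlib calls.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this comparison.
XML = (
    "<library>"
    "<book id='1'><title>Dive In</title></book>"
    "<book id='2'><title>Fluent</title></book>"
    "</library>"
)


def titles_minidom(text):
    """Collect book titles with xml.dom.minidom."""
    doc = minidom.parseString(text)
    titles = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # Element text is stored in child text nodes and has to be
        # gathered by hand.
        parts = [
            node.data
            for node in title.childNodes
            if node.nodeType == node.TEXT_NODE
        ]
        titles.append("".join(parts))
    return titles


def titles_etree(text):
    """Collect the same titles with xml.etree.ElementTree."""
    root = ET.fromstring(text)
    return [book.findtext("title") for book in root.iter("book")]


if __name__ == "__main__":
    print(titles_minidom(XML))
    print(titles_etree(XML))
```

The sketch imports plain xml.etree.ElementTree rather than
cElementTree: from Python 3.3 on, ElementTree uses the C accelerator
automatically when it is available, and the separate cElementTree
module was removed in Python 3.9.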

