lxml supports a number of interesting languages for tree traversal and element selection. The most important is obviously XPath, but there is also ObjectPath in the lxml.objectify module. The newest child of this family is CSS selection, which is made available in the form of the lxml.cssselect module.
Although it started its life in lxml, cssselect is now an independent project. It translates CSS selectors to XPath 1.0 expressions that can be used with lxml's XPath engine. lxml.cssselect adds a few convenience shortcuts into that package.
To install cssselect, run:
pip install cssselect
lxml will then import and use it automatically.
The most important class in the lxml.cssselect module is CSSSelector. It provides the same interface as the XPath class, but accepts a CSS selector expression as input:
>>> from lxml.cssselect import CSSSelector
>>> sel = CSSSelector('div.content')
>>> sel  #doctest: +ELLIPSIS
<CSSSelector ... for 'div.content'>
>>> sel.css
'div.content'
The selector actually compiles to XPath, and you can see the expression by inspecting the object:
>>> sel.path
"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' content ')]"
To use the selector, simply call it with a document or element object:
>>> from lxml.etree import fromstring
>>> h = fromstring('''<div id="outer">
...   <div id="inner" class="content body">
...       text
...   </div></div>''')
>>> [e.get('id') for e in sel(h)]
['inner']
Using CSSSelector is equivalent to translating with cssselect and using the XPath class:
>>> from cssselect import GenericTranslator
>>> from lxml.etree import XPath
>>> sel = XPath(GenericTranslator().css_to_xpath('div.content'))
CSSSelector takes a translator parameter to let you choose which translator to use. It can be 'xml' (the default), 'xhtml', 'html' or a Translator object.
lxml Element objects have a cssselect convenience method:
>>> h.cssselect('div.content') == sel(h)
True
Note however that pre-compiling the expression with the CSSSelector or XPath class can provide a substantial speedup.
The method also accepts a translator parameter. On HtmlElement objects, the default is changed to 'html'.
Most Level 3 selectors are supported. The details are in the cssselect documentation.
In CSS you can use namespace-prefix|element, similar to namespace-prefix:element in an XPath expression. In fact, it maps one-to-one, and the same rules are used to map namespace prefixes to namespace URIs: the CSSSelector class accepts a dictionary as its namespaces argument.