|
1 |
| -"""A collection of modules for building different kinds of tree from |
2 |
| -HTML documents. |
| 1 | +"""A collection of modules for building different kinds of trees from HTML |
| 2 | +documents. |
3 | 3 |
|
4 | 4 | To create a treebuilder for a new type of tree, you need to do
|
5 | 5 | implement several things:
|
6 | 6 |
|
7 |
| -1) A set of classes for various types of elements: Document, Doctype, |
8 |
| -Comment, Element. These must implement the interface of |
9 |
| -_base.treebuilders.Node (although comment nodes have a different |
10 |
| -signature for their constructor, see treebuilders.etree.Comment) |
11 |
| -Textual content may also be implemented as another node type, or not, as |
12 |
| -your tree implementation requires. |
13 |
| -
|
14 |
| -2) A treebuilder object (called TreeBuilder by convention) that |
15 |
| -inherits from treebuilders._base.TreeBuilder. This has 4 required attributes: |
16 |
| -documentClass - the class to use for the bottommost node of a document |
17 |
| -elementClass - the class to use for HTML Elements |
18 |
| -commentClass - the class to use for comments |
19 |
| -doctypeClass - the class to use for doctypes |
20 |
| -It also has one required method: |
21 |
| -getDocument - Returns the root node of the complete document tree |
22 |
| -
|
23 |
| -3) If you wish to run the unit tests, you must also create a |
24 |
| -testSerializer method on your treebuilder which accepts a node and |
25 |
| -returns a string containing Node and its children serialized according |
26 |
| -to the format used in the unittests |
| 7 | +1. A set of classes for various types of elements: Document, Doctype, Comment, |
| 8 | + Element. These must implement the interface of ``base.treebuilders.Node`` |
| 9 | + (although comment nodes have a different signature for their constructor, |
| 10 | + see ``treebuilders.etree.Comment``) Textual content may also be implemented |
| 11 | + as another node type, or not, as your tree implementation requires. |
| 12 | +
|
| 13 | +2. A treebuilder object (called ``TreeBuilder`` by convention) that inherits |
| 14 | + from ``treebuilders.base.TreeBuilder``. This has 4 required attributes: |
| 15 | +
|
| 16 | + * ``documentClass`` - the class to use for the bottommost node of a document |
| 17 | + * ``elementClass`` - the class to use for HTML Elements |
| 18 | + * ``commentClass`` - the class to use for comments |
| 19 | + * ``doctypeClass`` - the class to use for doctypes |
| 20 | +
|
| 21 | + It also has one required method: |
| 22 | +
|
| 23 | + * ``getDocument`` - Returns the root node of the complete document tree |
| 24 | +
|
| 25 | +3. If you wish to run the unit tests, you must also create a ``testSerializer`` |
| 26 | + method on your treebuilder which accepts a node and returns a string |
| 27 | + containing Node and its children serialized according to the format used in |
| 28 | + the unittests |
| 29 | +
|
27 | 30 | """
|
28 | 31 |
|
29 | 32 | from __future__ import absolute_import, division, unicode_literals
|
|
34 | 37 |
|
35 | 38 |
|
36 | 39 | def getTreeBuilder(treeType, implementation=None, **kwargs):
|
37 |
| -"""Get a TreeBuilder class for various types of tree with built-in support |
38 |
| -
|
39 |
| - treeType - the name of the tree type required (case-insensitive). Supported |
40 |
| - values are: |
41 |
| -
|
42 |
| - "dom" - A generic builder for DOM implementations, defaulting to |
43 |
| - a xml.dom.minidom based implementation. |
44 |
| - "etree" - A generic builder for tree implementations exposing an |
45 |
| - ElementTree-like interface, defaulting to |
46 |
| - xml.etree.cElementTree if available and |
47 |
| - xml.etree.ElementTree if not. |
48 |
| - "lxml" - A etree-based builder for lxml.etree, handling |
49 |
| - limitations of lxml's implementation. |
50 |
| -
|
51 |
| - implementation - (Currently applies to the "etree" and "dom" tree types). A |
52 |
| - module implementing the tree type e.g. |
53 |
| - xml.etree.ElementTree or xml.etree.cElementTree.""" |
| 40 | +"""Get a TreeBuilder class for various types of trees with built-in support |
| 41 | +
|
| 42 | + :arg treeType: the name of the tree type required (case-insensitive). Supported |
| 43 | + values are: |
| 44 | +
|
| 45 | + * "dom" - A generic builder for DOM implementations, defaulting to a |
| 46 | + xml.dom.minidom based implementation. |
| 47 | + * "etree" - A generic builder for tree implementations exposing an |
| 48 | + ElementTree-like interface, defaulting to xml.etree.cElementTree if |
| 49 | + available and xml.etree.ElementTree if not. |
| 50 | + * "lxml" - A etree-based builder for lxml.etree, handling limitations |
| 51 | + of lxml's implementation. |
| 52 | +
|
| 53 | + :arg implementation: (Currently applies to the "etree" and "dom" tree |
| 54 | + types). A module implementing the tree type e.g. xml.etree.ElementTree |
| 55 | + or xml.etree.cElementTree. |
| 56 | +
|
| 57 | + :arg kwargs: Any additional options to pass to the TreeBuilder when |
| 58 | + creating it. |
| 59 | +
|
| 60 | + Example: |
| 61 | +
|
| 62 | + >>> from html5lib.treebuilders import getTreeBuilder |
| 63 | + >>> builder = getTreeBuilder('etree') |
| 64 | +
|
| 65 | + """ |
54 | 66 |
|
55 | 67 | treeType = treeType.lower()
|
56 | 68 | if treeType not in treeBuilderCache:
|
|