Commit 6e822c2

updated README, pulled all 0.95 - 1.0 changes from git logs
1 parent ae6520f, commit 6e822c2

2 files changed, +127 -52 lines changed

CHANGES.rst

Lines changed: 49 additions & 2 deletions
@@ -6,9 +6,56 @@ Change Log
 
 Released on XXX, 2013
 
+* Implementation updated to implement the `HTML specification
+  <http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
+  2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
+
+* Python 3.2+ supported in a single codebase using the ``six`` library.
+
+* Removed support for Python 2.5 and older.
+
+* Removed the deprecated Beautiful Soup 3 treebuilder.
+  ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
+  since it doesn't support namespaces, foreign content like SVG and
+  MathML is parsed incorrectly.
+
 * Removed ``simpletree`` from the package. The default tree builder is
-  now ``etree`` (using the ``xml.etree.ElementTree/cElementTree``
-  implementation).
+  now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
+  available, and ``xml.etree.ElementTree`` otherwise).
+
+* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
+  output was well-formed XML, and hence provided little of use.
+
+* Optional heuristic character encoding detection now based on
+  ``charade`` for Python 2.6 - 3.3 compatibility.
+
+* Optional ``Genshi`` treewalker support fixed.
+
+* Many bugfixes, including:
+
+  * #33: null in attribute value breaks XML AttValue;
+
+  * #4: nested, indirect descendant, <button> causes infinite loop;
+
+  * `Google Code 215
+    <http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
+    detect seekable streams;
+
+  * `Google Code 206
+    <http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
+    support for <video preload=...>, <audio preload=...>;
+
+  * `Google Code 205
+    <http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
+    support for <video poster=...>;
+
+  * `Google Code 202
+    <http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
+    file breaks InputStream.
+
+* Source code is now mostly PEP 8 compliant.
+
+* Test harness has been improved and now depends on ``nose``.
 
 
 0.95
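The updated CHANGES entry says the default ``etree`` tree builder now uses ``xml.etree.cElementTree`` when it is available and ``xml.etree.ElementTree`` otherwise. The snippet below is a minimal sketch of that kind of import fallback. It shows the general standard-library pattern and is not html5lib's actual source. The helper name ``paragraph_texts`` is hypothetical. The snippet uses ElementTree's own XML parser only to show that the chosen module works, so its input must be well-formed XML; real-world HTML should still go through ``html5lib.parse``. On Python 3.9 and later, ``xml.etree.cElementTree`` no longer exists, so the fallback branch is always taken there.

```python
# Sketch of an accelerated-module import fallback (not html5lib's own code).
try:
    # C-accelerated module: present on Python 2.x and Python 3 before 3.9.
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Pure-Python module name; on Python 3.3+ it uses the C accelerator
    # internally when one is available.
    import xml.etree.ElementTree as ElementTree


def paragraph_texts(markup):
    """Return the text of every <p> element in a well-formed XML string.

    Hypothetical helper, used only to exercise the selected module.
    """
    root = ElementTree.fromstring(markup)
    return [p.text for p in root.iter("p")]


print(paragraph_texts("<body><p>Hello</p><p>World!</p></body>"))
# prints ['Hello', 'World!']
```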

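The CHANGES entry on heuristic encoding detection names ``charade`` as the optional detector. The README's optional-dependency list adds that ``chardet``, from which ``charade`` was forked, can also be used. The following is a hedged sketch of trying such optional detectors in turn. It assumes that both libraries expose a ``detect(data)`` function returning a dict with an ``"encoding"`` key, which matches ``chardet``'s documented API. The helper name ``guess_encoding`` and the ``windows-1252`` default are illustrative choices, not taken from html5lib.

```python
import importlib


def guess_encoding(data, default="windows-1252"):
    """Guess the character encoding of ``data`` (a bytes object).

    Hypothetical helper: tries the optional ``charade`` detector first, then
    ``chardet``, and returns ``default`` (an assumed fallback) when neither
    library is installed or neither reports an encoding.
    """
    for module_name in ("charade", "chardet"):
        try:
            detector = importlib.import_module(module_name)
        except ImportError:
            continue  # this optional library is not installed; try the next
        result = detector.detect(data)
        if result.get("encoding"):
            return result["encoding"]
    return default
```

With empty input, neither detector has anything to work with, so the function returns the default. Callers can pass a different ``default`` if they want another fallback.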
README.rst

Lines changed: 78 additions & 50 deletions
@@ -1,63 +1,98 @@
 html5lib
 ========
 
+.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
+   :target: https://travis-ci.org/html5lib/html5lib-python
+
 html5lib is a pure-python library for parsing HTML. It is designed to
 conform to the WHATWG HTML specification, as is implemented by all major
 web browsers.
 
 
-Requirements
-------------
+Usage
+-----
 
-Python 2.6 and above as well as Python 3.0 and above are
-supported. Implementations known to work are CPython (as the reference
-implementation) and PyPy. Jython is known *not* to work due to various
-bugs in its implementation of the language. Others such as IronPython
-may or may not work; if you wish to try, you are strongly encouraged
-to run the testsuite and report back!
+Simple usage follows this pattern:
 
-The only required library dependency is ``six``, this can be found
-packaged in PyPI.
+.. code-block:: python
 
-Optionally:
+   import html5lib
+   with open("mydocument.html", "rb") as f:
+       document = html5lib.parse(f)
 
-- ``datrie`` can be used to improve parsing performance (though in
-  almost all cases the improvement is marginal);
+or:
 
-- ``lxml`` is supported as a tree format (for both building and
-  walking) under CPython (but *not* PyPy where it is known to cause
-  segfaults);
+.. code-block:: python
 
-- ``genshi`` has a treewalker (but not builder); and
+   import html5lib
+   document = html5lib.parse("<p>Hello World!")
 
-- ``charade`` can be used as a fallback when character encoding cannot
-  be determined; ``chardet``, from which it was forked, can also be used
-  on Python 2.
+By default, the ``document`` will be an ``xml.etree`` element instance.
+Whenever possible, html5lib chooses the accelerated ``ElementTree``
+implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
+
+Two other tree types are supported: ``xml.dom.minidom`` and
+``lxml.etree``. To use an alternative format, specify the name of
+a treebuilder:
+
+.. code-block:: python
+
+   import html5lib
+   with open("mydocument.html", "rb") as f:
+       lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
+
+To have more control over the parser, create a parser object explicitly.
+For instance, to make the parser raise exceptions on parse errors, use:
+
+.. code-block:: python
+
+   import html5lib
+   with open("mydocument.html", "rb") as f:
+       parser = html5lib.HTMLParser(strict=True)
+       document = parser.parse(f)
+
+When you're instantiating parser objects explicitly, pass a treebuilder
+class as the ``tree`` keyword argument to use an alternative document
+format:
+
+.. code-block:: python
+
+   import html5lib
+   parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
+   minidom_document = parser.parse("<p>Hello World!")
+
+More documentation is available at http://html5lib.readthedocs.org/.
 
 
 Installation
 ------------
 
-html5lib is packaged with distutils. To install it use::
+html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,
+use:
 
-    $ python setup.py install
+.. code-block:: bash
 
+    $ pip install html5lib
 
-Usage
------
 
-Simple usage follows this pattern::
+Optional Dependencies
+---------------------
 
-    import html5lib
-    with open("mydocument.html", "r") as fp:
-        document = html5lib.parse(f)
+The following third-party libraries may be used for additional
+functionality:
 
-or::
+- ``datrie`` can be used to improve parsing performance (though in
+  almost all cases the improvement is marginal);
 
-    import html5lib
-    document = html5lib.parse("<p>Hello World!")
+- ``lxml`` is supported as a tree format (for both building and
+  walking) under CPython (but *not* PyPy where it is known to cause
+  segfaults);
 
-More documentation is available in the docstrings.
+- ``genshi`` has a treewalker (but not builder); and
+
+- ``charade`` can be used as a fallback when character encoding cannot
+  be determined; ``chardet``, from which it was forked, can also be used
+  on Python 2.
 
 
 Bugs
@@ -70,28 +105,21 @@ Please report any bugs on the `issue tracker
 Tests
 -----
 
-These are contained in the html5lib-tests repository and included as a
-submodule, thus for git checkouts they must be initialized (for
-release tarballs this is unneeded)::
+Unit tests require the ``nose`` library and can be run using the
+``nosetests`` command in the root directory. All should pass.
+
+Test data are contained in a separate `html5lib-tests
+<https://github.com/html5lib/html5lib-tests>`_ repository and included
+as a submodule, thus for git checkouts they must be initialized::
 
     $ git submodule init
     $ git submodule update
 
-And then they can be run, with ``nose`` installed, using the
-``nosetests`` command in the root directory. All should pass.
+This is unneeded for release tarballs.
 
 If you have all compatible Python implementations available on your
-system, you can run tests on all of them by using tox::
-
-    $ pip install tox
-    $ tox
-    ...
-    _______________________ summary ______________________
-    py26: commands succeeded
-    py27: commands succeeded
-    py32: commands succeeded
-    py33: commands succeeded
-    congratulations :)
+system, you can run tests on all of them using the ``tox`` utility,
+which can be found on PyPI.
 
 
 Contributing
@@ -121,5 +149,5 @@ Questions?
 
 There's a mailing list available for support on Google Groups,
 `html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
-though you may have more success (and get a far quicker response)
-asking on IRC in #whatwg on irc.freenode.net.
+though you may get a quicker response asking on IRC in #whatwg on
+irc.freenode.net.
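The README's new Optional Dependencies section lists ``datrie``, ``lxml``, ``genshi``, ``charade`` and ``chardet``. A small script along these lines could check which of them can be imported in a given environment. It is a sketch that assumes Python 3.4+ (for ``importlib.util.find_spec``) and that each library's import name matches the names shown. ``available_optional_libraries`` is a hypothetical helper, not part of html5lib.

```python
import importlib.util

# Import names of the optional libraries listed in the README's
# Optional Dependencies section (assumed to match their module names).
OPTIONAL_LIBRARIES = ("datrie", "lxml", "genshi", "charade", "chardet")


def available_optional_libraries(names=OPTIONAL_LIBRARIES):
    """Map each top-level module name to True if it can be found.

    find_spec() locates a module without executing it, so this check
    does not import the libraries themselves.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in sorted(available_optional_libraries().items()):
        print("%-8s %s" % (name, "installed" if present else "missing"))
```

When run as a script, it prints one line per library, marked "installed" or "missing".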