High-performance browser-grade HTML5 parser
License: Apache-2.0, MIT
servo/html5ever
html5ever is an HTML parser developed as part of the Servo project.
It can parse and serialize HTML according to the WHATWG specs (aka "HTML5"). However, its actual behavior currently differs in some ways, most of which are documented in the bug tracker. html5ever passes all tokenizer tests from html5lib-tests, and most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. document.write.
Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (that said, many XHTML documents in the wild are serialized in an HTML-compatible form).
html5ever is written in Rust, so it avoids the notorious memory-safety problems that come with using C. Rust also gives the library the performance you would expect from an HTML parser written in C, without needing a garbage collector or other heavy runtime.
Add html5ever as a dependency:
cargo add html5ever
You should also take a look at examples/html2html.rs, examples/print-rcdom.rs, and the API documentation.
Bindings for Python and other languages are much desired.
To fetch the test suite, you need to run
git submodule update --init
Run cargo doc in the repository root to build local documentation under target/doc/.
html5ever uses callbacks to manipulate the DOM, therefore it does not provide any DOM tree representation.
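To make the callback-driven design concrete, here is a minimal sketch of the general pattern. It does not use html5ever's API: the `Sink` trait, `parse_toy`, and `Arena` are all hypothetical names invented for illustration, and the toy driver only understands a tiny well-formed subset (`<name>`, `</name>`, and text). The point is that the driver never builds a tree itself; the caller's sink decides what a node is and where it is stored.

```rust
/// Hypothetical callback interface (NOT html5ever's API).
/// The sink owns the tree representation; the driver only holds handles.
trait Sink {
    type Handle: Copy;
    fn root(&self) -> Self::Handle;
    fn create_element(&mut self, name: &str) -> Self::Handle;
    fn append_child(&mut self, parent: Self::Handle, child: Self::Handle);
    fn append_text(&mut self, parent: Self::Handle, text: &str);
}

/// Toy driver for a tiny well-formed subset. It reports what it sees
/// to the sink through callbacks and keeps only a stack of open handles.
fn parse_toy<S: Sink>(input: &str, sink: &mut S) {
    let mut open = vec![sink.root()];
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            // A missing '>' ends the parse in this sketch.
            let Some(end) = after_lt.find('>') else { break };
            let tag = &after_lt[..end];
            rest = &after_lt[end + 1..];
            if tag.starts_with('/') {
                // Close tag: pop the current element, but never the root.
                if open.len() > 1 {
                    open.pop();
                }
            } else {
                let el = sink.create_element(tag);
                let parent = *open.last().unwrap();
                sink.append_child(parent, el);
                open.push(el);
            }
        } else {
            // Text runs up to the next '<' or the end of input.
            let end = rest.find('<').unwrap_or(rest.len());
            sink.append_text(*open.last().unwrap(), &rest[..end]);
            rest = &rest[end..];
        }
    }
}

/// One possible sink: nodes stored in a Vec, handles are indices.
#[derive(Debug, PartialEq)]
enum Node {
    Element { name: String, children: Vec<usize> },
    Text(String),
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        // Index 0 is the document root.
        let root = Node::Element { name: "#document".to_string(), children: vec![] };
        Arena { nodes: vec![root] }
    }

    fn push_child(&mut self, parent: usize, child: usize) {
        if let Node::Element { children, .. } = &mut self.nodes[parent] {
            children.push(child);
        }
    }
}

impl Sink for Arena {
    type Handle = usize;

    fn root(&self) -> usize {
        0
    }

    fn create_element(&mut self, name: &str) -> usize {
        self.nodes.push(Node::Element { name: name.to_string(), children: vec![] });
        self.nodes.len() - 1
    }

    fn append_child(&mut self, parent: usize, child: usize) {
        self.push_child(parent, child);
    }

    fn append_text(&mut self, parent: usize, text: &str) {
        self.nodes.push(Node::Text(text.to_string()));
        let id = self.nodes.len() - 1;
        self.push_child(parent, id);
    }
}
```

A caller would create an `Arena`, pass it to `parse_toy`, and then walk `arena.nodes` however it likes. A different sink could skip the tree entirely and, for example, only collect text.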
html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 document.write) by converting input.
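Until such support exists, a caller holding non-UTF-8 input has to convert it before parsing. The sketch below shows one way to do that for little-endian UTF-16 or UCS-2 bytes using only the Rust standard library. The function name `utf16le_to_utf8` is hypothetical and is not part of html5ever, and replacing unpaired surrogates with U+FFFD is an assumption about the desired policy.

```rust
/// Decode little-endian UTF-16 (or UCS-2) bytes into a UTF-8 `String`.
/// Hypothetical helper, not part of html5ever.
/// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
fn utf16le_to_utf8(bytes: &[u8]) -> String {
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

The resulting `String` is valid UTF-8 and can be handed to any API that expects UTF-8 text.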
The code is cross-referenced with the WHATWG syntax spec, and eventually we will have a way to present code and spec side-by-side.
html5ever builds against the official stable releases of Rust, though some optimizations are only supported on nightly releases.