Various html5lib improvements #119

Open
@kovidgoyal

This is just an FYI: I have been working on a modified version of html5lib that achieves the following goals:

  1. Preserves attribute order
  2. Optionally includes line and column number information when parsing
  3. Handles XML namespaces correctly, so that if you happen to parse an XHTML document with html5lib, you don't lose all the namespace information
  4. Adds a new treebuilder for lxml
  5. Various performance improvements
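
The first three goals can be illustrated without html5lib. The following is a minimal sketch, not the calibre code: it uses the standard library's `html.parser.HTMLParser` (which passes attributes in source order and exposes `getpos()`) to build an `xml.etree.ElementTree` tree that keeps attribute order, records a (line, column) position for each element, and resolves `xmlns` declarations into ElementTree's `{uri}local` names. The class name `OrderedTreeBuilder`, the `positions` dict and the `document-fragment` root are hypothetical, and the sketch deliberately skips the HTML5 tree-construction rules (implied end tags, error recovery) that a real html5lib tree builder relies on.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sketch using only the standard library; NOT html5lib's or
# calibre's tree builder. HTML5 tree-construction rules are not applied.

XML_NS = "http://www.w3.org/XML/1998/namespace"  # the xml: prefix is always bound
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class OrderedTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document-fragment")  # hypothetical container root
        # Stack of (element, prefix -> namespace URI map, raw tag name).
        self.stack = [(self.root, {"xml": XML_NS}, None)]
        self.positions = {}  # element -> (line, column) where its start tag began

    @staticmethod
    def _qname(name, nsmap, is_attr):
        """Map 'prefix:local' or 'local' to ElementTree's '{uri}local' form."""
        prefix, sep, local = name.partition(":")
        if sep:
            uri = nsmap.get(prefix)
            return f"{{{uri}}}{local}" if uri else name
        # Unprefixed attributes are never in the default namespace.
        uri = None if is_attr else nsmap.get("")
        return f"{{{uri}}}{name}" if uri else name

    def handle_starttag(self, tag, attrs):
        parent, nsmap, _ = self.stack[-1]
        nsmap = dict(nsmap)  # declarations apply to this element and its subtree
        for name, value in attrs:
            if name == "xmlns":
                nsmap[""] = value
            elif name.startswith("xmlns:"):
                nsmap[name[len("xmlns:"):]] = value
        elem = ET.SubElement(parent, self._qname(tag, nsmap, is_attr=False))
        # attrs arrives in source order and elem.attrib keeps insertion order.
        for name, value in attrs:
            if name == "xmlns" or name.startswith("xmlns:"):
                continue  # consumed above; ElementTree writes its own declarations
            elem.set(self._qname(name, nsmap, is_attr=True), value or "")
        self.positions[elem] = self.getpos()  # (1-based line, 0-based column)
        if tag not in VOID_ELEMENTS:
            self.stack.append((elem, nsmap, tag))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name; ignore stray end tags.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth][2] == tag:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        parent = self.stack[-1][0]
        if len(parent):  # text after a child element belongs in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


source = ('<html xmlns="http://www.w3.org/1999/xhtml">\n'
          '<p id="a" class="b" title="c">Hi<br>there</p>\n</html>')
builder = OrderedTreeBuilder()
builder.feed(source)
builder.close()
p = builder.root[0][0]
print(p.tag)                 # {http://www.w3.org/1999/xhtml}p
print(list(p.attrib))        # ['id', 'class', 'title']
print(builder.positions[p])  # (2, 0)
```

Copying the namespace map for every element keeps each `xmlns` declaration scoped to its own subtree, which is what XML namespace scoping requires; a prefix that is never declared is simply left as part of the name.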

With my new lxml treebuilder, parsing with line numbers and attribute-order preservation enabled is as fast as vanilla html5lib with its built-in treebuilder. The speed improvements come mainly from the new lxml builder and an optimized inputstream class for in-memory streams.
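
The issue does not describe how the optimized inputstream class works. One plausible approach, sketched here purely as an assumption, is that when the whole document is already a Python `str`, the stream skips chunked reading and buffering and instead keeps one string plus an integer cursor. The class `InMemoryStream` and its methods `char`, `unget`, `chars_until` and `position` are hypothetical names chosen for illustration; they are not html5lib's or calibre's actual API.

```python
import re


class InMemoryStream:
    """Hypothetical input stream for a document that is already a str in memory.

    Rather than decoding and buffering chunks from a file-like object, it keeps
    a single string and an integer cursor, so every read is an index, a slice,
    or a regex match anchored at the cursor.
    """

    _patterns = {}  # (frozenset of chars, opposite) -> compiled regex, shared cache

    def __init__(self, text):
        # Normalize newlines once up front (CRLF and lone CR become LF).
        self.text = text.replace("\r\n", "\n").replace("\r", "\n")
        self.pos = 0
        self._line, self._line_pos = 1, 0  # line number known to hold at _line_pos

    def char(self):
        """Consume and return the next character, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        c = self.text[self.pos]
        self.pos += 1
        return c

    def unget(self, c):
        """Push back a character previously returned by char()."""
        if c is not None:
            self.pos -= 1

    def chars_until(self, characters, opposite=False):
        """Consume and return the run of characters NOT in `characters` (or, with
        opposite=True, the run that IS in it). `characters` must be non-empty."""
        key = (frozenset(characters), opposite)
        pattern = self._patterns.get(key)
        if pattern is None:
            cls = "".join(re.escape(c) for c in sorted(key[0]))
            pattern = re.compile(f"[{cls}]+" if opposite else f"[^{cls}]+")
            self._patterns[key] = pattern
        m = pattern.match(self.text, self.pos)
        if m is None:
            return ""
        self.pos = m.end()
        return m.group()

    def position(self):
        """Return (1-based line, 0-based column) of the cursor, counting newlines
        only over the span moved since the last call, not from the start."""
        if self.pos >= self._line_pos:
            self._line += self.text.count("\n", self._line_pos, self.pos)
        else:  # the cursor moved backwards via unget()
            self._line -= self.text.count("\n", self.pos, self._line_pos)
        self._line_pos = self.pos
        col = self.pos - (self.text.rfind("\n", 0, self.pos) + 1)
        return self._line, col


stream = InMemoryStream("<p>\r\nhello</p>")
stream.char()                        # '<'
tag = stream.chars_until("<>")       # 'p'
stream.char()                        # '>'
stream.char()                        # '\n' (the CRLF was normalized)
text = stream.chars_until("<")       # 'hello'
print(tag, text, stream.position())  # p hello (2, 5)
```

Caching compiled patterns per character set and computing line numbers incrementally are the kinds of savings that matter when a tokenizer calls these methods once per token; whether calibre's class does either is not stated in the issue.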

I make no claims as to the relevance of my work for html5lib. I am just sharing it with you as a way of giving back. You are welcome to use the patches or not. Feel free to ask if you need any clarification.

The code is in https://github.com/kovidgoyal/calibre/tree/master/src/html5lib (these are the changes to html5lib itself)

and the lxml builder is in:
https://github.com/kovidgoyal/calibre/blob/master/src/calibre/ebooks/oeb/polish/parsing.py
