Compile html5lib with Cython #524

Draft
gsnedders wants to merge 22 commits into html5lib:master from gsnedders:cythonzied


gsnedders (Member)

Use Cython to make the parser quicker; see #445. This builds on top of #272.

This is a long way from ready to land, but shows potential. We probably also want to split this up so many of the earlier changes land first if they make performance sense without Cython.

The change to attribute representation especially might be of interest to #521 (cc @jayaddison).

There are also some API changes towards the end of the branch which we may well want to delay landing even beyond the rest of the Cython stuff.

This added a fair bit of complexity, and notably made the Phase classes dynamically generated. However, by doing this, we no longer include "process the token using the rules for" phases in the debug log.
This allows us to define the argument as an int in Cython
This is in preparation for Cython using C function pointers for _state
The current _ascii module is a placeholder, because I accidentally deleted the original implementation of it (but I needed to rewrite it to be even quicker anyway!)
This makes duplicate checking much quicker, and avoids the conversion to a dict at the end
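
To illustrate the last commit message above, here is a minimal sketch in plain Python of why building the attribute map directly can be quicker than collecting pairs and converting at the end. The function names `attrs_from_pairs` and `attrs_into_map` are hypothetical and are not taken from the branch; the "first occurrence wins" rule follows the HTML tokenizer's handling of duplicate attributes.

```python
def attrs_from_pairs(pairs):
    """Pair-list sketch: scan earlier pairs for duplicates, convert at the end."""
    seen = []
    for name, value in pairs:
        # Linear scan over everything collected so far: O(n^2) overall.
        if any(name == existing for existing, _ in seen):
            continue  # duplicate attribute: the first occurrence wins
        seen.append((name, value))
    return dict(seen)  # extra conversion step once the token is complete


def attrs_into_map(pairs):
    """Map sketch: the dict itself answers the duplicate check in O(1)."""
    attrs = {}
    for name, value in pairs:
        if name in attrs:
            continue  # duplicate attribute: the first occurrence wins
        attrs[name] = value
    return attrs  # already in its final form, no conversion needed


pairs = [("class", "a"), ("id", "x"), ("class", "b")]
assert attrs_from_pairs(pairs) == attrs_into_map(pairs)
```

Both functions return the same mapping; only the cost of the duplicate check and the final conversion differ.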
# then stop
if self.chunkOffset != self.chunkSize:
    while True:
        # this really should be a slice of self.chunk, but https://github.com/cython/cython/issues/3536
gsnedders (Member, Author)

with cython.boundscheck(False), cython.wraparound(False):
    c = self.chunk[i]
if c > 0x7F and opposite or c <= 0x7F and (bitmap[index(c)] & bit(c)):
    cyrv += self.chunk[self.chunkOffset:i]
gsnedders (Member, Author)

ref cython/cython#3887 (IIRC, this was a significant perf issue)
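
For readers unfamiliar with the `bitmap[index(c)] & bit(c)` test in the hunk above, the following is a rough plain-Python sketch of an ASCII membership bitmap. The PR does not show how `index` and `bit` are defined, so the layout here (16 bytes, one bit per ASCII code point) is an assumption, and `make_bitmap` and `scan_until` are hypothetical helpers written only for illustration.

```python
def index(c):
    # Assumed layout: byte number within a 16-byte table (128 bits in total).
    return c >> 3


def bit(c):
    # Assumed layout: mask for the bit inside that byte.
    return 1 << (c & 7)


def make_bitmap(characters):
    """Build a 128-bit table with one bit set per ASCII stop character."""
    bitmap = bytearray(16)
    for ch in characters:
        c = ord(ch)
        if c > 0x7F:
            raise ValueError("only ASCII characters fit in the bitmap")
        bitmap[index(c)] |= bit(c)
    return bitmap


def scan_until(chunk, start, bitmap, opposite=False):
    """Return (text, offset) for the run of chunk before the first stop character."""
    i = start
    while i < len(chunk):
        c = ord(chunk[i])
        # Same shape as the condition in the diff: non-ASCII stops the scan
        # only when `opposite` is set; ASCII stops it when its bit is set.
        if (c > 0x7F and opposite) or (c <= 0x7F and bitmap[index(c)] & bit(c)):
            break
        i += 1
    return chunk[start:i], i


stops = make_bitmap("<&")
text, offset = scan_until("hello<b>", 0, stops)
```

The point of the table is that each character costs one shift, one mask and one byte load, which Cython can compile to plain C integer operations once `c` is typed as an int.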

@jayaddison (Contributor)

Thanks @gsnedders - I see that the attribute names/values are no longer accessed via tuple/list indexes (a nice improvement), and that the element attributes are initialized as attribute maps (rather than being converted into them at the end of the process).

Those are similar to the approach taken in #521 as well. What's the general approach to review, quality/performance assurance and merge for each pull request? It looks tricky to review this changeset as one combined change (given the size of the diff) - could we create a list of the proposed changes (with dependencies / ordering between them) and queue them up for review?

@gsnedders (Member, Author)

> What's the general approach to review, quality/performance assurance and merge for each pull request? It looks tricky to review this changeset as one combined change (given the size of the diff) - could we create a list of the proposed changes (with dependencies / ordering between them) and queue them up for review?

Yeah, I was mostly just throwing this up as one PR so at least the branch exists somewhere that isn't my local disk!

Also not sure why it fails on 2.7 or 3.6 on CI; it doesn't fail locally!

@jgraham marked this pull request as draft on March 10, 2023, 16:21
2 participants: @gsnedders, @jayaddison
