Commit d85d895


Optimised entity lookup a bit. (Reduces tokenisation time by around 10% in some cases.)

--HG--
extra : convert_revision : svn%3Aacbfec75-9323-0410-a652-858a13e371e0/trunk%401157

1 parent bf696f4, commit d85d895

Lines changed: 6 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -16,6 +16,11 @@`
`16`	`16`
`17`	`17`	`from inputstream import HTMLInputStream`
`18`	`18`
	`19`	`+# Group entities by their first character, for faster lookups`
	`20`	`+entitiesByFirstChar = {}`
	`21`	`+for e in entities:`
	`22`	`+    entitiesByFirstChar.setdefault(e[0], []).append(e)`
	`23`	`+`
`19`	`24`	`class HTMLTokenizer(object):`
`20`	`25`	`    """ This class takes care of tokenizing HTML.`
`21`	`26`
`@@ -224,8 +229,7 @@ def consumeEntity(self, allowedChar=None, fromAttribute=False):`
`224`	`229`	`        #`
`225`	`230`	`        # Consume characters and compare to these to a substring of the`
`226`	`231`	`        # entity names in the list until the substring no longer matches.`
`227`		`-        filteredEntityList = [e for e in entities if \`
`228`		`-          e.startswith(charStack[0])]`
	`232`	`+        filteredEntityList = entitiesByFirstChar.get(charStack[0], [])`
`229`	`233`
`230`	`234`	`        def entitiesStartingWith(name):`
`231`	`235`	`            return [e for e in filteredEntityList if e.startswith(name)]`
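
The change in this diff can be illustrated outside the tokenizer. The following is a minimal sketch, not the library's code: the small `entities` table and the helper names `candidates_scan` and `candidates_indexed` are hypothetical stand-ins, and only `entitiesByFirstChar` and its construction are taken from the diff. It shows that the prebuilt index returns the same candidates as the old per-call scan.

```python
# Hypothetical stand-in for the real entity table (assumed here to be a
# mapping of entity name -> replacement text; only the keys matter).
entities = {"amp": "&", "apos": "'", "lt": "<", "le": "<=", "gt": ">"}

# Built once at import time, as in the diff: group names by first character.
entitiesByFirstChar = {}
for e in entities:
    entitiesByFirstChar.setdefault(e[0], []).append(e)


def candidates_scan(first_char):
    """Old approach: scan every entity name on each call."""
    return [e for e in entities if e.startswith(first_char)]


def candidates_indexed(first_char):
    """New approach: one dictionary lookup, empty list if nothing matches."""
    return entitiesByFirstChar.get(first_char, [])


if __name__ == "__main__":
    for ch in "algz":
        print(ch, candidates_scan(ch), candidates_indexed(ch))
```

One design point to keep in mind: `get` returns the list stored in the index rather than a copy, so callers should treat it as read-only. The diff's `entitiesStartingWith` only filters it into a new list, which is safe.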
