Would it be possible to add position information (line and column) to text nodes, or at least make it available to the tree builder? I implemented a very minimal proof of concept that attaches the position to each token and passes it along to the DOM tree builder, with the following result:
import html5lib

html = '<div>&amp;<p>b<span>c</span></p> cab</div>'
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
doc = parser.parse(html)

def parse(n):
    # Walk the DOM and print the position of every node that carries one.
    for c in n.childNodes:
        if hasattr(c, 'sourcepos'):
            print(c.sourcepos, c)
        parse(c)

parse(doc)
None <DOM Element: head at 0x10bbed0d0>
None <DOM Element: body at 0x10bbed1f0>
(1, 5) <DOM Element: div at 0x10bbfb790>
(1, 10) <DOM Text node "'&'">
(1, 13) <DOM Element: p at 0x10bbfb820>
(1, 14) <DOM Text node "'b'">
(1, 20) <DOM Element: span at 0x10bbfb8b0>
(1, 21) <DOM Text node "'c'">
(1, 33) <DOM Text node "' '">
(1, 36) <DOM Text node "'cab'">
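To illustrate the bookkeeping involved, here is a rough standard-library sketch. It is not html5lib's tokenizer and not the proof of concept itself: `tokens_with_positions` and `TOKEN_RE` are hypothetical names, the regex split is a deliberate simplification that ignores malformed markup, and the input assumes the ampersand was written as `&amp;`. It records the (line, column) reached after each token, which is the convention the output above appears to follow.

```python
import re

# Hypothetical helper, not part of html5lib: a tag is "<...>", anything
# else up to the next "<" is a text run. Stray "<" characters are skipped.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")

def tokens_with_positions(markup):
    """Return [((line, column), token), ...] with the position reached
    after each token; lines are 1-based, columns count consumed characters."""
    line, col = 1, 0
    result = []
    for match in TOKEN_RE.finditer(markup):
        token = match.group(0)
        newlines = token.count("\n")
        if newlines:
            # Column restarts after the last newline inside the token.
            line += newlines
            col = len(token) - token.rfind("\n") - 1
        else:
            col += len(token)
        result.append(((line, col), token))
    return result

# Assumed input: same markup as above, with the ampersand escaped.
html = '<div>&amp;<p>b<span>c</span></p> cab</div>'
for pos, token in tokens_with_positions(html):
    print(pos, repr(token))
```

A real implementation would presumably take the position from the parser's own input stream instead of re-scanning the markup, and would have to decide whether a node carries the start or the end position of its token.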
I would be willing to implement it.