Infinite loop with nested button #4

Closed
@gsnedders


http://code.google.com/p/html5lib/issues/detail?id=211

Reported by joseph@metaoptimize.com, Aug 24, 2012

So I know this is not well-formed HTML, but it occurred in the wild as the output from Markdown.

I have the latest pypi Python library (version = 0.95-dev).

If I try to parse the following HTML, my program goes into an infinite loop and memory usage increases without stop:

u"<p>So theres no shortage of info out there on rounded corners and I've been through much of it and I'm posting to get the communities opinons at this piont.</p>\n<p>My scenario is that we're developing a rounded corner dependant design, mainly used for interactions (<button> and <a>). We are going to use border radius for the good browsers on the block that play nice with it and then use the server to send down javscript to browsers that don't</p>\n<p>What I'm wondering is what to use to up scale the browsers that ignore border radius CSS? I need something that works on button aswell as a, div etc. I've been looking at the following and have found that some don't play nice with <button>. Also the site already uses jQuery.</p>\n<p>http://www.curvycorners.net/ - http://code.google.com/p/jquerycurvycorners/</p>\n<p>http://www.html.it/articoli/niftycube/index.html</p>\n<p>http://www.malsup.com/jquery/corner/</p>"
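The input above contains two unclosed `<button>` start tags, which matches the "nested button" in the issue title. As a rough pre-check before handing text to html5lib, one could scan for a `<button>` that opens while another is still open. This is a minimal sketch using only the standard-library tokenizer; `NestedButtonFinder` and `has_nested_button` are hypothetical names, and the check assumes the nested button really is the trigger, which the thread does not confirm.

```python
from html.parser import HTMLParser


class NestedButtonFinder(HTMLParser):
    """Record whether a <button> opens while another <button> is still open."""

    def __init__(self):
        super().__init__()
        self.open_buttons = 0   # number of <button> tags not yet closed
        self.nested = False     # set once a nested <button> is seen

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this handler.
        if tag == "button":
            if self.open_buttons > 0:
                self.nested = True
            self.open_buttons += 1

    def handle_endtag(self, tag):
        if tag == "button" and self.open_buttons > 0:
            self.open_buttons -= 1


def has_nested_button(html_text):
    """Return True if the text opens a <button> inside an open <button>."""
    finder = NestedButtonFinder()
    finder.feed(html_text)
    finder.close()
    return finder.nested
```

Note that `html.parser` only tokenizes; it does not apply the HTML5 tree-construction rules, so this flags suspicious input rather than predicting what html5lib will do with it.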

Aug 24, 2012 waylan

I can't comment on the infinite loop, but as the maintainer of the Markdown library, I was concerned about the original reporter's implication that Markdown may be producing invalid HTML. Only the output is provided, not the input, but it appears to me that the invalid output is the result of invalid input. You should be wrapping those random angle-bracket tags in code spans. So "(`<button>` and `<a>`)" (note the backticks surrounding each tag) would be output by Markdown as "(<code>&lt;button&gt;</code> and <code>&lt;a&gt;</code>)", which is valid HTML and will not result in an infinite loop in html5lib.
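The effect of wrapping a tag in backticks can be illustrated without the Markdown library itself. The sketch below is not Markdown's implementation; it is a hypothetical helper, `render_code_spans`, that handles only single-backtick spans and escapes their contents with the standard library, which is the part that turns a stray tag into harmless text.

```python
import html
import re

# Matches a single-backtick code span such as `<button>`.
CODE_SPAN = re.compile(r"`([^`]+)`")


def render_code_spans(text):
    """Replace each `...` span with <code>...</code>, escaping its contents."""
    def replace(match):
        escaped = html.escape(match.group(1), quote=False)
        return "<code>%s</code>" % escaped
    return CODE_SPAN.sub(replace, text)


print(render_code_spans("(`<button>` and `<a>`)"))
```

Text outside backticks is passed through unchanged here, so an unquoted `<button>` in the input would still reach the output as a raw tag.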

If the Markdown input is coming from an untrusted third party, then you absolutely should be sanitizing it before passing it on to anything else.

That said, one way to sanitize (my recommendation) is to use the Bleach library, which uses html5lib internally. So I guess we're back to that infinite loop.

Aug 24, 2012 joseph@metaoptimize.com

The Markdown comes from the wild and is probably invalid.

My idea was to pass the HTML through tidy before running an HTML parser, thus avoiding an infinite loop. There are several tidy wrappers in Python. I used pytidylib.

I didn't play with the options to make tidy more strict, and even after tidy, html5lib still goes into an infinite loop. So my current workaround is to use tidy followed by lxml :\
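Another way to contain a parser that may hang on input from the wild is to run it in a child process with a time limit, so a runaway parse is killed instead of consuming memory indefinitely. The following is a sketch under stated assumptions, not something proposed in this thread: `parse_with_timeout` and `DEFAULT_CHILD` are hypothetical names, and the default child uses the standard-library tokenizer as a stand-in for whichever parser is actually being guarded.

```python
import subprocess
import sys

# Stand-in child program (an assumption for illustration): reads HTML on
# stdin and tokenizes it with the standard-library parser. Replace it with
# a script that calls the parser you actually need to guard.
DEFAULT_CHILD = (
    "import sys\n"
    "from html.parser import HTMLParser\n"
    "p = HTMLParser()\n"
    "p.feed(sys.stdin.read())\n"
    "p.close()\n"
    "print('ok')\n"
)


def parse_with_timeout(html_text, timeout=5.0, child_code=DEFAULT_CHILD):
    """Run child_code on html_text; return its stdout, or None on failure.

    None is returned when the child exceeds the timeout (it is killed)
    or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            input=html_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout
```

A caller can treat `None` as "could not parse safely" and fall back to another path, such as the tidy plus lxml workaround described above.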
