Charset in meta content does not correctly parse for trailing semi-colon #92

Open

Labels

bug, parser

Description

1619digital opened on Jul 19, 2013

Reference: http://www.w3.org/html/wg/drafts/html/master/infrastructure.html#algorithm-for-extracting-a-character-encoding-from-a-meta-element

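Because the ContentAttrParser looks only for a space character to terminate an unquoted charset, a meta element whose content attribute has a semicolon directly after the charset value (for example, content containing charset=iso8859-2;text/html) will incorrectly be inferred to have the charset 'iso8859-2;text/html'. The fix is to add a semicolon to the spaceCharacters scanned in SkipUntil (line 860).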
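To make the expected behaviour concrete, here is a minimal Python sketch of the referenced algorithm as I read it. It is not the library's code: `extract_charset`, `ASCII_WHITESPACE`, and the `terminators` parameter are hypothetical names introduced only for illustration, and the sketch returns the raw label without resolving it to an encoding.

```python
import re

# ASCII whitespace as listed in the HTML specification.
ASCII_WHITESPACE = "\t\n\x0c\r "

# Hypothetical helper, not part of any library API.
CHARSET_RE = re.compile("charset", re.IGNORECASE)


def extract_charset(content, terminators=ASCII_WHITESPACE + ";"):
    """Return the raw charset label found in a meta content value, or None.

    `terminators` is the set of characters that end an unquoted value.
    The default includes the semicolon, as the specification requires.
    """
    position = 0
    while True:
        match = CHARSET_RE.search(content, position)
        if match is None:
            return None
        position = match.end()
        # Skip whitespace between "charset" and "=".
        while position < len(content) and content[position] in ASCII_WHITESPACE:
            position += 1
        if position < len(content) and content[position] == "=":
            position += 1
            break
        # No "=" here: keep searching for a later "charset".

    # Skip whitespace between "=" and the value.
    while position < len(content) and content[position] in ASCII_WHITESPACE:
        position += 1
    if position >= len(content):
        return None

    first = content[position]
    if first in "\"'":
        # Quoted value: it runs up to the matching quote, if there is one.
        end = content.find(first, position + 1)
        return content[position + 1:end] if end != -1 else None

    # Unquoted value: it runs up to the first terminator or end of string.
    end = position
    while end < len(content) and content[end] not in terminators:
        end += 1
    return content[position:end]
```

Calling `extract_charset("charset=iso8859-2;text/html")` gives `'iso8859-2'`, while passing `terminators=ASCII_WHITESPACE` mimics whitespace-only termination and gives the `'iso8859-2;text/html'` value reported above.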

EDIT: as per the specification. Also, I don't know the status of the parser tests, but they are out of date, incorrect in places, and (obviously) not used. Most of the tests are still valid, though, so it would not take much to bring them back into the full test regime.

