Parse HTML/XML to PostHTMLTree
posthtml/posthtml-parser
npm install posthtml-parser
Input HTML:
<a class="animals" href="#">
 <span class="animals__cat" style="background: url(cat.png)">Cat</span>
</a>
Parse with `posthtml-parser`:
import fs from 'fs'
import { parser } from 'posthtml-parser'

const html = fs.readFileSync('path/to/input.html', 'utf-8')

console.log(parser(html))
Resulting PostHTML AST:
[
  {
    tag: 'a',
    attrs: { class: 'animals', href: '#' },
    content: [
      '\n ',
      {
        tag: 'span',
        attrs: { class: 'animals__cat', style: 'background: url(cat.png)' },
        content: ['Cat']
      },
      '\n'
    ]
  }
]
Any parser used with PostHTML should return a standard PostHTML Abstract Syntax Tree (AST).
Fortunately, this is a very easy format to produce and understand. The AST is an array that can contain strings and objects. Strings represent plain text content, while objects represent HTML tags.
Tag objects generally look like this:
{ tag: 'div', attrs: { class: 'foo' }, content: ['hello world!'] }
Tag objects can contain three keys:
- The `tag` key takes the name of the tag as the value. This can include custom tags.
- The optional `attrs` key takes an object with key/value pairs representing the attributes of the HTML tag. A boolean attribute has an empty string as its value.
- The optional `content` key takes an array as its value, which is itself a PostHTML AST. In this manner, the AST is a tree that should be walked recursively.
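The recursive structure described above can be walked with a few lines of plain JavaScript. The following is a minimal sketch: `walk` and `collectTagNames` are hypothetical helper names, not functions exported by `posthtml-parser`, and the tree is the example AST from this README written out by hand.

```javascript
// Recursively visit every node of a PostHTML AST.
// `walk` and `collectTagNames` are hypothetical helpers, not part of posthtml-parser.
function walk(tree, visit) {
  for (const node of tree) {
    visit(node)
    // Strings are text nodes; only tag objects may carry a nested `content` AST.
    if (typeof node === 'object' && node !== null && Array.isArray(node.content)) {
      walk(node.content, visit)
    }
  }
}

// Collect the tag names in document order.
function collectTagNames(tree) {
  const names = []
  walk(tree, (node) => {
    if (typeof node === 'object' && node !== null && typeof node.tag === 'string') {
      names.push(node.tag)
    }
  })
  return names
}

// The example AST from this README, written out by hand.
const tree = [
  {
    tag: 'a',
    attrs: { class: 'animals', href: '#' },
    content: [
      '\n ',
      {
        tag: 'span',
        attrs: { class: 'animals__cat', style: 'background: url(cat.png)' },
        content: ['Cat']
      },
      '\n'
    ]
  }
]

console.log(collectTagNames(tree)) // [ 'a', 'span' ]
```

Because `content` is itself an AST, the same function handles any nesting depth without special cases.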
`directives`
Type: `Array`
Default: `[{name: '!doctype', start: '<', end: '>'}]`
Adds processing of custom directives.
The `name` property in custom directives can be of `String` or `RegExp` type.
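To illustrate what accepting either a `String` or a `RegExp` as `name` means in practice, here is a small sketch of name matching. `matchesDirective` is a hypothetical helper and not the parser's internal code; the case-insensitive string comparison and the second directive entry are assumptions made for the example.

```javascript
// Hypothetical helper: does a raw directive name match a directive entry?
// This sketches the matching idea only; it is not posthtml-parser's internal code.
function matchesDirective(rawName, directive) {
  if (directive.name instanceof RegExp) {
    return directive.name.test(rawName)
  }
  // Assumption: string names are compared case-insensitively.
  return String(directive.name).toLowerCase() === rawName.toLowerCase()
}

const directives = [
  { name: '!doctype', start: '<', end: '>' }, // the documented default
  { name: /^\?\w+/, start: '<', end: '>' }    // hypothetical entry for names such as ?php or ?xml
]

console.log(directives.some((d) => matchesDirective('!DOCTYPE', d))) // true
console.log(directives.some((d) => matchesDirective('?php', d)))     // true
console.log(directives.some((d) => matchesDirective('div', d)))      // false
```

A `RegExp` name lets one entry cover a family of directives, while a string names exactly one.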
`xmlMode`
Type: `Boolean`
Default: `false`
Indicates whether special tags (`<script>` and `<style>`) should get special treatment and whether "empty" tags (e.g. `<br>`) can have children. If `false`, the content of special tags will be text only.
For feeds and other XML content (documents that don't consist of HTML), set this to `true`.
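Options are passed to `parser` as an object in the second argument. The fragment below sketches an options object for a feed document; `feedOptions` and `xmlString` are hypothetical names, and the parser call is left as a comment so the fragment runs without the package installed.

```javascript
// Hypothetical options object for parsing an RSS/Atom feed or other XML input.
const feedOptions = {
  xmlMode: true // treat the input as XML rather than HTML
}

// Assumed usage (requires the posthtml-parser package):
//   import { parser } from 'posthtml-parser'
//   const tree = parser(xmlString, feedOptions)
console.log(feedOptions)
```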
`decodeEntities`
Type: `Boolean`
Default: `false`
If set to `true`, entities within the document will be decoded.
`lowerCaseTags`
Type: `Boolean`
Default: `false`
If set to `true`, all tags will be lowercased. If `xmlMode` is disabled.
`lowerCaseAttributeNames`
Type: `Boolean`
Default: `false`
If set to `true`, all attribute names will be lowercased.
This has a noticeable impact on speed.
`recognizeCDATA`
Type: `Boolean`
Default: `false`
If set to `true`, CDATA sections will be recognized as text even if the `xmlMode` option is not enabled.
If `xmlMode` is set to `true`, CDATA sections will always be recognized as text.
`recognizeSelfClosing`
Type: `Boolean`
Default: `false`
If set to `true`, self-closing tags will trigger the `onclosetag` event even if `xmlMode` is not set to `true`.
If `xmlMode` is set to `true`, self-closing tags will always be recognized.
`sourceLocations`
Type: `Boolean`
Default: `false`
If set to `true`, AST nodes will have a `location` property containing the `start` and `end` line and column position of the node.
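A consumer might use the `location` property for error messages or source mapping. The sketch below assumes the shape `{ start: { line, column }, end: { line, column } }`; `formatLocation` is a hypothetical helper, and the node is hand-written for illustration rather than actual parser output.

```javascript
// Hypothetical helper: format a node's location as "line:column-line:column".
// Assumes location = { start: { line, column }, end: { line, column } }.
function formatLocation(node) {
  // Text nodes are plain strings and carry no location.
  if (typeof node !== 'object' || node === null || !node.location) return 'unknown'
  const { start, end } = node.location
  return `${start.line}:${start.column}-${end.line}:${end.column}`
}

// Hand-written node for illustration; the positions are made up.
const heading = {
  tag: 'h1',
  content: ['Test'],
  location: { start: { line: 1, column: 1 }, end: { line: 1, column: 13 } }
}

console.log(formatLocation(heading)) // 1:1-1:13
console.log(formatLocation('text'))  // unknown
```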
`recognizeNoValueAttribute`
Type: `Boolean`
Default: `false`
If set to `true`, the parser recognizes attributes with no value and marks them as `true` in the AST, which the `posthtml-render` package renders correctly.
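The practical difference between a `true` value and an empty string shows up when attributes are serialized. The following is a rough sketch of how a consumer could treat the two cases; `renderAttrs` is a hypothetical helper, not the `posthtml-render` implementation, and it skips value escaping for brevity.

```javascript
// Hypothetical serializer for an `attrs` object; not posthtml-render's actual code.
// Values are not escaped here, so this is for illustration only.
function renderAttrs(attrs = {}) {
  return Object.entries(attrs)
    // `true` marks an attribute written without a value; everything else is quoted.
    .map(([name, value]) => (value === true ? name : `${name}="${value}"`))
    .join(' ')
}

console.log(renderAttrs({ type: 'checkbox', checked: true })) // type="checkbox" checked
console.log(renderAttrs({ type: 'checkbox', checked: '' }))   // type="checkbox" checked=""
```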