hast

Hypertext Abstract Syntax Tree format.

hast is a specification for representing HTML (and embedded SVG or MathML) as an abstract syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 2.4.0.


Introduction

This document defines a format for representing hypertext as an abstract syntax tree. Development of hast started in April 2016 for rehype. This specification is written in a Web IDL-like grammar.

Where this specification fits

hast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

hast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, hast is not limited to JavaScript and can be used in other programming languages.

hast relates to the unified and rehype projects in that hast syntax trees are used throughout their ecosystems.

Virtual DOM

The reasons for introducing a new “virtual” DOM are primarily:

  • The DOM is very heavy to implement outside of the browser; a lean, stripped-down virtual DOM can be used everywhere
  • Most virtual DOMs do not focus on ease of use in transformations
  • Other virtual DOMs cannot represent the syntax of HTML in its entirety (think comments and document types)
  • Neither the DOM nor virtual DOMs focus on positional information

Types

If you are using TypeScript, you can use the hast types by installing them with npm:

npm install @types/hast

Nodes (abstract)

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in hast containing a value.

Parent

interface Parent <: UnistParent {
  children: [Comment | Doctype | Element | Text]
}

Parent (UnistParent) represents a node in hast containing other nodes (said to be children).

Its content is limited to only other hast content.
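To make the split between the two abstract node kinds concrete, here is a small TypeScript sketch. It uses hypothetical local interfaces (`HastNodeBase`, `HastLiteral`, `HastParent`) and helper functions (`isLiteral`, `isParent`); these are not the exports of `@types/hast`. It tells the kinds apart by checking for a string `value` or an array `children`:

```typescript
// Hypothetical local mirrors of the abstract hast interfaces; these are
// not the exports of @types/hast.
interface HastNodeBase {
  type: string
}

// Literal: a node that carries a string value.
interface HastLiteral extends HastNodeBase {
  value: string
}

// Parent: a node that holds other nodes as children.
interface HastParent extends HastNodeBase {
  children: HastNodeBase[]
}

// A literal is recognized by a string `value` field.
function isLiteral(node: HastNodeBase): node is HastLiteral {
  return typeof (node as Partial<HastLiteral>).value === 'string'
}

// A parent is recognized by an array `children` field.
function isParent(node: HastNodeBase): node is HastParent {
  return Array.isArray((node as Partial<HastParent>).children)
}
```

Duck typing on fields like this works for trees that came from JSON, where no class information survives.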

Nodes

Comment

interface Comment <: Literal {
  type: 'comment'
}

Comment (Literal) represents a Comment ([DOM]).

For example, the following HTML:

<!--Charlie-->

Yields:

{type: 'comment', value: 'Charlie'}
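The mapping also works in reverse. The sketch below wraps a comment's `value` in comment delimiters. It uses a hypothetical `HastComment` interface and `commentToHtml` function. Real serialization is handled by dedicated utilities such as hast-util-to-html, and this sketch does not reproduce their escaping rules.

```typescript
// Hypothetical local mirror of the hast Comment node.
interface HastComment {
  type: 'comment'
  value: string
}

// Serialize a comment node back to HTML. Simplified: no escaping is done,
// so a value containing "-->" would end the comment early.
function commentToHtml(node: HastComment): string {
  return '<!--' + node.value + '-->'
}

console.log(commentToHtml({ type: 'comment', value: 'Charlie' }))
// prints "<!--Charlie-->"
```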

Doctype

interface Doctype <: Node {
  type: 'doctype'
}

Doctype (Node) represents a DocumentType ([DOM]).

For example, the following HTML:

<!doctype html>

Yields:

{type: 'doctype'}

Element

interface Element <: Parent {
  type: 'element'
  tagName: string
  properties: Properties
  content: Root?
  children: [Comment | Element | Text]
}

Element (Parent) represents an Element ([DOM]).

A tagName field must be present. It represents the element’s local name ([DOM]).

The properties field represents information associated with the element. The value of the properties field implements the Properties interface.

If the tagName field is 'template', a content field can be present. The value of the content field implements the Root interface.

If the tagName field is 'template', the element must be a leaf.

If the tagName field is 'noscript', its children should be represented as if scripting is disabled ([HTML]).
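The two template rules above can be checked mechanically. Here is a minimal TypeScript sketch using hypothetical local types (`HastRoot`, `HastText`, `HastElement`; comments are left out of the child union for brevity) and a hypothetical `templateProblems` function. It reports violations instead of throwing, so a caller can collect them across a whole tree.

```typescript
// Hypothetical, minimal mirrors of the hast node types used here
// (comments are omitted from the child union for brevity).
interface HastRoot {
  type: 'root'
  children: HastChild[]
}

interface HastText {
  type: 'text'
  value: string
}

interface HastElement {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  content?: HastRoot
  children: HastChild[]
}

type HastChild = HastElement | HastText

// Return the violations of the template rules (an empty list means the
// element passes these particular checks).
function templateProblems(el: HastElement): string[] {
  const problems: string[] = []
  if (el.tagName !== 'template') return problems
  // A template element must be a leaf: its contents belong in `content`.
  if (el.children.length > 0) {
    problems.push('template element must not have children')
  }
  // If present, `content` must be a root node (checked at runtime for data
  // that did not pass through the type checker, such as parsed JSON).
  if (el.content !== undefined && el.content.type !== 'root') {
    problems.push('template content must be a root')
  }
  return problems
}
```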

For example, the following HTML:

<a href="https://alpha.com" class="bravo" download></a>

Yields:

{
  type: 'element',
  tagName: 'a',
  properties: {
    href: 'https://alpha.com',
    className: ['bravo'],
    download: true
  },
  children: []
}
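Going from these properties back to attributes shows how the example round-trips. The sketch below uses a hypothetical `propertiesToAttributes` function that turns `true` into a bare attribute, joins arrays with spaces, and renames `className` and `htmlFor`. It deliberately simplifies: every other name is just lowercased, which is wrong for names like `acceptCharset`; values are not escaped; and comma-separated lists are not handled. property-information has the real name mapping.

```typescript
// Hypothetical property value type for this sketch.
type PropValue = string | number | boolean | null | undefined | Array<string | number>

// The two HTML attributes whose property names differ by more than casing.
const renamedAttributes: Partial<Record<string, string>> = {
  className: 'class',
  htmlFor: 'for'
}

// Turn hast properties into an HTML attribute string (simplified sketch).
function propertiesToAttributes(props: Record<string, PropValue>): string {
  const parts: string[] = []
  for (const [key, value] of Object.entries(props)) {
    // null, undefined, and false mean "attribute not present".
    if (value === null || value === undefined || value === false) continue
    const attr = renamedAttributes[key] ?? key.toLowerCase()
    if (value === true) {
      parts.push(attr) // boolean attribute: bare name
    } else if (Array.isArray(value)) {
      parts.push(`${attr}="${value.join(' ')}"`) // space-separated list
    } else {
      parts.push(`${attr}="${String(value)}"`)
    }
  }
  return parts.join(' ')
}

console.log(propertiesToAttributes({ href: 'https://alpha.com', className: ['bravo'], download: true }))
// prints: href="https://alpha.com" class="bravo" download
```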

Root

interface Root <: Parent {
  type: 'root'
}

Root (Parent) represents a document.

Root can be used as the root of a tree, or as a value of the content field on a 'template' Element, never as a child.
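The placement rule for Root can be enforced with a recursive walk. This TypeScript sketch uses a hypothetical, loosely typed `AnyHastNode` shape and a hypothetical `hasNestedRoot` function. It reports whether a root appears anywhere as a child. It also descends into template `content` without flagging the content root itself.

```typescript
// Hypothetical minimal node shape: just enough to walk children and content.
interface AnyHastNode {
  type: string
  children?: AnyHastNode[]
  content?: AnyHastNode
}

// True if a 'root' node appears anywhere as a child. A root is only allowed
// at the top of the tree or as a template's `content`.
function hasNestedRoot(node: AnyHastNode): boolean {
  for (const child of node.children ?? []) {
    if (child.type === 'root' || hasNestedRoot(child)) return true
  }
  // Template content is itself a root: check below it, not the root itself.
  return node.content !== undefined && hasNestedRoot(node.content)
}
```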

Text

interface Text <: Literal {
  type: 'text'
}

Text (Literal) represents a Text ([DOM]).

For example, the following HTML:

<span>Foxtrot</span>

Yields:

{
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}
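Because text lives only in Text nodes, reading an element's text content means collecting those values in order. Below is a minimal sketch with a hypothetical `TextWalkNode` shape and `collectText` function. It skips comments and, as a simplification, ignores template `content`. It is not the behavior of any particular hast utility.

```typescript
// Hypothetical minimal node shape for this sketch; extra fields such as
// tagName and properties are allowed but unused.
interface TextWalkNode {
  type: string
  value?: string
  children?: TextWalkNode[]
  [key: string]: unknown
}

// Concatenate the values of all text nodes below `node`, in document order.
// Comments contribute nothing; template `content` is ignored.
function collectText(node: TextWalkNode): string {
  if (node.type === 'text') return node.value ?? ''
  return (node.children ?? []).map(collectText).join('')
}
```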

Other types

Properties

interface Properties {}

Properties represents information associated with an element.

Every field must be a PropertyName and every value a PropertyValue.

PropertyName

typedef string PropertyName

Property names are keys on Properties objects and reflect HTML, SVG, ARIA, XML, XMLNS, or XLink attribute names. Often, they have the same value as the corresponding attribute (for example, id is a property name reflecting the id attribute name), but there are some notable differences.

These rules aren’t simple. Use hastscript (or property-information directly) to help.

The following rules are used to transform HTML attribute names to property names. These rules are based on how ARIA is reflected in the DOM ([ARIA]), and differ from how some (older) HTML attributes are reflected in the DOM.

  1. any name referencing a combination of multiple words (such as “stroke miter limit”) becomes a camelcased property name capitalizing each word boundary; this includes combinations that are sometimes written as several words; for example, stroke-miterlimit becomes strokeMiterLimit, autocorrect becomes autoCorrect, and allowfullscreen becomes allowFullScreen
  2. any name that can be hyphenated becomes a camelcased property name capitalizing each boundary; for example, “read-only” becomes readOnly
  3. compound words that are not used with spaces or hyphens are treated as a normal word and the previous rules apply; for example, “placeholder”, “strikethrough”, and “playback” stay the same
  4. acronyms in names are treated as a normal word and the previous rules apply; for example, itemid becomes itemId and bgcolor becomes bgColor
Exceptions

Some jargon is seen as one word even though it may not be seen as such by dictionaries. For example, nohref becomes noHref, playsinline becomes playsInline, and accept-charset becomes acceptCharset.

The HTML attributes class and for respectively become className and htmlFor in alignment with the DOM. No other attributes gain different names as properties, other than a change in casing.
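Only some of these rules can be applied mechanically. Word boundaries inside names like `miterlimit` or `nohref` need a dictionary. The sketch below therefore combines a small hand-picked lookup table, taken only from the examples in the text above, with the mechanical hyphen rule (rule 2). The `knownNames` table and the `toPropertyName` function are hypothetical. The sketch handles HTML attributes only (it lowercases input, so SVG names like viewBox are not preserved), and property-information remains the authoritative list.

```typescript
// Hypothetical, hand-picked subset of names from the rules and exceptions
// above whose word boundaries cannot be derived mechanically.
const knownNames: Partial<Record<string, string>> = {
  class: 'className',
  for: 'htmlFor',
  'stroke-miterlimit': 'strokeMiterLimit',
  autocorrect: 'autoCorrect',
  allowfullscreen: 'allowFullScreen',
  itemid: 'itemId',
  bgcolor: 'bgColor',
  nohref: 'noHref',
  playsinline: 'playsInline'
}

// Map an HTML attribute name to a hast property name: use the table first,
// otherwise capitalize after each hyphen (rule 2), otherwise keep the name.
function toPropertyName(attribute: string): string {
  const lower = attribute.toLowerCase()
  const known = knownNames[lower]
  if (known !== undefined) return known
  return lower.replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase())
}
```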

Notes

property-information lists all property names.

The property name rules differ from how HTML is reflected in the DOM for the following attributes:

  • charoff becomes charOff (not chOff)
  • char stays char (does not become ch)
  • rel stays rel (does not become relList)
  • checked stays checked (does not become defaultChecked)
  • muted stays muted (does not become defaultMuted)
  • value stays value (does not become defaultValue)
  • selected stays selected (does not become defaultSelected)
  • allowfullscreen becomes allowFullScreen (not allowFullscreen)
  • hreflang becomes hrefLang (not hreflang)
  • autoplay becomes autoPlay (not autoplay)
  • autocomplete becomes autoComplete (not autocomplete)
  • autofocus becomes autoFocus (not autofocus)
  • enctype becomes encType (not enctype)
  • formenctype becomes formEncType (not formEnctype)
  • vspace becomes vSpace (not vspace)
  • hspace becomes hSpace (not hspace)
  • lowsrc becomes lowSrc (not lowsrc)

PropertyValue

typedef any PropertyValue

Property values should reflect the data type determined by their property name. For example, the HTML <div hidden></div> has a hidden attribute, which is reflected as a hidden property name set to the property value true, and <input minlength="5">, which has a minlength attribute, is reflected as a minLength property name set to the property value 5.

In JSON, the value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
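In practice, a consumer can normalize properties by dropping those entries before looking at them. The sketch below uses a hypothetical `presentProperties` helper. Other falsy values such as `false`, `0`, or `''` are real values and are kept; only `null` and `undefined` are dropped.

```typescript
// Hypothetical helper: return a copy of `properties` without the entries
// that must be ignored (null in JSON; null or undefined in JavaScript).
// Other falsy values (false, 0, '') are kept because they carry meaning.
function presentProperties(properties: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(properties)) {
    if (value !== null && value !== undefined) result[key] = value
  }
  return result
}
```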

The DOM has strict rules on how it coerces HTML to expected values, whereas hast is more lenient in how it reflects the source. Where the DOM treats <div hidden="no"></div> as having a value of true and <img width="yes"> as having a value of 0, these should be reflected as 'no' and 'yes', respectively, in hast.

The reason for this is to allow plugins and utilities to inspect these non-standard values.

The DOM also specifies comma-separated and space-separated list attribute values. In hast, these should be treated as ordered lists. For example, <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
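Splitting such values into ordered lists takes only a few lines. The two functions below, `parseSpaceSeparated` and `parseCommaSeparated`, are hypothetical and simplified. Space-separated values split on HTML whitespace (space, tab, line feed, form feed, carriage return). Comma-separated values are trimmed, and empty entries are dropped. Dedicated packages in the hast ecosystem handle these cases more carefully; this sketch does not use them.

```typescript
// Split a space-separated attribute value (such as class) into an ordered
// list, splitting on HTML whitespace and dropping empty entries.
function parseSpaceSeparated(value: string): string[] {
  return value
    .trim()
    .split(/[ \t\n\f\r]+/)
    .filter((token) => token !== '')
}

// Split a comma-separated attribute value (such as accept) into an ordered
// list of trimmed entries. Simplified: empty entries are dropped.
function parseCommaSeparated(value: string): string[] {
  return value
    .split(',')
    .map((token) => token.trim())
    .filter((token) => token !== '')
}

console.log(parseSpaceSeparated('alpha bravo'))
// prints [ 'alpha', 'bravo' ]
```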

There’s no special format for the property value of the style property name.

Glossary

See § Glossary in syntax-tree/unist.

List of utilities

See § List of utilities in syntax-tree/unist for more utilities.

Related HTML utilities

References

Security

As hast represents HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, improper use of hast is also unsafe. Always be careful with user input and use hast-util-sanitize to make the hast tree safe.
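To show the idea behind allowlist-based cleaning, here is a deliberately naive TypeScript sketch. It is not a substitute for hast-util-sanitize and is not secure enough for real use. The types (`SafeNode`, `SafeElement`, `SafeText`), the allowlists, and `naiveSanitize` are all hypothetical. The function keeps text, drops comments, drops disallowed elements together with their contents, and keeps only allowlisted properties. It keeps `href` only when the value starts with one of a few safe prefixes, which also discards harmless relative URLs such as `page.html`.

```typescript
// Hypothetical minimal node shapes for this sketch.
interface SafeText { type: 'text'; value: string }
interface SafeElement {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: SafeNode[]
}
type SafeNode = SafeElement | SafeText | { type: 'comment'; value: string }

// Hypothetical allowlists; a real sanitization schema is far more detailed.
const allowedTags = new Set(['p', 'a', 'em', 'strong', 'span'])
const allowedProperties = new Set(['href', 'className', 'title'])
const safeHrefPrefixes = ['https://', 'http://', 'mailto:', '/', '#']

function isSafeHref(value: unknown): boolean {
  return typeof value === 'string' && safeHrefPrefixes.some((prefix) => value.startsWith(prefix))
}

// Return a filtered copy of `nodes` (the input is not mutated).
function naiveSanitize(nodes: SafeNode[]): SafeNode[] {
  const out: SafeNode[] = []
  for (const node of nodes) {
    if (node.type === 'text') {
      out.push(node)
    } else if (node.type === 'element' && allowedTags.has(node.tagName)) {
      const properties: Record<string, unknown> = {}
      for (const [key, value] of Object.entries(node.properties)) {
        if (!allowedProperties.has(key)) continue // drops onClick, style, ...
        if (key === 'href' && !isSafeHref(value)) continue // drops javascript: URLs
        properties[key] = value
      }
      out.push({ type: 'element', tagName: node.tagName, properties, children: naiveSanitize(node.children) })
    }
    // Anything else (comments, disallowed elements and their contents) is dropped.
  }
  return out
}
```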

Related

  • mdast— Markdown Abstract Syntax Tree format
  • nlcst— Natural Language Concrete Syntax Tree format
  • xast— Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!

Thanks to @andrewburgess, @arobase-che, @arystan-sw, @BarryThePenguin, @brechtcs, @ChristianMurphy, @ChristopherBiscardi, @craftzdog, @cupojoe, @davidtheclark, @derhuerst, @detj, @DxCx, @erquhart, @flurmbo, @Hamms, @Hypercubed, @inklesspen, @jeffal, @jlevy, @Justineo, @lfittl, @kgryte, @kmck, @kthjm, @KyleAMathews, @macklinu, @medfreeman, @Murderlon, @nevik, @nokome, @phiresky, @revolunet, @rhysd, @Rokt33r, @rubys, @s1n, @Sarah-Seo, @sethvincent, @simov, @StarpTech, @stefanprobst, @stuff, @subhero24, @tripodsan, @tunnckoCore, @vhf, @voischev, and @zjaml, for contributing to hast and related projects!

License

CC-BY-4.0 © Titus Wormer
