Movatterモバイル変換

syntax-tree/unistPublic

NotificationsYou must be signed in to change notification settings
Fork23
Star964

Universal Syntax Tree used by@unifiedjs

unifiedjs.com

964 stars 23 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.github/workflows		.github/workflows
.gitignore		.gitignore
.npmrc		.npmrc
logo-square.svg		logo-square.svg
logo.svg		logo.svg
package.json		package.json
readme.md		readme.md

Repository files navigation

UniversalSyntaxTree.

unist is a specification for syntax trees.It has a bigecosystem of utilities in JavaScript forworking with these trees.It’s implemented by several other specifications.

This document may not be released.Seereleases for released documents.The latest released version is3.0.0.

Intro

This document defines a general-purpose format for syntax trees.Development of unist started in July 2015.This specification is written in aWeb IDL-like grammar.

Syntax tree

Syntax trees are representations of source code or even natural language.These trees are abstractions that make it possible to analyze, transform, andgenerate code.

Syntax treescome in two flavors:

concrete syntax trees: structures that represent every detail (such aswhite-space in white-space insensitive languages)
abstract syntax trees: structures that only represent details relatingto the syntactic structure of code (such as ignoring whether a double orsingle quote was used in languages that support both, such as JavaScript).

This specification can express both abstract and concrete syntax trees.

Where this specification fits

unist is not intended to be self-sufficient.Instead, it is expected that other specifications implement unist and extend itto express language specific nodes.For example, see projects such ashast (for HTML),nlcst (fornatural language),mdast (for Markdown), andxast (for XML).

unist relates toJSON in that compliant syntax trees can be expressedcompletely in JSON.However, unist is not limited to JSON and can be expressed in other dataformats, such asXML.

unist relates toJavaScript in that it has a richecosystem ofutilities for working with compliant syntax trees inJavaScript.The five most used utilities combined are downloaded thirty million times eachmonth.However, unist is not limited to JavaScript and can be used in other programminglanguages.

unist relates to theunified,remark,rehype, andretextprojects in that unist syntax trees are used throughout their ecosystems.

unist relates to thevfile project in that it accepts unist nodes for itsmessage store, and that vfile can be a sourcefile of a syntaxtree.

Types

If you are using TypeScript, you can use the unist types by installing themwith npm:

npm install @types/unist

Nodes

Syntactic units in unist syntax trees are called nodes, and implement theNode interface.

`Node`

interface Node {  type:string  data: Data?  position: Position?}

Thetype field is a non-empty string representing the variant of a node.This field can be used to determine thetype a node implements.

Thedata field represents information from the ecosystem.The value of thedata field implements theData interface.

Theposition field represents the location of a node in a source document.The value of theposition field implements thePositioninterface.Theposition field must not be present if a node isgenerated.

Specifications implementing unist are encouraged to define more fields.Ecosystems can define fields onData.

Any value in unistmust be expressible in JSON values:string,number,object,array,true,false, ornull.This means that the syntax tree should be able to be converted to and from JSONand produce the same tree.For example, in JavaScript, a tree can be passed throughJSON.parse(JSON.stringify(tree)) and result in the same tree.

`Position`

interface Position {  start: Pointend: Point}

Position represents the location of a node in a sourcefile.

Thestart field ofPosition represents the place of the first character ofthe parsed source region.Theend field ofPosition represents the place of the first characterafter the parsed source region, whether it exists or not.The value of thestart andend fields implement thePointinterface.

If the syntactic unit represented by a node is not present in the sourcefile at the time of parsing, the node is said to begenerated and it must not have positional information.

For example, if the following value was represented as unist:

alphabravo

…the first word (alpha) would start at line1, column1, offset0, andend at line1, column6, offset5.The line feed would start at line1, column6, offset5, and end at line2, column1, offset6.The last word (bravo) would start at line2, column1, offset6, and endat line2, column6, offset11.

`Point`

interface Point {  line: number>=1  column: number>=1  offset: number>=0?}

Point represents one place in a sourcefile.

Theline field (1-indexed integer) represents a line in a source file.Thecolumn field (1-indexed integer) represents a column in a source file.Theoffset field (0-indexed integer) represents a character in a source file.

The term character means a (UTF-16) code unit which is defined in theWeb IDL specification.

`Data`

interface Data { }

Data represents information associated by the ecosystem with the node.

This space is guaranteed to never be specified by unist or specificationsimplementing unist.

`Parent`

interface Parent<: Node {  children: [Node]}

Nodes containing other nodes (said to bechildren) extend theabstract interfaceParent (Node).

Thechildren field is a list representing the children of a node.

`Literal`

interface Literal<: Node {  value: any}

Nodes containing a value extend the abstract interfaceLiteral(Node).

Thevalue field can contain any value.

Glossary

Tree

Atree is a node and all of itsdescendants (if any).

Child

Node X ischild of node Y, if Y’schildren include X.

Parent

Node X isparent of node Y, if Y is achild of X.

Index

Theindex of achild is its number of precedingsiblings, or0 if it has none.

Sibling

Node X is asibling of node Y, if X and Y have the sameparent (if any).

Theprevious sibling of achild is itssibling at itsindex minus 1.

Thenext sibling of achild is itssibling at itsindex plus 1.

Root

Theroot of a node is itself, if withoutparent, or theroot of itsparent.

Theroot of atree is any node in thattreewithoutparent.

Descendant

Node X isdescendant of node Y, if X is achild of Y, or ifX is achild of node Z that is adescendant of Y.

Aninclusive descendant is a node or one of itsdescendants.

Ancestor

Node X is anancestor of node Y, if Y is adescendantof X.

Aninclusive ancestor is a node or one of itsancestors.

Head

Thehead of a node is its firstchild (if any).

Tail

Thetail of a node is its lastchild (if any).

Leaf

Aleaf is a node with nochildren.

Branch

Abranch is a node with one or morechildren.

Generated

A node isgenerated if it does not havepositionalinformation.

Type

Thetype of a node is the value of itstype field.

Positional information

Thepositional information of a node is the value of itsposition field.

File

Afile is a source document that represents the original file that wasparsed to produce the syntax tree.Positional information represents the place of a nodein this file.Files are provided by the host environment and not defined by unist.

For example, see projects such asvfile.

Preorder

Inpreorder (NLR) isdepth-first treetraversal that performs the following steps for each nodeN:

N: visitN itself
L: traversehead (then itsnext sibling, recursivelymoving forward until reachingtail)
R: traversetail

Postorder

Inpostorder (LRN) isdepth-first treetraversal that performs the following steps for each nodeN:

L: traversehead (then itsnext sibling, recursivelymoving forward until reachingtail)
R: traversetail
N: visitN itself

Enter

Enter is a step right before other steps performed on a given nodeN whentraversing a tree.

For example, when performingpreorder traversal,enter is the first steptaken, right before visitingN itself.

Exit

Exit is a step right after other steps performed on a given nodeN whentraversing a tree.

For example, when performingpreorder traversal,exit is the last steptaken, right after traversing thetail ofN.

Tree traversal

Tree traversal is a common task when working with atree tosearch it.Tree traversal is typically eitherbreadth-first ordepth-first.

In the following examples, we’ll work with this tree:

graph TD    A-->B-->C        B-->D        B-->E    A-->F-->G

Breadth-first traversal

Breadth-first traversal is visiting a node and all itssiblings to broaden the search at that level, beforetraversingchildren.

For the syntax tree defined in the diagram, a breadth-first traversal firstsearches the root of the tree (A), then its children (B andF), thentheir children (C,D,E, andG).

Depth-first traversal

Alternatively, and more commonly,depth-first traversal is used.The search is first deepened, by traversingchildren, beforetraversingsiblings.

For the syntax tree defined in the diagram, a depth-first traversal firstsearches the root of the tree (A), then one of its children (B orF), then their children (C,D, andE, orG).

For a given nodeN withchildren, adepth-first traversalperforms three steps, simplified to only binary trees (every node hashead andtail, but no other children):

N: visitN itself
L: traversehead
R: traversetail

These steps can be done in any order, but for non-binary trees,L andRoccur together.IfL is done beforeR, the traversal is calledleft-to-righttraversal, otherwise it is calledright-to-left traversal.In the case of non-binary trees, the other children betweenhead andtailare processed in that order as well, so forleft-to-right traversal, firsthead is traversed (L), then itsnext sibling is traversed, etcetera,until finallytail (R) is traversed.

BecauseL andR occur together for non-binary trees, we can produce fourtypes of orders: NLR, NRL, LRN, RLN.

NLR and LRN (the twoleft-to-right traversal options) are most commonly usedand respectively namedpreorder andpostorder.

For the syntax tree defined in the diagram,preorder andpostorder traversalthus first search the root of the tree (A), then its head (B), then itschildren from left-to-right (C,D, and thenE).After alldescendants ofB are traversed, its nextsibling (F) is traversed and then finally its only child (G).

Utilities

Utilities are functions that work with nodes.

There are several projects that deal with nodes from specifications implementingunist:

List of utilities

unist-util-ancestor— get the common ancestor of one or more nodes
unist-util-assert— assert nodes
unist-util-filter— create a new tree with all nodes that pass the given function
unist-util-find— find a node by condition
unist-util-find-after— find a node after another node
unist-util-find-all-after— find nodes after another node or index
unist-util-find-all-before— find nodes before another node or index
unist-util-find-all-between— find nodes between two nodes or positions
unist-util-find-before— find a node before another node
unist-util-flat-filter— flat map version ofunist-util-filter
unist-util-flatmap— create a new tree by expanding a node into many
unist-util-generated— check if a node is generated
unist-util-index— index nodes by property or computed key
unist-util-inspect— node inspector
unist-util-is— check if a node passes a test
unist-util-map— create a new tree by mapping nodes
unist-util-modify-children— modify direct children of a parent
unist-util-parents—parent references on nodes
unist-util-position— get positional info of nodes
unist-util-reduce— recursively reduce a tree
unist-util-remove— remove nodes from trees
unist-util-remove-position— remove positional info from trees
unist-util-replace-all-between— replace nodes between two nodes or positions
unist-util-select— select nodes with CSS-like selectors
unist-util-size— calculate the number of nodes in a tree
unist-util-source— get the source of a value (node or position) in a file
unist-util-stringify-position— stringify a node, position, or point
unist-util-visit— recursively walk over nodes
unist-util-visit-parents— recursively walk over nodes, with a stack of parents
unist-util-visit-children— visit direct children of a parent
unist-util-visit-all-after— visit nodes after another node
unist-builder— helper for creating trees

References

JavaScript:ECMAScript Language Specification.Ecma International.
JSON:The JavaScript Object Notation (JSON) Data Interchange Format,T. Bray.IETF.
XML:Extensible Markup Language,T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, F. Yergeau.W3C.
Web IDL:Web IDL,C. McCormack.W3C.

Contribute

Seecontributing.md insyntax-tree/.github forways to get started.Seesupport.md for ways to get help.

A curated list of awesome syntax-tree, unist, hast, xast, mdast, and nlcstresources can be found inawesome syntax-tree.

This project has acode of conduct.By interacting with this repository, organization, or community you agree toabide by its terms.

Acknowledgments

The initial release of this project was authored by@wooorm.

Special thanks to@eush77 for their work,ideas, and incredibly valuable feedback!Thanks to@anandthakker,@anko,@arobase-che,@azu,@BarryThePenguin,@ben-eb,@blahah,@blakeembrey,@brainkim,@ChristianMurphy,@davidtheclark,@denysdovhan,@derhuerst,@dozoisch,@fazouane-marouane,@gibson042,@hrajchert,@ikatyang,@inklesspen,@izumin5210,@jasonLaster,@JDvorak,@jlevy,@justjake,@kmck,@kt3k,@KyleAMathews,@luca3m,@mattdesl,@muraken720,@mrzmmr,@nwtn,@rhysd,@Rokt33r,@Sarah-Seo,@sethvincent,@shawnbot,@simov,@staltz,@TitanSnow,@tmcw,and@vhf,for contributing to unist and related projects!

License

About

Universal Syntax Tree used by@unifiedjs

unifiedjs.com

Topics

ast cst syntax-tree remarks unist unist-nodes unist-files unist-utilities mdast hast nlcst xast

Resources

Readme

Code of conduct

Contributing

Releases5

3.0.0 Latest

Jun 15, 2023

+ 4 releases

Sponsor this project

Learn more about GitHub Sponsors

Movatterモバイル変換

Uh oh!

syntax-tree/unist

Folders and files

Latest commit

History

Repository files navigation

Contents

Intro

Syntax tree

Where this specification fits

Types

Nodes

Node

Position

Point

Data

Parent

Literal

Glossary

Tree

Child

Parent

Index

Sibling

Root

Descendant

Ancestor

Head

Tail

Leaf

Branch

Generated

Type

Positional information

File

Preorder

Postorder

Enter

Exit

Tree traversal

Breadth-first traversal

Depth-first traversal

Utilities

List of utilities

References

Contribute

Acknowledgments

License

About

Topics

Resources

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases5

Sponsor this project

Uh oh!

Uh oh!

Contributors6

Uh oh!

`Node`

`Position`

`Point`

`Data`

`Parent`

`Literal`