Natural Language Concrete Syntax Tree format
syntax-tree/nlcst
Natural Language Concrete Syntax Tree format.
nlcst is a specification for representing natural language in a syntax tree. It implements the unist spec.
This document may not be released. See releases for released documents. The latest released version is 1.0.2.
- Introduction
- Types
- Nodes (abstract)
- Nodes
- Glossary
- List of utilities
- Related
- References
- Contribute
- Acknowledgments
- License
This document defines a format for representing natural language as a concrete syntax tree. Development of nlcst started in May 2014, in the now deprecated textom project for retext, before unist existed. This specification is written in a Web IDL-like grammar.
nlcst extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.
nlcst relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, nlcst is not limited to JavaScript and can be used in other programming languages.
nlcst relates to the unified and retext projects in that nlcst syntax trees are used throughout their ecosystems.
If you are using TypeScript, you can use the nlcst types by installing them with npm:
npm install @types/nlcst
interface Literal <: UnistLiteral {
  value: string
}
Literal (UnistLiteral) represents a node in nlcst containing a value.
Its value field is a string.
interface Parent <: UnistParent {
  children: [Paragraph | Punctuation | Sentence | Source | Symbol | Text | WhiteSpace | Word]
}
Parent (UnistParent) represents a node in nlcst containing other nodes (said to be children).
Its content is limited to only other nlcst content.
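The split between Literal and Parent can be sketched as a pair of type guards. This is a minimal illustration, not part of the specification: the `NlcstNode`, `NlcstLiteral`, and `NlcstParent` names and both guard functions are local, simplified assumptions and do not come from `@types/nlcst`.

```typescript
// Simplified local stand-ins for the interfaces above (hypothetical names).
interface NlcstNode { type: string }
interface NlcstLiteral extends NlcstNode { value: string }
interface NlcstParent extends NlcstNode { children: NlcstNode[] }

// A node is treated as a literal when it carries a string `value`.
function isLiteral(node: NlcstNode): node is NlcstLiteral {
  return typeof (node as Partial<NlcstLiteral>).value === 'string'
}

// A node is treated as a parent when it carries a `children` array.
function isParent(node: NlcstNode): node is NlcstParent {
  return Array.isArray((node as Partial<NlcstParent>).children)
}

const text: NlcstLiteral = { type: 'TextNode', value: 'tree' }
const word: NlcstParent = { type: 'WordNode', children: [text] }

// Guard results for the literal first, then for the parent.
const checks = [isLiteral(text), isParent(text), isLiteral(word), isParent(word)]
console.log(checks) // [true, false, false, true]
```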
interface Paragraph <: Parent {
  type: 'ParagraphNode'
  children: [Sentence | Source | WhiteSpace]
}
Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.
Paragraph can be used in a root node. It can contain sentence, whitespace, and source nodes.
interface Punctuation <: Literal {
  type: 'PunctuationNode'
}
Punctuation (Literal) represents typographical devices which aid understanding and correct reading of other grammatical units.
Punctuation can be used in sentence or word nodes.
interface Root <: Parent {
  type: 'RootNode'
}
Root (Parent) represents a document.
Root can be used as the root of a tree, never as a child. Its content model is not limited: it can contain any nlcst content, with the restriction that all content must be of the same category.
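To make the nesting concrete, here is a hedged sketch of a tree for the text "Hi there." together with a depth-first walk. The `NlcstNode` type and the `collectTypes` helper are local assumptions for illustration, and the exact shape a real parser produces may differ (positional information is omitted).

```typescript
// Simplified local node type: either a literal (value) or a parent (children).
type NlcstNode =
  | { type: string; value: string }
  | { type: string; children: NlcstNode[] }

// Hand-written tree for "Hi there." (illustrative, not parser output).
const tree: NlcstNode = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            { type: 'WordNode', children: [{ type: 'TextNode', value: 'Hi' }] },
            { type: 'WhiteSpaceNode', value: ' ' },
            { type: 'WordNode', children: [{ type: 'TextNode', value: 'there' }] },
            { type: 'PunctuationNode', value: '.' }
          ]
        }
      ]
    }
  ]
}

// Depth-first walk that records each node type in document order.
function collectTypes(node: NlcstNode, out: string[] = []): string[] {
  out.push(node.type)
  if ('children' in node) {
    for (const child of node.children) collectTypes(child, out)
  }
  return out
}

const types = collectTypes(tree)
console.log(types.length) // 9
```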
interface Sentence <: Parent {
  type: 'SentenceNode'
  children: [Punctuation | Source | Symbol | WhiteSpace | Word]
}
Sentence (Parent) represents a grouping of grammatically linked words that, in principle, expresses a complete thought, although it may make little sense taken in isolation out of context.
Sentence can be used in a paragraph node. It can contain word, symbol, punctuation, whitespace, and source nodes.
interface Source <: Literal {
  type: 'SourceNode'
}
Source (Literal) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, code, and such.
Source can be used in root, paragraph, sentence, or word nodes.
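As a rough illustration, a sentence that embeds inline code might carry that code as a Source node, so that grammar-aware tooling can skip it. The types below are simplified local assumptions, and the backtick-delimited value is an illustrative choice rather than something this specification prescribes.

```typescript
// Simplified local types (hypothetical names, positions omitted).
type Lit<T extends string> = { type: T; value: string }
type Word = {
  type: 'WordNode'
  children: Array<Lit<'TextNode' | 'PunctuationNode' | 'SymbolNode' | 'SourceNode'>>
}
type SentenceChild =
  | Word
  | Lit<'PunctuationNode' | 'SymbolNode' | 'WhiteSpaceNode' | 'SourceNode'>
type Sentence = { type: 'SentenceNode'; children: SentenceChild[] }

// Hand-written sentence: the word "Run", a space, embedded code, a full stop.
const sentence: Sentence = {
  type: 'SentenceNode',
  children: [
    { type: 'WordNode', children: [{ type: 'TextNode', value: 'Run' }] },
    { type: 'WhiteSpaceNode', value: ' ' },
    { type: 'SourceNode', value: '`npm test`' },
    { type: 'PunctuationNode', value: '.' }
  ]
}

// Collect the embedded source values found directly in the sentence.
const sources = sentence.children
  .filter((child): child is Lit<'SourceNode'> => child.type === 'SourceNode')
  .map((child) => child.value)

console.log(sources.length) // 1
```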
interface Symbol <: Literal {
  type: 'SymbolNode'
}
Symbol (Literal) represents typographical devices different from characters which represent sounds (like letters and numerals), white space, or punctuation.
Symbol can be used in sentence or word nodes.
interface Text <: Literal {
  type: 'TextNode'
}
Text (Literal) represents actual content in nlcst documents: one or more characters.
Text can be used in word nodes.
interface WhiteSpace <: Literal {
  type: 'WhiteSpaceNode'
}
WhiteSpace (Literal) represents typographical devices devoid of content, separating other units.
WhiteSpace can be used in root, paragraph, or sentence nodes.
interface Word <: Parent {
  type: 'WordNode'
  children: [Punctuation | Source | Symbol | Text]
}
Word (Parent) represents the smallest element that may be uttered in isolation with semantic or pragmatic content.
Word can be used in a sentence node. It can contain text, symbol, punctuation, and source nodes.
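The content model of Word can be checked mechanically. The sketch below models the contraction "don't" as one word with an inner punctuation node, which is one plausible representation and not something this document mandates. `WordChild`, `Word`, and `hasValidWordContent` are hypothetical local names.

```typescript
// Node types a Word may contain, per the interface above.
const allowedInWord = new Set(['TextNode', 'SymbolNode', 'PunctuationNode', 'SourceNode'])

// Simplified local types (hypothetical names, positions omitted).
interface WordChild { type: string; value: string }
interface Word { type: 'WordNode'; children: WordChild[] }

// "don't" as a single word with an apostrophe inside it.
const word: Word = {
  type: 'WordNode',
  children: [
    { type: 'TextNode', value: 'don' },
    { type: 'PunctuationNode', value: "'" },
    { type: 'TextNode', value: 't' }
  ]
}

// A word that wrongly contains white space, for contrast.
const broken: Word = {
  type: 'WordNode',
  children: [{ type: 'WhiteSpaceNode', value: ' ' }]
}

// True when every child type is allowed inside a Word.
function hasValidWordContent(node: Word): boolean {
  return node.children.every((child) => allowedInWord.has(child.type))
}

// Joining the child values recovers the surface form of the word.
const surface = word.children.map((child) => child.value).join('')

console.log(surface, hasValidWordContent(word), hasValidWordContent(broken))
// don't true false
```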
See the unist glossary.
See the unist list of utilities for more utilities.
- nlcst-affix-emoticon-modifier: merge affix emoticons into the previous sentence
- nlcst-emoji-modifier: support emoji
- nlcst-emoticon-modifier: support emoticons
- nlcst-is-literal: check whether a node is meant literally
- nlcst-normalize: normalize a word for easier comparison
- nlcst-search: search for patterns
- nlcst-to-string: serialize a node
- nlcst-test: validate a node
- mdast-util-to-nlcst: transform mdast to nlcst
- hast-util-to-nlcst: transform hast to nlcst
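To give a feel for what such utilities do, here is a minimal sketch of serializing a node by concatenating literal values in document order. It is written in the spirit of a "serialize a node" utility but is a local, hypothetical `serialize` function and not the API of any package listed above.

```typescript
// Simplified local node type: either a literal (value) or a parent (children).
type NlcstNode =
  | { type: string; value: string }
  | { type: string; children: NlcstNode[] }

// Recursively join literal values; parents contribute only their children.
function serialize(node: NlcstNode): string {
  return 'children' in node ? node.children.map((child) => serialize(child)).join('') : node.value
}

// Hand-written sentence for "Hi there." (illustrative, not parser output).
const sentence: NlcstNode = {
  type: 'SentenceNode',
  children: [
    { type: 'WordNode', children: [{ type: 'TextNode', value: 'Hi' }] },
    { type: 'WhiteSpaceNode', value: ' ' },
    { type: 'WordNode', children: [{ type: 'TextNode', value: 'there' }] },
    { type: 'PunctuationNode', value: '.' }
  ]
}

const serialized = serialize(sentence)
console.log(serialized) // Hi there.
```

Because a concrete syntax tree keeps white space and punctuation as nodes, concatenating the literals in this sketch reproduces the original text exactly.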
- mdast: Markdown Abstract Syntax Tree format
- hast: Hypertext Abstract Syntax Tree format
- xast: Extensible Abstract Syntax Tree
- unist: Universal Syntax Tree. T. Wormer; et al.
- JavaScript: ECMAScript Language Specification. Ecma International.
- Web IDL: Web IDL, C. McCormack. W3C.
See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.
A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.
The initial release of this project was authored by @wooorm.
Thanks to @nwtn, @tmcw, @muraken720, and @dozoisch for contributing to nlcst and related projects!