nlcst

Natural Language Concrete Syntax Tree format.

nlcst is a specification for representing natural language in a syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 1.0.2.

Introduction

This document defines a format for representing natural language as a concrete syntax tree. Development of nlcst started in May 2014, in the now deprecated textom project for retext, before unist existed. This specification is written in a Web IDL-like grammar.

Where this specification fits

nlcst extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

nlcst relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, nlcst is not limited to JavaScript and can be used in other programming languages.

nlcst relates to the unified and retext projects in that nlcst syntax trees are used throughout their ecosystems.

Types

If you are using TypeScript, you can use the nlcst types by installing them with npm:

npm install @types/nlcst

Nodes (abstract)

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in nlcst containing a value.

Its value field is a string.
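As an illustration of the Literal shape, here is a minimal sketch of a runtime check. The names `NlcstLiteral` and `isNlcstLiteral` are hypothetical and defined locally; they are not part of nlcst or `@types/nlcst`.

```typescript
// Local stand-in for the Literal interface above (assumed name, not imported).
interface NlcstLiteral {
  type: string
  value: string
}

// Returns true when `node` is an object whose `type` and `value` are strings.
function isNlcstLiteral(node: unknown): node is NlcstLiteral {
  if (typeof node !== 'object' || node === null) return false
  const candidate = node as {type?: unknown; value?: unknown}
  return typeof candidate.type === 'string' && typeof candidate.value === 'string'
}

const comma: unknown = {type: 'PunctuationNode', value: ','}
const broken: unknown = {type: 'TextNode', value: 42}

console.log(isNlcstLiteral(comma)) // true
console.log(isNlcstLiteral(broken)) // false: `value` is not a string
```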

Parent

interface Parent <: UnistParent {
  children: [Paragraph | Punctuation | Sentence | Source | Symbol | Text | WhiteSpace | Word]
}

Parent (UnistParent) represents a node in nlcst containing other nodes (said to be children).

Its content is limited to only other nlcst content.
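Because every parent ultimately contains literals, a tree can be turned back into text by concatenating literal values depth-first. The following is a minimal sketch under that assumption; `NlcstLiteral`, `NlcstParent`, `NlcstNode`, and `serialize` are hypothetical local names, not an existing utility.

```typescript
// Local stand-ins for the abstract interfaces above (assumed names).
interface NlcstLiteral {
  type: string
  value: string
}

interface NlcstParent {
  type: string
  children: NlcstNode[]
}

type NlcstNode = NlcstLiteral | NlcstParent

// Concatenate the values of all literals under `node`, depth-first.
function serialize(node: NlcstNode): string {
  if ('children' in node) {
    return node.children.map((child) => serialize(child)).join('')
  }
  return node.value
}

const sentence: NlcstParent = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
    {type: 'PunctuationNode', value: ','},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'world'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

console.log(serialize(sentence)) // Hello, world!
```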

Nodes

Paragraph

interface Paragraph <: Parent {
  type: 'ParagraphNode'
  children: [Sentence | Source | WhiteSpace]
}

Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.

Paragraph can be used in a root node. It can contain sentence, whitespace, and source nodes.
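The content model above can be checked mechanically. This sketch lists the child types that a paragraph may not contain; `LooseNode`, `paragraphChildTypes`, and `unexpectedParagraphChildren` are hypothetical names introduced only for illustration.

```typescript
// Loose node shape used only for this sketch (not the published nlcst types).
interface LooseNode {
  type: string
  value?: string
  children?: LooseNode[]
}

// Child types the Paragraph interface allows.
const paragraphChildTypes = ['SentenceNode', 'SourceNode', 'WhiteSpaceNode']

// Return the types of direct children that a paragraph may not contain.
function unexpectedParagraphChildren(paragraph: LooseNode): string[] {
  const children = paragraph.children || []
  return children
    .map((child) => child.type)
    .filter((childType) => paragraphChildTypes.indexOf(childType) === -1)
}

const validParagraph: LooseNode = {
  type: 'ParagraphNode',
  children: [
    {type: 'SentenceNode', children: []},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'SentenceNode', children: []}
  ]
}

const invalidParagraph: LooseNode = {
  type: 'ParagraphNode',
  children: [{type: 'WordNode', children: []}]
}

// Empty: every child is allowed.
console.log(unexpectedParagraphChildren(validParagraph))
// Contains 'WordNode': words belong inside sentences.
console.log(unexpectedParagraphChildren(invalidParagraph))
```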

Punctuation

interface Punctuation <: Literal {
  type: 'PunctuationNode'
}

Punctuation (Literal) represents typographical devices which aid understanding and correct reading of other grammatical units.

Punctuation can be used in sentence or word nodes.

Root

interface Root <: Parent {
  type: 'RootNode'
}

Root (Parent) represents a document.

Root can be used as the root of a tree, never as a child. Its content model is not limited: it can contain any nlcst content, with the restriction that all content must be of the same category.
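To make the nesting concrete, here is a hand-written tree for the text "Hi." together with a depth-first walk that lists node types. This is only a sketch: `SketchNode` and `collectTypes` are hypothetical names, and the tree is built by hand rather than produced by a parser.

```typescript
// Loose node shape used only for this sketch.
interface SketchNode {
  type: string
  value?: string
  children?: SketchNode[]
}

// List the type of `node` followed by the types of its descendants (pre-order).
function collectTypes(node: SketchNode): string[] {
  const types = [node.type]
  for (const child of node.children || []) {
    for (const childType of collectTypes(child)) types.push(childType)
  }
  return types
}

// A root holding one paragraph, one sentence, one word, and a full stop.
const tree: SketchNode = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Hi'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

console.log(collectTypes(tree).join(', '))
// RootNode, ParagraphNode, SentenceNode, WordNode, TextNode, PunctuationNode
```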

Sentence

interface Sentence <: Parent {
  type: 'SentenceNode'
  children: [Punctuation | Source | Symbol | WhiteSpace | Word]
}

Sentence (Parent) represents a grouping of grammatically linked words that, in principle, expresses a complete thought, although it may make little sense taken in isolation out of context.

Sentence can be used in a paragraph node. It can contain word, symbol, punctuation, whitespace, and source nodes.
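Since each node carries a distinct `type` string, TypeScript can narrow sentence children by that field. The sketch below picks out the words of a sentence. All interface names are local stand-ins (suffixed with `Node` to avoid clashing with global types such as `Symbol` and `Text`), and `wordsOf` is a hypothetical helper.

```typescript
// Local stand-ins mirroring the interfaces in this document (assumed names).
interface TextNode { type: 'TextNode'; value: string }
interface PunctuationNode { type: 'PunctuationNode'; value: string }
interface SymbolNode { type: 'SymbolNode'; value: string }
interface WhiteSpaceNode { type: 'WhiteSpaceNode'; value: string }
interface SourceNode { type: 'SourceNode'; value: string }

interface WordNode {
  type: 'WordNode'
  children: Array<PunctuationNode | SourceNode | SymbolNode | TextNode>
}

interface SentenceNode {
  type: 'SentenceNode'
  children: Array<PunctuationNode | SourceNode | SymbolNode | WhiteSpaceNode | WordNode>
}

// Return only the word children; the `type` check narrows the union.
function wordsOf(sentence: SentenceNode): WordNode[] {
  const words: WordNode[] = []
  for (const child of sentence.children) {
    if (child.type === 'WordNode') words.push(child)
  }
  return words
}

const greeting: SentenceNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
    {type: 'PunctuationNode', value: ','},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'world'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

console.log(wordsOf(greeting).length) // 2
```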

Source

interface Source <: Literal {
  type: 'SourceNode'
}

Source (Literal) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, code, and such.

Source can be used in root, paragraph, sentence, or word nodes.
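As a sketch of how embedded values might be handled, the function below gathers the values of all source nodes in a tree so they can be skipped or treated separately from prose. `SketchNode` and `sourceValues` are hypothetical names, and the example sentence (with a shell command as its source value) is made up for illustration.

```typescript
// Loose node shape used only for this sketch.
interface SketchNode {
  type: string
  value?: string
  children?: SketchNode[]
}

// Collect the values of all SourceNode descendants, in document order.
function sourceValues(node: SketchNode): string[] {
  if (node.type === 'SourceNode' && typeof node.value === 'string') {
    return [node.value]
  }
  const found: string[] = []
  for (const child of node.children || []) {
    for (const value of sourceValues(child)) found.push(value)
  }
  return found
}

// Hand-built sentence: "Run npm test now." with the command kept as a source node.
const instruction: SketchNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Run'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'SourceNode', value: 'npm test'},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'now'}]},
    {type: 'PunctuationNode', value: '.'}
  ]
}

console.log(sourceValues(instruction).join('|')) // npm test
```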

Symbol

interface Symbol <: Literal {
  type: 'SymbolNode'
}

Symbol (Literal) represents typographical devices different from characters which represent sounds (like letters and numerals), white space, or punctuation.

Symbol can be used in sentence or word nodes.

Text

interface Text <: Literal {
  type: 'TextNode'
}

Text (Literal) represents actual content in nlcst documents: one or more characters.

Text can be used in word nodes.

WhiteSpace

interface WhiteSpace <: Literal {
  type: 'WhiteSpaceNode'
}

WhiteSpace (Literal) represents typographical devices devoid of content, separating other units.

WhiteSpace can be used in root, paragraph, or sentence nodes.
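To show how white space separates other units, here is a deliberately naive tokenizer that splits a string into word and white space nodes. It is a sketch only: `tokenize` and the local interfaces are hypothetical, it is not the algorithm of any retext parser, and it does not split punctuation or symbols out of words.

```typescript
// Local stand-ins for the interfaces in this document (assumed names).
interface TextNode { type: 'TextNode'; value: string }
interface WhiteSpaceNode { type: 'WhiteSpaceNode'; value: string }
interface WordNode { type: 'WordNode'; children: TextNode[] }

// Split `input` into alternating runs of white space and non-white space.
function tokenize(input: string): Array<WordNode | WhiteSpaceNode> {
  const nodes: Array<WordNode | WhiteSpaceNode> = []
  const pieces: string[] = input.match(/\s+|\S+/g) || []
  for (const piece of pieces) {
    if (/^\s/.test(piece)) {
      nodes.push({type: 'WhiteSpaceNode', value: piece})
    } else {
      nodes.push({type: 'WordNode', children: [{type: 'TextNode', value: piece}]})
    }
  }
  return nodes
}

const tokens = tokenize('Hello  world')
console.log(tokens.map((token) => token.type).join(', '))
// WordNode, WhiteSpaceNode, WordNode
```

Note that the run of two spaces is kept as a single white space node with its exact value, which is what makes the tree concrete: the original text can be recovered from it.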

Word

interface Word <: Parent {
  type: 'WordNode'
  children: [Punctuation | Source | Symbol | Text]
}

Word (Parent) represents the smallest element that may be uttered in isolation with semantic or pragmatic content.

Word can be used in a sentence node. It can contain text, symbol, punctuation, and source nodes.
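A word is not necessarily a single text node. As a sketch, the contraction "don't" could be represented as text, punctuation, and text inside one word. The tokenization shown is an assumption for illustration, and `WordChild`, `WordNode`, and `wordValue` are hypothetical local names.

```typescript
// Local stand-ins for the Word content model above (assumed names).
interface WordChild {
  type: 'TextNode' | 'PunctuationNode' | 'SymbolNode' | 'SourceNode'
  value: string
}

interface WordNode {
  type: 'WordNode'
  children: WordChild[]
}

// Join the values of a word's children to recover its surface form.
function wordValue(word: WordNode): string {
  return word.children.map((child) => child.value).join('')
}

const contraction: WordNode = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'don'},
    {type: 'PunctuationNode', value: "'"},
    {type: 'TextNode', value: 't'}
  ]
}

console.log(wordValue(contraction)) // don't
```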

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

Related

  • mdast: Markdown Abstract Syntax Tree format
  • hast: Hypertext Abstract Syntax Tree format
  • xast: Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Thanks to @nwtn, @tmcw, @muraken720, and @dozoisch for contributing to nlcst and related projects!

License

CC-BY-4.0 © Titus Wormer

