nene/sql-parser-cstPublic

NotificationsYou must be signed in to change notification settings
Fork9
Star161

Parses SQL into Concrete Syntax Tree (CST)

License

GPL-2.0 license

161 stars 9 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 2,009 Commits
.github/workflows		.github/workflows
perf		perf
scripts		scripts
src		src
test		test
.editorconfig		.editorconfig
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
.prettierignore		.prettierignore
AUTHORS		AUTHORS
LICENSE		LICENSE
README.md		README.md
changelog.hbs		changelog.hbs
package.json		package.json
ts-pegjs.d.ts		ts-pegjs.d.ts
tsconfig.eslint.json		tsconfig.eslint.json
tsconfig.json		tsconfig.json
yarn.lock		yarn.lock

Repository files navigation

SQL Parser CST

SQL Parser which produces aConcrete Syntax Tree (CST).

Unlike a more usual parser which produces an Abstract Syntax Tree (AST),this parser preserves all the syntax elements present in the parsed source code,with the goal of being able to re-create the exact original source code.

Try it live inSQL Explorer.

Features

Detailed TypeScript types for the syntax tree
Unified syntax tree for multiple SQL dialects
Includes source code location data for all nodes
Includes comments in the syntax tree
Helpful error messages
Fast

Supports the following SQL dialects:

SQLite - full support (version 3.45)
BigQuery - full support (as of 31 January 2024).
MySQL - experimental (version 8)(see#7 for implementation progress).
MariaDB - experimental (version 10)(see#32 for implementation progress).
PostgreSQL - experimental (version 16)(see#40 for implementation progress).

Note: This software is in very active development.The syntax tree structure is mostly stable now,though there are bound to be changes as new SQL dialectsare added and they contain features that need to be accommodatedto the syntax tree.

Install

npm install sql-parser-cst

Usage

import{parse,show,cstVisitor}from"sql-parser-cst";constcst=parse("select * from my_table",{dialect:"sqlite",// These are optional:includeSpaces:true,// Adds spaces/tabsincludeNewlines:true,// Adds newlinesincludeComments:true,// Adds commentsincludeRange:true,// Adds source code location data});// convert all keywords to uppercaseconsttoUpper=cstVisitor({keyword:(kw)=>{kw.text=kw.text.toUpperCase();},});toUpper(cst);// Serialize back to SQLshow(cst);// --> SELECT * FROM my_table

AST versus CST-parsers

For example, given the following SQL:

/* My query*/SELECT ("first_name"||' jr.')as fname-- use important tableFROM persons;

An AST-parser might parse this to the following abstract syntax tree:

{"type":"select_stmt","columns": [    {"type":"alias","expr": {"type":"binary_expr","left": {"type":"column_ref","column":"first_name" },"operator":"||","right": {"type":"string","value":" jr." }      },"alias":"fname"    }  ],"from": [{"type":"table_ref","table":"persons" }]}

Note that the above AST is missing the following information:

comments
whitespace (e.g. where the newlines are)
case of keywords (e.g. whetherAS oras was written)
whether an identifier was quoted or not (and with what kind of quotes)
whether an expression is wrapped in additional (unnecessary) parenthesis.
whether the statement ends with a semicolon.

In contrast, this CST parser produces the following concrete syntax tree,which preserves all of this information:

{"type":"program","statements": [    {"type":"select_stmt","clauses": [        {"type":"select_clause","selectKw": {"type":"keyword","text":"SELECT","name":"SELECT" },"options": [],"columns": {"type":"list_expr","items": [              {"type":"alias","expr": {"type":"paren_expr","expr": {"type":"binary_expr","left": {"type":"identifier","text":"\"first_name\"","name":"first_name" },"operator":"||","right": {"type":"string_literal","text":"' jr.'","value":" jr." }                  }                },"asKw": {"type":"keyword","text":"as","name":"AS" },"alias": {"type":"identifier","text":"fname","name":"fname" }              }            ]          }        },        {"type":"from_clause","fromKw": {"type":"keyword","text":"FROM","name":"FROM" },"expr": {"type":"identifier","text":"persons","name":"persons" },"leading": [            {"type":"newline","text":"\n" },            {"type":"line_comment","text":"-- use important table" },            {"type":"newline","text":"\n" }          ]        }      ]    },    {"type":"empty" }  ],"leading": [    {"type":"block_comment","text":"/* My query */" },    {"type":"newline","text":"\n" }  ]}

Note the following conventions:

All keywords are preserved intype: keyword nodes, which are usuallystored in fields named likesomeNameKw.
Parenthesis is represented by separatetype: paren_expr node.
Comma-separated lists are represented by separatetype: list_expr node.
Trailing semicolon is represented bytype: empty node in the end.
The original source code representation of strings, identifiers, keywords, etcis preserved intext fields.
Each node can haveleading andtrailing fields,which store comments and newlines immediately before or after that node.These fields will also contain information about regular spaces/tabs(e.g.{"type": "space", "text": " \t"}). This has been left out from thisexample for the sake of simplicity.

API

parse(sql: string, options: ParserOptions): Program

Parses SQL string and returns the CST tree. Takes the following options:

dialect:'sqlite' | 'bigquery' | 'mysql' | 'mariadb' | 'postgresql' The SQL dialect to parse(required).
includeRange:boolean When enabled addsrange: [number, number] field to all CST nodes,which contains the start and end locations of the node.
includeComments:boolean When enabled addsleading: Whitespace[] and/ortrailing: Whitespace[]to nodes which are preceded or followed by comments.
includeNewlines:boolean LikeincludeComments, but includes newlines info to the same fields.
includeSpaces:boolean LikeincludeComments, but includes horizontal whitespace info to the same fields.
paramTypes:("?" | "?nr" | "$nr" | ":name" | "$name" | "@name" | "`@name`")[]Determines the types of bound parameters supported by the parser.By default a query likeSELECT * FROM tbl WHERE id = ? will result in parse error.To fix it, useparamTypes: ["?"] config option.
filename:string Name of the SQL file. This is only used for error-reporting.
acceptUnsupportedGrammar:boolean When enabled, code that would otherwise fail to parse,gets parsed asunsupported_grammar_stmt node. That will consume all text until the next semicolon.After the semicolon, parsing will resume as normal. This option is primarily intended as a workaroundfor using the parser with an SQL dialect that's not yet 100% supported.

When parsing fails with syntax error, it throwsFormattedSyntaxError which contains a message like:

Syntax Error: Unexpected "WHERE"Was expecting to see: "!", "$", "(", "-", ":", "?", "@", "CASE", ...--> my_db.sql:2:33  |2 | SELECT * FROM my_table ORDER BY WHERE  |                                 ^

show(cst: Node): string

Converts CST back to string.

Important caveat: the CST has to contain whitespace data, meaning,it was generated withincludeComments,includeNewlines andincludeSpaces options enabled.

For any valid SQL the following assertion will always hold:

constopts={dialect:"sqlite",includeComments:true,includeNewlines:true,includeSpaces:true,};show(parse(sql,opts))===sql;// always true

cstVisitor(map: VisitorMap): (node: Node) => SKIP | void

Generates a function that walks through the whole CST tree and callsa function inmap whenever it encounters a node with that type.

For example the following code checks that all table and column aliasesuse the explicitAS keyword:

constcheckAliases=cstVisitor({alias:(node)=>{if(!node.asKw){thrownewError("All alias definitions must use AS keyword!");}},});checkAliases(cst);

You can returnVisitorAction.SKIP to avoid visiting all child nodes of a specific node:

lettopLevelSelects=0;constcountTopLevelSelects=cstVisitor({select_stmt:(node)=>{topLevelSelects++;returnVisitorAction.SKIP;},});countTopLevelSelects(cst);

cstTransformer<T>(map: TransformMap<T>): (node: Node) => T

Transforms the whole CST into some other typeT. Themap objectshould contain an entry for each of the CST node types it expects toencounter (this generally means all of them).

For example, the following implements atoString() function thatserializes very basic SQL queries likeSELECT 1, 2, 3 + 4:

consttoString=cstTransformer({program:(node)=>node.statements.map(toString).join(";"),select_statement:(node)=>node.clauses.map(toString).join(" "),select_clause:(node)=>"SELECT "+node.columns.map(toString).join(", "),binary_expr:(node)=>toString(node.left)+" "+node.operator+" "+toString(node.right),number_literal:(node)=>node.text,});

The builtinshow() function is implemented as such a transform.

xKeywords: Record<string, boolean>

Additionally the parser exports lists ofreserved keywords for each supported SQL dialect:sqliteKeywords,bigqueryKeywords,mysqlKeywords,mariadbKeywords,postgresqlKeywords.These are simple JavaScript objects, useful for doing lookups:

exportconstsqliteKeywords={ABORT:true,ACTION:true,ADD:true,  ...};

Development

yarn generate will generate parser.

The testsuite contains two kinds of tests:

tests applicable for all dialects
tests applicable for only some specific dialects

When running the testsuite one always needs to pick a dialect.For exampleyarn test:sqlite oryarn test:mysql.Running one of these commands will run the testsuite against the parserof that dialect. It will execute all the generic tests plus testsapplicable for that dialect.

yarn test will execute the testsuite for each supported dialect,covering all the possible combinations.

During development

Start the parser-generator watch process in one terminal:

yarn watch:generate

and the tests watch process in another terminal:

yarn test:sqlite --watch

Note thatyarn test --watch doesn't work.A separate watch process needs to be started manually for each dialect.

Release

Generate new release withyarn publish.

To generate a changelog use theyarn changelog command:

VERSION=v0.27.0 yarn changelog

Acknowledgements

This started as a fork ofnode-sql-parser,which is based on@flora/sql-parser,which in turn was extracted from Alibaba'snquery module.

There's very little left of the original code though.

About

Parses SQL into Concrete Syntax Tree (CST)

Releases26

0.33.1 Latest

Mar 18, 2025

+ 25 releases

Packages

No packages published

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

SQL Parser CST

Features

Install

Usage

AST versus CST-parsers

API

parse(sql: string, options: ParserOptions): Program

show(cst: Node): string

cstVisitor(map: VisitorMap): (node: Node) => SKIP | void

cstTransformer<T>(map: TransformMap<T>): (node: Node) => T

xKeywords: Record<string, boolean>

Development

During development

Release

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases26

Packages

Uh oh!

Contributors2

Uh oh!

Languages

Movatterモバイル変換

License

nene/sql-parser-cst

Folders and files

Latest commit

History

Repository files navigation

SQL Parser CST

Features

Install

Usage

AST versus CST-parsers

API

parse(sql: string, options: ParserOptions): Program

show(cst: Node): string

cstVisitor(map: VisitorMap): (node: Node) => SKIP | void

cstTransformer<T>(map: TransformMap<T>): (node: Node) => T

xKeywords: Record<string, boolean>

Development

During development

Release

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases26

Packages0

Uh oh!

Contributors2

Uh oh!

Languages

Packages