Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Parses SQL into Concrete Syntax Tree (CST)

License

NotificationsYou must be signed in to change notification settings

nene/sql-parser-cst

Repository files navigation

SQL Parser which produces aConcrete Syntax Tree (CST).

Unlike a more usual parser which produces an Abstract Syntax Tree (AST),this parser preserves all the syntax elements present in the parsed source code,with the goal of being able to re-create the exact original source code.

Try it live inSQL Explorer.

Features

  • Detailed TypeScript types for the syntax tree
  • Unified syntax tree for multiple SQL dialects
  • Includes source code location data for all nodes
  • Includes comments in the syntax tree
  • Helpful error messages
  • Fast

Supports the following SQL dialects:

  • SQLite - full support (version 3.45)
  • BigQuery - full support (as of 31 January 2024).
  • MySQL - experimental (version 8)(see#7 for implementation progress).
  • MariaDB - experimental (version 10)(see#32 for implementation progress).
  • PostgreSQL - experimental (version 16)(see#40 for implementation progress).

Note: This software is in very active development.The syntax tree structure is mostly stable now,though there are bound to be changes as new SQL dialectsare added and they contain features that need to be accommodatedto the syntax tree.

Install

npm install sql-parser-cst

Usage

import{parse,show,cstVisitor}from"sql-parser-cst";constcst=parse("select * from my_table",{dialect:"sqlite",// These are optional:includeSpaces:true,// Adds spaces/tabsincludeNewlines:true,// Adds newlinesincludeComments:true,// Adds commentsincludeRange:true,// Adds source code location data});// convert all keywords to uppercaseconsttoUpper=cstVisitor({keyword:(kw)=>{kw.text=kw.text.toUpperCase();},});toUpper(cst);// Serialize back to SQLshow(cst);// --> SELECT * FROM my_table

AST versus CST-parsers

For example, given the following SQL:

/* My query*/SELECT ("first_name"||' jr.')as fname-- use important tableFROM persons;

An AST-parser might parse this to the following abstract syntax tree:

{"type":"select_stmt","columns": [    {"type":"alias","expr": {"type":"binary_expr","left": {"type":"column_ref","column":"first_name" },"operator":"||","right": {"type":"string","value":" jr." }      },"alias":"fname"    }  ],"from": [{"type":"table_ref","table":"persons" }]}

Note that the above AST is missing the following information:

  • comments
  • whitespace (e.g. where the newlines are)
  • case of keywords (e.g. whetherAS oras was written)
  • whether an identifier was quoted or not (and with what kind of quotes)
  • whether an expression is wrapped in additional (unnecessary) parenthesis.
  • whether the statement ends with a semicolon.

In contrast, this CST parser produces the following concrete syntax tree,which preserves all of this information:

{"type":"program","statements": [    {"type":"select_stmt","clauses": [        {"type":"select_clause","selectKw": {"type":"keyword","text":"SELECT","name":"SELECT" },"options": [],"columns": {"type":"list_expr","items": [              {"type":"alias","expr": {"type":"paren_expr","expr": {"type":"binary_expr","left": {"type":"identifier","text":"\"first_name\"","name":"first_name" },"operator":"||","right": {"type":"string_literal","text":"' jr.'","value":" jr." }                  }                },"asKw": {"type":"keyword","text":"as","name":"AS" },"alias": {"type":"identifier","text":"fname","name":"fname" }              }            ]          }        },        {"type":"from_clause","fromKw": {"type":"keyword","text":"FROM","name":"FROM" },"expr": {"type":"identifier","text":"persons","name":"persons" },"leading": [            {"type":"newline","text":"\n" },            {"type":"line_comment","text":"-- use important table" },            {"type":"newline","text":"\n" }          ]        }      ]    },    {"type":"empty" }  ],"leading": [    {"type":"block_comment","text":"/* My query */" },    {"type":"newline","text":"\n" }  ]}

Note the following conventions:

  • All keywords are preserved intype: keyword nodes, which are usuallystored in fields named likesomeNameKw.
  • Parenthesis is represented by separatetype: paren_expr node.
  • Comma-separated lists are represented by separatetype: list_expr node.
  • Trailing semicolon is represented bytype: empty node in the end.
  • The original source code representation of strings, identifiers, keywords, etcis preserved intext fields.
  • Each node can haveleading andtrailing fields,which store comments and newlines immediately before or after that node.These fields will also contain information about regular spaces/tabs(e.g.{"type": "space", "text": " \t"}). This has been left out from thisexample for the sake of simplicity.

API

parse(sql: string, options: ParserOptions): Program

Parses SQL string and returns the CST tree. Takes the following options:

  • dialect:'sqlite' | 'bigquery' | 'mysql' | 'mariadb' | 'postgresql' The SQL dialect to parse(required).
  • includeRange:boolean When enabled addsrange: [number, number] field to all CST nodes,which contains the start and end locations of the node.
  • includeComments:boolean When enabled addsleading: Whitespace[] and/ortrailing: Whitespace[]to nodes which are preceded or followed by comments.
  • includeNewlines:boolean LikeincludeComments, but includes newlines info to the same fields.
  • includeSpaces:boolean LikeincludeComments, but includes horizontal whitespace info to the same fields.
  • paramTypes:("?" | "?nr" | "$nr" | ":name" | "$name" | "@name" | "`@name`")[]Determines the types of bound parameters supported by the parser.By default a query likeSELECT * FROM tbl WHERE id = ? will result in parse error.To fix it, useparamTypes: ["?"] config option.
  • filename:string Name of the SQL file. This is only used for error-reporting.
  • acceptUnsupportedGrammar:boolean When enabled, code that would otherwise fail to parse,gets parsed asunsupported_grammar_stmt node. That will consume all text until the next semicolon.After the semicolon, parsing will resume as normal. This option is primarily intended as a workaroundfor using the parser with an SQL dialect that's not yet 100% supported.

When parsing fails with syntax error, it throwsFormattedSyntaxError which contains a message like:

Syntax Error: Unexpected "WHERE"Was expecting to see: "!", "$", "(", "-", ":", "?", "@", "CASE", ...--> my_db.sql:2:33  |2 | SELECT * FROM my_table ORDER BY WHERE  |                                 ^

show(cst: Node): string

Converts CST back to string.

Important caveat: the CST has to contain whitespace data, meaning,it was generated withincludeComments,includeNewlines andincludeSpaces options enabled.

For any valid SQL the following assertion will always hold:

constopts={dialect:"sqlite",includeComments:true,includeNewlines:true,includeSpaces:true,};show(parse(sql,opts))===sql;// always true

cstVisitor(map: VisitorMap): (node: Node) => SKIP | void

Generates a function that walks through the whole CST tree and callsa function inmap whenever it encounters a node with that type.

For example the following code checks that all table and column aliasesuse the explicitAS keyword:

constcheckAliases=cstVisitor({alias:(node)=>{if(!node.asKw){thrownewError("All alias definitions must use AS keyword!");}},});checkAliases(cst);

You can returnVisitorAction.SKIP to avoid visiting all child nodes of a specific node:

lettopLevelSelects=0;constcountTopLevelSelects=cstVisitor({select_stmt:(node)=>{topLevelSelects++;returnVisitorAction.SKIP;},});countTopLevelSelects(cst);

cstTransformer<T>(map: TransformMap<T>): (node: Node) => T

Transforms the whole CST into some other typeT. Themap objectshould contain an entry for each of the CST node types it expects toencounter (this generally means all of them).

For example, the following implements atoString() function thatserializes very basic SQL queries likeSELECT 1, 2, 3 + 4:

consttoString=cstTransformer({program:(node)=>node.statements.map(toString).join(";"),select_statement:(node)=>node.clauses.map(toString).join(" "),select_clause:(node)=>"SELECT "+node.columns.map(toString).join(", "),binary_expr:(node)=>toString(node.left)+" "+node.operator+" "+toString(node.right),number_literal:(node)=>node.text,});

The builtinshow() function is implemented as such a transform.

xKeywords: Record<string, boolean>

Additionally the parser exports lists ofreserved keywords for each supported SQL dialect:sqliteKeywords,bigqueryKeywords,mysqlKeywords,mariadbKeywords,postgresqlKeywords.These are simple JavaScript objects, useful for doing lookups:

exportconstsqliteKeywords={ABORT:true,ACTION:true,ADD:true,  ...};

Development

yarn generate will generate parser.

The testsuite contains two kinds of tests:

  • tests applicable for all dialects
  • tests applicable for only some specific dialects

When running the testsuite one always needs to pick a dialect.For exampleyarn test:sqlite oryarn test:mysql.Running one of these commands will run the testsuite against the parserof that dialect. It will execute all the generic tests plus testsapplicable for that dialect.

yarn test will execute the testsuite for each supported dialect,covering all the possible combinations.

During development

Start the parser-generator watch process in one terminal:

yarn watch:generate

and the tests watch process in another terminal:

yarn test:sqlite --watch

Note thatyarn test --watch doesn't work.A separate watch process needs to be started manually for each dialect.

Release

Generate new release withyarn publish.

To generate a changelog use theyarn changelog command:

VERSION=v0.27.0 yarn changelog

Acknowledgements

This started as a fork ofnode-sql-parser,which is based on@flora/sql-parser,which in turn was extracted from Alibaba'snquery module.

There's very little left of the original code though.

About

Parses SQL into Concrete Syntax Tree (CST)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors2

  •  
  •  

[8]ページ先頭

©2009-2025 Movatter.jp