Extensible SQL Lexer and Parser for Rust

apache/datafusion-sqlparser-rs

This crate contains a lexer and parser for SQL that conforms with the ANSI/ISO SQL standard and supports other dialects. This crate is used as a foundation for SQL query engines, vendor-specific parsers, and various SQL analysis tools.

Example

To parse a simple SELECT statement:

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

// The space before each trailing backslash matters: `\` at the end of a line
// skips the newline and the next line's leading whitespace.
let sql = "SELECT a, b, 123, myfunc(b) \
           FROM table_1 \
           WHERE a > b AND b < 100 \
           ORDER BY a DESC, b";

let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...

let ast = Parser::parse_sql(&dialect, sql).unwrap();

println!("AST: {:?}", ast);

This outputs

AST:[Query(Query{ctes:[],body:Select(Select{distinct:false,projection:[UnnamedExpr(Identifier("a")),UnnamedExpr(Identifier("b")),UnnamedExpr(Value(Long(123))),UnnamedExpr(Function(Function{name:ObjectName([Identifier(Ident{value:"myfunc",quote_style:None})]),args:[Identifier("b")],filter:None,over:None,distinct:false}))],from:[TableWithJoins{relation:Table{name:ObjectName([Identifier(Ident{value:"table_1",quote_style:None})]),alias:None,args:[],with_hints:[]},joins:[]}],selection:Some(BinaryOp{left:BinaryOp{left:Identifier("a"),op:Gt,right:Identifier("b")},op:And,right:BinaryOp{left:Identifier("b"),op:Lt,right:Value(Long(100))}}),group_by:[],having:None}),order_by:[OrderByExpr{expr:Identifier("a"),asc:Some(false)},OrderByExpr{expr:Identifier("b"),asc:None}],limit:None,offset:None,fetch:None})]

Features

The following optional crate features are available:

  • serde: Adds Serde support by implementing Serialize and Deserialize for all AST nodes.
  • visitor: Adds a Visitor capable of recursively walking the AST tree.
  • recursive-protection (enabled by default): Uses the recursive crate for stack overflow protection.

Syntax vs Semantics

This crate provides only a syntax parser. It tries to avoid applying any SQL semantics, and it accepts queries that specific databases would reject, even when using that database's specific Dialect. For example, CREATE TABLE(x int, x int) is accepted by this crate, even though most SQL engines will reject this statement due to the repeated column name x.

This crate avoids semantic analysis because it varies drastically between dialects and implementations. If you want to do semantic analysis, feel free to use this project as a base.

Preserves Syntax Round Trip

This crate allows users to recover the original SQL text (with comments removed, and with normalized whitespace and keyword capitalization), which is useful for tools that analyze and manipulate SQL.

This means that, other than comments, whitespace and the capitalization of keywords, the following should hold true for all SQL:

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

// Parse SQL
let sql = "SELECT 'hello'";
let ast = Parser::parse_sql(&GenericDialect, sql).unwrap();

// The original SQL text can be generated from the AST
assert_eq!(ast[0].to_string(), sql);

// The SQL can also be pretty-printed with newlines and indentation
assert_eq!(format!("{:#}", ast[0]), "SELECT\n  'hello'");

There are still some cases in this crate where different SQL with seemingly similar semantics is represented by the same AST. We welcome PRs to fix such issues and distinguish different syntaxes in the AST.

Source Locations (Work in Progress)

This crate allows recovering source locations from AST nodes via the Spanned trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress and many nodes report missing or inaccurate spans. Please see this ticket for information on how to contribute missing improvements.

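use sqlparser::ast::Spanned;
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;
use sqlparser::tokenizer::{Location, Span};

// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

// The source span can be retrieved with start and end locations
assert_eq!(ast[0].span(), Span {
  start: Location::of(1, 1),
  end: Location::of(1, 16),
});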

SQL compliance

SQL was first standardized in 1987, and revisions of the standard have been published regularly since. Most revisions have added significant new features to the language, and as a result no database claims to support the full breadth of features. This parser currently supports most of the SQL-92 syntax, plus some syntax from newer versions that have been explicitly requested, plus various other dialect-specific syntax. Whenever possible, the online SQL:2016 grammar is used to guide what syntax to accept.

Unfortunately, stating anything more specific about compliance is difficult. There is no publicly available test suite that can assess compliance automatically, and doing so manually would strain the project's limited resources. Still, we are interested in eventually supporting the full SQL dialect, and we are slowly building out our own test suite.

If you are assessing whether this project will be suitable for your needs, you'll likely need to experimentally verify whether it supports the subset of SQL that you need. Please file issues about any unsupported queries that you discover. Doing so helps us prioritize support for the portions of the standard that are actually used. Note that if you urgently need support for a feature, you will likely need to write the implementation yourself. See the Contributing section for details.

Command line

This crate contains a CLI program that can parse a file and dump the results as JSON:

$ cargo run --features json_example --example cli FILENAME.sql [--dialectname]

Users

This parser is currently being used by the DataFusion query engine, LocustDB, Ballista, GlueSQL, Opteryx, Polars, PRQL, Qrlew, JumpWire, ParadeDB, CipherStash Proxy, and GreptimeDB.

If your project is using sqlparser-rs, feel free to make a PR to add it to this list.

Design

The core expression parser uses the Pratt Parser design, which is a top-down operator-precedence (TDOP) parser, while the surrounding SQL statement parser is a traditional, hand-written recursive descent parser. Eli Bendersky has a good tutorial on TDOP parsers, if you are interested in learning more about the technique.
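To make the technique concrete, here is a self-contained Pratt parser for integer arithmetic that returns a fully parenthesized string. It is an illustration of TDOP parsing in general, not sqlparser's code; in sqlparser the analogous roles are played, roughly, by Parser::parse_prefix, parse_infix and get_next_precedence. The token type, binding-power values and function names are all hypothetical.

```rust
/// Tokens for a tiny arithmetic language (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Num(i64), Op(char), LParen, RParen }

fn lex(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n = 0i64;
            while let Some(v) = chars.peek().and_then(|d| d.to_digit(10)) {
                n = n * 10 + v as i64;
                chars.next();
            }
            out.push(Tok::Num(n));
        } else {
            chars.next();
            match c {
                '(' => out.push(Tok::LParen),
                ')' => out.push(Tok::RParen),
                '+' | '-' | '*' | '/' => out.push(Tok::Op(c)),
                _ => {} // skip whitespace and anything unrecognized
            }
        }
    }
    out
}

/// Binding power of an infix operator: higher binds tighter.
fn infix_bp(op: char) -> u8 {
    match op { '+' | '-' => 10, '*' | '/' => 20, _ => 0 }
}

struct Pratt { toks: Vec<Tok>, pos: usize }

impl Pratt {
    fn peek(&self) -> Option<Tok> { self.toks.get(self.pos).copied() }
    fn advance(&mut self) -> Option<Tok> { let t = self.peek(); self.pos += 1; t }

    /// Parses an expression made of operators that bind tighter than `min_bp`.
    fn expr(&mut self, min_bp: u8) -> String {
        // Prefix position: a number or a parenthesized sub-expression.
        let mut lhs = match self.advance() {
            Some(Tok::Num(n)) => n.to_string(),
            Some(Tok::LParen) => {
                let inner = self.expr(0);
                assert_eq!(self.advance(), Some(Tok::RParen), "expected ')'");
                inner
            }
            t => panic!("unexpected token {:?}", t),
        };
        // Infix loop: fold operators while they bind tighter than min_bp.
        while let Some(Tok::Op(op)) = self.peek() {
            let bp = infix_bp(op);
            if bp <= min_bp { break; }
            self.advance();
            // Recursing with `bp` makes equal-precedence operators left-associative.
            let rhs = self.expr(bp);
            lhs = format!("({} {} {})", lhs, op, rhs);
        }
        lhs
    }
}

fn parse(src: &str) -> String {
    Pratt { toks: lex(src), pos: 0 }.expr(0)
}
```

For example, parse("1 + 2 * 3") gives "(1 + (2 * 3))" and parse("1 - 2 - 3") gives "((1 - 2) - 3)". Adding an operator is a one-line change to infix_bp, which is a large part of why this style is easy to extend per dialect.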

We favor this design pattern over parser generators for the following reasons:

  • Code is simple to write and can be concise and elegant
  • Performance is generally better than code generated by parser generators
  • Debugging is much easier with hand-written code
  • It is far easier to extend and make dialect-specific extensions compared to using a parser generator

Supporting custom SQL dialects

This is a work in progress, but we have some notes on writing a custom SQL parser.

Contributing

Contributions are highly encouraged! However, the bandwidth we have to maintain this crate is limited. Please read the following sections carefully.

New Syntax

The most commonly accepted PRs add support for, or fix a bug in, a feature of the SQL standard or of a popular RDBMS such as Microsoft SQL Server or PostgreSQL. Such PRs will likely be accepted after a brief review. Any SQL feature that is dialect specific should be parsed by both the relevant Dialect and GenericDialect.

Major API Changes

The current maintainers do not plan for any substantial changes to this crate's API. PRs proposing major refactors are not likely to be accepted.

Testing

While we hope to review PRs in a reasonably timely fashion, it may take a week or more. In order to speed the process, please make sure the PR passes all CI checks, and includes tests demonstrating your code works as intended (and to avoid regressions). Remember to also test error paths.

PRs without tests will not be reviewed or merged. Since the CI ensures that cargo test, cargo fmt, and cargo clippy pass, you should run all three commands locally before submitting your PR.

Filing Issues

If you are unable to submit a patch, feel free to file an issue instead. Please try to include:

  • some representative examples of the syntax you wish to support or fix;
  • the relevant bits of the SQL grammar, if the syntax is part of SQL:2016; and
  • links to documentation for the feature for a few of the most popular databases that support it.

Unfortunately, if you need support for a feature, you will likely need to implement it yourself, or file a well enough described ticket that another member of the community can do so. Our goal as maintainers is to facilitate the integration of various features from various contributors, but not to provide the implementations ourselves, as we simply don't have the resources.

Benchmarking

There are several micro benchmarks in the sqlparser_bench directory. You can run them with:

git checkout main
cd sqlparser_bench
cargo bench -- --save-baseline main
git checkout <your branch>
cargo bench -- --baseline main

By adding --save-baseline main and --baseline main, you can track the progress of your improvements as you continue working on the feature branch.

Licensing

All code in this repository is licensed under the Apache Software License 2.0.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.
