# superior-sql-parser
A SQL parser for multiple database dialects, built on ANTLR4. It extracts metadata from SQL and supports several data-platform scenarios: extracting metadata from DDL statements, SQL permission checks, table-level lineage, and SQL syntax validation. Supported dialects include Spark, Flink, Gauss, StarRocks, Oracle, MySQL, PostgreSQL, SQL Server, and DB2.
```xml
<dependency>
    <groupId>io.github.melin.superior</groupId>
    <artifactId>superior-[spark|presto|mysql|oracle|...]-parser</artifactId>
    <version>4.1.0-SNAPSHOT</version>
</dependency>
```
4.0.x supports JDK 8 (ANTLR 4.9.3); 4.1.x supports JDK 11 (ANTLR 4.13.1).
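For example, to depend only on the Spark dialect parser (this concrete artifact name is an assumption, filled in from the `superior-[dialect]-parser` pattern above):

```xml
<!-- Assumed concrete artifact name, following the pattern above -->
<dependency>
    <groupId>io.github.melin.superior</groupId>
    <artifactId>superior-spark-parser</artifactId>
    <version>4.1.0-SNAPSHOT</version>
</dependency>
```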
To build and deploy a release:

```shell
export GPG_TTY=$(tty)
mvn spotless:apply clean deploy -Prelease
```
Each database dialect provides a Helper class (e.g. `SparkSQLHelper`, `MySQLHelper`) exposing the following methods (usage sketches and examples follow the list):
1. `parseStatement(String sql)`: parse a single complete SQL statement.
   a. DDL: returns detailed DDL information. For example, after a DDL statement has executed, parse it to obtain the relevant metadata and sync it to the metadata store.
   b. DML: returns the tables used by the SQL, for building table-level lineage or for checking table permissions.
2. `parseMultiStatement(String sql)`: parse multiple complete SQL statements, separated by whitespace, newlines, or semicolons.
3. `splitSql(String sql)`: split text containing multiple complete SQL statements into individual statements, separated by whitespace, newlines, or semicolons.
4. `checkSqlSyntax(String sql)`: check whether a single complete SQL statement is syntactically valid.
5. `sqlKeywords()`: return the SQL keywords, mainly for keyword completion in a SQL editor.
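A minimal sketch of the splitting and validation methods, assuming `splitSql` returns a list of statement strings and `checkSqlSyntax` returns a boolean; the return types are assumptions, not documented above:

```kotlin
// Minimal sketch; return types of splitSql/checkSqlSyntax are assumptions.
val script = """
    create table demo.t1 (id int);
    insert into demo.t1 values (1);
""".trimIndent()

// Split the script into individual statements.
val statements = SparkSQLHelper.splitSql(script)
statements.forEach { sql ->
    // Validate each statement before executing it.
    if (!SparkSQLHelper.checkSqlSyntax(sql)) {
        println("invalid sql: $sql")
    }
}

// Keywords for SQL editor autocomplete.
val keywords = SparkSQLHelper.sqlKeywords()
```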
```kotlin
// Spark SQL
val sql = "select bzdys, bzhyyh, bzdy, week, round((bzdy-bzdys)*100/bzdys, 2) " +
    "from (select lag(bzdy) over (order by week) bzdys, bzhyyh, bzdy, week " +
    "from (select count(distinct partner_code) bzhyyh, count(1) bzdy, week from tdl_dt2x_table)) limit 111"

val statement = SparkSQLHelper.parseStatement(sql)
if (statement is QueryStmt) {
    Assert.assertEquals(StatementType.SELECT, statement.statementType)
    Assert.assertEquals(1, statement.inputTables.size)
    Assert.assertEquals("tdl_dt2x_table", statement.inputTables.get(0).tableName)
    Assert.assertEquals(111, statement.limit)
} else {
    Assert.fail()
}
```

```kotlin
// Spark Jar
val sql = """
    set spark.shuffle.compress=true;
    set spark.rdd.compress=true;
    set spark.driver.maxResultSize=3g;
    set spark.serializer=org.apache.spark.serializer.KryoSerializer;
    set spark.kryoserializer.buffer.max=1024m;
    set spark.kryoserializer.buffer=256m;
    set spark.network.timeout=300s;
    examples-jar-with-dependencies.jar imei_test.euSaveHBase gaea_offline:account_mobile sh md shda.interest_radar_mobile_score_dt 20180318 /xiaoyong.fu/sh/mobile/loan 400 '%7B%22job_type%22=' --jar
""".trimIndent()

val statementDatas = JobTaskHelper.parseStatement(sql)
Assert.assertEquals(8, statementDatas.size)
var statementData = statementDatas.get(7)
var statement = statementData.statement
if (statement is JobData) {
    Assert.assertEquals(StatementType.JOB, statement.statementType)
    Assert.assertEquals("examples-jar-with-dependencies.jar", statement.resourceName)
    Assert.assertEquals("imei_test.euSaveHBase", statement.className)
    Assert.assertEquals("/xiaoyong.fu/sh/mobile/loan", statement.params?.get(5))
    Assert.assertEquals("400", statement.params?.get(6))
    Assert.assertEquals("%7B%22job_type%22=", statement.params?.get(7))
    Assert.assertEquals("--jar", statement.params?.get(8))
} else {
    Assert.fail()
}
```

```kotlin
// MySQL
val sql = "insert into bigdata.user select * from users a left outer join address b on a.address_id = b.id"
val statement = MySQLHelper.parseStatement(sql)
if (statement is QueryStmt) {
    Assert.assertEquals(StatementType.INSERT_SELECT, statement.statementType)
    Assert.assertEquals("bigdata", statement.outputTables.get(0).databaseName)
    Assert.assertEquals("user", statement.outputTables.get(0).tableName)
    Assert.assertEquals(2, statement.inputTables.size)
} else {
    Assert.fail()
}
```

```kotlin
// Postgres
val sql = """
    select a.* from datacompute1.datacompute.dc_job a
    left join datacompute1.datacompute.dc_job_scheduler b on a.id = b.job_id
""".trimIndent()
val statement = PostgreSQLHelper.parseStatement(sql)
if (statement is QueryStmt) {
    Assert.assertEquals(StatementType.SELECT, statement.statementType)
    Assert.assertEquals(2, statement.inputTables.size)
} else {
    Assert.fail()
}
```
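The examples above cover the query/lineage use case; for the DDL-metadata use case, here is a minimal sketch. The statement class name (`CreateTable`) and its fields are assumptions about the parser's model, not confirmed by this README:

```kotlin
// Minimal sketch, assuming a CreateTable statement type with
// tableId/columns fields (names are assumptions, not from this README).
val ddl = "create table bigdata.users (id bigint, name string) comment 'user table'"
val stmt = SparkSQLHelper.parseStatement(ddl)
if (stmt is CreateTable) {
    // After the DDL executes, sync this metadata to the metadata store.
    println(stmt.tableId)   // e.g. database and table name
    println(stmt.columns)   // e.g. column names and types
}
```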