- Notifications
You must be signed in to change notification settings - Fork126
Uses tokenized query returned by python-sqlparse and generates query metadata
License
macbre/sql-metadata
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Uses tokenized query returned bypython-sqlparse
and generates query metadata.
Extracts column names and tables used by the query.Automatically conductcolumn alias resolution,sub queries aliases resolution as well astables aliases resolving.
Provides also a helper fornormalization of SQL queries.
Supported queries syntax:
- MySQL
- PostgreSQL
- Sqlite
- MSSQL
- Apache Hive
(note that listed backends can differ quite substantially but should work in regard of query types supported bysql-metadata
)
You can test the capabilities ofsql-metadata
with an interactive demo:https://sql-app.infocruncher.com/
pip install sql-metadata
fromsql_metadataimportParser# extract raw sql-metadata tokensParser("SELECT * FROM foo").tokens# ['SELECT', '*', 'FROM', 'foo']
fromsql_metadataimportParser# get columns from query - for more examples see `tests/test_getting_columns.py`Parser("SELECT test, id FROM foo, bar").columns# ['test', 'id']Parser("INSERT /* VoteHelper::addVote xxx */ INTO `page_vote` (article_id,user_id,`time`) VALUES ('442001','27574631','20180228130846')").columns# ['article_id', 'user_id', 'time']parser=Parser("SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address")# note that aliases are auto-resolvedparser.columns# ['product_a.*', 'product_a.users.ip_address', 'product_b.users.ip_address']# note that you can also extract columns with their place in the query# which will return dict with lists divided into select, where, order_by, group_by, join, insert and updateparser.columns_dict# {'select': ['product_a.users.*'], 'join': ['product_a.users.ip_address', 'product_b.users.ip_address']}
fromsql_metadataimportParserparser=Parser("SELECT a, (b + c - u) as alias1, custome_func(d) alias2 from aa, bb order by alias1")# note that columns list do not contain aliases of the columnsparser.columns# ["a", "b", "c", "u", "d"]# but you can still extract aliases namesparser.columns_aliases_names# ["alias1", "alias2"]# aliases are resolved to the columns which they refer toparser.columns_aliases# {"alias1": ["b", "c", "u"], "alias2": "d"}# you can also extract aliases used by section of the query in which they are usedparser.columns_aliases_dict# {"order_by": ["alias1"], "select": ["alias1", "alias2"]}# the same applies to aliases used in queries section when you extract columns_dict# here only the alias is used in order by but it's resolved to actual columnsassertparser.columns_dict== {'order_by': ['b','c','u'],'select': ['a','b','c','u','d']}
fromsql_metadataimportParser# get tables from query - for more examples see `tests/test_getting_tables.py`Parser("SELECT a.* FROM product_a.users AS a JOIN product_b.users AS b ON a.ip_address = b.ip_address").tables# ['product_a.users', 'product_b.users']Parser("SELECT test, id FROM foo, bar").tables# ['foo', 'bar']# you can also extract aliases of the tables as a dictionaryparser=Parser("SELECT f.test FROM foo AS f")# get table aliasesparser.tables_aliases# {'f': 'foo'}# note that aliases are auto-resolved for columnsparser.columns# ["foo.test"]
fromsql_metadataimportParserparser=Parser("INSERT /* VoteHelper::addVote xxx */ INTO `page_vote` (article_id,user_id,`time`) ""VALUES ('442001','27574631','20180228130846')")# extract values from queryparser.values# ["442001", "27574631", "20180228130846"]# extract a dictionary with column-value pairsparser.values_dict#{"article_id": "442001", "user_id": "27574631", "time": "20180228130846"}# if column names are not set auto-add placeholdersparser=Parser("INSERT IGNORE INTO `table` VALUES (9, 2.15, '123', '2017-01-01');")parser.values# [9, 2.15, "123", "2017-01-01"]parser.values_dict#{"column_1": 9, "column_2": 2.15, "column_3": "123", "column_4": "2017-01-01"}
fromsql_metadataimportParserParser('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000').limit_and_offset# (50, 1000)Parser('SELECT foo_limit FROM bar_offset limit 2000,50').limit_and_offset# (50, 2000)
fromsql_metadataimportParserparser=Parser("""WITH database1.tableFromWith AS (SELECT aa.* FROM table3 as aa left join table4 on aa.col1=table4.col2), test as (SELECT * from table3)SELECT "xxxxx"FROM database1.tableFromWith aliasLEFT JOIN database2.table2 ON ("tt"."ttt"."fff" = "xx"."xxx")""")# get names/ aliases of with statementsparser.with_names# ["database1.tableFromWith", "test"]# get definition of with queriesparser.with_queries# {"database1.tableFromWith": "SELECT aa.* FROM table3 as aa left join table4 on aa.col1=table4.col2"# "test": "SELECT * from table3"}# note that names of with statements do not appear in tablesparser.tables# ["table3", "table4", "database2.table2"]
fromsql_metadataimportParserparser=Parser("""SELECT COUNT(1) FROM(SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1) aJOIN (SELECT st.task_id FROM some_task st WHERE task_type_id = 80) bON a.task_id = b.task_id;""")# get sub-queries dictionaryparser.subqueries# {"a": "SELECT std.task_id FROM some_task_detail std WHERE std.STATUS = 1",# "b": "SELECT st.task_id FROM some_task st WHERE task_type_id = 80"}# get names/ aliases of sub-queries / derived tablesparser.subqueries_names# ["a", "b"]# note that columns coming from sub-queries are resolved to real columnsparser.columns#["some_task_detail.task_id", "some_task_detail.STATUS", "some_task.task_id",# "task_type_id"]# same applies for columns_dict, note the join columns are resolvedparser.columns_dict#{'join': ['some_task_detail.task_id', 'some_task.task_id'],# 'select': ['some_task_detail.task_id', 'some_task.task_id'],# 'where': ['some_task_detail.STATUS', 'task_type_id']}
Seetests
file for more examples of a bit more complex queries.
fromsql_metadataimportParserparser=Parser('SELECT /* Test */ foo FROM bar WHERE id in (1, 2, 56)')# generalize queryparser.generalize# 'SELECT foo FROM bar WHERE id in (XYZ)'# remove commentsparser.without_comments# 'SELECT foo FROM bar WHERE id in (1, 2, 56)'# extract commentsparser.comments# ['/* Test */']
Seetest/test_normalization.py
file for more examples of a bit more complex queries.
sql_metadata.compat
module has been implemented to make the introduction of sql-metadata v2.0 smoother.
You can use it by simply changing the imports in your code from:
fromsql_metadataimportget_query_columns,get_query_tables
into:
fromsql_metadata.compatimportget_query_columns,get_query_tables
The following functions from the old API are available in thesql_metadata.compat
module:
generalize_sql
get_query_columns
(since #131 columns aliases ARE NOT returned by this function)get_query_limit_and_offset
get_query_tables
get_query_tokens
preprocess_query
Created and maintained by@macbre with a great contributions from@collerek and the others.
- aborecki (https://github.com/aborecki)
- collerek (https://github.com/collerek)
- dylanhogg (https://github.com/dylanhogg)
- macbre (https://github.com/macbre)
About
Uses tokenized query returned by python-sqlparse and generates query metadata