Commit 3eaaac9

Author: Jesse

Add SQLAlchemy Dialect (#57)

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>

1 parent e3f3a6f · commit 3eaaac9

File tree

13 files changed: +1216 −87 lines

‎README.md‎

Lines changed: 17 additions & 6 deletions

@@ -3,7 +3,7 @@

[![PyPI](https://img.shields.io/pypi/v/databricks-sql-connector?style=flat-square)](https://pypi.org/project/databricks-sql-connector/)
[![Downloads](https://pepy.tech/badge/databricks-sql-connector)](https://pepy.tech/project/databricks-sql-connector)

The introductory paragraph now reads:

The Databricks SQL Connector for Python allows you to develop Python applications that connect to Databricks clusters and SQL warehouses. It is a Thrift-based client with no dependencies on ODBC or JDBC. It conforms to the [Python DB API 2.0 specification](https://www.python.org/dev/peps/pep-0249/) and exposes a [SQLAlchemy](https://www.sqlalchemy.org/) dialect for use with tools like `pandas` and `alembic`, which use SQLAlchemy to execute DDL.

This connector uses Arrow as the data-exchange format, and supports APIs to directly fetch Arrow tables. Arrow tables are wrapped in the `ArrowQueue` class to provide a natural API to get several rows at a time.

@@ -24,16 +24,27 @@

Install the library with `pip install databricks-sql-connector`

The usage example was replaced with:

Note: Don't hard-code authentication secrets into your Python code. Use environment variables:

```bash
export DATABRICKS_HOST=********.databricks.com
export DATABRICKS_HTTP_PATH=/sql/1.0/endpoints/****************
export DATABRICKS_TOKEN=dapi********************************
```

Example usage:

```python
import os
from databricks import sql

host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")

connection = sql.connect(
    server_hostname=host,
    http_path=http_path,
    access_token=access_token)

cursor = connection.cursor()
```
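The new SQLAlchemy dialect added by this commit consumes the same credentials via an engine URL rather than keyword arguments. A minimal sketch of assembling that URL, using hypothetical placeholder values in place of real credentials (the `main`/`default` catalog and schema names are illustrative assumptions, not values from this commit):

```python
# Hypothetical placeholder values; in practice these would come from
# environment variables, e.g. os.getenv("DATABRICKS_HOST").
host = "********.databricks.com"
http_path = "/sql/1.0/endpoints/****************"
access_token = "dapi********************************"
catalog = "main"      # assumed example catalog
schema = "default"    # assumed example schema

# The dialect's URL format: databricks://token:<token>@<host>?<options>
url = (
    f"databricks://token:{access_token}@{host}"
    f"?http_path={http_path}&catalog={catalog}&schema={schema}"
)

print(url)
```

This string is what `sqlalchemy.create_engine` receives in the `examples/sqlalchemy.py` script included below.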

‎examples/README.md‎

Lines changed: 2 additions & 1 deletion

@@ -36,4 +36,5 @@

- **`persistent_oauth.py`** shows a more advanced example of authenticating by OAuth while Bring Your Own IDP is in public preview. In this case, it shows how to use a subclass of `OAuthPersistence` to reuse an OAuth token across script executions.
- **`set_user_agent.py`** shows how to customize the user agent header used for Thrift commands. In this example the string `ExamplePartnerTag` will be added to the user agent on every request.
- **`staging_ingestion.py`** shows how the connector handles Databricks' experimental staging ingestion commands `GET`, `PUT`, and `REMOVE`.
- **`sqlalchemy.py`** (new) shows a basic example of connecting to Databricks with [SQLAlchemy](https://www.sqlalchemy.org/).

‎examples/sqlalchemy.py‎

Lines changed: 92 additions & 0 deletions

```python
"""
databricks-sql-connector includes a SQLAlchemy dialect compatible with Databricks SQL.
It aims to be a drop-in replacement for the crflynn/sqlalchemy-databricks project that implements
more of the Databricks API, particularly around table reflection, Alembic usage, and data
ingestion with pandas.

Because of the extent of SQLAlchemy's capabilities it isn't feasible to provide examples of every
usage in a single script, so we only provide a basic one here. More examples can be found in our
test suite at tests/e2e/sqlalchemy/test_basic.py and in the PR that implements this change:

https://github.com/databricks/databricks-sql-python/pull/57

# What's already supported

Most of the functionality is demonstrated in the e2e tests mentioned above. The list below is
derived from those test method names:

- Create and drop tables with SQLAlchemy Core
- Create and drop tables with SQLAlchemy ORM
- Read created tables via reflection
- Modify column nullability
- Insert records manually
- Insert records with pandas.to_sql (note that this does not work for DataFrames with indexes)

This connector also aims to support Alembic for programmatic delta table schema maintenance. This
behaviour is not yet backed by integration tests, which will follow in a subsequent PR as we learn
more about customer use cases there. That said, the following behaviours have been tested manually:

- Autogenerate revisions with `alembic revision --autogenerate`
- Upgrade and downgrade between revisions with `alembic upgrade <revision hash>` and
  `alembic downgrade <revision hash>`

# Known Gaps

- MAP, ARRAY, and STRUCT types: this dialect can read these types out as strings, but you cannot
  define a SQLAlchemy model with databricks.sqlalchemy.dialect.types.DatabricksMap (e.g.) because
  we haven't implemented them yet.
- Constraints: with the addition of information_schema to Unity Catalog, Databricks SQL supports
  foreign key and primary key constraints. This dialect can write these constraints, but the
  ability for alembic to reflect and modify them programmatically has not been tested.
"""

import os
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy import Column, String, Integer, BOOLEAN, create_engine, select

host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")


# Extra arguments are passed untouched to the driver.
# See thrift_backend.py for the complete list.
extra_connect_args = {
    "_tls_verify_hostname": True,
    "_user_agent_entry": "PySQL Example Script",
}

engine = create_engine(
    f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}",
    connect_args=extra_connect_args,
)
session = Session(bind=engine)
base = declarative_base(bind=engine)


class SampleObject(base):

    __tablename__ = "mySampleTable"

    name = Column(String(255), primary_key=True)
    episodes = Column(Integer)
    some_bool = Column(BOOLEAN)


base.metadata.create_all()

sample_object_1 = SampleObject(name="Bim Adewunmi", episodes=6, some_bool=True)
sample_object_2 = SampleObject(name="Miki Meek", episodes=12, some_bool=False)

session.add(sample_object_1)
session.add(sample_object_2)

session.commit()

stmt = select(SampleObject).where(SampleObject.name.in_(["Bim Adewunmi", "Miki Meek"]))

output = [i for i in session.scalars(stmt)]
assert len(output) == 2

base.metadata.drop_all()
```
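The docstring above notes that `alembic revision --autogenerate` and `alembic upgrade`/`alembic downgrade` were tested manually against this dialect. A minimal sketch of the `alembic.ini` fragment such a setup might use, with the same URL format as the engine above; all values shown are hypothetical placeholders, and the `script_location` is an illustrative assumption, not part of this commit:

```ini
; alembic.ini (fragment) -- hypothetical values, for illustration only
[alembic]
script_location = alembic

; Same databricks:// URL format passed to create_engine in the example above
sqlalchemy.url = databricks://token:dapi********@********.databricks.com?http_path=/sql/1.0/endpoints/****&catalog=main&schema=default
```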

0 commit comments