|
| 1 | +importsqlalchemy |
| 2 | +fromsqlalchemy.ext.compilerimportcompiles |
| 3 | + |
| 4 | +fromtypingimportUnion |
| 5 | + |
| 6 | +fromdatetimeimportdatetime,time |
| 7 | + |
| 8 | + |
| 9 | +fromdatabricks.sql.utilsimportParamEscaper |
| 10 | + |
| 11 | + |
@compiles(sqlalchemy.types.Enum, "databricks")
@compiles(sqlalchemy.types.String, "databricks")
@compiles(sqlalchemy.types.Text, "databricks")
@compiles(sqlalchemy.types.Time, "databricks")
@compiles(sqlalchemy.types.Unicode, "databricks")
@compiles(sqlalchemy.types.UnicodeText, "databricks")
@compiles(sqlalchemy.types.Uuid, "databricks")
def compile_string_databricks(type_, compiler, **kw):
    """Render every string-like SQLAlchemy type as Databricks STRING.

    We override the default compilation of these types because SQLAlchemy's
    stock renderings are incompatible / abnormal for Databricks SQL:

        Enum        -> VARCHAR
        String      -> VARCHAR[LENGTH]
        Text        -> VARCHAR[LENGTH]
        Time        -> TIME
        Unicode     -> VARCHAR[LENGTH]
        UnicodeText -> TEXT
        Uuid        -> CHAR[32]

    All of the above compile to STRING in Databricks SQL.
    """
    return "STRING"
| 35 | + |
| 36 | + |
@compiles(sqlalchemy.types.Integer, "databricks")
def compile_integer_databricks(type_, compiler, **kw):
    """Override the default Integer rendering: Databricks spells it "INT", not "INTEGER"."""
    return "INT"
| 43 | + |
| 44 | + |
@compiles(sqlalchemy.types.LargeBinary, "databricks")
def compile_binary_databricks(type_, compiler, **kw):
    """Override the default LargeBinary rendering: Databricks spells it "BINARY", not "BLOB"."""
    return "BINARY"
| 51 | + |
| 52 | + |
@compiles(sqlalchemy.types.Numeric, "databricks")
def compile_numeric_databricks(type_, compiler, **kw):
    """Render Numeric as DECIMAL, which is Databricks' name for this type.

    Rather than hard-coding a string, we delegate to the compiler's built-in
    visit_DECIMAL, which already renders precision and scale correctly.
    """
    return compiler.visit_DECIMAL(type_, **kw)
| 62 | + |
| 63 | + |
@compiles(sqlalchemy.types.DateTime, "databricks")
def compile_datetime_databricks(type_, compiler, **kw):
    """Render DateTime as "TIMESTAMP_NTZ".

    Databricks has no "DATETIME" type. A SQLAlchemy DateTime is timezone-naive
    by default, so it maps to Databricks' timezone-naive TIMESTAMP_NTZ.
    (The previous docstring claimed "TIMESTAMP", which did not match the value
    actually returned here.)
    """
    return "TIMESTAMP_NTZ"
| 70 | + |
| 71 | + |
@compiles(sqlalchemy.types.ARRAY, "databricks")
def compile_array_databricks(type_, compiler, **kw):
    """Render ARRAY types as Databricks' ARRAY<inner> syntax.

    SQLAlchemy's default ARRAY cannot compile at all since it is only
    implemented for PostgreSQL; the PostgreSQL approach works for Databricks
    SQL, so it is duplicated here.

    :type_:
        An instance of sqlalchemy.types.ARRAY, which always carries an
        ``item_type`` attribute that is itself a TypeEngine instance.

    https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.ARRAY
    """
    element_type = compiler.process(type_.item_type, **kw)
    return f"ARRAY<{element_type}>"
| 88 | + |
| 89 | + |
class DatabricksDateTimeNoTimezoneType(sqlalchemy.types.TypeDecorator):
    """DateTime decorator that strips the timezone from TIMESTAMP_NTZ results.

    The value pysql produces for the contents of a TIMESTAMP_NTZ column
    carries an 'Etc/UTC' timezone, but SQLAlchemy's test suite assumes
    sqlalchemy.types.DateTime yields a datetime.datetime _without_ any
    timezone set, so we strip it off here.

    It's not clear if DBR sends a timezone to pysql or if pysql is adding it.
    This could be a bug.
    """

    impl = sqlalchemy.types.DateTime

    cache_ok = True

    def process_result_value(self, value: Union[None, datetime], dialect):
        # Drop tzinfo so callers get a naive datetime, per DateTime's contract.
        return None if value is None else value.replace(tzinfo=None)
| 107 | + |
| 108 | + |
class DatabricksTimeType(sqlalchemy.types.TypeDecorator):
    """Databricks has no native TIME type, so values are stored as strings."""

    impl = sqlalchemy.types.Time
    cache_ok = True

    TIME_WITH_MICROSECONDS_FMT = "%H:%M:%S.%f"
    TIME_NO_MICROSECONDS_FMT = "%H:%M:%S"

    def process_bind_param(self, value: Union[time, None], dialect) -> Union[None, str]:
        """Convert outbound values to %H:%M:%S.%f strings (None passes through)."""
        if value is None:
            return None
        return value.strftime(self.TIME_WITH_MICROSECONDS_FMT)

    # mypy dislikes this workaround: TypeEngine wants process_literal_param -> str
    def process_literal_param(self, value, dialect) -> time:  # type: ignore
        """Pass the datetime.time value through unmodified.

        It's not clear why this override is necessary. Without it, SQLAlchemy's
        TimeTest:test_literal fails because the string-literal renderer receives
        a str() object and calls .isoformat() on it; with it, this method gets a
        datetime.time() that the same renderer handles fine.

        UPDATE: after coping with the literal_processor override in
        DatabricksStringType, a similar mechanism is probably at play — two
        different processors run in sequence, likely a byproduct of Databricks
        lacking a true TIME type. As long as the tests pass, this is acceptable.
        """
        return value

    def process_result_value(
        self, value: Union[None, str], dialect
    ) -> Union[time, None]:
        """Parse inbound strings from the database into datetime.time() objects."""
        if value is None:
            return None

        try:
            parsed = datetime.strptime(value, self.TIME_WITH_MICROSECONDS_FMT)
        except ValueError:
            # The string had no microseconds component; retry without it.
            parsed = datetime.strptime(value, self.TIME_NO_MICROSECONDS_FMT)

        return parsed.time()
| 154 | + |
| 155 | + |
class DatabricksStringType(sqlalchemy.types.TypeDecorator):
    """String type with Databricks-style (backslash) literal escaping.

    SQLAlchemy's default String implementation escapes single-quotes by
    doubling them, but Databricks escapes literal strings with a backslash,
    so the default escaping breaks Databricks SQL.
    """

    impl = sqlalchemy.types.String
    cache_ok = True
    pe = ParamEscaper()

    def process_literal_param(self, value, dialect) -> str:
        """Escape the value with ParamEscaper.

        SQLAlchemy's default string escaping for backslashes doesn't work for
        Databricks; this reproduces our legacy inline escaping logic.
        """
        return self.pe.escape_string(value)

    def literal_processor(self, dialect):
        """Manually override to stop any processing of the literal beyond
        what process_literal_param() already does.

        The SQLAlchemy docs _specifically_ say not to override this method.

        Any processing done by TypeEngine.process_literal_param happens
        _before_ and _in addition to_ the impl.literal_processor() of the
        class. String.literal_processor() doubles every single-quote in the
        string, which raises a syntax error in Databricks and is unnecessary
        because ParamEscaper() already performs all required escaping.

        We should consider opening an issue on the SQLAlchemy project to see
        if we're using it wrong.

        See type_api.py::TypeEngine.literal_processor:

        ```python
        def process(value: Any) -> str:
            return fixed_impl_processor(
                fixed_process_literal_param(value, dialect)
            )
        ```

        That call to fixed_impl_processor wraps the result of
        fixed_process_literal_param (the process_literal_param defined in our
        Databricks dialect).

        https://docs.sqlalchemy.org/en/20/core/custom_types.html#sqlalchemy.types.TypeDecorator.literal_processor
        """

        def process(value):
            # Copy of the default String.literal_processor(), minus its
            # single-quote doubling behaviour.
            escaped = self.process_literal_param(value, dialect="databricks")
            if dialect.identifier_preparer._double_percents:
                escaped = escaped.replace("%", "%%")
            return "%s" % escaped

        return process