SQLAlchemy 2: Stop skipping all type tests #242

Merged
susodapop merged 25 commits into main from sqlalchemy-test-evaluation
Oct 10, 2023
8c05fd8
Rename types.py →_types.py
Oct 6, 2023
6815935
Mark BINARY sqlalchemy tests as skip because connector doesn't suppor…
Oct 6, 2023
75368a5
Stop skipping DateHistoric tests
Oct 6, 2023
da9b9f2
Stop skipping DateTest tests
Oct 6, 2023
b7afd1b
Stop skipping DateTimeHistoric tests
Oct 7, 2023
f4a59f5
Stop skipping DateTimeTest
Oct 7, 2023
c817509
Stop skipping TimeTest
Oct 7, 2023
d3a76a4
Stop skipping DateTimeCoercedToDateTimeTest
Oct 7, 2023
d4afb60
Stop skipping TimestampMicrosecondsTest
Oct 7, 2023
ab8e28f
Stop skipping DateTimeMicrosecondsTest
Oct 7, 2023
af1a49e
Stop skipping StringTest
Oct 8, 2023
53eef60
Stop skipping TextTest
Oct 8, 2023
a56b083
Stop skipping TimeMicrosecondsTest
Oct 8, 2023
aa1cae3
Stop skipping NumericTest
Oct 8, 2023
4671ff9
Stop skipping Boolean test
Oct 8, 2023
77221c2
Fix: must add array_type exclusion to this file else the entire test …
Oct 8, 2023
bc0495d
Black all files changed in this PR
Oct 8, 2023
e233058
Update test running instructions
Oct 8, 2023
214523b
Remove outdated comment now that we better understand how to use this…
Oct 8, 2023
80ae0dd
Add marks for the type tests that we've reviewed. This way we can re-run
Oct 9, 2023
b31eef4
(1/x) Remove .closed() exclusions and replace with explicit skips
Oct 10, 2023
c8332da
Add an open exclusion to allow test_many_significant_digits to run.
Oct 10, 2023
8200e23
(2/x) Remove .closed() exclusions and replace with explicit skips
Oct 10, 2023
12d081b
Fix linting
Oct 10, 2023
0d5be1c
Fix black linting
Oct 10, 2023
11 changes: 5 additions & 6 deletions CONTRIBUTING.md
@@ -148,16 +148,15 @@ The suites marked `[not documented]` require additional configuration which will

SQLAlchemy provides reusable tests for testing dialect implementations.

To run these tests, assuming the environment variables needed for e2e tests are set, do the following:

 ```
-cd src/databricks/sqlalchemy
-poetry run python -m pytest test/sqlalchemy_dialect_compliance.py --dburi \
+poetry shell
+cd src/databricks/sqlalchemy/test
+python -m pytest test_suite.py --dburi \
 "databricks://token:$access_token@$host?http_path=$http_path&catalog=$catalog&schema=$schema"
 ```

-Some of these of these tests fail currently. We're working on getting
-relavent tests passing and others skipped.
+Some of these tests fail currently. We're working on getting relevant tests passing and others skipped. The tests that we've already reviewed and verified
+are decorated with a pytest marker called `reviewed`. To only run these tests and check for regressions, you can add `-m reviewed` to the invocation command above.

### Code formatting

9 changes: 7 additions & 2 deletions src/databricks/sqlalchemy/__init__.py
@@ -13,7 +13,7 @@
from databricks import sql

# This import is required to process our @compiles decorators
-import databricks.sqlalchemy.types
+import databricks.sqlalchemy._types as dialect_type_impl


from databricks.sqlalchemy.base import (
@@ -48,6 +48,12 @@ class DatabricksDialect(default.DefaultDialect):
non_native_boolean_check_constraint: bool = False
paramstyle: str = "named"

colspecs = {
sqlalchemy.types.DateTime: dialect_type_impl.DatabricksDateTimeNoTimezoneType,
sqlalchemy.types.Time: dialect_type_impl.DatabricksTimeType,
sqlalchemy.types.String: dialect_type_impl.DatabricksStringType,
}

@classmethod
def dbapi(cls):
return sql
@@ -130,7 +136,6 @@ def get_columns(self, connection, table_name, schema=None, **kwargs):
columns = []

for col in resp:

# Taken from PyHive. This removes added type info from decimals and maps
_col_type = re.search(r"^\w+", col.TYPE_NAME).group(0)
this_column = {
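The `re.search(r"^\w+", ...)` call above can be exercised on its own. A stdlib-only sketch (the helper name `base_type` is hypothetical, introduced here for illustration):

```python
import re

def base_type(type_name: str) -> str:
    # Strip parameterization from a type name, e.g. "DECIMAL(10,2)" -> "DECIMAL",
    # mirroring the re.search(r"^\w+", col.TYPE_NAME) call in get_columns()
    return re.search(r"^\w+", type_name).group(0)

print(base_type("DECIMAL(10,2)"))  # DECIMAL
```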
214 changes: 214 additions & 0 deletions src/databricks/sqlalchemy/_types.py
@@ -0,0 +1,214 @@
import sqlalchemy
[Comment from @susodapop (Contributor, Author), Oct 9, 2023, edited:
Note for reviewers: this PR renamed types.py → _types.py, which makes this file appear brand new. But most of the contents of the file are identical to main. Reviewing commit-by-commit you'll see the file moved in a dedicated commit. Subsequent changes to the file are linked to un-skipping the tests that those changes enabled to pass.]

from sqlalchemy.ext.compiler import compiles

from typing import Union

from datetime import datetime, time


from databricks.sql.utils import ParamEscaper


@compiles(sqlalchemy.types.Enum, "databricks")
@compiles(sqlalchemy.types.String, "databricks")
@compiles(sqlalchemy.types.Text, "databricks")
@compiles(sqlalchemy.types.Time, "databricks")
@compiles(sqlalchemy.types.Unicode, "databricks")
@compiles(sqlalchemy.types.UnicodeText, "databricks")
@compiles(sqlalchemy.types.Uuid, "databricks")
def compile_string_databricks(type_, compiler, **kw):
"""
We override the default compilation for Enum(), String(), Text(), and Time() because SQLAlchemy
defaults to incompatible / abnormal compiled names

Enum -> VARCHAR
String -> VARCHAR[LENGTH]
Text -> VARCHAR[LENGTH]
Time -> TIME
Unicode -> VARCHAR[LENGTH]
UnicodeText -> TEXT
Uuid -> CHAR[32]

But all of these types will be compiled to STRING in Databricks SQL
"""
return "STRING"


@compiles(sqlalchemy.types.Integer, "databricks")
def compile_integer_databricks(type_, compiler, **kw):
"""
We need to override the default Integer compilation rendering because Databricks uses "INT" instead of "INTEGER"
"""
return "INT"


@compiles(sqlalchemy.types.LargeBinary, "databricks")
def compile_binary_databricks(type_, compiler, **kw):
"""
We need to override the default LargeBinary compilation rendering because Databricks uses "BINARY" instead of "BLOB"
"""
return "BINARY"


@compiles(sqlalchemy.types.Numeric, "databricks")
def compile_numeric_databricks(type_, compiler, **kw):
"""
We need to override the default Numeric compilation rendering because Databricks uses "DECIMAL" instead of "NUMERIC"

The built-in visit_DECIMAL behaviour captures the precision and scale. Here we're just mapping calls to compile Numeric
to the SQLAlchemy Decimal() implementation
"""
return compiler.visit_DECIMAL(type_, **kw)


@compiles(sqlalchemy.types.DateTime, "databricks")
def compile_datetime_databricks(type_, compiler, **kw):
"""
We need to override the default DateTime compilation rendering because Databricks uses "TIMESTAMP_NTZ" instead of "DATETIME"
"""
return "TIMESTAMP_NTZ"


@compiles(sqlalchemy.types.ARRAY, "databricks")
def compile_array_databricks(type_, compiler, **kw):
"""
SQLAlchemy's default ARRAY can't compile as it's only implemented for Postgresql.
The Postgres implementation works for Databricks SQL, so we duplicate that here.

:type_:
This is an instance of sqlalchemy.types.ARRAY which always includes an item_type attribute
which is itself an instance of TypeEngine

https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.ARRAY
"""

inner = compiler.process(type_.item_type, **kw)

return f"ARRAY<{inner}>"


class DatabricksDateTimeNoTimezoneType(sqlalchemy.types.TypeDecorator):
"""The datetime that pysql creates when it receives the contents of a TIMESTAMP_NTZ
includes a timezone of 'Etc/UTC'. But since SQLAlchemy's test suite assumes that
the sqlalchemy.types.DateTime type will return a datetime.datetime _without_ any
timezone set, we need to strip the timezone off the value received from pysql.

It's not clear if DBR sends a timezone to pysql or if pysql is adding it. This could be a bug.
"""

impl = sqlalchemy.types.DateTime

cache_ok = True

def process_result_value(self, value: Union[None, datetime], dialect):
if value is None:
return None
return value.replace(tzinfo=None)
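The stripping done in process_result_value() can be illustrated with plain datetime objects. A stdlib-only sketch, independent of SQLAlchemy and pysql (the sample value is invented):

```python
from datetime import datetime, timezone

# A value as pysql might hand it back: tz-aware, pinned to UTC
aware = datetime(2023, 10, 9, 12, 30, 45, tzinfo=timezone.utc)

# replace(tzinfo=None) drops the timezone without shifting the wall-clock time
naive = aware.replace(tzinfo=None)

assert naive.tzinfo is None
assert (naive.hour, naive.minute, naive.second) == (12, 30, 45)
```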


class DatabricksTimeType(sqlalchemy.types.TypeDecorator):
"""Databricks has no native TIME type. So we store it as a string."""

impl = sqlalchemy.types.Time
cache_ok = True

TIME_WITH_MICROSECONDS_FMT = "%H:%M:%S.%f"
TIME_NO_MICROSECONDS_FMT = "%H:%M:%S"

def process_bind_param(self, value: Union[time, None], dialect) -> Union[None, str]:
"""Values sent to the database are converted to %H:%M:%S strings."""
if value is None:
return None
return value.strftime(self.TIME_WITH_MICROSECONDS_FMT)

# mypy doesn't like this workaround because TypeEngine wants process_literal_param to return a string
def process_literal_param(self, value, dialect) -> time: # type: ignore
"""It's not clear to me why this is necessary. Without it, SQLAlchemy's Timetest:test_literal fails
because the string literal renderer receives a str() object and calls .isoformat() on it.

Whereas this method receives a datetime.time() object which is subsequently passed to that
same renderer. And that works.

UPDATE: After coping with the literal_processor override in DatabricksStringType, I suspect a similar
mechanism is at play. Two different processors are called in sequence. This is likely a byproduct
of Databricks not having a true TIME type. I think the string representation of Time() types is
somehow affecting the literal rendering process. But as long as this passes the tests, I'm not
worried about it.
"""
return value

def process_result_value(
self, value: Union[None, str], dialect
) -> Union[time, None]:
"""Values received from the database are parsed into datetime.time() objects"""
if value is None:
return None

try:
_parsed = datetime.strptime(value, self.TIME_WITH_MICROSECONDS_FMT)
except ValueError:
# If the string doesn't have microseconds, try parsing it without them
_parsed = datetime.strptime(value, self.TIME_NO_MICROSECONDS_FMT)

return _parsed.time()
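Taken together, process_bind_param() and process_result_value() amount to a string round trip. A stdlib-only sketch of the same logic as standalone functions (hypothetical names, same format strings as the class above):

```python
from datetime import datetime, time

TIME_WITH_MICROSECONDS_FMT = "%H:%M:%S.%f"
TIME_NO_MICROSECONDS_FMT = "%H:%M:%S"

def time_to_str(value: time) -> str:
    # Mirrors process_bind_param: always render with microseconds
    return value.strftime(TIME_WITH_MICROSECONDS_FMT)

def str_to_time(value: str) -> time:
    # Mirrors process_result_value: try the microseconds format first,
    # then fall back to the plain HH:MM:SS format
    try:
        parsed = datetime.strptime(value, TIME_WITH_MICROSECONDS_FMT)
    except ValueError:
        parsed = datetime.strptime(value, TIME_NO_MICROSECONDS_FMT)
    return parsed.time()

assert str_to_time(time_to_str(time(12, 34, 56, 789012))) == time(12, 34, 56, 789012)
assert str_to_time("08:15:00") == time(8, 15)
```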


class DatabricksStringType(sqlalchemy.types.TypeDecorator):
"""We have to implement our own String() type because SQLAlchemy's default implementation
wants to escape single-quotes with a doubled single-quote. Databricks uses a backslash for
escaping of literal strings. And SQLAlchemy's default escaping breaks Databricks SQL.
"""

impl = sqlalchemy.types.String
cache_ok = True
pe = ParamEscaper()

def process_literal_param(self, value, dialect) -> str:
"""SQLAlchemy's default string escaping for backslashes doesn't work for databricks. The logic here
implements the same logic as our legacy inline escaping logic.
"""

return self.pe.escape_string(value)

def literal_processor(self, dialect):
"""We manually override this method to prevent further processing of the string literal beyond
what happens in the process_literal_param() method.

The SQLAlchemy docs _specifically_ say to not override this method.
[Comment from @susodapop (Contributor, Author): Note for reviewers, I checked a couple other open source dialects and we are not the only dialect that directly implements the literal_processor method for the String() type. I presume other dialect authors do so for the same reason that we do.]


It appears that any processing that happens from TypeEngine.process_literal_param happens _before_
and _in addition to_ whatever the class's impl.literal_processor() method does. The String.literal_processor()
method performs a string replacement that doubles any single-quote in the contained string. This raises a syntax
error in Databricks. And it's not necessary because ParamEscaper() already implements all the escaping we need.

We should consider opening an issue on the SQLAlchemy project to see if I'm using it wrong.

See type_api.py::TypeEngine.literal_processor:

```python
def process(value: Any) -> str:
return fixed_impl_processor(
fixed_process_literal_param(value, dialect)
)
```

That call to fixed_impl_processor wraps the result of fixed_process_literal_param (which is the
process_literal_param defined in our Databricks dialect)

https://docs.sqlalchemy.org/en/20/core/custom_types.html#sqlalchemy.types.TypeDecorator.literal_processor
"""

def process(value):
"""This is a copy of the default String.literal_processor() method but stripping away
its double-escaping behaviour for single-quotes.
"""

_step1 = self.process_literal_param(value, dialect="databricks")
if dialect.identifier_preparer._double_percents:
_step2 = _step1.replace("%", "%%")
else:
_step2 = _step1

return "%s" % _step2

return process
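The difference between the two escaping styles discussed above can be shown with two toy functions. This is a simplified sketch: `doubled_quote_escape` imitates SQLAlchemy's default String.literal_processor behaviour, and `backslash_escape` only approximates what ParamEscaper does (the real class handles more cases):

```python
def doubled_quote_escape(value: str) -> str:
    # SQLAlchemy's default style: double any embedded single quote
    return "'%s'" % value.replace("'", "''")

def backslash_escape(value: str) -> str:
    # Databricks SQL style: backslash-escape backslashes and single quotes
    return "'%s'" % value.replace("\\", "\\\\").replace("'", "\\'")

assert doubled_quote_escape("it's") == "'it''s'"   # valid in most SQL dialects
assert backslash_escape("it's") == "'it\\'s'"      # what Databricks SQL expects
```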
104 changes: 83 additions & 21 deletions src/databricks/sqlalchemy/requirements.py
@@ -1,34 +1,96 @@
"""
This module is supposedly used by the compliance tests to control which tests are run based on database capabilities.
However, based on some experimentation that does not appear to be consistently the case. Until we better understand
when these requirements are and are not implemented, we prefer to manually capture the exact nature of the failures
and errors.

Once we better understand how to use requirements.py, an example exclusion will look like this:

import sqlalchemy.testing.requirements
import sqlalchemy.testing.exclusions

class Requirements(sqlalchemy.testing.requirements.SuiteRequirements):
@property
def __some_example_requirement(self):
return sqlalchemy.testing.exclusions.closed


[Comment from @susodapop (Contributor, Author):
I removed this comment since we now have a better understanding of how to use requirements.py to dictate which features are skipped.

That said, I think our approach going forward should be to use requirements.py only to permit tests to run (i.e. only to include exclusions.open() calls). For the tests that we skip because of limitations of Databricks, I think we should manually call these out in test_suite.py and provide a more precise skip message. Because if the test is skipped because of requirements.py, the output from pytest simply says "test not enabled for any dialect" which could be misleading.]

[Comment from @susodapop (Contributor, Author): I've implemented this fix in b31eef4 and 8200e23 👍]

The complete list of requirements is provided by SQLAlchemy here:

https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/testing/requirements.py

When SQLAlchemy skips a test because a requirement is closed() it gives a generic skip message.
To make these failures more actionable, we only define requirements in this file that we wish to
force to be open(). If a test should be skipped on Databricks, it will be specifically marked skip
in test_suite.py with a Databricks-specific reason.

See the special note about the array_type exclusion below.
"""

import sqlalchemy.testing.requirements
import sqlalchemy.testing.exclusions

-import logging
-logger = logging.getLogger(__name__)
-logger.warning("requirements.py is not currently employed by Databricks dialect")
-class Requirements(sqlalchemy.testing.requirements.SuiteRequirements):
-    pass

class Requirements(sqlalchemy.testing.requirements.SuiteRequirements):
@property
def date_historic(self):
"""target dialect supports representation of Python
datetime.datetime() objects with historic (pre 1970) values."""

return sqlalchemy.testing.exclusions.open()

@property
def datetime_historic(self):
"""target dialect supports representation of Python
datetime.datetime() objects with historic (pre 1970) values."""

return sqlalchemy.testing.exclusions.open()

@property
def datetime_literals(self):
"""target dialect supports rendering of a date, time, or datetime as a
literal string, e.g. via the TypeEngine.literal_processor() method.

"""

return sqlalchemy.testing.exclusions.open()

@property
def timestamp_microseconds(self):
"""target dialect supports representation of Python
datetime.datetime() with microsecond objects but only
if TIMESTAMP is used."""

return sqlalchemy.testing.exclusions.open()

[Comment from @susodapop (Contributor, Author): Note for reviewers: the docstrings in this file are duplicated directly from SQLAlchemy's base requirements.py file. Any additional commentary specific to Databricks appears at the end of the docstring.]

@property
def time_microseconds(self):
"""target dialect supports representation of Python
datetime.time() with microsecond objects.

This requirement declaration isn't needed but I've included it here for completeness.
Since Databricks doesn't have a TIME type, SQLAlchemy will compile Time() columns
as STRING Databricks data types. And we use a custom time type to render those strings
between str() and datetime.time() representations. Therefore we can store _any_ precision
that SQLAlchemy needs. The time_microseconds requirement defaults to ON for all dialects
except mssql, mysql, mariadb, and oracle.
"""

return sqlalchemy.testing.exclusions.open()

@property
def infinity_floats(self):
"""The Float type can persist and load float('inf'), float('-inf')."""

return sqlalchemy.testing.exclusions.open()

@property
def precision_numerics_retains_significant_digits(self):
"""A precision numeric type will return empty significant digits,
i.e. a value such as 10.000 will come back in Decimal form with
the .000 maintained."""

return sqlalchemy.testing.exclusions.open()

@property
def precision_numerics_many_significant_digits(self):
"""target backend supports values with many digits on both sides,
such as 319438950232418390.273596, 87673.594069654243

"""
return sqlalchemy.testing.exclusions.open()

@property
def array_type(self):
"""While Databricks does support ARRAY types, pysql cannot bind them. So
we cannot use them with SQLAlchemy

Due to a bug in SQLAlchemy, we _must_ define this exclusion as closed() here or else the
test runner will crash the pytest process due to an AttributeError
"""

return sqlalchemy.testing.exclusions.closed()
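The open()/closed() gating used throughout this file can be sketched without SQLAlchemy. This is a toy re-implementation for illustration only; the real sqlalchemy.testing.exclusions API is richer, and all names here are invented:

```python
class Exclusion:
    """Toy stand-in for a sqlalchemy.testing.exclusions rule."""
    def __init__(self, enabled: bool):
        self.enabled = enabled

def open_rule() -> Exclusion:    # analogous to exclusions.open()
    return Exclusion(True)

def closed_rule() -> Exclusion:  # analogous to exclusions.closed()
    return Exclusion(False)

class ToyRequirements:
    @property
    def datetime_literals(self):
        return open_rule()

    @property
    def array_type(self):
        return closed_rule()

reqs = ToyRequirements()
# A test runner would skip any test whose requirement is not enabled
assert reqs.datetime_literals.enabled is True
assert reqs.array_type.enabled is False
```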