What’s New in SQLAlchemy 2.1?

About this Document

This document describes changes between SQLAlchemy version 2.0 and version 2.1.

Row now represents individual column types directly without Tuple

SQLAlchemy 2.0 implemented a broad array of PEP 484 typing throughout all components, including a new ability for row-returning statements such as select() to keep track of individual column types, which were then passed through the execution phase onto the Result object and then to the individual Row objects. Described at SQL Expression / Statement / Result Set Typing, this approach solved several issues with statement / row typing, but some remained unsolvable. In 2.1, one of those issues, that the individual column types needed to be packaged into a typing.Tuple, is now resolved using new PEP 646 integration, which allows for tuple-like types that are not actually typed as Tuple.

In SQLAlchemy 2.0, a statement such as:

stmt = select(column("x", Integer), column("y", String))

Would be typed as:

Select[Tuple[int, str]]

In 2.1, it’s now typed as:

Select[int, str]

When executing stmt, the Result and Row objects will be typed as Result[int, str] and Row[int, str], respectively. The prior workaround using Row._t to type as a real Tuple is no longer needed, and projects can migrate off this pattern.

Mypy users will need to make use of Mypy 1.7 or greater for PEP 646 integration to be available.

Limitations

Not yet solved by PEP 646 or any other PEP is the ability for an arbitrary number of expressions within Select and others to be mapped to row objects, without stating each argument position explicitly within typing annotations. To work around this issue, SQLAlchemy makes use of automated "stub generation" tools to generate hardcoded mappings of different numbers of positional arguments to constructs like select() to resolve to individual Unpack[] expressions (in SQLAlchemy 2.0, this generation produced Tuple[] annotations instead). This means that there are arbitrary limits on how many specific column expressions will be typed within the Row object, without resorting to Any for remaining expressions; for select(), it's currently ten expressions, and for DML expressions like insert() that use Insert.returning(), it's eight. If and when a new PEP that provides a Map operator to PEP 646 is proposed, this limitation can be lifted. [1] Originally, it was mistakenly assumed that this limitation prevented PEP 646 from being usable at all; however, the Unpack construct does in fact replace everything that was done using Tuple in 2.0.

An additional limitation for which there is no proposed solution is that there's no way for the name-based attributes on Row to be automatically typed, so these continue to be typed as Any (e.g. row.x and row.y for the above example). With current language features, this could only be fixed by having an explicit class-based construct that allows one to compose an explicit Row with explicit fields up front, which would be verbose and not automatic.

[1]

https://github.com/python/typing/discussions/1001#discussioncomment-1897813

#10635

Asyncio “greenlet” dependency no longer installs by default

SQLAlchemy 1.4 and 2.0 used a complex expression to determine if the greenlet dependency, needed by the asyncio extension, could be installed from pypi using a pre-built wheel instead of having to build from source. This was because the source build of greenlet is not always trivial on some platforms.

Disadvantages to this approach included that SQLAlchemy needed to track exactly which versions of greenlet were published as wheels on pypi; the setup expression led to problems with some package management tools such as poetry; and it was not possible to install SQLAlchemy without greenlet being installed, even though this is completely feasible if the asyncio extension is not used.

These problems are all solved by keeping greenlet entirely within the [asyncio] target. The only downside is that users of the asyncio extension need to be aware of this extra installation dependency.
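In practice this means asyncio users select the extra explicitly at install time; a minimal sketch of the two install commands, assuming pip as the installer:

```shell
# installs SQLAlchemy only; greenlet is no longer pulled in
pip install sqlalchemy

# installs SQLAlchemy plus greenlet via the [asyncio] extra,
# required when using the asyncio extension
pip install "sqlalchemy[asyncio]"
```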

#10197

ORM Relationship allows callable for back_populates

To help produce code that is more amenable to IDE-level linting and type checking, the relationship.back_populates parameter now accepts both direct references to a class-bound attribute as well as lambdas which do the same:

class A(Base):
    __tablename__ = "a"

    id: Mapped[int] = mapped_column(primary_key=True)

    # use a lambda: to link to B.a directly when it exists
    bs: Mapped[list[B]] = relationship(back_populates=lambda: B.a)


class B(Base):
    __tablename__ = "b"

    id: Mapped[int] = mapped_column(primary_key=True)
    a_id: Mapped[int] = mapped_column(ForeignKey("a.id"))

    # A.bs already exists, so can link directly
    a: Mapped[A] = relationship(back_populates=A.bs)

#10050

ORM Mapped Dataclasses no longer populate implicit default, collection-based default_factory in __dict__

This behavioral change addresses a widely reported issue with SQLAlchemy's Declarative Dataclass Mapping feature that was introduced in 2.0. SQLAlchemy ORM has always featured a behavior where a particular attribute on an ORM mapped class will have different behaviors depending on if it has an actively set value, including if that value is None, versus if the attribute is not set at all. When Declarative Dataclass Mapping was introduced, the mapped_column.default parameter introduced a new capability, which is to set up a dataclass-level default to be present in the generated __init__ method. This had the unfortunate side effect of breaking various popular workflows, the most prominent of which is creating an ORM object with the foreign key value in lieu of a many-to-one reference:

class Base(MappedAsDataclass, DeclarativeBase):
    pass


class Parent(Base):
    __tablename__ = "parent"

    id: Mapped[int] = mapped_column(primary_key=True, init=False)

    related_id: Mapped[int | None] = mapped_column(ForeignKey("child.id"), default=None)

    related: Mapped[Child | None] = relationship(default=None)


class Child(Base):
    __tablename__ = "child"

    id: Mapped[int] = mapped_column(primary_key=True, init=False)

In the above mapping, the __init__ method generated for Parent would in Python code look like this:

def __init__(self, related_id=None, related=None): ...

This means that creating a new Parent with related_id only would populate both related_id and related in __dict__:

# 2.0 behavior; will INSERT NULL for related_id due to the presence
# of related=None
>>> p1 = Parent(related_id=5)
>>> p1.__dict__
{'related_id': 5, 'related': None, '_sa_instance_state': ...}

The None value for 'related' means that SQLAlchemy favors the non-present related Child over the present value for 'related_id', which would be discarded, and NULL would be inserted for 'related_id' instead.

In the new behavior, the __init__ method instead looks like the example below, using a special constant DONT_SET indicating that a non-present value for 'related' should be ignored. This allows the class to behave more closely to how SQLAlchemy ORM mapped classes traditionally operate:

def __init__(self, related_id=DONT_SET, related=DONT_SET): ...

We then get a __dict__ setup that will follow the expected behavior of omitting related from __dict__ and later running an INSERT with related_id=5:

# 2.1 behavior; will INSERT 5 for related_id
>>> p1 = Parent(related_id=5)
>>> p1.__dict__
{'related_id': 5, '_sa_instance_state': ...}
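The general sentinel pattern behind DONT_SET can be sketched in plain Python (this is an illustration, not SQLAlchemy's actual implementation): the constructor assigns an attribute only when the caller actually passed a value, keeping "not passed" distinguishable from "passed None".

```python
# a unique sentinel object; SQLAlchemy uses its own DONT_SET constant
DONT_SET = object()


class Parent:
    def __init__(self, related_id=DONT_SET, related=DONT_SET):
        # only assign attributes the caller explicitly provided
        if related_id is not DONT_SET:
            self.related_id = related_id
        if related is not DONT_SET:
            self.related = related


p1 = Parent(related_id=5)
print(p1.__dict__)  # {'related_id': 5} -- 'related' is absent, not None
```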

Dataclass defaults are delivered via descriptor instead of __dict__

The above behavior goes a step further: in order to honor default values that are something other than None, the value of the dataclass-level default (i.e. set using any of the mapped_column.default, column_property.default, or deferred.default parameters) is directed to be delivered at the Python descriptor level using mechanisms in SQLAlchemy's attribute system that normally return None for unpopulated columns, so that even though the default is not populated into __dict__, it's still delivered when the attribute is accessed. This behavior is based on what Python dataclasses itself does when a default is indicated for a field that also includes init=False.

In the example below, an immutable default "default_status" is applied to a column called status:

class Base(MappedAsDataclass, DeclarativeBase):
    pass


class SomeObject(Base):
    __tablename__ = "parent"

    id: Mapped[int] = mapped_column(primary_key=True, init=False)

    status: Mapped[str] = mapped_column(default="default_status")

In the above mapping, constructing SomeObject with no parameters will deliver no values inside of __dict__, but will deliver the default value via descriptor:

# object is constructed with no value for ``status``
>>> s1 = SomeObject()

# the default value is not placed in ``__dict__``
>>> s1.__dict__
{'_sa_instance_state': ...}

# but the default value is delivered at the object level via descriptor
>>> s1.status
'default_status'

# the value still remains unpopulated in ``__dict__``
>>> s1.__dict__
{'_sa_instance_state': ...}

The value passed as mapped_column.default is also assigned, as was the case before, to the Column.default parameter of the underlying Column, where it takes effect as a Python-level default for INSERT statements. So while __dict__ is never populated with the default value on the object, the INSERT still includes the value in the parameter set. This essentially modifies the Declarative Dataclass Mapping system to work more like traditional ORM mapped classes, where a "default" means just that, a column-level default.
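The descriptor-level delivery described above can be illustrated with a small, self-contained sketch (the DefaultedAttribute class is hypothetical, not SQLAlchemy's attribute implementation): the descriptor's __get__ falls back to a stored default whenever the instance has no value of its own, so __dict__ stays empty until something is actually set.

```python
class DefaultedAttribute:
    """Illustrative descriptor returning a default when nothing is set."""

    def __init__(self, default):
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # fall back to the default without writing it into __dict__
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class SomeObject:
    status = DefaultedAttribute("default_status")


s1 = SomeObject()
print(s1.__dict__)  # {} -- the default is not populated
print(s1.status)    # default_status -- delivered by the descriptor
```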

Dataclass defaults are accessible on objects even without init

As the new behavior makes use of descriptors in a similar way as Python dataclasses do themselves when init=False, the new feature implements this behavior as well. This is an all new behavior where an ORM mapped class can deliver a default value for fields even if they are not part of the __init__() method at all. In the mapping below, the status field is configured with init=False, meaning it's not part of the constructor at all:

class Base(MappedAsDataclass, DeclarativeBase):
    pass


class SomeObject(Base):
    __tablename__ = "parent"

    id: Mapped[int] = mapped_column(primary_key=True, init=False)

    status: Mapped[str] = mapped_column(default="default_status", init=False)

When we construct SomeObject() with no arguments, the default is accessible on the instance, delivered via descriptor:

>>> so = SomeObject()
>>> so.status
'default_status'
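The standard-library parallel mentioned above can be seen directly: a plain dataclass field with init=False and a simple default is left as a class attribute, so the instance delivers the default without it ever appearing in the instance __dict__.

```python
from dataclasses import dataclass, field


@dataclass
class SomeObject:
    # not part of __init__; the default stays a class attribute
    status: str = field(default="default_status", init=False)


so = SomeObject()
print(so.status)                 # default_status
print("status" in so.__dict__)   # False
```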

default_factory for collection-based relationships internally uses DONT_SET

A late addition to the behavioral change brings equivalent behavior to the use of the relationship.default_factory parameter with collection-based relationships. This attribute is documented <orm_declarative_dc_relationships> as being limited to exactly the collection class that's stated on the left side of the annotation, which is now enforced at mapper configuration time:

class Parent(Base):
    __tablename__ = "parents"

    id: Mapped[int] = mapped_column(primary_key=True, init=False)
    name: Mapped[str]
    children: Mapped[list["Child"]] = relationship(default_factory=list)

With the above mapping, the actual relationship.default_factory parameter is replaced internally to instead use the same DONT_SET constant that's applied to relationship.default for many-to-one relationships. SQLAlchemy's existing collection-on-attribute access behavior occurs as always on access:

>>> p1 = Parent(name="p1")
>>> p1.children
[]

This change to relationship.default_factory accommodates a similar merge-based condition, where an empty collection would otherwise be forced into a new object that in fact wants a merged collection to arrive.

Related Changes

This change includes the following API changes:

  • The relationship.default parameter, when present, only accepts a value of None, and is only accepted when the relationship is ultimately a many-to-one relationship or one that establishes relationship.uselist as False.

  • The mapped_column.default and mapped_column.insert_default parameters are mutually exclusive, and only one may be passed at a time. The behavior of the two parameters is equivalent at the Column level; however, at the Declarative Dataclass Mapping level, only mapped_column.default actually sets the dataclass-level default with descriptor access; using mapped_column.insert_default will have the effect of the object attribute defaulting to None on the instance until the INSERT takes place, in the same way it works on traditional ORM mapped classes.

#12168

URL stringify and parse now supports URL escaping for the “database” portion

A URL that includes URL-escaped characters in the database portion willnow parse with conversion of those escaped characters:

>>> from sqlalchemy import make_url
>>> u = make_url("driver://user:pass@host/database%3Fname")
>>> u.database
'database?name'

Previously, such characters would not be unescaped:

>>> # pre-2.1 behavior
>>> from sqlalchemy import make_url
>>> u = make_url("driver://user:pass@host/database%3Fname")
>>> u.database
'database%3Fname'

This change also applies to the stringify side; most special characters inthe database name will be URL escaped, omitting a few such as plus signs andslashes:

>>> from sqlalchemy import URL
>>> u = URL.create("driver", database="a?b=c")
>>> str(u)
'driver:///a%3Fb%3Dc'

Where the above URL correctly round-trips to itself:

>>> make_url(str(u))
driver:///a%3Fb%3Dc
>>> make_url(str(u)).database == u.database
True

Whereas previously, special characters applied programmatically would not be escaped in the result, leading to a URL that does not represent the original database portion. Below, b=c is part of the query string and not the database portion:

>>> from sqlalchemy import URL
>>> u = URL.create("driver", database="a?b=c")
>>> str(u)
'driver:///a?b=c'
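The underlying escaping mechanics can be demonstrated with the standard library's urllib.parse (SQLAlchemy's own implementation may differ in which characters it leaves unescaped, such as plus signs and slashes):

```python
from urllib.parse import quote, unquote

# parse side: escaped characters in the database portion are unescaped
print(unquote("database%3Fname"))  # database?name

# stringify side: special characters in the database name are escaped
# (safe="" escapes everything that is not alphanumeric)
print(quote("a?b=c", safe=""))     # a%3Fb%3Dc
```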

#11234

Potential breaking change to odbc_connect= handling for mssql+pyodbc

Fixed a mssql+pyodbc issue where valid plus signs in an already-unquoted odbc_connect= (raw DBAPI) connection string were replaced with spaces.

Previously, the pyodbc connector would always pass the odbc_connect value to unquote_plus(), even if it was not required. So, if the (unquoted) odbc_connect value contained PWD=pass+word, that would get changed to PWD=pass word, and the login would fail. One workaround was to quote just the plus sign, PWD=pass%2Bword, which would then get unquoted to PWD=pass+word.

Implementations using the above workaround with URL.create() to specify a plus sign in the PWD= argument of an odbc_connect string will have to remove the workaround and just pass the PWD= value as it would appear in a valid ODBC connection string (i.e., the same as would be required if using the connection string directly with pyodbc.connect()).
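The failure mode is easy to reproduce with the standard library's unquote_plus, which is what the pyodbc connector applied unconditionally: it turns '+' into a space, so an unquoted password containing a plus sign was corrupted.

```python
from urllib.parse import unquote_plus

# an already-unquoted value: the plus sign is destroyed
print(unquote_plus("PWD=pass+word"))    # PWD=pass word

# the old workaround: pre-quote just the plus sign
print(unquote_plus("PWD=pass%2Bword"))  # PWD=pass+word
```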

#11250

Addition of BitString subclass for handling PostgreSQL BIT columns

Values of BIT columns in the PostgreSQL dialect are returned as instances of a new str subclass, BitString. Previously, the value of BIT columns was driver dependent, with most drivers returning str instances except asyncpg, which used asyncpg.BitString.

With this change, for the psycopg, psycopg2, and pg8000 drivers, the new BitString type is mostly compatible with str, but adds methods for bit manipulation and supports bitwise operators.

As BitString is a string subclass, hashability as well as equality tests continue to work against plain strings. This also leaves ordering operators intact.

For implementations using the asyncpg driver, the new type is incompatible with the existing asyncpg.BitString type.
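The general shape of a str subclass that stays string-compatible while adding bitwise behavior can be sketched as follows. This is a hypothetical illustration of the pattern, not SQLAlchemy's actual BitString class or its method names:

```python
class BitStr(str):
    """Illustrative str subclass for strings of '0'/'1' characters."""

    def __and__(self, other):
        # bitwise AND via integer conversion, padded back to width
        width = max(len(self), len(other))
        return BitStr(format(int(self, 2) & int(other, 2), f"0{width}b"))


b = BitStr("1100") & BitStr("1010")
print(b)            # 1000
print(b == "1000")  # True -- plain-string equality still works
```

Because the result is still a str, hashing, equality, and ordering against ordinary strings continue to behave as before, which matches the compatibility point made above.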

#10556