What’s New in SQLAlchemy 1.4?¶
About this Document
This document describes changes between SQLAlchemy version 1.3 and SQLAlchemy version 1.4.
Version 1.4 is taking on a different focus than other SQLAlchemy releases in that it is in many ways attempting to serve as a potential migration point for a more dramatic series of API changes currently planned for release 2.0 of SQLAlchemy. The focus of SQLAlchemy 2.0 is a modernized and slimmed down API that removes lots of usage patterns that have long been discouraged, as well as mainstreams the best ideas in SQLAlchemy as first class API features, with the goal being that there is much less ambiguity in how the API is to be used, as well as that a series of implicit behaviors and rarely-used API flags that complicate the internals and hinder performance will be removed.
For the current status of SQLAlchemy 2.0, see Migrating to SQLAlchemy 2.0.
Major API changes and features - General¶
Python 3.6 is the minimum Python 3 version; Python 2.7 still supported¶
As Python 3.5 reached EOL in September of 2020, SQLAlchemy 1.4 now places version 3.6 as the minimum Python 3 version. Python 2.7 is still supported, however the SQLAlchemy 1.4 series will be the last series to support Python 2.
ORM Query is internally unified with select, update, delete; 2.0 style execution available¶
The biggest conceptual change to SQLAlchemy for version 2.0 and essentially in 1.4 as well is that the great separation between the Select construct in Core and the Query object in the ORM has been removed, as well as between the Query.update() and Query.delete() methods in how they relate to Update and Delete.
With regards to Select and Query, these two objects have for many versions had similar, largely overlapping APIs and even some ability to change between one and the other, while remaining very different in their usage patterns and behaviors. The historical background for this was that the Query object was introduced to overcome shortcomings in the Select object which used to be at the core of how ORM objects were queried, except that they had to be queried in terms of Table metadata only. However Query had only a simplistic interface for loading objects, and only over the course of many major releases did it eventually gain most of the flexibility of the Select object, which then led to the ongoing awkwardness that these two objects became highly similar yet still largely incompatible with each other.
In version 1.4, all Core and ORM SELECT statements are rendered from a Select object directly; when the Query object is used, at statement invocation time it copies its state to a Select which is then invoked internally using 2.0 style execution. Going forward, the Query object will become legacy only, and applications will be encouraged to move to 2.0 style execution which allows Core constructs to be used freely against ORM entities:
with Session(engine, future=True) as sess:
    stmt = (
        select(User)
        .where(User.name == "sandy")
        .join(User.addresses)
        .where(Address.email_address.like("%gmail%"))
    )

    result = sess.execute(stmt)

    for user in result.scalars():
        print(user)
Things to note about the above example:
- The Session and sessionmaker objects now feature full context manager (i.e. the with: statement) capability; see the revised documentation at Opening and Closing a Session for an example.
- Within the 1.4 series, all 2.0 style ORM invocation uses a Session that includes the Session.future flag set to True; this flag indicates the Session should have 2.0-style behaviors, which include that ORM queries can be invoked from execute as well as some changes in transactional features. In version 2.0 this flag will always be True.
- The select() construct no longer needs brackets around the columns clause; see select(), case() now accept positional expressions for background on this improvement.
- The select() / Select object has a Select.join() method that acts like that of the Query and even accommodates an ORM relationship attribute (without breaking the separation between Core and ORM!) - see select().join() and outerjoin() add JOIN criteria to the current query, rather than creating a subquery for background on this.
- Statements that work with ORM entities and are expected to return ORM results are invoked using Session.execute(). See Querying (2.0 style) for a primer. See also the following note at ORM Session.execute() uses “future” style Result sets in all cases.
- A Result object is returned, rather than a plain list, which itself is a much more sophisticated version of the previous ResultProxy object; this object is now used both for Core and ORM results. See New Result object, RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple, and The “KeyedTuple” object returned by Query is replaced by Row for information on this.
Throughout SQLAlchemy’s documentation, there will be many references to 1.x style and 2.0 style execution. This is to distinguish between the two querying styles and to attempt to forwards-document the new calling style going forward. In SQLAlchemy 2.0, while the Query object may remain as a legacy construct, it will no longer be featured in most documentation.
Similar adjustments have been made to “bulk updates and deletes” such that Core update() and delete() can be used for bulk operations. A bulk update like the following:
session.query(User).filter(User.name == "sandy").update(
    {"password": "foobar"}, synchronize_session="fetch"
)
can now be achieved in 2.0 style (and indeed the above runs internally in this way) as follows:
with Session(engine, future=True) as sess:
    stmt = (
        update(User)
        .where(User.name == "sandy")
        .values(password="foobar")
        .execution_options(synchronize_session="fetch")
    )

    sess.execute(stmt)
Note the use of the Executable.execution_options() method to pass ORM-related options. The use of “execution options” is now much more prevalent within both Core and ORM, and many ORM-related methods from Query are now implemented as execution options (see Query.execution_options() for some examples).
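As an illustration, here is a minimal sketch (reusing the User mapping and engine from the examples above) in which the behavior formerly spelled as Query.populate_existing() is passed as an execution option on the statement itself:

from sqlalchemy import select
from sqlalchemy.orm import Session

# populate_existing, formerly Query.populate_existing(), refreshes any
# already-loaded objects from the rows returned by this statement
stmt = (
    select(User)
    .where(User.name == "sandy")
    .execution_options(populate_existing=True)
)

with Session(engine, future=True) as sess:
    for user in sess.execute(stmt).scalars():
        print(user)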
ORM Session.execute() uses “future” style Result sets in all cases¶
As noted in RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple, the Result and Row objects now feature “named tuple” behavior, when used with an Engine that includes the create_engine.future parameter set to True. These “named tuple” rows in particular include a behavioral change which is that Python containment expressions using in, such as:
>>> engine = create_engine("...", future=True)
>>> conn = engine.connect()
>>> row = conn.execute(text("select 'name' as some_name")).first()
>>> "name" in row
True
The above containment test will use value containment, not key containment; the row would need to have a value of “name” to return True.
Under SQLAlchemy 1.4, when the create_engine.future parameter is set to False, legacy-style LegacyRow objects are returned which feature the partial-named-tuple behavior of prior SQLAlchemy versions, where containment checks continue to use key containment; "name" in row would return True if the row had a column named “name”, rather than a value.
When using Session.execute(), full named-tuple style is enabled unconditionally, meaning "name" in row will use value containment as the test, and not key containment. This is to accommodate that Session.execute() now returns a Result that also accommodates for ORM results, where even legacy ORM result rows such as those returned by Query.all() use value containment.
This is a behavioral change from SQLAlchemy 1.3 to 1.4. To continue receiving key-containment collections, use the Result.mappings() method to receive a MappingResult that returns rows as dictionaries:
for dict_row in session.execute(text("select id from table")).mappings():
    assert "id" in dict_row
Transparent SQL Compilation Caching added to All DQL, DML Statements in Core, ORM¶
One of the most broadly encompassing changes to ever land in a single SQLAlchemy version, a many-month reorganization and refactoring of all querying systems from the base of Core all the way through ORM now allows the majority of Python computation involved in producing SQL strings and related statement metadata from a user-constructed statement to be cached in memory, such that subsequent invocations of an identical statement construct will use 35-60% fewer CPU resources.
This caching goes beyond the construction of the SQL string to also include the construction of result fetching structures that link the SQL construct to the result set, and in the ORM it includes the accommodation of ORM-enabled attribute loaders, relationship eager loaders and other options, and object construction routines that must be built up each time an ORM query seeks to run and construct ORM objects from result sets.
To introduce the general idea of the feature, given code from the Performance suite as follows, which will invoke a very simple query “n” times, for a default value of n=10000. The query returns only a single row, as the overhead we are looking to decrease is that of many small queries. The optimization is not as significant for queries that return many rows:
session = Session(bind=engine)
for id_ in random.sample(ids, n):
    result = session.query(Customer).filter(Customer.id == id_).one()
This example in the 1.3 release of SQLAlchemy on a Dell XPS13 running Linux completes as follows:

test_orm_query : (10000 iterations); total time 3.440652 sec
In 1.4, the code above without modification completes:
test_orm_query : (10000 iterations); total time 2.367934 sec
This first test indicates that regular ORM queries when using caching can run over many iterations in the range of 30% faster.
A second variant of the feature is the optional use of Python lambdas to defer the construction of the query itself. This is a more sophisticated variant of the approach used by the “Baked Query” extension, which was introduced in version 1.0.0. The “lambda” feature may be used in a style very similar to that of baked queries, except that it is available in an ad-hoc way for any SQL construct. It additionally includes the ability to scan each invocation of the lambda for bound literal values that change on every invocation, as well as changes to other constructs, such as querying from a different entity or column each time, while still not having to run the actual code each time.
Using this API looks as follows:
session = Session(bind=engine)
for id_ in random.sample(ids, n):
    stmt = lambda_stmt(lambda: future_select(Customer))
    stmt += lambda s: s.where(Customer.id == id_)
    session.execute(stmt).scalar_one()
The code above completes:
test_orm_query_newstyle_w_lambdas : (10000 iterations); total time 1.247092 sec
This test indicates that using the newer “select()” style of ORM querying, in conjunction with a full “baked” style invocation that caches the entire construction, can run over many iterations in the range of 60% faster and grants performance about the same as the baked query system which is now superseded by the native caching system.
The new system makes use of the existing Connection.execution_options.compiled_cache execution option and also adds a cache to the Engine directly, which is configured using the create_engine.query_cache_size parameter.
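As a minimal sketch of configuring the cache (the in-memory SQLite URL is only for illustration), the size may be passed when the engine is created:

from sqlalchemy import create_engine

# the statement cache is an LRU structure; the default size is 500,
# and it may be sized up for applications with many distinct statements
engine = create_engine("sqlite://", query_cache_size=1200)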
A significant portion of API and behavioral changes throughout 1.4 were driven in order to support this new feature.
Declarative is now integrated into the ORM with new features¶
After ten years or so of popularity, the sqlalchemy.ext.declarative package is now integrated into the sqlalchemy.orm namespace, with the exception of the declarative “extension” classes which remain as Declarative extensions.
The new classes added to sqlalchemy.orm include:
- registry - a new class that supersedes the role of the “declarative base” class, serving as a registry of mapped classes which can be referenced via string name within relationship() calls and is agnostic of the style in which any particular class was mapped.
- declarative_base() - this is the same declarative base class that has been in use throughout the span of the declarative system, except it now references a registry object internally and is implemented by the registry.generate_base() method which can be invoked from a registry directly. The declarative_base() function creates this registry automatically so there is no impact on existing code. The sqlalchemy.ext.declarative.declarative_base name is still present, emitting a 2.0 deprecation warning when 2.0 deprecations mode is enabled.
- declared_attr() - the same “declared attr” function call now part of sqlalchemy.orm. The sqlalchemy.ext.declarative.declared_attr name is still present, emitting a 2.0 deprecation warning when 2.0 deprecations mode is enabled.
- Other names moved into sqlalchemy.orm include has_inherited_table(), synonym_for(), DeclarativeMeta, as_declarative().
In addition, the instrument_declarative() function is deprecated, superseded by registry.map_declaratively(). The ConcreteBase, AbstractConcreteBase, and DeferredReflection classes remain as extensions in the Declarative Extensions package.
Mapping styles have now been organized such that they all extend from the registry object, and fall into these categories:
- Declarative Mapping
  - Using declarative_base() Base class w/ metaclass
  - Using registry.mapped() Declarative Decorator
    - Declarative Table
    - Imperative Table (Hybrid)
- Imperative (a.k.a. “classical”) Mapping
  - Using registry.map_imperatively()
remains, howeverit is deprecated to call uponmapper()
directly; the newregistry.map_imperatively()
method now routes the request throughthesqlalchemy.orm.registry()
so that it integrates with other declarative mappingsunambiguously.
The new approach interoperates with 3rd party class instrumentation systems which necessarily must take place on the class before the mapping process does, allowing declarative mapping to work via a decorator instead of a declarative base so that packages like dataclasses and attrs can be used with declarative mappings, in addition to working with classical mappings.
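A minimal sketch of the decorator form (the class and table names here are purely illustrative):

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import registry

mapper_registry = registry()


# the decorator maps the class using the registry, with no declarative base
@mapper_registry.mapped
class User:
    __tablename__ = "user"

    id = Column(Integer, primary_key=True)
    name = Column(String(50))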
Declarative documentation has now been fully integrated into the ORM mapper configuration documentation and includes examples for all styles of mappings organized into one place. See the section ORM Mapped Class Overview for the start of the newly reorganized documentation.
Python Dataclasses, attrs Supported w/ Declarative, Imperative Mappings¶
Along with the new declarative decorator styles introduced in Declarative is now integrated into the ORM with new features, the Mapper is now explicitly aware of the Python dataclasses module and will recognize attributes that are configured in this way, and proceed to map them without skipping them as was the case previously. In the case of the attrs module, attrs already removes its own attributes from the class so was already compatible with SQLAlchemy classical mappings. With the addition of the registry.mapped() decorator, both attribute systems can now interoperate with Declarative mappings as well.
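A minimal sketch of a mapped dataclass (names illustrative), using the “imperative table” form so that the dataclass fields remain ordinary Python attributes:

from dataclasses import dataclass, field
from typing import Optional

from sqlalchemy import Column, Integer, String, Table
from sqlalchemy.orm import registry

mapper_registry = registry()


@mapper_registry.mapped
@dataclass
class User:
    __table__ = Table(
        "user",
        mapper_registry.metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )

    # id is generated by the database, so it is excluded from __init__
    id: int = field(init=False)
    name: Optional[str] = None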
Asynchronous IO Support for Core and ORM¶
SQLAlchemy now supports Python asyncio-compatible database drivers using an all-new asyncio front-end interface to Connection for Core usage as well as Session for ORM use, using the AsyncConnection and AsyncSession objects.
Note
The new asyncio feature should be considered alpha level for the initial releases of SQLAlchemy 1.4. This is super new stuff that uses some previously unfamiliar programming techniques.
The initial database API supported is the asyncpg asyncio driver for PostgreSQL.
The internal features of SQLAlchemy are fully integrated by making use of the greenlet library in order to adapt the flow of execution within SQLAlchemy’s internals to propagate asyncio await keywords outwards from the database driver to the end-user API, which features async methods. Using this approach, the asyncpg driver is fully operational within SQLAlchemy’s own test suite and features compatibility with most psycopg2 features. The approach was vetted and improved upon by developers of the greenlet project for which SQLAlchemy is appreciative.
greenlets are good
Don’t confuse the greenlet library with event-based IO libraries that build on top of it such as gevent and eventlet; while the use of these libraries with SQLAlchemy is common, SQLAlchemy’s asyncio integration does not make use of these event based systems in any way. The asyncio API integrates with the user-provided event loop, typically Python’s own asyncio event loop, without the use of additional threads or event systems. The approach involves a single greenlet context switch per await call, and the extension which makes it possible is less than 20 lines of code.
The user facing async API itself is focused around IO-oriented methods such as AsyncEngine.connect() and AsyncConnection.execute(). The new Core constructs strictly support 2.0 style usage only; which means all statements must be invoked given a connection object, in this case AsyncConnection.
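A short sketch of 2.0-style Core usage through the asyncio API; the asyncpg URL here is a placeholder:

import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine


async def async_main():
    # a placeholder asyncpg connection URL
    engine = create_async_engine("postgresql+asyncpg://user:pass@host/dbname")

    async with engine.connect() as conn:
        result = await conn.execute(text("SELECT 1"))
        print(result.all())

    await engine.dispose()


asyncio.run(async_main())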
Within the ORM, 2.0 style query execution is supported, using select() constructs in conjunction with AsyncSession.execute(); the legacy Query object itself is not supported by the AsyncSession class.
ORM features such as lazy loading of related attributes as well as unexpiry of expired attributes are by definition disallowed in the traditional asyncio programming model, as they indicate IO operations that would run implicitly within the scope of a Python getattr() operation. To overcome this, the traditional asyncio application should make judicious use of eager loading techniques as well as forego the use of features such as expire on commit so that such loads are not needed.
For the asyncio application developer who chooses to break with tradition, the new API provides a strictly optional feature such that applications that wish to make use of such ORM features can opt to organize database-related code into functions which can then be run within greenlets using the AsyncSession.run_sync() method. See the greenlet_orm.py example at Asyncio Integration for a demonstration.
Support for asynchronous cursors is also provided using new methods AsyncConnection.stream() and AsyncSession.stream(), which support a new AsyncResult object that itself provides awaitable versions of common methods like AsyncResult.all() and AsyncResult.fetchmany(). Both Core and ORM are integrated with the feature which corresponds to the use of “server side cursors” in traditional SQLAlchemy.
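A brief sketch of the streaming pattern, assuming an AsyncConnection as in the previous example; AsyncConnection.stream() returns an AsyncResult whose rows may be iterated with async for:

from sqlalchemy import select


async def stream_table(conn, table):
    # a server side cursor is used; rows are fetched as iteration proceeds
    result = await conn.stream(select(table))

    async for row in result:
        print(row)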
Many Core and ORM statement objects now perform much of their construction and validation in the compile phase¶
A major initiative in the 1.4 series is to approach the model of both Core SQL statements as well as the ORM Query to allow for an efficient, cacheable model of statement creation and compilation, where the compilation step would be cached, based on a cache key generated by the created statement object, which itself is newly created for each use. Towards this goal, much of the Python computation which occurs within the construction of statements, particularly that of the ORM Query as well as the select() construct when used to invoke ORM queries, is being moved to occur within the compilation phase of the statement which only occurs after the statement has been invoked, and only if the statement’s compiled form was not yet cached.
From an end-user perspective, this means that some of the error messages which can arise based on arguments passed to the object will no longer be raised immediately, and instead will occur only when the statement is invoked for the first time. These conditions are always structural and not data driven, so there is no risk of such a condition being missed due to a cached statement.
Error conditions which fall under this category include:
- When a CompoundSelect is constructed (e.g. a UNION, EXCEPT, etc.) and the SELECT statements passed do not have the same number of columns, a CompileError is now raised to this effect; previously, an ArgumentError would be raised immediately upon statement construction. A minimal sketch of the new behavior follows this list.
- Various error conditions which may arise when calling upon Query.join() will be evaluated at statement compilation time rather than when the method is first called.
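For illustration, a minimal sketch of the deferred UNION error described in the first item above:

from sqlalchemy import select, union
from sqlalchemy.exc import CompileError
from sqlalchemy.sql import column, table

t = table("t", column("a"), column("b"))

# constructing a UNION with mismatched column counts no longer raises here
stmt = union(select(t.c.a), select(t.c.a, t.c.b))

# the structural error surfaces when the statement is first compiled
try:
    stmt.compile()
except CompileError as err:
    print(err)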
Other things that may change involve the Query object directly:
- Behaviors may be slightly different when calling upon the Query.statement accessor. The Select object returned is now a direct copy of the same state that was present in the Query, without any ORM-specific compilation being performed (which means it’s dramatically faster). However, the Select will not have the same internal state as it had in 1.3, including things like the FROM clauses being explicitly spelled out if they were not explicitly stated in the Query. This means code that relies upon manipulating this Select statement such as calling methods like Select.with_only_columns() may need to accommodate for the FROM clause.
Repaired internal importing conventions such that code linters may work correctly¶
SQLAlchemy has for a long time used a parameter-injecting decorator to help resolve mutually-dependent module imports, like this:
@util.dependency_for("sqlalchemy.sql.dml")
def insert(self, dml, *args, **kw):
Where the above function would be rewritten to no longer have the dml parameter on the outside. This would confuse code-linting tools into seeing a missing parameter to functions. A new approach has been implemented internally such that the function’s signature is no longer modified and the module object is procured inside the function instead.
Support for SQL Regular Expression operators¶
A long awaited feature to add rudimentary support for database regular expression operators, to complement the ColumnOperators.like() and ColumnOperators.match() suites of operations. The new features include ColumnOperators.regexp_match() implementing a regular expression match like function, and ColumnOperators.regexp_replace() implementing a regular expression string replace function.
Supported backends include SQLite, PostgreSQL, MySQL / MariaDB, and Oracle. The SQLite backend only supports “regexp_match” but not “regexp_replace”.
The regular expression syntaxes and flags are not backend agnostic. A future feature will allow multiple regular expression syntaxes to be specified at once to switch between different backends on the fly.
For SQLite, Python’s re.search() function with no additional arguments is established as the implementation.
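A brief sketch of the new operators (the table and patterns are illustrative):

from sqlalchemy import select
from sqlalchemy.sql import column, table

t = table("t", column("data"))

# regular expression match; rendered per-backend, e.g. ~ on PostgreSQL,
# REGEXP on MySQL / MariaDB
stmt = select(t).where(t.c.data.regexp_match("^[a-z]+$"))

# regular expression replacement, rendered via REGEXP_REPLACE where supported
stmt2 = select(t.c.data.regexp_replace("[0-9]", "#"))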
See also
ColumnOperators.regexp_match()
ColumnOperators.regexp_replace()
Regular Expression Support - SQLite implementation notes
SQLAlchemy 2.0 Deprecations Mode¶
One of the primary goals of the 1.4 release is to provide a “transitional” release so that applications may migrate to SQLAlchemy 2.0 gradually. Towards this end, a primary feature in release 1.4 is “2.0 deprecations mode”, which is a series of deprecation warnings that emit against every detectable API pattern which will work differently in version 2.0. The warnings all make use of the RemovedIn20Warning class. As these warnings affect foundational patterns including the select() and Engine constructs, even simple applications can generate a lot of warnings until appropriate API changes are made. The warning mode is therefore turned off by default until the developer enables the environment variable SQLALCHEMY_WARN_20=1.
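A minimal sketch of enabling the mode from within a launcher script; the variable is consulted when SQLAlchemy is first imported, so it must be set beforehand:

import os

# must be set before the first import of sqlalchemy
os.environ["SQLALCHEMY_WARN_20"] = "1"

import warnings

import sqlalchemy

# optionally escalate the warnings to errors to locate each pattern
warnings.simplefilter("error", sqlalchemy.exc.RemovedIn20Warning)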
For a full walkthrough of using 2.0 Deprecations mode, see Migration to 2.0 Step Two - Turn on RemovedIn20Warnings.
API and Behavioral Changes - Core¶
A SELECT statement is no longer implicitly considered to be a FROM clause¶
This change is one of the larger conceptual changes in SQLAlchemy in many years, however it is hoped that the end user impact is relatively small, as the change more closely matches what databases like MySQL and PostgreSQL require in any case.
The most immediate noticeable impact is that a select() can no longer be embedded inside of another select() directly, without explicitly turning the inner select() into a subquery first. This is historically performed by using the SelectBase.alias() method, which remains, however is more explicitly suited by using a new method SelectBase.subquery(); both methods do the same thing. The object returned is now Subquery, which is very similar to the Alias object and shares a common base AliasedReturnsRows.
That is, this will now raise:
stmt1 = select(user.c.id, user.c.name)
stmt2 = select(addresses, stmt1).select_from(addresses.join(stmt1))
Raising:
sqlalchemy.exc.ArgumentError: Column expression or FROM clause expected,
got <... Select object ...>. To create a FROM clause from a
<class 'sqlalchemy.sql.selectable.Select'> object, use the .subquery() method.
The correct calling form is instead (noting also that brackets are no longer required for select()):
sq1 = select(user.c.id, user.c.name).subquery()
stmt2 = select(addresses, sq1).select_from(addresses.join(sq1))
Noting above that the SelectBase.subquery() method is essentially equivalent to using the SelectBase.alias() method.
The rationale for this change is based on the following:
- In order to support the unification of Select with Query, the Select object needs to have Select.join() and Select.outerjoin() methods that actually add JOIN criteria to the existing FROM clause, as is what users have always expected it to do in any case. The previous behavior, having to align with what a FromClause would do, was that it would generate an unnamed subquery and then JOIN to it, which was a completely useless feature that only confused those users unfortunate enough to try this. This change is discussed at select().join() and outerjoin() add JOIN criteria to the current query, rather than creating a subquery.
- The behavior of including a SELECT in the FROM clause of another SELECT without first creating an alias or subquery would be that it creates an unnamed subquery. While standard SQL does support this syntax, in practice it is rejected by most databases. For example, both MySQL and PostgreSQL outright reject the usage of unnamed subqueries:
# MySQL / MariaDB:

MariaDB [(none)]> select * from (select 1);
ERROR 1248 (42000): Every derived table must have its own alias


# PostgreSQL:

test=> select * from (select 1);
ERROR:  subquery in FROM must have an alias
LINE 1: select * from (select 1);
                      ^
HINT:  For example, FROM (SELECT ...) [AS] foo.
A database like SQLite accepts them, however it is still often the case that the names produced from such a subquery are too ambiguous to be useful:
sqlite> CREATE TABLE a(id integer);
sqlite> CREATE TABLE b(id integer);
sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=id;
Error: ambiguous column name: id
sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=b.id;
Error: no such column: b.id

# use a name
sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON a.id=anon_1.id;
As SelectBase objects are no longer FromClause objects, attributes like the .c attribute as well as methods like .select() are now deprecated, as they imply implicit production of a subquery. The .join() and .outerjoin() methods are now repurposed to append JOIN criteria to the existing query in a similar way as that of Query.join(), which is what users have always expected these methods to do in any case.
In place of the .c attribute, a new attribute SelectBase.selected_columns is added. This attribute resolves to a column collection that is what most people hope that .c does (but does not), which is to reference the columns that are in the columns clause of the SELECT statement. A common beginner mistake is code such as the following:
stmt = select(users)
stmt = stmt.where(stmt.c.name == "foo")
The above code appears intuitive and that it would generate “SELECT * FROM users WHERE name='foo'”, however veteran SQLAlchemy users will recognize that it in fact generates a useless subquery resembling “SELECT * FROM (SELECT * FROM users) WHERE name='foo'”.
The new SelectBase.selected_columns attribute however does suit the use case above, as in a case like the above it links directly to the columns present in the users.c collection:
stmt = select(users)
stmt = stmt.where(stmt.selected_columns.name == "foo")
select().join() and outerjoin() add JOIN criteria to the current query, rather than creating a subquery¶
Towards the goal of unifying Query and Select, particularly for 2.0 style use of Select, it was critical that there be a working Select.join() method that behaves like the Query.join() method, adding additional entries to the FROM clause of the existing SELECT and then returning the new Select object for further modification, instead of wrapping the object inside of an unnamed subquery and returning a JOIN from that subquery, a behavior that has always been virtually useless and completely misleading to users.
To allow this to be the case, A SELECT statement is no longer implicitly considered to be a FROM clause was first implemented, which splits off Select from having to be a FromClause; this removed the requirement that Select.join() would need to return a Join object rather than a new version of that Select object that includes a new JOIN in its FROM clause.
From that point on, as the Select.join() and Select.outerjoin() did have an existing behavior, the original plan was that these methods would be deprecated, and the new “useful” version of the methods would be available on an alternate, “future” Select object available as a separate import.
However, after some time working with this particular codebase, it was decided that having two different kinds of Select objects floating around, each with 95% the same behavior except for some subtle difference in how some of the methods behave, was going to be more misleading and inconvenient than simply making a hard change in how these two methods behave, given that the existing behavior of Select.join() and Select.outerjoin() is essentially never used and only causes confusion.
So it was decided, given how very useless the current behavior is, and how extremely useful and important the new behavior would be, to make a hard behavioral change in this one area, rather than waiting another year and having a more awkward API in the interim. SQLAlchemy developers do not take it lightly to make a completely breaking change like this, however this is a very special case and it is extremely unlikely that the previous implementation of these methods was being used; as noted in A SELECT statement is no longer implicitly considered to be a FROM clause, major databases such as MySQL and PostgreSQL don’t allow for unnamed subqueries in any case and from a syntactical point of view it’s nearly impossible for a JOIN from an unnamed subquery to be useful since it’s very difficult to refer to the columns within it unambiguously.
With the new implementation, Select.join() and Select.outerjoin() now behave very similarly to that of Query.join(), adding JOIN criteria to the existing statement by matching to the left entity:
stmt = select(user_table).join(
    addresses_table, user_table.c.id == addresses_table.c.user_id
)
producing:
SELECT user.id, user.name
FROM user JOIN address ON user.id = address.user_id
As is the case for Join, the ON clause is automatically determined if feasible:
stmt = select(user_table).join(addresses_table)
When ORM entities are used in the statement, this is essentially how ORM queries are built up using 2.0 style invocation. ORM entities will assign a “plugin” to the statement internally such that ORM-related compilation rules will take place when the statement is compiled into a SQL string. More directly, the Select.join() method can accommodate ORM relationships, without breaking the hard separation between Core and ORM internals:
stmt = select(User).join(User.addresses)
Another new method Select.join_from() is also added, which allows easier specification of the left and right side of a join at once:
stmt = select(Address.email_address, User.name).join_from(User, Address)
producing:
SELECT address.email_address, user.name
FROM user JOIN address ON user.id = address.user_id
The URL object is now immutable¶
The URL object has been formalized such that it now presents itself as a namedtuple with a fixed number of fields that are immutable. In addition, the dictionary represented by the URL.query attribute is also an immutable mapping. Mutation of the URL object was not a formally supported or documented use case which led to some open-ended use cases that made it very difficult to intercept incorrect usages, most commonly mutation of the URL.query dictionary to include non-string elements. It also led to all the common problems of allowing mutability in a fundamental data object, namely unwanted mutations elsewhere leaking into code that didn’t expect the URL to change. Finally, the namedtuple design is inspired by that of Python’s urllib.parse.urlparse() which returns the parsed object as a named tuple.
The decision to change the API outright is based on a calculus weighing the infeasibility of a deprecation path (which would involve changing the URL.query dictionary to be a special dictionary that emits deprecation warnings when any kind of standard library mutation methods are invoked, in addition that when the dictionary would hold any kind of list of elements, the list would also have to emit deprecation warnings on mutation) against the unlikely use case of projects already mutating URL objects in the first place, as well as that small changes such as that of #5341 were creating backwards-incompatibility in any case. The primary case for mutation of a URL object is that of parsing plugin arguments within the CreateEnginePlugin extension point, itself a fairly recent addition that based on Github code search is in use by two repositories, neither of which are actually mutating the URL object.
The URL object now provides a rich interface for inspecting and generating new URL objects. The existing mechanism to create a URL object, the make_url() function, remains unchanged:
>>> from sqlalchemy.engine import make_url
>>> url = make_url("postgresql+psycopg2://user:pass@host/dbname")
For programmatic construction, code that may have been using the URL constructor or __init__ method directly will receive a deprecation warning if arguments are passed as keyword arguments and not an exact 7-tuple. The keyword-style constructor is now available via the URL.create() method:
>>> from sqlalchemy.engine import URL
>>> url = URL.create("postgresql", "user", "pass", host="host", database="dbname")
>>> str(url)
'postgresql://user:pass@host/dbname'
Fields can be altered typically using the URL.set() method, which returns a new URL object with changes applied:
>>> mysql_url = url.set(drivername="mysql+pymysql")
>>> str(mysql_url)
'mysql+pymysql://user:pass@host/dbname'
To alter the contents of the URL.query dictionary, methods such as URL.update_query_dict() may be used:
>>> url.update_query_dict({"sslcert": "/path/to/crt"})
postgresql://user:***@host/dbname?sslcert=%2Fpath%2Fto%2Fcrt
To upgrade code that is mutating these fields directly, a backwards and forwards compatible approach is to use duck typing, as in the following style:
def set_url_drivername(some_url, some_drivername):
    # check for 1.4
    if hasattr(some_url, "set"):
        return some_url.set(drivername=some_drivername)
    else:
        # SQLAlchemy 1.3 or earlier, mutate in place
        some_url.drivername = some_drivername
        return some_url


def set_ssl_cert(some_url, ssl_cert):
    # check for 1.4
    if hasattr(some_url, "update_query_dict"):
        return some_url.update_query_dict({"sslcert": ssl_cert})
    else:
        # SQLAlchemy 1.3 or earlier, mutate in place
        some_url.query["sslcert"] = ssl_cert
        return some_url
The query string retains its existing format as a dictionary of strings to strings, using sequences of strings to represent multiple parameters. For example:
>>> from sqlalchemy.engine import make_url
>>> url = make_url(
...     "postgresql://user:pass@host/dbname?alt_host=host1&alt_host=host2&sslcert=%2Fpath%2Fto%2Fcrt"
... )
>>> url.query
immutabledict({'alt_host': ('host1', 'host2'), 'sslcert': '/path/to/crt'})
To work with the contents of the URL.query attribute such that all values are normalized into sequences, use the URL.normalized_query attribute:
>>> url.normalized_query
immutabledict({'alt_host': ('host1', 'host2'), 'sslcert': ('/path/to/crt',)})
The query string can be appended to via methods such as URL.update_query_dict(), URL.update_query_pairs(), URL.update_query_string():
>>> url.update_query_dict({"alt_host": "host3"}, append=True)
postgresql://user:***@host/dbname?alt_host=host1&alt_host=host2&alt_host=host3&sslcert=%2Fpath%2Fto%2Fcrt
Changes to CreateEnginePlugin¶
The CreateEnginePlugin is also impacted by this change, as the documentation for custom plugins indicated that the dict.pop() method should be used to remove consumed arguments from the URL object. This should now be achieved using the CreateEnginePlugin.update_url() method. A backwards compatible approach would look like:
from sqlalchemy.engine import CreateEnginePlugin


class MyPlugin(CreateEnginePlugin):
    def __init__(self, url, kwargs):
        # check for 1.4 style
        if hasattr(CreateEnginePlugin, "update_url"):
            self.my_argument_one = url.query["my_argument_one"]
            self.my_argument_two = url.query["my_argument_two"]
        else:
            # legacy
            self.my_argument_one = url.query.pop("my_argument_one")
            self.my_argument_two = url.query.pop("my_argument_two")

        self.my_argument_three = kwargs.pop("my_argument_three", None)

    def update_url(self, url):
        # this method runs in 1.4 only and should be used to consume
        # plugin-specific arguments
        return url.difference_update_query(["my_argument_one", "my_argument_two"])
See the docstring at CreateEnginePlugin for complete details on how this class is used.
select(), case() now accept positional expressions¶
As it may be seen elsewhere in this document, the select() construct will now accept “columns clause” arguments positionally, rather than requiring they be passed as a list:
# new way, supports 2.0
stmt = select(table.c.col1, table.c.col2, ...)
When sending the arguments positionally, no other keyword arguments are permitted. In SQLAlchemy 2.0, the above calling style will be the only calling style supported.
For the duration of 1.4, the previous calling style will still continue to function, which passes the list of columns or other expressions as a list:
# old way, still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...])
The above legacy calling style also accepts the old keyword arguments that have since been removed from most narrative documentation. The existence of these keyword arguments is why the columns clause was passed as a list in the first place:
# very much the old way, but still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...], whereclause=table.c.col1 == 5)
The detection between the two styles is based on whether or not the first positional argument is a list. There are unfortunately still likely some usages that look like the following, where the keyword for the “whereclause” is omitted:
# very much the old way, but still works in 1.4
stmt = select([table.c.col1, table.c.col2, ...], table.c.col1 == 5)
As part of this change, the Select construct also gains the 2.0-style “future” API which includes an updated Select.join() method as well as methods like Select.filter_by() and Select.join_from().
In a related change, the case() construct has also been modified to accept its list of WHEN clauses positionally, with a similar deprecation track for the old calling style:
stmt = select(users_table).where(
    case(
        (users_table.c.name == "wendy", "W"),
        (users_table.c.name == "jack", "J"),
        else_="E",
    )
)
The convention for SQLAlchemy constructs accepting *args vs. a list of values, as is the latter case for a construct like ColumnOperators.in_(), is that positional arguments are used for structural specification, lists are used for data specification.
All IN expressions render parameters for each value in the list on the fly (e.g. expanding parameters)¶
The “expanding IN” feature, first introduced in Late-expanded IN parameter sets allow IN expressions with cached statements, has matured enough such that it is clearly superior to the previous method of rendering IN expressions. As the approach was improved to handle empty lists of values, it is now the only means that Core / ORM will use to render lists of IN parameters.
The previous approach which has been present in SQLAlchemy since its first release was that when a list of values were passed to the ColumnOperators.in_() method, the list would be expanded into a series of individual BindParameter objects at statement construction time. This suffered from the limitation that it was not possible to vary the parameter list at statement execution time based on the parameter dictionary, which meant that string SQL statements could not be cached independently of their parameters, nor could the parameter dictionary be fully used for statements that included IN expressions generally.
In order to service the “baked query” feature described at Baked Queries, a cacheable version of IN was needed, which is what brought about the “expanding IN” feature. In contrast to the existing behavior whereby the parameter list is expanded at statement construction time into individual BindParameter objects, the feature instead uses a single BindParameter that stores the list of values at once; when the statement is executed by the Engine, it is “expanded” on the fly into individual bound parameter positions based on the parameters passed to the call to Connection.execute(), and the existing SQL string which may have been retrieved from a previous execution is modified using a regular expression to suit the current parameter set. This allows for the same Compiled object, which stores the rendered string statement, to be invoked multiple times against different parameter sets that modify the list contents passed to IN expressions, while still maintaining the behavior of individual scalar parameters being passed to the DBAPI. While some DBAPIs do support this functionality directly, it is not generally available; the “expanding IN” feature now supports the behavior consistently for all backends.
As a major focus of 1.4 is to allow for true statement caching in Core and ORM without the awkwardness of the “baked” system, and since the “expanding IN” feature represents a simpler approach to building expressions in any case, it’s now invoked automatically whenever a list of values is passed to an IN expression:
stmt = select(A.id, A.data).where(A.id.in_([1, 2, 3]))
The pre-execution string representation is:
>>> print(stmt)
SELECT a.id, a.data
FROM a
WHERE a.id IN ([POSTCOMPILE_id_1])
To render the values directly, use literal_binds as was the case previously:
>>> print(stmt.compile(compile_kwargs={"literal_binds": True}))
SELECT a.id, a.data
FROM a
WHERE a.id IN (1, 2, 3)
A new flag, “render_postcompile”, is added as a helper to allow the current bound value to be rendered as it would be passed to the database:
>>> print(stmt.compile(compile_kwargs={"render_postcompile": True}))
SELECT a.id, a.data
FROM a
WHERE a.id IN (:id_1_1, :id_1_2, :id_1_3)
Engine logging output shows the ultimate rendered statement as well:
INFO sqlalchemy.engine.base.Engine SELECT a.id, a.data FROM a WHERE a.id IN (?, ?, ?)
INFO sqlalchemy.engine.base.Engine (1, 2, 3)
As part of this change, the behavior of “empty IN” expressions, where the list parameter is empty, is now standardized on use of the IN operator against a so-called “empty set”. As there is no standard SQL syntax for empty sets, a SELECT that returns no rows is used, tailored in specific ways for each backend so that the database treats it as an empty set; this feature was first introduced in version 1.3 and is described at Expanding IN feature now supports empty lists. The create_engine.empty_in_strategy parameter, introduced in version 1.2 as a means for migrating for how this case was treated for the previous IN system, is now deprecated and this flag no longer has an effect; as described in The IN / NOT IN operator’s empty collection behavior is now configurable; default expression simplified, this flag allowed a dialect to switch between the original system of comparing a column against itself, which turned out to be a huge performance issue, and a newer system of comparing “1 != 1” in order to produce a “false” expression. The 1.3 introduced behavior which now takes place in all cases is more correct than both approaches as the IN operator is still used, and does not have the performance issue of the original system.
In addition, the “expanding” parameter system has been generalized so that it also services other dialect-specific use cases where a parameter cannot be accommodated by the DBAPI or backing database; see New “post compile” bound parameters used for LIMIT/OFFSET in Oracle, SQL Server for details.
Built-in FROM linting will warn for any potential cartesian products in a SELECT statement¶
As the Core expression language as well as the ORM are built on an “implicit FROMs” model where a particular FROM clause is automatically added if any part of the query refers to it, a common issue is the case where a SELECT statement, either a top level statement or an embedded subquery, contains FROM elements that are not joined to the rest of the FROM elements in the query, causing what’s referred to as a “cartesian product” in the result set, i.e. every possible combination of rows from each FROM element not otherwise joined. In relational databases, this is nearly always an undesirable outcome as it produces an enormous result set full of duplicated, uncorrelated data.
SQLAlchemy, for all of its great features, is particularly prone to this sort of issue happening as a SELECT statement will have elements added to its FROM clause automatically from any table seen in the other clauses. A typical scenario looks like the following, where two tables are JOINed together, however an additional entry in the WHERE clause that perhaps inadvertently does not line up with these two tables will create an additional FROM entry:
address_alias = aliased(Address)

q = (
    session.query(User)
    .join(address_alias, User.addresses)
    .filter(Address.email_address == "foo")
)
The above query selects from a JOIN of User and address_alias, the latter of which is an alias of the Address entity. However, the Address entity is used within the WHERE clause directly, so the above would result in the SQL:
SELECT users.id AS users_id, users.name AS users_name,
       users.fullname AS users_fullname, users.nickname AS users_nickname
FROM addresses, users
JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
WHERE addresses.email_address = :email_address_1
In the above SQL, we can see what SQLAlchemy developers term “the dreaded comma”, as we see “FROM addresses, users JOIN addresses” in the FROM clause which is the classic sign of a cartesian product; where a query is making use of JOIN in order to join FROM clauses together, however because one of them is not joined, it uses a comma. The above query will return a full set of rows that join the “user” and “addresses” table together on the “id / user_id” column, and will then apply all those rows into a cartesian product against every row in the “addresses” table directly. That is, if there are ten user rows and 100 rows in addresses, the above query will return its expected result rows, likely to be 100 as all address rows would be selected, multiplied by 100 again, so that the total result size would be 10000 rows.
The “table1, table2 JOIN table3” pattern is one that also occurs quite frequently within the SQLAlchemy ORM due to either subtle mis-application of ORM features particularly those related to joined eager loading or joined table inheritance, as well as a result of SQLAlchemy ORM bugs within those same systems. Similar issues apply to SELECT statements that use “implicit joins”, where the JOIN keyword is not used and instead each FROM element is linked with another one via the WHERE clause.
For some years there has been a recipe on the Wiki that applies a graph algorithm to a select() construct at query execution time and inspects the structure of the query for these un-linked FROM clauses, parsing through the WHERE clause and all JOIN clauses to determine how FROM elements are linked together and ensuring that all the FROM elements are connected in a single graph. This recipe has now been adapted to be part of the SQLCompiler itself where it now optionally emits a warning for a statement if this condition is detected. The warning is enabled using the create_engine.enable_from_linting flag and is enabled by default. The computational overhead of the linter is very low, and additionally it only occurs during statement compilation which means for a cached SQL statement it only occurs once.
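As a minimal sketch (in-memory SQLite URL for illustration only), the linter may be disabled for a given engine if desired:

from sqlalchemy import create_engine

# the FROM linter defaults to on; it may be turned off per engine
engine = create_engine("sqlite://", enable_from_linting=False)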
Using this feature, our ORM query above will emit a warning:
>>> q.all()
SAWarning: SELECT statement has a cartesian product between FROM
element(s) "addresses_1", "users" and FROM element "addresses".
Apply join condition(s) between each element to resolve.
The linter feature accommodates not just for tables linked together through the JOIN clauses but also through the WHERE clause. Above, we can add a WHERE clause to link the new Address entity with the previous address_alias entity and that will remove the warning:
q = (
    session.query(User)
    .join(address_alias, User.addresses)
    .filter(Address.email_address == "foo")
    .filter(Address.id == address_alias.id)
)  # resolve cartesian products, will no longer warn
The cartesian product warning considers any kind of link between two FROM clauses to be a resolution, even if the end result set is still wasteful, as the linter is intended only to detect the common case of a FROM clause that is completely unexpected. If the FROM clause is referred to explicitly elsewhere and linked to the other FROMs, no warning is emitted:
q = (
    session.query(User)
    .join(address_alias, User.addresses)
    .filter(Address.email_address == "foo")
    .filter(Address.id > address_alias.id)
)  # will generate a lot of rows, but no warning
Full cartesian products are also allowed if they are explicitly stated; if we wanted for example the cartesian product of User and Address, we can JOIN on true() so that every row will match with every other; the following query will return all rows and produce no warnings:
from sqlalchemy import true

# intentional cartesian product
q = session.query(User).join(Address, true())
The warning is only generated by default when the statement is compiled by the Connection for execution; calling the ClauseElement.compile() method will not emit a warning unless the linting flag is supplied:
>>> from sqlalchemy.sql import FROM_LINTING
>>> print(q.statement.compile(linting=FROM_LINTING))
SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users". Apply join condition(s) between each element to resolve.
SELECT users.id, users.name, users.fullname, users.nickname
FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
WHERE addresses.email_address = :email_address_1
New Result object¶
A major goal of SQLAlchemy 2.0 is to unify how “results” are handled between the ORM and Core. Towards this goal, version 1.4 introduces new versions of both the ResultProxy and RowProxy objects that have been part of SQLAlchemy since the beginning.
The new objects are documented at Result and Row, and are used not only for Core result sets but for 2.0 style results within the ORM as well.
This result object is fully compatible with ResultProxy and includes many new features, now applied to both Core and ORM results equally, including methods such as:
Result.one() - returns exactly a single row, or raises:
with engine.connect() as conn:
    row = conn.execute(table.select().where(table.c.id == 5)).one()
Result.one_or_none() - same, but also returns None for no rows
Result.all() - returns all rows
Result.partitions() - fetches rows in chunks:
with engine.connect() as conn:
    result = conn.execute(
        table.select().order_by(table.c.id),
        execution_options={"stream_results": True},
    )

    for chunk in result.partitions(500):
        # process up to 500 records
        ...
Result.columns() - allows slicing and reorganizing of rows:
with engine.connect() as conn:
    # requests x, y, z
    result = conn.execute(select(table.c.x, table.c.y, table.c.z))

    # iterate rows as y, x
    for y, x in result.columns("y", "x"):
        print("Y: %s X: %s" % (y, x))
Result.scalars() - returns lists of scalar objects, from the first column by default but can also be selected:
result = session.execute(select(User).order_by(User.id))

for user_obj in result.scalars():
    ...
Result.mappings() - instead of named-tuple rows, returns dictionaries:
with engine.connect() as conn:
    result = conn.execute(select(table.c.x, table.c.y, table.c.z))

    for map_ in result.mappings():
        print("Y: %(y)s X: %(x)s" % map_)
When using Core, the object returned by Connection.execute() is an instance of CursorResult, which continues to feature the same API features as ResultProxy regarding inserted primary keys, defaults, rowcounts, etc. For ORM, a Result subclass will be returned that performs translation of Core rows into ORM rows, and then allows all the same operations to take place.
See also
ORM Query Unified with Core Select - in the 2.0 migration documentation
RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple¶
The RowProxy class, which represents individual database result rows in a Core result set, is now called Row and is no longer a “proxy” object; what this means is that when the Row object is returned, the row is a simple tuple that contains the data in its final form, already having been processed by result-row handling functions associated with datatypes (examples include turning a date string from the database into a datetime object, a JSON string into a Python json.loads() result, etc.).
The immediate rationale for this is so that the row can act more like a Python named tuple, rather than a mapping, where the values in the tuple are the subject of the __contains__ operator on the tuple, rather than the keys. With Row acting like a named tuple, it is then suitable for use as a replacement for the ORM’s KeyedTuple object, leading to an eventual API where both the ORM and Core deliver result sets that behave identically. Unification of major patterns within ORM and Core is a major goal of SQLAlchemy 2.0, and release 1.4 aims to have most or all of the underlying architectural patterns in place in order to support this process. The note in The “KeyedTuple” object returned by Query is replaced by Row describes the ORM’s use of the Row class.
For release 1.4, the Row class provides an additional subclass LegacyRow, which is used by Core and provides a backwards-compatible version of RowProxy while emitting deprecation warnings for those API features and behaviors that will be moved. ORM Query now makes use of Row directly as a replacement for KeyedTuple.
The LegacyRow class is a transitional class where the __contains__ method is still testing against the keys, not the values, while emitting a deprecation warning when the operation succeeds. Additionally, all the other mapping-like methods on the previous RowProxy are deprecated, including LegacyRow.keys(), LegacyRow.items(), etc. For mapping-like behaviors from a Row object, including support for these methods as well as a key-oriented __contains__ operator, the API going forward will be to first access a special attribute Row._mapping, which will then provide a complete mapping interface to the row, rather than a tuple interface.
Rationale: To behave more like a named tuple rather than a mapping¶
The difference between a named tuple and a mapping, as far as boolean operators are concerned, can be summarized. Given a “named tuple” in pseudo code as:
row = (id: 5, name: 'some name')
The biggest cross-incompatible difference is the behavior of __contains__:
"id"inrow# True for a mapping, False for a named tuple"some name"inrow# False for a mapping, True for a named tuple
In 1.4, when a LegacyRow is returned by a Core result set, the above "id" in row comparison will continue to succeed, however a deprecation warning will be emitted. To use the “in” operator as a mapping, use the Row._mapping attribute:
"id"inrow._mapping
SQLAlchemy 2.0’s result object will feature a .mappings() modifier so that these mappings can be received directly:
# using sqlalchemy.future package
for row in result.mappings():
    row["id"]
Proxying behavior goes away, was also unnecessary in modern usage¶
The refactor of Row to behave like a tuple requires that all data values be fully available up front. This is an internal behavior change from that of RowProxy, where result-row processing functions would be invoked at the point of accessing an element of the row, instead of when the row was first fetched. This means for example when retrieving a datetime value from SQLite, the data for the row as present in the RowProxy object would previously have looked like:
row_proxy=(1,"2019-12-31 19:56:58.272106")
and then upon access via __getitem__, the datetime.strptime() function would be used on the fly to convert the above string date into a datetime object. With the new architecture, the datetime() object is present in the tuple when it is returned, the datetime.strptime() function having been called just once up front:
row = (1, datetime.datetime(2019, 12, 31, 19, 56, 58, 272106))
The RowProxy and Row objects in SQLAlchemy are where the majority of SQLAlchemy’s C extension code takes place. This code has been highly refactored to provide the new behavior in an efficient manner, and overall performance has been improved, as the design of Row is now considerably simpler.
The rationale behind the previous behavior assumed a usage model where a result row might have dozens or hundreds of columns present, where most of those columns would not be accessed, and for which a majority of those columns would require some result-value processing function. By invoking the processing function only when needed, the goal was that lots of result processing functions would not be necessary, thus increasing performance.
There are many reasons why the above assumptions do not hold:
- The vast majority of row-processing functions called were to Unicode-decode a bytestring into a Python Unicode string under Python 2. This was right as Python Unicode was beginning to see use and before Python 3 existed. Once Python 3 was introduced, within a few years, all Python DBAPIs took on the proper role of supporting the delivery of Python Unicode objects directly, under both Python 2 and Python 3, as an option in the former case and as the only way forward in the latter case. Eventually, in most cases it became the default for Python 2 as well. SQLAlchemy’s Python 2 support still enables explicit string-to-Unicode conversion for some DBAPIs such as cx_Oracle, however it is now performed at the DBAPI level rather than as a standard SQLAlchemy result row processing function.
- The above string conversion, when it is used, was made to be extremely performant via the C extensions, so much so that even in 1.4, SQLAlchemy’s byte-to-Unicode codec hook is plugged into cx_Oracle where it has been observed to be more performant than cx_Oracle’s own hook; this meant that the overhead for converting all strings in a row was not as significant as it originally was in any case.
- Row processing functions are not used in most other cases; the exceptions are SQLite’s datetime support, JSON support for some backends, and some numeric handlers such as string to Decimal. In the case of Decimal, Python 3 also standardized on the highly performant cdecimal implementation, which is not the case in Python 2, which continues to use the much less performant pure Python version.
- Fetching full rows where only a few columns are needed is not common within real-world use cases. In the early days of SQLAlchemy, database code from other languages of the form “row = fetch('SELECT * FROM table')” was common; using SQLAlchemy’s expression language however, code observed in the wild typically makes use of the specific columns needed.
SELECT objects and derived FROM clauses allow for duplicate columns and column labels¶
The select() construct now allows for duplicate column labels as well as duplicate column objects themselves, so that result tuples are organized and ordered identically to the way the columns were selected. The ORM Query already works this way, so this change allows for greater cross-compatibility between the two, which is a key goal of the 2.0 transition:
>>> from sqlalchemy import column, select
>>> c1, c2, c3, c4 = column("c1"), column("c2"), column("c3"), column("c4")
>>> stmt = select(c1, c2, c3.label("c2"), c2, c4)
>>> print(stmt)
SELECT c1, c2, c3 AS c2, c2, c4
To support this change, the ColumnCollection used by SelectBase as well as for derived FROM clauses such as subqueries also supports duplicate columns; this includes the new SelectBase.selected_columns attribute, the deprecated SelectBase.c attribute, as well as the FromClause.c attribute seen on constructs such as Subquery and Alias:
>>> list(stmt.selected_columns)
[<sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>,
 <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
 <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>,
 <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
 <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4>]

>>> print(stmt.subquery().select())
SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4
FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1
ColumnCollection also allows access by integer index to support when the string “key” is ambiguous:
>>> stmt.selected_columns[2]
<sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>
To suit the use of ColumnCollection in objects such as Table and PrimaryKeyConstraint, the old “deduplicating” behavior, which is more critical for these objects, is preserved in a new class DedupeColumnCollection.
The change includes that the familiar warning "Column %r on table %r being replaced by %r, which has the same key. Consider use_labels for select() statements." is removed; the Select.apply_labels() method is still available and is still used by the ORM for all SELECT operations, however it does not imply deduplication of column objects, although it does imply deduplication of implicitly generated labels:
>>> from sqlalchemy import table
>>> user = table("user", column("id"), column("name"))
>>> stmt = select(user.c.id, user.c.name, user.c.id).apply_labels()
>>> print(stmt)
SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1
FROM "user"
Finally, the change makes it easier to create UNION and other CompoundSelect objects, by ensuring that the number and position of columns in a SELECT statement mirrors what was given, in a use case such as:
>>> s1 = select(user, user.c.id)
>>> s2 = select(c1, c2, c3)
>>> from sqlalchemy import union
>>> u = union(s1, s2)
>>> print(u)
SELECT "user".id, "user".name, "user".id
FROM "user" UNION SELECT c1, c2, c3
Improved column labeling for simple column expressions using CAST or similar¶
A user pointed out that the PostgreSQL database has a convenient behavior when using functions like CAST against a named column, in that the result column name is named the same as the inner expression:
test=> SELECT CAST(data AS VARCHAR) FROM foo;

 data
------
 5
(1 row)
This allows one to apply CAST to table columns while not losing the column name (above using the name "data") in the result row. Compare to databases such as MySQL/MariaDB, as well as most others, where the column name is taken from the full SQL expression and is not very portable:
MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo;
+--------------------+
| CAST(data AS CHAR) |
+--------------------+
| 5                  |
+--------------------+
1 row in set (0.003 sec)
In SQLAlchemy Core expressions, we never deal with a raw generated name like the above, as SQLAlchemy applies auto-labeling to expressions like these, which up until now were always so-called “anonymous” expressions:
>>> print(select(cast(foo.c.data, String)))
SELECT CAST(foo.data AS VARCHAR) AS anon_1  # old behavior
FROM foo
These anonymous expressions were necessary as SQLAlchemy’s ResultProxy made heavy use of result column names in order to match up datatypes, such as the String datatype which used to have result-row-processing behavior, to the correct column, so most importantly the names had to be both easy to determine in a database-agnostic manner as well as unique in all cases. In SQLAlchemy 1.0 as part of #918, this reliance on named columns in result rows (specifically the cursor.description element of the PEP-249 cursor) was scaled back to not be necessary for most Core SELECT constructs; in release 1.4, the system overall is becoming more comfortable with SELECT statements that have duplicate column or label names, such as in SELECT objects and derived FROM clauses allow for duplicate columns and column labels. So we now emulate PostgreSQL’s reasonable behavior for simple modifications to a single column, most prominently with CAST:
>>> print(select(cast(foo.c.data, String)))
SELECT CAST(foo.data AS VARCHAR) AS data
FROM foo
For CAST against expressions that don’t have a name, the previous logic is used to generate the usual “anonymous” labels:
>>>print(select(cast("hi there,"+foo.c.data,String)))SELECT CAST(:data_1 + foo.data AS VARCHAR) AS anon_1FROM foo
A cast() against a Label, despite having to omit the label expression as these don’t render inside of a CAST, will nonetheless make use of the given name:
>>>print(select(cast(("hi there,"+foo.c.data).label("hello_data"),String)))SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_dataFROM foo
And of course as was always the case, Label can be applied to the expression on the outside to apply an “AS <name>” label directly:
>>>print(select(cast(("hi there,"+foo.c.data),String).label("hello_data")))SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_dataFROM foo
New “post compile” bound parameters used for LIMIT/OFFSET in Oracle, SQL Server¶
A major goal of the 1.4 series is to establish that all Core SQL constructs are completely cacheable, meaning that a particular Compiled structure will produce an identical SQL string regardless of any SQL parameters used with it, which notably includes those used to specify the LIMIT and OFFSET values, typically used for pagination and “top N” style results.
While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many years, a few outliers remained where such parameters were not allowed, including a SQL Server “TOP N” statement, such as:
SELECT TOP 5 mytable.id, mytable.data FROM mytable
as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will use if the optimize_limits=True parameter is passed to create_engine() with an Oracle URL) does not allow them, but also that using bound parameters with ROWNUM comparisons has been reported as producing slower query plans:
SELECT anon_1.id, anon_1.data
FROM (
    SELECT /*+ FIRST_ROWS(5) */
        anon_2.id AS id, anon_2.data AS data, ROWNUM AS ora_rn
    FROM (
        SELECT mytable.id, mytable.data FROM mytable
    ) anon_2
    WHERE ROWNUM <= :param_1
) anon_1
WHERE ora_rn > :param_2
In order to allow for all statements to be unconditionally cacheable at the compilation level, a new form of bound parameter called a “post compile” parameter has been added, which makes use of the same mechanism as that of “expanding IN parameters”. This is a bindparam() that behaves identically to any other bound parameter except that the parameter value will be rendered literally into the SQL string before sending it to the DBAPI cursor.execute() method. The new parameter is used internally by the SQL Server and Oracle dialects, so that the drivers receive the literal rendered value but the rest of SQLAlchemy can still consider this as a bound parameter. The above two statements when stringified using str(statement.compile(dialect=<dialect>)) now look like:
SELECT TOP [POSTCOMPILE_param_1] mytable.id, mytable.data FROM mytable
and:
SELECT anon_1.id, anon_1.data
FROM (
    SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */
        anon_2.id AS id, anon_2.data AS data, ROWNUM AS ora_rn
    FROM (
        SELECT mytable.id, mytable.data FROM mytable
    ) anon_2
    WHERE ROWNUM <= [POSTCOMPILE_param_1]
) anon_1
WHERE ora_rn > [POSTCOMPILE_param_2]
The [POSTCOMPILE_<param>] format is also what is seen when an “expanding IN” is used.
When viewing the SQL logging output, the final form of the statement will be seen:
SELECT anon_1.id, anon_1.data
FROM (
    SELECT /*+ FIRST_ROWS(5) */
        anon_2.id AS id, anon_2.data AS data, ROWNUM AS ora_rn
    FROM (
        SELECT mytable.id AS id, mytable.data AS data FROM mytable
    ) anon_2
    WHERE ROWNUM <= 8
) anon_1
WHERE ora_rn > 3
The “post compile parameter” feature is exposed as public API through the bindparam.literal_execute parameter, however is currently not intended for general use. The literal values are rendered using the TypeEngine.literal_processor() of the underlying datatype, which in SQLAlchemy has extremely limited scope, supporting only integers and simple string values.
Connection-level transactions can now be inactive based on subtransaction¶
A Connection now includes the behavior where a Transaction can be made inactive due to a rollback on an inner transaction, however the Transaction will not clear until it is itself rolled back.
This is essentially a new error condition which will disallow statement executions from proceeding on a Connection if an inner “sub” transaction has been rolled back. The behavior works very similarly to that of the ORM Session, where if an outer transaction has been begun, it needs to be rolled back to clear the invalid transaction; this behavior is described in “This Session’s transaction has been rolled back due to a previous exception during flush.” (or similar).
While the Connection has had a less strict behavioral pattern than the Session, this change was made as it helps to identify when a subtransaction has rolled back the DBAPI transaction, however the external code isn’t aware of this and attempts to continue proceeding, which in fact runs operations on a new transaction. The “test harness” pattern described at Joining a Session into an External Transaction (such as for test suites) is the common place for this to occur.
The “subtransaction” feature of Core and ORM is itself deprecated and will no longer be present in version 2.0. As a result, this new error condition is itself temporary as it will no longer apply once subtransactions are removed.
In order to work with the 2.0 style behavior that does not include subtransactions, use the create_engine.future parameter on create_engine().
The error message is described in the errors page atThis connection is on an inactive transaction. Please rollback() fully before proceeding.
Enum and Boolean datatypes no longer default to “create constraint”¶
The Enum.create_constraint and Boolean.create_constraint parameters now default to False, indicating that when a so-called “non-native” version of these two datatypes is created, a CHECK constraint will not be generated by default. These CHECK constraints present schema-management maintenance complexities that should be opted in to, rather than being turned on by default.
To ensure that a CREATE CONSTRAINT is emitted for these types, set these flags to True:
classSpam(Base):__tablename__="spam"id=Column(Integer,primary_key=True)boolean=Column(Boolean(create_constraint=True))enum=Column(Enum("a","b","c",create_constraint=True))
New Features - ORM¶
Raiseload for Columns¶
The “raiseload” feature, which raises InvalidRequestError when an unloaded attribute is accessed, is now available for column-oriented attributes using the defer.raiseload parameter of defer(). This works in the same manner as that of the raiseload() option used by relationship loading:
book = session.query(Book).options(defer(Book.summary, raiseload=True)).first()

# would raise an exception
book.summary
To configure column-level raiseload on a mapping, the deferred.raiseload parameter of deferred() may be used. The undefer() option may then be used at query time to eagerly load the attribute:
classBook(Base):__tablename__="book"book_id=Column(Integer,primary_key=True)title=Column(String(200),nullable=False)summary=deferred(Column(String(2000)),raiseload=True)excerpt=deferred(Column(Text),raiseload=True)book_w_excerpt=session.query(Book).options(undefer(Book.excerpt)).first()
It was originally considered that the existing raiseload() option that works for relationship() attributes be expanded to also support column-oriented attributes. However, this would break the “wildcard” behavior of raiseload(), which is documented as allowing one to prevent all relationships from loading:
session.query(Order).options(joinedload(Order.items), raiseload("*"))
Above, if we had expanded raiseload() to accommodate for columns as well, the wildcard would also prevent columns from loading and thus be a backwards incompatible change; additionally, it’s not clear if raiseload() covered both column expressions and relationships, how one would achieve the effect above of only blocking relationship loads, without new API being added. So to keep things simple, the option for columns remains on defer():
- raiseload() - query option to raise for relationship loads
- defer.raiseload - query option to raise for column expression loads
As part of this change, the behavior of “deferred” in conjunction with attribute expiration has changed. Previously, when an object would be marked as expired, and then unexpired via the access of one of the expired attributes, attributes which were mapped as “deferred” at the mapper level would also load. This has been changed such that an attribute that is deferred in the mapping will never “unexpire”; it only loads when accessed as part of the deferral loader.
An attribute that is not mapped as “deferred”, however was deferred at query time via the defer() option, will be reset when the object or attribute is expired; that is, the deferred option is removed. This is the same behavior as was present previously.
ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases¶
The change in psycopg2 dialect features “execute_values” with RETURNING for INSERT statements by default adds support for “executemany” + “RETURNING” at the same time in Core, which is now enabled for the psycopg2 dialect by default using the psycopg2 execute_values() extension. The ORM flush process now makes use of this feature such that the retrieval of newly generated primary key values and server defaults can be achieved while not losing the performance benefits of being able to batch INSERT statements together. Additionally, psycopg2’s execute_values() extension itself provides a five-fold performance improvement over psycopg2’s default “executemany” implementation, by rewriting an INSERT statement to include many “VALUES” expressions all in one statement rather than invoking the same statement repeatedly, as psycopg2 lacks the ability to PREPARE the statement ahead of time as would normally be expected for this approach to be performant.
SQLAlchemy includes a performance suite within its examples, where we can compare the times generated for the “bulk_inserts” runner against 1.3 and 1.4, revealing a 3x-5x speedup for most flavors of batch insert:
# 1.3
$ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
test_flush_no_pk : (100000 iterations); total time 14.051527 sec
test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
test_flush_pk_given : (100000 iterations); total time 7.863680 sec
test_bulk_save : (100000 iterations); total time 6.780378 sec
test_bulk_insert_mappings : (100000 iterations); total time 5.363070 sec
test_core_insert : (100000 iterations); total time 5.362647 sec

# 1.4 with enhancement
$ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
test_flush_no_pk : (100000 iterations); total time 3.820807 sec
test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
test_flush_pk_given : (100000 iterations); total time 4.037789 sec
test_bulk_save : (100000 iterations); total time 2.604446 sec
test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
test_core_insert : (100000 iterations); total time 0.958976 sec
Note that the execute_values() extension modifies the INSERT statement in the psycopg2 layer, after it’s been logged by SQLAlchemy. So with SQL logging, one will see the parameter sets batched together, but the joining of multiple “values” will not be visible on the application side:
2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id
2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'} ... displaying 10 of 4999 total bound parameter sets ... {'data': 'data 4998'}, {'data': 'data 4999'})
2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT
The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side:
2020-06-27 19:08:18.169 EDT [26960] LOG: statement: INSERT INTO a (data) VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data 7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'), ... ('data 999'),('data 1000') RETURNING a.id
2020-06-27 19:08:18.175 EDT [26960] LOG: statement: INSERT INTO a (data) VALUES ('data 1001'),('data 1002'),('data 1003'),('data 1004'),('data 1005'),('data 1006'),('data 1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...
The feature batches rows into groups of 1000 by default, which can be affected using the executemany_values_page_size argument documented at Psycopg2 Fast Execution Helpers.
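A minimal sketch of adjusting the page size, assuming a local PostgreSQL database URL:

from sqlalchemy import create_engine

# raise the number of rows sent per VALUES batch from the default of 1000
engine = create_engine(
    "postgresql+psycopg2://scott:tiger@localhost/test",
    executemany_values_page_size=5000,
)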
ORM Bulk Update and Delete use RETURNING for “fetch” strategy when available¶
An ORM bulk update or delete that uses the “fetch” strategy:
sess.query(User).filter(User.age > 29).update(
    {"age": User.age - 10}, synchronize_session="fetch"
)
will now use RETURNING if the backend database supports it; this currently includes PostgreSQL and SQL Server (the Oracle dialect does not support RETURNING of multiple rows):
UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s RETURNING users.id
[generated in 0.00060s] {'age_int_1': 10, 'age_int_2': 29}
Col ('id',)
Row (2,)
Row (4,)
For backends that do not support RETURNING of multiple rows, the previous approach of emitting SELECT for the primary keys beforehand is still used:
SELECT users.id FROM users WHERE users.age_int > %(age_int_1)s
[generated in 0.00043s] {'age_int_1': 29}
Col ('id',)
Row (2,)
Row (4,)

UPDATE users SET age_int=(users.age_int - %(age_int_1)s) WHERE users.age_int > %(age_int_2)s
[generated in 0.00102s] {'age_int_1': 10, 'age_int_2': 29}
One of the intricate challenges of this change is to support cases such as the horizontal sharding extension, where a single bulk update or delete may be multiplexed among backends, some of which support RETURNING and some of which don’t. The new 1.4 execution architecture supports this case so that the “fetch” strategy can be left intact with a graceful degrade to using a SELECT, rather than having to add a new “returning” strategy that would not be backend-agnostic.
As part of this change, the “fetch” strategy is also made much more efficient in that it will no longer expire the objects located which match the rows, for Python expressions used in the SET clause which can be evaluated in Python; these are instead assigned directly onto the object in the same way as the “evaluate” strategy. Only for SQL expressions that can’t be evaluated does it fall back to expiring the attributes. The “evaluate” strategy has also been enhanced to fall back to “expire” for a value that cannot be evaluated.
Behavioral Changes - ORM¶
The “KeyedTuple” object returned by Query is replaced by Row¶
As discussed at RowProxy is no longer a “proxy”; is now called Row and behaves like an enhanced named tuple, the Core RowProxy object is now replaced by a class called Row. The base Row object now behaves more fully like a named tuple, and as such it is now used as the basis for tuple-like results returned by the Query object, rather than the previous “KeyedTuple” class.
The rationale is so that by SQLAlchemy 2.0, both Core and ORM SELECT statements will return result rows using the same Row object which behaves like a named tuple. Dictionary-like functionality is available from Row via the Row._mapping attribute. In the interim, Core result sets will make use of a Row subclass LegacyRow which maintains the previous dict/tuple hybrid behavior for backwards compatibility, while the Row class will be used directly for ORM tuple results returned by the Query object.
Effort has been made to get most of the featureset of Row to be available within the ORM, meaning that access by string name as well as entity / column should work:
row = s.query(User, Address).join(User.addresses).first()

row._mapping[User]  # same as row[0]
row._mapping[Address]  # same as row[1]
row._mapping["User"]  # same as row[0]
row._mapping["Address"]  # same as row[1]

u1 = aliased(User)
row = s.query(u1).only_return_tuples(True).first()
row._mapping[u1]  # same as row[0]

row = s.query(User.id, Address.email_address).join(User.addresses).first()

row._mapping[User.id]  # same as row[0]
row._mapping["id"]  # same as row[0]
row._mapping[users.c.id]  # same as row[0]
Session features new “autobegin” behavior¶
Previously, the Session in its default mode of autocommit=False would internally begin a SessionTransaction object immediately on construction, and additionally would create a new one after each call to Session.rollback() or Session.commit().
The new behavior is that this SessionTransaction object is now created on demand only, when methods such as Session.add() or Session.execute() are called. However it is also now possible to call Session.begin() explicitly in order to begin the transaction, even in autocommit=False mode, thus matching the behavior of the future-style Connection.
The behavioral changes this indicates are:
- The Session can now be in the state where no transaction is begun, even in autocommit=False mode. Previously, this state was only available in “autocommit” mode.
- Within this state, the Session.commit() and Session.rollback() methods are no-ops. Code that relies upon these methods to expire all objects should make explicit use of either Session.begin() or Session.expire_all() to suit their use case.
- The SessionEvents.after_transaction_create() event hook is not emitted immediately when the Session is created, or after a Session.rollback() or Session.commit() completes.
- The Session.close() method also does not imply implicit begin of a new SessionTransaction.
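A minimal sketch of the on-demand behavior, assuming a configured sessionmaker named Session and a mapped class SomeObject:

session = Session()

# no transaction has begun yet
assert not session.in_transaction()

session.add(SomeObject())

# adding an object triggered "autobegin"
assert session.in_transaction()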
Rationale¶
The Session object’s default behavior of autocommit=False historically has meant that there is always a SessionTransaction object in play, associated with the Session via the Session.transaction attribute. When the given SessionTransaction was complete, due to a commit, rollback, or close, it was immediately replaced with a new one. The SessionTransaction by itself does not imply the usage of any connection-oriented resources, so this long-standing behavior has a particular elegance to it in that the state of Session.transaction is always predictable as non-None.
However, as part of the initiative in #5056 to greatly reduce reference cycles, this assumption means that calling upon Session.close() results in a Session object that still has reference cycles and is more expensive to clean up, not to mention that there is a small overhead in constructing the SessionTransaction object, which meant that there would be unnecessary overhead created for a Session that for example invoked Session.commit() and then Session.close().
As such, it was decided that Session.close() should leave the internal state of self.transaction, now referred to internally as self._transaction, as None, and that a new SessionTransaction should only be created when needed. For consistency and code coverage, this behavior was also expanded to include all the points at which “autobegin” is expected, not just when Session.close() were called.
In particular, this causes a behavioral change for applications which subscribe to the SessionEvents.after_transaction_create() event hook; previously, this event would be emitted when the Session were first constructed, as well as for most actions that closed the previous transaction and would emit SessionEvents.after_transaction_end(). The new behavior is that SessionEvents.after_transaction_create() is emitted on demand, when the Session has not yet created a new SessionTransaction object and mapped objects are associated with the Session through methods like Session.add() and Session.delete(), when the Session.transaction attribute is called upon, when the Session.flush() method has tasks to complete, etc.
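For applications that subscribe to this hook, a brief sketch of where the event now fires; the listener body here is illustrative only:

from sqlalchemy import event
from sqlalchemy.orm import Session


@event.listens_for(Session, "after_transaction_create")
def log_transaction_create(session, transaction):
    # in 1.4 this fires on demand, e.g. upon the first add() or
    # execute(), rather than when the Session is constructed
    print("transaction created:", transaction)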
In addition, code which relies upon the Session.commit() or Session.rollback() method to unconditionally expire all objects can no longer do so. Code which needs to expire all objects when no change has occurred should be calling Session.expire_all() for this case.
Besides the change in when the SessionEvents.after_transaction_create() event is emitted as well as the no-op nature of Session.commit() or Session.rollback(), the change should have no other user-visible impact on the Session object’s behavior; the Session will continue to have the behavior that it remains usable for new operations after Session.close() is called, and the sequencing of how the Session interacts with the Engine and the database itself should also remain unaffected, since these operations were already operating in an on-demand fashion.
Viewonly relationships don’t synchronize backrefs¶
In #5149 in 1.3.14, SQLAlchemy began emitting a warning when the relationship.backref or relationship.back_populates keywords would be used at the same time as the relationship.viewonly flag on the target relationship. This was because a “viewonly” relationship does not actually persist changes made to it, which could cause some misleading behaviors to occur. However, in #5237, we sought to refine this behavior as there are legitimate use cases to have backrefs set up on viewonly relationships, including that back populates attributes are used in some cases by the relationship lazy loaders to determine that an additional eager load in the other direction is not necessary, as well as that back populates can be used for mapper introspection and that backref() can be a convenient way to set up bi-directional relationships.
The solution then was to make the “mutation” that occurs from a backref an optional thing, using the relationship.sync_backref flag. In 1.4 the value of relationship.sync_backref defaults to False for a relationship target that also sets relationship.viewonly. This indicates that any changes made to a relationship with viewonly will not impact the state of the other side or of the Session in any way:
class User(Base):
    # ...

    addresses = relationship(Address, backref=backref("user", viewonly=True))


class Address(Base):
    # ...


u1 = session.query(User).filter_by(name="x").first()

a1 = Address()
a1.user = u1
Above, the a1 object will not be added to the u1.addresses collection, nor will the a1 object be added to the session. Previously, both of these things would be true. The warning that relationship.sync_backref should be set to False when relationship.viewonly is False is no longer emitted as this is now the default behavior.
cascade_backrefs behavior deprecated for removal in 2.0¶
SQLAlchemy has long had a behavior of cascading objects into the Session based on backref assignment. Given a User below already in a Session, assigning it to the Address.user attribute of an Address object, assuming a bidirectional relationship is set up, would mean that the Address also gets put into the Session at that point:
u1 = User()
session.add(u1)

a1 = Address()
a1.user = u1  # <--- adds "a1" to the Session
The above behavior was an unintended side effect of backref behavior, in that since a1.user implies u1.addresses.append(a1), a1 would get cascaded into the Session. This remains the default behavior throughout 1.4. At some point, a new flag relationship.cascade_backrefs was added to disable the above behavior, along with backref.cascade_backrefs to set this when the relationship is specified by relationship.backref, as it can be surprising and also gets in the way of some operations where the object would be placed in the Session too early and get prematurely flushed.
In 2.0, the default behavior will be that “cascade_backrefs” is False, and additionally there will be no “True” behavior as this is not generally a desirable behavior. When 2.0 deprecation warnings are enabled, a warning will be emitted when a “backref cascade” actually takes place. To get the new behavior, either set relationship.cascade_backrefs and backref.cascade_backrefs to False on any target relationships, as is already supported in 1.3 and earlier, or alternatively make use of the Session.future flag to opt in to 2.0-style mode:
Session = sessionmaker(engine, future=True)

with Session() as session:
    u1 = User()
    session.add(u1)

    a1 = Address()
    a1.user = u1  # <--- will not add "a1" to the Session
Eager loaders emit during unexpire operations¶
A long sought behavior was that when an expired object is accessed, configured eager loaders will run in order to eagerly load relationships on the expired object when the object is refreshed or otherwise unexpired. This behavior has now been added, so that joinedloaders will add inline JOINs as usual, and selectin/subquery loaders will run an “immediateload” operation for a given relationship, when an expired object is unexpired or an object is refreshed:
>>> a1 = session.query(A).options(joinedload(A.bs)).first()
>>> a1.data = "new data"
>>> session.commit()
Above, the A object was loaded with a joinedload() option associated with it in order to eagerly load the bs collection. After the session.commit(), the state of the object is expired. Upon accessing the .data column attribute, the object is refreshed and this will now include the joinedload operation as well:
>>> a1.data
SELECT a.id AS a_id, a.data AS a_data, b_1.id AS b_1_id, b_1.a_id AS b_1_a_id
FROM a LEFT OUTER JOIN b AS b_1 ON a.id = b_1.a_id
WHERE a.id = ?
The behavior applies both to loader strategies applied to the relationship() directly, as well as with options used with Query.options(), provided that the object was originally loaded by that query.
For the “secondary” eager loaders “selectinload” and “subqueryload”, the SQL strategy for these loaders is not necessary in order to eagerly load attributes on a single object; so they will instead invoke the “immediateload” strategy in a refresh scenario, which resembles the query emitted by “lazyload”, emitted as an additional query:
>>> a1 = session.query(A).options(selectinload(A.bs)).first()
>>> a1.data = "new data"
>>> session.commit()
>>> a1.data
SELECT a.id AS a_id, a.data AS a_data
FROM a
WHERE a.id = ?
(1,)
SELECT b.id AS b_id, b.a_id AS b_a_id
FROM b
WHERE ? = b.a_id
(1,)
Note that a loader option does not apply to an object that was introduced into the Session in a different way. That is, if the a1 object were just persisted in this Session, or was loaded with a different query before the eager option had been applied, then the object doesn’t have an eager load option associated with it. This is not a new concept, however users who are looking for the eagerload on refresh behavior may find this to be more noticeable.
Column loaders such as deferred(), with_expression() only take effect when indicated on the outermost, full entity query¶
Note
This change note was not present in earlier versions of this document, however is relevant for all SQLAlchemy 1.4 versions.
A behavior that was never supported in 1.3 and previous versions, yet nonetheless would have a particular effect, was to repurpose column loader options such as defer() and with_expression() in subqueries in order to control which SQL expressions would be in the columns clause of each subquery. A typical example would be to construct UNION queries, such as:
q1 = session.query(User).options(with_expression(User.expr, literal("u1")))
q2 = session.query(User).options(with_expression(User.expr, literal("u2")))
q1.union_all(q2).all()
In version 1.3, the with_expression() option would take effect for each element of the UNION, such as:
SELECT anon_1.anon_2 AS anon_1_anon_2, anon_1.user_account_id AS anon_1_user_account_id,
    anon_1.user_account_name AS anon_1_user_account_name
FROM (
    SELECT ? AS anon_2, user_account.id AS user_account_id,
        user_account.name AS user_account_name
    FROM user_account
    UNION ALL
    SELECT ? AS anon_3, user_account.id AS user_account_id,
        user_account.name AS user_account_name
    FROM user_account
) AS anon_1
('u1', 'u2')
SQLAlchemy 1.4’s notion of loader options has been made more strict, and as such they are applied to the outermost part of the query only, which is the SELECT that is intended to populate the actual ORM entities to be returned; the query above in 1.4 will produce:
SELECT ? AS anon_1, anon_2.user_account_id AS anon_2_user_account_id,
    anon_2.user_account_name AS anon_2_user_account_name
FROM (
    SELECT user_account.id AS user_account_id, user_account.name AS user_account_name
    FROM user_account
    UNION ALL
    SELECT user_account.id AS user_account_id, user_account.name AS user_account_name
    FROM user_account
) AS anon_2
('u1',)
that is, the options for the Query were taken from the first element of the UNION, since all loader options are only to be at the topmost level. The option from the second query was ignored.
Rationale¶
This behavior now more closely matches that of other kinds of loader options, such as relationship loader options like joinedload(), in all SQLAlchemy versions, 1.3 and earlier included, which in a UNION situation were already copied out to the top most level of the query, and only taken from the first element of the UNION, discarding any options on other parts of the query.
This implicit copying and selective ignoring of options, demonstrated above as being fairly arbitrary, is a legacy behavior that’s only part of Query, and is a particular example of where Query and its means of applying Query.union_all() falls short, as it’s ambiguous how to turn a single SELECT into a UNION of itself and another query and how loader options should be applied to that new statement.
SQLAlchemy 1.4’s behavior can be demonstrated as generally superior to that of 1.3 for a more common case of using defer(). The following query:
q1 = session.query(User).options(defer(User.name))
q2 = session.query(User).options(defer(User.name))
q1.union_all(q2).all()
In 1.3 would awkwardly add NULL to the inner queries and then SELECT it:
SELECT anon_1.anon_2 AS anon_1_anon_2, anon_1.user_account_id AS anon_1_user_account_id
FROM (
    SELECT NULL AS anon_2, user_account.id AS user_account_id
    FROM user_account
    UNION ALL
    SELECT NULL AS anon_2, user_account.id AS user_account_id
    FROM user_account
) AS anon_1
If all queries didn’t have the identical options set up, the above scenariowould raise an error due to not being able to form a proper UNION.
Whereas in 1.4, the option is applied only at the top layer, omitting the fetch for User.name, and this complexity is avoided:
SELECT anon_1.user_account_id AS anon_1_user_account_id
FROM (
    SELECT user_account.id AS user_account_id, user_account.name AS user_account_name
    FROM user_account
    UNION ALL
    SELECT user_account.id AS user_account_id, user_account.name AS user_account_name
    FROM user_account
) AS anon_1
Correct Approach¶
Using 2.0-style querying, no warning is emitted at the moment, however the nested with_expression() options are consistently ignored as they don’t apply to an entity being loaded, and are not implicitly copied anywhere. The query below produces no output for the with_expression() calls:
s1 = select(User).options(with_expression(User.expr, literal("u1")))
s2 = select(User).options(with_expression(User.expr, literal("u2")))

stmt = union_all(s1, s2)

session.scalars(select(User).from_statement(stmt)).all()
producing the SQL:
SELECT user_account.id, user_account.name
FROM user_account
UNION ALL
SELECT user_account.id, user_account.name
FROM user_account
To correctly apply with_expression() to the User entity, it should be applied to the outermost level of the query, using an ordinary SQL expression inside the columns clause of each SELECT:
s1=select(User,literal("u1").label("some_literal"))s2=select(User,literal("u2").label("some_literal"))stmt=union_all(s1,s2)session.scalars(select(User).from_statement(stmt).options(with_expression(User.expr,stmt.selected_columns.some_literal))).all()
Which will produce the expected SQL:
SELECT user_account.id, user_account.name, ? AS some_literal
FROM user_account
UNION ALL
SELECT user_account.id, user_account.name, ? AS some_literal
FROM user_account
The User objects themselves will include this expression in their contents underneath User.expr.
Accessing an uninitialized collection attribute on a transient object no longer mutates __dict__¶
It has always been SQLAlchemy’s behavior that accessing mapped attributes on a newly created object returns an implicitly generated value, rather than raising AttributeError, such as None for scalar attributes or [] for a list-holding relationship:
>>> u1 = User()
>>> u1.name
None
>>> u1.addresses
[]
The rationale for the above behavior was originally to make ORM objects easier to work with. Since an ORM object represents an empty row when first created without any state, it is intuitive that its un-accessed attributes would resolve to None (or SQL NULL) for scalars and to empty collections for relationships. In particular, it makes possible an extremely common pattern of being able to mutate the new collection without manually creating and assigning an empty collection first:
>>> u1 = User()
>>> u1.addresses.append(Address())  # no need to assign u1.addresses = []
Up until version 1.0 of SQLAlchemy, the behavior of this initialization system for both scalar attributes as well as collections would be that the None or empty collection would be populated into the object’s state, e.g. __dict__. This meant that the following two operations were equivalent:
>>> u1 = User()
>>> u1.name = None  # explicit assignment

>>> u2 = User()
>>> u2.name  # implicit assignment just by accessing it
None
Where above, both u1 and u2 would have the value None populated in the value of the name attribute. Since this is a SQL NULL, the ORM would skip including these values within an INSERT so that SQL-level defaults take place, if any, else the value defaults to NULL on the database side.
In version 1.0 as part of Changes to attribute events and other operations regarding attributes that have no pre-existing value, this behavior was refined so that the None value was no longer populated into __dict__, only returned. Besides removing the mutating side effect of a getter operation, this change also made it possible to set columns that did have server defaults to the value NULL by actually assigning None, which was now distinguished from just reading it.
The change however did not accommodate for collections, where returning an empty collection that is not assigned meant that this mutable collection would be different each time and also would not be able to correctly accommodate for mutating operations (e.g. append, add, etc.) called upon it. While the behavior continued to generally not get in anyone’s way, an edge case was eventually identified in #4519 where this empty collection could be harmful, which is when the object is merged into a session:
>>> u1 = User(id=1)  # create an empty User to merge with id=1 in the database
>>> merged1 = session.merge(
...     u1
... )  # value of merged1.addresses is unchanged from that of the DB

>>> u2 = User(id=2)  # create an empty User to merge with id=2 in the database
>>> u2.addresses
[]
>>> merged2 = session.merge(u2)  # value of merged2.addresses has been emptied in the DB
Above, the .addresses collection on merged1 will contain all the Address() objects that were already in the database. merged2 will not; because it has an empty list implicitly assigned, the .addresses collection will be erased. This is an example of where this mutating side effect can actually mutate the database itself.
While it was considered that perhaps the attribute system should begin using strict “plain Python” behavior, raising AttributeError in all cases for non-existent attributes on non-persistent objects and requiring that all collections be explicitly assigned, such a change would likely be too extreme for the vast number of applications that have relied upon this behavior for many years, leading to a complex rollout / backwards compatibility problem as well as the likelihood that workarounds to restore the old behavior would become prevalent, thus rendering the whole change ineffective in any case.
The change then is to keep the default producing behavior, but to finally make the non-mutating behavior of scalars a reality for collections as well, via the addition of additional mechanics in the collection system. When accessing the empty attribute, the new collection is created and associated with the state, however is not added to __dict__ until it is actually mutated:
>>> u1 = User()
>>> l1 = u1.addresses  # new list is created, associated with the state
>>> assert u1.addresses is l1  # you get the same list each time you access it
>>> assert (
...     "addresses" not in u1.__dict__
... )  # but it won't go into __dict__ until it's mutated
>>> from sqlalchemy import inspect
>>> inspect(u1).attrs.addresses.history
History(added=None, unchanged=None, deleted=None)
When the list is changed, then it becomes part of the tracked changes to be persisted to the database:
>>> l1.append(Address())
>>> assert "addresses" in u1.__dict__
>>> inspect(u1).attrs.addresses.history
History(added=[<__main__.Address object at 0x7f49b725eda0>], unchanged=[], deleted=[])
This change is expected to have nearly no impact on existing applications in any way, except that it has been observed that some applications may be relying upon the implicit assignment of this collection, such as to assert that the object contains certain values based on its __dict__:
>>> u1 = User()
>>> u1.addresses
[]
# this will now fail, would pass before
>>> assert {k: v for k, v in u1.__dict__.items() if not k.startswith("_")} == {
...     "addresses": []
... }
or to ensure that the collection won’t require a lazy load to proceed, the (admittedly awkward) code below will now also fail:
>>> u1 = User()
>>> u1.addresses
[]
>>> s.add(u1)
>>> s.flush()
>>> s.close()
>>> u1.addresses  # <-- will fail, .addresses is not loaded and object is detached
Applications that rely upon the implicit mutating behavior of collections will need to be changed so that they assign the desired collection explicitly:
>>> u1.addresses = []
The “New instance conflicts with existing identity” error is now a warning¶
SQLAlchemy has always had logic to detect when an object in the Session to be inserted has the same primary key as an object that is already present:
classProduct(Base):__tablename__="product"id=Column(Integer,primary_key=True)session=Session(engine)# add Product with primary key 1session.add(Product(id=1))session.flush()# add another Product with same primary keysession.add(Product(id=1))s.commit()# <-- will raise FlushError
The change is that the FlushError is altered to be only a warning:
sqlalchemy/orm/persistence.py:408: SAWarning: New instance <Product at 0x7f1ff65e0ba8> with identity key (<class '__main__.Product'>, (1,), None) conflicts with persistent instance <Product at 0x7f1ff60a4550>
Subsequent to that, the condition will attempt to insert the row into the database which will emit IntegrityError, which is the same error that would be raised if the primary key identity was not already present in the Session:
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: product.id
The rationale is to allow code that is using IntegrityError to catch duplicates to function regardless of the existing state of the Session, as is often done using savepoints:
# add another Product with same primary key
try:
    with session.begin_nested():
        session.add(Product(id=1))
except exc.IntegrityError:
    print("row already exists")
The above logic was not fully feasible earlier, as in the case that the Product object with the existing identity were already in the Session, the code would also have to catch FlushError, which additionally is not filtered for the specific condition of integrity issues. With the change, the above block behaves consistently with the exception of the warning also being emitted.
Since the logic in question deals with the primary key, all databases emit an integrity error in the case of primary key conflicts on INSERT. The case where an error would not be raised, that would have earlier, is the extremely unusual scenario of a mapping that defines a primary key on the mapped selectable that is more restrictive than what is actually configured in the database schema, such as when mapping to joins of tables or when defining additional columns as part of a composite primary key that is not actually constrained in the database schema. However, these situations also work more consistently in that the INSERT would theoretically proceed whether or not the existing identity were still in the database. The warning can also be configured to raise an exception using the Python warnings filter.
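A brief sketch of escalating the warning, using the standard library warnings filter:

import warnings

from sqlalchemy import exc

# turn SAWarning (including the identity-conflict warning above)
# into an exception rather than a printed warning
warnings.simplefilter("error", exc.SAWarning)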
Persistence-related cascade operations disallowed with viewonly=True¶
When a relationship() is set as viewonly=True using the relationship.viewonly flag, it indicates this relationship should only be used to load data from the database, and should not be mutated or involved in a persistence operation. In order to ensure this contract works successfully, the relationship can no longer specify relationship.cascade settings that make no sense in terms of “viewonly”.
The primary targets here are the “delete, delete-orphan” cascades, which through 1.3 continued to impact persistence even if viewonly were True, which is a bug; even if viewonly were True, an object would still cascade these two operations onto the related object if the parent were deleted or the object were detached. Rather than modify the cascade operations to check for viewonly, the configuration of both of these together is simply disallowed:
class User(Base):
    # ...

    # this is now an error
    addresses = relationship("Address", viewonly=True, cascade="all, delete-orphan")
The above will raise:
sqlalchemy.exc.ArgumentError: Cascade settings
"delete, delete-orphan, merge, save-update" apply to persistence
operations and should not be combined with a viewonly=True relationship.
Applications that have this issue should be emitting a warning as of SQLAlchemy 1.3.12, and for the above error the solution is to remove the cascade settings for a viewonly relationship.
Stricter behavior when querying inheritance mappings using custom queries¶
This change applies to the scenario where a joined- or single- table inheritance subclass entity is being queried, given a completed SELECT subquery to select from. If the given subquery returns rows that do not correspond to the requested polymorphic identity or identities, an error is raised. Previously, this condition would pass silently under joined table inheritance, returning an invalid subclass, and under single table inheritance, the Query would be adding additional criteria against the subquery to limit the results, which could inappropriately interfere with the intent of the query.
Given the example mapping of Employee, Engineer(Employee), Manager(Employee), in the 1.3 series if we were to emit the following query against a joined inheritance mapping:
s = Session(e)
s.add_all([Engineer(), Manager()])
s.commit()

print(s.query(Manager).select_entity_from(s.query(Employee).subquery()).all())
The subquery selects both the Engineer and the Manager rows, and even though the outer query is against Manager, we get a non-Manager object back:
SELECT anon_1.type AS anon_1_type, anon_1.id AS anon_1_id
FROM (SELECT employee.type AS type, employee.id AS id
FROM employee) AS anon_1
2020-01-29 18:04:13,524 INFO sqlalchemy.engine.base.Engine ()
[<__main__.Engineer object at 0x7f7f5b9a9810>, <__main__.Manager object at 0x7f7f5b9a9750>]
The new behavior is that this condition raises an error:
sqlalchemy.exc.InvalidRequestError: Row with identity key
(<class '__main__.Employee'>, (1,), None) can't be loaded into an object;
the polymorphic discriminator column '%(140205120401296 anon)s.type'
refers to mapped class Engineer->engineer, which is not a sub-mapper of
the requested mapped class Manager->manager
The above error only raises if the primary key columns of that entity are non-NULL. If there’s no primary key for a given entity in a row, no attempt to construct an entity is made.
In the case of single inheritance mapping, the change in behavior is slightly more involved; if Engineer and Manager above are mapped with single table inheritance, in 1.3 the following query would be emitted and only a Manager object is returned:
SELECT anon_1.type AS anon_1_type, anon_1.id AS anon_1_id
FROM (SELECT employee.type AS type, employee.id AS id
FROM employee) AS anon_1
WHERE anon_1.type IN (?)
2020-01-29 18:08:32,975 INFO sqlalchemy.engine.base.Engine ('manager',)
[<__main__.Manager object at 0x7ff1b0200d50>]
The Query added the “single table inheritance” criteria to the subquery, editorializing on the intent that was originally set up by it. This behavior was added in version 1.0 in #3891, and creates a behavioral inconsistency between “joined” and “single” table inheritance, and additionally modifies the intent of the given query, which may intend to return additional rows where the columns that correspond to the inheriting entity are NULL, which is a valid use case. The behavior is now equivalent to that of joined table inheritance, where it is assumed that the subquery returns the correct rows and an error is raised if an unexpected polymorphic identity is encountered:
SELECT anon_1.type AS anon_1_type, anon_1.id AS anon_1_id
FROM (SELECT employee.type AS type, employee.id AS id
FROM employee) AS anon_1
2020-01-29 18:13:10,554 INFO sqlalchemy.engine.base.Engine ()
Traceback (most recent call last):
# ...
sqlalchemy.exc.InvalidRequestError: Row with identity key
(<class '__main__.Employee'>, (1,), None) can't be loaded into an object;
the polymorphic discriminator column '%(140700085268432 anon)s.type'
refers to mapped class Engineer->employee, which is not a sub-mapper of
the requested mapped class Manager->employee
The correct adjustment to the situation as presented above, which also works on 1.3, is to adjust the given subquery so that it correctly filters the rows based on the discriminator column:
print(
    s.query(Manager)
    .select_entity_from(
        s.query(Employee)
        .filter(Employee.discriminator == "manager")
        .subquery()
    )
    .all()
)

SELECT anon_1.type AS anon_1_type, anon_1.id AS anon_1_id
FROM (SELECT employee.type AS type, employee.id AS id
FROM employee
WHERE employee.type = ?) AS anon_1
2020-01-29 18:14:49,770 INFO sqlalchemy.engine.base.Engine ('manager',)
[<__main__.Manager object at 0x7f70e13fca90>]
Dialect Changes¶
pg8000 minimum version is 1.16.6, supports Python 3 only¶
Support for the pg8000 dialect has been dramatically improved, with help from the project’s maintainer.
Due to API changes, the pg8000 dialect now requires version 1.16.6 or greater. The pg8000 series has dropped Python 2 support as of the 1.13 series. Python 2 users who require pg8000 should ensure their requirements are pinned at SQLAlchemy<1.4.
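A sketch of such a pin in a requirements file; the pg8000 upper bound here is an inference from its 1.13 series dropping Python 2, not something stated by the dialect itself:

# requirements.txt for a Python 2 project that needs pg8000 (sketch)
SQLAlchemy<1.4
pg8000<1.13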
psycopg2 version 2.7 or higher is required for the PostgreSQL psycopg2 dialect¶
The psycopg2 dialect relies upon many features of psycopg2 released in the past few years. To simplify the dialect, version 2.7, released in March 2017, is now the minimum version required.
psycopg2 dialect no longer has limitations regarding bound parameter names¶
SQLAlchemy 1.3 was not able to accommodate bound parameter names that included percent signs or parentheses under the psycopg2 dialect. This in turn meant that column names which included these characters were also problematic, as INSERT and other DML statements would generate parameter names that matched those of the columns, which would then cause failures. The workaround was to make use of the Column.key parameter so that an alternate name would be used to generate the parameter, or otherwise the parameter style of the dialect had to be changed at the create_engine() level. As of SQLAlchemy 1.4.0beta3 all naming limitations have been removed and parameters are fully escaped in all scenarios, so these workarounds are no longer necessary.
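A minimal sketch of the now-unnecessary pre-1.4 workaround, assuming an illustrative table with a column whose name contains a percent sign:

from sqlalchemy import Column, MetaData, Numeric, Table

metadata = MetaData()
stats = Table(
    "stats",
    metadata,
    # Column.key supplies an alternate name, "growth", used to generate
    # the bound parameter instead of the problematic "growth %" name;
    # under 1.4 this indirection is no longer needed
    Column("growth %", Numeric, key="growth"),
)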
psycopg2 dialect features “execute_values” with RETURNING for INSERT statements by default¶
The first half of a significant performance enhancement for PostgreSQL when using both Core and ORM, the psycopg2 dialect now uses psycopg2.extras.execute_values() by default for compiled INSERT statements and also implements RETURNING support in this mode. The other half of this change is described at ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases, which allows the ORM to take advantage of RETURNING with executemany (i.e. batching of INSERT statements) so that ORM bulk inserts with psycopg2 are up to 400% faster, depending on specifics.
This extension method allows many rows to be INSERTed within a single statement, using an extended VALUES clause for the statement. While SQLAlchemy’s insert() construct already supports this syntax via the Insert.values() method, the extension method allows the construction of the VALUES clause to occur dynamically when the statement is executed as an “executemany” execution, which is what occurs when one passes a list of parameter dictionaries to Connection.execute(). It also occurs beyond the cache boundary so that the INSERT statement may be cached before the VALUES are rendered.
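A minimal sketch of such an “executemany” execution; the table definition and connection URL here are illustrative assumptions:

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, insert

metadata = MetaData()
# hypothetical table used only for illustration
user_table = Table(
    "user_account",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

engine = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")
metadata.create_all(engine)

with engine.begin() as conn:
    # passing a list of parameter dictionaries triggers "executemany"
    # execution; the psycopg2 dialect renders it via execute_values(),
    # producing a single INSERT with an extended VALUES clause
    conn.execute(
        insert(user_table),
        [{"name": "spongebob"}, {"name": "sandy"}, {"name": "patrick"}],
    )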
A quick test of the execute_values() approach using the bulk_inserts.py script in the Performance example suite reveals an approximate fivefold performance increase:
$ python -m examples.performance bulk_inserts --test test_core_insert --num 100000 --dburl postgresql://scott:tiger@localhost/test

# 1.3
test_core_insert : A single Core INSERT construct inserting mappings in bulk. (100000 iterations); total time 5.229326 sec

# 1.4
test_core_insert : A single Core INSERT construct inserting mappings in bulk. (100000 iterations); total time 0.944007 sec
Support for the “batch” extension was added in version 1.2 in Support for Batch Mode / Fast Execution Helpers, and enhanced to include support for the execute_values extension in 1.3 in #4623. In 1.4 the execute_values extension is now being turned on by default for INSERT statements; the “batch” extension for UPDATE and DELETE remains off by default.
In addition, the execute_values extension function supports returning the rows that are generated by RETURNING as an aggregated list. The psycopg2 dialect will now retrieve this list if the given insert() construct requests returning via the Insert.returning() method or similar methods intended to return generated defaults; the rows are then installed in the result so that they are retrieved as though they came from the cursor directly. This allows tools like the ORM to use batched inserts in all cases, which is expected to provide a dramatic performance improvement.
The executemany_mode feature of the psycopg2 dialect has been revised with the following changes (a configuration sketch follows the list):
- A new mode "values_only" is added. This mode uses the very performant psycopg2.extras.execute_values() extension method for compiled INSERT statements run with executemany(), but does not use execute_batch() for UPDATE and DELETE statements. This new mode is now the default setting for the psycopg2 dialect.
- The existing "values" mode is now named "values_plus_batch". This mode will use execute_values for INSERT statements and execute_batch for UPDATE and DELETE statements. The mode is not enabled by default because it disables the proper functioning of cursor.rowcount with UPDATE and DELETE statements executed with executemany().
- RETURNING support is enabled for "values_only" and "values" for INSERT statements. The psycopg2 dialect will receive the rows back from psycopg2 using the fetch=True flag and install them into the result set as though they came directly from the cursor (which they ultimately did; however, psycopg2’s extension function has aggregated multiple batches into one list).
- The default “page_size” setting for execute_values has been increased from 100 to 1000. The default remains at 100 for the execute_batch function. These parameters may both be modified as was the case before.
- The use_batch_mode flag that was part of the 1.2 version of the feature is removed; the behavior remains controllable via the executemany_mode flag added in 1.3.
- The Core engine and dialect have been enhanced to support executemany plus returning mode, currently only available with psycopg2, by providing new CursorResult.inserted_primary_key_rows and CursorResult.returned_default_rows accessors.
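A minimal configuration sketch, assuming the 1.4-era keyword names for the psycopg2 dialect’s executemany options (the URL and values shown are placeholders):

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@localhost/test",
    # opt in to execute_batch() for UPDATE/DELETE in addition to
    # execute_values() for INSERT (the default mode is "values_only")
    executemany_mode="values_plus_batch",
    executemany_values_page_size=1000,  # page size for execute_values
    executemany_batch_page_size=100,    # page size for execute_batch
)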
Removed “join rewriting” logic from SQLite dialect; updated imports¶
Dropped support for right-nested join rewriting to support old SQLite versions prior to 3.7.16, released in 2013. It is not expected that any modern Python versions rely upon this limitation.
The behavior was first introduced in 0.9 and was part of the larger change of allowing for right nested joins as described at Many JOIN and LEFT OUTER JOIN expressions will no longer be wrapped in (SELECT * FROM ..) AS ANON_1. However the SQLite workaround produced many regressions in the 2013-2014 period due to its complexity. In 2016, the dialect was modified so that the join rewriting logic would only occur for SQLite versions prior to 3.7.16, after bisection was used to identify where SQLite fixed its support for this construct, and no further issues were reported against the behavior (even though some bugs were found internally). It is now anticipated that there are little to no Python builds for Python 2.7 or 3.5 and above (the supported Python versions) which would include a SQLite version prior to 3.7.17, and the behavior is in any case only necessary in more complex ORM joining scenarios. A warning is now emitted if the installed SQLite version is older than 3.7.16.
In related changes, the module imports for SQLite no longer attempt to import the “pysqlite2” driver on Python 3, as this driver does not exist on Python 3; a very old warning for old pysqlite2 versions is also dropped.
Added Sequence support for MariaDB 10.3¶
The MariaDB database as of 10.3 supports sequences. SQLAlchemy’s MySQL dialect now implements support for the Sequence object against this database, meaning “CREATE SEQUENCE” DDL will be emitted for a Sequence that is present in a Table or MetaData collection in the same way as it works for backends such as PostgreSQL and Oracle, when the dialect’s server version check has confirmed the database is MariaDB 10.3 or greater. Additionally, the Sequence will act as a column default and primary key generation object when used in these ways.
Since this change will impact the assumptions both for DDL as well as the behavior of INSERT statements for an application that is currently deployed against MariaDB 10.3 which also happens to make explicit use of the Sequence construct within its table definitions, it is important to note that Sequence supports a flag Sequence.optional which is used to limit the scenarios in which the Sequence takes effect. When “optional” is used on a Sequence that is present in the integer primary key column of a table:
Table("some_table",metadata,Column("id",Integer,Sequence("some_seq",optional=True),primary_key=True),)
The above Sequence is only used for DDL and INSERT statements if the target database does not support any other means of generating integer primary key values for the column. That is, the Oracle database above would use the sequence, however the PostgreSQL and MariaDB 10.3 databases would not. This may be important for an existing application that is upgrading to SQLAlchemy 1.4 which may not have emitted DDL for this Sequence against its backing database, as an INSERT statement will fail if it seeks to use a sequence that was not created.
Added Sequence support distinct from IDENTITY to SQL Server¶
The Sequence construct is now fully functional with Microsoft SQL Server. When applied to a Column, the DDL for the table will no longer include IDENTITY keywords and instead will rely upon “CREATE SEQUENCE” to ensure a sequence is present which will then be used for INSERT statements on the table.
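A minimal sketch of this 1.4 behavior on SQL Server; the table and sequence names are illustrative:

from sqlalchemy import Column, Integer, MetaData, Sequence, Table

metadata = MetaData()
account = Table(
    "account",
    metadata,
    # emits CREATE SEQUENCE account_id_seq, and the CREATE TABLE no
    # longer includes IDENTITY; INSERTs draw values from the sequence
    Column("id", Integer, Sequence("account_id_seq"), primary_key=True),
)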
The Sequence prior to version 1.3 was used to control parameters for the IDENTITY column in SQL Server; this usage emitted deprecation warnings throughout 1.3 and is now removed in 1.4. For control of parameters for an IDENTITY column, the mssql_identity_start and mssql_identity_increment parameters should be used; see the MSSQL dialect documentation for details.
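A sketch of that replacement, assuming the column-level dialect keywords as they existed around the 1.4 MSSQL dialect (the table name and values are placeholders):

from sqlalchemy import Column, Integer, MetaData, Table

metadata = MetaData()
test = Table(
    "test",
    metadata,
    # the IDENTITY seed and increment are now controlled by dialect
    # keywords rather than by a Sequence applied to the column
    Column(
        "id",
        Integer,
        primary_key=True,
        mssql_identity_start=100,
        mssql_identity_increment=10,
    ),
)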