10.Tuning python-oracledb
Some general tuning tips are:
Tune your application architecture.
A general application goal is to reduce the number ofround-trips between python-oracledb and the database.
For multi-user applications, make use of connection pooling. Create the poolonce during application initialization. Do not oversize the pool, seeConnection Pooling. Use a session callback function to set session state,seeSession Callbacks for Setting Pooled Connection State.
Make use of efficient python-oracledb functions. For example, to insertmultiple rows use
Cursor.executemany()
instead ofCursor.execute()
. Another example is to fetch data directly asdata frames when working with packages like Pandasand NumPy.Tune your SQL statements. See theSQL Tuning Guide.
Usebind variables to avoid statement reparsing.
Tune
Cursor.arraysize
andCursor.prefetchrows
for each query,seeTuning Fetch Performance.Do simple optimizations likelimiting the number of rowsand avoiding selecting columns not used in the application.
It may be faster to work with simple scalar relational values than to useOracle Database object types.
Make good use of PL/SQL to avoid executing many individual statements frompython-oracledb.
Tune theStatement Cache.
EnableClient Result Caching for small lookuptables.
Tune your database. See theDatabase Performance Tuning Guide.
Tune your network. For example, when inserting or retrieving a large numberof rows (or for large data), or when using a slow network, then tune theOracle Network Session Data Unit (SDU) and socket buffer sizes, seeConfiguring Session Data Unit andOracle NetServices: Best Practices for Database Performance and High Availability.
In python-oracledb Thick mode, the SDU size is configured in theOptional Oracle Net Configuration Files. In python-oracledb Thin mode, the SDU size can be passedas a connection or pool creation parameter. In both modes it may optionallybe set in the connectionEasy Connect string orconnect descriptor. The SDU size that will actuallybe used is negotiated down to the lower of application-side value and thedatabase network SDU configuration value.
Do not commit or rollback unnecessarily. Use
Connection.autocommit
on the last of a sequence of DML statements.Consider usingconcurrent programming orpipelining.
If Python’s Global Interpreter Lock (GIL) is limiting concurrent programperformance, then explore using parallel Python processes.
10.1.Tuning Fetch Performance
To tune queries, you can adjust python-oracledb’s internal buffer sizes toimprove the speed of fetching rows across the network from the database, and tooptimize memory usage. This can reduceround-trips whichhelps performance and scalability. Tune “array fetching” withCursor.arraysize
and tune “row prefetching” withCursor.prefetchrows
. Set these before callingCursor.execute()
. The value used for prefetching can also be set in anoraaccess.xml
file, seeOptional Oracle Client Configuration File. In python-oracledb Thickmode, the internal buffers allocated forprefetchrows
andarraysize
areseparate, so increasing both settings will require more Python process memory.Queries that return LOBs and similar types will never prefetch rows, so theprefetchrows
value is ignored in those cases.
The internal buffer sizes do not affect how or when rows are returned to yourapplication regardless of whichpython-oracledb method isused to fetch query results. They do not affect the minimum or maximum numberof rows returned by a query.
The difference between row prefetching and array fetching is when the internalbuffering occurs. Internally python-oracledb performs separate “execute SQLstatement” and “fetch data” steps. Prefetching allows query results to bereturned to the application when the acknowledgment of successful statementexecution is returned from the database. This means that the subsequentinternal “fetch data” operation does not always need to make a round-trip tothe database because rows are already buffered in python-oracledb or in theOracle Client libraries. An overhead of prefetching when using thepython-oracledb Thick mode is the need for additional data copies from OracleClient’s prefetch buffer when fetching the first batch of rows. This cost mayoutweigh the benefits of using prefetching in some cases.
10.1.1.Choosing values forarraysize
andprefetchrows
The bestCursor.arraysize
andCursor.prefetchrows
values can befound by experimenting with your application under the expected load of normalapplication use. The reduction of round-trips may help performance and overallsystem scalability. The documentation inround-trips showshow to measure round-trips.
Here are some suggestions for tuning:
To tune queries that return an unknown, large, number of rows, estimate thenumber of rows returned and increase the
Cursor.arraysize
value forbest performance, memory and round-trip usage. The default is 100. Forexample:cur=connection.cursor()cur.arraysize=1000forrowincur.execute("SELECT * FROM very_big_table"):print(row)
In general for this scenario, leave
prefetchrows
at its default value.If you do change it, then setarraysize
as big, or bigger. Do not makethe sizes unnecessarily large.If you are fetching a fixed number of rows, set
arraysize
to the numberof expected rows, and setprefetchrows
to one greater than this value.Adding one removes the need for a round-trip to check for end-of-fetch. Forexample, if you are querying 20 rows, perhaps todisplay a page of data, then setprefetchrows
to 21 andarraysize
to 20:cur=connection.cursor()cur.prefetchrows=21cur.arraysize=20forrowincur.execute(""" SELECT last_name FROM employees ORDER BY last_name OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY"""):print(row)
This will return all rows for the query in one round-trip.
If you know that a query returns just one row then set
Cursor.arraysize
to 1 to minimize memory usage. The default prefetchvalue of 2 allows minimal round-trips for single-row queries:cur=connection.cursor()cur.arraysize=1cur.execute("select * from MyTable where id = 1"):row=cur.fetchone()print(row)
The following table shows the number of round-trips required to fetch variousnumbers of rows with differentprefetchrows
andarraysize
values.
Number of rows |
|
| Round-trips |
---|---|---|---|
1 | 2 | 100 | 1 |
100 | 2 | 100 | 2 |
1000 | 2 | 100 | 11 |
10000 | 2 | 100 | 101 |
10000 | 2 | 1000 | 11 |
10000 | 1000 | 1000 | 11 |
20 | 20 | 20 | 2 |
20 | 21 | 20 | 1 |
The number of round-trips will be the same regardless of whichpython-oracledb method is used to fetch query results.
10.1.1.1.Application Default Prefetchrows and Arraysize Values
Application-wide defaults can be set usingdefaults.prefetchrows
anddefaults.arraysize
, for example:
importoracledboracledb.defaults.prefetchrows=1000oracledb.defaults.arraysize=1000
When using python-oracledb in the Thick mode, prefetching can also be tuned inan externaloraaccess.xml file, which may be useful fortuning an application when modifying its code is not feasible.
Setting the sizes withoracledb.defaults
attributes or withoraaccess.xml
will affect the whole application, so it should not be thefirst tuning choice.
10.1.1.2.Changing Prefetchrows and Arraysize for Re-executed Statements
In python-oracledb, thearraysize
andprefetchrows
values are onlyexamined when a statement is executed the first time. To change the values fora re-executed statement, create a new cursor. For example, to changearraysize
:
array_sizes=(10,100,1000)forsizeinarray_sizes:cursor=connection.cursor()cursor.arraysize=sizestart=time.time()cursor.execute(sql).fetchall()elapsed=time.time()-startprint("Time for",size,elapsed,"seconds")
10.1.1.3.Avoiding Premature Prefetching
There are two cases that will benefit from settingprefetchrows
to zero:
When passing a python-oracledb cursorinto PL/SQL. Setting
prefetchrows
to 0 can stop rows being prematurely (and silently) fetchedinto the python-oracledb internal buffer, making those rows unavailable tothe PL/SQL REF CURSOR parameter:refcursor=connection.cursor()refcursor.prefetchrows=0refcursor.execute("select ...")cursor.callproc("myproc",[refcursor])
When querying a PL/SQL function that uses PIPE ROW to emit rows atintermittent intervals. By default, several rows needs to be emitted by thefunction before python-oracledb can return them to the application. Setting
prefetchrows
to 0 helps give a consistent flow of data to theapplication.
10.1.2.Tuning Fetching from REF CURSORS
The internal buffering and performance of fetching data from REF CURSORS can betuned by setting the value ofarraysize
before rows are fetched from thecursor. Theprefetchrows
value is ignored when fetchingfrom REF CURSORS.
For example:
ref_cursor=connection.cursor()cursor.callproc("myrefcursorproc",[ref_cursor])ref_cursor.arraysize=1000print("Sum of IntCol for",num_rows,"rows:")forrowinref_cursor:sum_rows+=row[0]print(sum_rows)
Thearraysize
value can also be set before calling the procedure:
ref_cursor=connection.cursor()ref_cursor.arraysize=1000cursor.callproc("myrefcursorproc",[ref_cursor])forrowinref_cursor:...
Also seeAvoiding Premature Prefetching.
10.1.3.Tuning Fetching for Data Frames
When fetchingdata frames withConnection.fetch_df_all()
orConnection.fetch_df_batches()
,tuning of data transfer across the network is controlled by the respectivemethodsarraysize
orsize
parameters.
Anydefaults.prefetchrows
value is ignored since these methods alwaysset the internal prefetch size to the relevantarraysize
orsize
value.
10.1.4.Parallelizing Data Fetches from a Single Table
Before trying to improve the performance of querying a single table by issuingmultiple SQL queries in multiple threads, where each query extracts a differentrange of data, you should do careful benchmarking.
Factors that will impact such a solution:
How heavily loaded is the database? The parallel solution may appear to befast but it could be inefficient, thereby impacting, or eventually beinglimited by, everyone else.
A single python-oracledb connection can only do one database operation at atime, so you need to use a different connection for each executed SQLstatement, for example, using connections from apool. This will cause extra database load that needs to beassessed.
A naive solution using the OFFSET FETCH syntax to fetch sections of a tablein individual queries will still cause table blocks to be scanned even thoughnot all data is returned.
Is the table partitioned?
Are zone maps being used?
Maybe the real performance bottleneck cannot be solved by parallelism.Perhaps you have function based indexes that are being invoked for everyrow.
Is the data in the database spread across multiple spindles or is the onedisk having to seek?
Is Exadata with storage indexes being used?
What is the impact of Python’s Global Interpreter Lock (GIL)? Maybe multiplePython processes should be used instead of threads.
What is the application doing with the data? Can the receiving endefficiently process it?
Is it better to execute a single query in Python but use a PARALLEL queryhint? Or will this overload the database.
10.2.Database Round-trips
A round-trip is defined as the travel of a message from python-oracledb to thedatabase and back. Calling each python-oracledb function, or accessing eachattribute, will require zero or more round-trips. For example, inserting asimple row involves sending data to the database and getting a success responseback. This is a round-trip. Along with tuning an application’s architectureandtuning its SQL statements, a generalperformance and scalability goal is to minimizeround-trips because they impact application performance andoverall system scalability.
Some general tips for reducing round-trips are:
Tune
Cursor.arraysize
andCursor.prefetchrows
for eachquery.Use
Cursor.executemany()
for optimal DML execution.Only commit when necessary. Use
Connection.autocommit
on the laststatement of a transaction.For connection pools, use a callback to set connection state, seeSession Callbacks for Setting Pooled Connection State.
Make use of PL/SQL procedures which execute multiple SQL statements insteadof executing them individually from python-oracledb.
Review whetherPipelining can be used.
Use scalar types instead of Oracle Database object types.
Avoid overuse of
Connection.ping()
.Avoid setting
ConnectionPool.ping_interval
to 0 or a small value.When usingSODA, use pooled connections and enabletheSODA metadata cache.
10.2.1.Finding the Number of Round-Trips
Oracle’sAutomatic Workload Repository(AWR) reports show ‘SQL*Net roundtrips to/from client’ and are useful forfinding the overall behavior of a system.
Sometimes you may wish to find the number of round-trips used for aspecific application. Snapshots of the V$SESSTAT view taken beforeand after doing some work can be used for this:
# Get the connection's session iddefget_session_id(connection):sql="select sys_context('userenv','sid') from dual"result,=connection.cursor().execute(sql).fetchone()returnresult# Get the number of round-trips a session has made so fardefget_round_trips(systemconn,sid):sql="""select ss.value from v$sesstat ss, v$statname sn where ss.sid = :sid and ss.statistic# = sn.statistic# and sn.name like '%roundtrip%client%'"""round_trips,=systemconn.cursor().execute(sql,[sid]).fetchone()returnround_tripssystemconn=oracledb.connect(user="system",password=spw,dsn=cs)connection=oracledb.connect(user=un,password=pw,dsn=cs)sid=get_session_id(connection)round_trips_before=get_round_trips(systemconn,sid)# Do some "work"cursor.execute("select ...")rows=cursor.fetchall()round_trips_after=get_round_trips(systemconn,sid)print(f"Round-trips required for query:{round_trips_after-round_trips_before}")
Note that V$SESSTAT is not accurate forpipelined database operations.
10.3.Statement Caching
Python-oracledb’sCursor.execute()
andCursor.executemany()
methods use statement caching to make re-execution of statements efficient.Statement caching lets Oracle Database cursors be used without re-parsing thestatement. Statement caching also reduces metadata transfer costs betweenpython-oracledb and the database. Performance and scalability are improved.
The python-oracledb Thick mode usesOracle Call Interface statement caching, whereas the Thin mode uses its own implementation.
Each standalone or pooled connection has its own cache of statements with adefault size of 20. The default size of the statement cache can be changedusing thedefaults.stmtcachesize
attribute. The size can be set whencreating connection pools or standalone connections. In general, set thestatement cache size to the size of the working set of statements beingexecuted by the application. To manually tune the cache, monitor the generalapplication load and theAutomatic Workload Repository (AWR) “bytes sent via SQL*Net to client” values. The latterstatistic should benefit from not shipping statement metadata topython-oracledb. Adjust the statement cache size to your satisfaction. WithOracle Database 12c (or later), the Thick mode statement cache size can beautomatically tuned using anoraaccess.xml file.
10.3.1.Setting the Statement Cache
The statement cache size can be set globally withdefaults.stmtcachesize
:
importoracledboracledb.defaults.stmtcachesize=40
The value can be overridden in anoracledb.connect()
call, or whencreating a pool withoracledb.create_pool()
. For example:
oracledb.create_pool(user="scott",password=userpwd,dsn="dbhost.example.com/orclpb",min=2,max=5,increment=1,stmtcachesize=50)
When python-oracledb Thick mode uses Oracle Client 21 (or later), changing thecache size withConnectionPool.reconfigure()
does not immediatelyaffect connections previously acquired and currently in use. When thoseconnections are subsequently released to the pool and re-acquired, they willthen use the new value. When the Thick mode uses Oracle Client prior toversion 21, changing the pool’s statement cache size has no effect onconnections that already exist in the pool but will affect new connectionsthat are subsequently created, for example when the pool grows.
10.3.2.Tuning the Statement Cache
In general, set the statement cache to the size of the working set ofstatements being executed by the application.SODAinternally makes SQL calls, so tuning the cache is also beneficial for SODAapplications.
In python-oracledb Thick mode with Oracle Client Libraries 12c (or later), thestatement cache size can be automatically tuned with the Oracle ClientConfigurationoraaccess.xml file.
For manual tuning use views like V$SYSSTAT:
SELECTvalueFROMV$SYSSTATWHEREname='parse count (total)'
Find the value before and after running application load to give the number ofstatement parses during the load test. Alter the statement cache size andrepeat the test until you find a minimal number of parses.
If you have Automatic Workload Repository (AWR) reports you can monitorgeneral application load and the “bytes sent via SQL*Net to client” values.The latter statistic should benefit from not shipping statement metadata topython-oracledb. Adjust the statement cache size and re-run the test to findthe best cache size.
10.3.3.Disabling the Statement Cache
Statement caching can be disabled by setting the cache size to 0:
oracledb.defaults.stmtcachesize=0
Disabling the cache may be beneficial when the quantity or order of statementscauses cache entries to be flushed before they get a chance to bereused. For example if there are more distinct statements than cacheslots, and the order of statement execution causes older statements tobe flushed from the cache before the statements are re-executed.
Disabling the statement cache may also be helpful in test and developmentenvironments. The statement cache can become invalid if connections remainopen and database schema objects are recreated. Applications can then receiveerrors such asORA-3106
. However, after a statement execution error isreturned once to the application, python-oracledb automatically drops thatstatement from the cache. This lets subsequent re-executions of the statementon that connection to succeed.
When it is inconvenient to pass statement text through an application, theCursor.prepare()
call can be used to avoid statement re-parsing.If thecache_statement
parameter in theCursor.prepare()
method isTrue and the statement cache size is greater than 0, then the statements willbe added to the cache, if not already present. If thecache_statement
parameter in theCursor.prepare()
method is False and the statementcache size is greater than 0, then the statement will be removed from thestatement cache (if present) or will not be cached (if not present). Thesubsequentexecute()
calls use the value None instead of the SQL text.
This feature can prevent a rarely executed statement from flushing a potentialmore frequently executed one from a full cache. For example, if a statementwill only ever be executed once:
cursor.prepare("select user from dual",cache_statement=False)cursor.execute(None)
Alternatively,
sql="select user from dual"cursor.prepare(sql,cache_statement=False)cursor.execute(sql)
Statements passed toprepare()
are also stored in the statementcache.
10.4.Client Result Caching (CRC)
Python-oracledb applications can use Oracle Database’sClient Result Cache. The CRC enables client-side caching of SQL query(SELECT statement) results in client memory for immediate use when the samequery is re-executed. This is useful for reducing the cost of queries forsmall, mostly static, lookup tables, such as for postal codes. CRC reducesnetworkround-trips, and also reduces database server CPUusage.
Note
Client Result Caching is only supported in the python-oracledb Thick mode.SeeEnabling python-oracledb Thick mode.
The cache is at the application process level. Access and invalidation ismanaged by the Oracle Client libraries. This removes the need for extraapplication logic, or external utilities, to implement a cache.
CRC can be enabled by setting thedatabase parametersCLIENT_RESULT_CACHE_SIZE
andCLIENT_RESULT_CACHE_LAG
, and thenrestarting the database, for example:
SQL>ALTERSYSTEMSETCLIENT_RESULT_CACHE_LAG=3000SCOPE=SPFILE;SQL>ALTERSYSTEMSETCLIENT_RESULT_CACHE_SIZE=64KSCOPE=SPFILE;SQL>STARTUPFORCE
CRC can alternatively be configured in anoraaccess.xmlorsqlnet.ora file on the Python host, seeClientConfiguration Parameters.
Tables can then be created, or altered, so repeated queries use CRC. Thisallows existing applications to use CRC without needing modification. Forexample:
SQL>CREATETABLEcities(idnumber,namevarchar2(40))RESULT_CACHE(MODEFORCE);SQL>ALTERTABLElocationsRESULT_CACHE(MODEFORCE);
Alternatively, hints can be used in SQL statements. For example:
SELECT/*+ result_cache */postal_codeFROMlocations