- API reference
- Input/output
- pandas.DataF...
pandas.DataFrame.to_sql#
- DataFrame.to_sql(name,con,*,schema=None,if_exists='fail',index=True,index_label=None,chunksize=None,dtype=None,method=None)[source]#
Write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy[1] are supported. Tables can benewly created, appended to, or overwritten.
- Parameters:
- namestr
Name of SQL table.
- consqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by thatlibrary. Legacy support is provided for sqlite3.Connection objects. The useris responsible for engine disposal and connection closure for the SQLAlchemyconnectable. Seehere.If passing a sqlalchemy.engine.Connection which is already in a transaction,the transaction will not be committed. If passing a sqlite3.Connection,it will not be possible to roll back the record insertion.
- schemastr, optional
Specify the schema (if database flavor supports this). If None, usedefault schema.
- if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the table already exists.
fail: Raise a ValueError.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
- indexbool, default True
Write DataFrame index as a column. Usesindex_label as the columnname in the table. Creates a table index for this column.
- index_labelstr or sequence, default None
Column label for index column(s). If None is given (default) andindex is True, then the index names are used.A sequence should be given if the DataFrame uses MultiIndex.
- chunksizeint, optional
Specify the number of rows in each batch to be written at a time.By default, all rows will be written at once.
- dtypedict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, thekeys should be the column names and the values should be theSQLAlchemy types or strings for the sqlite3 legacy mode. If ascalar is provided, it will be applied to all columns.
- method{None, ‘multi’, callable}, optional
Controls the SQL insertion clause used:
None : Uses standard SQL
INSERT
clause (one per row).‘multi’: Pass multiple values in a single
INSERT
clause.callable with signature
(pd_table,conn,keys,data_iter)
.
Details and a sample callable implementation can be found in thesectioninsert method.
- Returns:
- None or int
Number of rows affected by to_sql. None is returned if the callablepassed into
method
does not return an integer number of rows.The number of returned rows affected is the sum of the
rowcount
attribute ofsqlite3.Cursor
or SQLAlchemy connectable which may notreflect the exact number of written rows as stipulated in thesqlite3 orSQLAlchemy.Added in version 1.4.0.
- Raises:
- ValueError
When the table already exists andif_exists is ‘fail’ (thedefault).
See also
read_sql
Read a DataFrame from a table.
Notes
Timezone aware datetime columns will be written as
Timestampwithtimezone
type with SQLAlchemy if supported by thedatabase. Otherwise, the datetimes will be stored as timezone unawaretimestamps local to the original timezone.Not all datastores support
method="multi"
. Oracle, for example,does not support multi-value insert.References
Examples
Create an in-memory SQLite database.
>>>fromsqlalchemyimportcreate_engine>>>engine=create_engine('sqlite://',echo=False)
Create a table from scratch with 3 rows.
>>>df=pd.DataFrame({'name':['User 1','User 2','User 3']})>>>df name0 User 11 User 22 User 3
>>>df.to_sql(name='users',con=engine)3>>>fromsqlalchemyimporttext>>>withengine.connect()asconn:...conn.execute(text("SELECT * FROM users")).fetchall()[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
Ansqlalchemy.engine.Connection can also be passed tocon:
>>>withengine.begin()asconnection:...df1=pd.DataFrame({'name':['User 4','User 5']})...df1.to_sql(name='users',con=connection,if_exists='append')2
This is allowed to support operations that require that the sameDBAPI connection is used for the entire operation.
>>>df2=pd.DataFrame({'name':['User 6','User 7']})>>>df2.to_sql(name='users',con=engine,if_exists='append')2>>>withengine.connect()asconn:...conn.execute(text("SELECT * FROM users")).fetchall()[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'), (0, 'User 4'), (1, 'User 5'), (0, 'User 6'), (1, 'User 7')]
Overwrite the table with just
df2
.>>>df2.to_sql(name='users',con=engine,if_exists='replace',...index_label='id')2>>>withengine.connect()asconn:...conn.execute(text("SELECT * FROM users")).fetchall()[(0, 'User 6'), (1, 'User 7')]
Use
method
to define a callable insertion method to do nothingif there’s a primary key conflict on a table in a PostgreSQL database.>>>fromsqlalchemy.dialects.postgresqlimportinsert>>>definsert_on_conflict_nothing(table,conn,keys,data_iter):...# "a" is the primary key in "conflict_table"...data=[dict(zip(keys,row))forrowindata_iter]...stmt=insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])...result=conn.execute(stmt)...returnresult.rowcount>>>df_conflict.to_sql(name="conflict_table",con=conn,if_exists="append",method=insert_on_conflict_nothing)0
For MySQL, a callable to update columns
b
andc
if there’s a conflicton a primary key.>>>fromsqlalchemy.dialects.mysqlimportinsert>>>definsert_on_conflict_update(table,conn,keys,data_iter):...# update columns "b" and "c" on primary key conflict...data=[dict(zip(keys,row))forrowindata_iter]...stmt=(...insert(table.table)....values(data)...)...stmt=stmt.on_duplicate_key_update(b=stmt.inserted.b,c=stmt.inserted.c)...result=conn.execute(stmt)...returnresult.rowcount>>>df_conflict.to_sql(name="conflict_table",con=conn,if_exists="append",method=insert_on_conflict_update)2
Specify the dtype (especially useful for integers with missing values).Notice that while pandas is forced to store the data as floating point,the database supports nullable integers. When fetching the data withPython, we get back integer scalars.
>>>df=pd.DataFrame({"A":[1,None,2]})>>>df A0 1.01 NaN2 2.0
>>>fromsqlalchemy.typesimportInteger>>>df.to_sql(name='integers',con=engine,index=False,...dtype={"A":Integer()})3
>>>withengine.connect()asconn:...conn.execute(text("SELECT * FROM integers")).fetchall()[(1,), (None,), (2,)]