Table of Contents
Managing database migrations is a great challenge in any software project. Luckily, as of version 1.7,Django comes with a built-in migration framework. The framework is very powerful and useful in managing change in databases. But the flexibility provided by the framework required some compromises. To understand the limitations of Django migrations, you are going to tackle a well known problem: creating an index in Django with no downtime.
In this tutorial, you’ll learn:
This intermediate-level tutorial is designed for readers who are already familiar with Django migrations. For an introduction to that topic, check outDjango Migrations: A Primer.
Free Bonus:Click here to get free access to additional Django tutorials and resources you can use to deepen your Python web development skills.
A common change that usually becomes necessary when the data stored by your application grows is adding an index. Indexes are used to speed up queries and make your app feel fast and responsive.
In most databases, adding an index requires an exclusive lock on the table. An exclusive lock prevents data modification (DML) operations such asUPDATE
,INSERT
, andDELETE
, while the index is created.
Locks are obtained implicitly by the database when executing certain operations. For example, when a user logs into your app, Django will update thelast_login
field in theauth_user
table. To perform the update, the database will first have to obtain a lock on the row. If the row is currently being locked by another connection, then you might get adatabase exception.
Locking a table might pose a problem when it’s necessary to keep the system available during migrations. The bigger the table, the longer it can take to create the index. The longer it takes to create the index, the longer the system is unavailable or unresponsive to users.
Some database vendors provide a way to create an index without locking the table. For example, to create an index in PostgreSQL without locking a table, you can use theCONCURRENTLY
keyword:
CREATEINDEXCONCURRENTLYixONtable(column);
In Oracle, there is anONLINE
option to allow DML operations on the table while the index is created:
CREATEINDEXixONtable(column)ONLINE;
When generating migrations, Django will not use these special keywords. Running the migration as is will make the database acquire an exclusive lock on the table and prevent DML operations while the index is created.
Creating an index concurrently has some caveats. It’s important to understand the issues specific to your database backend in advance. For example,one caveat in PostgreSQL is that creating an index concurrently takes longer because it requires an additional table scan.
In this tutorial, you’ll use Django migrations to create an index on a large table, without causing any downtime.
Note: To follow this tutorial, it is recommended that you use a PostgreSQL backend, Django 2.x, and Python 3.
It is possible to follow along with other database backends as well. In places where SQL features unique to PostgreSQL are used, change the SQL to match your database backend.
You’re going to use a made upSale
model in an app calledapp
. In a real life situation, models such asSale
are the main tables in the database, and they will usually be very big and store a lot of data:
# models.pyfromdjango.dbimportmodelsclassSale(models.Model):sold_at=models.DateTimeField(auto_now_add=True,)charged_amount=models.PositiveIntegerField()
To create the table, generate the initial migration and apply it:
$pythonmanage.pymakemigrationsMigrations for 'app': app/migrations/0001_initial.py - Create model Sale$pythonmanagemigrateOperations to perform: Apply all migrations: appRunning migrations: Applying app.0001_initial... OK
After a while, the sales table becomes very big, and users start to complain about slowness. While monitoring the database, you noticed that a lot of queries use thesold_at
column. To speed things up, you decide that you need an index on the column.
To add an index onsold_at
, you make the following change to the model:
# models.pyfromdjango.dbimportmodelsclassSale(models.Model):sold_at=models.DateTimeField(auto_now_add=True,db_index=True,)charged_amount=models.PositiveIntegerField()
If you run this migration as it is, then Django will create the index on the table, and it will be locked until the index is completed. It can take a while to create an index on a very large table, and you want to avoid downtime.
On a local development environment with a small dataset and very few connections, this migration might feel instantaneous. However, on large datasets with many concurrent connections, obtaining a lock and creating the index can take a while.
In the next steps, you are going to modify migrations created by Django to create the index without causing any downtime.
The first approach is to create the index manually. You are going to generate the migration, but you are not going to actually let Django apply it. Instead, you will run the SQL manually in the database and then make Django think the migration completed.
First, generate the migration:
$pythonmanage.pymakemigrations--nameadd_index_fakeMigrations for 'app': app/migrations/0002_add_index_fake.py - Alter field sold_at on sale
Use thesqlmigrate
command to view the SQL Django will use to execute this migration:
$pythonmanage.pysqlmigrateapp0002BEGIN;---- Alter field sold_at on sale--CREATE INDEX "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at");COMMIT;
You want to create the index without locking the table, so you need to modify the command. Add theCONCURRENTLY
keyword and execute in the database:
app=#CREATEINDEXCONCURRENTLY"app_sale_sold_at_b9438ae4"ON"app_sale"("sold_at");CREATE INDEX
Notice that you executed the command without theBEGIN
andCOMMIT
parts. Omitting these keywords will execute the commands without a database transaction. We will discuss database transactions later in the article.
After you executed the command, if you try to apply migrations, then you will get the following error:
$pythonmanage.pymigrateOperations to perform: Apply all migrations: appRunning migrations: Applying app.0002_add_index_fake...Traceback (most recent call last): File "venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 85, in _execute return self.cursor.execute(sql, params)psycopg2.ProgrammingError: relation "app_sale_sold_at_b9438ae4" already exists
Django complains that the index already exists, so it can’t proceed with the migration. You just created the index directly in the database, so now you need to make Django think that the migration was already applied.
How to Fake a Migration
Django provides a built-in way of marking migrations as executed, without actually executing them. To use this option, set the--fake
flag when applying the migration:
$pythonmanage.pymigrate--fakeOperations to perform: Apply all migrations: appRunning migrations: Applying app.0002_add_index_fake... FAKED
Django didn’t raise an error this time. In fact, Django didn’t really apply any migration. It just marked it as executed (orFAKED
).
Here are some issues to consider when faking migrations:
The manual command must be equivalent to the SQL generated by Django: You need to make sure the command you execute is equivalent to the SQL generated by Django. Usesqlmigrate
to produce the SQL command. If the commands do not match, then you might end up with inconsistencies between the database and the models state.
Other unapplied migrations will also be faked: When you have multiple unapplied migrations, they will all be faked. Before you apply migrations, it’s important to make sure only the migrations you want to fake are unapplied. Otherwise, you might end up with inconsistencies. Another option is to specify the exact migration you want to fake.
Direct access to the database is required: You need to run the SQL command in the database. This is not always an option. Also, executing commands directly in a production database is dangerous and should be avoided when possible.
Automated deployment processes might need adjustments: If youautomated the deployment process (using CI, CD, or other automation tools), then you might need to alter the process to fake migrations. This is not always desirable.
Cleanup
Before moving on to the next section, you need to bring the database back to its state right after the initial migration. To do that, migrate back to the initial migration:
$pythonmanage.pymigrate0001Operations to perform: Target specific migration: 0001_initial, from appRunning migrations: Rendering model states... DONE Unapplying app.0002_add_index_fake... OK
Django unapplied the changes made in the second migration, so now it’s safe to also delete the file:
$rmapp/migrations/0002_add_index_fake.py
To make sure you did everything right, inspect the migrations:
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial
The initial migration was applied, and there are no unapplied migrations.
In the previous section, you executed SQL directly in the database and faked the migration. This gets the job done, but there is a better solution.
Django provides a way to execute raw SQL in migrations usingRunSQL
. Let’s try to use it instead of executing the command directly in the database.
First, generate a new empty migration:
$pythonmanage.pymakemigrationsapp--empty--nameadd_index_runsqlMigrations for 'app': app/migrations/0002_add_index_runsql.py
Next, edit the migration file and add aRunSQL
operation:
# migrations/0002_add_index_runsql.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.RunSQL('CREATE INDEX "app_sale_sold_at_b9438ae4" ''ON "app_sale" ("sold_at");',),]
When you run the migration, you will get the following output:
$pythonmanage.pymigrateOperations to perform: Apply all migrations: appRunning migrations: Applying app.0002_add_index_runsql... OK
This is looking good, but there is a problem. Let’s try to generate migrations again:
$pythonmanage.pymakemigrations--nameleftover_migrationMigrations for 'app': app/migrations/0003_leftover_migration.py - Alter field sold_at on sale
Django generated the same migration again. Why did it do that?
Cleanup
Before we can answer that question, you need to clean up and undo the changes you made to the database. Start by deleting the last migration. It was not applied, so it’s safe to delete:
$rmapp/migrations/0003_leftover_migration.py
Next, list the migrations for theapp
app:
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [X] 0002_add_index_runsql
The third migration is gone, but the second is applied. You want to get back to the state right after the initial migration. Try to migrate back to the initial migration as you did in the previous section:
$pythonmanage.pymigrateapp0001Operations to perform: Target specific migration: 0001_initial, from appRunning migrations: Rendering model states... DONE Unapplying app.0002_add_index_runsql...Traceback (most recent call last):NotImplementedError: You cannot reverse this operation
Django is unable to reverse the migration.
To reverse a migration, Django executes an opposite action for every operation. In this case, the reverse of adding an index is to drop it. As you’ve already seen, when a migration is reversible, you can unapply it. Just like you can usecheckout
in Git, you can reverse a migration if you executemigrate
to an earlier migration.
Many built-in migration operations already define a reverse action. For example, the reverse action for adding a field is to drop the corresponding column. The reverse action for creating a model is to drop the corresponding table.
Some migration operations are not reversible. For example, there is no reverse action for removing a field or deleting a model, because once the migration was applied, the data is gone.
In the previous section, you used theRunSQL
operation. When you tried to reverse the migration, you encountered an error. According to the error, one of the operations in the migration cannot be reversed. Django is unable to reverse raw SQL by default. Because Django has no knowledge of what was executed by the operation, it cannot generate an opposite action automatically.
How to Make a Migration Reversible
For a migration to be reversible, all the operations in it must be reversible. It’s not possible to reverse part of a migration, so a single non-reversible operation will make the entire migration non-reversible.
To make aRunSQL
operation reversible, you must provide SQL to execute when the operation is reversed. The reverse SQL is provided in thereverse_sql
argument.
The opposite action to adding an index is to drop it. To make your migration reversible, provide thereverse_sql
to drop the index:
# migrations/0002_add_index_runsql.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.RunSQL('CREATE INDEX "app_sale_sold_at_b9438ae4" ''ON "app_sale" ("sold_at");',reverse_sql='DROP INDEX "app_sale_sold_at_b9438ae4";',),]
Now try to reverse the migration:
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [X] 0002_add_index_runsql$pythonmanage.pymigrateapp0001Operations to perform: Target specific migration: 0001_initial, from appRunning migrations: Rendering model states... DONE Unapplying app.0002_add_index_runsql... OK$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [ ] 0002_add_index_runsql
The second migration was reversed, and the index was dropped by Django. Now it’s safe to delete the migration file:
$rmapp/migrations/0002_add_index_runsql.py
It’s always a good idea to providereverse_sql
. In situations where reversing a raw SQL operation does not require any action, you can mark the operation as reversible using the special sentinelmigrations.RunSQL.noop
:
migrations.RunSQL(sql='...',# Your forward SQL herereverse_sql=migrations.RunSQL.noop,),
In your previous attempt to create the index manually usingRunSQL
, Django generated the same migration over and over again even though the index was created in the database. To understand why Django did that, you first need to understand how Django decides when to generate new migrations.
In the process of generating and applying migrations, Django syncs between the state of the database and the state of the models. For example, when you add a field to a model, Django adds a column to the table. When you remove a field from the model, Django removes the column from the table.
To sync between the models and the database, Django maintains a state that represents the models. To sync the database with the models, Django generates migration operations. Migration operations translate to a vendor specific SQL that can be executed in the database. When all migration operations are executed, the database and the models are expected to be consistent.
To get the state of the database, Django aggregates the operations from all past migrations. When the aggregated state of the migrations is not consistent with the state of the models, Django generates a new migration.
In the previous example, you created the index using raw SQL. Django did not know you created the index because you didn’t use a familiar migration operation.
When Django aggregated all the migrations and compared them with the state of the models, it found that an index was missing. This is why, even after you created the index manually, Django still thought it was missing and generated a new migration for it.
Since Django is unable to create the index the way you want it to, you want to provide your own SQL but still let Django know you created it.
In other words, you need to execute something in the database and provide Django with the migration operation to sync its internal state. To do that, Django provides us with a special migration operation calledSeparateDatabaseAndState
. This operation is not well known and should be reserved for special cases such as this one.
It’s much easier to edit migrations than write them from scratch, so start by generating a migration the usual way:
$pythonmanage.pymakemigrations--nameadd_index_separate_database_and_stateMigrations for 'app': app/migrations/0002_add_index_separate_database_and_state.py - Alter field sold_at on sale
This is the contents of the migration generated by Django, same as before:
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):dependencies=[('app','0001_initial'),]operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),]
Django generated anAlterField
operation on the fieldsold_at
. The operation will create an index and update the state. We want to keep this operation but provide a different command to execute in the database.
Once again, to get the command, use the SQL generated by Django:
$pythonmanage.pysqlmigrateapp0002BEGIN;---- Alter field sold_at on sale--CREATE INDEX "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at");COMMIT;
Add theCONCURRENTLY
keyword in the appropriate place:
CREATEINDEXCONCURRENTLY"app_sale_sold_at_b9438ae4"ON"app_sale"("sold_at");
Next, edit the migration file and useSeparateDatabaseAndState
to provide your modified SQL command for execution:
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):dependencies=[('app','0001_initial'),]operations=[migrations.SeparateDatabaseAndState(state_operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),],database_operations=[migrations.RunSQL(sql=""" CREATE INDEX CONCURRENTLY "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at"); """,reverse_sql=""" DROP INDEX "app_sale_sold_at_b9438ae4"; """),],),],
The migration operationSeparateDatabaseAndState
accepts 2 lists of operations:
You kept the original operation generated by Django instate_operations
. When usingSeparateDatabaseAndState
, this is what you will usually want to do. Notice that thedb_index=True
argument is provided to the field. This migration operation will let Django know that there is an index on the field.
You used the SQL generated by Django and added theCONCURRENTLY
keyword. You used the special actionRunSQL
to execute raw SQL in the migration.
If you try to run the migration, you will get the following output:
$pythonmanage.pymigrateappOperations to perform: Apply all migrations: appRunning migrations: Applying app.0002_add_index_separate_database_and_state...Traceback (most recent call last): File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 83, in _execute return self.cursor.execute(sql)psycopg2.InternalError: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
In SQL,CREATE
,DROP
,ALTER
, andTRUNCATE
operations are referred to asData Definition Language (DDL). In databases that support transactional DDL,such as PostgreSQL, Django executes migrations inside a database transaction by default. However, according to the error above, PostgreSQL cannot create an index concurrently inside a transaction block.
To be able to create an index concurrently within a migration, you need to tell Django to not execute the migration in a database transaction. To do that, you mark the migration asnon-atomic by settingatomic
toFalse
:
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.SeparateDatabaseAndState(state_operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),],database_operations=[migrations.RunSQL(sql=""" CREATE INDEX CONCURRENTLY "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at"); """,reverse_sql=""" DROP INDEX "app_sale_sold_at_b9438ae4"; """),],),],
After you marked the migration as non-atomic, you can run the migration:
$pythonmanage.pymigrateappOperations to perform: Apply all migrations: appRunning migrations: Applying app.0002_add_index_separate_database_and_state... OK
You just executed the migration without causing any downtime.
Here are some issues to consider when you’re usingSeparateDatabaseAndState
:
Database operations must be equivalent to state operations: Inconsistencies between the database and model state can cause a lot of trouble. A good starting point is to keep the operations generated by Django instate_operations
and edit the output ofsqlmigrate
to use indatabase_operations
.
Non atomic migrations cannot rollback in case of error: If there is an error during the migration, then you won’t be able to rollback. You would have to either rollback the migration or complete it manually. It’s a good idea to keep the operations executed inside a non-atomic migration to a minimum. If you have additional operations in the migration, move them to a new migration.
Migration might be vendor specific: The SQL generated by Django is specific to the database backend used in the project. It might work with other database backends, but that is not guaranteed. If you need to support multiple database backends, you need to make some adjustments to this approach.
You started this tutorial with a large table and a problem. You wanted to make your app faster for your users, and you wanted to do that without causing them any downtime.
By the end of the tutorial, you managed to generate and safely modify a Django migration to achieve this goal. You tackled different problems along the way and managed to overcome them using built-in tools provided by the migrations framework.
In this tutorial, you learned the following:
RunSQL
actionRunSQL
action reversibleThe separation between model and database state is an important concept. Once you understand it, and how to utilize it, you can overcome many limitations of the built-in migration operations. Some use cases that come to mind include adding an index that was already created in the database and providing vendor specific arguments to DDL commands.
🐍 Python Tricks 💌
Get a short & sweetPython Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.
MasterReal-World Python Skills With Unlimited Access to Real Python
Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:
MasterReal-World Python Skills
With Unlimited Access to Real Python
Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:
What Do You Think?
What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.
Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students.Get tips for asking good questions andget answers to common questions in our support portal.
Already have an account?Sign-In
Almost there! Complete this form and click the button below to gain instant access:
Free Django Resources and Tutorials to Boost Your Web Development Skills