Movatterモバイル変換


[0]ホーム

URL:


— FREE Email Series —

🐍 Python Tricks 💌

Python Tricks Dictionary Merge

🔒 No spam. Unsubscribe any time.

Browse TopicsGuided Learning Paths
Basics Intermediate Advanced
apibest-practicescareercommunitydatabasesdata-sciencedata-structuresdata-vizdevopsdjangodockereditorsflaskfront-endgamedevguimachine-learningnumpyprojectspythontestingtoolsweb-devweb-scraping

Table of Contents

How to Create an Index in Django Without Downtime

How to Create an Index in Django Without Downtime

byHaki BenitaReading time estimate 19madvanceddatabasesdjango

Table of Contents

Remove ads

Managing database migrations is a great challenge in any software project. Luckily, as of version 1.7,Django comes with a built-in migration framework. The framework is very powerful and useful in managing change in databases. But the flexibility provided by the framework required some compromises. To understand the limitations of Django migrations, you are going to tackle a well known problem: creating an index in Django with no downtime.

In this tutorial, you’ll learn:

  • How and when Django generates new migrations
  • How to inspect the commands Django generates to execute migrations
  • How to safely modify migrations to fit your needs

This intermediate-level tutorial is designed for readers who are already familiar with Django migrations. For an introduction to that topic, check outDjango Migrations: A Primer.

Free Bonus:Click here to get free access to additional Django tutorials and resources you can use to deepen your Python web development skills.

The Problem With Creating an Index in Django Migrations

A common change that usually becomes necessary when the data stored by your application grows is adding an index. Indexes are used to speed up queries and make your app feel fast and responsive.

In most databases, adding an index requires an exclusive lock on the table. An exclusive lock prevents data modification (DML) operations such asUPDATE,INSERT, andDELETE, while the index is created.

Locks are obtained implicitly by the database when executing certain operations. For example, when a user logs into your app, Django will update thelast_login field in theauth_user table. To perform the update, the database will first have to obtain a lock on the row. If the row is currently being locked by another connection, then you might get adatabase exception.

Locking a table might pose a problem when it’s necessary to keep the system available during migrations. The bigger the table, the longer it can take to create the index. The longer it takes to create the index, the longer the system is unavailable or unresponsive to users.

Some database vendors provide a way to create an index without locking the table. For example, to create an index in PostgreSQL without locking a table, you can use theCONCURRENTLY keyword:

SQL
CREATEINDEXCONCURRENTLYixONtable(column);

In Oracle, there is anONLINE option to allow DML operations on the table while the index is created:

SQL
CREATEINDEXixONtable(column)ONLINE;

When generating migrations, Django will not use these special keywords. Running the migration as is will make the database acquire an exclusive lock on the table and prevent DML operations while the index is created.

Creating an index concurrently has some caveats. It’s important to understand the issues specific to your database backend in advance. For example,one caveat in PostgreSQL is that creating an index concurrently takes longer because it requires an additional table scan.

In this tutorial, you’ll use Django migrations to create an index on a large table, without causing any downtime.

Note: To follow this tutorial, it is recommended that you use a PostgreSQL backend, Django 2.x, and Python 3.

It is possible to follow along with other database backends as well. In places where SQL features unique to PostgreSQL are used, change the SQL to match your database backend.

Setup

You’re going to use a made upSale model in an app calledapp. In a real life situation, models such asSale are the main tables in the database, and they will usually be very big and store a lot of data:

Python
# models.pyfromdjango.dbimportmodelsclassSale(models.Model):sold_at=models.DateTimeField(auto_now_add=True,)charged_amount=models.PositiveIntegerField()

To create the table, generate the initial migration and apply it:

Shell
$pythonmanage.pymakemigrationsMigrations for 'app':  app/migrations/0001_initial.py    - Create model Sale$pythonmanagemigrateOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0001_initial... OK

After a while, the sales table becomes very big, and users start to complain about slowness. While monitoring the database, you noticed that a lot of queries use thesold_at column. To speed things up, you decide that you need an index on the column.

To add an index onsold_at, you make the following change to the model:

Python
# models.pyfromdjango.dbimportmodelsclassSale(models.Model):sold_at=models.DateTimeField(auto_now_add=True,db_index=True,)charged_amount=models.PositiveIntegerField()

If you run this migration as it is, then Django will create the index on the table, and it will be locked until the index is completed. It can take a while to create an index on a very large table, and you want to avoid downtime.

On a local development environment with a small dataset and very few connections, this migration might feel instantaneous. However, on large datasets with many concurrent connections, obtaining a lock and creating the index can take a while.

In the next steps, you are going to modify migrations created by Django to create the index without causing any downtime.

Fake Migration

The first approach is to create the index manually. You are going to generate the migration, but you are not going to actually let Django apply it. Instead, you will run the SQL manually in the database and then make Django think the migration completed.

First, generate the migration:

Shell
$pythonmanage.pymakemigrations--nameadd_index_fakeMigrations for 'app':  app/migrations/0002_add_index_fake.py    - Alter field sold_at on sale

Use thesqlmigrate command to view the SQL Django will use to execute this migration:

Shell
$pythonmanage.pysqlmigrateapp0002BEGIN;---- Alter field sold_at on sale--CREATE INDEX "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at");COMMIT;

You want to create the index without locking the table, so you need to modify the command. Add theCONCURRENTLY keyword and execute in the database:

PostgreSQL Console
app=#CREATEINDEXCONCURRENTLY"app_sale_sold_at_b9438ae4"ON"app_sale"("sold_at");CREATE INDEX

Notice that you executed the command without theBEGIN andCOMMIT parts. Omitting these keywords will execute the commands without a database transaction. We will discuss database transactions later in the article.

After you executed the command, if you try to apply migrations, then you will get the following error:

Shell
$pythonmanage.pymigrateOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0002_add_index_fake...Traceback (most recent call last):  File "venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 85, in _execute    return self.cursor.execute(sql, params)psycopg2.ProgrammingError: relation "app_sale_sold_at_b9438ae4" already exists

Django complains that the index already exists, so it can’t proceed with the migration. You just created the index directly in the database, so now you need to make Django think that the migration was already applied.

How to Fake a Migration

Django provides a built-in way of marking migrations as executed, without actually executing them. To use this option, set the--fake flag when applying the migration:

Shell
$pythonmanage.pymigrate--fakeOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0002_add_index_fake... FAKED

Django didn’t raise an error this time. In fact, Django didn’t really apply any migration. It just marked it as executed (orFAKED).

Here are some issues to consider when faking migrations:

  • The manual command must be equivalent to the SQL generated by Django: You need to make sure the command you execute is equivalent to the SQL generated by Django. Usesqlmigrate to produce the SQL command. If the commands do not match, then you might end up with inconsistencies between the database and the models state.

  • Other unapplied migrations will also be faked: When you have multiple unapplied migrations, they will all be faked. Before you apply migrations, it’s important to make sure only the migrations you want to fake are unapplied. Otherwise, you might end up with inconsistencies. Another option is to specify the exact migration you want to fake.

  • Direct access to the database is required: You need to run the SQL command in the database. This is not always an option. Also, executing commands directly in a production database is dangerous and should be avoided when possible.

  • Automated deployment processes might need adjustments: If youautomated the deployment process (using CI, CD, or other automation tools), then you might need to alter the process to fake migrations. This is not always desirable.

Cleanup

Before moving on to the next section, you need to bring the database back to its state right after the initial migration. To do that, migrate back to the initial migration:

Shell
$pythonmanage.pymigrate0001Operations to perform:  Target specific migration: 0001_initial, from appRunning migrations:  Rendering model states... DONE  Unapplying app.0002_add_index_fake... OK

Django unapplied the changes made in the second migration, so now it’s safe to also delete the file:

Shell
$rmapp/migrations/0002_add_index_fake.py

To make sure you did everything right, inspect the migrations:

Shell
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial

The initial migration was applied, and there are no unapplied migrations.

Execute Raw SQL in Migrations

In the previous section, you executed SQL directly in the database and faked the migration. This gets the job done, but there is a better solution.

Django provides a way to execute raw SQL in migrations usingRunSQL. Let’s try to use it instead of executing the command directly in the database.

First, generate a new empty migration:

Shell
$pythonmanage.pymakemigrationsapp--empty--nameadd_index_runsqlMigrations for 'app':  app/migrations/0002_add_index_runsql.py

Next, edit the migration file and add aRunSQL operation:

Python
# migrations/0002_add_index_runsql.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.RunSQL('CREATE INDEX "app_sale_sold_at_b9438ae4" ''ON "app_sale" ("sold_at");',),]

When you run the migration, you will get the following output:

Shell
$pythonmanage.pymigrateOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0002_add_index_runsql... OK

This is looking good, but there is a problem. Let’s try to generate migrations again:

Shell
$pythonmanage.pymakemigrations--nameleftover_migrationMigrations for 'app':  app/migrations/0003_leftover_migration.py    - Alter field sold_at on sale

Django generated the same migration again. Why did it do that?

Cleanup

Before we can answer that question, you need to clean up and undo the changes you made to the database. Start by deleting the last migration. It was not applied, so it’s safe to delete:

Shell
$rmapp/migrations/0003_leftover_migration.py

Next, list the migrations for theapp app:

Shell
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [X] 0002_add_index_runsql

The third migration is gone, but the second is applied. You want to get back to the state right after the initial migration. Try to migrate back to the initial migration as you did in the previous section:

Shell
$pythonmanage.pymigrateapp0001Operations to perform:  Target specific migration: 0001_initial, from appRunning migrations:  Rendering model states... DONE  Unapplying app.0002_add_index_runsql...Traceback (most recent call last):NotImplementedError: You cannot reverse this operation

Django is unable to reverse the migration.

Reverse Migration Operation

To reverse a migration, Django executes an opposite action for every operation. In this case, the reverse of adding an index is to drop it. As you’ve already seen, when a migration is reversible, you can unapply it. Just like you can usecheckout in Git, you can reverse a migration if you executemigrate to an earlier migration.

Many built-in migration operations already define a reverse action. For example, the reverse action for adding a field is to drop the corresponding column. The reverse action for creating a model is to drop the corresponding table.

Some migration operations are not reversible. For example, there is no reverse action for removing a field or deleting a model, because once the migration was applied, the data is gone.

In the previous section, you used theRunSQL operation. When you tried to reverse the migration, you encountered an error. According to the error, one of the operations in the migration cannot be reversed. Django is unable to reverse raw SQL by default. Because Django has no knowledge of what was executed by the operation, it cannot generate an opposite action automatically.

How to Make a Migration Reversible

For a migration to be reversible, all the operations in it must be reversible. It’s not possible to reverse part of a migration, so a single non-reversible operation will make the entire migration non-reversible.

To make aRunSQL operation reversible, you must provide SQL to execute when the operation is reversed. The reverse SQL is provided in thereverse_sql argument.

The opposite action to adding an index is to drop it. To make your migration reversible, provide thereverse_sql to drop the index:

Python
# migrations/0002_add_index_runsql.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.RunSQL('CREATE INDEX "app_sale_sold_at_b9438ae4" ''ON "app_sale" ("sold_at");',reverse_sql='DROP INDEX "app_sale_sold_at_b9438ae4";',),]

Now try to reverse the migration:

Shell
$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [X] 0002_add_index_runsql$pythonmanage.pymigrateapp0001Operations to perform:  Target specific migration: 0001_initial, from appRunning migrations:  Rendering model states... DONE Unapplying app.0002_add_index_runsql... OK$pythonmanage.pyshowmigrationsappapp [X] 0001_initial [ ] 0002_add_index_runsql

The second migration was reversed, and the index was dropped by Django. Now it’s safe to delete the migration file:

Shell
$rmapp/migrations/0002_add_index_runsql.py

It’s always a good idea to providereverse_sql. In situations where reversing a raw SQL operation does not require any action, you can mark the operation as reversible using the special sentinelmigrations.RunSQL.noop:

Python
migrations.RunSQL(sql='...',# Your forward SQL herereverse_sql=migrations.RunSQL.noop,),

Understand Model State and Database State

In your previous attempt to create the index manually usingRunSQL, Django generated the same migration over and over again even though the index was created in the database. To understand why Django did that, you first need to understand how Django decides when to generate new migrations.

When Django Generates a New Migration

In the process of generating and applying migrations, Django syncs between the state of the database and the state of the models. For example, when you add a field to a model, Django adds a column to the table. When you remove a field from the model, Django removes the column from the table.

To sync between the models and the database, Django maintains a state that represents the models. To sync the database with the models, Django generates migration operations. Migration operations translate to a vendor specific SQL that can be executed in the database. When all migration operations are executed, the database and the models are expected to be consistent.

To get the state of the database, Django aggregates the operations from all past migrations. When the aggregated state of the migrations is not consistent with the state of the models, Django generates a new migration.

In the previous example, you created the index using raw SQL. Django did not know you created the index because you didn’t use a familiar migration operation.

When Django aggregated all the migrations and compared them with the state of the models, it found that an index was missing. This is why, even after you created the index manually, Django still thought it was missing and generated a new migration for it.

How to Separate Database and State in Migrations

Since Django is unable to create the index the way you want it to, you want to provide your own SQL but still let Django know you created it.

In other words, you need to execute something in the database and provide Django with the migration operation to sync its internal state. To do that, Django provides us with a special migration operation calledSeparateDatabaseAndState. This operation is not well known and should be reserved for special cases such as this one.

It’s much easier to edit migrations than write them from scratch, so start by generating a migration the usual way:

Shell
$pythonmanage.pymakemigrations--nameadd_index_separate_database_and_stateMigrations for 'app':  app/migrations/0002_add_index_separate_database_and_state.py    - Alter field sold_at on sale

This is the contents of the migration generated by Django, same as before:

Python
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):dependencies=[('app','0001_initial'),]operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),]

Django generated anAlterField operation on the fieldsold_at. The operation will create an index and update the state. We want to keep this operation but provide a different command to execute in the database.

Once again, to get the command, use the SQL generated by Django:

Shell
$pythonmanage.pysqlmigrateapp0002BEGIN;---- Alter field sold_at on sale--CREATE INDEX "app_sale_sold_at_b9438ae4" ON "app_sale" ("sold_at");COMMIT;

Add theCONCURRENTLY keyword in the appropriate place:

SQL
CREATEINDEXCONCURRENTLY"app_sale_sold_at_b9438ae4"ON"app_sale"("sold_at");

Next, edit the migration file and useSeparateDatabaseAndState to provide your modified SQL command for execution:

Python
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):dependencies=[('app','0001_initial'),]operations=[migrations.SeparateDatabaseAndState(state_operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),],database_operations=[migrations.RunSQL(sql="""                    CREATE INDEX CONCURRENTLY "app_sale_sold_at_b9438ae4"                    ON "app_sale" ("sold_at");                """,reverse_sql="""                    DROP INDEX "app_sale_sold_at_b9438ae4";                """),],),],

The migration operationSeparateDatabaseAndState accepts 2 lists of operations:

  1. state_operations are operations to apply on the internal model state. They do not effect the database.
  2. database_operations are operations to apply to the database.

You kept the original operation generated by Django instate_operations. When usingSeparateDatabaseAndState, this is what you will usually want to do. Notice that thedb_index=True argument is provided to the field. This migration operation will let Django know that there is an index on the field.

You used the SQL generated by Django and added theCONCURRENTLY keyword. You used the special actionRunSQL to execute raw SQL in the migration.

If you try to run the migration, you will get the following output:

Shell
$pythonmanage.pymigrateappOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0002_add_index_separate_database_and_state...Traceback (most recent call last):  File "/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 83, in _execute    return self.cursor.execute(sql)psycopg2.InternalError: CREATE INDEX CONCURRENTLY cannot run inside a transaction block

Non-Atomic Migrations

In SQL,CREATE,DROP,ALTER, andTRUNCATE operations are referred to asData Definition Language (DDL). In databases that support transactional DDL,such as PostgreSQL, Django executes migrations inside a database transaction by default. However, according to the error above, PostgreSQL cannot create an index concurrently inside a transaction block.

To be able to create an index concurrently within a migration, you need to tell Django to not execute the migration in a database transaction. To do that, you mark the migration asnon-atomic by settingatomic toFalse:

Python
# migrations/0002_add_index_separate_database_and_state.pyfromdjango.dbimportmigrations,modelsclassMigration(migrations.Migration):atomic=Falsedependencies=[('app','0001_initial'),]operations=[migrations.SeparateDatabaseAndState(state_operations=[migrations.AlterField(model_name='sale',name='sold_at',field=models.DateTimeField(auto_now_add=True,db_index=True,),),],database_operations=[migrations.RunSQL(sql="""                    CREATE INDEX CONCURRENTLY "app_sale_sold_at_b9438ae4"                    ON "app_sale" ("sold_at");                """,reverse_sql="""                    DROP INDEX "app_sale_sold_at_b9438ae4";                """),],),],

After you marked the migration as non-atomic, you can run the migration:

Shell
$pythonmanage.pymigrateappOperations to perform:  Apply all migrations: appRunning migrations:  Applying app.0002_add_index_separate_database_and_state... OK

You just executed the migration without causing any downtime.

Here are some issues to consider when you’re usingSeparateDatabaseAndState:

  • Database operations must be equivalent to state operations: Inconsistencies between the database and model state can cause a lot of trouble. A good starting point is to keep the operations generated by Django instate_operations and edit the output ofsqlmigrate to use indatabase_operations.

  • Non atomic migrations cannot rollback in case of error: If there is an error during the migration, then you won’t be able to rollback. You would have to either rollback the migration or complete it manually. It’s a good idea to keep the operations executed inside a non-atomic migration to a minimum. If you have additional operations in the migration, move them to a new migration.

  • Migration might be vendor specific: The SQL generated by Django is specific to the database backend used in the project. It might work with other database backends, but that is not guaranteed. If you need to support multiple database backends, you need to make some adjustments to this approach.

Conclusion

You started this tutorial with a large table and a problem. You wanted to make your app faster for your users, and you wanted to do that without causing them any downtime.

By the end of the tutorial, you managed to generate and safely modify a Django migration to achieve this goal. You tackled different problems along the way and managed to overcome them using built-in tools provided by the migrations framework.

In this tutorial, you learned the following:

  • How Django migrations work internally using model and database state, and when new migrations are generated
  • How to execute custom SQL in migrations using theRunSQL action
  • What reversible migrations are, and how to make aRunSQL action reversible
  • What atomic migrations are, and how to change the default behavior according to your needs
  • How to safely execute complex migrations in Django

The separation between model and database state is an important concept. Once you understand it, and how to utilize it, you can overcome many limitations of the built-in migration operations. Some use cases that come to mind include adding an index that was already created in the database and providing vendor specific arguments to DDL commands.

🐍 Python Tricks 💌

Get a short & sweetPython Trick delivered to your inbox every couple of days. No spam ever. Unsubscribe any time. Curated by the Real Python team.

Python Tricks Dictionary Merge

AboutHaki Benita

Haki is an avid Pythonista and writes for Real Python.

» More about Haki

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

MasterReal-World Python Skills With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

MasterReal-World Python Skills
With Unlimited Access to Real Python

Locked learning resources

Join us and get access to thousands of tutorials, hands-on video courses, and a community of expert Pythonistas:

Level Up Your Python Skills »

What Do You Think?

Rate this article:

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students.Get tips for asking good questions andget answers to common questions in our support portal.


Looking for a real-time conversation? Visit theReal Python Community Chat or join the next“Office Hours” Live Q&A Session. Happy Pythoning!

Keep Learning

Related Topics:advanceddatabasesdjango

Keep reading Real Python by creating a free account or signing in:

Already have an account?Sign-In

Almost there! Complete this form and click the button below to gain instant access:

Django Pony

Free Django Resources and Tutorials to Boost Your Web Development Skills

🔒 No spam. We take your privacy seriously.


[8]ページ先頭

©2009-2025 Movatter.jp