Deployments Concepts ¶

When deploying a FastAPI application, or actually, any type of web API, there are several concepts that you probably care about, and using them you can find the most appropriate way to deploy your application.

Some of the important concepts are:

Security - HTTPS
Running on startup
Restarts
Replication (the number of processes running)
Memory
Previous steps before starting

We'll see how they would affect deployments.

In the end, the ultimate objective is to be able to serve your API clients in a way that is secure, to avoid disruptions, and to use the compute resources (for example remote servers/virtual machines) as efficiently as possible. 🚀

I'll tell you a bit more about these concepts here, and that would hopefully give you the intuition you would need to decide how to deploy your API in very different environments, possibly even in future ones that don't exist yet.

By considering these concepts, you will be able to evaluate and design the best way to deploy your own APIs.

In the next chapters, I'll give you more concrete recipes to deploy FastAPI applications.

But for now, let's check these important conceptual ideas. These concepts also apply to any other type of web API. 💡

Security - HTTPS¶

In the previous chapter about HTTPS we learned about how HTTPS provides encryption for your API.

We also saw that HTTPS is normally provided by a component external to your application server, a TLS Termination Proxy.

And there has to be something in charge of renewing the HTTPS certificates, it could be the same component or it could be something different.

Example Tools for HTTPS¶

Some of the tools you could use as a TLS Termination Proxy are:

Traefik
- Automatically handles certificate renewals ✨
Caddy
- Automatically handles certificate renewals ✨
Nginx
- With an external component like Certbot for certificate renewals
HAProxy
- With an external component like Certbot for certificate renewals
Kubernetes with an Ingress Controller like Nginx
- With an external component like cert-manager for certificate renewals
Handled internally by a cloud provider as part of their services (read below 👇)

Another option is that you could use a cloud service that does more of the work including setting up HTTPS. It could have some restrictions or charge you more, etc. But in that case, you wouldn't have to set up a TLS Termination Proxy yourself.

I'll show you some concrete examples in the next chapters.

Then the next concepts to consider are all about the program running your actual API (e.g. Uvicorn).

Program and Process¶

We will talk a lot about the running "process", so it's useful to have clarity about what it means, and what's the difference with the word "program".

What is a Program¶

The word program is commonly used to describe many things:

The code that you write, the Python files.
The file that can be executed by the operating system, for example: python, python.exe or uvicorn.
A particular program while it is running on the operating system, using the CPU, and storing things in memory. This is also called a process.

What is a Process¶

The word process is normally used in a more specific way, only referring to the thing that is running in the operating system (like in the last point above):

A particular program while it is running on the operating system.
- This doesn't refer to the file, nor to the code, it refers specifically to the thing that is being executed and managed by the operating system.
Any program, any code, can only do things when it is being executed. So, when there's a process running.
The process can be terminated (or "killed") by you, or by the operating system. At that point, it stops running/being executed, and it can no longer do things.
Each application that you have running on your computer has some process behind it, each running program, each window, etc. And there are normally many processes running at the same time while a computer is on.
There can be multiple processes of the same program running at the same time.

If you check out the "task manager" or "system monitor" (or similar tools) in your operating system, you will be able to see many of those processes running.

And, for example, you will probably see that there are multiple processes running the same browser program (Firefox, Chrome, Edge, etc). They normally run one process per tab, plus some other extra processes.
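To make the difference concrete, here is a minimal sketch that starts several processes of the same program and collects the process ID the operating system assigns to each one. The helper name `spawn_same_program` is made up for this illustration; it launches the current Python interpreter a few times with the same small piece of code.

```python
import subprocess
import sys


def spawn_same_program(count: int) -> list[int]:
    """Start `count` processes of the same program and return their PIDs."""
    # Every child runs the same executable (the current Python interpreter)
    # with the same code, but each one is a separate process.
    children = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
        for _ in range(count)
    ]
    pids = [child.pid for child in children]
    for child in children:
        child.wait()  # Let every process finish before returning.
    return pids


if __name__ == "__main__":
    # One program, three processes, three different process IDs.
    print(spawn_same_program(3))
```

While the children are sleeping, they would also show up as separate entries in the task manager or system monitor.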

Now that we know the difference between the terms process and program, let's continue talking about deployments.

Running on Startup¶

In most cases, when you create a web API, you want it to be always running, uninterrupted, so that your clients can always access it. This is of course, unless you have a specific reason why you want it to run only in certain situations, but most of the time you want it constantly running and available.

In a Remote Server¶

When you set up a remote server (a cloud server, a virtual machine, etc.) the simplest thing you can do is use fastapi run (which uses Uvicorn) or something similar, manually, the same way you do when developing locally.

And it will work and will be useful during development.

But if your connection to the server is lost, the running process will probably die.

And if the server is restarted (for example after updates, or migrations from the cloud provider) you probably won't notice it. And because of that, you won't even know that you have to restart the process manually. So, your API will just stay dead. 😱

Run Automatically on Startup¶

In general, you will probably want the server program (e.g. Uvicorn) to be started automatically on server startup, and without needing any human intervention, to have a process always running with your API (e.g. Uvicorn running your FastAPI app).

Separate Program¶

To achieve this, you will normally have a separate program that would make sure your application is run on startup. And in many cases, it would also make sure other components or applications are also run, for example, a database.

Example Tools to Run at Startup¶

Some examples of the tools that can do this job are:

Docker
Kubernetes
Docker Compose
Docker in Swarm Mode
Systemd
Supervisor
Handled internally by a cloud provider as part of their services
Others...

I'll give you more concrete examples in the next chapters.

Restarts¶

Similar to making sure your application is run on startup, you probably also want to make sure it is restarted after failures.

We Make Mistakes¶

We, as humans, make mistakes, all the time. Software almost always has bugs hidden in different places. 🐛

And we as developers keep improving the code as we find those bugs and as we implement new features (possibly adding new bugs too 😅).

Small Errors Automatically Handled¶

When building web APIs with FastAPI, if there's an error in our code, FastAPI will normally contain it to the single request that triggered the error. 🛡

The client will get a 500 Internal Server Error for that request, but the application will continue working for the next requests instead of just crashing completely.
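The idea of containing an error to a single request can be sketched with plain Python. This is only an illustration of the concept, not how FastAPI is implemented internally; `handle_request`, `ok`, and `broken` are hypothetical names used just for this example.

```python
from typing import Callable


def handle_request(handler: Callable[[], str]) -> tuple[int, str]:
    """Run one request handler and turn any unexpected error into a 500."""
    try:
        return 200, handler()
    except Exception:
        # The error stays inside this one request; the loop that serves
        # the following requests keeps running.
        return 500, "Internal Server Error"


def ok() -> str:
    return "hello"


def broken() -> str:
    raise ValueError("a bug in this handler")


# The second request fails, but the third one is still served normally.
responses = [handle_request(handler) for handler in (ok, broken, ok)]
```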

Bigger Errors - Crashes¶

Nevertheless, there might be cases where we write some code that crashes the entire application making Uvicorn and Python crash. 💥

And still, you would probably not want the application to stay dead because there was an error in one place, you probably want it to continue running at least for the path operations that are not broken.

Restart After Crash¶

But in those cases with really bad errors that crash the running process, you would want an external component that is in charge of restarting the process, at least a couple of times...

Tip

...Although if the whole application is just crashing immediately it probably doesn't make sense to keep restarting it forever. But in those cases, you will probably notice it during development, or at least right after deployment.

So let's focus on the main cases, where it could crash entirely in some particular cases in the future, and it still makes sense to restart it.

You would probably want to have the thing in charge of restarting your application as an external component, because by that point, the same application with Uvicorn and Python already crashed, so there's nothing in the same code of the same app that could do anything about it.
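As a rough sketch of what such an external component does, the following script runs a command as a separate process and starts it again if it exits with an error, giving up after a limited number of restarts. The function name `run_with_restarts` and its `max_restarts` parameter are assumptions made for this example; real tools handle many more details (logging, delays between restarts, signals, etc.).

```python
import subprocess
import sys


def run_with_restarts(command: list[str], max_restarts: int = 3) -> int:
    """Run `command`, restarting it after a crash up to `max_restarts` times.

    Returns the number of times the process was started.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            break  # The process finished cleanly, nothing to restart.
        if starts > max_restarts:
            break  # It keeps crashing, so stop restarting it forever.
    return starts


if __name__ == "__main__":
    # A stand-in for a server that crashes every time it is started.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restarts(crashing, max_restarts=2))
```

Because the restarting logic lives in a different process from the application, it keeps working even when the application itself has crashed.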

Example Tools to Restart Automatically¶

In most cases, the same tool that is used to run the program on startup is also used to handle automatic restarts.

For example, this could be handled by:

Docker
Kubernetes
Docker Compose
Docker in Swarm Mode
Systemd
Supervisor
Handled internally by a cloud provider as part of their services
Others...

Replication - Processes and Memory¶

With a FastAPI application, using a server program like the fastapi command that runs Uvicorn, running it once in one process can serve multiple clients concurrently.

But in many cases, you will want to run several worker processes at the same time.

Multiple Processes - Workers¶

If you have more clients than what a single process can handle (for example if the virtual machine is not too big) and you have multiple cores in the server's CPU, then you could have multiple processes running with the same application at the same time, and distribute all the requests among them.

When you run multiple processes of the same API program, they are commonly called workers.

Worker Processes and Ports¶

Remember from the docs About HTTPS that only one process can be listening on one combination of port and IP address in a server?

This is still true.

So, to be able to have multiple processes at the same time, there has to be a single process listening on a port that then transmits the communication to each worker process in some way.
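The port restriction can be observed with a short sketch using only the standard library. It binds one socket to a free port on the local machine and then tries to bind a second socket to the same IP and port with default socket options. The function name `second_bind_fails` is made up for this example.

```python
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if a second socket cannot bind to an already used port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # Port 0 asks the OS for any free port.
        first.listen()
        port = first.getsockname()[1]
        try:
            # Same IP and port as the socket that is already listening.
            second.bind((host, port))
        except OSError:
            return True  # The OS refused: the address is already in use.
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(second_bind_fails())
```

This is why a single listening process that passes the communication on to the workers is the usual arrangement.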

Memory per Process¶

Now, when the program loads things in memory, for example, a machine learning model in a variable, or the contents of a large file in a variable, all that consumes a bit of the memory (RAM) of the server.

And multiple processes normally don't share any memory. This means that each running process has its own things, variables, and memory. And if you are consuming a large amount of memory in your code, each process will consume an equivalent amount of memory.

Server Memory¶

For example, if your code loads a Machine Learning model with 1 GB in size, when you run one process with your API, it will consume at least 1 GB of RAM. And if you start 4 processes (4 workers), each will consume 1 GB of RAM. So in total, your API will consume 4 GB of RAM.

And if your remote server or virtual machine only has 3 GB of RAM, trying to consume more than that (here, at least 4 GB) will cause problems. 🚨
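The arithmetic above can be written as a small sketch. The function names are made up for this example, and the calculation is deliberately simplified: it ignores the memory used by the operating system and by other processes, so the real limit would be lower.

```python
def total_memory_gb(workers: int, per_worker_gb: float) -> float:
    """Memory used when each worker loads its own copy of the data."""
    return workers * per_worker_gb


def max_workers_that_fit(server_ram_gb: float, per_worker_gb: float) -> int:
    """Upper bound on the number of workers that fit in the server's RAM."""
    return int(server_ram_gb // per_worker_gb)


if __name__ == "__main__":
    # 4 workers with a 1 GB model each need 4 GB in total...
    print(total_memory_gb(workers=4, per_worker_gb=1.0))
    # ...but a server with 3 GB of RAM fits at most 3 of them.
    print(max_workers_that_fit(server_ram_gb=3.0, per_worker_gb=1.0))
```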

Multiple Processes - An Example¶

In this example, there's a Manager Process that starts and controls two Worker Processes.

This Manager Process would probably be the one listening on the port in the IP. And it would transmit all the communication to the worker processes.

Those worker processes would be the ones running your application, they would perform the main computations to receive a request and return a response, and they would load anything you put in variables in RAM.

And of course, the same machine would probably have other processes running as well, apart from your application.

An interesting detail is that the percentage of the CPU used by each process can vary a lot over time, but the memory (RAM) normally stays more or less stable.

If you have an API that does a comparable amount of computations every time and you have a lot of clients, then the CPU utilization will probably also be stable (instead of constantly going up and down quickly).

Examples of Replication Tools and Strategies¶

There can be several approaches to achieve this, and I'll tell you more about specific strategies in the next chapters, for example when talking about Docker and containers.

The main constraint to consider is that there has to be a single component handling the port in the public IP. And then it has to have a way to transmit the communication to the replicated processes/workers.

Here are some possible combinations and strategies:

Uvicorn with --workers
- One Uvicorn process manager would listen on the IP and port, and it would start multiple Uvicorn worker processes.
Kubernetes and other distributed container systems
- Something in the Kubernetes layer would listen on the IP and port. The replication would be by having multiple containers, each with one Uvicorn process running.
Cloud services that handle this for you
- The cloud service will probably handle replication for you. It would possibly let you define a process to run, or a container image to use, in any case, it would most probably be a single Uvicorn process, and the cloud service would be in charge of replicating it.
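For the first strategy, here is a small sketch that builds the command line for Uvicorn with several workers. The `--host`, `--port`, and `--workers` options are Uvicorn command line options; the helper name `build_uvicorn_command` and the import string `main:app` are assumptions for this example and depend on how your project is laid out.

```python
def build_uvicorn_command(
    app_import: str,
    workers: int,
    host: str = "0.0.0.0",
    port: int = 8000,
) -> list[str]:
    """Build the argument list to run Uvicorn with several worker processes."""
    return [
        "uvicorn",
        app_import,  # For example "main:app" (hypothetical module and variable).
        "--host", host,
        "--port", str(port),
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # Prints: uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
    print(" ".join(build_uvicorn_command("main:app", workers=4)))
```

The resulting list could be passed to a process manager or to `subprocess.run` on a machine where Uvicorn is installed.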

Tip

Don't worry if some of these items about containers, Docker, or Kubernetes don't make a lot of sense yet.

I'll tell you more about container images, Docker, Kubernetes, etc. in a future chapter: FastAPI in Containers - Docker.

Previous Steps Before Starting¶

There are many cases where you want to perform some steps before starting your application.

For example, you might want to run database migrations.

But in most cases, you will want to perform these steps only once.

So, you will want to have a single process to perform those previous steps, before starting the application.

And you will have to make sure that it's a single process running those previous steps even if afterwards, you start multiple processes (multiple workers) for the application itself. If those steps were run by multiple processes, they would duplicate the work by running it in parallel, and if the steps were something delicate like a database migration, they could cause conflicts with each other.

Of course, there are some cases where there's no problem in running the previous steps multiple times, in that case, it's a lot easier to handle.

Tip

Also, keep in mind that depending on your setup, in some cases you might not even need any previous steps before starting your application.

In that case, you wouldn't have to worry about any of this. 🤷

Examples of Previous Steps Strategies¶

This will depend heavily on the way you deploy your system, and it would probably be connected to the way you start programs, handling restarts, etc.

Here are some possible ideas:

An "Init Container" in Kubernetes that runs before your app container
A bash script that runs the previous steps and then starts your application
- You would still need a way to start/restart that bash script, detect errors, etc.
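The script idea can also be sketched in Python instead of bash. A single process runs each previous step once, in order, and only starts the application if all of them succeeded. The function name `run_steps_then_start` is made up for this example, and the commands passed to it (a migration command, the server command) are whatever your own project uses.

```python
import subprocess
import sys


def run_steps_then_start(
    previous_steps: list[list[str]],
    start_command: list[str],
) -> int:
    """Run each previous step once, then start the application.

    Returns the exit code of the application process.
    """
    for step in previous_steps:
        # check=True raises CalledProcessError if a step fails, so the
        # application is never started after a failed step.
        subprocess.run(step, check=True)
    # Only this single process ran the steps; the start command is free
    # to launch multiple workers afterwards.
    return subprocess.run(start_command).returncode


if __name__ == "__main__":
    # Stand-ins for a real migration command and a real server command.
    migrate = [sys.executable, "-c", "print('running migrations')"]
    serve = [sys.executable, "-c", "print('starting application')"]
    run_steps_then_start([migrate], serve)
```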

Tip

I'll give you more concrete examples for doing this with containers in a future chapter: FastAPI in Containers - Docker.

Resource Utilization¶

Your server(s) is (are) a resource, you can consume or utilize, with your programs, the computation time on the CPUs, and the RAM memory available.

How much of the system resources do you want to be consuming/utilizing? It might be easy to think "not much", but in reality, you will probably want to consume as much as possible without crashing.

If you are paying for 3 servers but you are using only a little bit of their RAM and CPU, you are probably wasting money 💸, and probably wasting server electric power 🌎, etc.

In that case, it could be better to have only 2 servers and use a higher percentage of their resources (CPU, memory, disk, network bandwidth, etc).

On the other hand, if you have 2 servers and you are using 100% of their CPU and RAM, at some point one process will ask for more memory, and the server will have to use the disk as "memory" (which can be thousands of times slower), or even crash. Or one process might need to do some computation and would have to wait until the CPU is free again.

In this case, it would be better to get one extra server and run some processes on it so that they all have enough RAM and CPU time.

There's also the chance that for some reason you have a spike of usage of your API. Maybe it went viral, or maybe some other services or bots start using it. And you might want to have extra resources to be safe in those cases.

You could put an arbitrary number to target, for example, something between 50% and 90% of resource utilization. The point is that those are probably the main things you will want to measure and use to tweak your deployments.

You can use simple tools like htop to see the CPU and RAM used in your server or the amount used by each process. Or you can use more complex monitoring tools, which may be distributed across servers, etc.
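As a tiny sketch of how such a target could be used, the following function compares a measured utilization percentage with a target band. The name `classify_utilization` and the labels it returns are made up for this example, and the 50% and 90% defaults are just the arbitrary numbers mentioned above; the measurement itself would come from a tool like htop or a monitoring system.

```python
def classify_utilization(
    percent: float,
    low: float = 50.0,
    high: float = 90.0,
) -> str:
    """Compare a measured utilization percentage with a target band."""
    if percent < low:
        return "underused"  # Probably paying for resources you don't use.
    if percent > high:
        return "overloaded"  # Little room left for spikes of usage.
    return "on target"


if __name__ == "__main__":
    for measured in (20.0, 75.0, 100.0):
        print(measured, classify_utilization(measured))
```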

Recap¶

You have been reading here some of the main concepts that you would probably need to keep in mind when deciding how to deploy your application:

Security - HTTPS
Running on startup
Restarts
Replication (the number of processes running)
Memory
Previous steps before starting

Understanding these ideas and how to apply them should give you the intuition necessary to take any decisions when configuring and tweaking your deployments. 🤓

In the next sections, I'll give you more concrete examples of possible strategies you can follow. 🚀
