Understanding Lambda function scaling - AWS Lambda
Documentation › AWS Lambda › Developer Guide

Understanding Lambda function scaling

Concurrency is the number of in-flight requests that your AWS Lambda function is handling at the same time. For each concurrent request, Lambda provisions a separate instance of your execution environment. As your functions receive more requests, Lambda automatically handles scaling the number of execution environments until you reach your account's concurrency limit. By default, Lambda provides your account with a total concurrency limit of 1,000 concurrent executions across all functions in an AWS Region. To support your specific account needs, you can request a quota increase and configure function-level concurrency controls so that your critical functions don't experience throttling.

This topic explains concurrency concepts and function scaling in Lambda. By the end of this topic, you'll understand how to calculate concurrency, visualize the two main concurrency control options (reserved and provisioned concurrency), estimate appropriate settings for these controls, and view metrics for further optimization.

Understanding and visualizing concurrency

Lambda invokes your function in a secure and isolated execution environment. To handle a request, Lambda must first initialize an execution environment (the Init phase), before using it to invoke your function (the Invoke phase):

Typical lifecycle of an execution environment, showing Init and Invoke phases.

The previous diagram uses a rectangle to represent a single execution environment. When your function receives its very first request (represented by the yellow circle with label 1), Lambda creates a new execution environment and runs the code outside your main handler during the Init phase. Then, Lambda runs your function's main handler code during the Invoke phase. During this entire process, this execution environment is busy and cannot process other requests.

When Lambda finishes processing the first request, this execution environment can then process additional requests for the same function. For subsequent requests, Lambda doesn't need to re-initialize the environment.

An execution environment handling two requests in succession.

In the previous diagram, Lambda reuses the execution environment to handle the second request (represented by the yellow circle with label 2).

So far, we've focused on just a single instance of your execution environment (that is, a concurrency of 1). In practice, Lambda may need to provision multiple execution environment instances in parallel to handle all incoming requests. When your function receives a new request, one of two things can happen:

  • If an idle execution environment instance from a previous invocation is available, Lambda reuses it to process the request.

  • If all existing execution environment instances are busy, or no instances exist yet, Lambda provisions a new execution environment instance to process the request.

For example, let's explore what happens when your function receives 10 requests:

A Lambda function provisioning multiple environments to handle 10 requests

In the previous diagram, each horizontal plane represents a single execution environment instance (labeled from A through F). Here's how Lambda handles each request:

| Request | Lambda behavior | Reasoning |
| --- | --- | --- |
| 1 | Provisions new environment A | This is the first request; no execution environment instances are available. |
| 2 | Provisions new environment B | Existing execution environment instance A is busy. |
| 3 | Provisions new environment C | Existing execution environment instances A and B are both busy. |
| 4 | Provisions new environment D | Existing execution environment instances A, B, and C are all busy. |
| 5 | Provisions new environment E | Existing execution environment instances A, B, C, and D are all busy. |
| 6 | Reuses environment A | Execution environment instance A has finished processing request 1 and is now available. |
| 7 | Reuses environment B | Execution environment instance B has finished processing request 2 and is now available. |
| 8 | Reuses environment C | Execution environment instance C has finished processing request 3 and is now available. |
| 9 | Provisions new environment F | Existing execution environment instances A, B, C, D, and E are all busy. |
| 10 | Reuses environment D | Execution environment instance D has finished processing request 4 and is now available. |
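The assignment logic in the table above can be sketched as a small simulation. This is a simplified model, assuming each request's start and end times are known in advance; real Lambda scheduling is more involved:

```python
def assign_requests(events):
    """Assign each request to an execution environment, reusing idle ones.

    events: list of (start, end) times, in arrival order.
    Returns the environment label (A, B, C, ...) used for each request.
    """
    busy_until = []  # per-environment time at which it becomes free
    labels = []
    for start, end in events:
        for i, free_at in enumerate(busy_until):
            if free_at <= start:        # environment is idle: reuse it
                busy_until[i] = end
                labels.append(chr(ord("A") + i))
                break
        else:                           # all environments busy: provision a new one
            busy_until.append(end)
            labels.append(chr(ord("A") + len(busy_until) - 1))
    return labels

# Two overlapping requests need two environments; a later request reuses A.
print(assign_requests([(0, 5), (1, 6), (5, 8)]))  # ['A', 'B', 'A']
```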

As your function receives more concurrent requests, Lambda scales up the number of execution environment instances in response. The following animation tracks the number of concurrent requests over time:

An animation illustrating concurrent requests over time.

By freezing the previous animation at six distinct points in time, we get the following diagram:

Function concurrency at six distinct points in time.

In the previous diagram, we can draw a vertical line at any point in time and count the number of environments that intersect this line. This gives us the number of concurrent requests at that point in time. For example, at time t1, there are three active environments serving three concurrent requests. The maximum number of concurrent requests in this simulation occurs at time t4, when there are six active environments serving six concurrent requests.

To summarize, your function's concurrency is the number of concurrent requests that it's handling at the same time. In response to an increase in your function's concurrency, Lambda provisions more execution environment instances to meet request demand.

Calculating concurrency for a function

In general, concurrency of a system is the ability to process more than one task simultaneously. In Lambda, concurrency is the number of in-flight requests that your function is handling at the same time. A quick and practical way of measuring concurrency of a Lambda function is to use the following formula:

Concurrency = (average requests per second) * (average request duration in seconds)

Concurrency differs from requests per second. For example, suppose your function receives 100 requests per second on average. If the average request duration is one second, then it's true that the concurrency is also 100:

Concurrency = (100 requests/second) * (1 second/request) = 100

However, if the average request duration is 500 ms, then the concurrency is 50:

Concurrency = (100 requests/second) * (0.5 second/request) = 50

What does a concurrency of 50 mean in practice? If the average request duration is 500 ms, then you can think of an instance of your function as being able to handle two requests per second. Then, it takes 50 instances of your function to handle a load of 100 requests per second. A concurrency of 50 means that Lambda must provision 50 execution environment instances to efficiently handle this workload without any throttling. Here's how to express this in equation form:

Concurrency = (100 requests/second) / (2 requests/second) = 50

If your function receives double the number of requests (200 requests per second), but only requires half the time to process each request (250 ms), then the concurrency is still 50:

Concurrency = (200 requests/second) * (0.25 second/request) = 50
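The formula and the worked examples above can be checked with a few lines of code (a sketch; the function name is illustrative):

```python
def concurrency(avg_requests_per_second, avg_duration_seconds):
    """Concurrency = (average requests per second) * (average request duration)."""
    return avg_requests_per_second * avg_duration_seconds

# The worked examples from this section:
print(concurrency(100, 1.0))   # 100.0
print(concurrency(100, 0.5))   # 50.0
print(concurrency(200, 0.25))  # 50.0
```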

Understanding reserved concurrency and provisioned concurrency

By default, your account has a concurrency limit of 1,000 concurrent executions across all functions in a Region. Your functions share this pool of 1,000 concurrency on an on-demand basis. Your functions experience throttling (that is, they start to drop requests) if you run out of available concurrency.

Some of your functions might be more critical than others. As a result, you might want to configure concurrency settings to ensure that critical functions get the concurrency that they need. There are two types of concurrency controls available: reserved concurrency and provisioned concurrency.

Reserved concurrency

If you want to guarantee that a certain amount of concurrency is available for your function at any time, use reserved concurrency.

Reserved concurrency sets the maximum and minimum number of concurrent instances that you want to allocate to your function. When you dedicate reserved concurrency to a function, no other function can use that concurrency. In other words, setting reserved concurrency can impact the concurrency pool that's available to other functions. Functions that don't have reserved concurrency share the remaining pool of unreserved concurrency.

Configuring reserved concurrency counts towards your overall account concurrency limit. There is no charge for configuring reserved concurrency for a function.

To better understand reserved concurrency, consider the following diagram:

Function scaling behavior when you configure reserved concurrency on critical functions.

In this diagram, your account concurrency limit for all the functions in this Region is at the default limit of 1,000. Suppose you have two critical functions, function-blue and function-orange, that routinely expect to get high invocation volumes. You decide to give 400 units of reserved concurrency to function-blue, and 400 units of reserved concurrency to function-orange. In this example, all other functions in your account must share the remaining 200 units of unreserved concurrency.

The diagram has five points of interest:

From this example, notice that reserving concurrency has the following effects:

  • Your function can scale up to, but never beyond, its reserved concurrency setting. Beyond that point, its requests are throttled.

  • The concurrency that you reserve is unavailable to other functions, which shrinks the pool of unreserved concurrency that all remaining functions share.

To learn how to manage reserved concurrency settings for your functions, see Configuring reserved concurrency for a function.

Provisioned concurrency

You use reserved concurrency to define the maximum number of execution environments reserved for a Lambda function. However, none of these environments come pre-initialized. As a result, your function invocations may take longer because Lambda must first initialize the new environment before being able to use it to invoke your function. When Lambda has to initialize a new environment in order to carry out an invocation, this is known as a cold start. To mitigate cold starts, you can use provisioned concurrency.

Provisioned concurrency is the number of pre-initialized execution environments that you want to allocate to your function. If you set provisioned concurrency on a function, Lambda initializes that number of execution environments so that they are prepared to respond immediately to function requests.

When using provisioned concurrency, Lambda still recycles execution environments in the background. For example, this can occur after an invocation failure. However, at any given time, Lambda always ensures that the number of pre-initialized environments is equal to the value of your function's provisioned concurrency setting. Importantly, even if you're using provisioned concurrency, you can still experience a cold start delay if Lambda has to reset the execution environment.

In contrast, when using reserved concurrency, Lambda may completely terminate an environment after a period of inactivity. The following diagram illustrates this by comparing the lifecycle of a single execution environment when you configure your function using reserved concurrency compared to provisioned concurrency.

Comparison of reserved concurrency and provisioned concurrency behavior

The diagram has four points of interest:

| Time | Reserved concurrency | Provisioned concurrency |
| --- | --- | --- |
| t1 | Nothing happens. | Lambda pre-initializes an execution environment instance. |
| t2 | Request 1 comes in. Lambda must initialize a new execution environment instance. | Request 1 comes in. Lambda uses the pre-initialized environment instance. |
| t3 | After some inactivity, Lambda terminates the active environment instance. | Nothing happens. |
| t4 | Request 2 comes in. Lambda must initialize a new execution environment instance. | Request 2 comes in. Lambda uses the pre-initialized environment instance. |

To better understand provisioned concurrency, consider the following diagram:

Function scaling behavior when you configure provisioned concurrency on a critical function.

In this diagram, you have an account concurrency limit of 1,000. You decide to give 400 units of provisioned concurrency to function-orange. All functions in your account, including function-orange, can use the remaining 600 units of unreserved concurrency.

The diagram has five points of interest:

  • At t1, function-orange begins receiving requests. Since Lambda has pre-initialized 400 execution environment instances, function-orange is ready for immediate invocation.

  • At t2, function-orange reaches 400 concurrent requests. As a result, function-orange runs out of provisioned concurrency. However, since there's still unreserved concurrency available, Lambda can use this to handle additional requests to function-orange (there's no throttling). Lambda must create new instances to serve these requests, and your function may experience cold start latencies.

  • At t3, function-orange returns to 400 concurrent requests after a brief spike in traffic. Lambda is again able to handle all requests without cold start latencies.

  • At t4, functions in your account experience a burst in traffic. This burst can come from function-orange or any other function in your account. Lambda uses unreserved concurrency to handle these requests.

  • At t5, functions in your account reach the maximum concurrency limit of 1,000, and experience throttling.

The previous example considered only provisioned concurrency. In practice, you can set both provisioned concurrency and reserved concurrency on a function. You might do this if you had a function that handles a consistent load of invocations on weekdays, but routinely sees spikes of traffic on weekends. In this case, you could use provisioned concurrency to set a baseline number of environments to handle requests during weekdays, and use reserved concurrency to handle the weekend spikes. Consider the following diagram:

Function scaling behavior when you use both reserved and provisioned concurrency.

In this diagram, suppose that you configure 200 units of provisioned concurrency and 400 units of reserved concurrency for function-orange. Because you configured reserved concurrency, function-orange cannot use any of the 600 units of unreserved concurrency.

This diagram has five points of interest:

  • At t1, function-orange begins receiving requests. Since Lambda has pre-initialized 200 execution environment instances, function-orange is ready for immediate invocation.

  • At t2, function-orange uses up all its provisioned concurrency. function-orange can continue serving requests using reserved concurrency, but these requests may experience cold start latencies.

  • At t3, function-orange reaches 400 concurrent requests. As a result, function-orange uses up all its reserved concurrency. Since function-orange cannot use unreserved concurrency, requests begin to throttle.

  • At t4, function-orange starts to receive fewer requests, and no longer throttles.

  • At t5, function-orange drops down to 200 concurrent requests, so all requests are again able to use provisioned concurrency (that is, no cold start latencies).

Both reserved concurrency and provisioned concurrency count towards your account concurrency limit and Regional quotas. In other words, allocating reserved and provisioned concurrency can impact the concurrency pool that's available to other functions. Configuring provisioned concurrency incurs charges to your AWS account.

To manage provisioned concurrency settings for your functions, see Configuring provisioned concurrency for a function. To automate provisioned concurrency scaling based on a schedule or application utilization, see Using Application Auto Scaling to automate provisioned concurrency management.

How Lambda allocates provisioned concurrency

Provisioned concurrency doesn't come online immediately after you configure it. Lambda starts allocating provisioned concurrency after a minute or two of preparation. For each function, Lambda can provision up to 6,000 execution environments every minute, regardless of AWS Region. This is exactly the same as the concurrency scaling rate for functions.

When you submit a request to allocate provisioned concurrency, you can't access any of those environments until Lambda completely finishes allocating them. For example, if you request 5,000 provisioned concurrency, none of your requests can use provisioned concurrency until Lambda completely finishes allocating the 5,000 execution environments.
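As a rough estimate of how long an allocation takes under the 6,000-environments-per-minute rate described above (a sketch; it ignores the minute or two of initial preparation):

```python
import math

def allocation_minutes(provisioned_concurrency, rate_per_minute=6000):
    """Whole minutes Lambda needs to allocate the requested number of
    pre-initialized environments, at up to 6,000 per function per minute."""
    return math.ceil(provisioned_concurrency / rate_per_minute)

print(allocation_minutes(5000))   # 1
print(allocation_minutes(12001))  # 3
```

Remember that none of the requested environments are usable until the entire allocation finishes.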

Comparing reserved concurrency and provisioned concurrency

The following table summarizes and compares reserved and provisioned concurrency.

| Topic | Reserved concurrency | Provisioned concurrency |
| --- | --- | --- |
| Definition | Maximum number of execution environment instances for your function. | Set number of pre-provisioned execution environment instances for your function. |
| Provisioning behavior | Lambda provisions new instances on an on-demand basis. | Lambda pre-provisions instances (that is, before your function starts receiving requests). |
| Cold start behavior | Cold start latency possible, since Lambda must create new instances on-demand. | Cold start latency not possible, since Lambda doesn't have to create instances on-demand. |
| Throttling behavior | Function throttled when reserved concurrency limit reached. | If reserved concurrency not set: function uses unreserved concurrency when provisioned concurrency limit reached. If reserved concurrency set: function throttled when reserved concurrency limit reached. |
| Default behavior if not set | Function uses unreserved concurrency available in your account. | Lambda doesn't pre-provision any instances. If reserved concurrency not set: function uses unreserved concurrency available in your account. If reserved concurrency set: function uses reserved concurrency. |
| Pricing | No additional charge. | Incurs additional charges. |

Understanding concurrency and requests per second

As mentioned in the previous section, concurrency differs from requests per second. This is an especially important distinction to make when working with functions that have an average request duration of less than 100 ms.

Across all functions in your account, Lambda enforces a requests per second limit that's equal to 10 times your account concurrency. For example, since the default account concurrency limit is 1,000, functions in your account can handle a maximum of 10,000 requests per second.

For example, consider a function with an average request duration of 50 ms. At 20,000 requests per second, here's the concurrency of this function:

Concurrency = (20,000 requests/second) * (0.05 second/request) = 1,000

Based on this result, you might expect that the account concurrency limit of 1,000 is sufficient to handle this load. However, because of the 10,000 requests per second limit, your function can only handle 10,000 requests per second out of the 20,000 total requests. This function experiences throttling.

The lesson is that you must consider both concurrency and requests per second when configuring concurrency settings for your functions. In this case, you need to request an account concurrency limit increase to 2,000, since this would increase your total requests per second limit to 20,000.
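The interaction between the concurrency formula and the 10x requests-per-second limit can be sketched as follows (the helper name is illustrative):

```python
def required_account_concurrency(rps, avg_duration_seconds):
    """Account concurrency must satisfy both constraints:
    - the concurrency formula: rps * average duration
    - the requests-per-second limit: rps <= 10 * account concurrency
    """
    by_concurrency = rps * avg_duration_seconds
    by_rps_limit = rps / 10  # RPS limit = 10x account concurrency
    return max(by_concurrency, by_rps_limit)

# 20,000 rps at 50 ms: the formula says 1,000, but the RPS limit demands 2,000.
print(required_account_concurrency(20_000, 0.05))  # 2000.0
```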

The requests per second limit applies to all quotas in Lambda that involve concurrency. In other words, it applies to synchronous on-demand functions, functions that use provisioned concurrency, and concurrency scaling behavior. For example, here are a few scenarios where you must carefully consider both your concurrency and requests per second limits:

  • A function using on-demand concurrency can experience a burst increase of 500 concurrency every 10 seconds, or by 5,000 requests per second every 10 seconds, whichever happens first.

  • Suppose you have a function that has a provisioned concurrency allocation of 10. This function spills over into on-demand concurrency after 10 concurrency or 100 requests per second, whichever happens first.
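The spillover scenario in the last bullet can be expressed as a small helper (a sketch; the function name is illustrative):

```python
def spillover_thresholds(provisioned_concurrency):
    """A function spills over into on-demand concurrency after hitting either
    its provisioned concurrency, or 10x that value in requests per second,
    whichever happens first."""
    return {
        "concurrency": provisioned_concurrency,
        "requests_per_second": provisioned_concurrency * 10,
    }

print(spillover_thresholds(10))  # {'concurrency': 10, 'requests_per_second': 100}
```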

Concurrency quotas

Lambda sets quotas for the total amount of concurrency that you can use across all functions in a Region. These quotas exist on two levels:

  • At the account level, your functions can have up to 1,000 units of concurrency by default. To increase this limit, see Requesting a quota increase in the Service Quotas User Guide.

  • At the function level, you can reserve up to 900 units of concurrency across all your functions by default. Regardless of your total account concurrency limit, Lambda always reserves 100 units of concurrency for your functions that don't explicitly reserve concurrency. For example, if you increased your account concurrency limit to 2,000, then you can reserve up to 1,900 units of concurrency at the function level.

  • At both the account level and the function level, Lambda also enforces a requests per second limit equal to 10 times the corresponding concurrency quota. For instance, this applies to account-level concurrency, functions using on-demand concurrency, functions using provisioned concurrency, and concurrency scaling behavior. For more information, see Understanding concurrency and requests per second.

To check your current account level concurrency quota, use the AWS Command Line Interface (AWS CLI) to run the following command:

aws lambda get-account-settings

You should see output that looks like the following:

{
    "AccountLimit": {
        "TotalCodeSize": 80530636800,
        "CodeSizeUnzipped": 262144000,
        "CodeSizeZipped": 52428800,
        "ConcurrentExecutions": 1000,
        "UnreservedConcurrentExecutions": 900
    },
    "AccountUsage": {
        "TotalCodeSize": 410759889,
        "FunctionCount": 8
    }
}

ConcurrentExecutions is your total account-level concurrency quota. UnreservedConcurrentExecutions is the amount of reserved concurrency that you can still allocate to your functions.
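Instead of reading the raw JSON by eye, you can parse the CLI output programmatically. A minimal sketch using the example output above (the same fields are also returned by the boto3 Lambda client's get_account_settings call):

```python
import json

# Example output from `aws lambda get-account-settings` (values from above)
output = """
{
    "AccountLimit": {
        "ConcurrentExecutions": 1000,
        "UnreservedConcurrentExecutions": 900
    }
}
"""

limits = json.loads(output)["AccountLimit"]
# Concurrency already reserved = total quota minus the unreserved remainder
reserved = limits["ConcurrentExecutions"] - limits["UnreservedConcurrentExecutions"]
print(f"Account quota: {limits['ConcurrentExecutions']}, already reserved: {reserved}")
```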

As your function receives more requests, Lambda automatically scales up the number of execution environments to handle these requests until your account reaches its concurrency quota. However, to protect against over-scaling in response to sudden bursts of traffic, Lambda limits how fast your functions can scale. This concurrency scaling rate is the maximum rate at which functions in your account can scale in response to increased requests. (That is, how quickly Lambda can create new execution environments.) The concurrency scaling rate differs from the account-level concurrency limit, which is the total amount of concurrency available to your functions.

In each AWS Region, and for each function, your concurrency scaling rate is 1,000 execution environment instances every 10 seconds (or 10,000 requests per second every 10 seconds). In other words, every 10 seconds, Lambda can allocate at most 1,000 additional execution environment instances, or accommodate 10,000 additional requests per second, to each of your functions.
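A simplified model of this scaling rate can estimate how long a single function needs to scale out (a sketch; it assumes the full 10-second interval elapses for each batch of 1,000 instances):

```python
import math

def seconds_to_scale(target_environments, rate=1000, interval_s=10):
    """Worst-case seconds for one function to provision the target number of
    new execution environments, at 1,000 instances every 10 seconds."""
    return math.ceil(target_environments / rate) * interval_s

print(seconds_to_scale(1000))  # 10
print(seconds_to_scale(4500))  # 50
```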

Usually, you don't need to worry about this limitation. Lambda's scaling rate is sufficient for most use cases.

Importantly, the concurrency scaling rate is a function-level limit. This means that each function in your account can scale independently of other functions.

For more information about scaling behavior, see Lambda scaling behavior.
