Scanning tables in DynamoDB - Amazon DynamoDB

Scanning tables in DynamoDB

A Scan operation in Amazon DynamoDB reads every item in a table or a secondary index. By default, a Scan operation returns all of the data attributes for every item in the table or index. You can use the ProjectionExpression parameter so that Scan returns only some of the attributes, rather than all of them.

Scan always returns a result set. If no matching items are found, the result set is empty.

A single Scan request can retrieve a maximum of 1 MB of data. Optionally, DynamoDB can apply a filter expression to this data, narrowing the results before they are returned to the user.

Filter expressions for scan

If you need to further refine the Scan results, you can optionally provide a filter expression. A filter expression determines which items within the Scan results should be returned to you. All of the other results are discarded.

A filter expression is applied after a Scan finishes but before the results are returned. Therefore, a Scan consumes the same amount of read capacity, regardless of whether a filter expression is present.

A Scan operation can retrieve a maximum of 1 MB of data. This limit applies before the filter expression is evaluated.

With Scan, you can specify any attributes in a filter expression, including partition key and sort key attributes.

The syntax for a filter expression is identical to that of a condition expression. Filter expressions can use the same comparators, functions, and logical operators as a condition expression. See Condition and filter expressions, operators, and functions in DynamoDB for more information about logical operators.

Example

The following AWS Command Line Interface (AWS CLI) example scans the Thread table and returns only the items that were last posted to by a particular user.

    aws dynamodb scan \
        --table-name Thread \
        --filter-expression "LastPostedBy = :name" \
        --expression-attribute-values '{":name":{"S":"User A"}}'
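The same request can be sketched in Python. This is a minimal illustration, not part of the original guide: it assumes a boto3 low-level DynamoDB client (for example `boto3.client("dynamodb")`) is passed in as `client`, and the function names `build_thread_filter_request` and `scan_threads_last_posted_by` are hypothetical helpers. It issues a single Scan request, so it returns only the matching items from the first page of up to 1 MB.

```python
def build_thread_filter_request(user_name):
    """Build keyword arguments for a low-level Scan of the Thread table.

    The keys mirror the CLI flags: --table-name, --filter-expression,
    and --expression-attribute-values.
    """
    return {
        "TableName": "Thread",
        "FilterExpression": "LastPostedBy = :name",
        "ExpressionAttributeValues": {":name": {"S": user_name}},
    }


def scan_threads_last_posted_by(client, user_name):
    """Run one Scan request and return the matching items from that page.

    `client` is assumed to behave like a boto3 DynamoDB client, that is,
    to expose a scan(**kwargs) method that returns a dict.
    """
    response = client.scan(**build_thread_filter_request(user_name))
    return response.get("Items", [])


# Hypothetical usage, assuming boto3 is installed and credentials are set:
#   import boto3
#   items = scan_threads_last_posted_by(boto3.client("dynamodb"), "User A")
```

Keeping the request construction separate from the call makes it easy to inspect or log the exact parameters before any read capacity is consumed.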

Limiting the number of items in the result set

The Scan operation enables you to limit the number of items that it returns in the result. To do this, set the Limit parameter to the maximum number of items that you want the Scan operation to return, prior to filter expression evaluation.

For example, suppose that you Scan a table with a Limit value of 6 and without a filter expression. The Scan result contains the first six items from the table.

Now suppose that you add a filter expression to the Scan. In this case, DynamoDB applies the filter expression to the six items that were returned, discarding those that do not match. The final Scan result contains six items or fewer, depending on the number of items that were filtered.
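The ordering of Limit and the filter expression can be illustrated with a small local model. The sketch below does not call DynamoDB; `simulate_scan_page` is a hypothetical function that operates on an in-memory list and ignores the 1 MB cap. It only shows that the limit is applied first and the filter second.

```python
def simulate_scan_page(items, limit=None, predicate=None):
    """Model one Scan page: apply Limit first, then the filter.

    items     -- list standing in for the table contents, in scan order
    limit     -- maximum number of items to evaluate (None means no limit)
    predicate -- callable standing in for the filter expression
    """
    evaluated = items if limit is None else items[:limit]
    matched = [item for item in evaluated if predicate is None or predicate(item)]
    return {
        "Items": matched,
        "Count": len(matched),           # items left after filtering
        "ScannedCount": len(evaluated),  # items evaluated before filtering
    }


# Ten items, Limit of 6, filter keeps even ids: only ids 0..5 are evaluated.
table = [{"id": n} for n in range(10)]
page = simulate_scan_page(table, limit=6, predicate=lambda item: item["id"] % 2 == 0)
print(page["Count"], page["ScannedCount"])  # prints: 3 6
```

Items with ids 6 and 8 would match the filter, but they are never evaluated because they fall outside the first six items.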

Paginating the results

DynamoDB paginates the results from Scan operations. With pagination, the Scan results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on.

A single Scan only returns a result set that fits within the 1 MB size limit.

To determine whether there are more results and to retrieve them one page at a time, applications should do the following:

  1. Examine the low-level Scan result. If it contains a LastEvaluatedKey element, proceed to step 2. If it does not, there are no more items to retrieve.

  2. Construct a new Scan request with the same parameters as the previous one, but pass the LastEvaluatedKey value from step 1 as the ExclusiveStartKey parameter.

  3. Run the new Scan request.

  4. Go to step 1.

In other words, the LastEvaluatedKey from a Scan response should be used as the ExclusiveStartKey for the next Scan request. If there is not a LastEvaluatedKey element in a Scan response, you have retrieved the final page of results. (The absence of LastEvaluatedKey is the only way to know that you have reached the end of the result set.)
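The pagination loop described above can be sketched as a Python generator. This is an illustration under assumptions: `client` is expected to behave like a boto3 low-level DynamoDB client, and `scan_all_pages` is a hypothetical helper name, not an AWS API.

```python
def scan_all_pages(client, **scan_kwargs):
    """Yield every Scan response page until LastEvaluatedKey is absent.

    `client` is assumed to expose scan(**kwargs) returning a dict, as a
    boto3 DynamoDB client does. `scan_kwargs` are passed through unchanged
    (for example TableName and FilterExpression).
    """
    request = dict(scan_kwargs)  # copy so the caller's arguments are untouched
    while True:
        response = client.scan(**request)
        yield response
        # The absence of LastEvaluatedKey is the only end-of-results signal.
        if "LastEvaluatedKey" not in response:
            break
        # Resume the next request where this page stopped.
        request["ExclusiveStartKey"] = response["LastEvaluatedKey"]


# Hypothetical usage, assuming boto3 is installed and credentials are set:
#   import boto3
#   for page in scan_all_pages(boto3.client("dynamodb"), TableName="Movies"):
#       handle(page["Items"])
```

Note that a page can contain zero items and still carry a LastEvaluatedKey, so the loop checks the key rather than whether Items is empty. boto3 also provides a built-in paginator through `client.get_paginator("scan")`, which performs the same loop.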

You can use the AWS CLI to view this behavior. The AWS CLI sends low-level Scan requests to DynamoDB, repeatedly, until LastEvaluatedKey is no longer present in the results. Consider the following AWS CLI example that scans the entire Movies table but returns only the movies from a particular genre.

    aws dynamodb scan \
        --table-name Movies \
        --projection-expression "title" \
        --filter-expression 'contains(info.genres,:gen)' \
        --expression-attribute-values '{":gen":{"S":"Sci-Fi"}}' \
        --page-size 100 \
        --debug

Ordinarily, the AWS CLI handles pagination automatically. However, in this example, the AWS CLI --page-size parameter limits the number of items per page. The --debug parameter prints low-level information about requests and responses.

If you run the example, the first response from DynamoDB looks similar to the following.

2017-07-07 12:19:14,389 - MainThread - botocore.parsers - DEBUG - Response body:b'{"Count":7,"Items":[{"title":{"S":"Monster on the Campus"}},{"title":{"S":"+1"}},{"title":{"S":"100 Degrees Below Zero"}},{"title":{"S":"About Time"}},{"title":{"S":"After Earth"}},{"title":{"S":"Age of Dinosaurs"}},{"title":{"S":"Cloudy with a Chance of Meatballs 2"}}],"LastEvaluatedKey":{"year":{"N":"2013"},"title":{"S":"Curse of Chucky"}},"ScannedCount":100}'

The LastEvaluatedKey in the response indicates that not all of the items have been retrieved. The AWS CLI then issues another Scan request to DynamoDB. This request and response pattern continues until the final response.

2017-07-07 12:19:17,830 - MainThread - botocore.parsers - DEBUG - Response body:b'{"Count":1,"Items":[{"title":{"S":"WarGames"}}],"ScannedCount":6}'

The absence of LastEvaluatedKey indicates that there are no more items to retrieve.

Counting the items in the results

In addition to the items that match your criteria, the Scan response contains the following elements:

  • ScannedCount: The number of items evaluated, before any ScanFilter is applied. A high ScannedCount value with few, or no, Count results indicates an inefficient Scan operation. If you did not use a filter in the request, ScannedCount is the same as Count.

  • Count: The number of items that remain, after a filter expression (if present) was applied.

If the size of the Scan result set is larger than 1 MB, ScannedCount and Count represent only a partial count of the total items. You need to perform multiple Scan operations to retrieve all the results (see Paginating the results).

Each Scan response contains the ScannedCount and Count for the items that were processed by that particular Scan request. To get grand totals for all of the Scan requests, you could keep a running tally of both ScannedCount and Count.
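A running tally across pages can be sketched as follows. `tally_scan_counts` is a hypothetical helper that accepts any iterable of Scan response dictionaries; the two sample pages reuse the Count and ScannedCount values from the debug output shown earlier and stand in for a complete scan.

```python
def tally_scan_counts(pages):
    """Sum Count and ScannedCount over an iterable of Scan responses.

    Returns a (total_count, total_scanned_count) tuple.
    """
    total_count = 0
    total_scanned = 0
    for page in pages:
        total_count += page.get("Count", 0)
        total_scanned += page.get("ScannedCount", 0)
    return total_count, total_scanned


# Two illustrative pages, using the per-page values from the debug output.
sample_pages = [
    {"Count": 7, "ScannedCount": 100},
    {"Count": 1, "ScannedCount": 6},
]
print(tally_scan_counts(sample_pages))  # prints: (8, 106)
```

A total ScannedCount that is far larger than the total Count suggests that most of the read capacity went to items the filter later discarded.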

Capacity units consumed by scan

You can Scan any table or secondary index. Scan operations consume read capacity units, as follows.

If you Scan a...          DynamoDB consumes read capacity units from...
Table                     The table's provisioned read capacity.
Global secondary index    The index's provisioned read capacity.
Local secondary index     The base table's provisioned read capacity.

By default, a Scan operation does not return any data on how much read capacity it consumes. However, you can specify the ReturnConsumedCapacity parameter in a Scan request to obtain this information. The following are the valid settings for ReturnConsumedCapacity:

  • NONE — No consumed capacity data is returned. (This is the default.)

  • TOTAL — The response includes the aggregate number of read capacity units consumed.

  • INDEXES — The response shows the aggregate number of read capacity units consumed, together with the consumed capacity for each table and index that was accessed.

DynamoDB calculates the number of read capacity units consumed based on the number of items and the size of those items, not on the amount of data that is returned to an application. For this reason, the number of capacity units consumed is the same whether you request all of the attributes (the default behavior) or just some of them (using a projection expression). The number is also the same whether or not you use a filter expression.

At a minimum, Scan consumes one read capacity unit, which covers one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB. If you need to read an item that is larger than 4 KB, DynamoDB needs additional read request units.

Empty tables, and very large tables that have sparse partition keys, might see some additional RCUs charged beyond the amount of data scanned. This covers the cost of serving the Scan request, even if no data exists.
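As a rough worked example of the arithmetic, the sketch below estimates read capacity for a Scan from the number of bytes evaluated. It rests on assumptions that the text above does not spell out: that 4 KB means 4,096 bytes, that item sizes are summed across the scan before rounding up to the next 4 KB, and that at least one 4 KB block is always charged. `estimate_scan_read_capacity_units` is a hypothetical helper, and actual billing may differ, for example for empty or sparse tables.

```python
import math

BLOCK_BYTES = 4096  # assumed size of one 4 KB read block


def estimate_scan_read_capacity_units(scanned_bytes, consistent_read=False):
    """Estimate RCUs for a Scan that evaluates `scanned_bytes` of item data.

    The size is based on items evaluated, not on what is returned, so
    projection and filter expressions do not change the estimate.
    """
    if scanned_bytes < 0:
        raise ValueError("scanned_bytes must be non-negative")
    blocks = max(1, math.ceil(scanned_bytes / BLOCK_BYTES))
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return blocks if consistent_read else blocks / 2


# A full 1 MB page (1,048,576 bytes) is 256 blocks of 4 KB.
print(estimate_scan_read_capacity_units(1_048_576, consistent_read=True))  # prints: 256
print(estimate_scan_read_capacity_units(1_048_576))                        # prints: 128.0
```

To compare an estimate with what DynamoDB actually charged, set ReturnConsumedCapacity to TOTAL in the request and read the consumed capacity from the response.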

Read consistency for scan

A Scan operation performs eventually consistent reads, by default. This means that the Scan results might not reflect changes due to recently completed PutItem or UpdateItem operations. For more information, see DynamoDB read consistency.

If you require strongly consistent reads, as of the time that the Scan begins, set the ConsistentRead parameter to true in the Scan request. This ensures that all of the write operations that completed before the Scan began are included in the Scan response.

Setting ConsistentRead to true can be useful in table backup or replication scenarios, in conjunction with DynamoDB Streams. You first use Scan with ConsistentRead set to true to obtain a consistent copy of the data in the table. During the Scan, DynamoDB Streams records any additional write activity that occurs on the table. After the Scan is complete, you can apply the write activity from the stream to the table.

Parallel scan

By default, the Scan operation processes data sequentially. Amazon DynamoDB returns data to the application in 1 MB increments, and an application performs additional Scan operations to retrieve the next 1 MB of data.

The larger the table or index being scanned, the more time the Scan takes to complete. In addition, a sequential Scan might not always be able to fully use the provisioned read throughput capacity: even though DynamoDB distributes a large table's data across multiple physical partitions, a Scan operation can only read one partition at a time. For this reason, the throughput of a Scan is constrained by the maximum throughput of a single partition.

To address these issues, the Scan operation can logically divide a table or secondary index into multiple segments, with multiple application workers scanning the segments in parallel. Each worker can be a thread (in programming languages that support multithreading) or an operating system process. To perform a parallel scan, each worker issues its own Scan request with the following parameters:

  • Segment: The segment to be scanned by a particular worker. Each worker should use a different value for Segment.

  • TotalSegments: The total number of segments for the parallel scan. This value must be the same as the number of workers that your application uses.

The following diagram shows how a multithreaded application performs a parallel Scan with three degrees of parallelism.

[Figure: A multithreaded application that performs a parallel scan by dividing a table into three segments.]

In this diagram, the application spawns three threads and assigns each thread a number. (Segments are zero-based, so the first number is always 0.) Each thread issues a Scan request, setting Segment to its designated number and setting TotalSegments to 3. Each thread scans its designated segment, retrieving data 1 MB at a time, and returns the data to the application's main thread.

The values for Segment and TotalSegments apply to individual Scan requests, and you can use different values at any time. You might need to experiment with these values, and the number of workers you use, until your application achieves its best performance.
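The three-thread arrangement described above can be sketched with Python's standard thread pool. This is an illustration under assumptions: `client` is expected to behave like a boto3 low-level DynamoDB client that can be shared across threads, and `scan_segment` and `parallel_scan` are hypothetical helper names. Each worker pages through its own segment and the main thread concatenates the results.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Scan one segment to completion and return its items."""
    items = []
    request = {
        "TableName": table_name,
        "Segment": segment,               # zero-based segment for this worker
        "TotalSegments": total_segments,  # same value for every worker
    }
    while True:
        response = client.scan(**request)
        items.extend(response.get("Items", []))
        if "LastEvaluatedKey" not in response:
            return items
        request["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def parallel_scan(client, table_name, total_segments=3):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        results = []
        for future in futures:  # collect in segment order
            results.extend(future.result())
    return results


# Hypothetical usage, assuming boto3 is installed and credentials are set:
#   import boto3
#   all_items = parallel_scan(boto3.client("dynamodb"), "Movies", total_segments=3)
```

Collecting every item in memory is only reasonable for small tables; for larger ones, each worker would typically process its pages as they arrive instead of returning them.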
