
Amazon DynamoDB: pointer strategy
The transition from relational databases to the world of NoSQL is not easy. Literally everything is different. Sometimes it can be so overwhelming that you don't know how to solve a problem so trivial that it virtually doesn't exist in the world of SQL databases.
Today, I want to tell you about such a problem.
Imagine that you store Order records in a table, and you need to know the number of the last (newest) order. (Why? It's not relevant now.)
Last Order number in SQL
When using SQL this problem is trivial. All you need to do is write:
```sql
SELECT orderId FROM Orders ORDER BY orderId DESC LIMIT 1;
```
The query is simple and very efficient (especially when we have indexes). Well done, SQL 👏
Last Order number in DynamoDB
And this is where things get problematic. I won't give you the solution right away, because understanding it fully requires some knowledge of DynamoDB.
A bit of theory about DynamoDB
Let's start by saying that in DynamoDB, the equivalent of SQL's SELECT is two operations: scan and query. Both retrieve information from a table (in DynamoDB we only have the concept of a table, not of an entire database). However, they differ significantly from each other. The scan method reads the contents of the whole table and returns it to us as a collection of items.
For an example table containing six orders, scan returns all 6 items, because that is how many orders the whole table holds.
Using scan, we could retrieve all orders and process them in code, e.g. in a Lambda function: filter the data and return the last order number.
This approach is obviously the worst possible one, because we retrieve much more data from the database than we need (a collection of Orders instead of a single order number). This hurts not only the running time of our application but also the cost, because in DynamoDB we pay for the amount of data each request reads. To make matters worse, a single scan call returns at most 1 MB of data, so with large amounts of data we have to page through the results.
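To make the cost of the scan approach concrete, here is a minimal simulation. It does not call AWS. The names ordersTable, scanPage and findLastOrderByScan are hypothetical, and the two-items-per-page limit is an assumption chosen so the paging loop is visible. The real Scan API pages by size (up to 1 MB per response) and returns a LastEvaluatedKey, which you pass as ExclusiveStartKey to fetch the next page.

```javascript
// Hypothetical in-memory table: six Order items, each with its own partition key
const ordersTable = [1, 2, 3, 4, 5, 6].map((n) => ({
  PK: { S: `order#${n}` },
  orderId: { N: String(n) },
}))

// Mimics Scan paging: at most `limit` items per call, plus a LastEvaluatedKey
// (the key of the last item read) while more items remain
function scanPage(exclusiveStartKey, limit = 2) {
  const start = exclusiveStartKey
    ? ordersTable.findIndex((item) => item.PK.S === exclusiveStartKey.PK.S) + 1
    : 0
  const Items = ordersTable.slice(start, start + limit)
  const page = { Items }
  if (start + limit < ordersTable.length) {
    page.LastEvaluatedKey = { PK: Items[Items.length - 1].PK }
  }
  return page
}

// The scan approach: read every page, then pick the highest orderId in code
function findLastOrderByScan() {
  let lastOrderId = null
  let itemsRead = 0
  let startKey
  do {
    const page = scanPage(startKey)
    itemsRead += page.Items.length
    for (const item of page.Items) {
      const id = parseInt(item.orderId.N, 10)
      if (lastOrderId === null || id > lastOrderId) lastOrderId = id
    }
    startKey = page.LastEvaluatedKey
  } while (startKey)
  return { lastOrderId, itemsRead }
}

console.log(findLastOrderByScan()) // { lastOrderId: 6, itemsRead: 6 }
```

To get a single number, the function has to read every item in the table, and every one of those reads is paid for.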
So let us consider other options.
The query method retrieves data from a local collection, that is, items in the table that share the same Partition Key value.
I'm trying to keep this article concise and simple, so I won't discuss how Amazon DynamoDB is designed internally. However, I strongly encourage you to learn about it on your own: it is simply a good lesson in architecture, and it will help you understand why this database is used the way it is.
In our order table, each order has a different Partition Key value, so using the query method simply doesn't make sense here: we don't have any local collections.
On a side note, a local collection for an Order could be the list of goods in that single order. In that case, each item's Primary Key would consist of two values (a composite key):
- Partition Key
- Sort Key
In such a table, calling query with the Partition Key value order#2 would return two items, because in this order someone bought two products: DataLake Training and Consulting.
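Here is a minimal sketch of that query. It assumes a table named Orders with key attributes PK and SK, and uses made-up sort key values; none of these names come from the original table. buildOrderQuery builds the input for a low-level Query request. runQueryLocally is a hypothetical in-memory stand-in that mimics what DynamoDB does with the key condition, so the example runs without AWS.

```javascript
// Hypothetical item collection: PK groups the lines of one order, SK names each product
const orderLines = [
  { PK: { S: 'order#1' }, SK: { S: 'product#Consulting' } },
  { PK: { S: 'order#2' }, SK: { S: 'product#DataLake Training' } },
  { PK: { S: 'order#2' }, SK: { S: 'product#Consulting' } },
  { PK: { S: 'order#3' }, SK: { S: 'product#DataLake Training' } },
]

// Input for a low-level Query: the partition key is fixed, the sort key is left open
function buildOrderQuery(orderKey) {
  return {
    TableName: 'Orders', // assumed table name
    KeyConditionExpression: 'PK = :pk',
    ExpressionAttributeValues: { ':pk': { S: orderKey } },
  }
}

// In-memory stand-in: every item sharing the partition key, sorted ascending by
// sort key (Query's default); code-unit order matches DynamoDB's for ASCII keys
function runQueryLocally(input, items) {
  const pk = input.ExpressionAttributeValues[':pk'].S
  return items
    .filter((item) => item.PK.S === pk)
    .sort((a, b) => (a.SK.S < b.SK.S ? -1 : a.SK.S > b.SK.S ? 1 : 0))
}

const lines = runQueryLocally(buildOrderQuery('order#2'), orderLines)
console.log(lines.length) // 2
```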
However, this is a different data access pattern in DynamoDB, and I've presented it to you as a side note.
Let's move back to the main problem.
So what can we do, given that neither scan nor query is suitable for returning the latest order number?
The GetItem method
There is also GetItem, a method that returns a single item from the database for a given Primary Key value. It's kind of like getById(id), or in SQL:

```sql
SELECT * FROM Orders WHERE orderId = 'orderNumber';
```
That's cool, but how can we get this orderNumber if it is exactly what we want to pull from the database in the first place? 🤔
Pointer strategy
This is where the pointer strategy comes in.
We can replace the unknown value with something known, a constant, and reach the database through that constant. There will be only one item with this constant Partition Key in the entire table, so we can use it as a pointer that stores the number of the most recent order.
To our first table, we add one more item whose Partition Key is always equal to LAST_ORDER (our constant string). This item has an attribute named orderId that holds the number of the most recent order. Each time we add a new Order to the table, we also update the LAST_ORDER item to the new orderId value.
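A minimal in-memory sketch of this write path follows. It assumes a JavaScript Map as a stand-in for the table and uses a hypothetical addOrder helper. Against DynamoDB, these would be two separate write requests: one for the order item and one for the LAST_ORDER item.

```javascript
// Hypothetical stand-in for the table: PK string -> item
const table = new Map()

// Writes a new Order and then moves the LAST_ORDER pointer to it.
// In DynamoDB these would be two separate write requests.
function addOrder(orderId) {
  table.set(`order#${orderId}`, {
    PK: { S: `order#${orderId}` },
    orderId: { N: String(orderId) },
  })
  // Unconditional overwrite: the pointer always holds the last write
  table.set('LAST_ORDER', {
    PK: { S: 'LAST_ORDER' },
    orderId: { N: String(orderId) },
  })
}

addOrder(1)
addOrder(2)
addOrder(3)
console.log(table.size) // 4: three orders plus one pointer item
console.log(table.get('LAST_ORDER').orderId.N) // 3
```

The pointer is overwritten unconditionally here, so it always holds whatever was written last.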
Now all we need to do is call a simple getItem('LAST_ORDER'), which returns the last order number in an optimal way: a single read of a single, small item.
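For illustration, here is what that call could look like with the low-level API. The sketch assumes a table named Orders and the PK key attribute used by the Pointer class later in this article. The input would be passed to GetItemCommand from @aws-sdk/client-dynamodb. Here, the response is hard-coded rather than fetched, and buildGetLastOrderInput and parseLastOrderId are hypothetical helper names.

```javascript
// Input for a low-level GetItem: the key is a known constant, no search needed
function buildGetLastOrderInput(tableName) {
  return {
    TableName: tableName,
    Key: { PK: { S: 'LAST_ORDER' } },
  }
}

// GetItem responds with { Item } when the key exists and without Item otherwise
function parseLastOrderId(response) {
  return response.Item ? parseInt(response.Item.orderId.N, 10) : null
}

const input = buildGetLastOrderInput('Orders') // 'Orders' is an assumed name
console.log(input.Key.PK.S) // LAST_ORDER

// Hard-coded response shaped like DynamoDB's, instead of a real call
const sampleResponse = { Item: { PK: { S: 'LAST_ORDER' }, orderId: { N: '6' } } }
console.log(parseLastOrderId(sampleResponse)) // 6
console.log(parseLastOrderId({})) // null (no order written yet)
```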
Advanced techniques
Since a DynamoDB table is usually part of a larger system, let's consider what happens when multiple processes write to the database in parallel (directly or through an SQS queue, it doesn't matter). Sooner or later, the last item written to the database will not be the newest order. The LAST_ORDER pointer will then wrongly point to an older Order.
We could solve this problem with an SQS FIFO queue, but there is a much simpler and cheaper solution.
We just need to apply a ConditionExpression when saving a new version of the LAST_ORDER item. The condition checks whether the new orderId value is greater than the one currently stored in the database. If it is, the item is updated; if not, the write is rejected.
Thanks to this, we can update the value with a single write to the database, without first reading it and comparing it on the code side. In addition, this method is idempotent: if we receive the same event multiple times, subsequent calls do not change the state of the database.
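Here is a sketch of such a conditional write, assuming PutItem and a table named Orders. The condition joins the two parts quoted in the code listing explanation below, attribute_not_exists(#orderId) and #orderId < :newId, with OR. Joining them with OR is my assumption, because it is the only combination that lets the very first write through. conditionHolds is a hypothetical local model of how DynamoDB evaluates the expression, included so the example runs without AWS.

```javascript
// PutItem input for the pointer, applied by DynamoDB only if the condition holds
function buildConditionalPointerPut(tableName, newOrderId) {
  return {
    TableName: tableName,
    Item: { PK: { S: 'LAST_ORDER' }, orderId: { N: String(newOrderId) } },
    ConditionExpression: 'attribute_not_exists(#orderId) OR #orderId < :newId',
    ExpressionAttributeNames: { '#orderId': 'orderId' },
    ExpressionAttributeValues: { ':newId': { N: String(newOrderId) } },
  }
}

// Hypothetical local model of the condition: true when no pointer is stored yet,
// or when the stored orderId is numerically smaller than the new one
function conditionHolds(storedItem, newOrderId) {
  if (!storedItem || !storedItem.orderId) return true
  return parseInt(storedItem.orderId.N, 10) < newOrderId
}

// Replays events in arrival order and records the outcome of each write
function applyEvents(orderIds) {
  let pointer // currently stored LAST_ORDER item, if any
  const outcomes = []
  for (const id of orderIds) {
    const input = buildConditionalPointerPut('Orders', id)
    if (conditionHolds(pointer, id)) {
      pointer = input.Item
      outcomes.push('written')
    } else {
      outcomes.push('rejected') // DynamoDB would report ConditionalCheckFailedException
    }
  }
  return { lastOrderId: pointer ? parseInt(pointer.orderId.N, 10) : null, outcomes }
}

// Order 2 arrives late and order 3 arrives twice
const result = applyEvents([1, 3, 2, 3])
console.log(result.lastOrderId) // 3
console.log(result.outcomes) // [ 'written', 'written', 'rejected', 'rejected' ]
```

Both the late order 2 and the duplicate order 3 are rejected. The pointer stays at 3, and the repeated event leaves the state unchanged.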
How to do it?
It's time for some code.
Code listing explanation:
- Line 4 - the Pointer class implements a method (toItem) that converts a Pointer object into the JSON format that the DynamoDB API expects. The implementation of this method is shown below.
- Line 7 - the conditional expression decides when to write to the database and when not to. The attribute_not_exists(#orderId) part is needed so that the item is also written the first time, when no pointer exists in the database yet. From then on, only the second part of the condition, #orderId < :newId, matters.
- Line 19 - if the condition is not met, the DynamoDB API returns a ConditionalCheckFailedException. In our case this is expected sooner or later, so it should be handled rather than treated as a failure.
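Below is a hypothetical sketch of such a save function; it is not the author's original implementation. savePointer takes a send function as a parameter. With the AWS SDK for JavaScript v3, send could be (input) => client.send(new PutItemCommand(input)). In that SDK, a failed condition is rejected with an error whose name is ConditionalCheckFailedException. makeFakeSend is a made-up in-memory stand-in so the sketch runs without AWS.

```javascript
// Hypothetical save function. `send` stands in for something like
// (input) => client.send(new PutItemCommand(input)) from @aws-sdk/client-dynamodb
async function savePointer(send, tableName, orderId, createdAt = new Date()) {
  const input = {
    TableName: tableName,
    Item: {
      PK: { S: 'LAST_ORDER' },
      orderId: { N: String(orderId) },
      createdAt: { S: createdAt.toISOString() },
    },
    ConditionExpression: 'attribute_not_exists(#orderId) OR #orderId < :newId',
    ExpressionAttributeNames: { '#orderId': 'orderId' },
    ExpressionAttributeValues: { ':newId': { N: String(orderId) } },
  }
  try {
    await send(input)
    return true // the pointer moved forward
  } catch (err) {
    // An older or duplicate order: an expected outcome, not a failure
    if (err.name === 'ConditionalCheckFailedException') return false
    throw err // anything else (throttling, network, ...) is a real error
  }
}

// Made-up in-memory `send` that enforces the same condition, so the sketch runs offline
function makeFakeSend() {
  let stored
  return async (input) => {
    const newId = parseInt(input.ExpressionAttributeValues[':newId'].N, 10)
    if (stored && parseInt(stored.orderId.N, 10) >= newId) {
      const err = new Error('The conditional request failed')
      err.name = 'ConditionalCheckFailedException'
      throw err
    }
    stored = input.Item
  }
}

const send = makeFakeSend()
savePointer(send, 'Orders', 5)
  .then(() => savePointer(send, 'Orders', 4))
  .then((written) => console.log(written)) // false: order 4 is older than 5
```

Only the expected ConditionalCheckFailedException is swallowed. Any other error still propagates to the caller.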
And here is the implementation of the Pointer class:
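```javascript
class Pointer {
  constructor({ orderId, createdAt = new Date() }) {
    this.orderId = parseInt(orderId, 10)
    // createdAt may arrive as a Date or as the ISO string stored in DynamoDB
    this.createdAt = createdAt instanceof Date ? createdAt : new Date(createdAt)
  }

  // Constant Partition Key: the only item with this key in the table
  key() {
    return { PK: { S: 'LAST_ORDER' } }
  }

  static fromItem(item) {
    return new Pointer({ orderId: item.orderId.N, createdAt: item.createdAt.S })
  }

  // Converts the Pointer into the JSON the DynamoDB API expects
  toItem() {
    return {
      ...this.key(),
      orderId: { N: this.orderId.toString() },
      createdAt: { S: this.createdAt.toISOString() },
    }
  }
}
```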
Please note that the style of this class is ripped off from Alex DeBrie 😃
He is the author of The DynamoDB Book, which is the best resource for teaching yourself how to use this database. Since reading it, I have started using many of the advanced techniques he describes. I highly recommend his book.
Summary
I hope this example, simplified but taken from my real system, helped you better understand how to use DynamoDB. I've only covered a small area of knowledge, but at the same time I've shown you how to solve a specific problem you may encounter in your projects.
I'm aware that DynamoDB has a steep learning curve (there's no denying that), but honestly, I can't imagine serverless systems without Amazon DynamoDB. In 9 out of 10 cases, I would choose DynamoDB over AWS RDS (including Aurora Serverless).