
Originally published at dragonflydb.io
Fearlessly Handling BigKeys with Dragonfly
Introduction
Dragonfly stakes its claim as the most performant in-memory data store on earth, boasting an impressive 4 million ops/sec on an AWS EC2 `c6gn.16xlarge` instance while fully leveraging its CPU and memory resources.
Given that computer memory is a precious commodity, Dragonfly’s mission is to utilize it smartly and efficiently.
However, just like any traditional, successful technology, it has limitations to consider when building applications around it.
One often-overlooked pitfall when working with an in-memory data store is the BigKey issue.
Ideally, an in-memory data store instance should be highly available, responding to read and write requests within sub-milliseconds.
However, unnecessary BigKeys can cause slowdowns or even make an in-memory data store unresponsive.
The reasoning is clear: Despite the blazing speed of modern memory hardware and operating systems, real-world applications still often require operations with O(M) time complexity.
When you need to perform an ultra-fast operation M times, the latency required can mount up, causing what might initially seem fast to instead be perceived as slow, depending on the magnitude of M.
In this blog post, we delve into the infamous BigKey issue and establish best practices for working with them in Dragonfly.
While we stand by the statement made at the beginning of this post, it's also crucial to understand any technology's limits.
By exploring both the strengths and constraints, we hope to provide a comprehensive understanding of working with BigKeys in Dragonfly as well as how it compares with Redis.
What is a BigKey?
Contrary to what the name might suggest, a BigKey within an in-memory data store isn't about the length of the key's name but rather the size of the value associated with that key.
To give you a basic understanding of how Dragonfly (as well as Redis) organizes data from an API/command perspective, there is a global hash map storing key-value pairs.
The value stored against each key may be of different data types, including String, List, Set, Sorted-Set, Hash, and so on.
The capability to store multiple items in List, Set, Sorted-Set, and Hash data types means that these keys can potentially become BigKeys when they contain a large number of elements.
It's worth noting that, theoretically, a very large value could also be stored on a String key.
While Redis limits this to 512MB, it would arguably be more practical to select the appropriate data type for the use case rather than saving such a vast blob of data as a String with an in-memory data store.
Let's look at an example of various data types being stored below:
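For instance, a quick session using standard commands (the key names here are hypothetical) might look like this:

```
SET page:home:views "1024"               # String
LPUSH recent_events "login" "logout"     # List
SADD active_users "user-1" "user-2"      # Set
ZADD leaderboard 100 "player-1"          # Sorted-Set
HSET user:1000 name "Joe" age "30"       # Hash
```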
Understanding how values can become oversized within an in-memory data store is important.
But how do we define "too big"? Alibaba Cloud provides an overview of the characteristics of BigKeys in their blog post:
- A String key whose value is 5MB in size (the data is too large).
- A List key with 20,000 elements (the number of elements in the list is excessive).
- A Sorted-Set key with 10,000 members (the number of members is excessive).
- A Hash key whose size is 100MB, even if it only contains 1,000 members (the size of the key is too large).
Essentially, a BigKey refers to a key that has a large-sized value attached to it.
However, it's important to note that these characteristics may vary, and different systems and applications may have unique definitions for BigKeys.
In our context:
If the value stored with a key is large enough to impact the performance of the in-memory data store, and subsequently your applications, we would classify that as a BigKey.
Let's explore how BigKeys would impact performance with a case study.
Case Study - BigKey in a Real World Application
The Scenario
Picture this: We're operating a web application that's seeing misuse of our APIs by some users.
To tackle this, we want to utilize an in-memory data store to establish a block list using user IDs.
This way, any API request would be halted if it came from a user listed in the block list. Here are the finer details:
- User IDs are UUID V4 strings.
- Each blocked user ID must be paired with a reason, which is a string value. This rules out the use of a Bloom Filter, as we need to record the blocking rationale.
- The exact user ID must be recorded for analysis purposes. Deletion or expiration of these IDs isn't an option until the analytics team completes their work. This requirement means we can't set automatic data expiration. Instead, we need to offer the analytics team an internal API endpoint or a similar means to delete the data.
- Considering the scale of our user base, potentially millions of user IDs could be blocked at any given time.
Given these requirements, the Hash data type emerges as a strong candidate since it:
- Enforces uniqueness on the fields, which can be utilized to store user IDs.
- Allows us to store the blocking reason in the values.
- Ensures accessing a single field/value pair takes O(1) time - ideal for our use case.
Below is an illustration of our potential in-memory storage:
- The Hash field comprises a UUID V4 string, representing the user ID.
- The Hash value denotes the specific reason for blocking a user.
- In total, we have 10 million field/value pairs stored in this single Hash key named `blocked_user_ids`.
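Sketched with a few hypothetical entries (UUIDs abbreviated for readability), the key looks like this:

```
HSET blocked_user_ids \
    "6e9d...-01" "API abuse: excessive requests" \
    "a3f1...-02" "Scraping content"
HLEN blocked_user_ids   # 10,000,000 once fully populated
```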
While this scenario may not perfectly mirror every real-world situation, it presents a close approximation that helps us understand the dynamics at play.
Now, let's delve into the potential impacts such a BigKey can have on our system.
BigKey Memory Usage
We loaded a Dragonfly instance and a Redis instance from a snapshot containing a Hash key `blocked_user_ids` with 10 million field/value pairs, using `.dfs` files for Dragonfly and an `.rdb` file for Redis, respectively.
- Dragonfly requires only 614.75MB to store this BigKey.
- In contrast, Redis consumes approximately 891.88MB to store this BigKey.
These figures reinforce Dragonfly's superior efficiency in memory utilization compared to Redis, resulting in steady-state memory usage that's 30% less, as detailed in the Redis vs. Dragonfly Scalability and Performance blog post.
Read the BigKey
While it's common practice to avoid performing a full read (i.e., `HGETALL`) on such a large key, for experimental purposes, we conducted this operation on both Dragonfly and Redis.
The time complexity of `HGETALL` is O(N), where N represents the size of the Hash, which in our case is a substantial 10 million.
In a real-world scenario, smaller batch reads (i.e., `HSCAN`) would be the more advisable approach. Our experiment was conducted using the following methodology:
- To minimize the impact of network latency, the client was running in the same region of GCP as the server.
- We implemented two threads: one responsible for reading a key from Dragonfly/Redis in a loop for thousands of iterations, and another that issued the `HGETALL` command with a slight delay.
- The `HGETALL` command was thus strategically placed somewhere in between the thousands of concurrent read commands.
- We recorded the time taken for the `HGETALL` command, as well as the P50 through P99.9 and maximum latencies for the thousands of read commands.
By implementing this experimental setup, we were able to evaluate the performance and latency of the `HGETALL` command in the context of Dragonfly and Redis:
Dragonfly | 1st Run | 2nd Run | 3rd Run | 4th Run | 5th Run |
---|---|---|---|---|---|
HGETALL | 13.64s | 13.49s | 11.98s | 12.62s | 13.87s |
Other Reads - P50 | 0.216ms | 0.240ms | 0.215ms | 0.206ms | 0.204ms |
Other Reads - P90 | 0.279ms | 0.312ms | 0.288ms | 0.274ms | 0.281ms |
Other Reads - P99 | 0.353ms | 0.402ms | 0.367ms | 0.354ms | 0.360ms |
Other Reads - P99.9 | 0.614ms | 0.481ms | 0.550ms | 0.465ms | 0.542ms |
Other Reads - Max | 0.746ms | 1.41ms | 0.948ms | 0.544ms | 0.584ms |
Redis | 1st Run | 2nd Run | 3rd Run | 4th Run | 5th Run |
---|---|---|---|---|---|
HGETALL | 15.68s | 20.09s | 18.41s | 18.41s | 18.22s |
Other Reads - P50 | 0.193ms | 0.210ms | 0.227ms | 0.213ms | 0.205ms |
Other Reads - P90 | 0.228ms | 0.265ms | 0.293ms | 0.279ms | 0.287ms |
Other Reads - P99 | 0.324ms | 0.468ms | 1.49ms | 6.09ms | 5.77ms |
Other Reads - P99.9 | 19.7ms | 20.3ms | 20.4ms | 18.5ms | 20.2ms |
Other Reads - Max | 4.47s | 4.48s | 4.45s | 4.47s | 4.51s |
The results of our experiment clearly demonstrate the considerable slowness of performing an `HGETALL` full read, primarily due to its O(N) time complexity and network I/O.
While we expected some degree of sluggishness for both Dragonfly and Redis, the disparity between the two systems is significant.
A crucial aspect to consider is the single-threaded nature of Redis' command execution architecture (as discussed below).
During the processing of the `HGETALL` command in Redis, all other concurrent command executions experience a substantial pause for a significantly longer duration (measured in seconds), exacerbating the overall performance impact.
In contrast, Dragonfly showcases its strength by maintaining responsiveness to read and write operations despite the ongoing full-read.
The only bottleneck experienced in this scenario is the client, which awaits the full-read response.
This advantage can be attributed to Dragonfly's multi-threaded, shared-nothing architecture.
As stated earlier, real-world applications should ideally use `HSCAN` when they need to iterate through all fields of a Hash key and `HGET` for individual field access.
In the same vein, a Hash key of this scale should be built gradually for write operations rather than instantaneously from one substantial blob of data.
Real-world applications should utilize `HSET` with a manageable number of field/value pairs for each call.
When `HGET`, `HSCAN`, and `HSET` are utilized with appropriate parameters, both Dragonfly and Redis are capable of handling these commands properly for our use case.
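As a sketch, the command shapes involved (standard Redis API forms, with hypothetical field names) look like this:

```
HGET blocked_user_ids "6e9d...-01"       # O(1) single-field read
HSCAN blocked_user_ids 0 COUNT 100       # iterate in small batches via a cursor
HSET blocked_user_ids "6e9d...-01" "spam" "a3f1...-02" "fraud"   # small batched write
```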
Delete the BigKey
The end of the road for our `blocked_user_ids` key arrives once the analytics team completes its work.
The key's deletion from the in-memory store becomes necessary to free up memory for future use.
While deletion might appear to be a straightforward operation, it often presents unexpected challenges, especially when dealing with BigKeys.
In our experiment, we used the synchronous `DEL` command to remove this key from both Dragonfly and Redis with the same methodology as above, and the results were as follows:
Dragonfly | 1st Run | 2nd Run | 3rd Run | 4th Run | 5th Run |
---|---|---|---|---|---|
DEL | 367.9ms | 374.3ms | 372.4ms | 371.2ms | 372.2ms |
Other Reads - P50 | 0.218ms | 0.236ms | 0.204ms | 0.190ms | 0.203ms |
Other Reads - P90 | 0.292ms | 0.315ms | 0.278ms | 0.251ms | 0.276ms |
Other Reads - P99 | 0.379ms | 0.418ms | 0.350ms | 0.330ms | 0.356ms |
Other Reads - P99.9 | 0.501ms | 0.602ms | 0.488ms | 0.443ms | 0.535ms |
Other Reads - Max | 0.773ms | 1.42ms | 0.686ms | 0.489ms | 0.619ms |
Redis | 1st Run | 2nd Run | 3rd Run | 4th Run | 5th Run |
---|---|---|---|---|---|
DEL | 5.677s | 5.689s | 5.719s | 5.711s | 5.656s |
Other Reads - P50 | 0.183ms | 0.225ms | 0.194ms | 0.210ms | 0.204ms |
Other Reads - P90 | 0.258ms | 0.315ms | 0.265ms | 0.277ms | 0.288ms |
Other Reads - P99 | 0.352ms | 0.463ms | 0.353ms | 0.371ms | 0.389ms |
Other Reads - P99.9 | 0.517ms | 1.17ms | 1.45ms | 0.931ms | 0.868ms |
Other Reads - Max | 5.67s | 5.68s | 5.71s | 5.71s | 5.65s |
Based on the results above, it is noteworthy that Dragonfly consistently demonstrated efficient deletion of the BigKey across multiple trials.
In stark contrast, Redis took seconds to delete the BigKey.
Notably, during the deletion of this key, Redis suspended all other concurrent operations for a significantly extended duration, further highlighting the impact of BigKey deletion on Redis' single-threaded execution model.
Dragonfly's ability to handle deletion efficiently, while simultaneously maintaining responsiveness to other operations, showcases the advantages of its modern architecture.
Discussion & Best Practices
In light of the experiments conducted, it is clear that BigKeys, while useful, can exert significant pressure on in-memory databases, potentially resulting in high memory usage and increased latency if not managed with caution.
- BigKeys consume considerable memory. Their use should be limited to scenarios where they are indispensable for your application's requirements. Otherwise, consider resorting to probabilistic data types, such as HyperLogLog or Bloom Filter, for storing statistical data pertaining to a large number of IDs.
- Performing full reads (such as `HGETALL` and `SMEMBERS`) on BigKeys, as well as deleting them (`DEL`), can have major costs and negatively impact other concurrent commands.
However, in situations where the use of these resource-intensive BigKeys is unavoidable, extra caution should be exercised when using Redis.
Conversely, Dragonfly offers greater flexibility and peace of mind.
As quoted from the Redis documentation:
Redis is, mostly, a single-threaded server from the POV of commands execution (actually modern versions of Redis use threads for different things).
If Redis is executing an expensive command like `HGETALL` or `DEL`, this expensive operation will pause the execution of all other concurrent commands.
New commands arriving at the Redis server will be put on hold until the completion of the expensive command.
While this might not initially seem problematic, consider that your in-memory data store typically responds to clients within sub-milliseconds.
A sudden surge in the latency of a few seconds can trigger a cascading effect, potentially crippling your backend system.
While working with Dragonfly, the execution of expensive commands can also be blocking.
However, thanks to the multi-threaded, shared-nothing architecture, the Dragonfly server would continue to function normally without noticeable latency, maintaining its ability to respond within sub-milliseconds to other concurrent commands.
Considering the aforementioned points, it is crucial to consider whether an `HGETALL` full read is necessary for our application:
- For a Hash key with only a few tens of fields, it's absolutely fine.
- However, for a Hash resembling the `blocked_user_ids` example demonstrated earlier, opting for `HSCAN` is definitely advisable.
Meanwhile, unlike a full read, the deletion of a BigKey is often unavoidable.
One might wonder whether there are better methods for deleting BigKeys than the plain `DEL` command.
The answer is yes, and this leads us to our next exploration.
Best Practice - Draining the BigKey
One traditional method of deleting a BigKey from Redis is to gradually drain the key from the client side.
This can be done by using the `HSCAN` and `HDEL` commands, as shown in the Go snippet below:
```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

const (
	hashKey   = "blocked_user_ids"
	batchSize = 100
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{
		Addr:     "127.0.0.1:6379",
		Username: "",
		Password: "",
	})

	if _, err := rdb.Ping(ctx).Result(); err != nil {
		log.Fatalln(err)
	}

	cursor := uint64(0)
	for {
		log.Printf("draining hash key, cursor: %d\n", cursor)

		// HSCAN on a Hash returns a flat list of alternating fields and values.
		pairs, nextCursor, err := rdb.HScan(ctx, hashKey, cursor, "*", batchSize).Result()
		if err != nil {
			log.Fatalln(err)
		}

		// Keep only the field names (every other element) for HDEL.
		fields := make([]string, 0, len(pairs)/2)
		for i := 0; i < len(pairs); i += 2 {
			fields = append(fields, pairs[i])
		}

		if len(fields) > 0 {
			if err := rdb.HDel(ctx, hashKey, fields...).Err(); err != nil {
				log.Fatalln(err)
			}
		}

		cursor = nextCursor
		if cursor == 0 {
			break
		}
	}

	log.Println("draining hash key completed")
}
```
The loop continues until the cursor returns to 0, signaling the end of the Hash.
The underlying principle behind this pattern is to issue multiple `HSCAN` and `HDEL` commands with a reasonable batch size, allowing Redis to execute other commands in between these smaller batches on its single thread.
Though it's not strictly necessary, applying the same pattern to Dragonfly could still be beneficial as a general technique.
However, bear in mind that even if we use the `DEL` command to delete the entire key from Dragonfly, which blocks the current client, other concurrent commands would continue to execute normally.
Best Practice - Lazy Free with UNLINK
To address the issue of BigKey deletion, Redis introduced the `UNLINK` command in version 4.0.
This command evaluates the cost of “deleting a key and reclaiming the corresponding memory”, and then decides whether to proceed in the main thread or delegate it to a background thread.
In addition, from version 6.0 onwards, Redis provides a configuration option, `lazyfree-lazy-user-del`, which, when set to `yes`, makes `DEL` behave the same as `UNLINK`.
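If you rely on this behavior, the corresponding redis.conf fragment (available from Redis 6.0 onwards) is simply:

```
lazyfree-lazy-user-del yes
```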
As of Dragonfly version 1.8.0, the `UNLINK` command behaves the same as `DEL`, which is a synchronous operation.
However, as we've noted multiple times, other concurrent command executions would not be highly impacted while Dragonfly is running expensive commands.
Best Practice - Lazy Free with EXPIRE
In the scenario above, we established that we can't automatically expire the `blocked_user_ids` BigKey, since we need to wait for the analytics team to complete their work before deletion.
This requirement led us to the `DEL` command.
However, once the analytics team is done, we can utilize expiration commands (`EXPIRE`, `PEXPIRE`, `EXPIREAT`, or `PEXPIREAT`) to delete the BigKey, if that aligns with the application logic.
In Redis, expired keys are immediately removed from the keyspace, rendering them inaccessible to the application.
However, the actual deletion (reclaiming the memory) is carried out periodically in the main thread by default.
This means if Redis needs to expire a BigKey or many keys simultaneously, it risks stalling the main thread, consequently preventing the Redis server from responding to other commands during this process.
More details can be found here. Starting from Redis version 4.0, we can set `lazyfree-lazy-expire` to `yes` to perform the actual deletion in the background.
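The corresponding redis.conf fragment (available from Redis 4.0 onwards) is:

```
lazyfree-lazy-expire yes
```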
In Dragonfly, while some slight delay (similar to what we saw with the `DEL` command) may be observed on other commands at the moment the memory occupied by the BigKey is reclaimed, the resulting latency has minimal impact on the overall performance of the Dragonfly server.
Dragonfly Specific - Item Level Expiration
Speaking of expiration, let's slightly alter our application scenario.
If every other requirement remains the same, but we now need to block an individual user ID for a set period of time (say 24 hours) without additional concern for the analytics team, how do we achieve this?
In Redis, we cannot use a Hash data type anymore since expiration can only be applied at the key level.
To meet this requirement, we could opt to use the String data type and store 10 million records for each user.
While feasible, this approach would comparatively increase memory usage.
Dragonfly aims to be fully compatible with Redis, ensuring you don't need to modify a line of code when switching to Dragonfly, while simultaneously offering more.
You may have noticed that Dragonfly is implementing additional practical commands to address common problems like this one.
One such command is `HSETEX` (documentation found here), which sets and expires an individual field of a Hash data type.
Note that this command is currently experimental, but by utilizing it, we can effortlessly implement the logic of blocking an individual user ID for a set period of time.
Similarly, Dragonfly offers the `SADDEX` command (documentation found here) for the Set data type, which sets and expires a single member.
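Applied to our altered scenario, a sketch of these commands (argument order as described in the Dragonfly documentation at the time of writing; both commands are experimental and may change) could look like this:

```
# Block a user ID for 24 hours (86,400 seconds).
HSETEX blocked_user_ids 86400 "6e9d...-01" "API abuse: excessive requests"

# Set flavor: the member itself expires after 86,400 seconds.
SADDEX blocked_user_ids_set 86400 "6e9d...-01"
```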
By leveraging the `HSETEX` command, we can achieve our desired outcome while still working with a Hash BigKey.
This approach allows us to conserve memory without the burden of using 10 million individual String keys.
Additionally, our write and auto-expire operations utilizing `HSETEX`, as well as read operations using `HGET` or `HSCAN`, only affect a small subset of field/value pairs within the BigKey.
This advantage eliminates the need for manual key draining and simplifies our workflow by allowing individual field/value pairs to expire naturally.
Dragonfly's `HSETEX` command thus offers a convenient and efficient solution for managing BigKeys.
Appendix - Test Environment
We used the following Dragonfly version, Redis version, and cloud instances to gather the data points used in this blog:
- Dragonfly -- 1.8.0
- Redis -- 7.0.12
- Server Instance -- GCP `c2-standard-16` (16 vCPU, 8 cores, 64GB memory)
- Client Instance -- GCP `e2-standard-2` (2 vCPU, 1 core, 8GB memory)
- Client and server instances are in the same region and communicate over a private network.
We went with a moderate server instance rather than the larger memory-optimized ones for the experiments conducted in this blog, since our goal wasn't to push Dragonfly or Redis to their limits, as we did in the Redis vs. Dragonfly Scalability and Performance blog post, but to see how a single BigKey could impact their performance.
This moderate `c2-standard-16` instance has strong processing power and enough memory to hold the BigKey we tested.
Conclusion
In this blog post, we explored the concept of BigKeys and the challenges they pose in traditional in-memory databases like Redis.
To recap, here are the important best practices to follow when working with BigKeys:
- Avoid performing full reads (`HGETALL`, `SMEMBERS`) on BigKeys.
- Utilize scan/range commands (`LRANGE`, `SSCAN`, `ZSCAN`, `ZRANGE`, `HSCAN`) when iterating through all the items within a BigKey.
- Prefer write commands (`LPUSH`, `SADD`, `ZADD`, `HSET`) with a reasonable batch size.
- Choose read commands and utility commands for different data types wisely.
- When feasible, leverage Dragonfly-specific commands (`HSETEX`, `SADDEX`).
- Use caution when deleting BigKeys.
While there are still best practices and patterns to follow, working with BigKeys in Dragonfly becomes a much more
seamless and fearless endeavour due to its revolutionary architecture and outstanding hardware efficiency.
To dive deeper into Dragonfly, we encourage you to refer to our documentation, which will help you get started with running Dragonfly in just a few minutes.
Additionally, our blog features a range of engaging posts that highlight the remarkable capabilities of Dragonfly. Happy coding!