Adding query performance optimizations #57
Conversation
bd5bfd4 to 6f0593a

sbellware commented Mar 3, 2025 (edited)
@tee8z This is good stuff. Thanks. It's not clear yet that it's pertinent to most users. The point about write performance is top of mind, but it's not entirely definitive, either. It depends on the workloads. I'm really curious about your workloads. Having some more insight would help characterize the needs of the change, and the tradeoff.

Typically, the query performance of retrieving a batch of messages from a category in order to process those messages is insignificant compared to the amount of time it takes to process them. Optimizing the batch retrieval query improves the performance of an infinitesimal fraction of the total time between executions of the batch retrieval query. The typical workload looks like:
The slow part is the processing of the messages. Making the batch retrieval faster doesn't have much of a noticeable impact on the system performance because, for each message in the batch, there's at least one message write, or some other I/O.

So, let's say that it takes 1000 units of time to retrieve and process 1000 messages. Out of those 1000 units of time, 1 unit might be taken up by the retrieval. Optimizing 1/1000th of the operation usually doesn't yield an improvement worth the extra cost of ownership of the additional implementation.

In other workloads, like bulk loading and data analysis for example, the category batch retrieval might happen in a tight loop, which makes its performance far more significant.

Can you disclose the kind of workload you have, or characterize how much time is spent in the retrieval of a batch versus the processing of a batch? Thanks!
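The time-budget argument above is essentially Amdahl's law. A minimal sketch, using the hypothetical 1-in-1000 figure from the comment rather than any measurement:

```python
# Amdahl's law: end-to-end speedup when only one fraction of the
# workload is accelerated. The numbers below are the illustrative
# units from the comment above, not benchmarks.

def overall_speedup(fraction_optimized, step_speedup):
    """Overall speedup when `fraction_optimized` of total time
    is made `step_speedup` times faster."""
    return 1.0 / ((1.0 - fraction_optimized) + fraction_optimized / step_speedup)

# Retrieval is 1 of 1000 units of time; assume the query gets 10x faster.
print(f"{overall_speedup(1 / 1000, 10):.4f}x")  # ~1.0009x end to end
```

Even an infinitely fast retrieval (step_speedup approaching infinity) caps the end-to-end gain at about 1.001x under this assumption, which is the point about consumer workloads; a tight-loop workload pushes `fraction_optimized` toward 1, where the gain is real.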
We have found that adding these few indexes, and adding the most common query paths for the get_category_messages function as static queries, greatly improves the performance of our application using this database model. Based on our own experience, these changes reduced the query time for ~1000 rows using get_category_messages from ~200ms to ~6ms. We use get_category_messages heavily throughout our application logic to get the next batch of messages for a given stream to process, so this has had massively positive benefits. The additional indexes have also contributed to that speedup.

One thing to note: we are adding 3 indexes, so this may slow down writes somewhat and should be watched closely to see how often the indexes are hit for a given use case.
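To connect the two comments: the reported ~200ms → ~6ms per ~1000-row batch matters a lot or a little depending on how much processing follows each batch. A rough model, where the per-message processing costs are assumptions, not figures from this thread:

```python
# Model of how the reported retrieval improvement (~200ms -> ~6ms per
# ~1000-row batch, from the comment above) plays out under two
# workloads. Per-message processing costs are hypothetical.

def batch_time_ms(retrieval_ms, per_message_ms, batch_size=1000):
    """Total time to retrieve one batch and process every message in it."""
    return retrieval_ms + per_message_ms * batch_size

# Handler-heavy workload: assume 10 ms of I/O-bound work per message.
before = batch_time_ms(200, 10)   # 10200 ms
after = batch_time_ms(6, 10)      # 10006 ms
print(f"handler-heavy: {before / after:.3f}x")  # ~1.019x

# Tight-loop workload (bulk load / analysis): assume ~0.01 ms per message.
before = batch_time_ms(200, 0.01)  # 210 ms
after = batch_time_ms(6, 0.01)     # 16 ms
print(f"tight-loop: {before / after:.1f}x")     # ~13.1x
```

Under these assumptions, the same index change is nearly invisible to a consumer that does real work per message, but is an order-of-magnitude win for a tight retrieval loop, which is why characterizing the workload (and watching index hit rates versus write overhead) is the deciding factor.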