Erik Hatcher

When NOT to use Atlas Search

Design reviews are one-on-one meetings where MongoDB experts deliver advice on data modeling best practices and application design challenges. In this series, we are going to explore common real-life scenarios where design reviews helped developers achieve meaningful success with MongoDB.
(from “How to Align Your Data Model With Your Application Needs When Migrating From RDBMS to MongoDB” by Néstor Daza)

Note: I’m taking the opportunity to link liberally, sometimes loony-ily. I love the serendipity of following interesting links. I had fun researching and reminding myself of oldies but goodies. Here’s to at least some of the shiny paths followed being entertaining and educational to you too.

We’ve got to start with a couple of assumptions for this article to best fit:

  • You’ve got documents in MongoDB Atlas.
  • The documents need to be findable.

If your documents aren’t in Atlas, then Atlas Search doesn’t (yet) apply, and thus the rest of this write-up is moot.

If anything, I’m pragmatic and agile. Duct tape, twine, or a card catalog: use what works for the job. Whether you use MongoDB or not, the document model is a good way to think about data challenges and worth having handy when the time is right. Do consider Atlas for your future data needs, as it’s a platform that provides a lot of necessary and powerful capabilities. Just sayin’.

Findability is one such necessary database capability. If you can’t find your content, it may as well not exist.

Atlas Search enables powerful, scalable, and relevant search features. Its strength primarily stems from one little, old Java library. There’s a potent elixir in that .jar! And it has been The Solution to All The Challenges for the bulk of my career. One rather fun aspect of my life at MongoDB is tackling Design Reviews that involve some aspect of search. I excel at, and enjoy, solving concrete search problems. These reviews typically are with folks using Atlas Search and wanting to dig in deeper to get a bit more nuance to relevancy tuning, or folks using MongoDB $match and $regex and exploring if and how to leverage Atlas Search instead. Here’s a story about a recent Design Review with a customer already well versed in Atlas Search and using it effectively… to a point.

Atlas Search matching as expected

Here’s the use case presented to me by a customer during a design review session:

We have a service built on MongoDB Atlas that needs to rapidly match identity requests using only a few fields of exact (though case-insensitive) values, such as an ID, e-mail address, and phone number.

Case-insensitive matching over a few fields? Definitely a problem that Atlas Search can solve handily! Take a few compound.should clauses and call us in the morning.
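To make the "few compound.should clauses" idea concrete, here is a minimal sketch of the kind of pipeline that could back such a lookup. It is not the customer's actual query. The field names (identityId, email, phone), the index name, and the helper function are all hypothetical, and the sketch assumes each field is mapped as a token type in the Atlas Search index so that the equals operator can match whole string values. Values are lowercased before querying so the match is case-insensitive as long as the indexed values are lowercased as well.

```python
def build_identity_search(identity_id=None, email=None, phone=None,
                          index_name="identity_lookup"):
    """Build a $search pipeline that matches any of the supplied identity values.

    Assumptions (not from the original article): the field names, the index
    name, and a token-type mapping for each field in the search index.
    """
    candidates = {"identityId": identity_id, "email": email, "phone": phone}

    # One should clause per supplied value; lowercase for case-insensitive matching.
    should = [
        {"equals": {"path": path, "value": value.lower()}}
        for path, value in candidates.items()
        if value is not None
    ]
    if not should:
        raise ValueError("at least one identity value is required")

    return [
        {
            "$search": {
                "index": index_name,
                # At least one of the should clauses has to match.
                "compound": {"should": should, "minimumShouldMatch": 1},
            }
        },
        {"$limit": 5},
    ]


if __name__ == "__main__":
    from pprint import pprint
    pprint(build_identity_search(email="Ada@Example.com", phone="+15550100"))
```

The returned list is plain data, so it could be handed to an aggregate call on the collection by whichever driver the application uses.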

And unsurprisingly, the customer reports that:

Atlas Search matching works as expected...

But not so fast

Literally, and unfortunately,

… However, the time to “eventual consistency” in order to match recently updated documents is too long for the required SLA.

And, to work around that,

A [third-party key-value] caching mechanism was implemented for a first pass lookup.

Both of these topics warrant a bit of a deeper dive, so that we can understand how best to help this customer.

  • Eventual consistency
  • Key value lookup using indexes

Eventual consistency

Yes, Atlas Search is awesome! It can slice, dice, and do all sorts of groovy things, yet it obediently stays within the laws of physics. Data, being what it is, is always being added to, updated in, and deleted from the database and its replica set. The Atlas Search process (mongot) handles the database change stream and updates the underlying Lucene index. This process, by default, runs co-located with the database processes themselves on the same hardware, though ideally it should run on its own hardware nearby.

[Figure: Coupled architecture]

[Figure: Dedicated Search Nodes]

Atlas Search is eventually consistent. These machinations involve shards, replicas, CPU, disk, memory, network, and a bit of time. Changes to the database will, eventually, be reflected in associated search indexes. But it isn’t instantaneous, and there are many variables that affect the lag between a database change and search requests finding documents by the modified criteria: rate of data changes, complexity of index mapping configuration, deployment architecture/capabilities, resource contentions, size of the index, query load, and maybe even solar flares.

Depending on the nature of the application, the eventually consistent lag time may be irrelevant or a critical consideration. An update to a book record in a library can get reindexed overnight without affecting operations. However, an identity request that fails to match the latest value of a record that was just updated is unacceptable.

The trade-off of a search index being eventually consistent is that it does not delay, or interfere with, database-level updates and transactions. A search index update has so many variables involved and can change over time in complexity; a change to the index configuration could cause vastly more terms or documents to be indexed. An Atlas Search index is an index configuration and its corresponding Lucene index. This word “index” is a great one, but a Lucene index is really a collection of special-purpose data structures, one for each field (and multi) defined. Each field “type” has its own optimized index data structure. Lexicographically ordered inverted indexes with posting lists, complete with term, document, and corpus-level statistics, power string mapped fields and queries. This is the heart of relevancy computations.
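For readers who have not met an inverted index before, the following toy sketch shows the general shape of the idea: terms kept in lexicographic order, each pointing at a posting list of documents with per-document term frequencies. It is an illustration only and makes no claim about how Lucene actually lays out its structures. The function names and sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of {doc_id: term_frequency}.

    Toy tokenization: lowercase and split on whitespace.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1

    # Keep terms in lexicographic order and postings ordered by document id.
    return {term: dict(sorted(postings[term].items())) for term in sorted(postings)}


def document_frequency(index, term):
    """Number of documents containing the term, a typical relevancy statistic."""
    return len(index.get(term, {}))


if __name__ == "__main__":
    sample = {1: "Lucene in Action", 2: "Search in action in Atlas"}
    index = build_inverted_index(sample)
    for term, posting_list in index.items():
        print(term, posting_list, "df =", document_frequency(index, term))
```

Statistics such as term frequency and document frequency are the raw inputs to relevancy scoring, which is why every extra indexed field or term adds work each time a document changes.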

Key value lookup using indexes

Finding by _id (every document's unique key) is a given (so we throw that one in for free!). What about finding your data by other exact match types of criteria, such as all products in a specific category? Or all documents modified by a particular username? No doubt this type of findability is crucial too. MongoDB is really good at looking up documents by a value, provided the value is indexed.

This particular application needs exact, case-insensitive value lookup over a few fields. Let’s push the case-insensitivity issue to the application, and simply have it lowercase the field value any time it is being written or queried, so now it’s fully an exact match situation on the MongoDB side of things. Following indexing best practices such as the ESR rule, a few single-field B-Tree index definitions are all the customer needs to satisfy their performance SLA. These indexes don’t come for free, either, but are managed in the database process quickly and handled synchronously with every document update, so consistency is guaranteed.
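The lowercase-in-the-application approach could look roughly like the sketch below. The field names and helper functions are hypothetical, and the collection argument is assumed to be a PyMongo-style collection whose create_index method accepts a list of (field, direction) pairs. The same normalize function has to be applied on every write and every query for the exact match to hold.

```python
LOOKUP_FIELDS = ("identityId", "email", "phone")  # hypothetical field names


def normalize(value):
    """Lowercase a lookup value; apply on every write and every query."""
    return value.lower()


def ensure_lookup_indexes(collection):
    """Create one ascending single-field index per lookup field.

    `collection` is assumed to behave like a PyMongo collection.
    """
    for field in LOOKUP_FIELDS:
        collection.create_index([(field, 1)])  # 1 means ascending


def identity_filter(**values):
    """Build an exact-match filter over whichever lookup values were supplied."""
    clauses = [
        {field: normalize(value)}
        for field, value in values.items()
        if field in LOOKUP_FIELDS and value is not None
    ]
    if not clauses:
        raise ValueError("at least one lookup value is required")
    # Each $or branch is an equality match that its own index can serve.
    return {"$or": clauses}


if __name__ == "__main__":
    print(identity_filter(email="Ada@Example.COM", phone="+15550100"))
```

The resulting filter would be passed to a regular find call, so the lookup is served by the regular database indexes rather than by the search index.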

And to be sure, key/value lookup in Lucene (via Atlas Search) is very fast. It’s the eventual consistency lag that drives the design recommendation here. If the use case had been querying across dozens of fields in any combination for exact values and eventual consistency was an acceptable trade-off, Atlas Search would be the better approach here. With a lot of fields to intersect, B-Tree index configuration would be arduous and resource-intensive, whereas an Atlas Search index configured for multiple-field intersection would be quite efficient and performant.

Design recommendation: B-Tree pragmatism

For this case of a few fields of exact value matching, with no full-text fuzzy search needed, the clear winner is leveraging the in-process, consistent, and quick B-Tree index capabilities.

When queries are exact field matches and the eventual consistency time lag is a critical blocker, consider using classic MongoDB B-Tree indexes rather than Atlas Search. Atlas Search indexes are updated in a separate process, maybe even on separate hardware via a network hop, whereas B-Tree index updates happen within the scope of database update transactions and are immediately usable after an update completes. Note that _id is implicitly indexed in this fashion and can be used for domain values if appropriate. With B-Tree-based lookups, a front-end cache is not needed as this is already a fast key/value lookup from a RAM-based index.
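The difference between the two update paths can be pictured with a deliberately simplified toy model. It is not a description of MongoDB or mongot internals. The class and method names are invented, a plain dict stands in for the B-Tree index, and a queue stands in for the change stream that the search process consumes some time after the write.

```python
from collections import deque


class ToyStore:
    """Toy model: one index updated inside the write, one updated later."""

    def __init__(self):
        self.docs = {}                # doc_id -> email
        self.btree = {}               # email -> doc_id, updated during the write
        self.search_index = {}        # email -> doc_id, updated asynchronously
        self.change_stream = deque()  # pending changes for the search index

    def upsert(self, doc_id, email):
        old = self.docs.get(doc_id)
        if old is not None:
            self.btree.pop(old, None)
        self.docs[doc_id] = email
        self.btree[email] = doc_id    # visible as soon as the write returns
        self.change_stream.append((doc_id, old, email))

    def sync_search(self):
        """Apply queued changes, standing in for the search process catching up."""
        while self.change_stream:
            doc_id, old, new = self.change_stream.popleft()
            if old is not None:
                self.search_index.pop(old, None)
            self.search_index[new] = doc_id


if __name__ == "__main__":
    store = ToyStore()
    store.upsert(1, "old@example.com")
    store.sync_search()
    store.upsert(1, "new@example.com")
    print("btree sees new value:", store.btree.get("new@example.com"))
    print("search sees new value:", store.search_index.get("new@example.com"))
    store.sync_search()
    print("search after catching up:", store.search_index.get("new@example.com"))
```

Between the second upsert and the next sync_search call, the toy search index still answers for the old value, which is the window the customer was covering with a cache.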

Be sure to learn about data modeling and schema design for Atlas Search so that you’re ready for the problems for which it shines!


About the author: Erik Hatcher, Staff Developer Advocate, Atlas Search @ MongoDB. “Lucene. In. Action.”
