Add batch_size(batch_size) to __find_in_batches (Mongoid) #1036

Open
sylvain-8422 wants to merge 1 commit into elastic:main from sylvain-8422:mongoid-batch-size

sylvain-8422 commented Jul 11, 2022 (edited)

Add .batch_size(batch_size) to #__find_in_batches (Mongoid).

Fixes #1037.

Although .each_slice(batch_size) is useful for limiting how many documents are sent to Elasticsearch at a time, it does not limit the batch size of MongoDB's getMore commands.

By default, iterating over a MongoDB collection first returns 101 documents, and then subsequent batches of up to 16 MiB:

https://www.mongodb.com/docs/manual/tutorial/iterate-a-cursor/#cursor-batches

For example, a MongoDB collection containing documents averaging 1 KiB might return more than 16,000 documents at a time.
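
The figure above follows from simple arithmetic. A minimal sketch, assuming every document is exactly 1 KiB and ignoring any per-message overhead (both are simplifications, not measured values; the constant names are made up for illustration):

```ruby
# Rough estimate only: assumes a uniform 1 KiB document size and
# ignores per-message overhead.
MAX_BATCH_BYTES = 16 * 1024 * 1024 # 16 MiB ceiling on a getMore batch
AVG_DOC_BYTES   = 1024             # 1 KiB average document (assumed)

DOCS_PER_BATCH = MAX_BATCH_BYTES / AVG_DOC_BYTES
puts DOCS_PER_BATCH # prints 16384
```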

Although Mongoid's documentation claims a default batch size of 1,000 documents, that does not seem to be the case.

Also, Mongoid's .no_timeout is broken right now and does nothing:

mongodb/mongo-ruby-driver#2557

As a result, it is now likely that more than 10 minutes go by between two getMore commands and that the MongoDB cursor expires.
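
To make the timeout risk concrete, here is a back-of-the-envelope sketch. The indexing rate of 20 documents per second is purely hypothetical, and the helper names are invented for illustration; only the 10-minute limit comes from the description above.

```ruby
CURSOR_TIMEOUT_SECONDS = 10 * 60 # the 10-minute idle limit mentioned above

# Seconds spent working through one batch before the next getMore is sent.
def seconds_between_get_more(docs_in_batch, docs_indexed_per_second)
  docs_in_batch / docs_indexed_per_second.to_f
end

# True when processing one batch takes longer than the cursor may sit idle.
def cursor_expires?(docs_in_batch, docs_indexed_per_second)
  seconds_between_get_more(docs_in_batch, docs_indexed_per_second) > CURSOR_TIMEOUT_SECONDS
end

ASSUMED_RATE = 20 # documents indexed per second (hypothetical)

puts cursor_expires?(16_384, ASSUMED_RATE) # prints true  (roughly 819 seconds per batch)
puts cursor_expires?(1_000, ASSUMED_RATE)  # prints false (50 seconds per batch)
```

Under these assumed numbers, a batch of about 16,000 documents outlives the cursor, while a batch of 1,000 does not.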

Adding .batch_size(batch_size) to the query makes sure that MongoDB documents are retrieved at the same rate as they are processed and indexed in Elasticsearch, and allows applications affected by the .no_timeout issue to reduce the batch size to avoid cursor timeouts.
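
The idea can be sketched without a database. This is not the actual diff: FakeCriteria is a hypothetical stand-in that only records the options a real Mongoid criteria would hand to the driver, and find_in_batches_sketch is an illustrative name with an assumed default of 1,000.

```ruby
# Hypothetical stand-in for a Mongoid criteria so the sketch runs without MongoDB.
# It only records the options that would be passed on to the driver.
class FakeCriteria
  include Enumerable

  attr_reader :options

  def initialize(docs, options = {})
    @docs = docs
    @options = options
  end

  # Chainable, in the style of Mongoid's no_timeout.
  def no_timeout
    FakeCriteria.new(@docs, @options.merge(no_timeout: true))
  end

  # Chainable, in the style of Mongoid's batch_size.
  def batch_size(value)
    FakeCriteria.new(@docs, @options.merge(batch_size: value))
  end

  def each(&block)
    @docs.each(&block)
  end
end

# Simplified shape of the batching helper (an assumption, not the library code).
# The same number drives both the cursor batch size and the slice size.
def find_in_batches_sketch(scope, batch_size: 1_000, &block)
  cursor = scope.no_timeout.batch_size(batch_size)
  cursor.each_slice(batch_size, &block)
  cursor.options # returned only so the sketch can be inspected
end

slice_sizes = []
used = find_in_batches_sketch(FakeCriteria.new((1..2_500).to_a), batch_size: 1_000) do |items|
  slice_sizes << items.size
end

puts slice_sizes.join(", ") # prints 1000, 1000, 500
puts used[:batch_size]      # prints 1000
```

Because the slice size and the cursor batch size come from the same option, lowering it shortens the time between two getMore commands as well as the size of each bulk request.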

sylvain-8422 (author) commented Jul 6, 2023 (edited)

@shashankjo

Same simple change as before, but I fixed the conflict created by whitespace changes in main.

From ef8985e to aa38a1b.


Reviewers

@shashankjo approved these changes

Development

Successfully merging this pull request may close these issues.

Batch size is ignored when fetching documents from MongoDB

2 participants

@sylvain-8422 @shashankjo
