Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork1.1k
Resolve concurrency issues that were discovered when upgrading the ws package to the latest version.#7068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Hi@pubkey. I've submitted a PR#7075 to reproduce this bug, but the test case requires at least 6 seconds to complete. However, some test scripts have a timeout setting of only 3 seconds. During my testing, if I adjust parameters (like insertion count and query time) to make the test complete within 3 seconds, the bug might not be reproducible. Could you clarify: Were these 3-second timeout configurations intentionally set this way, or was it perhaps an oversight and they should be 10 seconds? I'm happy to address any questions about this PR and make modifications. |
pubkey commentedApr 16, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
For now I "fixed" the issue by making the This at least lets us upgrade to the latest ws version without the security issue. |
Got it. I wasn't fully aware of the community guidelines before, but I'll make sure to follow them going forward. |
hi@pubkey However, the issue still persists in the latest version and has been successfully reproduced. You should take a look at this.CI This is based on the code from this PR:#7075 . The issue can be reliably reproduced in MongoDB because it operates slightly slower than local storage due to additional disk I/O and network latency. |
Hi@pubkey , I've set a flag for computing the changes. The idempotency check will only be triggered when recreate queries to prevent data modification loss, while remaining inactive in other scenarios without any system performance impact. The tests have passed successfully, including the merged longquery tests. Regarding the denoky storage issue that also occasionally occurs on the main branch: since I have limited knowledge about this database, I haven't identified the root cause yet. Would you mind checking if I've missed anything or if anything needs tweaking? |
This PR contains:
Describe the problem you have without this PR
After update ws to 18.81 version, it will make test case
doing insert after subscribe should end with the correct results
error.Through my investigation, this is a concurrency issue that has existed since the beginning. The upgrade of the ws package only changed the message transfer speed, making the concurrency problem more likely to surface. When I manually slowed down the query execution to an extremely low speed, the original version of ws also exhibited the same concurrency issue, causing the client to lose necessary data when processing the changesStream.
The cause of the problem
So by the time the client obtained the query results and began processing the changesStream, it was already too late. The query results we obtained were from version 2, but I started getting the changesStream from the version 3 database.
What I do
The most ideal situation is that we get the changeStream starting from v2. However, the client and the server are two separate entities, so we can only find a checkPoint close to v2 to start merging data.
When rxQuery mustReExec, it records the changePoint. Let changePoint = _changeEventBuffer.getCounter(), which records the current database version. When the query results are obtained, the changesEvent is obtained starting from the recorded changePoint. Of course, this will cause data redundancy, but data redundancy is much easier to handle than data loss. All we need to do is to implement idempotency checks.
And then the issue has been resolved, and the CI is currently functioning normally.CI Result