API Core: Fix race in 'BackgroundConsumer._thread_main'. #8883
Conversation
- Bogus changes to pubsub and firestore to ensure their systests pass on CI.
This reverts commit 5d79f66.
plamut left a comment (edited)
Looks good; it solves one part of the issue.

There is still a possibility that the consumer gets paused right after we release the lock, and just before we fetch a new response from the server. However, I don't think there is a straightforward way of addressing that.

Since `self._bidi_rpc.recv()` might block, we must release the `_wake` lock; otherwise, shutting down the background consumer could get blocked because of it (details). This was nicely caught by system tests with an earlier version of the fix.

All in all, this PR is an improvement, thus approving.
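A minimal sketch of the locking pattern described in this review, assuming the `_wake` condition, `_paused` flag, and `_bidi_rpc` names from the discussion; this is an illustration of the pattern, not the merged diff:

```python
import threading


class ConsumerLoopSketch:
    """Illustrative pause/resume handling for a background consumer."""

    def __init__(self, bidi_rpc, on_response):
        self._bidi_rpc = bidi_rpc
        self._on_response = on_response
        self._paused = False
        self._wake = threading.Condition()

    def pause(self):
        with self._wake:
            self._paused = True

    def resume(self):
        with self._wake:
            self._paused = False
            self._wake.notify_all()

    def _thread_main(self):
        while self._bidi_rpc.is_active:
            # Hold the condition's lock across the whole paused check, so a
            # resume() cannot slip in between reading the flag and calling
            # wait(); otherwise the thread could miss its wake-up.
            with self._wake:
                while self._paused:
                    self._wake.wait()

            # The lock must NOT be held here: recv() can block indefinitely,
            # and anything that needs the lock to shut the consumer down
            # would otherwise deadlock (the hang the system tests caught).
            response = self._bidi_rpc.recv()
            self._on_response(response)
```

As the review notes, releasing the lock before `recv()` leaves a narrow window in which a pause can land just before the next response is fetched; the sketch above inherits that limitation.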
tseaver commented Aug 1, 2019
@plamut I updated the PR description / commit message to avoid closing #7817. Can you please clarify how we might reproduce / fix the remaining part?
plamut commented Aug 2, 2019
**Reproducing the issue**

I did not manage to reproduce it with a real application (the time window is narrow), but I came up with a test that detects if the race occurs. The idea is to constantly pause/resume the consumer and check if the latter is indeed paused while a response is being fetched.

The test consistently fails on my machine, but also passes if one comments out the following:

`pause_resume_thread.start()`

Can anyone confirm my findings and proof-read the test?

**Fixing the issue**

Apparently, we must be holding the `_wake` lock while checking the paused state. However, simply indenting the following block into the block that holds the lock does not work:

```python
_LOGGER.debug("waiting for recv.")
response = self._bidi_rpc.recv()
_LOGGER.debug("recved response.")
self._on_response(response)
```

Since `self._bidi_rpc.recv()` might block, holding the lock around it can block shutting down the consumer. Is there a good way to make the paused check and the blocking `recv()` call coexist safely?

**Estimation of the issue impact**

I'm not familiar with Firestore, thus I can only speak for Pub/Sub. The background consumer gets paused when the streaming pull manager determines that the client currently holds more messages than the flow control settings allow.

If the consumer's main loop detects the paused state too late, one extra batch of server messages will be received before pausing in the next iteration. For the Pub/Sub client, this would mean receiving one extra batch of messages that would sit in the holding buffer until they can be processed. The extra memory usage is thus bounded by the maximum size of a single response from the server, so probably not a showstopper.
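A rough sketch of the kind of pause/resume stress test described in the comment above. The helper names here (`run_pause_resume_stress`, `flip_pause_resume`, the `stop` event) are hypothetical; only `pause()`, `resume()`, and the `pause_resume_thread.start()` line come from the discussion:

```python
import threading
import time


def run_pause_resume_stress(consumer, duration=2.0):
    """Hypothetical sketch: hammer pause()/resume() to widen the race window.

    Assumes `consumer` exposes pause() and resume(), as the background
    consumer discussed in this PR does.
    """
    stop = threading.Event()

    def flip_pause_resume():
        while not stop.is_set():
            consumer.pause()
            consumer.resume()

    pause_resume_thread = threading.Thread(target=flip_pause_resume)
    # Commenting out the next line makes the failing test pass, which is
    # what points at the pause/resume race in the consumer's main loop.
    pause_resume_thread.start()

    # Elsewhere, an instrumented recv path would record whether a response
    # was ever fetched while the consumer was flagged as paused; the test
    # asserts that this never happens.
    time.sleep(duration)
    stop.set()
    pause_resume_thread.join()
```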
See #7817.
In addition to passing all `api_core` tests locally, I have verified that the `firestore` and `pubsub` system tests all run cleanly with this patch (an earlier version caused them to hang).