- Notifications
You must be signed in to change notification settings - Fork1.6k
fix(pubsub): set default stream ACK deadline to subscriptions'#9268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
plamut commentedSep 23, 2019
@pradn Assigned to you, but feel free to delegate to anybody else from the team that will take over the PubSub lib. Also, feel free to ping me for any information about the inner workings of the streaming pull manager. |
86c50cd todfc21f1Compare
pradn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good!
nit: spelling in commit message ("streamimng")
Uh oh!
There was an error while loading.Please reload this page.
When subscribing, it makes sense to use the configured subscription'smaximum ACK deadline for the streaming pull, instead of an optimisticminimum of 10 seconds.Using an optimistic deadline affects messages that are put on hold andare not lease managed, because by the time the client dispatches themto the user's callback, the optimistic ACK deadline could have alreadybeen missed, resulting in the backend unnecessary re-sending thosemessages, even if the subscription's ACK deadline has not been hit yet.
Uh oh!
There was an error while loading.Please reload this page.
Closes#9252.
This PR mitigates the reported issue by using the subscription's max ACK deadline when establishing a streaming pull, instead of the previous optimistic ACK deadline of 10 seconds.
(when creating a streaming pull, the ACK timing histogram has no data yet, thus its 99 % value defaults to 10 seconds... also see the commit message for some extra info)
Additionally, it adds a debug log message with the ACK deadline used for the streaming pull, as that important piece of information was missing previously.
How to test
Run the example from the issue description, and verify that the backend does not send any duplicate messages, and the subscription's ACK deadline is respected (300 s in that particular case).
Details
The cause of the bug was that the client overrode the max ACK deadline on the subscription with the default minimum deadline of 10 seconds.
This became problematic for the received messages that exceeded the
FlowControl.max_messagessetting, and were thus put on hold in a buffer. Since these messages were not lease-managed, and their ACK deadlines were not being extended, the overridden ACK deadline of 10 seconds was easily hit, resulting in re-delivery.There are several options on how to deal with these on hold messages:
Discard them immediately, and set their ACK deadlines to 0.
This would cause the backend to re-send them immediately, which would work well with multiple subscribers. However, if using a single subscriber, low
max_messagessetting, and processing each message takes a lot of time, large published message batches would cause a lot of unnecessary traffic.The excessive messages in a batch (those above
max_messages) would be discarded and re-delivered over and over again, because each time the client would dispatch only a small portion of the batch to user's callbacks.Add all received messages to lease management, including those put on hold.
This would avoid ACK timeouts and message re-delivery, but could shoot the leaser's load well over
1.0. When user code would process and acknowledge some of the messages, the client would remove them from the lease management, but the load could still stay above1.0Since the leaser would still be overloaded, no additional messages would be dispatched to the callbacks, and the stream would not be resumed --> deadlock (there would be nothing to trigger resuming the streaming pull).
While there could be a way around that, it would likely require rewriting significant chunks of the streaming pull logic, risking new bugs.
Ignoring the messages on hold until the load allows processing them.
This is the path chosen. Only the messages that arenot put on hold are added to the lease management, while the rest sit in the buffer, waiting to be released when the load allows for it. If they are released soon enough, the auto-lease management steps in.
While these messages could still remain in the buffer for too long, missing the ACK deadline, those messages would be re-delivered to other subscribers, as the overloaded subscriber's stream would be paused in the meantime (the subscriber does not resume the stream, if it has any messages on hold).
In the case of a single subscriber client, the messages would indeed be re-delivered to it, but on the other hand the ACK histogram times would pick that up and gradually extend the ACK deadlines that the auto-leaser sets for the messages.