-
The RabbitMQ client can freeze while writing to a socket when the network interface is removed. For example, we can run an app in Docker, disconnect the network with

The thread is stuck with this stacktrace:

We can see that the sending buffer is occupied somehow in

By the analysis of this library source code and

Ideally, either the flush would throw an exception (but that doesn't happen), or we could detect "heartbeat timeouts" in this library and close the connection from outside. If we try to implement this kind of behavior in the application itself, we fail. For example, if we time-out the

For this reason, we believe that this is a bug in the library itself, although a very subtle one that is hard to fix.
Replies: 5 comments 1 reply
-
This is a pretty esoteric situation. It would expedite our investigation if you provided a script or some other means of reproducing this easily. Ideally it would be as simple as
-
Thanks for the quick feedback. I'll try to prepare a simple simulation script.
-
This is how TCP works: it retries for a period of time before it declares the other end of the connection to be unresponsive. Heartbeats (note that values < 5s are explicitly recommended against) and publisher confirm reception timeouts will help. TCP parameter tuning on client hosts can help, too. This is mentioned somewhat in the Heartbeats guide.
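To make the heartbeat idea concrete, here is a minimal, library-independent sketch of timeout detection. `HeartbeatMonitor` and its methods are hypothetical names, not part of the RabbitMQ Java client, and the sketch assumes the peer is treated as unreachable once nothing has arrived for the full negotiated timeout, with heartbeat frames sent at roughly half that interval. It only covers detection; whether closing the connection afterwards unblocks a stuck write is the open question in this thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not part of the RabbitMQ Java client. */
final class HeartbeatMonitor {
    private final long timeoutNanos;
    private final LongSupplier nanoClock;      // injectable so the logic can be tested
    private volatile long lastInboundNanos;

    HeartbeatMonitor(long timeoutSeconds, LongSupplier nanoClock) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        this.timeoutNanos = TimeUnit.SECONDS.toNanos(timeoutSeconds);
        this.nanoClock = nanoClock;
        this.lastInboundNanos = nanoClock.getAsLong();
    }

    /** Call whenever any frame (heartbeat or regular traffic) arrives from the peer. */
    void recordInboundActivity() {
        lastInboundNanos = nanoClock.getAsLong();
    }

    /** True once nothing has arrived from the peer for the full timeout. */
    boolean peerLooksDead() {
        return nanoClock.getAsLong() - lastInboundNanos >= timeoutNanos;
    }

    /** Interval at which this side would send its own heartbeats: half the timeout. */
    long sendIntervalNanos() {
        return timeoutNanos / 2;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10, System::nanoTime);
        monitor.recordInboundActivity();
        System.out.println("peer looks dead: " + monitor.peerLooksDead());
    }
}
```

A watchdog thread could poll `peerLooksDead()` and trigger connection cleanup; injecting the clock keeps the timing logic testable without real waiting.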
-
This is a small repro case: https://github.com/sebek64/repro

To run it, just do

Then, you can observe the logs by

The question of how natural this scenario is is a legitimate one. For example, if we add an iptables DROP rule instead, the connection just fails correctly and quickly. I haven't tested unplugging a cable, but it could actually be similar to a network interface disappearing. Anyway, I believe that the library should not rely on the assumption that the output stream's flush method never blocks.
-
FWIW, we seem to be running into this exact problem regularly under high load (not sure if that is the trigger though). This is the thread dump:

Our current workaround is to wrap the publishing in an
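The exact wrapper used in this comment is not shown. As a generic illustration only, one common way to bound a potentially blocking call is to run it on an executor and wait on the `Future` with a timeout. `BoundedPublish` and `runWithTimeout` are hypothetical names, and the `Runnable` stands in for whatever publish call the application makes. Note that this only bounds how long the caller waits: a thread blocked in a socket write may not respond to the interrupt, which matches the original report that application-level timeouts do not free the stuck thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of the RabbitMQ Java client. */
final class BoundedPublish {

    /** Runs the task on the executor; returns true only if it finished within the timeout. */
    static boolean runWithTimeout(ExecutorService executor, Runnable publish,
                                  long timeout, TimeUnit unit) {
        Future<?> future = executor.submit(publish);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            // Requests interruption; a worker blocked in socket I/O may stay blocked.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();   // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("publish failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Daemon threads so a permanently stuck worker cannot keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "publisher");
            t.setDaemon(true);
            return t;
        });
        CountDownLatch neverReleased = new CountDownLatch(1);
        Runnable stuckPublish = () -> {           // simulates a publish that never returns
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("fast publish completed: "
                + runWithTimeout(executor, () -> { }, 1, TimeUnit.SECONDS));
        System.out.println("stuck publish completed: "
                + runWithTimeout(executor, stuckPublish, 200, TimeUnit.MILLISECONDS));
        executor.shutdownNow();
    }
}
```

A cached pool is used here so that one stuck worker does not block later publishes, at the cost of leaking a thread for every call that never returns.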
-
Thanks for providing some steps to reproduce, I'll investigate more shortly. In the meantime, you can try to:
This discussion was converted from issue #994 on March 20, 2023 15:19.