rabbit_quorum_queue: Shrink batches of QQs in parallel #15081
base:main
the-mikedavis commented Dec 5, 2025 • edited
With this change and the default |
kjnilsson commented Dec 8, 2025
This looks fine to me, at least for now. It would be quite possible to get much higher throughput on this by using command pipelining instead of spawning a bunch of processes just to exercise the WAL more. We'd need to add that as an option to the Ra API, however.
the-mikedavis commented Dec 8, 2025
Ah yeah, with pipelining we could use the WAL much more efficiently. That shouldn't be too bad to add to Ra - just a new function in I'm actually more worried about the In the meantime making this parallel seems like an easy improvement since we can continue using the |
8208549 to f14957d Compare

Shrinking a member node off of a QQ can be parallelized. The operation involves

* removing the node from the QQ's cluster membership (appending a command to the log and committing it) with `ra:remove_member/3`
* updating the metadata store to remove the member from the QQ type state with `rabbit_amqqueue:update/2`
* deleting the queue data from the node with `ra:force_delete_server/2` if the node can be reached

All of these operations are I/O bound. Updating the cluster membership and metadata store involves appending commands to those logs and replicating them. Writing commands to Ra synchronously in serial is fairly slow - sending many commands in parallel is much more efficient. By parallelizing these steps we can write larger chunks of commands to WAL(s).

`ra:force_delete_server/2` benefits from parallelizing if the node being shrunk off is no longer reachable, for example in some hardware failures. The underlying `rpc:call/4` will attempt to auto-connect to the node and this can take some time to time out. By parallelizing this, each `rpc:call/4` reuses the same underlying distribution entry and all calls fail together once the connection fails to establish.
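The batching idea in the commit message above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the module name `shrink_sketch`, the function `shrink_in_batches/3`, and the `ShrinkFun` callback are all hypothetical. `ShrinkFun` stands in for the three per-queue steps (remove the member, update the metadata store, delete the server), and the batch size is a plain argument rather than any real RabbitMQ setting.

```erlang
-module(shrink_sketch).
-export([shrink_in_batches/3]).

%% Applies ShrinkFun to every queue, running at most BatchSize calls
%% concurrently. ShrinkFun is a hypothetical stand-in for the per-queue
%% shrink steps; it is not a RabbitMQ function.
%% Returns [{Queue, Result}] in the same order as the input list.
shrink_in_batches(ShrinkFun, Queues, BatchSize)
  when is_function(ShrinkFun, 1), is_integer(BatchSize), BatchSize > 0 ->
    lists:append([shrink_batch(ShrinkFun, Batch)
                  || Batch <- chunk(Queues, BatchSize)]).

%% Starts one monitored process per queue in the batch, then waits for
%% each of them in turn. A worker hands back its result through its exit
%% reason, so a crash in one worker cannot take down the caller.
shrink_batch(ShrinkFun, Batch) ->
    Pending = [{Q, spawn_monitor(fun() -> exit({shrunk, ShrinkFun(Q)}) end)}
               || Q <- Batch],
    [receive
         {'DOWN', MRef, process, Pid, {shrunk, Result}} ->
             {Q, Result};
         {'DOWN', MRef, process, Pid, Reason} ->
             %% The worker crashed or exited before returning a result.
             {Q, {error, Reason}}
     end || {Q, {Pid, MRef}} <- Pending].

%% Splits a list into consecutive sublists of at most Size elements.
chunk([], _Size) ->
    [];
chunk(List, Size) when length(List) =< Size ->
    [List];
chunk(List, Size) ->
    {Batch, Rest} = lists:split(Size, List),
    [Batch | chunk(Rest, Size)].
```

Calling `shrink_sketch:shrink_in_batches(Fun, Queues, 64)` with a `Fun` that blocks on I/O lets up to 64 calls wait at the same time, which is the property the commit message relies on: the blocked calls overlap instead of queuing behind one another. Each batch is awaited in full before the next one starts, so a single slow queue delays the rest of its batch but no more than that.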
f14957d to a14595d Compare
Discussed in #15057