rabbitmq/rabbitmq-server (Public)

[Questions] create quorum queue failed #15099

Answered by michaelklishin

dormanze asked this question in Questions

dormanze

Dec 10, 2025

3 comments · 7 replies

Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.1.2

Erlang version used

26.2.x

Operating system (distribution) used

linux

How is RabbitMQ deployed?

Generic binary package

rabbitmq-diagnostics status output

See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics

Details

# PASTE OUTPUT HERE, BETWEEN BACKTICKS

Logs from node 1 (with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

Details

# PASTE LOG HERE, BETWEEN BACKTICKS

Logs from node 2 (if applicable, with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

Details

# PASTE LOG HERE, BETWEEN BACKTICKS

Logs from node 3 (if applicable, with sensitive values edited out)

See https://www.rabbitmq.com/docs/logging to learn how to collect logs

Details

# PASTE LOG HERE, BETWEEN BACKTICKS

rabbitmq.conf

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location

Details

# PASTE rabbitmq.conf HERE, BETWEEN BACKTICKS

Steps to deploy RabbitMQ cluster

Configure the cluster and start it.

Steps to reproduce the behavior in question

Declare a large number of quorum queues.

advanced.config

See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location

Details

# PASTE advanced.config HERE, BETWEEN BACKTICKS

Application code

Details

# PASTE CODE HERE, BETWEEN BACKTICKS

Kubernetes deployment file

Details

# Relevant parts of K8S deployment that demonstrate how RabbitMQ is deployed
# PASTE YAML HERE, BETWEEN BACKTICKS

What problem are you trying to solve?

When my clients restart in batches (rolling upgrade), they generate a large number of queue declaration requests (more than 10,000 quorum queues and more than 50,000 exclusive queues), and I occasionally receive error messages. We also periodically call the API endpoint /api/health/checks/port-listener/5673 to check whether the server is healthy. While the clients are starting, this endpoint also frequently reports errors.

./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:11.855670+08:00 [error] <0.4797544.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:42.090122+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.902431+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.902638+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.903377+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:30:41.995924+08:00 [error] <0.5425325.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-196745786">>' , ChannelID '<0.5425325.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:30:41.998024+08:00 [error] <0.5374829.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:31:22.275683+08:00 [error] <0.5637570.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-27639187">>' , ChannelID '<0.5637570.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:31:22.277879+08:00 [error] <0.5620055.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:33:13.044177+08:00 [error] <0.6146092.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37--797603050">>' , ChannelID '<0.6146092.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:33:13.110614+08:00 [error] <0.6120239.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"

It seems that some I/O timeout errors occurred when creating the queues, and that registering consumers then generated a large number of noproc exceptions.
After the client restart completed, some of my queues were in an abnormal state, and the management UI indicated that one of my instances was experiencing an issue.
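
To see how the two failure modes in an excerpt like the one above are distributed, the entries can be tallied by operation and reason. The following is a minimal sketch that assumes the `operation ... caused a connection exception ...: "..."` line shape shown in the excerpt; the `classify` labels and function names are illustrative and not part of RabbitMQ.

```python
import re
from collections import Counter

# Matches entries shaped like the ones quoted above, for example:
#   ... [error] <0.1.0> operation queue.declare caused a connection exception internal_error: "..."
ENTRY = re.compile(
    r'\[error\] <[\d.]+> operation (?P<op>[\w.]+) caused a connection exception '
    r'(?P<kind>\w+): "(?P<msg>.*)"'
)


def classify(message: str) -> str:
    """Reduce a broker error message to a short, illustrative reason label."""
    if "metadata store operation timed out" in message:
        return "metadata store timeout"
    if message.endswith("noproc"):
        return "noproc"
    return "other"


def tally(lines):
    """Count (operation, reason) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:  # lines that do not have the expected shape are skipped
            counts[(match["op"], classify(match["msg"]))] += 1
    return counts
```

Passing an open log file to `tally` (a file object iterates line by line) gives per-node counts that can be compared across nodes.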

I did not encounter this issue in the same environment with 4.0.x.
I have seen the related PR #15003. Could it explain why my consumer creation failed? And is the abnormal queue status caused by slow local I/O? Are there any parameters I can adjust to change this timeout?

kjnilsson
Dec 10, 2025
Maintainer

Around the time you restarted there will most likely be a stack trace with the full error and the reason why some processes did not start. Those are the logs I need to investigate further.

4 replies

dormanze Dec 13, 2025
Author

Could you tell me how to enable logging for this? I noticed that only my temporary exclusive queues are experiencing timeouts; the quorum queues are not.

kjnilsson Dec 15, 2025
Maintainer

Errors are always logged by default. You just need to supply all logs from all nodes and we can take a look.

kjnilsson Dec 17, 2025
Maintainer

@dormanze any update on providing logs covering the timeframe of the restart?

dormanze Dec 17, 2025
Author

Thank you for your reply. I am having some difficulty uploading the logs. I have carefully checked the startup logs and found no obvious errors. I will try version 4.2.x later to see if it can resolve my issue.

michaelklishin
Dec 10, 2025
Maintainer

@dormanze inject a random delay in the 1-15s range to your clients so that they do not all run their declarations at once.

There were several efficiency improvements around Khepri in the upcoming 4.2.2 release.
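
A minimal sketch of the suggested startup jitter, assuming a Python client. `declare_topology` is a hypothetical placeholder for the application's own queue and binding declarations, and the injectable `rng` and `sleep` parameters exist only to make the helper testable.

```python
import random
import time


def startup_jitter(min_s=1.0, max_s=15.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration in [min_s, max_s] seconds and return the delay used."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


def declare_topology():
    # Hypothetical placeholder: the application's queue.declare and queue.bind calls go here.
    pass


def start_client(sleep=time.sleep):
    """Delay by a random amount, then declare, so restarted clients do not all declare at once."""
    delay = startup_jitter(sleep=sleep)  # 1-15 s window, as suggested above
    declare_topology()
    return delay
```

With a uniform delay, a batch of clients restarted together spreads its declarations over roughly 14 seconds instead of issuing them in the same instant.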

1 reply

dormanze Dec 13, 2025
Author

Thank you for your response. Unfortunately, we cannot do that because we have a large number of queues, and we need to keep the upgrade duration under control when performing rolling upgrades of the clients.

Answer selected by dormanze

dormanze
Dec 13, 2025
Author

I noticed in my logs that it is not only queue creation that sometimes times out; adding bindings times out as well.

crasher:
  initial call: rabbit_reader:init/3
  pid: <0.5125499.0>
  registered name: []
  exception exit: channel_termination_timeout
    in function rabbit_reader:wait_for_channel_termination/3 (src/rabbit_reader.erl, line 808)
    in call from rabbit_reader:send_error_on_channel0_and_close/4 (src/rabbit_reader.erl, line 1720)
    in call from rabbit_reader:mainloop/4 (src/rabbit_reader.erl, line 548)
    in call from rabbit_reader:run/1 (src/rabbit_reader.erl, line 469)
    in call from rabbit_reader:start_connection/5 (src/rabbit_reader.erl, line 340)
  ancestors: [<0.5125478.0>,<0.614.0>,<0.613.0>,<0.612.0>,<0.610.0>,<0.609.0>,rabbit_sup,<0.248.0>]
  message_queue_len: 11
  messages: [{channel_exit,4,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},
             {channel_exit,7,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},
             {channel_exit,8,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},

2 replies

michaelklishin Dec 13, 2025
Maintainer

All operations on a queue without an elected leader will fail (leader election can time out when enough of them are triggered at once). This is how Raft works.
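
Since operations fail until a leader is elected, one client-side way to tolerate that window is to retry the failed operation with capped exponential backoff and jitter. This is an illustrative sketch rather than advice given in this thread: `TransientBrokerError` is a hypothetical stand-in for whatever the client library raises, and `operation` is assumed to open a fresh connection on each call, because the failures logged above are connection exceptions that close the connection.

```python
import random
import time


class TransientBrokerError(Exception):
    """Hypothetical stand-in for the client library's error on internal_error or timeout."""


def retry_with_backoff(operation, attempts=5, base_s=0.5, cap_s=10.0,
                       rng=random, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up.

    Between tries, sleep a random time in [0, min(cap_s, base_s * 2**attempt)].
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientBrokerError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            ceiling = min(cap_s, base_s * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
```

The cap bounds the worst-case wait per attempt, which matters when the total rolling-upgrade time has to stay predictable.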

dormanze Dec 15, 2025
Author

Yes, it seems that all of this is related to my switching to khepri_db. I will roll back that capability first. Thank you for your suggestion.
