[Questions] create quorum queue failed #15099

RabbitMQ version used

4.1.2

Erlang version used

26.2.x

Operating system (distribution) used

Linux

How is RabbitMQ deployed?

Generic binary package

Steps to deploy RabbitMQ cluster

Configure the cluster and start it.

Steps to reproduce the behavior in question

Create a large number of quorum queues.

What problem are you trying to solve?

When my clients restart in batches (rolling upgrade), a large number of queue declaration requests are generated (10,000+ quorum queues and 50,000+ exclusive queues), and I occasionally receive error messages. We also periodically call the API endpoint /api/health/checks/port-listener/5673 to check that the server is healthy. During client startup, this endpoint also frequently reports errors.

./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:11.855670+08:00 [error] <0.4797544.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:42.090122+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.902431+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.902638+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:29:51.903377+08:00 [error] <0.5036261.0> operation queue.declare caused a connection exception internal_error: "Could not declare quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/' on node 'rabbit@rabbitmqservice-0' because the metadata store operation timed out"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:30:41.995924+08:00 [error] <0.5425325.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-196745786">>' , ChannelID '<0.5425325.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:30:41.998024+08:00 [error] <0.5374829.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:31:22.275683+08:00 [error] <0.5637570.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37-27639187">>' , ChannelID '<0.5637570.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:31:22.277879+08:00 [error] <0.5620055.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"
./rabbit@rabbitmqservice-0.log:2025-12-10 14:33:13.044177+08:00 [error] <0.6146092.0> quorum:queryConsumer:ConsumerTag '<<"mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum-adc-bpm-adc-bpm-79dcd7548c-rf5lb-172.18.0.37--797603050">>' , ChannelID '<0.6146092.0>'
./rabbit@rabbitmqservice-0.log:2025-12-10 14:33:13.110614+08:00 [error] <0.6120239.0> operation basic.consume caused a connection exception internal_error: "failed consuming from quorum queue 'mateinfo.system.bpm-async-job-consumer-adc-bpm-79dcd7548c-rf5lb-quorum' in vhost '/': noproc"

It seems that some I/O timeout errors occurred while the queues were being created, and registering consumers then produced a large number of noproc exceptions.
After the client restart completed, some of my queues were in an abnormal state, and the management UI indicated that one of my instances was experiencing an issue.
I did not encounter this issue in the same environment when using 4.0.x.
I have seen the related PR #15003. Could it explain why my consumer creation failed? And is the abnormal queue status caused by slow local I/O? Are there any parameters I can adjust to change this timeout?
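
For reference, the periodic health check described in the question could be sketched roughly as below. This is only an illustration, not the poster's actual code: the endpoint path comes from the question, while the base URL (management port 15672), the credentials, and the helper names `port_listener_check_url` and `check_port_listener` are assumptions. The sketch separates "the check answered and reported a problem" from "the request itself failed or timed out", which may help when reading health check errors during a declaration storm.

```python
import base64
import json
import urllib.error
import urllib.request


def port_listener_check_url(base_url: str, port: int) -> str:
    """Build the health check URL; the path is the one quoted in the question."""
    return f"{base_url.rstrip('/')}/api/health/checks/port-listener/{port}"


def check_port_listener(base_url, port, user, password,
                        timeout_s=5.0, opener=urllib.request.urlopen):
    """Return (healthy, details). `opener` is injectable so the helper can be tested offline."""
    req = urllib.request.Request(port_listener_check_url(base_url, port))
    # HTTP basic auth; the credentials are placeholders supplied by the caller.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    try:
        with opener(req, timeout=timeout_s) as resp:
            return resp.status == 200, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # The node answered, but with a non-2xx status (the check itself failed).
        return False, {"http_status": err.code}
    except (urllib.error.URLError, TimeoutError) as err:
        # No usable answer at all: connection refused, DNS failure, or timeout.
        return False, {"error": str(err)}
```

A caller might treat `{"error": ...}` results during a rolling client restart as "node busy or unreachable" rather than "listener down", and only alert after several consecutive failures.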


Around the time you restarted there will most likely be a stack trace with the full error and reason for why some processes did not start. Those are the logs I need to investigate further.

@dormanze

Could you tell me how to enable logging for this? I noticed that only my temporary exclusive queues are experiencing timeouts; the quorum queues are not.

@kjnilsson

Errors are always logged by default. You just need to supply all logs from all nodes and we can take a look.

@kjnilsson

@dormanze any update on providing logs covering the timeframe of the restart?

@dormanze

Thank you for your reply. I am having some difficulty uploading the logs. I have carefully checked the startup logs and found no obvious errors. I will try version 4.2.x later to see if it can resolve my issue.


@dormanze inject a random delay in the 1-15s range into your clients so that they do not all run their declarations at once.

There were several efficiency improvements around Khepri in the upcoming 4.2.2 release.
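
A minimal sketch of what such a startup delay could look like on the client side, assuming a Python client. The helper names `startup_jitter` and `declare_with_jitter` and the `declare_topology` callback are hypothetical; the callback stands in for whatever code declares the client's queues and bindings. Only the 1-15 second range comes from the suggestion above.

```python
import random
import time


def startup_jitter(min_s: float = 1.0, max_s: float = 15.0, rng=random) -> float:
    """Pick a random delay in seconds, uniformly from [min_s, max_s]."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    return rng.uniform(min_s, max_s)


def declare_with_jitter(declare_topology, sleep=time.sleep, **jitter_kwargs):
    """Sleep for a random delay, then run the caller-supplied declaration step.

    `declare_topology` is a hypothetical callback that declares queues and
    bindings; `sleep` is injectable so the helper can be tested without waiting.
    """
    delay = startup_jitter(**jitter_kwargs)
    sleep(delay)
    return declare_topology()
```

Each client instance would call `declare_with_jitter(my_declare_function)` once at startup, so that instances restarted in the same batch spread their declarations over the window instead of issuing them all at the same moment.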

@dormanze

Thank you for your response. Unfortunately, we cannot do that: we have a large number of queues, and we need to keep the upgrade duration short when performing rolling upgrades of the clients.

Answer selected by dormanze

While reviewing my logs, I noticed that not only does queue creation sometimes time out, but adding bindings times out as well.

crasher:
  initial call: rabbit_reader:init/3
  pid: <0.5125499.0>
  registered_name: []
  exception exit: channel_termination_timeout
    in function  rabbit_reader:wait_for_channel_termination/3 (src/rabbit_reader.erl, line 808)
    in call from rabbit_reader:send_error_on_channel0_and_close/4 (src/rabbit_reader.erl, line 1720)
    in call from rabbit_reader:mainloop/4 (src/rabbit_reader.erl, line 548)
    in call from rabbit_reader:run/1 (src/rabbit_reader.erl, line 469)
    in call from rabbit_reader:start_connection/5 (src/rabbit_reader.erl, line 340)
  ancestors: [<0.5125478.0>,<0.614.0>,<0.613.0>,<0.612.0>,<0.610.0>,<0.609.0>,rabbit_sup,<0.248.0>]
  message_queue_len: 11
  messages: [{channel_exit,4,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},
             {channel_exit,7,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},
             {channel_exit,8,{amqp_error,internal_error,"Could not add binding due to timeout",'queue.bind'}},

@michaelklishin

All operations on a queue without an elected leader (for example, because leader election timed out when too many were triggered at once) will fail. This is how Raft works.

@dormanze

Yes, it seems that all of this is related to my switching to khepri_db. I will roll back that capability first. Thank you for your suggestion.

3 participants: @dormanze, @michaelklishin, @kjnilsson