Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add Bootstrapping Logic toApplication.run_*#4673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Bibo-Joshi merged 6 commits intomasterfromfeature/app-bootstrapping
Mar 1, 2025

Conversation

Bibo-Joshi
Copy link
Member

@Bibo-JoshiBibo-Joshi commentedFeb 7, 2025
edited
Loading

When ready,closes#4657.

This also changes the default value forbootstrap_retries forstart/run_polling to 0. Previously the logic was roughly

asyncdefrun_main(bootstrap_retries="indefinite"):delete_webhook_and_drop_pending_updates(retries=bootstrap_retries)poll_for_updates(retries="indefinite")

I now changed this to

asyncdefrun_main(bootstrap_retries=None):delete_webhook_and_drop_pending_updates(retries=bootstrap_retries)poll_for_updates(retries="indefinite")

I think this is saner, and I'm not even sure if indefinite retries during the bootstrapping phase were ever actually intended …

ToDo

  • Update existing tests to new logs
  • Add new tests for bootstrapping inApplication

Also changes the default value for `bootstrap_retries` on polling
@Bibo-JoshiBibo-Joshi added 🛠 refactorchange type: refactor 🔌 enhancementpr description: enhancement labelsFeb 7, 2025
Copy link
Member

@PoolitzerPoolitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Im just asking questions at this point

@@ -409,12 +414,14 @@ def default_error_callback(exc: TelegramError) -> None:
# updates from Telegram and inserts them in the update queue of the
# Application.
self.__polling_task = asyncio.create_task(
self._network_loop_retry(
network_retry_loop(
is_running=lambda: self.running,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

why is that a lambda?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

this line expects a callable. If we simply passis_runnig=self.running and edit above line to

whileis_running:

then we have an infinite loop, b/c settingself.running=False withinUpdater won't change the value in the network retry loop :) There are probably a billion other ways do to this (pass an object with arunning attribute to check, pass a dict with a corresponding entry, pass an event …). This one seemed rather straight forward to me, but I'm open to changing it. Accepting an instance of

classHasRunning(Protocol):running:bool

would sound like the next best thing to me at first glance

@tsnoam
Copy link
Member

@Eldinnie
here's my vague memory for the original changes:

  1. infinite retries were really intended.
  2. the idea was to find some common ground (i.e. defaults) for most of the user base (we understood that not all can be pleased)
  3. there is a bootstrapping phase in which certain actions had to be made (are they still needed? I don't know... I haven't been following the bot API in the past years).
  4. what happens if these operations fail? do we resume with normal execution of the bot? do we abort (and let the user run the bot again)? or do we just retry until a success?
  5. we chose the option of indefinite retries (and afair, with appropriate logging). that way we make sure that the bot is properly started and will be started eventually. aborting seemed like too much, after all, the user will just need to restart the bot and come to the same "problematic" part of the initialization. so what's the point in that?

I hope that helps...

@tsnoam
Copy link
Member

and why did i tag Eldinnie instead of@Bibo-Joshi ?

it's my memory playing tricks on me. lol.

@Bibo-Joshi
Copy link
MemberAuthor

Thanks very much for your insights@tsnoam !

The bootstrapping phase is indeed still needed. It deletes the previous webhook if you run polling, sets a new webhook if your run in webhook mode and drops pending updates if requested.

Do you recall why you chose indefinite retries over doing a limited number of retries (say, 3)?. Even though we have logging, indefinite retries can get you in a situation where the process is running in a healthy way (eagerly retrying to set the webhook) but the bot doesn't react at all. That sounds undesirable to me. A limitd number of retries OTOH would be a sane effort of getting things going whil still preventing that the process get's stuck in an unusable state without explicit indication to the user.

Moreover I noticed that for webhook mode, the retries were kept at "none". I suspect that this was an oversight: The docstring was updated, but not the signature:https://github.com/python-telegram-bot/python-telegram-bot/pull/1018/files#r1948035215

@tsnoam
Copy link
Member

tsnoam commentedFeb 9, 2025
edited
Loading

@Bibo-Joshi

Do you recall why you chose indefinite retries over doing a limited number of retries (say, 3)?.
as stated on bullet 5:

we chose the option of indefinite retries (and afair, with appropriate logging). that way we make sure that the bot is properly started and will be started eventually. aborting seemed like too much, after all, the user will just need to restart the bot and come to the same "problematic" part of the initialization. so what's the point in that?

A limitd number of retries OTOH would be a sane effort of getting things going whil still preventing that the process get's stuck in an unusable state without explicit indication to the user.

I'm not sure about it. The initialization phase is the for a reason. Skipping it might lead to a limbo state in which you don't know exactly what happens.

Moreover I noticed that for webhook mode, the retries were kept at "none". I suspect that this was an oversight

Could be an oversight. That much I don't remember.

@Bibo-Joshi
Copy link
MemberAuthor

Skipping it might lead to a limbo state in which you don't know exactly what happens.

I would not skip it, but rather abort if the bootstrapping phase fails. Meaning that fater n < ∞ retries you either have a bot that's able to handle updates or the process has shut down.

But okay, I get your reasoning and now have to decide what to make of it 😅 Thanks for the input!

@tsnoam
Copy link
Member

as long as you can get to an expected state, any solution is good.

septatrix reacted with thumbs up emoji

@septatrix
Copy link
Contributor

I think a finite number of retries is more than enough as long as the delay between them is large enough to eliminate intermittent connectivity problems as the source of potential problems. Eg. in my issue report the problem was that the service was brought up before network/DNS was online. This state is usually resolved within the first 10-20 seconds after boot.

Having an infinite amount of retries by default sound dangerous and may instead hide problems which the user wants the be notified about (by the service exiting)

@Bibo-Joshi
Copy link
MemberAuthor

as long as the delay between them is large enough to eliminate intermittent connectivity problems as the source of potential problems

So far I have implemented the initial retry-interval to 0 seconds forApplication.initialize

but I see that the bootstrapping withinUpdater uses an initial interval of 1 second - I can use that forApplication as well:

bootstrap_interval:float=1,

Note that the retry-loop retriesimmediately on timeout errors. On other errors, the interval is increased step by step up to 30 seconds:

try:
ifnotawaitdo_action():
break
exceptRetryAfterasexc:
slack_time=0.5
_LOGGER.info(
"%s %s. Adding %s seconds to the specified time.",log_prefix,exc,slack_time
)
cur_interval=slack_time+exc.retry_after
exceptTimedOutastoe:
_LOGGER.debug("%s Timed out: %s. Retrying immediately.",log_prefix,toe)
# If failure is due to timeout, we should retry asap.
cur_interval=0
exceptInvalidToken:
_LOGGER.exception("%s Invalid token. Aborting retry loop.",log_prefix)
raise
exceptTelegramErrorastelegram_exc:
ifon_err_cb:
on_err_cb(telegram_exc)
ifmax_retries<0orretries<max_retries:
_LOGGER.debug(
"%s Failed run number %s of %s. Retrying.",log_prefix,retries,max_retries
)
else:
_LOGGER.exception(
"%s Failed run number %s of %s. Aborting.",log_prefix,retries,max_retries
)
raise
# increase waiting times on subsequent errors up to 30secs
cur_interval=1ifcur_interval==0elsemin(30,1.5*cur_interval)

Specifying the retry-interval is currently not exposed to the user in theUpdater.start_* methods and for starters that would seem like a bit of overkill, TBH.

@septatrix would this + (customizable) finite number of retries be enough for you?

@septatrix
Copy link
Contributor

as long as the delay between them is large enough to eliminate intermittent connectivity problems as the source of potential problems

So far I have implemented the initial retry-interval to 0 seconds forApplication.initialize [...] but I see that the bootstrapping withinUpdater uses an initial interval of 1 second - I can use that forApplication as well:

I think having a default 1s interval is better than 0s - or at least something non-zero. Depending on how far along the network stack is the connections would try instantly instead of after a timeout so adding a small delay (such as the 1s) on our side is better.

Note that the retry-loop retriesimmediately on timeout errors. On other errors, the interval is increased step by step up to 30 seconds:

On timeout errors this should not be a problem because the timeout itself will result in some delay (though it also wouldn't hurt to wait, it's just not necessary).

Specifying the retry-interval is currently not exposed to the user in theUpdater.start_* methods and for starters that would seem like a bit of overkill, TBH.

I agree. As long as the default interval is non-zero it should be fine for most people.

@septatrix would this + (customizable) finite number of retries be enough for you?

Yes, as far as I understand how this is hooked up now this would be enough for me.


Small aside while peeking at the code I found a small curiosity, nothing urgent: I noticed the slack added toRetryAfter exceptions (introduced ina68cf8d) is different from the one inAIORateLimiter. If you already extract it into a variable maybe put it in some shared location? However, this is completely independent from the rest of this PR so feel free to disregard it

Bibo-Joshi reacted with thumbs up emoji

@codecovCodecov
Copy link

codecovbot commentedFeb 17, 2025
edited
Loading

❌ 2 Tests Failed:

Tests completedFailedPassedSkipped
653126529746
View the top 2 failed test(s) by shortest run time
tests.test_bot.TestBotWithRequest::test_send_close_date_default_tz[Europe/Berlin-ZoneInfo]
Stack Traces | 7.35s run time
self=<tests.test_bot.TestBotWithRequestobjectat0x000001CE6EFE70D0>tz_bot=PytestExtBot[token=690091347:AAFLmR5pAB5Ycpe_mOh7zM4JFBOh0z3T0To]super_group_id='-1001279600026'asyncdeftest_send_close_date_default_tz(self,tz_bot,super_group_id):question="Is this a test?"answers= ["Yes","No","Maybe"]reply_markup=InlineKeyboardMarkup.from_button(InlineKeyboardButton(text="text",callback_data="data")        )aware_close_date=dtm.datetime.now(tz=tz_bot.defaults.tzinfo)+dtm.timedelta(seconds=5)close_date=aware_close_date.replace(tzinfo=None)msg=awaittz_bot.send_poll(# The timezone returned from this is always converted to UTCchat_id=super_group_id,question=question,options=answers,close_date=close_date,read_timeout=60,        )msg.poll._unfreeze()# Sometimes there can be a few seconds delay, so don't let the test fail due to that->msg.poll.close_date=msg.poll.close_date.astimezone(aware_close_date.tzinfo)EAttributeError:'NoneType'objecthasnoattribute'astimezone'tests\test_bot.py:2843:AttributeError
tests.test_bot.TestBotWithRequest::test_send_close_date_default_tz[Asia/Singapore-timezone]
Stack Traces | 7.37s run time
self=<tests.test_bot.TestBotWithRequestobjectat0x00000223F0A048D0>tz_bot=PytestExtBot[token=690091347:AAFLmR5pAB5Ycpe_mOh7zM4JFBOh0z3T0To]super_group_id='-1001279600026'asyncdeftest_send_close_date_default_tz(self,tz_bot,super_group_id):question="Is this a test?"answers= ["Yes","No","Maybe"]reply_markup=InlineKeyboardMarkup.from_button(InlineKeyboardButton(text="text",callback_data="data")        )aware_close_date=dtm.datetime.now(tz=tz_bot.defaults.tzinfo)+dtm.timedelta(seconds=5)close_date=aware_close_date.replace(tzinfo=None)msg=awaittz_bot.send_poll(# The timezone returned from this is always converted to UTCchat_id=super_group_id,question=question,options=answers,close_date=close_date,read_timeout=60,        )msg.poll._unfreeze()# Sometimes there can be a few seconds delay, so don't let the test fail due to that->msg.poll.close_date=msg.poll.close_date.astimezone(aware_close_date.tzinfo)EAttributeError:'NoneType'objecthasnoattribute'astimezone'tests\test_bot.py:2843:AttributeError

To view more test analytics, go to theTest Analytics Dashboard
📋 Got 3 mins?Take this short survey to help us improve Test Analytics.

@Bibo-JoshiBibo-Joshi marked this pull request as ready for reviewFebruary 17, 2025 17:20
@Bibo-Joshi
Copy link
MemberAuthor

updated the tests, ready for review :)

Copy link

@CopilotCopilotAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

@Bibo-JoshiBibo-Joshi merged commitb0d22ac intomasterMar 1, 2025
25 of 26 checks passed
@Bibo-JoshiBibo-Joshi deleted the feature/app-bootstrapping branchMarch 1, 2025 10:13
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsMar 9, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

Copilot code reviewCopilotCopilot left review comments

@PoolitzerPoolitzerAwaiting requested review from Poolitzer

@harshil21harshil21Awaiting requested review from harshil21

Assignees
No one assigned
Labels
🔌 enhancementpr description: enhancement🛠 refactorchange type: refactor
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Add Bootstrapping Logic toApplication.initialize when Called withinApplication.run_*
4 participants
@Bibo-Joshi@tsnoam@septatrix@Poolitzer

[8]ページ先頭

©2009-2025 Movatter.jp