Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

rp2: Fix hang triggered by timing of short sleeps and soft timer events#13329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

projectgus
Copy link
Contributor

@projectgusprojectgus commentedJan 3, 2024
edited
Loading

The alarm pool on rp2 pico-sdk stops triggering at times, until an additional interrupt is received to wake the CPU from a WFE state. In systems without regular interrupts this can hang indefinitely. See#12873, and also linked pico-sdk issueraspberrypi/pico-sdk#1552.

This PR removes most MicroPython dependencies on the pico-sdk alarm pool, as a fix for the readily reproducible versions of this bug:

  • Soft timer is refactored to use a standalone hardware timer alarm. To support this,pendsv_suspend() &pendsv_resume() is made safe to run from their core, as these functions protect access to the soft timer data structures.
  • Functions for sleeping & waiting for events are refactored to use the soft timer instead of pico-sdk functions.

There are two remaining places that MicroPython uses the pico-sdk alarm pool on rp2:

  • machine.Timer() uses the default alarm pool. A future PR is planned to refactor this to use soft timer, but this requires deeper soft timer changes to support microsecond resolution.
  • pico-sdk invokes alarm pool functions internally. Notablymulticore_lockout_start_blocking() which adds an alarmat_end_of_time. This modifies the alarm pool - but with a timer that never expires, so unlikely to trigger the bug. There is also a USB workaround timeouthw_enumeration_fix_wait_se0() that creates a short-lived timeout using the alarm pool.

Aftermachine.Timer() is refactored then a fix for the second point is to build pico-sdk withPICO_TIME_DEFAULT_ALARM_POOL_DISABLED=1. This converts the other timeouts into busy-waits inside the SDK, and will presumably save some code size as alarm pool code will no longer be compiled into MicroPython. This is not possible yet, asmachine.Timer() needs an alarm pool and pico-sdk doesn't provide a way to statically initialise a new alarm pool (there isalarm_pool_create(), but this calls pico-sdk's malloc to allocate memory for the alarm pool... non-functional commit for this is atprojectgus@9c8ddd8)

Closes#12873 (Rationale: although the alarm pool is still used in the places described, the access patterns that made it easy to hang the rp2 in a WFE state should no longer be triggered unlessmachine.Timer() is used a lot for very short delays.)

Future work

  1. Refactor soft_timer to support 64-bit microsecond timeouts
  2. Refactormachine.Timer() to use these
  3. Disable default alarm pool in the pico-sdk configuration.

This work was funded through GitHub Sponsors.

@projectgus
Copy link
ContributorAuthor

projectgus commentedJan 3, 2024
edited
Loading

@dpgeorge Marked as Draft because I don't have a Pico-W so need help to test the soft timer changes thoroughly (I did some tests with custom C code.)

hardware_alarm_set_callback(MICROPY_HW_SOFT_TIMER_ALARM_NUM, (void *)soft_timer_handler);
// soft timer IRQ handler has to run at PendSV priority. adjusting priority here so we don't
// need to trigger PendSV the timer IRQ.
NVIC_SetPriority(TIMER_IRQ_0_IRQn + MICROPY_HW_SOFT_TIMER_ALARM_NUM, PICO_LOWEST_IRQ_PRIORITY);
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A weird thing here: thereal PendSV IRQ priority depends on whether the network support is compiled in. It's set here in modnetwork.c:
https://github.com/micropython/micropython/blob/master/ports/rp2/mpnetworkport.c#L69

Practically I don't think this matters too much as (AFAIK) it's not used for anything else on rp2 otherwise, but it's a bit odd so I wanted to point it out (which is why I've done it this way here as well.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is fine. But the comment on lines 306-307 needs rewriting, I don't understand it!

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@dpgeorge Right, it had at least one typo in it to boot. PTAL!

@codecovCodecov
Copy link

codecovbot commentedJan 3, 2024
edited
Loading

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base(2ed976f) 98.36% compared to head(2923d54) 98.36%.

Additional details and impacted files
@@           Coverage Diff           @@##           master   #13329   +/-   ##=======================================  Coverage   98.36%   98.36%           =======================================  Files         159      159             Lines       21088    21088           =======================================  Hits        20743    20743             Misses        345      345

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report?Share it here.

hardware_alarm_set_callback(MICROPY_HW_SOFT_TIMER_ALARM_NUM, (void *)soft_timer_handler);
// soft timer IRQ handler has to run at PendSV priority. adjusting priority here so we don't
// need to trigger PendSV the timer IRQ.
NVIC_SetPriority(TIMER_IRQ_0_IRQn + MICROPY_HW_SOFT_TIMER_ALARM_NUM, PICO_LOWEST_IRQ_PRIORITY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is fine. But the comment on lines 306-307 needs rewriting, I don't understand it!

@projectgusprojectgusforce-pushed thebugfix/remove_alarm_pool branch from35245a7 to8af2410CompareJanuary 3, 2024 05:29
@projectgus
Copy link
ContributorAuthor

@dpgeorge Updated, PTAL again! :)

@projectgusprojectgus marked this pull request as ready for reviewJanuary 3, 2024 05:31
@projectgusprojectgusforce-pushed thebugfix/remove_alarm_pool branch from8af2410 tof55f520CompareJanuary 3, 2024 06:13
@projectgusprojectgus marked this pull request as draftJanuary 3, 2024 06:16
@projectgus
Copy link
ContributorAuthor

Hmm, there's a test (extmod/vfs_lfs_mtime.py) that's started failing with this PR. Investigating now...

@dpgeorge
Copy link
Member

there's a test (extmod/vfs_lfs_mtime.py) that's started failing with this PR

I also see that failure. It's probably because the delay at the start ofmp_hal_time_ns_set_from_rtc() that was changed in this PR is no longer long enough.

@projectgusprojectgusforce-pushed thebugfix/remove_alarm_pool branch 2 times, most recently from748ef55 toa7e2484CompareJanuary 3, 2024 06:58
@projectgusprojectgus marked this pull request as ready for reviewJanuary 3, 2024 07:06
@dpgeorge
Copy link
Member

Unfortunately this fails on a Pico W running theextmod/select_poll_eintr.py test. It looks like it's due to races in the soft timer between core0 and core1, due tomp_wfe_or_timeout() being implemented using the soft timer. I guess soft timer needs to have a mutex and/or use a proper atomic section. That needs further investigation.

@projectgus
Copy link
ContributorAuthor

projectgus commentedJan 3, 2024
edited
Loading

@dpgeorge Ah, damn. I have a Pico W on order but it might not get here until late next week, but I can order another one with faster shipping if this is a blocker.

If you're sure it's a race or a deadlock here then another option would be to occupy another hardware alarm interrupt (0 & 1 both still unused) and use that one formp_wfe_or_timeout(). I can make some time to implement this if it's useful, although obviously I can't properly test it yet...

@dpgeorge
Copy link
Member

If you're sure it's a race or a deadlock here then another option would be to occupy another hardware alarm interrupt (0 & 1 both still unused) and use that one formp_wfe_or_timeout()

Yeah, I was thinking that. Would need to use both interrupts, in case the separate cores both need to do the wait at the same time.

I can make some time to implement this if it's useful, although obviously I can't properly test it yet...

I'm going to add/adjust the thread tests so they can run on rp2 properly. That should give a way to test this on a standard Pico.

Don't worry about getting this done quickly. It can wait until next week. It needs thorough testing.

projectgus reacted with thumbs up emoji

@dpgeorge
Copy link
Member

I added thread tests in#13351. Running the test suite on a Pico (via./run-tests.py --target rp2) should now pick up all known bugs with threading, including those already fixed by#13312.

@projectgus
Copy link
ContributorAuthor

@dpgeorge Thanks for adding those_thread tests! My Pico W turned up as well, so I was able to run tests on it also.

I think this is fixed now (failing testsbuiltin_pow3_intbig ssl_sslcontext_ciphers thread_ident1 which all seem to be unrelated/known failures.)

The fix forpendsv_suspend() uses a recursive mutex (@jimmo's favourite), as it seemed like the least complex way to achieve two slightly unrelated goals: atomic increment/decrement of the pendsv suspend counter, and mutual exclusive access to soft timer internals while pendsv is suspended. See the commit message for more of an explanation.

@dpgeorgedpgeorge added this to therelease-1.24.0 milestoneMar 25, 2024
@dpgeorgedpgeorgeforce-pushed thebugfix/remove_alarm_pool branch from2923d54 to2926001CompareMay 31, 2024 06:50
@dpgeorge
Copy link
Member

Sorry, I messed up rebasing this on latest master, and force pushing (I pushed master instead of the branch... so this PR disappeared!).

Rebased and merged in74fb42a through3af006e

projectgus reacted with laugh emoji

@projectgusprojectgus deleted the bugfix/remove_alarm_pool branchJune 4, 2024 00:42
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@dpgeorgedpgeorgedpgeorge left review comments

Assignees
No one assigned
Labels
Projects
None yet
Milestone
release-1.24.0
Development

Successfully merging this pull request may close these issues.

Pico W: sleep_us causes execution to hang
2 participants
@projectgus@dpgeorge

[8]ページ先頭

©2009-2025 Movatter.jp