Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-120754: Add a strace helper and test set of syscalls for open().read(), Take 2#123413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
hauntsaninja merged 12 commits intopython:mainfromcmaloney:cmaloney/strace_take_2
Nov 3, 2024

Conversation

@cmaloney
Copy link
Contributor

@cmaloneycmaloney commentedAug 28, 2024
edited
Loading

ReapplyGH-121143 with additional changes after it was reverted inGH-123303. Investigation why the bot broke + suggestions for improvements in#121143 (comment)

Note: This needs a buildbot trigger + pass before merge, see:#121143 (comment).

Changes fromGH-121143:

  1. Addedstatx to set of allowed syscall forms (Should make Raspian bot pass).
  2. Check that thefd returned from theopen call is passed to all future calls. This helps ensure that thestat call uses the file descriptor rather than thefilename to avoid TOCTOU isuses.
  3. Update thePath().read_bytes() test case to additionally validate the reduction inisatty/ioctl +seek calls fromGH-120754: Disable buffering in Path.read_bytes #122111.
  4. Better diagnostic assertion messages from@gpshead, so when the test fails have first information immediately available. Makes remote CI debugging much simpler.

NOTE: This needs a full buildbot test pass before merge, see:python#121143 (comment).1. Added `statx` to set of allowed syscall forms (Should make Raspian bot pass).2. Check that the `fd` returned from `open` is passed to all future calls. This helps ensure things like the `stat` call uses the file descriptor rather than the `filename` to avoid TOCTOU isuses.3. Update the `Path().read_bytes()` test case to additionally validate the reduction in`isatty`/`ioctl` + `seek` calls frompython#1221114. Better diagnostic assertion messagess from@gpshead, so when the test fails have first information immediately available. Makes remote CI debugging much simpler.
@hauntsaninjahauntsaninja added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 28, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@hauntsaninja for commite196d3d 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 28, 2024
@cmaloney
Copy link
ContributorAuthor

I looked into the buildbot failures and they don't seem related to the changes I make here (full details below). Raspbian bot which failed post-merge on last PR, does pass here.

Investigating buildbot failures:

  1. AMD64 Arch Linux Usan:
    - C compilation issue, seems unrelated to these changes
  2. PPC64LE RHEL8 Refleaks
    - test.test_concurrent_futures.test_shutdown timed out after 3 hour 49 min
    - seems to be:test_concurrent_futures.test_shutdown: AssertionError: b'' != b'apple' #112542
  3. test.test_asyncio.test_server.TestServer2.test_abort_clients failures
    • Also occur in main on same buildbots (ex.https://buildbot.python.org/#/builders/29/builds/6842)
    • AMD64 Fedora Stable Clang Installed
    • AMD64 Fedora Stable Clang PR
    • AMD64 Fedora Stable LTO + PGO
    • AMD64 Fedora Stable LTO PR
    • AMD64 Fedora Stable PR
    • note: I can get this same failure on my Arch Linux locally on main, may try investigating a bit more. Can't seem to find a issue for it currently.
test.test_asyncio.test_server.TestServer2.test_abort_clients```0:01:43 load avg: 8.22 [353/479/1] test.test_asyncio.test_server failed (1 failure)test_start_server_1 (test.test_asyncio.test_server.ProactorStartServerTests.test_start_server_1) ... skipped 'Windows only'test_start_server_1 (test.test_asyncio.test_server.SelectorStartServerTests.test_start_server_1) ... oktest_start_unix_server_1 (test.test_asyncio.test_server.SelectorStartServerTests.test_start_unix_server_1) ... oktest_abort_clients (test.test_asyncio.test_server.TestServer2.test_abort_clients) ... FAILtest_close_clients (test.test_asyncio.test_server.TestServer2.test_close_clients) ... oktest_wait_closed_basic (test.test_asyncio.test_server.TestServer2.test_wait_closed_basic) ... oktest_wait_closed_race (test.test_asyncio.test_server.TestServer2.test_wait_closed_race) ... oktest_unix_server_addr_cleanup (test.test_asyncio.test_server.UnixServerCleanupTests.test_unix_server_addr_cleanup) ... oktest_unix_server_cleanup_gone (test.test_asyncio.test_server.UnixServerCleanupTests.test_unix_server_cleanup_gone) ... oktest_unix_server_cleanup_prevented (test.test_asyncio.test_server.UnixServerCleanupTests.test_unix_server_cleanup_prevented) ... oktest_unix_server_cleanup_replaced (test.test_asyncio.test_server.UnixServerCleanupTests.test_unix_server_cleanup_replaced) ... oktest_unix_server_sock_cleanup (test.test_asyncio.test_server.UnixServerCleanupTests.test_unix_server_sock_cleanup) ... ok

======================================================================
FAIL: test_abort_clients (test.test_asyncio.test_server.TestServer2.test_abort_clients)

Traceback (most recent call last):
File "/home/buildbot/buildarea/pull_request.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.14/unittest/async_case.py", line 93, in _callTestMethod
result = self._callMaybeAsync(method)
File "/home/buildbot/buildarea/pull_request.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.14/unittest/async_case.py", line 119, in _callMaybeAsync
return self._asyncioRunner.run(
~~~~~~~~~~~~~~~~~~~~~~~^
func(*args, **kwargs),
^^^^^^^^^^^^^^^^^^^^^^
context=self._asyncioTestContext,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/buildbot/buildarea/pull_request.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.14/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "/home/buildbot/buildarea/pull_request.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.14/asyncio/base_events.py", line 721, in run_until_complete
return future.result()
~~~~~~~~~~~~~^^
File "/home/buildbot/buildarea/pull_request.cstratak-fedora-stable-x86_64.clang-installed/build/target/lib/python3.14/test/test_asyncio/test_server.py", line 249, in test_abort_clients
self.assertNotEqual(s_wr.transport.get_write_buffer_size(), 0)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0 == 0


Ran 12 tests in 0.250s

FAILED (failures=1, skipped=1)
test test.test_asyncio.test_server failed

@cmaloney
Copy link
ContributorAuthor

Bots which failed aroundtest_asyncio should pass now with#123423

@cmaloney
Copy link
ContributorAuthor

TheTests / Address sanitizer failure seems to be a case that needs to be handled, investigating it (anmmap shows up in the middle of the I/O calls). Could this get theno news tag (just changes pythontest. module pieces)

@cmaloney
Copy link
ContributorAuthor

Could this get the "no news" label? (Internal changes). I think this is ready for a more full review again + run bots (merged current main in recently)

Recent changes:

  1. AddressSanitizer + LeakSanitizer can't be used understrace /ptrace (And new versions print a very helpful hint about this), added a skip for that so "Tests / Address sanitizer" should pass now.
  2. Updated forAvoid calling isatty() for most open() calls #90102 to validate the noisatty call

@hauntsaninjahauntsaninja added skip news 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelsNov 2, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@hauntsaninja for commit9e8b23d 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelNov 2, 2024
@cmaloney
Copy link
ContributorAuthor

Buildbots look fairly clean. Three failed (AMD64 Windows10 PR, ARM64 Windows Non-Debug PR, ARM64 Windows PR), all with issues inpegen (AttributeError: module 'token' has no attribute 'FSTRING_START'). Other Windows buildbots passed, and Windows Github action passed as well.

Copy link
Contributor

@hauntsaninjahauntsaninja left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for sticking with this change!

It looks like if unknown syscalls are introduced the current test is fine with that. If you wanted to make a follow-up PR tightening that up, I'd be in favour of it. (I acknowledge that that maybe contradicts the advice in#121143 (comment) , but I think the similar name logic helps address the maintainability concern)

cmaloney reacted with thumbs up emoji
@hauntsaninjahauntsaninja merged commit9ad57cf intopython:mainNov 3, 2024
115 of 118 checks passed
@cmaloneycmaloney deleted the cmaloney/strace_take_2 branchNovember 3, 2024 03:40
@cmaloney
Copy link
ContributorAuthor

This does prevent additionalfstat andioctl/isatty calls which were the two big performance changes ingh-120754. The seek is in theory all that remains in there but not checked for, and the BufferedIO changes I'm working on should eliminate that. Will be a bit to even get a full approach+proposal around BufferedIO pieces, is refactoring how the middle layer of the I/O stack works and touches a lot more Python C internals (locks, critical sections, ...). With that will look at adding the "no unexpected calls" check, as well as likely add some tests around more specific optimized cases (ex. stdout/stderr just write/append+flush, stdin just read, ...).

hauntsaninja reacted with thumbs up emoji

@mgorny
Copy link
Contributor

All the newly added tests fail on Gentoo Linux with sandbox enabled. From a quick check, the test wrongly assumes that the first syscall reported bystrace would be anopen* call, but it doesn't even verify that — it just blindly takes the return value of the first syscall ever and assumes it would be fd.

If I add some prints to get the list of syscalls, I get e.g. the following:

Check that the set of system calls produced by the I/O stack is what ... StraceEvent(syscall='faccessat2', args=['AT_FDCWD', '"@test_1752124_tmp\\303\\246"', 'F_OK', 'AT_EACCESS'], returncode='0')StraceEvent(syscall='mmap', args=['NULL', '8208', 'PROT_READ|PROT_WRITE', 'MAP_PRIVATE|MAP_ANONYMOUS', '-1', '0'], returncode='0x7fe693892000')StraceEvent(syscall='mmap', args=['NULL', '8208', 'PROT_READ|PROT_WRITE', 'MAP_PRIVATE|MAP_ANONYMOUS', '-1', '0'], returncode='0x7fe693892000')StraceEvent(syscall='getcwd', args=['"/var/tmp/portage/dev-lang/python-3.14.0_alpha2/work/Python-3.14.0a2/build/test_python_1752124\\303\\246"', '8192'], returncode='96')StraceEvent(syscall='newfstatat', args=['AT_FDCWD', '"/var/tmp/portage/dev-lang/python-3.14.0_alpha2/work/Python-3.14.0a2/build/test_python_1752124\\303\\246"', '{st_mode=S_IFDIR|0755', 'st_size=38', '...}', 'AT_SYMLINK_NOFOLLOW'], returncode='0')StraceEvent(syscall='newfstatat', args=['AT_FDCWD', '"/var/tmp/portage/dev-lang/python-3.14.0_alpha2/work/Python-3.14.0a2/build/test_python_1752124\\303\\246"', '{st_mode=S_IFDIR|0755', 'st_size=38', '...}', 'AT_SYMLINK_NOFOLLOW'], returncode='0')StraceEvent(syscall='mmap', args=['NULL', '8208', 'PROT_READ|PROT_WRITE', 'MAP_PRIVATE|MAP_ANONYMOUS', '-1', '0'], returncode='0x7fe69388f000')StraceEvent(syscall='getcwd', args=['"/var/tmp/portage/dev-lang/python-3.14.0_alpha2/work/Python-3.14.0a2/build/test_python_1752124\\303\\246"', '1024'], returncode='96')StraceEvent(syscall='readlink', args=['"/var/tmp/portage/dev-lang/python-3.14.0_alpha2/work/Python-3.14.0a2/build/test_python_1752124\\303\\246/@test_1752124_tmp\\303\\246"', '0x7ffecf5289f0', '1023'], returncode='-1 EINVAL (Invalid argument)')StraceEvent(syscall='openat', args=['AT_FDCWD', '"@test_1752124_tmp\\303\\246"', 'O_RDONLY|O_CLOEXEC'], returncode='3')StraceEvent(syscall='fstat', args=['3', '{st_mode=S_IFREG|0644', 'st_size=13', '...}'], returncode='0')StraceEvent(syscall='lseek', args=['3', '0', 'SEEK_CUR'], returncode='0')StraceEvent(syscall='read', args=['3', '"Hello', 'World!"', '14'], returncode='13')StraceEvent(syscall='read', args=['3', '""', '1'], returncode='0')StraceEvent(syscall='close', args=['3'], returncode='0')

i.e. there's a bunch of extra "preparatory" calls before theopenat() you're expecting, that should be ignored.

@mgorny
Copy link
Contributor

Besides, given how fragile this thing is by design, there should be an easy way to disable it (e.g. via-u-strace).

@mgorny
Copy link
Contributor

It's broken on musl (without any special environment) too:

Check that the set of system calls produced by the I/O stack is what ... StraceEvent(syscall='open', args=['"@test_26640_tmp\\303\\246"', 'O_RDONLY|O_LARGEFILE|O_CLOEXEC'], returncode='3')StraceEvent(syscall='fcntl', args=['3', 'F_SETFD', 'FD_CLOEXEC'], returncode='0')StraceEvent(syscall='fstat', args=['3', '{st_mode=S_IFREG|0644', 'st_size=13', '...}'], returncode='0')StraceEvent(syscall='mmap', args=['NULL', '16384', 'PROT_READ|PROT_WRITE', 'MAP_PRIVATE|MAP_ANONYMOUS', '-1', '0'], returncode='0x7efd4027f000')StraceEvent(syscall='lseek', args=['3', '0', 'SEEK_CUR'], returncode='0')StraceEvent(syscall='read', args=['3', '"Hello', 'World!"', '14'], returncode='13')StraceEvent(syscall='read', args=['3', '""', '1'], returncode='0')StraceEvent(syscall='close', args=['3'], returncode='0')

@cmaloney
Copy link
ContributorAuthor

cmaloney commentedNov 20, 2024
edited
Loading

Happy to work on making the state better on Gentoo. There are tradeoffs in the fragility, and I'm happy to iterate to get everything working well. Unfortunately not having a check like this has resulted in syscalls being re-added unintentionally (ex.bpo-21679 removed a double stat andgh-120754 removed a double stat again).

In the short term I think the--ignore /--ignorefile arguments should allow you to disable this individual test in the same spot you could add a-u argument to enable/disable specific test resources.

There looks like three cases to investigate/work on from this:

  1. Test on Gentoo + Musl + make pass there (there's one extrammap which doesn't refer to the fd which appears).
  2. Probably disable under gentoo + sandbox, it looks like in that environment significant runtime changes are introduced in the compiled binary. That there are twonewfstatat calls in that case worries me / will need to go investigate source.
  3. Potentially add a specialized resource switch-u to enable/disable more broadly.

For 3, did anything intest_subprocess fail for you?test_vfork_used_when_expected uses the samestrace testing infrastructure.

For all three of those I think the standard python process is either open a new issue or reopen thegh-120754 one. Happy to open one so can track these changes and get a fix landed.

@mgorny
Copy link
Contributor

On a glibc system, onlytest_fileio tests failed. On musl, I haven't run other tests — just this one, since I suspect it'd be affected too.

Could you open the bug, please? I still have lots to do today.

cmaloney reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@hauntsaninjahauntsaninjahauntsaninja approved these changes

@gpsheadgpsheadAwaiting requested review from gpsheadgpshead is a code owner

Assignees

No one assigned

Labels

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@cmaloney@bedevere-bot@mgorny@hauntsaninja

[8]ページ先頭

©2009-2025 Movatter.jp