Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

MAINT: Convert umath_linalg to multi-phase init (PEP 489)#29030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
seberg merged 1 commit intonumpy:mainfromAA-Turner:multi-phase/umath_linalg
May 28, 2025

Conversation

AA-Turner
Copy link
Contributor

@AA-TurnerAA-Turner commentedMay 21, 2025
edited
Loading

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch 2 times, most recently fromebfb8a7 to7121db6CompareMay 22, 2025 03:15
@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch 2 times, most recently fromeae5c53 to6876e24CompareMay 22, 2025 05:45
Copy link
Member

@mattipmattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please work on a single PR to reduce maintainer review burden. Once we are settled that we wish to make this change, we can revisit the other ones. Does this impact performance, either import time or runtime?

PyModuleDef_HEAD_INIT, /* m_base */
"_umath_linalg", /* m_name */
NULL, /* m_doc */
0, /* m_size */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

For now, in a future PR this can be changed.

Suggested change
0,/* m_size*/
-1,/* m_size*/

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See also my note in#29025 (comment). Briefly, I was mistaken in my understanding of the documentation,-1 is only valid for single-phase initialisation, and is forbidden for multi-phase.

The safest thing reinitialisation-wise would be to use themultiple interpreter opt-out suggested by CPython. This would enforce that each extension module is only imported & initialised once per process.

However, this causestest_reloading.py::test_full_reimport() to fail. Is that an acceptable trade-off? Multi-phase init (and at some point, isolated modules) opens the door to proper support for reloading/reimportingnumpy.

cc@encukou re the best way to do a gradual migration to multi-phase

In the case ofumath_linalg, we might be fine globals-wise, butmultiarray will definitely need a solution here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'vejust updated the HOWTO, seehere before it gets to main docs.

Basically, to allow re-import, clear the "loaded" flag inPyModuleDef'sm_clear. Note that thiswill do an actual reload of the module (sans C globals), that is,exec will run. (With single-phase, the module dict contents are stashed away and reused. You could emulatethat, but )

The PyMutex stuff in the updated HOWTO is 3.13+ non-limited API, but isn't necessary in the medium term (on CPython you're protected by the GIL; even free-threading builds use an import lock which isn't likely to go away). For the long-term (or alternate CAPI implementations) you could add no-op shims.

ThePy_MOD_MULTIPLE_INTERPRETERS_NOT_SUPPORTED flag will only allow loading in the main interpreter; that's redundant if you're limited to one module per process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

However, this causes test_reloading.py::test_full_reimport() to fail. Is that an acceptable trade-off?

I am not opposed to breaking the test/changing the behavior. The ask would be to summarize how things change for:

  • Potentialsubinterpreter users (Ithink that just doesn't work at all anymore, but not sure).
  • Users forimportlib.reload(), currently they get a warning that this is a bad idea. A full-blown error may not be so bad, though.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Potentialsubinterpreter users (Ithink that just doesn't work at all anymore, but not sure).

As far as I can tell, importingnumpy simply fails in a subinterpreter, and continues to do so with the explicit opt-out.

$ uvx --with numpy --python 3.13 pythonPython 3.13.3+ (main, May 20 2025, 08:56:23) [GCC 13.3.0] on linuxType "help", "copyright", "credits" or "license" for more information.>>>import _interpreters>>># first import in an interpreter with a 'legacy' shared GIL>>> c= _interpreters.new_config('legacy')  >>> id1= _interpreters.create(c,reqrefs=True)>>> _interpreters.incref(id1)# bookkeeping>>> _interpreters.exec(id1,'import sys',restrict=True)# stdlib works>>> _interpreters.exec(id1,'import numpy',restrict=True)namespace(  type=namespace(    __name__='RuntimeError',    __qualname__='RuntimeError',    __module__='builtins'  ),   msg='CPU dispatcher tracer already initlized',  formatted='RuntimeError: CPU dispatcher tracer already initlized',  errdisplay='Traceback (most recent call last):\n  File "<string>", line 1, in <module>\n'             '[...SNIP...]\n'             '    from numpy._core._multiarray_umath import (\n'             '        add_docstring,  _get_implementing_args, _ArrayFunctionDispatcher)\n'             'RuntimeError: CPU dispatcher tracer already initlized')>>># now import in an isolated interpreter with its own GIL>>> id2= _interpreters.create(reqrefs=True)>>> _interpreters.exec(id2,'import sys',restrict=True)>>>print(_interpreters.exec(id2,'import numpy',restrict=True))double free or corruption (out)$  # the REPL has crashed!

Users forimportlib.reload(), currently they get a warning that this is a bad idea. A full-blown error may not be so bad, though.

Reloading currently works with a UserWarning, unchanged with the explict opt-out.


The observable change I've found is clearingsys.modules and re-importing. Previously, it emittedUserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged., but would now raiseImportError: cannot load module more than once per process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't care, but it may be good to add a release note fragment about this somewhere.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done, added a brief note about the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

For completeness, could you please check that%autoreload in IPython/Jupyter isn't affected?

Copy link
ContributorAuthor

@AA-TurnerAA-TurnerMay 28, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@rgommers Seems to work (editing__init__.py to add aspam() function):

$ ipythonPython 3.13.3+ (main, May 20 2025, 08:56:23) [GCC 13.3.0]IPython 9.2.0 -- An enhanced Interactive Python. Type '?' for help.In [1]: %load_ext autoreloadIn [2]: %autoreload 2In [3]: import numpyIn [4]: numpy.spam()Out[4]: 'spam spam spam spam spam egg and chips'In [5]: numpy.spam().../lib/python3.13/site-packages/IPython/extensions/deduperreload/deduperreload.py:290: DeprecationWarning: ast.Ellipsis is deprecated and will be removed in Python 3.14; use ast.Constant instead  elif not isinstance(ast_elt, (ast.Ellipsis, ast.Pass)):Out[5]: 'lobster thermidor'

I haven't tested in Jupyter, but I believe it shares an implementation with IPython.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Awesome, thanks for verifying. And yes, the implementation is shared.

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch from59ec898 to6876e24CompareMay 22, 2025 07:01
@AA-Turner
Copy link
ContributorAuthor

Please work on a single PR to reduce maintainer review burden. Once we are settled that we wish to make this change, we can revisit the other ones.

Will do. For context, there isa proposal to deprecate support for single-phase initialisation in favour of multi-phase. We're working toimprove the documentation here too.

A

rgommers reacted with thumbs up emoji

@AA-Turner
Copy link
ContributorAuthor

Does this impact performance, either import time or runtime?

Local timings usingpython -Ximporttime -c 'import numpy.linalg._umath_linalg' 2>&1| grep _umath_linalg from an editable installation ofnumpy.

Multi-phaseSingle-phase
Mean time (_umath_linalg)5.54ms5.06ms
Mean time (total)4396ms3863ms
Proportion0.126%0.131%

No significant impact of multi-phase on import time. I will investigate running the benchmarks (my laptop is ill-suited to it, though), but there should be no real impact -- this only changes the import process, the end result (initialised module) should be identical performance-wise.

A

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch 3 times, most recently from565f85f to5bbc0bfCompareMay 24, 2025 02:21
@AA-Turner
Copy link
ContributorAuthor

I don't understand the PyPy-onlynumpy/tests/test_ctypeslib.py::TestAsArray::test_pointer failure.

@AA-TurnerAA-Turner requested review fromseberg andmattipMay 24, 2025 03:36
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This change is required as otherwise the "cannot load module more than once per process" error is raised. Happy to split to a different preparatory PR if preferred.

@rgommers
Copy link
Member

Local timings usingpython -Ximporttime -c 'import numpy.linalg._umath_linalg' 2>&1| grep _umath_linalg from an editable installation ofnumpy.

Note that you cannot use editable installs for measuring import time. You're mostly measuringimportlib traversable overhead and invokingninja then.

For a regular install, I'm seeing the following timings on a macOS arm64 M1 machine over 15 invocations of your-Ximporttime invocation (cumulative time for_umath_linalg:

  • Single-phase init: 680 - 870 us
  • Multi-phase init: 700 - 910 us

Total cumulative import time is around 65 ms.

No significant impact of multi-phase on import time.

Yep, that holds up.

AA-Turner reacted with thumbs up emoji

@rgommers
Copy link
Member

Please work on a single PR to reduce maintainer review burden. Once we are settled that we wish to make this change, we can revisit the other ones.

Will do.

Thanks for working on this! It'd be useful to convert the PRs other than the couple that are under active review to Draft status, to make it clear at a glance to other reviewers browsing the PR list what is safe to ignore.

@AA-Turner
Copy link
ContributorAuthor

AA-Turner commentedMay 24, 2025
edited
Loading

In reply to@rgommers' comment in#29025 (comment), centralising discussion in this PR.

This change looks quite annoying:

-    import_array();+    import_array1(-1);

import_array is public API and used by a ton of extension modules, binding generators, and docs.

At a glance from your PRs, there seems to be a larger issue in the CPython design with the normal meaning of aNULL return used for an error condition no longer being true. Returning0 on success and-1 on error is going to lead to a lot of churn. What is going on there?

This is because the signature changes fromPyObject *PyInit_<NAME>(void) toint module_exec(PyObject *module), soNULL is not a valid return value &-1 is used to signal errors.

I was not around when the API was designed, though in the Python C API we do frequently use-1 as an error sentinel.@encukou /@scoder / etc may be better placed to comment as authors ofPEP 489, but it may just be that it was the most pragmatic choice at the time.

It may be worth considering a shorter alias forimport_array1(-1) assuming no better solution is found, but this is clearly a bigger discussion on NumPy's API and not one I intend to start!

A

@scoder
Copy link
Contributor

I had never looked up the definition ofimport_array() in NumPy, assuming it was a simple function that signaled errors in its return value. It's not:

#define import_array() { \
if (_import_array() < 0) { \
PyErr_Print(); \
PyErr_SetString( \
PyExc_ImportError, \
"numpy._core.multiarray failed to import" \
); \
return NULL; \
} \
}
#define import_array1(ret) { \
if (_import_array() < 0) { \
PyErr_Print(); \
PyErr_SetString( \
PyExc_ImportError, \
"numpy._core.multiarray failed to import" \
); \
return ret; \
} \
}
#define import_array2(msg, ret) { \
if (_import_array() < 0) { \
PyErr_Print(); \
PyErr_SetString(PyExc_ImportError, msg); \
return ret; \
} \
}

Luckily, however, itis a function for Cython code:

# Versions of the import_* functions which are more suitable for
# Cython code.
cdef inlineint import_array()except-1:
try:
__pyx_import_array()
exceptException:
raiseImportError("numpy._core.multiarray failed to import")

That mitigates the issue, I guess. A (very) large chunk of the code that interacts with NumPy's C-API probably does it through Cython, not bare C code.

rgommers reacted with thumbs up emoji

@scoder
Copy link
Contributor

the signature changes fromPyObject *PyInit_<NAME>(void) toint module_exec(PyObject *module)

Right. In the old mechanism, the init function created the module and returned it. In multi-phase init, there is a dedicated creation function that returns it (or an error indicated byNULL), and an exec function (to populate the module namespace) that receives the already created module and returnsonly an error indicator (because it has no objects to return).

@seberg
Copy link
Member

It may be worth considering a shorter alias for import_array1(-1) assuming no better solution is found,

We have a better solution already, which is the explicit thing that isn't a macro that includes a return:

    if (PyArray_ImportNumPyAPI() < 0) {        return -1;    }

please just use that (downstream can use it if compiling with NumPy 2, or just vendor it).

rgommers reacted with thumbs up emoji

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch from5bbc0bf to9d65240CompareMay 26, 2025 10:19
@AA-Turner
Copy link
ContributorAuthor

Thanks for the pointer, I've switched toPyArray_ImportNumPyAPI() andPyUFunc_ImportUFuncAPI().

A

seberg reacted with thumbs up emoji

@sebergseberg added the 56 - Needs Release Note.Needs an entry in doc/release/upcoming_changes labelMay 26, 2025
@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch from9d65240 to3c3ee41CompareMay 26, 2025 10:51
@rgommers
Copy link
Member

xrefhttps://discuss.python.org/t/proposal-officially-deprecate-support-for-legacy-single-phase-init-extension-modules/89262/44, which points out that the current docs on multi-phase init say:

All modules created using multi-phase initialization are expected to supportsub-interpreters. Making sure multiple modules are independent is typically enough to achieve this.

Given the opt-out option, I assume it's a matter of those docs needing to be improved, but as it stands this PR conflicts with that requirement so it'd be nice to see that Discourse thread concluding and the docs fixed before we merge this.

@AA-Turner
Copy link
ContributorAuthor

Given the opt-out option, I assume it's a matter of those docs needing to be improved, but as it stands this PR conflicts with that requirement so it'd be nice to see that Discourse thread concluding and the docs fixed before we merge this.

For reference, the relevant PR ispython/cpython#134775. Feel free to comment there with any thoughts. The proposed text currently says "All modules created using multi-phase initialization are expected to support sub-interpreters, orotherwise explicitly signal a lack of support." (emph. added).

A

rgommers reacted with thumbs up emoji

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch from3c3ee41 to43119cdCompareMay 28, 2025 07:46
Copy link
Member

@sebergseberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Generally, LGTM, thanks. I would like to lower the open PR number soon :).

I think I have two asks left only:

  • It would be nice to restore/usetextwrap.dedent
  • From the docs, the code seems wrong for no-gil as a mutex must be used? (I am a bit surprised by import not already having a mutex, but, OK.)

All modules created using multi-phase initialization are expected to support sub-interpreters, or otherwise explicitly signal a lack of support.

My understanding is that this PR adds that now and that there is no major change in behavior (besides brutaldel sys.module[...]).
(Historically, people got away with using NumPy in subinterpreters even if it was buggy, but I think Python changes for non-phase init modules removed that ability for us.)

AA-Turner reacted with thumbs up emoji
encoding='utf-8',
check=False,
)
assert p.returncode == 0, p.stdout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@dschult if you can have a quick look: I think doing the lazy loading test in a subprocess is identical (but safer as it avoids messing withsys.modules). Does that seem right?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I had to make the change here as otherwise we get the new "cannot load module more than once per process" error. I believe it should be functionally identical in terms of the test.

seberg reacted with thumbs up emoji
@AA-Turner
Copy link
ContributorAuthor

From Petr above:

The PyMutex stuff in the updated HOWTO is 3.13+ non-limited API, but isn't necessary in the medium term (on CPython you're protected by the GIL; even free-threading builds use an import lock which isn't likely to go away). For the long-term (or alternate CAPI implementations) you could add no-op shims.

I chose not to add it as it's only relevant for CPython & the GIL is held during import, so I saw it as needless complexity.

I'm happy to add it you'd prefer to follow the documentation more closely. What would you prefer@seberg?

A

@seberg
Copy link
Member

I chose not to add it as it's only relevant for CPython & the GIL is held during import, so I saw it as needless complexity.

Sorry, missed that, happy to not do that then.

AA-Turner reacted with thumbs up emoji

@AA-TurnerAA-Turnerforce-pushed themulti-phase/umath_linalg branch from43119cd to0ee269fCompareMay 28, 2025 10:12
@sebergseberg changed the titleGH-29021: Convert umath_linalg to multi-phase init (PEP 489)MAINT: Convert umath_linalg to multi-phase init (PEP 489)May 28, 2025
Copy link
Member

@sebergseberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks. Will put it in once tests pass, supporting multi-phase init seems like it's in our future anyway and with this out of the way, hopefully we can do the others quickly.
(Then making it work for multiarray and actually supporting it in the caching are longer things.)

If anyone still finds something off, we can follow up.

rgommers reacted with thumbs up emojiAA-Turner reacted with hooray emoji
@AA-Turner
Copy link
ContributorAuthor

Thanks all for the reviews! I'll rebase the others once this has merged, do you have a preferred order?

A

@sebergseberg merged commit5a87756 intonumpy:mainMay 28, 2025
74 checks passed
@AA-TurnerAA-Turner deleted the multi-phase/umath_linalg branchMay 28, 2025 11:02
@mattip
Copy link
Member

Do we need to wait for clarification around the current requirement that "All modules created using multi-phase initialization are expected to support sub-interpreters" andpython/cpython#134775, which has not been merged yet?

@AA-Turner
Copy link
ContributorAuthor

AA-Turner commentedMay 28, 2025
edited
Loading

If it provides solace,PEP 630 (2020) and the Python documentationsince 3.11 both contain the text on explicit opt-out that was used as a guide here:

A non-negativePyModuleDef.m_size signals that a module supports multiple interpreters correctly. If this is not yet the case for your module, you can explicitly make your module loadable only once per process.

A

@rgommers
Copy link
Member

The proposed text currently says "All modules created using multi-phase initialization are expected to support sub-interpreters, orotherwise explicitly signal a lack of support." (emph. added).

Seems perfectly clear now, thanks! Glad to see this PR merged so smoothly.

AA-Turner reacted with heart emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@rgommersrgommersrgommers left review comments

@encukouencukouencukou left review comments

@sebergsebergseberg approved these changes

@mattipmattipAwaiting requested review from mattip

Assignees
No one assigned
Labels
56 - Needs Release Note.Needs an entry in doc/release/upcoming_changes
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@AA-Turner@rgommers@scoder@seberg@mattip@encukou

[8]ページ先頭

©2009-2025 Movatter.jp