Automatically infer the PyTorch index via `--torch-backend=auto` #12070

Merged

charliermarsh merged 3 commits into main from charlie/ltt on Mar 19, 2025

Conversation

charliermarsh (Member) commented Mar 9, 2025 (edited)

Summary

This is a prototype that I'm considering shipping under `--preview`, based on light-the-torch.

light-the-torch patches pip to pull PyTorch packages from the PyTorch indexes automatically. And, in particular, light-the-torch will query the installed CUDA drivers to determine which indexes are compatible with your system.

This PR implements equivalent behavior under `--torch-backend auto`, though you can also set `--torch-backend cpu`, etc. for convenience. When enabled, the registry client will fetch from the appropriate PyTorch index when it sees a package from the PyTorch ecosystem (and ignore any other configured indexes, unless the package is explicitly pinned to a different index).

Right now, this is only implemented in the `uv pip` CLI, since it doesn't quite fit into the lockfile APIs given that it relies on feature detection on the currently-running machine.
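As a rough sketch of the behavior described above: a backend value selects a dedicated PyTorch index, and `auto` would first detect the driver. The index URLs below are the public PyTorch wheel indexes; the function name and the exact backend set are illustrative assumptions, not uv's actual code.

```python
# Hypothetical sketch: map a --torch-backend value to a PyTorch index URL.
# The helper name and the covered backends are assumptions for illustration.
PYTORCH_INDEXES = {
    "cpu": "https://download.pytorch.org/whl/cpu",
    "cu118": "https://download.pytorch.org/whl/cu118",
    "cu121": "https://download.pytorch.org/whl/cu121",
    "cu124": "https://download.pytorch.org/whl/cu124",
}


def index_for_backend(backend: str) -> str:
    """Return the index URL for a known backend, or raise for unknown values."""
    try:
        return PYTORCH_INDEXES[backend]
    except KeyError:
        raise ValueError(f"unknown torch backend: {backend!r}")
```

Under `auto`, the detected CUDA driver version would pick the newest compatible entry from such a table; a pinned per-package index would still take precedence.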

Test Plan

On macOS, you can test this with (e.g.):

```shell
UV_TORCH_BACKEND=auto UV_CUDA_DRIVER_VERSION=450.80.2 cargo run \
  pip install torch --python-platform linux --python-version 3.12
```

On a GPU-enabled EC2 machine:

```console
ubuntu@ip-172-31-47-149:~/uv$ UV_TORCH_BACKEND=auto cargo run pip install torch -v
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.31s
     Running `target/debug/uv pip install torch -v`
DEBUG uv 0.6.6 (e95ca063b 2025-03-14)
DEBUG Searching for default Python interpreter in virtual environments
DEBUG Found `cpython-3.13.0-linux-x86_64-gnu` at `/home/ubuntu/uv/.venv/bin/python3` (virtual environment)
DEBUG Using Python 3.13.0 environment at: .venv
DEBUG Acquired lock for `.venv`
DEBUG At least one requirement is not satisfied: torch
warning: The `--torch-backend` setting is experimental and may change without warning. Pass `--preview` to disable this warning.
DEBUG Detected CUDA driver version from `/sys/module/nvidia/version`: 550.144.3
...
```

charliermarsh added the no-build (Disable building binaries in CI) label on Mar 9, 2025

charliermarsh force-pushed the charlie/ltt branch 3 times, most recently from 9d06dfb to 3e85795, March 9, 2025 02:30
DEKHTIARJonathan commented Mar 10, 2025 (edited)

@charliermarsh: Adapted from a few different sources, namely conda.
Credit / code author: Michael Sarahan

I hope this illustrates my point better: why you need a plugin interface, and why you don't want to be the person responsible for maintaining it 👍

```python
# Copyright (C) 2012 Anaconda, Inc
# SPDX-License-Identifier: BSD-3-Clause
"""Detect CUDA version."""
import ctypes
import functools
import itertools
import multiprocessing
import os
import platform
from contextlib import suppress
from dataclasses import dataclass
from typing import Optional


@dataclass()
class CudaVersion:
    version: str
    architectures: list[str]


def cuda_version() -> Optional[CudaVersion]:
    # Do not inherit file descriptors and handles from the parent process.
    # The `fork` start method should be considered unsafe as it can lead to
    # crashes of the subprocess. The `spawn` start method is preferred.
    context = multiprocessing.get_context("spawn")
    queue = context.SimpleQueue()
    # Spawn a subprocess to detect the CUDA version
    detector = context.Process(
        target=_cuda_detector_target,
        args=(queue,),
        name="CUDA driver version detector",
        daemon=True,
    )
    try:
        detector.start()
        detector.join(timeout=60.0)
    finally:
        # Always cleanup the subprocess
        detector.kill()  # requires Python 3.7+

    if queue.empty():
        return None

    result = queue.get()
    if result:
        driver_version, architectures = result.split(";")
        result = CudaVersion(driver_version, architectures.split(","))
    return result


@functools.lru_cache(maxsize=None)
def cached_cuda_version():
    return cuda_version()


def _cuda_detector_target(queue):
    """
    Attempt to detect the version of CUDA present in the operating system in a
    subprocess.

    On Windows and Linux, the CUDA library is installed by the NVIDIA
    driver package, and is typically found in the standard library path,
    rather than with the CUDA SDK (which is optional for running CUDA apps).

    On macOS, the CUDA library is only installed with the CUDA SDK, and
    might not be in the library path.

    Returns: version string with CUDA version first, then a set of unique SM's
             for the GPUs present in the system (e.g., '12.4;8.6,9.0') or None
             if CUDA is not found. The result is put in the queue rather than
             a return value.
    """
    # Platform-specific libcuda location
    system = platform.system()
    if system == "Darwin":
        lib_filenames = [
            "libcuda.1.dylib",  # check library path first
            "libcuda.dylib",
            "/usr/local/cuda/lib/libcuda.1.dylib",
            "/usr/local/cuda/lib/libcuda.dylib",
        ]
    elif system == "Linux":
        lib_filenames = [
            "libcuda.so",  # check library path first
            "/usr/lib64/nvidia/libcuda.so",  # RHEL/Centos/Fedora
            "/usr/lib/x86_64-linux-gnu/libcuda.so",  # Ubuntu
            "/usr/lib/wsl/lib/libcuda.so",  # WSL
        ]
        # Also add libraries with version suffix `.1`
        lib_filenames = list(
            itertools.chain.from_iterable((f"{lib}.1", lib) for lib in lib_filenames)
        )
    elif system == "Windows":
        bits = platform.architecture()[0].replace("bit", "")  # e.g. "64" or "32"
        lib_filenames = [f"nvcuda{bits}.dll", "nvcuda.dll"]
    else:
        queue.put(None)  # CUDA not available for other operating systems
        return

    # Open library
    if system == "Windows":
        dll = ctypes.windll
    else:
        dll = ctypes.cdll
    for lib_filename in lib_filenames:
        with suppress(Exception):
            libcuda = dll.LoadLibrary(lib_filename)
            break
    else:
        queue.put(None)
        return

    # Empty `CUDA_VISIBLE_DEVICES` can cause `cuInit()` to return `CUDA_ERROR_NO_DEVICE`
    # Invalid `CUDA_VISIBLE_DEVICES` can cause `cuInit()` to return `CUDA_ERROR_INVALID_DEVICE`
    # Unset this environment variable to avoid these errors
    os.environ.pop("CUDA_VISIBLE_DEVICES", None)

    # Get CUDA version
    try:
        cuInit = libcuda.cuInit
        flags = ctypes.c_uint(0)
        ret = cuInit(flags)
        if ret != 0:
            queue.put(None)
            return

        cuDriverGetVersion = libcuda.cuDriverGetVersion
        version_int = ctypes.c_int(0)
        ret = cuDriverGetVersion(ctypes.byref(version_int))
        if ret != 0:
            queue.put(None)
            return

        # Convert version integer to version string
        value = version_int.value
        version_value = f"{value // 1000}.{(value % 1000) // 10}"

        count = ctypes.c_int(0)
        libcuda.cuDeviceGetCount(ctypes.pointer(count))

        architectures = set()
        for device in range(count.value):
            major = ctypes.c_int(0)
            minor = ctypes.c_int(0)
            libcuda.cuDeviceComputeCapability(
                ctypes.pointer(major), ctypes.pointer(minor), device
            )
            architectures.add(f"{major.value}.{minor.value}")
        queue.put(f"{version_value};{','.join(architectures)}")
    except Exception:
        queue.put(None)
        return


if __name__ == "__main__":
    print(cuda_version())
```
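(Side note on the version conversion in that snippet: the CUDA driver API encodes the version as `major * 1000 + minor * 10`, so the string conversion can be checked in isolation; the helper name here is just for illustration.)

```python
def format_cuda_version(value: int) -> str:
    # cuDriverGetVersion returns e.g. 12040 for CUDA 12.4:
    # major = value // 1000, minor = (value % 1000) // 10.
    return f"{value // 1000}.{(value % 1000) // 10}"
```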

```rust
| "torchserve"
| "torchtext"
| "torchvision"
| "pytorch-triton"
```
Member

Can we add this list to some documentation? Reading the high-level overview I didn't realize we were hardcoding a package list.

Collaborator

Can we generate this by querying the PyTorch indices to see what they have? (Maybe a manually-run script that queries them and updates this list, or an automatically-run integration test that makes sure this list is in sync with what's currently on their indices?)

Along those lines it would be helpful to have this list somewhere declarative. It might also be helpful to allow user-controlled overrides of this list if the set of packages changes.
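(As a sketch of the manually-run script suggested here, assuming the PyTorch indexes follow the PEP 503 "simple" layout, where the index root is a page of one `<a>` anchor per project; the function names and the regex-based parsing are illustrative assumptions:)

```python
import re
import urllib.request


def parse_simple_index(html: str) -> list[str]:
    # A PEP 503 simple index is a list of <a> anchors, one per project name.
    # A regex is enough for a maintenance script; a real tool might use an
    # HTML parser instead.
    return re.findall(r"<a[^>]*>([^<]+)</a>", html)


def list_index_packages(index_url: str) -> list[str]:
    """Fetch a simple-index page and return the project names it lists."""
    with urllib.request.urlopen(index_url) as resp:
        return parse_simple_index(resp.read().decode())
```

Diffing the output of `list_index_packages` against the hardcoded list would catch drift, though (per the maintainer reply below) the raw index contents would still need manual curation.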

Member, Author

Unfortunately I don't know that we can... We don't want all packages on these indexes, because they include things like jinja2. And in some cases, they include incomplete packages like markupsafe (where they only have a few wheels).

geofft (Collaborator) left a comment:

I think this is a great idea.

Would it be worth naming this feature something like uv-specialized-index instead of uv-torch, with an eye to extending it to other libraries in the future? (jaxlib and tensorflow, for instance, have current/popular versions on PyPI, but I think they also have their own indexes.)


samypr100 (Collaborator):

> I think this is a great idea.
>
> Would it be worth naming this feature something like uv-specialized-index instead of uv-torch, with an eye to extending it to other libraries in the future? (jaxlib and tensorflow, for instance, have current/popular versions on PyPI, but I think they also have their own indexes.)

I had a similar thought; I think this is one of many cases, especially considering when such indexes are mirrored or vendored internally. I was wondering what the right naming would be. I know some circles refer to this as a suffixed index, so maybe uv-suffixed-index? Same with --torch-backend: maybe something more generic about its intent would be more future-proof, such as --index-suffix.

samypr100 (Collaborator) commented Mar 12, 2025 (edited)

> though I don't remember off hand whether NVIDIA's license allows redistributing it

iirc this is no longer an issue with the new open-source drivers (e.g. nvidia-driver-{ver}-open)

Nevermind, I didn't notice you were referring to CUDA.

> I think we should get this via /sys/module/nvidia/version

💯 In my experience nvidia-smi can also take a long time depending on GPU load.

Although there are multiple locations depending on how it's installed (e.g. dkms) and the environment (Windows, macOS); on WSL 2 it's even weirder due to the drivers being shared with the host. So nvidia-smi might be the most sure-fire, low-risk way (assuming no issues with the install).

charliermarsh (Member, Author):

Definitely agree with moving this out of the interpreter query (and possibly reading it from outside nvidia-smi; I need to do some research).

I'm a little wary of trying to brand this as something more general than torch, because I'll likely want to reconsider the mechanism and design entirely as we generalize it. So it seems nice to keep it as an experimental torch-specific feature, then modify it as we generalize.


charliermarsh (Member, Author):

@konstin @geofft -- I believe I've addressed all feedback: we now query /sys/module/nvidia/version and fall back to nvidia-smi, and all the accelerator detection is decoupled from the Python interpreter (and no longer cached).
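(For illustration, the detection order described here, sysfs first, `nvidia-smi` as fallback, can be sketched in Python; uv's actual implementation is in Rust, and the helper name is hypothetical:)

```python
import subprocess
from pathlib import Path
from typing import Optional


def detect_cuda_driver_version(
    sysfs_path: str = "/sys/module/nvidia/version",
) -> Optional[str]:
    """Return the NVIDIA driver version, preferring sysfs over `nvidia-smi`."""
    # Fast path: the loaded kernel module exposes its version directly.
    try:
        return Path(sysfs_path).read_text().strip()
    except OSError:
        pass
    # Fallback: ask nvidia-smi, which can be slow under GPU load.
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        ).stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return None
    return out.splitlines()[0] if out else None
```

The sysfs read avoids spawning a process at all, which matches the latency concern raised about `nvidia-smi` above.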

charliermarsh force-pushed the charlie/ltt branch 2 times, most recently from 96038e9 to c9e4b20, March 15, 2025 02:28

konstin (Member) left a comment:

deferring to @geofft for the new detection logic

Comment on lines +70 to +72:

```rust
Ok(None) => {
    debug!("Failed to parse CUDA driver version from `/proc/driver/nvidia/version`");
}
```
Collaborator

Should this case return an error instead of falling through to nvidia-smi?

Member, Author

I'm not confident enough in the format of this one... It seems like it varies across machines.

Comment on lines +86 to +90:

```rust
if output.status.success() {
    let driver_version = Version::from_str(&String::from_utf8(output.stdout)?)?;
    debug!("Detected CUDA driver version from `nvidia-smi`: {driver_version}");
    return Ok(Some(Self::Cuda { driver_version }));
}
```
Collaborator

An `else { debug!("nvidia-smi returned error {output.status}: {output.stderr}") }` might be nice.

charliermarsh force-pushed the charlie/ltt branch 3 times, most recently from 14cb5ed to 9e40e0c, March 19, 2025 14:26

charliermarsh enabled auto-merge (squash) March 19, 2025 14:29

charliermarsh merged commit 5173b59 into main on Mar 19, 2025

89 checks passed

charliermarsh deleted the charlie/ltt branch March 19, 2025 14:37
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request on Mar 24, 2025
Reviewers

geofft: approved these changes
konstin: approved these changes
zanieb: awaiting requested review

Assignees: no one assigned
Labels: no-build (Disable building binaries in CI)
Projects: none yet
Milestone: no milestone

6 participants: @charliermarsh, @DEKHTIARJonathan, @samypr100, @geofft, @zanieb, @konstin
