Tags: pytorch/pytorch
viable/strict/1766042748
Optimize Triton template heuristics (#170444)
Summary: This diff contains three small optimizations:
1) Directly cache the triton Config object import. Not a huge win, but measurably faster than relying on importlib's cache.
2) Only copy configs when the new value is different from the old one. Configs are fairly large objects, so unnecessary dict copies get expensive.
3) Replace `gcd(k, BLOCK_K) == BLOCK_K` with `(k % BLOCK_K) == 0`. This is equivalent when `BLOCK_K > 0`, which must be true.
Test Plan:
```
tlp buck run mode/opt //scripts/paulzhan:repro
```
and then looking at perfetto.
Differential Revision: D88415189
Pull Request resolved: #170444
Approved by: https://github.com/PaulZhang12, https://github.com/eellison, https://github.com/shunting314
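The equivalence claimed in item 3 can be checked by brute force. The following is a minimal sketch, not code from the PR: the function names and the tested ranges are hypothetical, and only the two expressions themselves come from the commit message.

```python
from math import gcd


def divides_via_gcd(k: int, block_k: int) -> bool:
    # Original form from the commit message: BLOCK_K divides k
    # exactly when gcd(k, BLOCK_K) equals BLOCK_K.
    return gcd(k, block_k) == block_k


def divides_via_mod(k: int, block_k: int) -> bool:
    # Replacement form: a single modulo, valid only for block_k > 0.
    return (k % block_k) == 0


def checks_agree(max_k: int = 256, max_block: int = 64) -> bool:
    # Compare both forms for every k >= 0 and every block_k > 0
    # in a small, arbitrarily chosen range (an assumption of this sketch).
    return all(
        divides_via_gcd(k, b) == divides_via_mod(k, b)
        for k in range(0, max_k + 1)
        for b in range(1, max_block + 1)
    )
```

The two forms only diverge at `block_k == 0`, where the modulo raises `ZeroDivisionError` while the gcd form still returns a value, which is why the commit message calls out the `BLOCK_K > 0` precondition.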
viable/strict/1766040973
Shorten the file names in libtorch_agnostic tests (#170664)
To fix
```
ninja: error: Stat(C:/actions-runner/_work/pytorch/pytorch/test/cpp_extensions/libtorch_agnostic_2_10_extension/build/temp.win-amd64-cpython-310/Release/actions-runner/_work/pytorch/pytorch/test/cpp_extensions/libtorch_agnostic_2_10_extension/libtorch_agnostic_2_10/csrc/get_any_data_ptr.obj): Filename longer than 260 characters
```
in #170564
Pull Request resolved: #170664
Approved by: https://github.com/mikaylagawarecki
viable/strict/1766033750
[17/N] Use Python 3.10 typing (#169735)
This PR fixes typing of accelerator files.
Pull Request resolved: #169735
Approved by: https://github.com/albanD
v2.10.0-rc2
[c10d] Add thread safety when calling ncclCommGetAsyncError (#170633)
[c10d] Add thread safety when calling ncclCommGetAsyncError (#170424)
Fixes #169484
Pull Request resolved: #170424
Approved by: https://github.com/kwen2501
(cherry picked from commit 9d0d198)
Co-authored-by: Rohit Singh Rathaur <rrathaur@redhat.com>
trunk/7031901e40749c8761d30d4f20bbe9ed3a9285c9
[BE][Inductor] Move bmm template into separate file (#170482)
Summary: The inductor kernel files embed multiple Jinja templates inline, making them harder to read and maintain. This change switches bmm.py to using `load_kernel_template()`, placing each template in its own file and restoring proper Jinja syntax highlighting.
To add a new template named, for example, new_mm, place the jinja code in _inductor/kernel/templates/new_mm.py.jinja, then just call load_template("new_mm").
Test Plan: CI
Differential Revision: D89233930
Pull Request resolved: #170482
Approved by: https://github.com/jananisriram
trunk/392330c7f29afad69b5935d7dd4d3e802f40f507
[audio hash update] update the pinned audio hash (#170727)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: #170727
Approved by: https://github.com/pytorchbot
trunk/70971eabdcd2d92efc29a2d70aac85f2096b9042
[17/N] Use Python 3.10 typing (#169735)
This PR fixes typing of accelerator files.
Pull Request resolved: #169735
Approved by: https://github.com/albanD
trunk/863d0ebb5c3401f8d2f88e8946511784ba0b41ab
Optimize Triton template heuristics (#170444)
Summary: This diff contains three small optimizations:
1) Directly cache the triton Config object import. Not a huge win, but measurably faster than relying on importlib's cache.
2) Only copy configs when the new value is different from the old one. Configs are fairly large objects, so unnecessary dict copies get expensive.
3) Replace `gcd(k, BLOCK_K) == BLOCK_K` with `(k % BLOCK_K) == 0`. This is equivalent when `BLOCK_K > 0`, which must be true.
Test Plan:
```
tlp buck run mode/opt //scripts/paulzhan:repro
```
and then looking at perfetto.
Differential Revision: D88415189
Pull Request resolved: #170444
Approved by: https://github.com/PaulZhang12, https://github.com/eellison, https://github.com/shunting314
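Item 2 describes a copy-on-change pattern. The sketch below illustrates the general idea with a plain dict; it is not the Inductor implementation, and `with_value` and the sample keys are hypothetical names chosen for this example.

```python
from typing import Any


def with_value(config: dict[str, Any], key: str, value: Any) -> dict[str, Any]:
    # If the key already holds this value, reuse the existing dict
    # and skip the copy entirely.
    if key in config and config[key] == value:
        return config
    # Otherwise make a shallow copy so the caller's dict is not mutated.
    updated = dict(config)
    updated[key] = value
    return updated
```

Callers that treat the returned dict as read-only get the same result as an unconditional copy, while paying for the copy only when something actually changes.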
trunk/614ff1a63ed8b4056ce9b9a9bafd2f15e8eb06a4
Skip failing tests on xpu with complex dtype on windows (#165049)
Fixes intel/torch-xpu-ops#1195
On xpu we use the std:: implementation of trig kernels. The issue comes from differences in how compiler headers implement trigonometric functions on complex dtypes. The Windows compiler implementation is not conformant with ISO 9899. For example, the following code
```
#include <cmath>
#include <complex>
#include <iostream>
#include <limits>
int main() {
  std::complex<float> x(std::numeric_limits<float>::infinity(), std::numeric_limits<float>::infinity());
  std::cout << std::sinh(x) << std::endl;
}
```
Compiled with g++: `(inf,-nan)`
Compiled with msvc: `(inf,inf)`
While ISO 9899 clearly says:
> csinh(+∞ + i∞) returns ±∞ + iNaN (where the sign of the real part of the result is unspecified) and raises the "invalid" floating-point exception
These tests use numpy as reference, and numpy is implemented according to ISO 9899, hence those tests fail on Windows.
The same failures can be observed on cpu, and those tests are skipped there. I propose we do the same for xpu. (intel/torch-xpu-ops#1195 (comment))
Pull Request resolved: #165049
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD
trunk/57e5b3769c8d58f45e0a742f2f157c1a41f0a654
[CI] Swap TPUs from v6 to v7 (#170690)
Fixes #ISSUE_NUMBER
Pull Request resolved: #170690
Approved by: https://github.com/seemethere