Add type hint for cuda.set_rng_state #26200
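This PR adds a type annotation for torch.cuda.set_rng_state to PyTorch's Python stub files so that mypy recognizes the call. As a rough illustration only (the parameter names and types below are assumptions based on the documented runtime signature, not the actual lines of the merged diff), a stub entry for the function might look like this:

```python
# Illustrative .pyi-style stub sketch for torch.cuda.set_rng_state.
# Parameter names/types are assumptions, not the merged diff's exact content.
from typing import Union

from torch import Tensor

def get_rng_state(device: Union[int, str] = ...) -> Tensor: ...
def set_rng_state(new_state: Tensor, device: Union[int, str] = ...) -> None: ...
```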
Closed
Conversation
yaroslavvb (Contributor, Author) commented on Sep 14, 2019:
The one test failure looks like a flaky test.
soumith approved these changes on Sep 15, 2019.
facebook-github-bot (Contributor) left a comment:
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot (Contributor) commented on Sep 15, 2019.
rohithkrn added a commit to ROCm/pytorch that referenced this pull request on Sep 21, 2019. The commit message lists the upstream PyTorch commits being merged into the fork, including this one:
* C++ Average Pool Module (#25800)
* Better error messages in C2 ONNX backend (#25809)
* Add new API for Fully Connected and Convolution Operators in QNNPACK (#25862)
* Enable more mGPU tests (#26055)
* remove verbose in pytorch_ci hypothesis profile (#26075)
* TorchScript Serialization for dynamic LSTM module (#25877)
* Upgrade the naming for fbgemm quantized op (#26064)
* Use BytesIO instead of tempfile (#25976)
* Revert "TorchScript Serialization for dynamic LSTM module" (#26079)
* Add Runtime flag for quantized backend (#25680)
* Dynamic registration of RPC backends (#25734)
* Make regular softmax warp size aware (#25956)
* Move NamedTensorMetaInterface definitions to TensorImpl.h (#26030)
* Experimental warning for named tensors (#26050)
* print source code when a function is executed (#25868)
* Disable test_cuda.test_stream_event_nogil on ROCm (#26087)
* TorchScript Serialization for dynamic LSTM (#26084)
* Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (#25959)
* Skip test_triangular_solve_batched (#26108)
* Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (#26080)
* make sure all out stringstreams start out empty in jit_log.hpp (#25863)
* tracing with an opt-in by file name (#25895)
* Stop re-ordering TH(C)Blas arguments (#25606)
* Kill TH(C)Blas kwarg_only declarations (#25607)
* simplify build_android_gradle.sh (#25897)
* change gradle build to use static libtorch + gc-sections (#25984)
* remove "build_deps" arg from setup.py command (#26113)
* Stop reordering TH random function arguments (#25608)
* fix base_lr overridden in cyclic lr (#26105)
* Skip inserting duplicate observers (#25504)
* Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (#25970)
* Fix build warning in vec256_qint.h (#26121)
* Kill kwarg_only declarations in Declarations.cwrap (#25609)
* Support quantizing any methods called (#25505)
* C++ unregister_module function for Module (#26088)
* Port fuse_linear from pytorch/tvm (#25623)
* Add device check before accessing data_ptr in PackLayer (#26056)
* Create TensorBoard test classes in all cases (#26005)
* Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (#26137)
* Add fusion for quantized linear (#25624)
* Implement tensor.refine_names (#25842)
* Implement tensor.align_as(other), change tensor.align_to(names) (#25843)
* C++ API parity: at::Tensor::data (#26008)
* Fix bug with named tensors and (no) tracer support (#26106)
* Add data field to Tensor pyi (#26093)
* Change schedulers to chainable form (#24352)
* Run PyTorch macOS CPU-only build/test on all PRs (#26096)
* Use CircleCI commands for brew update/install (#26159)
* Turn should_run_job into command (#26160)
* Turn setup_linux_system_environment into command (#26162)
* Turn setup_ci_environment into command (#26163)
* Kill most defaults in Declarations.cwrap (#25610)
* Get rid of more defaults in Declarations.cwrap (#25611)
* Kill remaining defaults in Declarations.cwrap (#25612)
* Remove requests as dependency (#26083)
* Fix 'in' return true incorrectly (#24156)
* guard dyndep with a lock (#26153)
* Add documentation to logging (#26175)
* Fold quantize op into module (#25625)
* Revert D17349760: Change schedulers to chainable form
* Use torch::from_blob instead of shareExternalPointer, nits (#25973)
* Make schema part of RegisterOperators::Options (#26114)
* Allow overwriting catch-all kernels (#25947)
* Register ATen ops with c10 (#26131)
* Updating submodules
* Nightly build for iOS (#26074)
* Change the source link in podspec (#26089)
* Updating submodules
* Tensor renaming to dtype, shape; support long, double (#26183)
* use whitelist for selecting observed values (#25974)
* fix circle CI (#26225)
* Add histogram observer (#23959)
* Add isBackwardCompatibleWith for Argument and FunctionSchema (#23409)
* Creates generic device type testing framework (#25967)
* adds sync to flaky test_events_multi_gpu_query (#26231)
* Added possible out of shared memory error message (#25730)
* Remove armv7s build from iOS (#26222)
* Back out "[quant][observer] Add histogram observer" (#26236)
* Ports most of test_torch.py to generic device type framework (#26232)
* Add type hint for cuda.set_rng_state (#26200) (this pull request; fixes #26199)
* Add a wrapper for inspect in JIT to produce better error message (#25415)
* Use MIOpen for transpose convolutions (#26172)
* Call aten ops through c10 dispatcher (#23668)
* Remove unboxedAutogradKernel from c10 (#26130)
* Refines test_torch.py generic device testing (#26244)
* Fix Windows build (#26246)
* Fix CI (#26250)
* Back out "[pytorch][PR] Refines test_torch.py generic device testing" (#26252)
* Fix namedtensor ci (#26257)
* Switch to the new profiler infrastructure (#26174)
* Fix binary size of OpsAlreadyMovedToC10.cpp (#26237)
* Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)
* Upgrade MKLDNN to v0.20.5 (#25757)
* fix hypothesis timeout (#26280)
* Migrate away from using Variable( in test_nn.py (#26077)
* Enabled conv methods for the bfloat16 (#26167)
* Move the CUDA implementation of round to ATen (#25041)
* Whitelist and fusion support for quantized::linear - addmm (#26208)
* Whitelist and fusion support for quantized::linear - matmul (without bias) (#26209)
* Updating submodules
* Add ProcessGroupGloo::createDefaultDevice (#26166)
* Disable broken unit tests (#26301)
* Kill defaults in nn.yaml (#26282)
* Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler (#26260)
* Whitelist and fusion support for quantized::linear - matmul (with bias) (#26204)
* Add __s390x__ compiler define for s390 builds (#26233)
* Clarified ambiguous docstring in NegativeBinomial (#25923)
* Dynamic quantization for bias (#26057)
* Use expected_wrapper only if CMAKE_{C,CXX}_COMPILER and/or is not set by user (#26306)
* Add derivative of cholesky_solve (#26185)
* Kill 'default_init', which isn't needed anymore (#26281)
* Export round (#26126)
Labels
Merged
module: cuda (Related to torch.cuda, and CUDA support in general)
module: typing (Related to mypy type annotations)
Fixes #26199
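As a hypothetical usage example of the call this annotation covers (variable names and shapes are illustrative, not taken from the PR), restoring a saved CUDA RNG state lets a random draw be replayed:

```python
# Hypothetical example: snapshot the CUDA RNG state, advance it, restore it,
# and reproduce the same random draw. Not part of the PR itself.
import torch

if torch.cuda.is_available():
    state = torch.cuda.get_rng_state()   # CPU ByteTensor snapshot
    a = torch.randn(4, device="cuda")    # advances the CUDA RNG
    torch.cuda.set_rng_state(state)      # the function this PR annotates
    b = torch.randn(4, device="cuda")
    assert torch.equal(a, b)
```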