Enable CPU fused kernel on Windows #25578


Closed
peterjc123 wants to merge 7 commits into pytorch:master from peterjc123:cpu_fused_win

Conversation

@peterjc123 (Collaborator) commented Sep 3, 2019

No description provided.

pytorchbot added the oncall: jit (Add this issue/PR to JIT oncall triage queue) label Sep 3, 2019
peterjc123 removed the request for review from apaszke September 3, 2019 12:39
pytorchbot added caffe2 and module: build (Build system issues) labels Sep 4, 2019
@Immocat commented Sep 6, 2019

Thank you @peterjc123 for the implementation. I am writing a Unity native plugin (C++) on Windows to run neural-net inference every frame, and CPU-only is indeed much slower without this feature. I tried building the plugin with the CUDA libtorch, but Unity crashes at the exact line that does the inference (forward(input_tensor)). I wrote a simple C++ program to load the plugin lib and dll, and there it works perfectly well. I suppose the crash is related to a conflict between the libraries Unity uses and libtorch's prebuilt CUDA libraries on Windows. I found a similarly painful experience reported for OpenPose's Unity plugin, which also deals with a neural-net library + CUDA + Unity plugin setup.

So I think I will give up on the CUDA version. My question is: is this feature finished in your fork branch (peterjc123:cpu_fused_win)? I only care about the CPU-only version of libtorch on Windows.

If you've already finished it, may I try to build it from your last commit (on Windows, CPU only)?
I would appreciate it if you could share a pre-built binary, but I could also try building from your source code. Could you also link some build instructions for building libtorch CPU-only on Windows with Visual Studio + CMake? Is there any difference between your code and the master pytorch branch in terms of building libtorch CPU-only on Windows?

Thank you very much!

@peterjc123 (Collaborator, Author)

@Immocat No, it is still at an early stage. There are some difficulties I have to tackle before it can be merged into master.

  1. Finding the VS installation (plan: use vswhere and set env vars; the current method is to activate the dev env every time we call the compiler).
  2. Tempfile under Windows (plan: take code from GCC and make some adaptations; see the sketch after this list).
  3. Some other things I haven't considered yet (e.g. some util functions and OS-dependent code logic).
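
For reference, point 2 could also be done against the Win32 API directly rather than porting GCC's code. Below is a minimal sketch of such a helper; the name make_temp_file and the error handling are hypothetical, not the code in this PR:

```cpp
// Hypothetical sketch of a Windows tempfile helper for the fuser,
// built on the Win32 API. Illustrative only; not the code in this PR.
#include <windows.h>
#include <stdexcept>
#include <string>

std::string make_temp_file(const std::string& suffix) {
  char dir[MAX_PATH];
  // GetTempPathA fills in the user's temp directory (e.g. %TEMP%\).
  if (GetTempPathA(MAX_PATH, dir) == 0) {
    throw std::runtime_error("GetTempPathA failed");
  }
  char path[MAX_PATH];
  // GetTempFileNameA creates a uniquely named empty file and returns its path.
  if (GetTempFileNameA(dir, "jit", 0, path) == 0) {
    throw std::runtime_error("GetTempFileNameA failed");
  }
  // The fuser needs a specific extension (e.g. ".cpp"), so rename the file.
  std::string renamed = std::string(path) + suffix;
  if (!MoveFileA(path, renamed.c_str())) {
    throw std::runtime_error("MoveFileA failed");
  }
  return renamed;
}
```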

For the CUDA JIT fusion conflicts, maybe you could try building the static version of LibTorch. Below are the steps:

```cmd
:: Essential
set BUILD_SHARED_LIBS=OFF

:: [Optional] If you want to build with VS 2019 generator, please change the value in the next line to `Visual Studio 16 2019`.
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 15 2017

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2017 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
:: It's an essential step if you use Python 3.5.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.11
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,16^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the cuda host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe

python tools\build_libtorch.py
```

pytorchbot added module: ci (Related to continuous integration) and module: pybind (Related to our Python bindings / interactions with other Python libraries) labels Sep 7, 2019
peterjc123 force-pushed the cpu_fused_win branch 2 times, most recently from 491b8cf to c4731a1, September 7, 2019 15:41
@peterjc123 (Collaborator, Author) commented Sep 7, 2019

The basic functionality is working now. However, there are still some points to improve:

  • Currently, we activate the dev env every time if it's not already activated. This is very slow, and we may need to move it to the Python frontend.
  • OpenMP is not working. It complains that "index variable in OpenMP 'for' statement must have signed integral type".
  • We should skip the tests when VS is not installed.
  • Implement something like -march=native (see the sketch after this list).
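
For context on the last point: MSVC has no -march=native, so the closest equivalent is to pick among its /arch flags (/arch:AVX, /arch:AVX2) based on CPUID at runtime. A minimal sketch, assuming MSVC's <intrin.h> intrinsics; this is illustrative, not the code in this PR:

```cpp
// Hypothetical sketch: pick an MSVC /arch flag from CPUID, roughly
// approximating GCC's -march=native. Illustrative only; a complete
// check would also verify YMM state enablement via _xgetbv(0).
#include <intrin.h>
#include <string>

std::string native_arch_flag() {
  int regs[4];
  __cpuid(regs, 1);
  const bool osxsave = (regs[2] >> 27) & 1;  // CPUID.1:ECX.OSXSAVE
  const bool avx     = (regs[2] >> 28) & 1;  // CPUID.1:ECX.AVX
  if (!(osxsave && avx)) {
    return "";  // baseline; MSVC already defaults to SSE2 on x64
  }
  __cpuidex(regs, 7, 0);
  const bool avx2 = (regs[1] >> 5) & 1;      // CPUID.7.0:EBX.AVX2
  return avx2 ? "/arch:AVX2" : "/arch:AVX";
}
```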

@ezyang (Contributor)

This is nifty stuff. Let us know if there is stuff we can do to help move it along.

@peterjc123 (Collaborator, Author)

@ezyang Could you please tell me where the jit frontend is? That is, how can I disable it on the Python side?

@ezyang (Contributor)

Could you please tell me where the jit frontend is? That is, how can I disable it on the Python side?

Are you talking about the TorchScript compiler? It's not really disableable; when you request a function to be compiled for torchscript, we recursively collect the source code reachable from it and compile it. Maybe you could tell us more about what's going on?

cc @suo for perhaps more comments

@peterjc123 (Collaborator, Author) commented Sep 10, 2019

@ezyang
Some more questions:

  1. What about using PYTORCH_JIT=0, as in this line: https://github.com/pytorch/pytorch/blob/master/torch/jit/__init__.py#L57?
  2. You called it the TorchScript compiler, so does that imply JIT fusion is not working when we do torch.jit.trace?
  3. Could you please tell me a bit more about what torch.jit.trace is currently doing on Windows?

The following is what I want to do now. First, I want to add a check for the VS env before every jit fuse call. If it is not activated, then we will try to activate it; if we cannot find it, then we will skip the fusion step. Do you know where I should add this code?

@ezyang
Copy link
Contributor

What about using PYTORCH_JIT=0, as in this line: https://github.com/pytorch/pytorch/blob/master/torch/jit/__init__.py#L57?

Ah yes, I forgot about that. That will indeed turn off JIT globally; it's meant as an easy way to turn off script if you're debugging an issue without having to edit source code.

You called it the TorchScript compiler, so does that imply JIT fusion is not working when we do torch.jit.trace?

Actually, fusion can apply to trace too. Trace versus script refers to different ways of getting the IR in question; trace means we run your program and record what happened; script means we parse the literal program text. The IR can be fused in both cases.

Could you please tell me a bit more about what torch.jit.trace is currently doing on Windows?

I am not aware of any Windows specific behavior for torch.jit.trace, and we don't seem to have any macros on MSVC that would affect this.

First, I want to add a check for the VS env before every jit fuse call. If it is not activated, then we will try to activate it; if we cannot find it, then we will skip the fusion step. Do you know where I should add this code?

For CPU, it's going to be somewhere like torch/csrc/jit/fuser/cpu/fused_kernel.cpp, probably runCompiler.
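
To illustrate the kind of check being discussed, here is a rough sketch of probing for cl.exe before fusing and falling back to the interpreter otherwise. The helper names are hypothetical, and this is not PyTorch's actual fuser code:

```cpp
// Hypothetical sketch: probe for the MSVC compiler before attempting
// CPU fusion, and skip fusion if it's unavailable.
// Illustrative only; not the actual PyTorch fuser code.
#include <cstdlib>

bool msvc_available() {
  // `where /q` returns 0 iff cl.exe is on PATH (i.e. the dev env is active).
  return std::system("where /q cl.exe") == 0;
}

bool try_fuse_kernel(/* graph, compiled-kernel out-params, ... */) {
  if (!msvc_available()) {
    // A fuller version would first try to activate the VS dev env via vswhere.
    return false;  // caller runs the unfused graph instead
  }
  // ... emit C++ source, invoke cl.exe, load the resulting DLL ...
  return true;
}
```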

pytorchbot added the module: tests (Issues related to tests (not the torch.testing module)) label Sep 11, 2019
@peterjc123 (Collaborator, Author)

@pytorchbot rebase this please

peterjc123 changed the title from [WIP] Enable CPU fused kernel on Windows to Enable CPU fused kernel on Windows, Sep 11, 2019
@peterjc123 (Collaborator, Author)

@xsacha Sure, it should be fairly easy to support clang or any other compiler, but not in this PR, and we would need a code refactor, otherwise the code will look messy. As for Android, I think it should be just the same as on desktop OSes: the interpreter runs the operators when using jit script, and for jit fusion only gcc is supported.

@xsacha (Contributor)

I'm just worried about the fact that we need a compiler on deployed systems where we do inferencing (JIT is only for inferencing, right?).
Is there an alternative, such as ahead-of-time compilation with options like -mavx, -mavx2, etc.?

@peterjc123 (Collaborator, Author)

@xsacha Yes, I agree with you that we should use some lightweight cross-platform compiler, like the llvmlite that numba uses.

@ezyang (Contributor) left a comment

This is very nice work. Inclusion of LGPL code is a blocker; we'll have to find an implementation somewhere else. I think my only other major concern is in-place mutation of environment variables in process.

Commits:
  • bug fix
  • Enable the jit tests on Windows
  • More fixes
  • Fix tempfile for Windows
  • More fixes
  • Minor fixes
  • add header
  • lint changes
  • Debugging stuff.....
  • dllexport
  • Change working dir to make git clean
  • Cleanup
  • Remove useless print
  • Fix lint
  • more lint fixes
  • More fixes
  • Fix comments.
@peterjc123 (Collaborator, Author)

@pytorchbot rebase this please


@peterjc123 (Collaborator, Author)

@ezyang Could you please take some time to review this PR?

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@ezyang merged this pull request in 2ce8c83.


Reviewers

@facebook-github-bot left review comments
@ezyang approved these changes
Awaiting requested review from @yf225
Awaiting requested review from @apaszke
Awaiting requested review from @goldsborough
Awaiting requested review from @zdevito
Awaiting requested review from @suo

Assignees

No one assigned

Labels

caffe2 · Merged · module: build (Build system issues) · module: ci (Related to continuous integration) · module: pybind (Related to our Python bindings / interactions with other Python libraries) · module: tests (Issues related to tests (not the torch.testing module)) · oncall: jit (Add this issue/PR to JIT oncall triage queue)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

7 participants

@peterjc123 @Immocat @ezyang @xsacha @facebook-github-bot @pytorchbot @mruberry
