@@ -13,7 +13,7 @@ programs, and can aid you in debugging.
1313How autograd encodes the history
1414--------------------------------
1515
16- Autograd is reverse automatic differentiation system. Conceptually,
16+ Autograd is a reverse automatic differentiation system. Conceptually,
1717autograd records a graph recording all of the operations that created
1818the data as you execute operations, giving you a directed acyclic graph
1919whose leaves are the input tensors and roots are the output tensors.
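
For illustration only, a rough sketch of this recording using standard PyTorch
calls might look like::

    import torch

    a = torch.randn(2, requires_grad=True)   # leaf (input) tensor
    b = torch.randn(2, requires_grad=True)   # leaf (input) tensor
    out = (a * b).sum()                      # root (output) of the recorded graph
    out.backward()                           # traverse the graph from roots to leaves
    print(a.grad)                            # d(out)/da, i.e. b
    print(b.grad)                            # d(out)/db, i.e. a
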
@@ -23,11 +23,11 @@ compute the gradients using the chain rule.
2323Internally, autograd represents this graph as a graph of
2424:class:`Function` objects (really expressions), which can be
2525:meth:`~torch.autograd.Function.apply` ed to compute the result of
26- evaluating the graph. When computing the forwards pass, autograd
26+ evaluating the graph. When computing the forward pass, autograd
2727simultaneously performs the requested computations and builds up a graph
2828representing the function that computes the gradient (the ``.grad_fn``
2929attribute of each :class:`torch.Tensor` is an entry point into this graph).
30- When the forwards pass is completed, we evaluate this graph in the
30+ When the forward pass is completed, we evaluate this graph in the
3131backwards pass to compute the gradients.
3232
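As a small illustrative sketch (standard attributes only), the entry points into
this graph can be inspected via ``grad_fn``::

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x.exp().sum()
    print(y.grad_fn)                 # e.g. <SumBackward0 object at ...>
    print(y.grad_fn.next_functions)  # edges leading further into the recorded graph
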
3333An important thing to note is that the graph is recreated from scratch at every
@@ -119,7 +119,7 @@ For more fine-grained exclusion of subgraphs from gradient computation,
119119there is setting the ``requires_grad`` field of a tensor.
120120
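For example (a minimal sketch using a plain ``nn.Linear`` module), freezing
parameters this way keeps them out of the recorded graph::

    import torch
    from torch import nn

    model = nn.Linear(4, 2)
    for p in model.parameters():
        p.requires_grad_(False)   # exclude these parameters from gradient computation

    out = model(torch.randn(1, 4))
    print(out.requires_grad)      # False: no input to the computation required grad
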
121121Below, in addition to discussing the mechanisms above, we also describe
122- evaluation mode (:meth:`nn.Module.eval()`), a method that is not actually used
122+ evaluation mode (:meth:`nn.Module.eval()`), a method that is not used
123123to disable gradient computation but, because of its name, is often mixed up with the three.
124124
125125Setting ``requires_grad``
@@ -164,16 +164,16 @@ of the module's parameters (which have ``requires_grad=True`` by default).
164164Grad Modes
165165^^^^^^^^^^
166166
167- Apart from setting ``requires_grad`` there are also three possible modes
168- enableable from Python that can affect how computations in PyTorch are
167+ Apart from setting ``requires_grad`` there are also three grad modes that can
168+ be selected from Python, which affect how computations in PyTorch are
169169processed by autograd internally: default mode (grad mode), no-grad mode,
170170and inference mode, all of which can be toggled via context managers and
171171decorators.
172172
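Roughly, toggling the modes looks like the following minimal sketch (using the
standard ``torch.no_grad`` and ``torch.inference_mode`` helpers)::

    import torch

    x = torch.randn(3, requires_grad=True)

    with torch.no_grad():            # no-grad mode
        y = x * 2
    print(y.requires_grad)           # False

    with torch.inference_mode():     # inference mode
        z = x * 2
    print(z.requires_grad)           # False

    @torch.no_grad()                 # the modes can also be used as decorators
    def double(t):
        return t * 2
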
173173Default Mode (Grad Mode)
174174^^^^^^^^^^^^^^^^^^^^^^^^
175175
176- The "default mode" is actually the mode we are implicitly in when no other modes like
176+ The "default mode" is the mode we are implicitly in when no other modes like
177177no-grad and inference mode are enabled. To be contrasted with
178178"no-grad mode" the default mode is also sometimes called "grad mode".
179179
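As a small sketch of what "default mode" means in practice::

    import torch

    print(torch.is_grad_enabled())   # True: no other mode is active
    x = torch.randn(2, requires_grad=True)
    y = x * 2
    print(y.requires_grad)           # True: the operation was recorded
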
@@ -237,7 +237,7 @@ For implementation details of inference mode see
237237Evaluation Mode (``nn.Module.eval()``)
238238^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
239239
240- Evaluation mode is not actually a mechanism to locally disable gradient computation.
240+ Evaluation mode is not a mechanism to locally disable gradient computation.
241241It is included here anyway because it is sometimes mistaken for such a mechanism.
242242
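A minimal sketch of the distinction (using a ``Dropout`` layer as an example of
behaviour that ``eval()`` does change)::

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
    model.eval()                     # only changes Dropout/BatchNorm behaviour
    x = torch.randn(1, 4, requires_grad=True)
    print(model(x).requires_grad)    # True: eval() does not disable grad tracking

    with torch.no_grad():            # gradients are disabled by a grad mode instead
        print(model(x).requires_grad)    # False
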
243243Functionally, ``module.eval()`` (or equivalently ``module.train(False)``) is completely
@@ -263,21 +263,21 @@ In-place operations with autograd
263263Supporting in-place operations in autograd is a hard matter, and we discourage
264264their use in most cases. Autograd's aggressive buffer freeing and reuse makes
265265it very efficient and there are very few occasions when in-place operations
266- actually lower memory usage by any significant amount. Unless you're operating
266+ lower memory usage by any significant amount. Unless you're operating
267267under heavy memory pressure, you might never need to use them.
268268
269269There are two main reasons that limit the applicability of in-place operations:
270270
2712711. In-place operations can potentially overwrite values required to compute
272272 gradients.
273273
274- 2. Every in-place operation actually requires the implementation to rewrite the
274+ 2. Every in-place operation requires the implementation to rewrite the
275275 computational graph. Out-of-place versions simply allocate new objects and
276276   keep references to the old graph, while in-place operations require
277277   changing the creator of all inputs to the :class:`Function` representing
278278 this operation. This can be tricky, especially if there are many Tensors
279279 that reference the same storage (e.g. created by indexing or transposing),
280-   and in-place functions will actually raise an error if the storage of
280+ and in-place functions will raise an error if the storage of
281281   modified inputs is referenced by any other :class:`Tensor`.
282282
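As a small sketch of the first point, modifying in place a value that an earlier
operation saved for backward makes the backward pass fail::

    import torch

    x = torch.randn(3, requires_grad=True)
    y = torch.sigmoid(x)   # sigmoid saves its output to compute its gradient
    y.mul_(2)              # in-place update of a value needed for backward
    # y.sum().backward()   # would raise a RuntimeError complaining that a variable
    #                      # needed for gradient computation was modified in place
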
283283In-place correctness checks
@@ -338,18 +338,18 @@ serializing all the backward calls in a specific order during execution
338338Non-determinism
339339^^^^^^^^^^^^^^^
340340
341- If you are calling ``backward()`` on multiple thread concurrently but with
342- shared inputs (i.e. Hogwild CPU training). Since parameters are automatically
343- shared across threads, gradient accumulation might become non-deterministic on
344- backward calls across threads, because two backward calls might access and try
345- to accumulate the same ``.grad`` attribute. This is technically not safe, and
346- it might result in racing condition and the result might be invalid to use.
341+ If you are calling ``backward()`` from multiple threads concurrently and have
342+ shared inputs (i.e. Hogwild CPU training), then non-determinism should be expected.
343+ This can occur because parameters are automatically shared across threads;
344+ as such, multiple threads may access and try to accumulate the same ``.grad``
345+ attribute during gradient accumulation. This is technically not safe, and
346+ it might result in a race condition and the result might be invalid to use.
347347
348- But this is expected pattern if you are using the multithreading approach to
349- drive the whole training process but using shared parameters, user who use
350- multithreading should have the threading model in mind and should expect this
351- to happen. User could use the functional API :func:`torch.autograd.grad` to
352- calculate the gradients instead of ``backward()`` to avoid non-determinism.
348+ Users developing multithreaded models featuring shared parameters should have the
349+ threading model in mind and should understand the issues described above.
350+
351+ The functional API :func:`torch.autograd.grad` may be used to calculate the
352+ gradients instead of ``backward()`` to avoid non-determinism.
353353
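A minimal sketch of the functional API mentioned above::

    import torch

    x = torch.randn(4, requires_grad=True)
    loss = (x ** 2).sum()
    (grad_x,) = torch.autograd.grad(loss, x)  # gradients are returned, not accumulated
    print(grad_x)                             # 2 * x
    print(x.grad)                             # None: .grad was never written to
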
354354Graph retaining
355355^^^^^^^^^^^^^^^
@@ -368,9 +368,9 @@ Thread Safety on Autograd Node
368368
369369Since Autograd allows the caller thread to drive its backward execution for
370370potential parallelism, it's important that we ensure thread safety on CPU with
371- parallel backwards that share part/whole of the GraphTask.
371+ parallel ``backward()`` calls that share part/whole of the GraphTask.
372372
373- Custom Python ``autograd.Function`` is automatically thread safe because of GIL.
373+ Custom Python ``autograd.Function``\s are automatically thread safe because of GIL.
374374For built-in C++ Autograd Nodes (e.g. AccumulateGrad, CopySlices) and custom
375375``autograd::Function``\s, the Autograd Engine uses thread mutex locking to ensure
376376thread safety on autograd Nodes that might have state write/read.
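
For example, a rough sketch of several threads each driving their own backward
pass (disjoint graphs, so no shared-gradient issues arise)::

    import threading

    import torch

    def train_fn():
        x = torch.ones(5, 5, requires_grad=True)
        y = ((x + 3) * (x + 4) * 0.5).sum()
        y.backward()   # each thread drives backward for its own graph

    threads = [threading.Thread(target=train_fn) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
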
@@ -440,8 +440,8 @@ It also turns out that no interesting real-valued objective fulfill the
441441Cauchy-Riemann equations. So the theory with holomorphic functions cannot be
441441used for optimization and most people therefore use the Wirtinger calculus.
442442
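For reference, writing :math:`f(z) = u(x, y) + i v(x, y)` with :math:`z = x + iy`,
the Cauchy-Riemann equations require

.. math::
    \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
    \qquad
    \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

and a real-valued objective has :math:`v \equiv 0`, which forces both partial
derivatives of :math:`u` to vanish, i.e. the function is locally constant.
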
443- Wirtinger Calculus comes in picture ...
444- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
443+ Wirtinger Calculus comes into the picture ...
444+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
445445
446446So, we have this great theory of complex differentiability and
447447holomorphic functions, and we can’t use any of it at all, because many