chainer.Variable
- class chainer.Variable(data=None, *, name=None, grad=None, requires_grad=True)
Array with a structure to keep track of computation.
Every variable holds a data array of type either numpy.ndarray or cupy.ndarray.

A variable object holds a data array and a VariableNode object of a computational graph. If the variable is constructed by the user, the node is a root and does not hold any parent. If the variable is constructed by a FunctionNode object (i.e., by calling functions under chainer.functions or user-defined functions), or by using operators (see the list below), the node holds a reference to its parent called creator_node. This reference is used in backpropagation to backtrack the graph.

Users can disable (resp. enable) this chaining behavior by calling no_backprop_mode() (resp. force_backprop_mode()). In the former context, a variable never creates a computational graph, whereas in the latter context, it is forced to create one.

Note
The following operators are defined for variable(s).

- Indexing: a[slices] (__getitem__())
- Addition: a + b (__add__(), __radd__())
- Subtraction: a - b (__sub__(), __rsub__())
- Multiplication: a * b (__mul__(), __rmul__())
- Division: a / b (__div__(), __rdiv__(), __truediv__(), __rtruediv__())
- Floor Division: a // b (__floordiv__(), __rfloordiv__())
- Exponentiation: a ** b (__pow__(), __rpow__())
- Matrix Multiplication: a @ b (__matmul__(), __rmatmul__())
- Negation (Arithmetic): -a (__neg__())
- Absolute value: abs(a) (__abs__())
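A minimal sketch of how these operators chain a computational graph, using only the public API described on this page (values in comments are approximate):

import numpy as np
import chainer

x = chainer.Variable(np.array([1.0, 2.0, 3.0], dtype=np.float32))
y = 2.0 * x + 1.0            # __rmul__ and __add__ record creator nodes
z = (y * y).mean()           # each operator extends the graph
z.backward()                 # backtracks the graph via creator_node references
print(x.grad)                # (4/3) * (2 * x + 1) = [4., 6.67, 9.33] (approx.)

with chainer.no_backprop_mode():
    w = 2.0 * x + 1.0        # no graph is recorded in this context
print(w.creator is None)     # True: w is a root node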
- Parameters
data (N-dimensional array) – Initial data array.
name (str) – Name of the variable.
grad (N-dimensional array) – Initial gradient array.
requires_grad (bool) – Boolean indicating whether grad will be set in backward calculation.
Methods
- __getitem__(slices)
Extract elements from array with specified shape, axes and offsets.
- Parameters
x (Variable or N-dimensional array) – A variable to be sliced.
slices (int, slice, Ellipsis, None, integer array-like, boolean array-like or tuple of them) – An object to specify the selection of elements.
- Returns
A Variable object which contains the sliced array of x.
Note
It only supports the types that are supported by CUDA’s atomicAdd when an integer array is included in slices. The supported types are numpy.float32, numpy.int32, numpy.uint32, numpy.uint64 and numpy.ulonglong.

Note

It does not support slices that contain multiple boolean arrays.

Note

See NumPy documentation for details of indexing.
Example
>>> x = np.arange(12).reshape((2, 2, 3))
>>> x
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
>>> F.get_item(x, 0)
variable([[0, 1, 2],
          [3, 4, 5]])
>>> F.get_item(x, (0, 0, slice(0, 2, 1)))  # equals x[0, 0, 0:2:1]
variable([0, 1])
>>> F.get_item(x, (Ellipsis, 2))  # equals x[..., 2]
variable([[ 2,  5],
          [ 8, 11]])
>>> F.get_item(x, (1, np.newaxis, 1, 0))  # equals x[1, None, 1, 0]
variable([9])
- __len__()
Returns the first dimension of the data array.
- Returns
Number of the first dimension of the data array.
- Return type
int
- addgrad(var)
Accumulates the gradient array from given source variable.
This method adds the gradient of a given variable to the gradient of this variable. The accumulation is even done across the host and different devices. If this variable has uninitialized data/grad arrays, this method initializes it with the shape of the given variable and then accumulates the gradient.
- Parameters
var (Variable) – Source variable.
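A minimal illustrative sketch of addgrad():

import numpy as np
import chainer

a = chainer.Variable(np.zeros(3, dtype=np.float32))
b = chainer.Variable(np.zeros(3, dtype=np.float32))
a.grad = np.array([1.0, 1.0, 1.0], dtype=np.float32)
b.grad = np.array([0.5, 0.5, 0.5], dtype=np.float32)
a.addgrad(b)                 # accumulates b's gradient into a's gradient
print(a.grad)                # [1.5 1.5 1.5]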
- backward(retain_grad=False, enable_double_backprop=False, loss_scale=None)
Runs error backpropagation (a.k.a. backprop) from this variable.
On backprop, FunctionNode.backward() is called on each FunctionNode object appearing in the backward graph starting from this variable. The backward graph is represented by backward references from variable nodes to their creators, and from function nodes to their input variable nodes. The backprop stops at all root nodes. Some function nodes set None as gradients of some inputs, where further backprop does not take place at such inputs.

This method uses grad as the initial error array. Users can manually set a gradient array before calling this method. If the shape of data is () (i.e., it is scalar) and grad is None, then this method automatically complements 1.0 as the initial error. This is useful on starting backprop from some scalar loss value.

From v3, this method supports differentiable backprop (a.k.a. double backprop, grad of grads). To enable it, pass enable_double_backprop=True.

- Parameters
retain_grad (bool) – If True, the gradient arrays of all intermediate variables are kept. Otherwise, grad of the intermediate variables are set to None on appropriate timing, which may reduce the maximum memory consumption. In most cases of training some models, the purpose of backprop is to compute gradients of parameters, not of all variables, and therefore it is recommended that this flag be set to False.
enable_double_backprop (bool) – (Added in v3.0) If True, the computational trace of the whole backpropagation procedure is recorded to the computational graph so that one can further do backpropagation from the resulting gradients. Note that enabling it results in larger memory consumption needed to store the gradients w.r.t. intermediate variables that are required for the second gradient computation.
loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing gradient issue that tends to happen when a low precision data type like float16 is used during training. If you set a loss scaling factor, gradients of loss values are multiplied by the factor before backprop starts. The factor is propagated to the whole gradients in a computational graph along the backprop. The gradients of parameters are divided by the factor just before the parameters are updated.
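A minimal sketch of backward() from a scalar loss, including double backprop via chainer.grad() (illustrative):

import numpy as np
import chainer
import chainer.functions as F

x = chainer.Variable(np.array([1.0, -2.0, 3.0], dtype=np.float32))
loss = F.sum(x * x)          # scalar, so the initial error is seeded with 1.0
loss.backward()
print(x.grad)                # 2 * x = [ 2. -4.  6.]

x.cleargrad()
loss = F.sum(x * x * x)
loss.backward(enable_double_backprop=True)
gx = x.grad_var              # gradient as a Variable, still on the graph
gxx = chainer.grad([F.sum(gx)], [x])[0]
print(gxx.array)             # second derivative 6 * x = [  6. -12.  18.]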
- copydata(var)
Copies the data array from given source variable.
This method copies the data array from the given variable to this variable. The copy is done even if the arrays reside on different devices, including across the host and a GPU device. If this variable has an uninitialized data array, this method initializes it by the data array of the given variable. Similarly, if the given variable has an uninitialized data array, this method initializes it by the data array of this variable (self). If both are uninitialized, this method does nothing.

- Parameters
var (Variable) – Source variable.
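A minimal illustrative sketch of copydata():

import numpy as np
import chainer

src = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
dst = chainer.Variable(np.zeros(2, dtype=np.float32))
dst.copydata(src)            # copies values; also works across devices
print(dst.array)             # [1. 2.]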
- from_chx()
Converts the array and gradient to non-ChainerX arrays without copy.
This method converts the underlying ChainerX array and gradient residing in either a native or cuda device to NumPy or CuPy arrays respectively, on their same physical device. It does nothing if the array held by the Variable object is not a ChainerX array. The new array is a view of the original one.

Raises an error if such a conversion is not supported for the device.
- item()
Converts the variable with one element to a Python scalar.
This will incur host-device synchronization.
- mean(axis=None, *, weights=None, keepdims=False)
Calculate weighted average of array elements over a given axis.
See also
chainer.functions.average() for full documentation.
- reshape(*shape)
Returns a variable of a different shape and the same content.
See also
chainer.functions.reshape() for full documentation.
- set_creator(gen_func)
Notifies the variable that the given function is its creator.
- Parameters
gen_func (Function) – Function object that creates this variable as one of its outputs.
- set_creator_node(fnode)
Notifies the variable that the given node is its creator.
- Parameters
fnode (FunctionNode) – Function node that has this variable as an output.
- to_chx()
Converts the array and gradient to ChainerX arrays without copy.
This method converts the underlying array and gradient to chainerx.ndarray on the same physical device. It does nothing if the array held by the Variable object is already a ChainerX array. The new array is a view of the original one.
- to_device(device)
Copies the data and gradient arrays to specified device.
- Parameters
device – Target device specifier. See get_device() for available values.
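An illustrative sketch using device specifier strings; the GPU line assumes CuPy is installed:

import numpy as np
import chainer

v = chainer.Variable(np.ones(3, dtype=np.float32))
v.to_device('@numpy')        # no-op here: already NumPy-backed
# v.to_device('@cupy:0')     # would move data and grad to GPU 0
print(v.xp is np)            # True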
- to_gpu(device=None)
Copies the data and gradient arrays to specified GPU.
- Parameters
device – Target device specifier. If omitted, the current device is used.
- to_intel64()
Copies the data and gradient arrays to intel64-specific mdarray.
If the array is not suited for intel64, it will be converted to
numpy.ndarray.
- transpose(*axes)
Permute the dimensions of an input variable without copy.
See also
chainer.functions.transpose() for full documentation.
- unchain()
Deletes the reference to the creator of this variable.
This method deletes the reference to the creator from the corresponding variable node. Unlike unchain_backward(), it does not backtrack the graph.

This method is equivalent to self.creator_node = None.
- unchain_backward()
Deletes references between variable nodes and functions backward.
After this method completes, intermediate variable nodes and functions that are not referenced from anywhere are deallocated by reference-count GC. Also, this variable itself deletes the reference to its creator function from the node, i.e., the node becomes a root in the computation graph. It indicates that backprop after unchaining stops at this variable. This behavior is useful to implement truncated BPTT, as sketched below.
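A runnable sketch of truncated BPTT built on unchain_backward(); the tiny LSTM link and random data are illustrative stand-ins:

import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

rnn = L.LSTM(4, 4)               # toy recurrent link with internal state
optimizer = optimizers.SGD()
optimizer.setup(rnn)

loss = 0
for step in range(20):
    x = np.random.rand(1, 4).astype(np.float32)
    loss += F.sum(rnn(x) ** 2)   # toy per-step loss
    if (step + 1) % 5 == 0:      # truncate the history every 5 steps
        rnn.cleargrads()
        loss.backward()          # backprop through the last 5 steps only
        loss.unchain_backward()  # cut the graph; older nodes are freed
        optimizer.update()
        loss = 0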
- zerograd()
Initializes the gradient array by zeros.
Note that the gradient variable is unchained from the computational graph by this method, because this operation breaks the backprop validity.
Deprecated since version v1.15: Use the more efficient cleargrads() instead.
Attributes
- T
Transposition of this variable.
- array
The underlying data array.
It is either a numpy.ndarray or cupy.ndarray object, or None if the variable is in an uninitialized state.
- chx_array
A view of the raw ChainerX array.
In contrast to Variable.array, which is always disconnected, the array represented by this attribute may be connected to the computational graph.

It is a view, so it has a distinct gradient from the original array.
If this attribute is queried on a
Variable with a non-ChainerX array, ValueError will be raised.
- creator
Function implementation that created this variable.
When this variable has been created by an old-style function (i.e., it is implemented as a subclass of Function), this property returns that Function object.

When this variable has been created by a new-style function (i.e., it is implemented as a subclass of the FunctionNode class), this property returns that node object.
- creator_node
FunctionNode object that created this variable.

This property has a setter to which None can be set. Setting None to this property is equivalent to calling unchain(); it purges the variable from the function that created this variable. The setter also accepts the original FunctionNode object that created this variable. For example, you can once set None to this property and then set the original value again.

Note
Setting an irrelevant FunctionNode object does not emit any error immediately, whereas the behavior is undefined. Do not set a FunctionNode object that did not create this variable object.
- data
The underlying data array (equivalent to array).

Note that using this attribute directly is discouraged; use array instead. Using array, you can find an error earlier when your code mixes up Variable and ndarray, because ndarray does not have an attribute .array while it has .data.
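An illustrative check of why array is preferred: a plain numpy.ndarray also has a .data attribute (its memory buffer), so confusing the two types can go unnoticed with .data but fails fast with .array:

import numpy as np
import chainer

v = chainer.Variable(np.ones(3, dtype=np.float32))
a = np.ones(3, dtype=np.float32)
print(type(v.array))             # <class 'numpy.ndarray'>
print(type(a.data))              # <class 'memoryview'>: silently "works"
# a.array                        # AttributeError: surfaces the bug immediately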
- device
Device on which the data array of this variable resides.
- dtype
- grad
Gradient array of this variable.
Note that this property returns the underlying array of the gradient variable instead of the gradient variable itself; to get/set the gradient variable, use grad_var instead.

If the underlying array is a chainerx.ndarray and requires_grad is false, trying to access the gradient will result in an error.
- grad_var
Gradient variable.
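An illustrative contrast between grad and grad_var:

import numpy as np
import chainer
import chainer.functions as F

x = chainer.Variable(np.array([2.0], dtype=np.float32))
F.sum(x * x).backward()
print(type(x.grad))              # <class 'numpy.ndarray'> (raw array)
print(type(x.grad_var))          # <class 'chainer.variable.Variable'>
print(x.grad)                    # [4.]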
- label
Short text that represents the variable.
- layout
- name
- ndim
- node
- rank
- raw_array
The underlying raw data array.
Its shape does not have to be the semantic shape, if the memory layout is non-standard.
- requires_grad
It indicates that grad will be set in backward calculation.
- shape
- size
- xp
Array module for the data array of this variable.
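A minimal sketch of device-agnostic code via xp (illustrative):

import numpy as np
import chainer

v = chainer.Variable(np.arange(3, dtype=np.float32))
xp = v.xp                        # numpy here; cupy for a GPU-backed variable
norm = xp.sqrt(xp.sum(v.array ** 2))
print(norm)                      # 2.236068 (sqrt of 0 + 1 + 4)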