chainer.Variable

class chainer.Variable(data=None, *, name=None, grad=None, requires_grad=True)[source]

Array with a structure to keep track of computation.

Every variable holds a data array of type either numpy.ndarray or cupy.ndarray.

A variable object holds a data array and a VariableNode object of a computational graph. If the variable is constructed by the user, the node is root and does not hold any parent. If the variable is constructed by a FunctionNode object (i.e., by calling functions under chainer.functions or user-defined functions), or by using operators (see the list below), the node holds a reference to its parent called creator_node. This reference is used in backpropagation to backtrack the graph.

Users can disable (resp. enable) this chaining behavior by calling no_backprop_mode() (resp. force_backprop_mode()). In the former context, a variable never creates a computational graph, whereas in the latter context, it is forced to create one.
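Example

A minimal sketch of toggling graph construction, assuming the usual imports:

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> x = chainer.Variable(np.array([1.0], dtype=np.float32))
>>> with chainer.no_backprop_mode():
...     y = F.sin(x)  # no graph is recorded
>>> y.creator is None
True
>>> with chainer.no_backprop_mode():
...     with chainer.force_backprop_mode():
...         z = F.sin(x)  # graph is recorded again
>>> z.creator is None
False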

Note

Arithmetic and matrix operators (e.g., +, -, *, /, **, @) are defined for variables; see the corresponding special methods listed under Methods below.

Parameters
  • data (N-dimensional array) – Initial data array.

  • name (str) – Name of the variable.

  • grad (N-dimensional array) – Initial gradient array.

  • requires_grad (bool) – Boolean indicating whether grad will be set in backward calculation.
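
Example

A minimal construction sketch (assuming import numpy as np and import chainer):

>>> x = chainer.Variable(np.zeros((2, 3), dtype=np.float32), name='x')
>>> x.shape
(2, 3)
>>> x.requires_grad
True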

Methods

__getitem__(slices)[source]

Extract elements from array with specified shape, axes and offsets.

Parameters
  • x (Variable or N-dimensional array) – A variable to be sliced.

  • slices (int, slice, Ellipsis, None, integer array-like, boolean array-like or tuple of them) – An object to specify the selection of elements.

Returns

A Variable object which contains the sliced array of x.

Note

It only supports types that are supported by CUDA’s atomicAdd when an integer array is included in slices. The supported types are numpy.float32, numpy.int32, numpy.uint32, numpy.uint64 and numpy.ulonglong.

Note

It does not support slices that contain multiple boolean arrays.

Note

See the NumPy documentation for details of indexing.

Example

>>> x = np.arange(12).reshape((2, 2, 3))
>>> x
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
>>> F.get_item(x, 0)
variable([[0, 1, 2],
          [3, 4, 5]])
>>> F.get_item(x, (0, 0, slice(0, 2, 1)))  # equals x[0, 0, 0:2:1]
variable([0, 1])
>>> F.get_item(x, (Ellipsis, 2))  # equals x[..., 2]
variable([[ 2,  5],
          [ 8, 11]])
>>> F.get_item(x, (1, np.newaxis, 1, 0))  # equals x[1, None, 1, 0]
variable([9])

__len__()[source]

Returns the first dimension of the data array.

Returns

Number of the first dimension of the data array.

Return type

int

__copy__()[source]
addgrad(var)[source]

Accumulates the gradient array from the given source variable.

This method adds the gradient of the given variable to the gradient of this variable. The accumulation is done even across the host and different devices. If this variable has uninitialized data/grad arrays, this method initializes them with the shape of the given variable and then accumulates the gradient.

Parameters

var (Variable) – Source variable.
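
Example

A minimal sketch of gradient accumulation (assuming the imports above):

>>> a = chainer.Variable(np.zeros(3, dtype=np.float32))
>>> a.grad = np.array([1., 2., 3.], dtype=np.float32)
>>> b = chainer.Variable(np.zeros(3, dtype=np.float32))
>>> b.grad = np.array([10., 20., 30.], dtype=np.float32)
>>> a.addgrad(b)
>>> a.grad
array([11., 22., 33.], dtype=float32)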

as_layout(layout)[source]
backward(retain_grad=False, enable_double_backprop=False, loss_scale=None)[source]

Runs error backpropagation (a.k.a. backprop) from this variable.

On backprop, FunctionNode.backward() is called on each FunctionNode object appearing in the backward graph starting from this variable. The backward graph is represented by backward references from variable nodes to their creators, and from function nodes to their input variable nodes. The backprop stops at all root nodes. Some function nodes set None as gradients of some inputs, in which case further backprop does not take place at those inputs.

This method uses grad as the initial error array. The user can manually set a gradient array before calling this method. If the shape of data is () (i.e., it is a scalar) and grad is None, then this method automatically uses 1.0 as the initial error. This is useful when starting backprop from some scalar loss value (see the example after the parameter list).

From v3, this method supports differentiable backprop (a.k.a. double backprop, grad of grads). To enable it, pass enable_double_backprop=True.

Parameters
  • retain_grad (bool) –

    If True, the gradient arrays of all intermediate variables are kept. Otherwise, grad of the intermediate variables is set to None at the appropriate timing, which may reduce the maximum memory consumption.

    In most cases of training models, the purpose of backprop is to compute gradients of parameters, not of all variables, and therefore it is recommended that this flag be set to False.

  • enable_double_backprop (bool) – (Added in v3.0) If True, the computational trace of the whole backpropagation procedure is recorded to the computational graph so that one can further do backpropagation from the resulting gradients. Note that enabling it results in larger memory consumption needed to store the gradients w.r.t. intermediate variables that are required for the second gradient computation.

  • loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing gradient issue that tends to happen when a low-precision data type like float16 is used during training. If you set a loss scaling factor, gradients of loss values are multiplied by the factor before backprop starts. The factor is propagated to the whole gradient in the computational graph along the backprop. The gradients of parameters are divided by the factor just before the parameters are updated.
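
Example

A minimal sketch of backprop from a scalar loss (assuming np and F as imported above):

>>> x = chainer.Variable(np.array([1., 2., 3.], dtype=np.float32))
>>> y = F.sum(x ** 2)  # scalar loss
>>> y.backward()       # initial error defaults to 1.0 because y is a scalar
>>> x.grad
array([2., 4., 6.], dtype=float32)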

cleargrad()[source]

Clears the gradient array.

copydata(var)[source]

Copies the data array from the given source variable.

This method copies the data array from the given variable to this variable. The copy is done even if the arrays reside on different devices, including across the host and a GPU device. If this variable has an uninitialized data array, this method initializes it with the data array of the given variable. Similarly, if the given variable has an uninitialized data array, this method initializes it with the data array of this variable (self). If both are uninitialized, this method does nothing.

Parameters

var (Variable) – Source variable.
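
Example

A minimal copy sketch (assuming the imports above):

>>> src = chainer.Variable(np.arange(3, dtype=np.float32))
>>> dst = chainer.Variable(np.zeros(3, dtype=np.float32))
>>> dst.copydata(src)
>>> dst.array
array([0., 1., 2.], dtype=float32)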

debug_print()[source]

Displays a summary of the stored data and the location of the Variable.

from_chx()[source]

Converts the array and gradient to non-ChainerX arrays without copy.

This method converts the underlying ChainerX array and gradient residing in either a native or cuda device to NumPy or CuPy arrays, respectively, on the same physical device. It does nothing if the array held by the Variable object is not a ChainerX array. The new array is a view of the original one.

Raises an error if such a conversion is not supported for the device.

item()[source]

Converts the variable with one element to a Python scalar.

This will incur host-device synchronization.

Returns

The element of the array.

Return type

int or float
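
Example

A minimal sketch (assuming the imports above):

>>> x = chainer.Variable(np.array([3.5], dtype=np.float32))
>>> x.item()
3.5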

mean(axis=None,*,weights=None,keepdims=False)[source]

Calculates the weighted average of array elements over a given axis.

See also

chainer.functions.average() for full documentation.

reshape(*shape)[source]

Returns a variable of a different shape and the same content.

See also

chainer.functions.reshape() for full documentation.
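
Example

A minimal reshape sketch (assuming the imports above):

>>> x = chainer.Variable(np.arange(6, dtype=np.float32))
>>> y = x.reshape(2, 3)
>>> y.shape
(2, 3)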

retain_data()[source]

Lets the corresponding variable node keep the underlying array.

set_creator(gen_func)[source]

Notifies the variable that the given function is its creator.

Parameters

gen_func (Function) – Function object that creates this variable as one of its outputs.

set_creator_node(fnode)[source]

Notifies the variable that the given node is its creator.

Parameters

fnode (FunctionNode) – Function node that has this variable as an output.

summary()[source]
to_chx()[source]

Converts the array and gradient to ChainerX arrays without copy.

This method converts the underlying array and gradient to chainerx.ndarray on the same physical device. It does nothing if the array held by the Variable object is already a ChainerX array. The new array is a view of the original one.

to_cpu()[source]

Copies the data and gradient arrays to CPU.

to_device(device)[source]

Copies the data and gradient arrays to the specified device.

Parameters

device – Target device specifier. See get_device() for available values.

to_gpu(device=None)[source]

Copies the data and gradient arrays to the specified GPU.

Parameters

device – Target device specifier. If omitted, the current device is used.

to_intel64()[source]

Copies the data and gradient arrays to an intel64-specific mdarray.

If the array is not suited for intel64, it will be converted to numpy.ndarray.

transpose(*axes)[source]

Permute the dimensions of an input variable without copy.

See also

chainer.functions.transpose() for full documentation.

unchain()[source]

Deletes the reference to the creator of this variable.

This method deletes the reference to the creator from the corresponding variable node. Unlike unchain_backward(), it does not backtrack the graph.

This method is equivalent to self.creator_node = None.

unchain_backward()[source]

Deletes references between variable nodes and functions backward.

After this method completes, intermediate variable nodes and functions that are not referenced from anywhere are deallocated by reference-count GC. Also, this variable itself deletes the reference to its creator function from the node, i.e., the node becomes a root in the computation graph. It indicates that backprop after unchaining stops at this variable. This behavior is useful to implement truncated BPTT, as sketched below.
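
Example

A minimal sketch of truncated BPTT; model, optimizer, and sequence are hypothetical placeholders, not part of this API:

loss = 0
for i, (x, t) in enumerate(sequence):
    loss += model(x, t)            # accumulate loss over time steps
    if (i + 1) % 30 == 0:          # truncate the graph every 30 steps
        model.cleargrads()
        loss.backward()
        loss.unchain_backward()    # backprop will stop here next time
        optimizer.update()
        loss = 0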

zerograd()[source]

Initializes the gradient array by zeros.

Note that the gradient variable is unchained from the computational graph by this method, because this operation breaks the backprop validity.

Deprecated since version v1.15: Use the more efficient cleargrad() instead.

__eq__(other)[source]

This operator is not supported in Variables.

__ne__(other)[source]

This operator is not supported in Variables.

__lt__(other)[source]

This operator is not supported in Variables.

__le__(other)[source]

This operator is not supported in Variables.

__gt__(other)[source]

This operator is not supported in Variables.

__ge__(other)[source]

This operator is not supported in Variables.

__nonzero__()[source]

This operator is not supported in Variables.

__bool__()[source]

This operator is not supported in Variables.

__neg__()[source]

Element-wise negation.

Returns

Output variable.

Return type

Variable

__abs__()[source]

Element-wise absolute.

Returns

Output variable.

Return type

Variable

__add__()[source]

Element-wise addition.

Returns

Output variable.

Return type

Variable

__radd__()[source]

Element-wise addition.

Returns

Output variable.

Return type

Variable

__sub__(rhs)[source]

Element-wise subtraction.

Returns

Output variable.

Return type

Variable

__rsub__(rhs)[source]

Element-wise subtraction.

Returns

Output variable.

Return type

Variable

__mul__(rhs)[source]

Element-wise multiplication.

Returns

Output variable.

Return type

Variable

__rmul__(rhs)[source]

Element-wise multiplication.

Returns

Output variable.

Return type

Variable

__div__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__truediv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__rdiv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__rtruediv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__floordiv__(rhs)[source]

Element-wise floor division.

Returns

Output variable.

Return type

Variable

__rfloordiv__(rhs)[source]

Element-wise floor division.

Returns

Output variable.

Return type

Variable

__pow__(rhs)[source]

Element-wise power function.

Returns

Output variable.

Return type

Variable

__rpow__(rhs)[source]

Element-wise power function.

Returns

Output variable.

Return type

Variable

__matmul__(rhs)[source]

Matrix multiplication.

Returns

Output variable.

Return type

Variable

__rmatmul__(rhs)[source]

Matrix multiplication.

Returns

Output variable.

Return type

Variable

Attributes

T

Transposition of this variable.

array

The underlying data array.

It is either a numpy.ndarray or cupy.ndarray object, or None if the variable is in an uninitialized state.

chx_array

A view of the raw ChainerX array.

In contrast to Variable.array, which is always disconnected, the array represented by this attribute may be connected to the computational graph.

It is a view, so it has a distinct gradient from the original array.

If this attribute is queried on a Variable with a non-ChainerX array, a ValueError will be raised.

creator

Function implementation that created this variable.

When this variable has been created by an old-style function (i.e., it is implemented as a subclass of Function), this property returns that Function object.

When this variable has been created by a new-style function (i.e., it is implemented as a subclass of the FunctionNode class), this property returns that node object.

creator_node

FunctionNode object that created this variable.

This property has a setter to which None can be set. Setting None to this property is equivalent to calling unchain(); it purges the variable from the function that created this variable.

The setter also accepts the original FunctionNode object that created this variable. For example, you can set None to this property once and then set the original value again.

Note

Setting an irrelevant FunctionNode object does not emit any error immediately, but the behavior is undefined. Do not set a FunctionNode object that did not create this variable.

data

The underlying data array (equivalent to array).

Note that using this attribute directly is discouraged; use array instead. Using array, you can find an error earlier when your code mixes up Variable and ndarray, because ndarray does not have an attribute .array while it does have .data.

device

Device on which the data array of this variable resides.

dtype
grad

Gradient array of this variable.

Note that this property returns the underlying array of the gradient variable instead of the gradient variable itself; to get/set the gradient variable, use grad_var instead.

If the underlying array is a chainerx.ndarray and requires_grad is false, trying to access the gradient will result in an error.

grad_var

Gradient variable.
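
Example

A minimal sketch contrasting grad and grad_var (assuming np and F as imported above):

>>> x = chainer.Variable(np.array([1.], dtype=np.float32))
>>> y = F.sum(x * 3)
>>> y.backward()
>>> type(x.grad)
<class 'numpy.ndarray'>
>>> type(x.grad_var)
<class 'chainer.variable.Variable'>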

label

Short text that represents the variable.

layout
name
ndim
node
rank
raw_array

The underlying raw data array.

Its shape does not have to be the semantic shape if the memory layout is non-standard.

requires_grad

It indicates that grad will be set in backward calculation.

shape
size
xp

Array module for the data array of this variable.
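
Example

A minimal sketch of device-agnostic code via xp; the helper squared_norm is hypothetical:

>>> def squared_norm(v):
...     xp = v.xp  # numpy or cupy, depending on where the data lives
...     return xp.sum(v.array ** 2)
>>> x = chainer.Variable(np.arange(3, dtype=np.float32))
>>> squared_norm(x)
5.0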