Matrix calculus

Specialized notation for multivariable calculus
Not to be confused with geometric calculus or vector calculus.

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both of these conventions are possible even when the common assumption is made that vectors should be treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific conventions were standard. Serious mistakes can result when combining results from different authors without carefully verifying that compatible notations have been used. Definitions of these two conventions and comparisons between them are collected in the layout conventions section.

Scope


Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable. In general, the independent variable can be a scalar, a vector, or a matrix while the dependent variable can be any of these as well. Each different situation will lead to a different set of rules, or a separate calculus, using the broader sense of the term. Matrix notation serves as a convenient way to collect the many derivatives in an organized way.

As a first example, consider the gradient from vector calculus. For a scalar function of three independent variables, $f(x_1, x_2, x_3)$, the gradient is given by the vector equation

$$\nabla f = \frac{\partial f}{\partial x_1}\hat{x}_1 + \frac{\partial f}{\partial x_2}\hat{x}_2 + \frac{\partial f}{\partial x_3}\hat{x}_3,$$

where $\hat{x}_i$ represents a unit vector in the $x_i$ direction for $1 \leq i \leq 3$. This type of generalized derivative can be seen as the derivative of a scalar, $f$, with respect to a vector, $\mathbf{x}$, and its result can be easily collected in vector form.

$$\nabla f = \left(\frac{\partial f}{\partial \mathbf{x}}\right)^{\mathsf{T}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3} \end{bmatrix}^{\mathsf{T}}.$$
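
A quick numerical check of this relationship is possible with an automatic-differentiation library. The sketch below is illustrative only (it assumes the JAX library is available; the example function is arbitrary): it evaluates the gradient of a scalar function of three variables and compares it with the hand-computed partial derivatives.

```python
import jax
import jax.numpy as jnp

# f(x1, x2, x3) = x1^2 * x2 + 3*x3, a scalar function of three variables
def f(x):
    return x[0] ** 2 * x[1] + 3.0 * x[2]

x = jnp.array([1.0, 2.0, -1.0])
grad_f = jax.grad(f)(x)                # ∇f, laid out with the same shape as x
expected = jnp.array([2 * x[0] * x[1], x[0] ** 2, 3.0])
assert jnp.allclose(grad_f, expected)
```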

More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of each of the independent variables in the matrix. As another example, if we have an $n$-vector of dependent variables, or functions, of $m$ independent variables we might consider the derivative of the dependent vector with respect to the independent vector. The result could be collected in an $m \times n$ matrix consisting of all of the possible derivative combinations.

There are a total of nine possibilities using scalars, vectors, and matrices. Notice that as we consider higher numbers of components in each of the independent and dependent variables we can be left with a very large number of possibilities. The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table.[1]

Types of matrix derivative

  • By scalar $x$: scalar $\frac{\partial y}{\partial x}$; vector $\frac{\partial \mathbf{y}}{\partial x}$; matrix $\frac{\partial \mathbf{Y}}{\partial x}$
  • By vector $\mathbf{x}$: scalar $\frac{\partial y}{\partial \mathbf{x}}$; vector $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$
  • By matrix $\mathbf{X}$: scalar $\frac{\partial y}{\partial \mathbf{X}}$

Here, we have used the term "matrix" in its most general sense, recognizing that vectors are simply matrices with one column (and scalars are simply vectors with one row). Moreover, we have used bold letters to indicate vectors and bold capital letters for matrices. This notation is used throughout.

Notice that we could also talk about the derivative of a vector with respect to a matrix, or any of the other unfilled cells in our table. However, these derivatives are most naturally organized in a tensor of rank higher than 2, so that they do not fit neatly into a matrix. In the following three sections we will define each one of these derivatives and relate them to other branches of mathematics. See the layout conventions section for a more detailed table.
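
In terms of array shapes, only the six tabulated cases produce arrays of rank two or lower; the unfilled cells would need higher-rank arrays. The following sketch (illustrative, assuming the JAX library; the function is arbitrary) shows the vector-by-vector cell materializing as an ordinary matrix.

```python
import jax
import jax.numpy as jnp

# y: R^3 -> R^2, a vector-valued function of a vector
def y(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.ones(3)
J = jax.jacobian(y)(x)   # the vector-by-vector derivative
print(J.shape)           # (2, 3): an m-by-n matrix, m = 2 outputs, n = 3 inputs
```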

Relation to other derivatives


The matrix derivative is a convenient notation for keeping track of partial derivatives for doing calculations. The Fréchet derivative is the standard way in the setting of functional analysis to take derivatives with respect to vectors. In the case that a matrix function of a matrix is Fréchet differentiable, the two derivatives will agree up to translation of notations. As is the case in general for partial derivatives, some formulae may extend under weaker analytic conditions than the existence of the derivative as approximating linear mapping.

Usages


Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

  • the Kalman filter,
  • the Wiener filter, and
  • the expectation-maximization algorithm for Gaussian mixture models.

Notation


The vector and matrix derivatives presented in the sections to follow take full advantage of matrix notation, using a single variable to represent a large number of variables. In what follows we will distinguish scalars, vectors and matrices by their typeface. We will let $M(n,m)$ denote the space of real $n \times m$ matrices with $n$ rows and $m$ columns. Such matrices will be denoted using bold capital letters: $\mathbf{A}$, $\mathbf{X}$, $\mathbf{Y}$, etc. An element of $M(n,1)$, that is, a column vector, is denoted with a boldface lowercase letter: $\mathbf{a}$, $\mathbf{x}$, $\mathbf{y}$, etc. An element of $M(1,1)$ is a scalar, denoted with lowercase italic typeface: $a$, $t$, $x$, etc. $\mathbf{X}^{\mathsf{T}}$ denotes matrix transpose, $\operatorname{tr}(\mathbf{X})$ is the trace, and $\det(\mathbf{X})$ or $|\mathbf{X}|$ is the determinant. All functions are assumed to be of differentiability class $C^1$ unless otherwise noted. Generally letters from the first half of the alphabet (a, b, c, ...) will be used to denote constants, and from the second half (t, x, y, ...) to denote variables.

NOTE: As mentioned above, there are competing notations for laying out systems of partial derivatives in vectors and matrices, and no standard appears to be emerging yet. The next two introductory sections use the numerator layout convention simply for the purposes of convenience, to avoid overly complicating the discussion. The section after them discusses layout conventions in more detail. It is important to realize the following:

  1. Despite the use of the terms "numerator layout" and "denominator layout", there are actually more than two possible notational choices involved. The reason is that the choice of numerator vs. denominator (or in some situations, numerator vs. mixed) can be made independently for scalar-by-vector, vector-by-scalar, vector-by-vector, and scalar-by-matrix derivatives, and a number of authors mix and match their layout choices in various ways.
  2. The choice of numerator layout in the introductory sections below does not imply that this is the "correct" or "superior" choice. There are advantages and disadvantages to the various layout types. Serious mistakes can result from carelessly combining formulas written in different layouts, and converting from one layout to another requires care to avoid errors. As a result, when working with existing formulas the best policy is probably to identify whichever layout is used and maintain consistency with it, rather than attempting to use the same layout in all situations.

Alternatives


The tensor index notation with its Einstein summation convention is very similar to the matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. All of the work here can be done in this notation without use of the single-variable matrix notation. However, many problems in estimation theory and other areas of applied mathematics would result in too many indices to properly keep track of, pointing in favor of matrix calculus in those areas. Also, Einstein notation can be very useful in proving the identities presented here (see the section on differentiation) as an alternative to typical element notation, which can become cumbersome when the explicit sums are carried around. Note that a matrix can be considered a tensor of rank two.

Derivatives with vectors

Main article: Vector calculus

Because vectors are matrices with only one column, the simplest matrix derivatives are vector derivatives.

The notations developed here can accommodate the usual operations of vector calculus by identifying the space $M(n,1)$ of $n$-vectors with the Euclidean space $\mathbb{R}^n$, and the scalar $M(1,1)$ is identified with $\mathbb{R}$. The corresponding concept from vector calculus is indicated at the end of each subsection.

NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.

Vector-by-scalar


The derivative of a vector $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^{\mathsf{T}}$, by a scalar $x$ is written (in numerator layout notation) as

$$\frac{d\mathbf{y}}{dx} = \begin{bmatrix} \frac{dy_1}{dx} \\ \frac{dy_2}{dx} \\ \vdots \\ \frac{dy_m}{dx} \end{bmatrix}.$$

In vector calculus the derivative of a vector $\mathbf{y}$ with respect to a scalar $x$ is known as the tangent vector of the vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial x}$. Notice here that $\mathbf{y} \colon \mathbb{R}^1 \to \mathbb{R}^m$.

Example: Simple examples of this include the velocity vector in Euclidean space, which is the tangent vector of the position vector (considered as a function of time). Also, the acceleration is the tangent vector of the velocity.
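
The velocity example can be checked numerically; the sketch below is illustrative (it assumes the JAX library) and differentiates a position curve twice with respect to time.

```python
import jax
import jax.numpy as jnp

# Position of a particle moving on the unit circle, as a function of time t.
def position(t):
    return jnp.array([jnp.cos(t), jnp.sin(t)])

velocity = jax.jacfwd(position)       # tangent vector of the position vector
acceleration = jax.jacfwd(velocity)   # tangent vector of the velocity

t = 0.5
print(velocity(t))       # [-sin t,  cos t]
print(acceleration(t))   # [-cos t, -sin t]
```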

Scalar-by-vector


The derivative of a scalar $y$ by a vector $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$, is written (in numerator layout notation) as

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

In vector calculus, the gradient of a scalar field $f \colon \mathbb{R}^n \to \mathbb{R}$ (whose independent coordinates are the components of $\mathbf{x}$) is the transpose of the derivative of a scalar by a vector.

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} = \left(\frac{\partial f}{\partial \mathbf{x}}\right)^{\mathsf{T}}$$

For example, in physics, the electric field is the negative vector gradient of the electric potential.

The directional derivative of a scalar function $f(\mathbf{x})$ of the space vector $\mathbf{x}$ in the direction of the unit vector $\mathbf{u}$ (represented in this case as a column vector) is defined using the gradient as follows.

$$\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}$$

Using the notation just defined for the derivative of a scalar with respect to a vector we can re-write the directional derivative as $\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}$. This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.
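
For a numerical illustration of $\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}$, the sketch below (assuming the JAX library; the function is arbitrary) computes the same directional derivative once from the gradient and once as a Jacobian-vector product.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x ** 2)

x = jnp.array([1.0, 2.0, 3.0])
u = jnp.array([1.0, 0.0, 0.0])           # a unit direction vector

d1 = jnp.dot(jax.grad(f)(x), u)          # gradient dotted with u
_, d2 = jax.jvp(f, (x,), (u,))           # directional derivative computed directly
assert jnp.allclose(d1, d2)
```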

Vector-by-vector


Each of the previous two cases can be considered as an application of the derivative of a vector with respect to a vector, using a vector of size one appropriately. Similarly we will find that the derivatives involving matrices will reduce to derivatives involving vectors in a corresponding way.

The derivative of a vector function (a vector whose components are functions) $\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_m \end{bmatrix}^{\mathsf{T}}$, with respect to an input vector, $\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf{T}}$, is written (in numerator layout notation) as

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

In vector calculus, the derivative of a vector function $\mathbf{y}$ with respect to a vector $\mathbf{x}$ whose components represent a space is known as the pushforward (or differential), or the Jacobian matrix.

The pushforward along a vector function $\mathbf{f}$ with respect to vector $\mathbf{v}$ in $\mathbb{R}^n$ is given by $d\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{v}}\, d\mathbf{v}$.
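
In automatic differentiation this pushforward is the Jacobian-vector product. The sketch below (illustrative, assuming the JAX library) verifies that multiplying the explicit Jacobian by $d\mathbf{v}$ agrees with the pushforward computed without ever forming the Jacobian.

```python
import jax
import jax.numpy as jnp

def f(v):
    return jnp.array([v[0] * v[1], v[1] + v[2], jnp.exp(v[0])])

v = jnp.array([0.1, 0.2, 0.3])
dv = jnp.array([1.0, -1.0, 0.5])

J = jax.jacobian(f)(v)            # explicit Jacobian, numerator layout
_, df = jax.jvp(f, (v,), (dv,))   # pushforward df(v) = (∂f/∂v) dv
assert jnp.allclose(J @ dv, df)
```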

Derivatives with matrices


There are two types of derivatives with matrices that can be organized into a matrix of the same size. These are the derivative of a matrix by a scalar and the derivative of a scalar by a matrix. These can be useful in minimization problems found in many areas of applied mathematics and have adopted the names tangent matrix and gradient matrix respectively after their analogs for vectors.

Note: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.

Matrix-by-scalar


The derivative of a matrix function $\mathbf{Y}$ by a scalar $x$ is known as the tangent matrix and is given (in numerator layout notation) by

$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$
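
As an illustration (a sketch assuming the JAX library), the tangent matrix of a rotation matrix with respect to its angle is the element-wise derivative.

```python
import jax
import jax.numpy as jnp

# A 2x2 rotation matrix as a function of the scalar angle x.
def Y(x):
    return jnp.array([[jnp.cos(x), -jnp.sin(x)],
                      [jnp.sin(x),  jnp.cos(x)]])

x = 0.3
tangent = jax.jacfwd(Y)(x)   # ∂Y/∂x, taken element by element; shape (2, 2)
```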

Scalar-by-matrix


The derivative of a scalar function $y$, with respect to a $p \times q$ matrix $\mathbf{X}$ of independent variables, is given (in numerator layout notation) by

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

Important examples of scalar functions of matrices include the trace of a matrix and the determinant.

In analog with vector calculus this derivative is often written as the following.

$$\nabla_{\mathbf{X}} y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}$$

Also in analog with vector calculus, the directional derivative of a scalar $f(\mathbf{X})$ of a matrix $\mathbf{X}$ in the direction of matrix $\mathbf{Y}$ is given by

$$\nabla_{\mathbf{Y}} f = \operatorname{tr}\left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right).$$

It is the gradient matrix, in particular, that finds many uses in minimization problems in estimation theory, particularly in the derivation of the Kalman filter algorithm, which is of great importance in the field.
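
Reverse-mode automatic differentiation returns exactly this gradient matrix, in denominator layout (the result has the same shape as $\mathbf{X}$). A minimal sketch, assuming the JAX library:

```python
import jax
import jax.numpy as jnp

def f(X):
    return jnp.trace(X @ X.T)    # a scalar function of a matrix

X = jax.random.normal(jax.random.PRNGKey(0), (3, 4))
G = jax.grad(f)(X)               # gradient matrix, same shape as X
assert jnp.allclose(G, 2 * X)    # since d tr(X Xᵀ)/dX = 2X in denominator layout
```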

Other matrix derivatives


The three types of derivatives that have not been considered are those involving vectors-by-matrices, matrices-by-vectors, and matrices-by-matrices. These are not as widely considered and a notation is not widely agreed upon.

Layout conventions


This section discusses the similarities and differences between notational conventions that are used in the various fields that take advantage of matrix calculus. Although there are largely two consistent conventions, some authors find it convenient to mix the two conventions in forms that are discussed below. After this section, equations will be listed in both competing forms separately.

The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, is often written in two competing ways. If the numerator $\mathbf{y}$ is of size $m$ and the denominator $\mathbf{x}$ of size $n$, then the result can be laid out as either an $m \times n$ matrix or an $n \times m$ matrix, i.e. the $m$ elements of $\mathbf{y}$ laid out in rows and the $n$ elements of $\mathbf{x}$ laid out in columns, or vice versa. This leads to the following possibilities:

  1. Numerator layout, i.e. lay out according to $\mathbf{y}$ and $\mathbf{x}^{\top}$ (i.e. contrarily to $\mathbf{x}$). This is sometimes known as the Jacobian formulation. This corresponds to the $m \times n$ layout in the previous example, which means that the row count of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of the numerator $\mathbf{y}$ and the column count equals the size of $\mathbf{x}^{\top}$.
  2. Denominator layout, i.e. lay out according to $\mathbf{y}^{\top}$ and $\mathbf{x}$ (i.e. contrarily to $\mathbf{y}$). This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, gradient more commonly means the derivative $\frac{\partial y}{\partial \mathbf{x}}$, regardless of layout.) This corresponds to the $n \times m$ layout in the previous example, which means that the row count of $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ equals the size of $\mathbf{x}$ (the denominator).
  3. A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.

When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$, we have the same issues. To be consistent, we should do one of the following:

  1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.
  2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.
  3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.

Not all math textbooks and papers are consistent in this respect throughout. That is, sometimes different conventions are used in different contexts within the same book or paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.

Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, then consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^{\top}$, while consistent denominator layout lays out according to $\mathbf{Y}^{\top}$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$, and laying the result out according to $\mathbf{Y}^{\top}$, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:

  1. Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^{\top}$.
  2. Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.
  3. Use the notation $\frac{\partial y}{\partial \mathbf{X}'}$, with results the same as consistent numerator layout.

In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.

Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a $k \times 1$ column vector, then the result using the numerator layout will be in the form of a $1 \times k$ row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
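
Automatic-differentiation systems face the same choice and commonly mix conventions. The sketch below (illustrative only, assuming the JAX library) shows gradients following denominator layout (same shape as the input) while Jacobians follow numerator layout.

```python
import jax
import jax.numpy as jnp

def f(x):                        # scalar function of a vector
    return jnp.dot(x, x)

x = jnp.arange(3.0)
print(jax.grad(f)(x).shape)      # (3,): shaped like x, i.e. denominator layout

def y(x):                        # vector function of a vector
    return jnp.array([x[0], x[1] * x[2]])

print(jax.jacobian(y)(x).shape)  # (2, 3): m-by-n, i.e. numerator (Jacobian) layout
```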

Result of differentiating various kinds of aggregates with other kinds of aggregates

  • By scalar $x$:
      • scalar $y$: $\frac{\partial y}{\partial x}$ is a scalar (either layout)
      • column vector $\mathbf{y}$ (size $m \times 1$): $\frac{\partial \mathbf{y}}{\partial x}$ is a size-$m$ column vector (numerator layout) or a size-$m$ row vector (denominator layout)
      • matrix $\mathbf{Y}$ (size $m \times n$): $\frac{\partial \mathbf{Y}}{\partial x}$ is an $m \times n$ matrix (numerator layout; the denominator-layout form is rarely used)
  • By column vector $\mathbf{x}$ (size $n \times 1$):
      • scalar $y$: $\frac{\partial y}{\partial \mathbf{x}}$ is a size-$n$ row vector (numerator layout) or a size-$n$ column vector (denominator layout)
      • column vector $\mathbf{y}$ (size $m \times 1$): $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is an $m \times n$ matrix (numerator layout) or an $n \times m$ matrix (denominator layout)
      • matrix $\mathbf{Y}$: $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ does not fit neatly into a matrix in either layout
  • By matrix $\mathbf{X}$ (size $p \times q$):
      • scalar $y$: $\frac{\partial y}{\partial \mathbf{X}}$ is a $q \times p$ matrix (numerator layout) or a $p \times q$ matrix (denominator layout)
      • vector $\mathbf{y}$ or matrix $\mathbf{Y}$: $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ do not fit neatly into a matrix in either layout

The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.

Numerator-layout notation


Using numerator-layout notation, we have:[1]

$$\begin{aligned}
\frac{\partial y}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}. \\[1ex]
\frac{\partial \mathbf{y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}. \\[1ex]
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}. \\[1ex]
\frac{\partial y}{\partial \mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.
\end{aligned}$$

The following definitions are only provided in numerator-layout notation:

$$\begin{aligned}
\frac{\partial \mathbf{Y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}. \\[1ex]
d\mathbf{X} &= \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.
\end{aligned}$$

Denominator-layout notation


Using denominator-layout notation, we have:[2]

$$\begin{aligned}
\frac{\partial y}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}. \\[1ex]
\frac{\partial \mathbf{y}}{\partial x} &= \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}. \\[1ex]
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}. \\[1ex]
\frac{\partial y}{\partial \mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.
\end{aligned}$$

Identities


As noted above, in general, the results of operations will be transposed when switching between numerator-layout and denominator-layout notation.

To help make sense of all the identities below, keep in mind the most important rules: the chain rule, product rule and sum rule. The sum rule applies universally, and the product rule applies in most of the cases below, provided that the order of matrix products is maintained, since matrix products are not commutative. The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices). In the latter case, the product rule can't quite be applied directly, either, but the equivalent can be done with a bit more work using the differential identities.

The following identities adopt the following conventions:

  • the scalars $a$, $b$, $c$, $d$, and $e$ are constant with respect to, and the scalars $u$ and $v$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$;
  • the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$, and $\mathbf{e}$ are constant with respect to, and the vectors $\mathbf{u}$ and $\mathbf{v}$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$;
  • the matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$, and $\mathbf{E}$ are constant with respect to, and the matrices $\mathbf{U}$ and $\mathbf{V}$ are functions of, one of $x$, $\mathbf{x}$, or $\mathbf{X}$.

Vector-by-vector identities


This is presented first because all of the operations that apply to vector-by-vector differentiation apply directly to vector-by-scalar or scalar-by-vector differentiation simply by reducing the appropriate vector in the numerator or denominator to a scalar.

Identities: vector-by-vector $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$
(Numerator layout: by $\mathbf{y}$ and $\mathbf{x}^{\top}$. Denominator layout: by $\mathbf{y}^{\top}$ and $\mathbf{x}$.)

  • $\mathbf{a}$ is not a function of $\mathbf{x}$:  $\frac{\partial \mathbf{a}}{\partial \mathbf{x}} = \mathbf{0}$  (either layout)
  • $\frac{\partial \mathbf{x}}{\partial \mathbf{x}} = \mathbf{I}$  (either layout)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$:  $\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}$ (numerator);  $\mathbf{A}^{\top}$ (denominator)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$:  $\frac{\partial \mathbf{x}^{\top}\mathbf{A}}{\partial \mathbf{x}} = \mathbf{A}^{\top}$ (numerator);  $\mathbf{A}$ (denominator)
  • $a$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial a\mathbf{u}}{\partial \mathbf{x}} = a\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$  (either layout)
  • $v = v(\mathbf{x})$, $\mathbf{a}$ is not a function of $\mathbf{x}$:  $\frac{\partial v\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}\frac{\partial v}{\partial \mathbf{x}}$ (numerator);  $\frac{\partial v}{\partial \mathbf{x}}\mathbf{a}^{\top}$ (denominator)
  • $v = v(\mathbf{x})$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial v\mathbf{u}}{\partial \mathbf{x}} = v\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \mathbf{u}\frac{\partial v}{\partial \mathbf{x}}$ (numerator);  $v\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}\mathbf{u}^{\top}$ (denominator)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}^{\top}$ (denominator)
  • $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$:  $\frac{\partial (\mathbf{u}+\mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$  (either layout)
  • $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ (denominator)
  • $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (denominator)
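
The linear and chain-rule identities above can be verified numerically; the sketch below (assuming the JAX library, whose jacobian uses numerator layout) checks two rows of the table.

```python
import jax
import jax.numpy as jnp

A = jax.random.normal(jax.random.PRNGKey(0), (4, 3))
x = jax.random.normal(jax.random.PRNGKey(1), (3,))

# ∂(Ax)/∂x = A in numerator layout
assert jnp.allclose(jax.jacobian(lambda x: A @ x)(x), A)

# Chain rule: ∂g(u(x))/∂x = (∂g/∂u)(∂u/∂x), a matrix product in numerator layout
u = lambda x: 2.0 * jnp.sin(x)
g = lambda u: A @ u
lhs = jax.jacobian(lambda x: g(u(x)))(x)
rhs = jax.jacobian(g)(u(x)) @ jax.jacobian(u)(x)
assert jnp.allclose(lhs, rhs)
```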

Scalar-by-vector identities


The fundamental identities, which hold for general functions, are listed first, ahead of the special cases.

Identities: scalar-by-vector $\frac{\partial y}{\partial \mathbf{x}} = \nabla_{\mathbf{x}} y$
(Numerator layout: by $\mathbf{x}^{\top}$; the result is a row vector. Denominator layout: by $\mathbf{x}$; the result is a column vector.)

  • $a$ is not a function of $\mathbf{x}$:  $\frac{\partial a}{\partial \mathbf{x}} = \mathbf{0}^{\top}$ [nb 1] (numerator);  $\mathbf{0}$ [nb 1] (denominator)
  • $a$ is not a function of $\mathbf{x}$, $u = u(\mathbf{x})$:  $\frac{\partial au}{\partial \mathbf{x}} = a\frac{\partial u}{\partial \mathbf{x}}$  (either layout)
  • $u = u(\mathbf{x})$, $v = v(\mathbf{x})$:  $\frac{\partial (u+v)}{\partial \mathbf{x}} = \frac{\partial u}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}$
  • $u = u(\mathbf{x})$, $v = v(\mathbf{x})$:  $\frac{\partial uv}{\partial \mathbf{x}} = u\frac{\partial v}{\partial \mathbf{x}} + v\frac{\partial u}{\partial \mathbf{x}}$
  • $u = u(\mathbf{x})$:  $\frac{\partial g(u)}{\partial \mathbf{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$
  • $u = u(\mathbf{x})$:  $\frac{\partial f(g(u))}{\partial \mathbf{x}} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$
  • $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$:  $\frac{\partial (\mathbf{u} \cdot \mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}^{\top}\mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^{\top}\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^{\top}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator, with $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}, \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ in numerator layout);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{u}$ (denominator, with $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}, \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ in denominator layout)
  • $\mathbf{u} = \mathbf{u}(\mathbf{x})$, $\mathbf{v} = \mathbf{v}(\mathbf{x})$, $\mathbf{A}$ is not a function of $\mathbf{x}$:  $\frac{\partial (\mathbf{u} \cdot \mathbf{A}\mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}^{\top}\mathbf{A}\mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^{\top}\mathbf{A}\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^{\top}\mathbf{A}^{\top}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator, derivatives in numerator layout);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{A}\mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}\mathbf{A}^{\top}\mathbf{u}$ (denominator, derivatives in denominator layout)
  • $\frac{\partial^2 f}{\partial \mathbf{x}\,\partial \mathbf{x}^{\top}} = \mathbf{H}^{\top}$ (numerator);  $\mathbf{H}$, the Hessian matrix (denominator)[3]
  • $\mathbf{a}$ is not a function of $\mathbf{x}$:  $\frac{\partial (\mathbf{a} \cdot \mathbf{x})}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x} \cdot \mathbf{a})}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}^{\top}\mathbf{x}}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^{\top}\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}^{\top}$ (numerator);  $\mathbf{a}$ (denominator)
  • $\mathbf{A}$, $\mathbf{b}$ are not functions of $\mathbf{x}$:  $\frac{\partial \mathbf{b}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{b}^{\top}\mathbf{A}$ (numerator);  $\mathbf{A}^{\top}\mathbf{b}$ (denominator)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$:  $\frac{\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^{\top}\left(\mathbf{A} + \mathbf{A}^{\top}\right)$ (numerator);  $\left(\mathbf{A} + \mathbf{A}^{\top}\right)\mathbf{x}$ (denominator)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{A}$ is symmetric:  $\frac{\partial \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{x}^{\top}\mathbf{A}$ (numerator);  $2\mathbf{A}\mathbf{x}$ (denominator)
  • $\mathbf{A}$ is not a function of $\mathbf{x}$:  $\frac{\partial^2 \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}\,\partial \mathbf{x}^{\top}} = \mathbf{A} + \mathbf{A}^{\top}$
  • $\mathbf{A}$ is not a function of $\mathbf{x}$, $\mathbf{A}$ is symmetric:  $\frac{\partial^2 \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}\,\partial \mathbf{x}^{\top}} = 2\mathbf{A}$
  • $\frac{\partial (\mathbf{x} \cdot \mathbf{x})}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^{\top}\mathbf{x}}{\partial \mathbf{x}} = \frac{\partial \|\mathbf{x}\|^2}{\partial \mathbf{x}} = 2\mathbf{x}^{\top}$ (numerator);  $2\mathbf{x}$ (denominator)
  • $\mathbf{a}$ is not a function of $\mathbf{x}$, $\mathbf{u} = \mathbf{u}(\mathbf{x})$:  $\frac{\partial (\mathbf{a} \cdot \mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}^{\top}\mathbf{u}}{\partial \mathbf{x}} = \mathbf{a}^{\top}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (numerator, with $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ in numerator layout);  $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\mathbf{a}$ (denominator, with $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ in denominator layout)
  • $\mathbf{a}$, $\mathbf{b}$ are not functions of $\mathbf{x}$:  $\frac{\partial\,\mathbf{a}^{\top}\mathbf{x}\mathbf{x}^{\top}\mathbf{b}}{\partial \mathbf{x}} = \mathbf{x}^{\top}\left(\mathbf{a}\mathbf{b}^{\top} + \mathbf{b}\mathbf{a}^{\top}\right)$ (numerator);  $\left(\mathbf{a}\mathbf{b}^{\top} + \mathbf{b}\mathbf{a}^{\top}\right)\mathbf{x}$ (denominator)
  • $\mathbf{A}$, $\mathbf{b}$, $\mathbf{C}$, $\mathbf{D}$, $\mathbf{e}$ are not functions of $\mathbf{x}$:  $\frac{\partial\,(\mathbf{A}\mathbf{x}+\mathbf{b})^{\top}\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})}{\partial \mathbf{x}} = (\mathbf{D}\mathbf{x}+\mathbf{e})^{\top}\mathbf{C}^{\top}\mathbf{A} + (\mathbf{A}\mathbf{x}+\mathbf{b})^{\top}\mathbf{C}\mathbf{D}$ (numerator);  $\mathbf{D}^{\top}\mathbf{C}^{\top}(\mathbf{A}\mathbf{x}+\mathbf{b}) + \mathbf{A}^{\top}\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e})$ (denominator)
  • $\mathbf{a}$ is not a function of $\mathbf{x}$:  $\frac{\partial\,\|\mathbf{x}-\mathbf{a}\|}{\partial \mathbf{x}} = \frac{(\mathbf{x}-\mathbf{a})^{\top}}{\|\mathbf{x}-\mathbf{a}\|}$ (numerator);  $\frac{\mathbf{x}-\mathbf{a}}{\|\mathbf{x}-\mathbf{a}\|}$ (denominator)
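
As a numerical check of the quadratic-form identity, the following sketch (assuming the JAX library, whose grad returns the denominator-layout column form) compares the autodiff result with the closed form.

```python
import jax
import jax.numpy as jnp

A = jax.random.normal(jax.random.PRNGKey(0), (3, 3))
x = jax.random.normal(jax.random.PRNGKey(1), (3,))

# ∂(xᵀAx)/∂x = (A + Aᵀ)x in denominator layout
g = jax.grad(lambda x: x @ A @ x)(x)
assert jnp.allclose(g, (A + A.T) @ x, atol=1e-5)
```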

Vector-by-scalar identities

Identities: vector-by-scalar $\frac{\partial \mathbf{y}}{\partial x}$
(Numerator layout: by $\mathbf{y}$; the result is a column vector. Denominator layout: by $\mathbf{y}^{\top}$; the result is a row vector.)

  • $\mathbf{a}$ is not a function of $x$:  $\frac{\partial \mathbf{a}}{\partial x} = \mathbf{0}$ [nb 1] (either layout)
  • $a$ is not a function of $x$, $\mathbf{u} = \mathbf{u}(x)$:  $\frac{\partial a\mathbf{u}}{\partial x} = a\frac{\partial \mathbf{u}}{\partial x}$
  • $\mathbf{A}$ is not a function of $x$, $\mathbf{u} = \mathbf{u}(x)$:  $\frac{\partial \mathbf{A}\mathbf{u}}{\partial x} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial x}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial x}\mathbf{A}^{\top}$ (denominator)
  • $\mathbf{u} = \mathbf{u}(x)$:  $\frac{\partial \mathbf{u}^{\top}}{\partial x} = \left(\frac{\partial \mathbf{u}}{\partial x}\right)^{\top}$
  • $\mathbf{u} = \mathbf{u}(x)$, $\mathbf{v} = \mathbf{v}(x)$:  $\frac{\partial (\mathbf{u}+\mathbf{v})}{\partial x} = \frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x}$
  • $\mathbf{u} = \mathbf{u}(x)$, $\mathbf{v} = \mathbf{v}(x)$:  $\frac{\partial (\mathbf{u}^{\top}\times\mathbf{v})}{\partial x} = \left(\frac{\partial \mathbf{u}}{\partial x}\right)^{\top}\times\mathbf{v} + \mathbf{u}^{\top}\times\frac{\partial \mathbf{v}}{\partial x}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial x}\times\mathbf{v} + \mathbf{u}^{\top}\times\left(\frac{\partial \mathbf{v}}{\partial x}\right)^{\top}$ (denominator)
  • $\mathbf{u} = \mathbf{u}(x)$:  $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial x} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial x}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ (denominator). Assumes consistent matrix layout; see the note below.
  • $\mathbf{u} = \mathbf{u}(x)$:  $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial x} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (numerator);  $\frac{\partial \mathbf{u}}{\partial x}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (denominator). Assumes consistent matrix layout; see the note below.
  • $\mathbf{U} = \mathbf{U}(x)$, $\mathbf{v} = \mathbf{v}(x)$:  $\frac{\partial (\mathbf{U}\times\mathbf{v})}{\partial x} = \frac{\partial \mathbf{U}}{\partial x}\times\mathbf{v} + \mathbf{U}\times\frac{\partial \mathbf{v}}{\partial x}$ (numerator);  $\mathbf{v}^{\top}\times\left(\frac{\partial \mathbf{U}}{\partial x}\right) + \frac{\partial \mathbf{v}}{\partial x}\times\mathbf{U}^{\top}$ (denominator)

NOTE: The formulas involving the vector-by-vector derivatives $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ and $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (whose outputs are matrices) assume the matrices are laid out consistent with the vector layout, i.e. numerator-layout matrix when numerator-layout vector and vice versa; otherwise, transpose the vector-by-vector derivatives.
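
The chain rule from the table above can be checked numerically; in the sketch below (illustrative, assuming the JAX library) the numerator-layout matrix $\partial \mathbf{g}/\partial \mathbf{u}$ multiplies the column vector $\partial \mathbf{u}/\partial x$ on the right.

```python
import jax
import jax.numpy as jnp

def u(x):                        # a vector function of a scalar
    return jnp.array([x, x ** 2, jnp.sin(x)])

def g(u):                        # a vector function of a vector
    return u * jnp.sum(u)

x = 0.7
lhs = jax.jacfwd(lambda x: g(u(x)))(x)           # ∂g(u(x))/∂x computed directly
rhs = jax.jacobian(g)(u(x)) @ jax.jacfwd(u)(x)   # (∂g/∂u)(∂u/∂x)
assert jnp.allclose(lhs, rhs)
```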

Scalar-by-matrix identities


Note that exact equivalents of the scalar product rule and chain rule do not exist when applied to matrix-valued functions of matrices. However, the product rule of this sort does apply to the differential form (see below), and this is the way to derive many of the identities below involving the trace function, combined with the fact that the trace function allows transposing and cyclic permutation, i.e.:

$$\begin{aligned} \operatorname{tr}(\mathbf{A}) &= \operatorname{tr}\left(\mathbf{A}^{\top}\right) \\ \operatorname{tr}(\mathbf{ABCD}) &= \operatorname{tr}(\mathbf{BCDA}) = \operatorname{tr}(\mathbf{CDAB}) = \operatorname{tr}(\mathbf{DABC}) \end{aligned}$$

For example, to compute $\frac{\partial \operatorname{tr}\left(\mathbf{AXBX^{\top}C}\right)}{\partial \mathbf{X}}$:

$$\begin{aligned} d\operatorname{tr}\left(\mathbf{AXBX^{\top}C}\right) &= d\operatorname{tr}\left(\mathbf{CAXBX^{\top}}\right) = \operatorname{tr}\left(d\left(\mathbf{CAXBX^{\top}}\right)\right) \\[1ex] &= \operatorname{tr}\left(\mathbf{CAX}\,d\left(\mathbf{BX^{\top}}\right) + d\left(\mathbf{CAX}\right)\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left(\mathbf{CAX}\,d\left(\mathbf{BX^{\top}}\right)\right) + \operatorname{tr}\left(d\left(\mathbf{CAX}\right)\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left(\mathbf{CAXB}\,d\left(\mathbf{X^{\top}}\right)\right) + \operatorname{tr}\left(\mathbf{CA}\,(d\mathbf{X})\,\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left(\mathbf{CAXB}\,(d\mathbf{X})^{\top}\right) + \operatorname{tr}\left(\mathbf{CA}\,(d\mathbf{X})\,\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left(\left(\mathbf{CAXB}\,(d\mathbf{X})^{\top}\right)^{\top}\right) + \operatorname{tr}\left(\mathbf{CA}\,(d\mathbf{X})\,\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left((d\mathbf{X})\,\mathbf{B^{\top}X^{\top}A^{\top}C^{\top}}\right) + \operatorname{tr}\left(\mathbf{CA}\,(d\mathbf{X})\,\mathbf{BX^{\top}}\right) \\[1ex] &= \operatorname{tr}\left(\mathbf{B^{\top}X^{\top}A^{\top}C^{\top}}\,(d\mathbf{X})\right) + \operatorname{tr}\left(\mathbf{BX^{\top}CA}\,(d\mathbf{X})\right) \\[1ex] &= \operatorname{tr}\left(\left(\mathbf{B^{\top}X^{\top}A^{\top}C^{\top}} + \mathbf{BX^{\top}CA}\right)d\mathbf{X}\right) \\[1ex] &= \operatorname{tr}\left(\left(\mathbf{CAXB} + \mathbf{A^{\top}C^{\top}XB^{\top}}\right)^{\top}d\mathbf{X}\right) \end{aligned}$$

Therefore,

$$\frac{\partial \operatorname{tr}\left(\mathbf{AXBX^{\top}C}\right)}{\partial \mathbf{X}} = \mathbf{B^{\top}X^{\top}A^{\top}C^{\top}} + \mathbf{BX^{\top}CA} \quad \text{(numerator layout)}$$
$$\frac{\partial \operatorname{tr}\left(\mathbf{AXBX^{\top}C}\right)}{\partial \mathbf{X}} = \mathbf{CAXB} + \mathbf{A^{\top}C^{\top}XB^{\top}} \quad \text{(denominator layout)}$$

(For the last step, see the Conversion from differential to derivative form section.)
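
The derived result can also be confirmed numerically; the sketch below (assuming the JAX library, whose grad yields the denominator-layout form) checks the identity for random matrices.

```python
import jax
import jax.numpy as jnp

ks = jax.random.split(jax.random.PRNGKey(0), 4)
A = jax.random.normal(ks[0], (3, 3))
B = jax.random.normal(ks[1], (4, 4))
C = jax.random.normal(ks[2], (3, 3))
X = jax.random.normal(ks[3], (3, 4))

f = lambda X: jnp.trace(A @ X @ B @ X.T @ C)
G = jax.grad(f)(X)                       # same shape as X: denominator layout
assert jnp.allclose(G, C @ A @ X @ B + A.T @ C.T @ X @ B.T, atol=1e-4)
```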

Identities: scalar-by-matrixyX{\displaystyle {\frac {\partial y}{\partial \mathbf {X} }}}
ConditionExpressionNumerator layout, i.e. byXTDenominator layout, i.e. byX
a is not a function ofXaX={\displaystyle {\frac {\partial a}{\partial \mathbf {X} }}=}0{\displaystyle \mathbf {0} ^{\top }}[nb 2]0{\displaystyle \mathbf {0} }[nb 2]
a is not a function ofX,u =u(X)auX={\displaystyle {\frac {\partial au}{\partial \mathbf {X} }}=}auX{\displaystyle a{\frac {\partial u}{\partial \mathbf {X} }}}
u =u(X),v =v(X)(u+v)X={\displaystyle {\frac {\partial (u+v)}{\partial \mathbf {X} }}=}uX+vX{\displaystyle {\frac {\partial u}{\partial \mathbf {X} }}+{\frac {\partial v}{\partial \mathbf {X} }}}
u =u(X),v =v(X)uvX={\displaystyle {\frac {\partial uv}{\partial \mathbf {X} }}=}uvX+vuX{\displaystyle u{\frac {\partial v}{\partial \mathbf {X} }}+v{\frac {\partial u}{\partial \mathbf {X} }}}
u =u(X)g(u)X={\displaystyle {\frac {\partial g(u)}{\partial \mathbf {X} }}=}g(u)uuX{\displaystyle {\frac {\partial g(u)}{\partial u}}{\frac {\partial u}{\partial \mathbf {X} }}}
u =u(X)f(g(u))X={\displaystyle {\frac {\partial f(g(u))}{\partial \mathbf {X} }}=}f(g)gg(u)uuX{\displaystyle {\frac {\partial f(g)}{\partial g}}{\frac {\partial g(u)}{\partial u}}{\frac {\partial u}{\partial \mathbf {X} }}}
U =U(X)[3]    g(U)Xij={\displaystyle {\frac {\partial g(\mathbf {U} )}{\partial X_{ij}}}=}tr(g(U)UUXij){\displaystyle \operatorname {tr} \left({\frac {\partial g(\mathbf {U} )}{\partial \mathbf {U} }}{\frac {\partial \mathbf {U} }{\partial X_{ij}}}\right)}tr((g(U)U)UXij){\displaystyle \operatorname {tr} \left(\left({\frac {\partial g(\mathbf {U} )}{\partial \mathbf {U} }}\right)^{\top }{\frac {\partial \mathbf {U} }{\partial X_{ij}}}\right)}
Both forms assumenumerator layout forUXij,{\displaystyle {\frac {\partial \mathbf {U} }{\partial X_{ij}}},}

i.e. mixed layout if denominator layout forX is being used.

a andb are not functions ofXaXbX={\displaystyle {\frac {\partial \mathbf {a} ^{\top }\mathbf {X} \mathbf {b} }{\partial \mathbf {X} }}=}ba{\displaystyle \mathbf {b} \mathbf {a} ^{\top }}ab{\displaystyle \mathbf {a} \mathbf {b} ^{\top }}
a andb are not functions ofXaXbX={\displaystyle {\frac {\partial \mathbf {a} ^{\top }\mathbf {X} ^{\top }\mathbf {b} }{\partial \mathbf {X} }}=}ab{\displaystyle \mathbf {a} \mathbf {b} ^{\top }}ba{\displaystyle \mathbf {b} \mathbf {a} ^{\top }}
a andb are not functions ofX,f(v) is a real-valued differentiable functionf(Xa+b)X={\displaystyle {\frac {\partial f(\mathbf {Xa+b} )}{\partial \mathbf {X} }}=}afv{\displaystyle \mathbf {a} {\frac {\partial f}{\partial \mathbf {v} }}}fva{\displaystyle {\frac {\partial f}{\partial \mathbf {v} }}\mathbf {a} ^{\top }}
a,b andC are not functions ofX(Xa+b)C(Xa+b)X={\displaystyle {\frac {\partial (\mathbf {X} \mathbf {a} +\mathbf {b} )^{\top }\mathbf {C} (\mathbf {X} \mathbf {a} +\mathbf {b} )}{\partial \mathbf {X} }}=}((C+C)(Xa+b)a){\displaystyle \left(\left(\mathbf {C} +\mathbf {C} ^{\top }\right)(\mathbf {X} \mathbf {a} +\mathbf {b} )\mathbf {a} ^{\top }\right)^{\top }}(C+C)(Xa+b)a{\displaystyle \left(\mathbf {C} +\mathbf {C} ^{\top }\right)(\mathbf {X} \mathbf {a} +\mathbf {b} )\mathbf {a} ^{\top }}
a,b andC are not functions ofX(Xa)C(Xb)X={\displaystyle {\frac {\partial (\mathbf {X} \mathbf {a} )^{\top }\mathbf {C} (\mathbf {X} \mathbf {b} )}{\partial \mathbf {X} }}=}(CXba+CXab){\displaystyle \left(\mathbf {C} \mathbf {X} \mathbf {b} \mathbf {a} ^{\top }+\mathbf {C} ^{\top }\mathbf {X} \mathbf {a} \mathbf {b} ^{\top }\right)^{\top }}CXba+CXab{\displaystyle \mathbf {C} \mathbf {X} \mathbf {b} \mathbf {a} ^{\top }+\mathbf {C} ^{\top }\mathbf {X} \mathbf {a} \mathbf {b} ^{\top }}
tr(X)X={\displaystyle {\frac {\partial \operatorname {tr} (\mathbf {X} )}{\partial \mathbf {X} }}=}I{\displaystyle \mathbf {I} }
(Where a row lists two results, the first is in numerator layout and the second in denominator layout; a single unlabeled result holds in both.)

U = U(X), V = V(X):    ∂tr(U + V)/∂X = ∂tr(U)/∂X + ∂tr(V)/∂X
a is not a function of X, U = U(X):    ∂tr(aU)/∂X = a ∂tr(U)/∂X
g(X) is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. e^X, sin(X), cos(X), ln(X), etc., using a Taylor series); g(x) is the equivalent scalar function, g′(x) is its derivative, and g′(X) is the corresponding matrix function:    ∂tr(g(X))/∂X = g′(X) (num.); (g′(X))^T (den.)
A is not a function of X:[4]    ∂tr(AX)/∂X = ∂tr(XA)/∂X = A (num.); A^T (den.)
A is not a function of X:[3]    ∂tr(AX^T)/∂X = ∂tr(X^T A)/∂X = A^T (num.); A (den.)
A is not a function of X:[3]    ∂tr(X^T AX)/∂X = X^T (A + A^T) (num.); (A + A^T) X (den.)
A is not a function of X:[3]    ∂tr(X^{-1} A)/∂X = −X^{-1} A X^{-1} (num.); −(X^{-1})^T A^T (X^{-1})^T (den.)
A, B are not functions of X:    ∂tr(AXB)/∂X = ∂tr(BAX)/∂X = BA (num.); A^T B^T (den.)
A, B, C are not functions of X:    ∂tr(AXBX^T C)/∂X = BX^T CA + B^T X^T A^T C^T (num.); A^T C^T XB^T + CAXB (den.)
n is a positive integer:[3]    ∂tr(X^n)/∂X = n X^{n−1} (num.); n (X^{n−1})^T (den.)
A is not a function of X, n is a positive integer:[3]    ∂tr(AX^n)/∂X = Σ_{i=0}^{n−1} X^i A X^{n−i−1} (num.); Σ_{i=0}^{n−1} (X^i A X^{n−i−1})^T (den.)
[3]    ∂tr(e^X)/∂X = e^X (num.); (e^X)^T (den.)
[3]    ∂tr(sin(X))/∂X = cos(X) (num.); (cos(X))^T (den.)
[5]    ∂|X|/∂X = cofactor(X)^T = |X| X^{-1} (num.); cofactor(X) = |X| (X^{-1})^T (den.)
a is not a function of X:[3][nb 3]    ∂ln|aX|/∂X = X^{-1} (num.); (X^{-1})^T (den.)
A, B are not functions of X:[3]    ∂|AXB|/∂X = |AXB| X^{-1} (num.); |AXB| (X^{-1})^T (den.)
n is a positive integer:[3]    ∂|X^n|/∂X = n |X^n| X^{-1} (num.); n |X^n| (X^{-1})^T (den.)
(see pseudo-inverse)[3]    ∂ln|X^T X|/∂X = 2 X^+ (num.); 2 (X^+)^T (den.)
(see pseudo-inverse)[3]    ∂ln|X^T X|/∂X^+ = −2 X (num.); −2 X^T (den.)
A is not a function of X, X is square and invertible:    ∂|X^T AX|/∂X = 2 |X^T AX| X^{-1} = 2 |X^T| |A| |X| X^{-1} (num.); 2 |X^T AX| (X^{-1})^T (den.)
A is not a function of X, X is non-square, A is symmetric:    ∂|X^T AX|/∂X = 2 |X^T AX| (X^T A^T X)^{-1} X^T A^T (num.); 2 |X^T AX| AX (X^T AX)^{-1} (den.)
A is not a function of X, X is non-square, A is non-symmetric:    ∂|X^T AX|/∂X = |X^T AX| ((X^T AX)^{-1} X^T A + (X^T A^T X)^{-1} X^T A^T) (num.); |X^T AX| (AX (X^T AX)^{-1} + A^T X (X^T A^T X)^{-1}) (den.)
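These identities lend themselves to quick numerical spot checks. The following sketch (not part of the original table; it assumes NumPy, and the variable names are illustrative) verifies ∂tr(AXB)/∂X against a central finite difference in both layouts:

    import numpy as np

    # Finite-difference check of d tr(AXB)/dX.
    # Denominator layout stacks the gradient like X itself (here A^T B^T);
    # numerator layout is its transpose (here BA).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))
    X = rng.standard_normal((4, 5))
    B = rng.standard_normal((5, 3))

    def y(X):
        return np.trace(A @ X @ B)

    eps = 1e-6
    grad = np.zeros_like(X)  # gradient in denominator layout (shape of X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (y(X + E) - y(X - E)) / (2 * eps)

    assert np.allclose(grad, A.T @ B.T, atol=1e-5)  # denominator layout
    assert np.allclose(grad.T, B @ A, atol=1e-5)    # numerator layout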

Matrix-by-scalar identities

Identities: matrix-by-scalar ∂Y/∂x (results in numerator layout, i.e. by Y)

U = U(x):    ∂(aU)/∂x = a ∂U/∂x
A, B are not functions of x, U = U(x):    ∂(AUB)/∂x = A (∂U/∂x) B
U = U(x), V = V(x):    ∂(U + V)/∂x = ∂U/∂x + ∂V/∂x
U = U(x), V = V(x):    ∂(UV)/∂x = U (∂V/∂x) + (∂U/∂x) V
U = U(x), V = V(x):    ∂(U ⊗ V)/∂x = U ⊗ (∂V/∂x) + (∂U/∂x) ⊗ V
U = U(x), V = V(x):    ∂(U ∘ V)/∂x = U ∘ (∂V/∂x) + (∂U/∂x) ∘ V
U = U(x):    ∂U^{-1}/∂x = −U^{-1} (∂U/∂x) U^{-1}
U = U(x, y):    ∂²U^{-1}/∂x∂y = U^{-1} ((∂U/∂x) U^{-1} (∂U/∂y) − ∂²U/∂x∂y + (∂U/∂y) U^{-1} (∂U/∂x)) U^{-1}
A is not a function of x; g(X) is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. e^X, sin(X), cos(X), ln(X), etc.); g(x) is the equivalent scalar function, g′(x) is its derivative, and g′(X) is the corresponding matrix function:    ∂g(xA)/∂x = A g′(xA) = g′(xA) A
A is not a function of x:    ∂e^{xA}/∂x = A e^{xA} = e^{xA} A
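The inverse rule above is likewise easy to confirm numerically. A minimal sketch, assuming NumPy and a U(x) that stays invertible near the test point (U and dU are illustrative names):

    import numpy as np

    # Check of dU^{-1}/dx = -U^{-1} (dU/dx) U^{-1}.
    A0 = np.array([[0.5, 0.1, 0.0],
                   [0.2, 0.8, 0.1],
                   [0.0, 0.3, 0.6]])

    def U(x):
        return np.eye(3) + x * A0   # a smooth, invertible matrix function of x

    def dU(x):
        return A0                   # its derivative with respect to x

    x, eps = 0.4, 1e-6
    lhs = (np.linalg.inv(U(x + eps)) - np.linalg.inv(U(x - eps))) / (2 * eps)
    rhs = -np.linalg.inv(U(x)) @ dU(x) @ np.linalg.inv(U(x))
    assert np.allclose(lhs, rhs, atol=1e-6)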
See also: Derivative of the exponential map

Scalar-by-scalar identities

With vectors involved

Identities: scalar-by-scalar, with vectors involved (any layout, assuming the dot product ignores row vs. column layout)

u = u(x):    ∂g(u)/∂x = (∂g(u)/∂u) · (∂u/∂x)
u = u(x), v = v(x):    ∂(u · v)/∂x = u · (∂v/∂x) + (∂u/∂x) · v
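Both rules can be spot-checked with a one-dimensional finite difference. A small sketch, assuming NumPy (u, v, and their derivatives are illustrative choices):

    import numpy as np

    # Check of d(u . v)/dx = u . dv/dx + du/dx . v for u(x), v(x) in R^3.
    def u(x):  return np.array([np.sin(x), x**2, 1.0])
    def du(x): return np.array([np.cos(x), 2*x, 0.0])
    def v(x):  return np.array([np.exp(x), 1.0, x])
    def dv(x): return np.array([np.exp(x), 0.0, 1.0])

    x, eps = 0.7, 1e-6
    fd = (u(x + eps) @ v(x + eps) - u(x - eps) @ v(x - eps)) / (2 * eps)
    assert np.isclose(fd, u(x) @ dv(x) + du(x) @ v(x))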

With matrices involved

Identities: scalar-by-scalar, with matrices involved[3]
(Where a row lists two results, the first is in consistent numerator layout, i.e. by Y and X^T; the second is in mixed layout, i.e. by Y and X. A single result holds in both.)

U = U(x):    ∂|U|/∂x = |U| tr(U^{-1} ∂U/∂x)
U = U(x):    ∂ln|U|/∂x = tr(U^{-1} ∂U/∂x)
U = U(x):    ∂²|U|/∂x² = |U| [tr(U^{-1} ∂²U/∂x²) + tr²(U^{-1} ∂U/∂x) − tr((U^{-1} ∂U/∂x)²)]
U = U(x):    ∂g(U)/∂x = tr((∂g(U)/∂U) (∂U/∂x)) (consistent num.); tr((∂g(U)/∂U)^T (∂U/∂x)) (mixed)
A is not a function of x; g(X) is any polynomial with scalar coefficients, or any matrix function defined by an infinite polynomial series (e.g. e^X, sin(X), cos(X), ln(X), etc.); g(x) is the equivalent scalar function, g′(x) is its derivative, and g′(X) is the corresponding matrix function:    ∂tr(g(xA))/∂x = tr(A g′(xA))
A is not a function of x:    ∂tr(e^{xA})/∂x = tr(A e^{xA})
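The first identity (Jacobi's formula) can be verified the same way. A minimal sketch assuming NumPy (U and dU are illustrative names):

    import numpy as np

    # Check of d|U|/dx = |U| tr(U^{-1} dU/dx) (Jacobi's formula).
    A0 = np.array([[2.0, 0.3], [0.1, 1.5]])
    A1 = np.array([[0.2, 0.1], [0.4, 0.3]])

    def U(x):  return A0 + x * A1
    def dU(x): return A1

    x, eps = 0.5, 1e-6
    fd = (np.linalg.det(U(x + eps)) - np.linalg.det(U(x - eps))) / (2 * eps)
    exact = np.linalg.det(U(x)) * np.trace(np.linalg.inv(U(x)) @ dU(x))
    assert np.isclose(fd, exact)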

Identities in differential form


It is often easier to work in differential form and then convert back to normal derivatives. This only works well using the numerator layout. In these rules, a is a scalar.

Differential identities: scalar involving matrix[1][3] (results in numerator layout)

d(tr(X)) = tr(dX)
d(|X|) = |X| tr(X^{-1} dX) = tr(adj(X) dX)
d(ln|X|) = tr(X^{-1} dX)
Differential identities: matrix[1][3][6][7] (results in numerator layout)

A is not a function of X:    d(A) = 0
a is not a function of X:    d(aX) = a dX
d(X + Y) = dX + dY
d(XY) = (dX) Y + X (dY)
(Kronecker product)    d(X ⊗ Y) = (dX) ⊗ Y + X ⊗ (dY)
(Hadamard product)    d(X ∘ Y) = (dX) ∘ Y + X ∘ (dY)
d(X^T) = (dX)^T
d(X^{-1}) = −X^{-1} (dX) X^{-1}
(conjugate transpose)    d(X^H) = (dX)^H
n is a positive integer:    d(X^n) = Σ_{i=0}^{n−1} X^i (dX) X^{n−i−1}
d(e^X) = ∫₀¹ e^{aX} (dX) e^{(1−a)X} da
d(log X) = ∫₀^∞ (X + zI)^{-1} (dX) (X + zI)^{-1} dz
X = Σ_i λ_i P_i is diagonalizable, with P_i P_j = δ_ij P_i, and f is differentiable at every eigenvalue λ_i:
    d(f(X)) = Σ_{ij} P_i (dX) P_j c_ij, where c_ij = f′(λ_i) if λ_i = λ_j, and c_ij = (f(λ_i) − f(λ_j)) / (λ_i − λ_j) otherwise

In the last row, δ_ij is the Kronecker delta and (P_k)_{ij} = (Q)_{ik} (Q^{-1})_{kj} defines the orthogonal projection operator that projects onto the k-th eigenvector of X. Q is the matrix of eigenvectors of X = QΛQ^{-1}, and (Λ)_{ii} = λ_i are the eigenvalues. The matrix function f(X) is defined in terms of the scalar function f(x) for diagonalizable matrices by f(X) = Σ_i f(λ_i) P_i, where X = Σ_i λ_i P_i with P_i P_j = δ_ij P_i.
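The last identity can be exercised numerically for a symmetric X and f = exp, using the projector definition of f(X) given above. A sketch assuming NumPy; matexp_sym is an illustrative helper implementing f(X) = Σ_i f(λ_i) P_i:

    import numpy as np

    def matexp_sym(M):
        # f(M) = sum_i f(lambda_i) P_i for symmetric M, with f = exp
        lam, Q = np.linalg.eigh(M)
        return (Q * np.exp(lam)) @ Q.T

    rng = np.random.default_rng(2)
    S = rng.standard_normal((4, 4))
    X = (S + S.T) / 2                   # symmetric, hence diagonalizable
    dX = rng.standard_normal((4, 4))
    dX = (dX + dX.T) / 2

    lam, Q = np.linalg.eigh(X)
    P = [np.outer(Q[:, k], Q[:, k]) for k in range(4)]   # projectors P_k

    df = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if np.isclose(lam[i], lam[j]):
                c = np.exp(lam[i])      # f'(lambda_i) when the eigenvalues match
            else:
                c = (np.exp(lam[i]) - np.exp(lam[j])) / (lam[i] - lam[j])
            df += c * (P[i] @ dX @ P[j])

    eps = 1e-6
    fd = (matexp_sym(X + eps * dX) - matexp_sym(X - eps * dX)) / (2 * eps)
    assert np.allclose(fd, df, atol=1e-5)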

To convert to normal derivative form, first rewrite the expression in one of the following canonical forms, and then use these identities:

Conversion from differential to derivative form[1] (numerator layout)

Scalar y, scalar x:    dy = a dx        ⇒    dy/dx = a
Scalar y, vector x:    dy = a^T dx      ⇒    dy/dx = a^T
Scalar y, matrix X:    dy = tr(A dX)    ⇒    dy/dX = A
Vector y, scalar x:    dy = a dx        ⇒    dy/dx = a
Vector y, vector x:    dy = A dx        ⇒    dy/dx = A
Matrix Y, scalar x:    dY = A dx        ⇒    dY/dx = A
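As a worked example of this procedure, take y = tr(AXB) with A and B constant. Then dy = tr(d(AXB)) = tr(A (dX) B) = tr(BA dX), using the product rule for differentials and the cyclic property of the trace. This is the canonical form dy = tr(A dX) with BA in place of A, so ∂y/∂X = BA in numerator layout, in agreement with the trace table above.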

Applications


Matrix differential calculus is used in statistics and econometrics, particularly for the statistical analysis of multivariate distributions, especially the multivariate normal distribution and other elliptical distributions.[8][9][10]

It is used in regression analysis to compute, for example, the ordinary least squares regression formula for the case of multiple explanatory variables.[11] It is also used in the study of random matrices, statistical moments, local sensitivity, and statistical diagnostics.[12][13]
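As a concrete illustration of the regression case, setting the matrix-calculus gradient of the squared-error objective to zero yields the normal equations and the ordinary least squares estimate. A brief sketch, assuming NumPy (the data and names are hypothetical):

    import numpy as np

    # Minimizing ||y - Xb||^2: the gradient is -2 X^T (y - Xb), and setting
    # it to zero gives the normal equations X^T X b = X^T y.
    rng = np.random.default_rng(3)
    X = rng.standard_normal((50, 4))    # 50 observations, 4 explanatory variables
    beta_true = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ beta_true + 0.01 * rng.standard_normal(50)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
    assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])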


Notes

  1. ^ Here, 0 refers to a column vector of all 0s, of size n, where n is the length of x.
  2. ^ Here, 0 refers to a matrix of all 0s, of the same shape as X.
  3. ^ The constant a disappears in the result. This is intentional. In general,
     d(ln au)/dx = (1/(au)) d(au)/dx = (1/(au)) a (du/dx) = (1/u) (du/dx) = d(ln u)/dx,
     or, alternatively, d(ln au)/dx = d(ln a + ln u)/dx = d(ln a)/dx + d(ln u)/dx = d(ln u)/dx.

References

  1. Minka, Thomas P. (December 28, 2000). "Old and New Matrix Algebra Useful for Statistics". MIT Media Lab note (1997; revised 12/00). Retrieved 5 February 2016.
  2. Felippa, Carlos A. "Appendix D, Linear Algebra: Determinants, Inverses, Rank" (PDF). ASEN 5007: Introduction to Finite Element Methods. Boulder, Colorado: University of Colorado. Retrieved 5 February 2016. Uses the Hessian (transpose to Jacobian) definition of vector and matrix derivatives.
  3. Petersen, Kaare Brandt; Pedersen, Michael Syskind. The Matrix Cookbook (PDF). Archived from the original on 2 March 2010. Retrieved 5 February 2016. This book uses a mixed layout, i.e. by Y in ∂Y/∂x and by X in ∂y/∂X.
  4. Duchi, John C. "Properties of the Trace and Matrix Derivatives" (PDF). Stanford University. Retrieved 5 February 2016.
  5. See Determinant § Derivative for the derivation.
  6. Giles, Mike B. (2008). "Collected matrix derivative results for forward and reverse mode algorithmic differentiation". In Bischof, Christian H.; Bücker, H. Martin; Hovland, Paul; Naumann, Uwe; Utke, Jean (eds.). Advances in Automatic Differentiation. Lecture Notes in Computational Science and Engineering. Vol. 64. Berlin: Springer. pp. 35–44. doi:10.1007/978-3-540-68942-3_4. ISBN 978-3-540-68935-5. MR 2531677.
  7. Unpublished memo by S. Adler (IAS).
  8. Fang, Kai-Tai; Zhang, Yao-Ting (1990). Generalized multivariate analysis. Science Press (Beijing) and Springer-Verlag (Berlin). ISBN 3-540-17651-9.
  9. Pan, Jianxin; Fang, Kaitai (2007). Growth curve models and statistical diagnostics. Beijing: Science Press. ISBN 978-0-387-95053-2.
  10. Kollo, Tõnu; von Rosen, Dietrich (2005). Advanced multivariate statistics with matrices. Dordrecht: Springer. ISBN 978-1-4020-3418-3.
  11. Magnus, Jan; Neudecker, Heinz (2019). Matrix differential calculus with applications in statistics and econometrics. New York: John Wiley. ISBN 978-1-119-54120-2.
  12. Liu, Shuangzhe; Leiva, Victor; Zhuang, Dan; Ma, Tiefeng; Figueroa-Zúñiga, Jorge I. (2022). "Matrix differential calculus with applications in the multivariate linear model and its diagnostics". Journal of Multivariate Analysis. 188: 104849. doi:10.1016/j.jmva.2021.104849.
  13. Liu, Shuangzhe; Trenkler, Götz; Kollo, Tõnu; von Rosen, Dietrich; Baksalary, Oskar Maria (2023). "Professor Heinz Neudecker and matrix differential calculus". Statistical Papers. 65 (4): 2605–2639. doi:10.1007/s00362-023-01499-w. S2CID 263661094.

Further reading

  • Abadir, Karim M.; Magnus, Jan R. (2005). Matrix algebra. Econometric Exercises. Cambridge: Cambridge University Press. ISBN 978-0-511-64796-3. OCLC 569411497.
  • Lax, Peter D. (2007). "9. Calculus of Vector- and Matrix-Valued Functions". Linear algebra and its applications (2nd ed.). Hoboken, N.J.: Wiley-Interscience. ISBN 978-0-471-75156-4.
  • Magnus, Jan R. (October 2010). "On the concept of matrix derivative". Journal of Multivariate Analysis. 101 (9): 2200–2206. doi:10.1016/j.jmva.2010.05.005. Note that this Wikipedia article has been nearly completely revised from the version criticized in this article.
