Singular value decomposition

From Wikipedia, the free encyclopedia
Matrix decomposition

Illustration of the singular value decomposition UΣV* of a real 2 × 2 matrix M.
  • Top: The action of M, indicated by its effect on the unit disc D and the two canonical unit vectors e1 and e2.
  • Left: The action of V*, a rotation, on D, e1, and e2.
  • Bottom: The action of Σ, a scaling by the singular values σ1 horizontally and σ2 vertically.
  • Right: The action of U, another rotation.

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any m × n matrix. It is related to the polar decomposition.

Specifically, the singular value decomposition of an m × n complex matrix M is a factorization of the form M = UΣV*, where U is an m × m complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, V is an n × n complex unitary matrix, and V* is the conjugate transpose of V. Such a decomposition always exists for any complex matrix. If M is real, then U and V can be guaranteed to be real orthogonal matrices; in such contexts, the SVD is often denoted UΣVᵀ.
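
For readers who want to experiment, here is a minimal numerical sketch using NumPy's numpy.linalg.svd (a tooling choice assumed here, not something the article prescribes); it returns U, the singular values, and V*, and lets us verify M = UΣV*:

```python
import numpy as np

# A small real matrix; for real M the factors U and V are orthogonal.
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 0.0],
              [0.0, 0.0, 5.0]])

U, s, Vh = np.linalg.svd(M)            # full SVD: U is 4x4, Vh is 3x3
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)   # embed the singular values in a 4x3 matrix

print(np.allclose(M, U @ Sigma @ Vh))  # True: M = U Sigma V*
```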

The diagonal entries σi = Σii of Σ are uniquely determined by M and are known as the singular values of M. The number of non-zero singular values is equal to the rank of M. The columns of U and the columns of V are called left-singular vectors and right-singular vectors of M, respectively. They form two sets of orthonormal bases u1, ..., um and v1, ..., vn, and if they are sorted so that the singular values σi with value zero are all in the highest-numbered columns (or rows), the singular value decomposition can be written as

{\displaystyle \mathbf {M} =\sum _{i=1}^{r}\sigma _{i}\mathbf {u} _{i}\mathbf {v} _{i}^{*},}

where r ≤ min{m, n} is the rank of M.

The SVD is not unique. However, it is always possible to choose the decomposition such that the singular values Σii are in descending order. In this case, Σ (but not U and V) is uniquely determined by M.

The term sometimes refers to the compact SVD, a similar decomposition M = UΣV* in which Σ is square diagonal of size r × r, where r ≤ min{m, n} is the rank of M, and has only the non-zero singular values. In this variant, U is an m × r semi-unitary matrix and V is an n × r semi-unitary matrix, such that U*U = V*V = Ir.
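
A minimal sketch (NumPy assumed) of the thin SVD obtained with full_matrices=False, whose factors are semi-unitary; for a rank-deficient matrix one would additionally drop the zero singular values to reach the compact SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))               # tall matrix, generically of rank 4

# Thin SVD: U is 6x4, s has 4 entries, Vh is 4x4.
U, s, Vh = np.linalg.svd(M, full_matrices=False)

print(U.shape, s.shape, Vh.shape)             # (6, 4) (4,) (4, 4)
print(np.allclose(M, U @ np.diag(s) @ Vh))    # True
print(np.allclose(U.T @ U, np.eye(4)))        # True: U is semi-unitary, U*U = I_r
```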

Mathematical applications of the SVD include computing the pseudoinverse, matrix approximation, and determining the rank, range, and null space of a matrix. The SVD is also extremely useful in many areas of science, engineering, and statistics, such as signal processing, least squares fitting of data, and process control.

Intuitive interpretations

Animated illustration of the SVD of a 2D, real shearing matrix M. First, we see the unit disc in blue together with the two canonical unit vectors. We then see the action of M, which distorts the disc to an ellipse. The SVD decomposes M into three simple transformations: an initial rotation V*, a scaling Σ along the coordinate axes, and a final rotation U. The lengths σ1 and σ2 of the semi-axes of the ellipse are the singular values of M, namely Σ1,1 and Σ2,2.
Visualization of the matrix multiplications in singular value decomposition

Rotation, coordinate scaling, and reflection


In the special case when M is an m × m real square matrix, the matrices U and V* can be chosen to be real m × m matrices too. In that case, "unitary" is the same as "orthogonal". Then, interpreting both unitary matrices as well as the diagonal matrix, summarized here as A, as a linear transformation x ↦ Ax of the space R^m, the matrices U and V* represent rotations or reflections of the space, while Σ represents the scaling of each coordinate xi by the factor σi. Thus the SVD decomposition breaks down any linear transformation of R^m into a composition of three geometrical transformations: a rotation or reflection (V*), followed by a coordinate-by-coordinate scaling (Σ), followed by another rotation or reflection (U).

In particular, if M has a positive determinant, then U and V* can be chosen to be both rotations with reflections, or both rotations without reflections.[citation needed] If the determinant is negative, exactly one of them will have a reflection. If the determinant is zero, each can be independently chosen to be of either type.

If the matrix M is real but not square, namely m × n with m ≠ n, it can be interpreted as a linear transformation from R^n to R^m. Then U and V* can be chosen to be rotations/reflections of R^m and R^n, respectively; and Σ, besides scaling the first min{m, n} coordinates, also either extends the vector with zeros (if m > n) or removes trailing coordinates (if m < n), so as to turn R^n into R^m.

Singular values as semiaxes of an ellipse or ellipsoid


As shown in the figure, the singular values can be interpreted as the magnitudes of the semiaxes of an ellipse in 2D. This concept can be generalized to n-dimensional Euclidean space, with the singular values of any n × n square matrix being viewed as the magnitudes of the semiaxes of an n-dimensional ellipsoid. Similarly, the singular values of any m × n matrix can be viewed as the magnitudes of the semiaxes of an n-dimensional ellipsoid in m-dimensional space, for example as an ellipse in a (tilted) 2D plane in a 3D space. Singular values encode the magnitudes of the semiaxes, while singular vectors encode direction. See below for further details.

The columns of U and V are orthonormal bases


Since U and V* are unitary, the columns of each of them form a set of orthonormal vectors, which can be regarded as basis vectors. The matrix M maps the basis vector Vi to the stretched unit vector σiUi. By the definition of a unitary matrix, the same is true for their conjugate transposes U* and V, except the geometric interpretation of the singular values as stretches is lost. In short, the columns of U, U*, V, and V* are orthonormal bases. When M is a positive-semidefinite Hermitian matrix, U and V are both equal to the unitary matrix used to diagonalize M. However, when M is not positive-semidefinite and Hermitian but still diagonalizable, its eigendecomposition and singular value decomposition are distinct.

Relation to the four fundamental subspaces

The SVD provides orthonormal bases for the four fundamental subspaces of M: the first r columns of U span the range (column space) of M, the remaining columns of U span the null space of M*, the first r columns of V span the row space of M (the range of M*), and the remaining columns of V span the null space of M, where r is the rank of M.

Geometric meaning


Because U and V are unitary, we know that the columns U1, ..., Um of U yield an orthonormal basis of K^m and the columns V1, ..., Vn of V yield an orthonormal basis of K^n (with respect to the standard scalar products on these spaces).

The linear transformation

{\displaystyle T:\left\{{\begin{aligned}K^{n}&\to K^{m}\\x&\mapsto \mathbf {M} x\end{aligned}}\right.}

has a particularly simple description with respect to these orthonormal bases: we have

{\displaystyle T(\mathbf {V} _{i})=\sigma _{i}\mathbf {U} _{i},\qquad i=1,\ldots ,\min(m,n),}

where σi is the i-th diagonal entry of Σ, and T(Vi) = 0 for i > min(m, n).

The geometric content of the SVD theorem can thus be summarized as follows: for every linear map T : K^n → K^m one can find orthonormal bases of K^n and K^m such that T maps the i-th basis vector of K^n to a non-negative multiple of the i-th basis vector of K^m, and sends the leftover basis vectors to zero. With respect to these bases, the map T is therefore represented by a diagonal matrix with non-negative real diagonal entries.

To get a more visual flavor of singular values and SVD factorization – at least when working on real vector spaces – consider the sphere S of radius one in R^n. The linear map T maps this sphere onto an ellipsoid in R^m. Non-zero singular values are simply the lengths of the semi-axes of this ellipsoid. Especially when n = m, and all the singular values are distinct and non-zero, the SVD of the linear map T can be easily analyzed as a succession of three consecutive moves: consider the ellipsoid T(S) and specifically its axes; then consider the directions in R^n sent by T onto these axes. These directions happen to be mutually orthogonal. Apply first an isometry V* sending these directions to the coordinate axes of R^n. On a second move, apply an endomorphism D diagonalized along the coordinate axes and stretching or shrinking in each direction, using the semi-axes lengths of T(S) as stretching coefficients. The composition D ∘ V* then sends the unit sphere onto an ellipsoid isometric to T(S). To define the third and last move, apply an isometry U to this ellipsoid to obtain T(S). As can be easily checked, the composition U ∘ D ∘ V* coincides with T.

Example


Consider the 4 × 5 matrix

{\displaystyle \mathbf {M} ={\begin{bmatrix}1&0&0&0&2\\0&0&3&0&0\\0&0&0&0&0\\0&2&0&0&0\end{bmatrix}}}

A singular value decomposition of this matrix is given by M = UΣV* with

U=[0100100000010010]Σ=[30000050000020000000]V=[001000.20000.801000000100.80000.2]{\displaystyle {\begin{aligned}\mathbf {U} &={\begin{bmatrix}\color {Green}0&\color {Blue}-1&\color {Cyan}0&\color {Emerald}0\\\color {Green}-1&\color {Blue}0&\color {Cyan}0&\color {Emerald}0\\\color {Green}0&\color {Blue}0&\color {Cyan}0&\color {Emerald}-1\\\color {Green}0&\color {Blue}0&\color {Cyan}-1&\color {Emerald}0\end{bmatrix}}\\[6pt]\mathbf {\Sigma } &={\begin{bmatrix}3&0&0&0&\color {Gray}{\mathit {0}}\\0&{\sqrt {5}}&0&0&\color {Gray}{\mathit {0}}\\0&0&2&0&\color {Gray}{\mathit {0}}\\0&0&0&\color {Red}\mathbf {0} &\color {Gray}{\mathit {0}}\end{bmatrix}}\\[6pt]\mathbf {V} ^{*}&={\begin{bmatrix}\color {Violet}0&\color {Violet}0&\color {Violet}-1&\color {Violet}0&\color {Violet}0\\\color {Plum}-{\sqrt {0.2}}&\color {Plum}0&\color {Plum}0&\color {Plum}0&\color {Plum}-{\sqrt {0.8}}\\\color {Magenta}0&\color {Magenta}-1&\color {Magenta}0&\color {Magenta}0&\color {Magenta}0\\\color {Orchid}0&\color {Orchid}0&\color {Orchid}0&\color {Orchid}1&\color {Orchid}0\\\color {Purple}-{\sqrt {0.8}}&\color {Purple}0&\color {Purple}0&\color {Purple}0&\color {Purple}{\sqrt {0.2}}\end{bmatrix}}\end{aligned}}}

The scaling matrix Σ is zero outside of the diagonal (grey italics) and one diagonal element is zero (red bold). Furthermore, because the matrices U and V* are unitary, multiplying by their respective conjugate transposes yields identity matrices, as shown below. In this case, because U and V* are real valued, each is an orthogonal matrix.

UU=[1000010000100001]=I4VV=[1000001000001000001000001]=I5{\displaystyle {\begin{aligned}\mathbf {U} \mathbf {U} ^{*}&={\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}}=\mathbf {I} _{4}\\[6pt]\mathbf {V} \mathbf {V} ^{*}&={\begin{bmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&0&0&0&1\end{bmatrix}}=\mathbf {I} _{5}\end{aligned}}}

This particular singular value decomposition is not unique. For instance, we can keepU{\displaystyle \mathbf {U} } andΣ{\displaystyle \mathbf {\Sigma } } the same, but change the last two rows ofV{\displaystyle \mathbf {V} ^{*}} such thatV=[001000.20000.8010000.4000.50.10.4000.50.1]{\displaystyle \mathbf {V} ^{*}={\begin{bmatrix}\color {Violet}0&\color {Violet}0&\color {Violet}-1&\color {Violet}0&\color {Violet}0\\\color {Plum}-{\sqrt {0.2}}&\color {Plum}0&\color {Plum}0&\color {Plum}0&\color {Plum}-{\sqrt {0.8}}\\\color {Magenta}0&\color {Magenta}-1&\color {Magenta}0&\color {Magenta}0&\color {Magenta}0\\\color {Orchid}{\sqrt {0.4}}&\color {Orchid}0&\color {Orchid}0&\color {Orchid}{\sqrt {0.5}}&\color {Orchid}-{\sqrt {0.1}}\\\color {Purple}-{\sqrt {0.4}}&\color {Purple}0&\color {Purple}0&\color {Purple}{\sqrt {0.5}}&\color {Purple}{\sqrt {0.1}}\end{bmatrix}}}

and get an equally valid singular value decomposition. As the matrix M has rank 3, it has only 3 nonzero singular values. In taking the product UΣV*, the final column of U and the final two rows of V* are multiplied by zero, so they have no effect on the matrix product and can be replaced by any unit vectors which are orthogonal to the first three and to each other.

Thecompact SVD,M=UrΣrVr{\displaystyle \mathbf {M} =\mathbf {U} _{r}\mathbf {\Sigma } _{r}\mathbf {V} _{r}^{*}}, eliminates these superfluous rows, columns, and singular values:Ur=[010100000001]Σr=[300050002]Vr=[001000.20000.801000]{\displaystyle {\begin{aligned}\mathbf {U} _{r}&={\begin{bmatrix}\color {Green}0&\color {Blue}-1&\color {Cyan}0\\\color {Green}-1&\color {Blue}0&\color {Cyan}0\\\color {Green}0&\color {Blue}0&\color {Cyan}0\\\color {Green}0&\color {Blue}0&\color {Cyan}-1\end{bmatrix}}\\[6pt]\mathbf {\Sigma } _{r}&={\begin{bmatrix}3&0&0\\0&{\sqrt {5}}&0\\0&0&2\end{bmatrix}}\\[6pt]\mathbf {V} _{r}^{*}&={\begin{bmatrix}\color {Violet}0&\color {Violet}0&\color {Violet}-1&\color {Violet}0&\color {Violet}0\\\color {Plum}-{\sqrt {0.2}}&\color {Plum}0&\color {Plum}0&\color {Plum}0&\color {Plum}-{\sqrt {0.8}}\\\color {Magenta}0&\color {Magenta}-1&\color {Magenta}0&\color {Magenta}0&\color {Magenta}0\end{bmatrix}}\end{aligned}}}
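
The factors listed above can be checked numerically. The sketch below (NumPy assumed, not part of the article) rebuilds M from the compact factors; note that a library routine may return singular vectors that differ from these by signs or by the choice of basis for the null space:

```python
import numpy as np

M = np.array([[1.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 0.0]])

# Compact factors exactly as quoted in the worked example.
Ur = np.array([[ 0., -1.,  0.],
               [-1.,  0.,  0.],
               [ 0.,  0.,  0.],
               [ 0.,  0., -1.]])
Sr = np.diag([3.0, np.sqrt(5.0), 2.0])
Vhr = np.array([[ 0.,            0., -1., 0.,  0.          ],
                [-np.sqrt(0.2),  0.,  0., 0., -np.sqrt(0.8)],
                [ 0.,           -1.,  0., 0.,  0.          ]])

print(np.allclose(Ur @ Sr @ Vhr, M))        # True: the compact SVD reproduces M
print(np.linalg.svd(M, compute_uv=False))   # [3.  2.236...  2.  0.]
```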

SVD and spectral decomposition


Singular values, singular vectors, and their relation to the SVD


A non-negative real number σ is a singular value for M if and only if there exist unit vectors u in K^m and v in K^n such that

{\displaystyle {\begin{aligned}\mathbf {Mv} &=\sigma \mathbf {u} ,\\[3mu]\mathbf {M} ^{*}\mathbf {u} &=\sigma \mathbf {v} .\end{aligned}}}

The vectors u and v are called left-singular and right-singular vectors for σ, respectively.

In any singular value decomposition M = UΣV*, the diagonal entries of Σ are equal to the singular values of M. The first p = min(m, n) columns of U and V are, respectively, left- and right-singular vectors for the corresponding singular values. Consequently, the above theorem implies that an m × n matrix M has at most p distinct singular values.

A singular value for which we can find two left (or right) singular vectors that are linearly independent is called degenerate. If u1 and u2 are two left-singular vectors which both correspond to the singular value σ, then any normalized linear combination of the two vectors is also a left-singular vector corresponding to the singular value σ. A similar statement is true for right-singular vectors. The number of independent left- and right-singular vectors coincides, and these singular vectors appear in the same columns of U and V corresponding to diagonal elements of Σ all with the same value σ.

As an exception, the left- and right-singular vectors of singular value 0 comprise all unit vectors in the cokernel and kernel, respectively, of M, which by the rank–nullity theorem cannot be the same dimension if m ≠ n. Even if all singular values are nonzero, if m > n then the cokernel is nontrivial, in which case U is padded with m − n orthogonal vectors from the cokernel. Conversely, if m < n, then V is padded by n − m orthogonal vectors from the kernel. However, if the singular value 0 exists, the extra columns of U or V already appear as left- or right-singular vectors.

Non-degenerate singular values always have unique left- and right-singular vectors, up to multiplication by a unit-phase factor e^{iφ} (for the real case, up to a sign). Consequently, if all singular values of a square matrix M are non-degenerate and non-zero, then its singular value decomposition is unique, up to multiplication of a column of U by a unit-phase factor and simultaneous multiplication of the corresponding column of V by the same unit-phase factor. In general, the SVD is unique up to arbitrary unitary transformations applied uniformly to the column vectors of both U and V spanning the subspaces of each singular value, and up to arbitrary unitary transformations on vectors of U and V spanning the kernel and cokernel, respectively, of M.

Relation to eigenvalue decomposition


The singular value decomposition is very general in the sense that it can be applied to any m × n matrix, whereas eigenvalue decomposition can only be applied to square diagonalizable matrices. Nevertheless, the two decompositions are related.

If M has SVD M = UΣV*, the following two relations hold:

{\displaystyle {\begin{aligned}\mathbf {M} ^{*}\mathbf {M} &=\mathbf {V} \mathbf {\Sigma } ^{*}\mathbf {U} ^{*}\,\mathbf {U} \mathbf {\Sigma } \mathbf {V} ^{*}=\mathbf {V} (\mathbf {\Sigma } ^{*}\mathbf {\Sigma } )\mathbf {V} ^{*},\\[3mu]\mathbf {M} \mathbf {M} ^{*}&=\mathbf {U} \mathbf {\Sigma } \mathbf {V} ^{*}\,\mathbf {V} \mathbf {\Sigma } ^{*}\mathbf {U} ^{*}=\mathbf {U} (\mathbf {\Sigma } \mathbf {\Sigma } ^{*})\mathbf {U} ^{*}.\end{aligned}}}

The right-hand sides of these relations describe the eigenvalue decompositions of the left-hand sides. Consequently, the columns of V (the right-singular vectors) are eigenvectors of M*M, the columns of U (the left-singular vectors) are eigenvectors of MM*, and the non-zero singular values of M are the square roots of the non-zero eigenvalues of both M*M and MM*.
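
These relations are easy to check numerically; a minimal sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))

s = np.linalg.svd(M, compute_uv=False)       # singular values, in descending order
evals = np.linalg.eigvalsh(M.T @ M)[::-1]    # eigenvalues of M*M, reordered to descending

print(np.allclose(s**2, evals))              # True: sigma_i^2 are the eigenvalues of M*M
```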

In the special case of M being a normal matrix, and thus also square, the spectral theorem ensures that it can be unitarily diagonalized using a basis of eigenvectors, and thus decomposed as M = UDU* for some unitary matrix U and diagonal matrix D with complex elements σi along the diagonal. When M is positive semi-definite, the σi will be non-negative real numbers, so that the decomposition M = UDU* is also a singular value decomposition. Otherwise, it can be recast as an SVD by moving the phase e^{iφ} of each σi to either its corresponding Vi or Ui. The natural connection of the SVD to non-normal matrices is through the polar decomposition theorem: M = SR, where S = UΣU* is positive semidefinite and normal, and R = UV* is unitary.

Thus, except for positive semi-definite matrices, the eigenvalue decomposition and SVD of M, while related, differ: the eigenvalue decomposition is M = UDU⁻¹, where U is not necessarily unitary and D is not necessarily positive semi-definite, while the SVD is M = UΣV*, where Σ is diagonal and positive semi-definite, and U and V are unitary matrices that are not necessarily related except through the matrix M. While only non-defective square matrices have an eigenvalue decomposition, any m × n matrix has an SVD.

Applications of the SVD


Pseudoinverse


The singular value decomposition can be used for computing the pseudoinverse of a matrix. The pseudoinverse of the matrix M with singular value decomposition M = UΣV* is

{\displaystyle \mathbf {M} ^{+}=\mathbf {V} {\boldsymbol {\Sigma }}^{+}\mathbf {U} ^{\ast },}

where Σ⁺ is the pseudoinverse of Σ, which is formed by replacing every non-zero diagonal entry by its reciprocal and transposing the resulting matrix. The pseudoinverse is one way to solve linear least squares problems.
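
A minimal sketch of this construction (NumPy assumed; the helper pinv_via_svd below is illustrative, mirroring in spirit what numpy.linalg.pinv does):

```python
import numpy as np

def pinv_via_svd(M, rcond=1e-15):
    """Moore-Penrose pseudoinverse built from the SVD M = U Sigma V*."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    # Invert non-zero singular values; leave (numerically) zero ones at zero.
    cutoff = rcond * s.max()
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)
    return Vh.conj().T @ np.diag(s_inv) @ U.conj().T

M = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(pinv_via_svd(M), np.linalg.pinv(M)))  # True
```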

Solving homogeneous linear equations


A set of homogeneous linear equations can be written as Ax = 0 for a matrix A, vector x, and zero vector 0. A typical situation is that A is known and a non-zero x is to be determined which satisfies the equation. Such an x belongs to A's null space and is sometimes called a (right) null vector of A. The vector x can be characterized as a right-singular vector corresponding to a singular value of A that is zero. This observation means that if A is a square matrix and has no vanishing singular value, the equation has no non-zero x as a solution. It also means that if there are several vanishing singular values, any linear combination of the corresponding right-singular vectors is a valid solution. Analogously to the definition of a (right) null vector, a non-zero x satisfying x*A = 0, with x* denoting the conjugate transpose of x, is called a left null vector of A.
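
As a sketch (NumPy assumed), a right null vector can be read off from the rows of V* associated with vanishing singular values; here this is done for the rank-3 example matrix from above:

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 0.0]])

U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_space = Vh[rank:, :]        # rows of V* spanning the null space of A

x = null_space[0]                # one right null vector
print(np.allclose(A @ x, 0))     # True: A x = 0
```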

Total least squares minimization


A total least squares problem seeks the vector x that minimizes the 2-norm of the vector Ax under the constraint ‖x‖ = 1. The solution turns out to be the right-singular vector of A corresponding to the smallest singular value.
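
A short sketch (NumPy assumed): with the singular values sorted in descending order, the minimizer is the last row of V*, and the attained minimum is the smallest singular value:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))

U, s, Vh = np.linalg.svd(A)
x = Vh[-1]                                            # right-singular vector for the smallest sigma
print(np.isclose(np.linalg.norm(A @ x), s.min()))     # True: min of ||Ax|| over ||x|| = 1
```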

Range, null space and rank


Another application of the SVD is that it provides an explicit representation of the range and null space of a matrix M. The right-singular vectors corresponding to vanishing singular values of M span the null space of M, and the left-singular vectors corresponding to the non-zero singular values of M span the range of M. For example, in the above example the null space is spanned by the last two rows of V* and the range is spanned by the first three columns of U.

As a consequence, the rank of M equals the number of non-zero singular values, which is the same as the number of non-zero diagonal elements in Σ. In numerical linear algebra the singular values can be used to determine the effective rank of a matrix, as rounding error may lead to small but non-zero singular values in a rank-deficient matrix. Singular values beyond a significant gap are assumed to be numerically equivalent to zero.
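
A sketch of an effective-rank estimate (NumPy assumed; the tolerance value below is illustrative rather than prescribed):

```python
import numpy as np

def effective_rank(M, rtol=1e-10):
    """Count singular values that are not negligible relative to the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))

# A nearly rank-1 matrix: a rank-1 outer product plus tiny rounding-level noise.
rng = np.random.default_rng(3)
M = np.outer(rng.standard_normal(6), rng.standard_normal(4))
M += 1e-13 * rng.standard_normal(M.shape)

print(effective_rank(M))               # 1: the noise singular values fall below the gap
print(effective_rank(M, rtol=1e-15))   # 4: too small a tolerance counts the noise as rank
```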

Low-rank matrix approximation


Some practical applications need to solve the problem of approximating a matrix M with another matrix M̃, said to be truncated, which has a specific rank r. In the case that the approximation is based on minimizing the Frobenius norm of the difference between M and M̃ under the constraint that rank(M̃) = r, it turns out that the solution is given by the SVD of M, namely

{\displaystyle {\tilde {\mathbf {M} }}=\mathbf {U} {\tilde {\mathbf {\Sigma } }}\mathbf {V} ^{*},}

where Σ̃ is the same matrix as Σ except that it contains only the r largest singular values (the other singular values are replaced by zero). This is known as the Eckart–Young theorem, as it was proved by those two authors in 1936.[a]
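
A minimal sketch of the rank-r truncation and of the Frobenius-norm error it achieves (NumPy assumed):

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M in the Frobenius (and spectral) norm."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]

rng = np.random.default_rng(4)
M = rng.standard_normal((8, 6))
M2 = truncated_svd(M, 2)

s = np.linalg.svd(M, compute_uv=False)
# The best rank-2 Frobenius error equals the root of the sum of the discarded sigma_i^2.
print(np.isclose(np.linalg.norm(M - M2, 'fro'), np.sqrt(np.sum(s[2:] ** 2))))  # True
```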

Image compression

Singular-value decomposition (SVD) image compression of a 1996 Chevrolet Corvette photograph. The original RGB image (upper-left) is compared with rank 1, 10, and 100 reconstructions.

One practical consequence of the low-rank approximation given by SVD is that a greyscale image represented as an m × n matrix A can be efficiently represented by keeping the first k singular values and corresponding vectors. The truncated decomposition

{\displaystyle \mathbf {A} _{k}=\sum _{j=1}^{k}\sigma _{j}\mathbf {u} _{j}\mathbf {v} _{j}^{T}}

gives an image with the best 2-norm error out of all rank-k approximations. Thus, the task becomes finding an approximation that balances retaining perceptual fidelity with the number of vectors required to reconstruct the image. Storing A_k requires only k(n + m + 1) floating-point numbers compared to nm integers. This same idea extends to color images by applying this operation to each channel or stacking the channels into one matrix.

Since the singular values of most natural images decay quickly, most of their variance is often captured by a small k. For a 1528 × 1225 greyscale image, a relative error of 0.7% can be achieved with as little as k = 100.[1] In practice, however, computing the SVD can be too computationally expensive, and the resulting compression is typically less storage efficient than a specialized algorithm such as JPEG.

Separable models


The SVD can be thought of as decomposing a matrix into a weighted, ordered sum of separable matrices. By separable, we mean that a matrix A can be written as an outer product of two vectors A = u ⊗ v, or, in coordinates, A_ij = u_i v_j. Specifically, the matrix M can be decomposed as

{\displaystyle \mathbf {M} =\sum _{i}\mathbf {A} _{i}=\sum _{i}\sigma _{i}\mathbf {U} _{i}\otimes \mathbf {V} _{i}.}

Here Ui and Vi are the i-th columns of the corresponding SVD matrices, σi are the ordered singular values, and each Ai is separable. The SVD can be used to find the decomposition of an image processing filter into separable horizontal and vertical filters. Note that the number of non-zero σi is exactly the rank of the matrix.[citation needed] Separable models often arise in biological systems, and the SVD factorization is useful to analyze such systems. For example, some visual area V1 simple cells' receptive fields can be well described[2] by a Gabor filter in the space domain multiplied by a modulation function in the time domain. Thus, given a linear filter evaluated through, for example, reverse correlation, one can rearrange the two spatial dimensions into one dimension, thus yielding a two-dimensional filter (space, time) which can be decomposed through SVD. The first column of U in the SVD factorization is then a Gabor while the first column of V represents the time modulation (or vice versa). One may then define an index of separability

{\displaystyle \alpha ={\frac {\sigma _{1}^{2}}{\sum _{i}\sigma _{i}^{2}}},}

which is the fraction of the power in the matrix M which is accounted for by the first separable matrix in the decomposition.[3]
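
A sketch of this index (NumPy assumed); a perfectly separable matrix gives α = 1, and noise pushes it below 1:

```python
import numpy as np

def separability_index(M):
    """sigma_1^2 / sum_i sigma_i^2: fraction of power in the first separable term."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

space = np.hanning(16)                        # stand-in spatial profile
time = np.sin(np.linspace(0.0, np.pi, 12))    # stand-in temporal modulation
filt = np.outer(space, time)                  # a perfectly separable space-time filter

noise = 0.05 * np.random.default_rng(5).standard_normal(filt.shape)
print(separability_index(filt))               # 1.0
print(separability_index(filt + noise))       # slightly below 1
```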

Nearest orthogonal matrix


It is possible to use the SVD of a square matrix A to determine the orthogonal matrix Q closest to A. The closeness of fit is measured by the Frobenius norm of Q − A. The solution is the product UV*.[4][5] This intuitively makes sense because an orthogonal matrix would have the decomposition UIV*, where I is the identity matrix, so that if A = UΣV* then the product UV* amounts to replacing the singular values with ones. Equivalently, the solution is the unitary matrix R = UV* of the polar decomposition M = RP = P′R in either order of stretch and rotation, as described above.

A similar problem, with interesting applications in shape analysis, is the orthogonal Procrustes problem, which consists of finding an orthogonal matrix Q which most closely maps A to B. Specifically,

{\displaystyle \mathbf {Q} ={\underset {\Omega }{\operatorname {argmin} }}\|\mathbf {A} {\boldsymbol {\Omega }}-\mathbf {B} \|_{F}\quad {\text{subject to}}\quad {\boldsymbol {\Omega }}^{\operatorname {T} }{\boldsymbol {\Omega }}=\mathbf {I} ,}

where ‖·‖_F denotes the Frobenius norm.

This problem is equivalent to finding the nearest orthogonal matrix to the given matrix M = AᵀB.
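
A sketch of both constructions (NumPy assumed): the nearest orthogonal matrix to A is UV* from the SVD of A, and the Procrustes solution applies the same recipe to AᵀB:

```python
import numpy as np

def nearest_orthogonal(M):
    """Orthogonal matrix minimizing the Frobenius distance to M, i.e. U V* from M's SVD."""
    U, _, Vh = np.linalg.svd(M)
    return U @ Vh

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))
Q = nearest_orthogonal(A)
print(np.allclose(Q.T @ Q, np.eye(3)))     # True: Q is orthogonal

# Orthogonal Procrustes: the Omega minimizing ||A Omega - B||_F is the
# nearest orthogonal matrix to A^T B.
A = rng.standard_normal((5, 3))
B = rng.standard_normal((5, 3))
Omega = nearest_orthogonal(A.T @ B)
```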

The Kabsch algorithm


The Kabsch algorithm (called Wahba's problem in other fields) uses SVD to compute the optimal rotation (with respect to least-squares minimization) that will align a set of points with a corresponding set of points. It is used, among other applications, to compare the structures of molecules.

Principal Component Analysis


The SVD can be used to construct the principal components in principal component analysis as follows:[6]

Let X ∈ R^{N×p} be a data matrix where each of the N rows is a (feature-wise) mean-centered observation, each of dimension p.

The SVD of X is X = VΣU*.

We see that VΣ contains the scores of the rows of X (i.e. each observation), and U is the matrix whose columns are the principal component loading vectors.[6]
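
A sketch with NumPy (assumed here). Note that numpy.linalg.svd writes X = UΣV*, so its U plays the role of the article's V (holding the scores) and its V* holds the loadings:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)                 # feature-wise mean-centering

U, s, Vh = np.linalg.svd(X, full_matrices=False)
scores = U * s                         # principal-component scores of each observation
loadings = Vh.T                        # columns are the principal-component loading vectors

# Cross-check against the covariance-eigenvector route.
evals, evecs = np.linalg.eigh(X.T @ X / (len(X) - 1))
print(np.allclose(np.abs(loadings[:, 0]), np.abs(evecs[:, -1])))   # True, up to sign
```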

Signal processing


The SVD and pseudoinverse have been successfully applied to signal processing,[7] image processing[8] and big data (e.g., in genomic signal processing).[9][10][11][12]

Other examples


The SVD is also applied extensively to the study of linear inverse problems and is useful in the analysis of regularization methods such as that of Tikhonov. It is widely used in statistics, where it is related to principal component analysis and to correspondence analysis, and in signal processing and pattern recognition. It is also used in output-only modal analysis, where the non-scaled mode shapes can be determined from the singular vectors. Yet another usage is latent semantic indexing in natural-language text processing.

In general numerical computation involving linear or linearized systems, there is a universal constant that characterizes the regularity or singularity of a problem, which is the system's "condition number" κ := σ_max / σ_min. It often controls the error rate or convergence rate of a given computational scheme on such systems.[13][14]
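
A short sketch (NumPy assumed) computing κ from the singular values:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))

s = np.linalg.svd(A, compute_uv=False)
kappa = s.max() / s.min()                      # condition number in the 2-norm
print(np.isclose(kappa, np.linalg.cond(A)))    # True: matches NumPy's built-in
```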

The SVD also plays a crucial role in the field of quantum information, in a form often referred to as the Schmidt decomposition. Through it, states of two quantum systems are naturally decomposed, providing a necessary and sufficient condition for them to be entangled: the state is entangled exactly when the rank of the Σ matrix is larger than one.

One application of SVD to rather large matrices is in numerical weather prediction, where Lanczos methods are used to estimate the most linearly quickly growing few perturbations to the central numerical weather prediction over a given initial forward time period; i.e., the singular vectors corresponding to the largest singular values of the linearized propagator for the global weather over that time interval. The output singular vectors in this case are entire weather systems. These perturbations are then run through the full nonlinear model to generate an ensemble forecast, giving a handle on some of the uncertainty that should be allowed for around the current central prediction.

SVD has also been applied to reduced order modelling. The aim of reduced order modelling is to reduce the number of degrees of freedom in a complex system which is to be modeled. SVD has been coupled with radial basis functions to interpolate solutions to three-dimensional unsteady flow problems.[15]

SVD has also been used to improve gravitational waveform modeling by the ground-based gravitational-wave interferometer aLIGO.[16] SVD can help to increase the accuracy and speed of waveform generation to support gravitational-wave searches and to update two different waveform models.

Singular value decomposition is used in recommender systems to predict people's item ratings.[17] Distributed algorithms have been developed for the purpose of calculating the SVD on clusters of commodity machines.[18]

Low-rank SVD has been applied for hotspot detection from spatiotemporal data, with application to disease outbreak detection.[19] A combination of SVD and higher-order SVD has also been applied for real-time event detection from complex data streams (multivariate data with space and time dimensions) in disease surveillance.[20]

In astrodynamics, the SVD and its variants are used as an option to determine suitable maneuver directions for transfer trajectory design[21] and orbital station-keeping.[22]

The SVD can be used to measure the similarity between real-valued matrices.[23] By measuring the angles between the singular vectors, the inherent two-dimensional structure of matrices is accounted for. This method was shown to outperform cosine similarity and the Frobenius norm in most cases, including brain activity measurements from neuroscience experiments.

Proof of existence


An eigenvalue λ of a matrix M is characterized by the algebraic relation Mu = λu. When M is Hermitian, a variational characterization is also available. Let M be a real n × n symmetric matrix. Define

f:{RnRxxTMx{\displaystyle f:\left\{{\begin{aligned}\mathbb {R} ^{n}&\to \mathbb {R} \\\mathbf {x} &\mapsto \mathbf {x} ^{\operatorname {T} }\mathbf {M} \mathbf {x} \end{aligned}}\right.}

By theextreme value theorem, this continuous function attains a maximum at someu{\displaystyle \mathbf {u} } when restricted to the unit sphere{x=1}.{\displaystyle \{\|\mathbf {x} \|=1\}.} By theLagrange multipliers theorem,u{\displaystyle \mathbf {u} } necessarily satisfiesuTMuλuTu=0{\displaystyle \nabla \mathbf {u} ^{\operatorname {T} }\mathbf {M} \mathbf {u} -\lambda \cdot \nabla \mathbf {u} ^{\operatorname {T} }\mathbf {u} =\mathbf {0} }for some real numberλ.{\displaystyle \lambda .} The nabla symbol,{\displaystyle \nabla }, is thedel operator (differentiation with respect tox{\displaystyle \mathbf {x} }). Using the symmetry ofM{\displaystyle \mathbf {M} } we obtainxTMxλxTx=2(MλI)x.{\displaystyle \nabla \mathbf {x} ^{\operatorname {T} }\mathbf {M} \mathbf {x} -\lambda \cdot \nabla \mathbf {x} ^{\operatorname {T} }\mathbf {x} =2(\mathbf {M} -\lambda \mathbf {I} )\mathbf {x} .}

Therefore Mu = λu, so u is a unit-length eigenvector of M. For every unit-length eigenvector v of M its eigenvalue is f(v), so λ is the largest eigenvalue of M. The same calculation performed on the orthogonal complement of u gives the next largest eigenvalue, and so on. The complex Hermitian case is similar; there f(x) = x*Mx is a real-valued function of 2n real variables.

Singular values are similar in that they can be described algebraically or from variational principles, although, unlike the eigenvalue case, Hermiticity or symmetry of M is no longer required.

This section gives these two arguments for the existence of the singular value decomposition.

Based on the spectral theorem


LetM{\displaystyle \mathbf {M} } be anm×n{\displaystyle m\times n} complex matrix. SinceMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } is positive semi-definite and Hermitian, by thespectral theorem, there exists ann×n{\displaystyle n\times n} unitary matrixV{\displaystyle \mathbf {V} } such thatVMMV=D¯=[D000],{\displaystyle \mathbf {V} ^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} ={\bar {\mathbf {D} }}={\begin{bmatrix}\mathbf {D} &0\\0&0\end{bmatrix}},}whereD{\displaystyle \mathbf {D} } is diagonal and positive definite, of dimension×{\displaystyle \ell \times \ell }, with{\displaystyle \ell } the number of non-zero eigenvalues ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } (which can be shown to verifymin(n,m){\displaystyle \ell \leq \min(n,m)}). Note thatV{\displaystyle \mathbf {V} } is here by definition a matrix whosei{\displaystyle i}-th column is thei{\displaystyle i}-th eigenvector ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} }, corresponding to the eigenvalueD¯ii{\displaystyle {\bar {\mathbf {D} }}_{ii}}. Moreover, thej{\displaystyle j}-th column ofV{\displaystyle \mathbf {V} }, forj>{\displaystyle j>\ell }, is an eigenvector ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } with eigenvalueD¯jj=0{\displaystyle {\bar {\mathbf {D} }}_{jj}=0}. This can be expressed by writingV{\displaystyle \mathbf {V} } asV=[V1V2]{\displaystyle \mathbf {V} ={\begin{bmatrix}\mathbf {V} _{1}&\mathbf {V} _{2}\end{bmatrix}}}, where the columns ofV1{\displaystyle \mathbf {V} _{1}} andV2{\displaystyle \mathbf {V} _{2}} therefore contain the eigenvectors ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } corresponding to non-zero and zero eigenvalues, respectively. Using this rewriting ofV{\displaystyle \mathbf {V} }, the equation becomes:[V1V2]MM[V1V2]=[V1MMV1V1MMV2V2MMV1V2MMV2]=[D000].{\displaystyle {\begin{bmatrix}\mathbf {V} _{1}^{*}\\\mathbf {V} _{2}^{*}\end{bmatrix}}\mathbf {M} ^{*}\mathbf {M} \,{\begin{bmatrix}\mathbf {V} _{1}&\!\!\mathbf {V} _{2}\end{bmatrix}}={\begin{bmatrix}\mathbf {V} _{1}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{1}&\mathbf {V} _{1}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{2}\\\mathbf {V} _{2}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{1}&\mathbf {V} _{2}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{2}\end{bmatrix}}={\begin{bmatrix}\mathbf {D} &0\\0&0\end{bmatrix}}.}

This implies thatV1MMV1=D,V2MMV2=0.{\displaystyle \mathbf {V} _{1}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{1}=\mathbf {D} ,\quad \mathbf {V} _{2}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{2}=\mathbf {0} .}

Moreover, the second equation impliesMV2=0{\displaystyle \mathbf {M} \mathbf {V} _{2}=\mathbf {0} }.[b] Finally, the unitary-ness ofV{\displaystyle \mathbf {V} } translates, in terms ofV1{\displaystyle \mathbf {V} _{1}} andV2{\displaystyle \mathbf {V} _{2}}, into the following conditions:V1V1=I1,V2V2=I2,V1V1+V2V2=I12,{\displaystyle {\begin{aligned}\mathbf {V} _{1}^{*}\mathbf {V} _{1}&=\mathbf {I} _{1},\\\mathbf {V} _{2}^{*}\mathbf {V} _{2}&=\mathbf {I} _{2},\\\mathbf {V} _{1}\mathbf {V} _{1}^{*}+\mathbf {V} _{2}\mathbf {V} _{2}^{*}&=\mathbf {I} _{12},\end{aligned}}}where the subscripts on the identity matrices are used to remark that they are of different dimensions.

Let us now defineU1=MV1D12.{\displaystyle \mathbf {U} _{1}=\mathbf {M} \mathbf {V} _{1}\mathbf {D} ^{-{\frac {1}{2}}}.}

Then,U1D12V1=MV1D12D12V1=M(IV2V2)=M(MV2)V2=M,{\displaystyle \mathbf {U} _{1}\mathbf {D} ^{\frac {1}{2}}\mathbf {V} _{1}^{*}=\mathbf {M} \mathbf {V} _{1}\mathbf {D} ^{-{\frac {1}{2}}}\mathbf {D} ^{\frac {1}{2}}\mathbf {V} _{1}^{*}=\mathbf {M} (\mathbf {I} -\mathbf {V} _{2}\mathbf {V} _{2}^{*})=\mathbf {M} -(\mathbf {M} \mathbf {V} _{2})\mathbf {V} _{2}^{*}=\mathbf {M} ,}

sinceMV2=0.{\displaystyle \mathbf {M} \mathbf {V} _{2}=\mathbf {0} .} This can be also seen as immediate consequence of the fact thatMV1V1=M{\displaystyle \mathbf {M} \mathbf {V} _{1}\mathbf {V} _{1}^{*}=\mathbf {M} }. This is equivalent to the observation that if{vi}i=1{\displaystyle \{{\boldsymbol {v}}_{i}\}_{i=1}^{\ell }} is the set of eigenvectors ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } corresponding to non-vanishing eigenvalues{λi}i=1{\displaystyle \{\lambda _{i}\}_{i=1}^{\ell }}, then{Mvi}i=1{\displaystyle \{\mathbf {M} {\boldsymbol {v}}_{i}\}_{i=1}^{\ell }} is a set of orthogonal vectors, and{λi1/2Mvi}|i=1{\displaystyle {\bigl \{}\lambda _{i}^{-1/2}\mathbf {M} {\boldsymbol {v}}_{i}{\bigr \}}{\vphantom {|}}_{i=1}^{\ell }} is a (generally not complete) set oforthonormal vectors. This matches with the matrix formalism used above denoting withV1{\displaystyle \mathbf {V} _{1}} the matrix whose columns are{vi}i=1{\displaystyle \{{\boldsymbol {v}}_{i}\}_{i=1}^{\ell }}, withV2{\displaystyle \mathbf {V} _{2}} the matrix whose columns are the eigenvectors ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } with vanishing eigenvalue, andU1{\displaystyle \mathbf {U} _{1}} the matrix whose columns are the vectors{λi1/2Mvi}|i=1{\displaystyle {\bigl \{}\lambda _{i}^{-1/2}\mathbf {M} {\boldsymbol {v}}_{i}{\bigr \}}{\vphantom {|}}_{i=1}^{\ell }}.

We see that this is almost the desired result, except thatU1{\displaystyle \mathbf {U} _{1}} andV1{\displaystyle \mathbf {V} _{1}} are in general not unitary, since they might not be square. However, we do know that the number of rows ofU1{\displaystyle \mathbf {U} _{1}} is no smaller than the number of columns, since the dimensions ofD{\displaystyle \mathbf {D} } is no greater thanm{\displaystyle m} andn{\displaystyle n}. Also, sinceU1U1=D12V1MMV1D12=D12DD12=I1,{\displaystyle \mathbf {U} _{1}^{*}\mathbf {U} _{1}=\mathbf {D} ^{-{\frac {1}{2}}}\mathbf {V} _{1}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{1}\mathbf {D} ^{-{\frac {1}{2}}}=\mathbf {D} ^{-{\frac {1}{2}}}\mathbf {D} \mathbf {D} ^{-{\frac {1}{2}}}=\mathbf {I_{1}} ,}the columns inU1{\displaystyle \mathbf {U} _{1}} are orthonormal and can be extended to an orthonormal basis. This means that we can chooseU2{\displaystyle \mathbf {U} _{2}} such thatU=[U1U2]{\displaystyle \mathbf {U} ={\begin{bmatrix}\mathbf {U} _{1}&\mathbf {U} _{2}\end{bmatrix}}} is unitary.

ForV1{\displaystyle \mathbf {V} _{1}} we already haveV2{\displaystyle \mathbf {V} _{2}} to make it unitary. Now, defineΣ=[[D12000]0],{\displaystyle \mathbf {\Sigma } ={\begin{bmatrix}{\begin{bmatrix}\mathbf {D} ^{\frac {1}{2}}&0\\0&0\end{bmatrix}}\\0\end{bmatrix}},}

where extra zero rows are addedor removed to make the number of zero rows equal the number of columns ofU2,{\displaystyle \mathbf {U} _{2},} and hence the overall dimensions ofΣ{\displaystyle \mathbf {\Sigma } } equal tom×n{\displaystyle m\times n}. Then[U1U2][[D12000]0][V1V2]=[U1U2][D12V10]=U1D12V1=M,{\displaystyle {\begin{bmatrix}\mathbf {U} _{1}&\mathbf {U} _{2}\end{bmatrix}}{\begin{bmatrix}{\begin{bmatrix}\mathbf {} D^{\frac {1}{2}}&0\\0&0\end{bmatrix}}\\0\end{bmatrix}}{\begin{bmatrix}\mathbf {V} _{1}&\mathbf {V} _{2}\end{bmatrix}}^{*}={\begin{bmatrix}\mathbf {U} _{1}&\mathbf {U} _{2}\end{bmatrix}}{\begin{bmatrix}\mathbf {D} ^{\frac {1}{2}}\mathbf {V} _{1}^{*}\\0\end{bmatrix}}=\mathbf {U} _{1}\mathbf {D} ^{\frac {1}{2}}\mathbf {V} _{1}^{*}=\mathbf {M} ,}which is the desired result:M=UΣV.{\displaystyle \mathbf {M} =\mathbf {U} \mathbf {\Sigma } \mathbf {V} ^{*}.}

Notice that the argument could begin with diagonalizing MM* rather than M*M (this shows directly that MM* and M*M have the same non-zero eigenvalues).
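
The construction used in this proof can be traced numerically; the following minimal sketch (NumPy assumed, not part of the article) builds the compact factors from the eigendecomposition of M*M:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))

# Spectral decomposition of the Hermitian, positive semi-definite matrix M*M.
evals, V = np.linalg.eigh(M.conj().T @ M)
order = np.argsort(evals)[::-1]               # sort eigenvalues in descending order
evals, V = evals[order], V[:, order]

ell = int(np.sum(evals > 1e-12))              # number of non-zero eigenvalues
V1 = V[:, :ell]
d_half = np.sqrt(evals[:ell])                 # the non-zero singular values
U1 = (M @ V1) / d_half                        # U1 = M V1 D^{-1/2}

print(np.allclose(U1.conj().T @ U1, np.eye(ell)))     # True: columns of U1 are orthonormal
print(np.allclose((U1 * d_half) @ V1.conj().T, M))    # True: M = U1 D^{1/2} V1*
```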

Based on variational characterization


The singular values can also be characterized as the maxima of uᵀMv, considered as a function of u and v, over particular subspaces. The singular vectors are the values of u and v where these maxima are attained.

Let M denote an m × n matrix with real entries. Let S^{k−1} be the unit (k−1)-sphere in R^k, and define σ(u, v) = uᵀMv for u ∈ S^{m−1} and v ∈ S^{n−1}.

Consider the function σ restricted to S^{m−1} × S^{n−1}. Since both S^{m−1} and S^{n−1} are compact sets, their product is also compact. Furthermore, since σ is continuous, it attains a largest value for at least one pair of vectors u in S^{m−1} and v in S^{n−1}. This largest value is denoted σ1 and the corresponding vectors are denoted u1 and v1. Since σ1 is the largest value of σ(u, v) it must be non-negative. If it were negative, changing the sign of either u1 or v1 would make it positive and therefore larger.

Statement. u1 and v1 are left- and right-singular vectors of M with corresponding singular value σ1.

Proof

Similar to the eigenvalues case, by assumption the two vectors satisfy the Lagrange multiplier equation:σ=uTMvλ1uTuλ2vTv{\displaystyle \nabla \sigma =\nabla \mathbf {u} ^{\operatorname {T} }\mathbf {M} \mathbf {v} -\lambda _{1}\cdot \nabla \mathbf {u} ^{\operatorname {T} }\mathbf {u} -\lambda _{2}\cdot \nabla \mathbf {v} ^{\operatorname {T} }\mathbf {v} }

After some algebra, this becomesMv1=2λ1u1+0,MTu1=0+2λ2v1.{\displaystyle {\begin{aligned}\mathbf {M} \mathbf {v} _{1}&=2\lambda _{1}\mathbf {u} _{1}+0,\\\mathbf {M} ^{\operatorname {T} }\mathbf {u} _{1}&=0+2\lambda _{2}\mathbf {v} _{1}.\end{aligned}}}

Multiplying the first equation from the left by u1ᵀ and the second equation from the left by v1ᵀ and taking ‖u‖ = ‖v‖ = 1 into account gives σ1 = 2λ1 = 2λ2.

Plugging this into the pair of equations above, we haveMv1=σ1u1,MTu1=σ1v1.{\displaystyle {\begin{aligned}\mathbf {M} \mathbf {v} _{1}&=\sigma _{1}\mathbf {u} _{1},\\\mathbf {M} ^{\operatorname {T} }\mathbf {u} _{1}&=\sigma _{1}\mathbf {v} _{1}.\end{aligned}}}

This proves the statement.

More singular vectors and singular values can be found by maximizing σ(u, v) over normalized u and v which are orthogonal to u1 and v1, respectively.

The passage from real to complex is similar to the eigenvalue case.

Calculating the SVD


One-sided Jacobi algorithm


The one-sided Jacobi algorithm is an iterative algorithm[24] in which a matrix is iteratively transformed into a matrix with orthogonal columns. The elementary iteration is given as a Jacobi rotation, M ← MJ(p, q, θ), where the angle θ of the Jacobi rotation matrix J(p, q, θ) is chosen such that after the rotation the columns with numbers p and q become orthogonal. The indices (p, q) are swept cyclically, (p = 1 ... m, q = p + 1 ... m), where m is the number of columns.

After the algorithm has converged, the singular value decomposition M = USVᵀ is recovered as follows: the matrix V is the accumulation of the Jacobi rotation matrices, the matrix U is given by normalising the columns of the transformed matrix M, and the singular values are given as the norms of the columns of the transformed matrix M.
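
The following is a minimal sketch of this one-sided Jacobi iteration for a real matrix with at least as many rows as columns (written in Python with NumPy, which is an assumption of this example rather than a reference implementation):

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Sketch of the one-sided Jacobi SVD A = U S V^T for a real matrix with m >= n."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]
                beta = A[:, q] @ A[:, q]
                gamma = A[:, p] @ A[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                       # columns p and q are already orthogonal
                converged = False
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for X in (A, V):                   # apply the same column rotation to A and V
                    xp = X[:, p].copy()
                    X[:, p] = c * xp - s * X[:, q]
                    X[:, q] = s * xp + c * X[:, q]
        if converged:
            break
    sigma = np.linalg.norm(A, axis=0)              # singular values = column norms
    order = np.argsort(sigma)[::-1]                # sort in descending order
    sigma, A, V = sigma[order], A[:, order], V[:, order]
    U = A / np.where(sigma > 0, sigma, 1.0)        # normalise columns to obtain U
    return U, sigma, V.T

M = np.random.default_rng(0).standard_normal((6, 4))
U, s, Vt = one_sided_jacobi_svd(M)
print(np.allclose(U @ np.diag(s) @ Vt, M))                     # True
print(np.allclose(s, np.linalg.svd(M, compute_uv=False)))      # True
```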

Two-sided Jacobi algorithm


The two-sided Jacobi SVD algorithm—a generalization of the Jacobi eigenvalue algorithm—is an iterative algorithm in which a square matrix is iteratively transformed into a diagonal matrix. If the matrix is not square, the QR decomposition is performed first and then the algorithm is applied to the R matrix. The elementary iteration zeroes a pair of off-diagonal elements by first applying a Givens rotation to symmetrize the pair of elements and then applying a Jacobi transformation to zero them, M ← JᵀGMJ, where G is the Givens rotation matrix with the angle chosen such that the given pair of off-diagonal elements become equal after the rotation, and where J is the Jacobi transformation matrix that zeroes these off-diagonal elements. The iterations proceed exactly as in the Jacobi eigenvalue algorithm: by cyclic sweeps over all off-diagonal elements.

After the algorithm has converged, the resulting diagonal matrix contains the singular values. The matrices U and V are accumulated as follows: U ← UGᵀJ and V ← VJ.

Numerical approach


The singular value decomposition can be computed using the following observations: the left-singular vectors of M are a set of orthonormal eigenvectors of MM*; the right-singular vectors of M are a set of orthonormal eigenvectors of M*M; and the non-zero singular values of M (the non-zero diagonal entries of Σ) are the square roots of the non-zero eigenvalues of both M*M and MM*.

The SVD of a matrix M is typically computed by a two-step procedure. In the first step, the matrix is reduced to a bidiagonal matrix. This takes order O(mn²) floating-point operations (flop), assuming that m ≥ n. The second step is to compute the SVD of the bidiagonal matrix. This step can only be done with an iterative method (as with eigenvalue algorithms). However, in practice it suffices to compute the SVD up to a certain precision, like the machine epsilon. If this precision is considered constant, then the second step takes O(n) iterations, each costing O(n) flops. Thus, the first step is more expensive, and the overall cost is O(mn²) flops.[25]

The first step can be done usingHouseholder reflections for a cost of4mn24n3/3{\displaystyle 4mn^{2}-4n^{3}/3} flops, assuming that only the singular values are needed and not the singular vectors. Ifm{\displaystyle m} is much larger thann{\displaystyle n} then it is advantageous to first reduce the matrixM{\displaystyle \mathbf {M} } to a triangular matrix with theQR decomposition and then use Householder reflections to further reduce the matrix to bidiagonal form; the combined cost is2mn2+2n3{\displaystyle 2mn^{2}+2n^{3}} flops.[25]
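The following is a minimal sketch of this first step, a Golub–Kahan bidiagonalization with Householder reflections (NumPy assumed; the function name is illustrative, and no attention is paid to the blocking and scaling used in production libraries):

    import numpy as np

    def golub_kahan_bidiagonalize(A):
        """Reduce A (m >= n assumed) to upper-bidiagonal form B = U^T A V using
        Householder reflections applied alternately from the left and the right."""
        B = np.array(A, dtype=float)
        m, n = B.shape
        U, V = np.eye(m), np.eye(n)

        def reflector(x):
            # Unit vector v with (I - 2 v v^T) x proportional to e1
            v = x.copy()
            v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
            norm_v = np.linalg.norm(v)
            return v / norm_v if norm_v > 0 else v

        for k in range(n):
            # Left reflection: zero the subdiagonal part of column k
            v = reflector(B[k:, k])
            B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
            U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
            if k < n - 2:
                # Right reflection: zero row k beyond the superdiagonal
                w = reflector(B[k, k + 1:])
                B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ w, w)
                V[:, k + 1:] -= 2.0 * np.outer(V[:, k + 1:] @ w, w)
        return U, B, V   # A = U @ B @ V.T with B upper bidiagonal

    A = np.random.default_rng(0).standard_normal((6, 4))
    U, B, V = golub_kahan_bidiagonalize(A)
    print(np.allclose(A, U @ B @ V.T))
    print(np.allclose(np.sort(np.linalg.svd(B, compute_uv=False)),
                      np.sort(np.linalg.svd(A, compute_uv=False))))  # singular values preserved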

The second step can be done by a variant of theQR algorithm for the computation of eigenvalues, which was first described by Golub and Kahan in 1965.[26] TheLAPACK subroutineDBDSQR[27] implements this iterative method, with some modifications to cover the case where the singular values are very small.[28] Together with a first step using Householder reflections and, if appropriate, QR decomposition, this forms theDGESVD routine[29] for the computation of the singular value decomposition.

The same algorithm is implemented in theGNU Scientific Library (GSL). The GSL also offers an alternative method that uses a one-sidedJacobi orthogonalization in step 2.[30] This method computes the SVD of the bidiagonal matrix by solving a sequence of2×2{\displaystyle 2\times 2} SVD problems, similar to how theJacobi eigenvalue algorithm solves a sequence of2×2{\displaystyle 2\times 2} eigenvalue problems.[31] Yet another method for step 2 uses the idea ofdivide-and-conquer eigenvalue algorithms.[25]
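For example, SciPy exposes both LAPACK drivers through its scipy.linalg.svd wrapper, so the QR-iteration based routine and the divide-and-conquer routine can be compared directly (a small sketch, assuming SciPy is available; the matrix size is arbitrary):

    import numpy as np
    from scipy.linalg import svd

    rng = np.random.default_rng(1)
    M = rng.standard_normal((500, 300))

    # LAPACK's QR-iteration based driver (xGESVD) versus the divide-and-conquer
    # driver (xGESDD); both return factors with M = U @ diag(s) @ Vh.
    U1, s1, Vh1 = svd(M, lapack_driver="gesvd")
    U2, s2, Vh2 = svd(M, lapack_driver="gesdd")
    print(np.allclose(s1, s2))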

There is an alternative way that does not explicitly use the eigenvalue decomposition.[32] Usually the singular value problem of a matrixM{\displaystyle \mathbf {M} } is converted into an equivalent symmetric eigenvalue problem such asMM,{\displaystyle \mathbf {M} \mathbf {M} ^{*},}MM,{\displaystyle \mathbf {M} ^{*}\mathbf {M} ,} or

[0MM0].{\displaystyle {\begin{bmatrix}\mathbf {0} &\mathbf {M} \\\mathbf {M} ^{*}&\mathbf {0} \end{bmatrix}}.}

The approaches that use eigenvalue decompositions are based on theQR algorithm, which is well developed to be stable and fast. Note that the singular values are real and that the right- and left-singular vectors are not required to form similarity transformations. One can instead iteratively alternate between theQR decomposition and theLQ decomposition to drive the matrix toward a real diagonal matrix. TheQR decomposition givesMQR{\displaystyle \mathbf {M} \Rightarrow \mathbf {Q} \mathbf {R} } and theLQ decomposition ofR{\displaystyle \mathbf {R} } givesRLP.{\displaystyle \mathbf {R} \Rightarrow \mathbf {L} \mathbf {P} ^{*}.} Thus, at every iteration, we haveMQLP,{\displaystyle \mathbf {M} \Rightarrow \mathbf {Q} \mathbf {L} \mathbf {P} ^{*},} updateML{\displaystyle \mathbf {M} \Leftarrow \mathbf {L} } and repeat the orthogonalizations. Eventually, this iteration between QR decomposition and LQ decomposition produces left- and right-unitary singular matrices. This approach cannot readily be accelerated, as the QR algorithm can be with spectral shifts or deflation, because the shift method is not easily defined without using similarity transformations. However, the iteration is very simple to implement, so it is a good choice when speed does not matter. This method also provides insight into how purely orthogonal/unitary transformations can obtain the SVD.
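A minimal sketch of this QR/LQ alternation (NumPy assumed; the LQ step is obtained from a QR decomposition of the transpose, and only the singular values are tracked for brevity):

    import numpy as np

    def qr_lq_singular_values(M, iters=200):
        """Alternate QR and LQ decompositions until the iterate is effectively
        diagonal; its diagonal then holds the singular values (up to sign)."""
        A = np.array(M, dtype=float)
        for _ in range(iters):
            Q, R = np.linalg.qr(A)        # M => Q R
            P, Lt = np.linalg.qr(R.T)     # LQ of R from QR of R^T: R = Lt.T @ P.T
            A = Lt.T                      # continue with the lower-triangular factor L
        return np.sort(np.abs(np.diag(A)))[::-1]

    M = np.random.default_rng(2).standard_normal((4, 4))
    print(qr_lq_singular_values(M))
    print(np.linalg.svd(M, compute_uv=False))   # should agree to high accuracy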

Analytic result of 2 × 2 SVD


The singular values of a2×2{\displaystyle 2\times 2} matrix can be found analytically. Let the matrix beM=z0I+z1σ1+z2σ2+z3σ3{\displaystyle \mathbf {M} =z_{0}\mathbf {I} +z_{1}\sigma _{1}+z_{2}\sigma _{2}+z_{3}\sigma _{3}}

whereziC{\displaystyle z_{i}\in \mathbb {C} } are complex numbers that parameterize the matrix,I{\displaystyle \mathbf {I} } is the identity matrix, andσi{\displaystyle \sigma _{i}} denote thePauli matrices. Then its two singular values are given by

σ±=|z0|2+|z1|2+|z2|2+|z3|2±(|z0|2+|z1|2+|z2|2+|z3|2)2|z02z12z22z32|2=|z0|2+|z1|2+|z2|2+|z3|2±2(Rez0z1)2+(Rez0z2)2+(Rez0z3)2+(Imz1z2)2+(Imz2z3)2+(Imz3z1)2{\displaystyle {\begin{aligned}\sigma _{\pm }&={\sqrt {|z_{0}|^{2}+|z_{1}|^{2}+|z_{2}|^{2}+|z_{3}|^{2}\pm {\sqrt {{\bigl (}|z_{0}|^{2}+|z_{1}|^{2}+|z_{2}|^{2}+|z_{3}|^{2}{\bigr )}^{2}-|z_{0}^{2}-z_{1}^{2}-z_{2}^{2}-z_{3}^{2}|^{2}}}}}\\&={\sqrt {|z_{0}|^{2}+|z_{1}|^{2}+|z_{2}|^{2}+|z_{3}|^{2}\pm 2{\sqrt {(\operatorname {Re} z_{0}z_{1}^{*})^{2}+(\operatorname {Re} z_{0}z_{2}^{*})^{2}+(\operatorname {Re} z_{0}z_{3}^{*})^{2}+(\operatorname {Im} z_{1}z_{2}^{*})^{2}+(\operatorname {Im} z_{2}z_{3}^{*})^{2}+(\operatorname {Im} z_{3}z_{1}^{*})^{2}}}}}\end{aligned}}}
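This formula is easy to check numerically. The sketch below (NumPy assumed; the random matrix is arbitrary) recovers the coefficients z_i of a complex 2 × 2 matrix from its Pauli-matrix decomposition, evaluates the first expression above, and compares the result with a library SVD:

    import numpy as np

    # Pauli matrices; the z_i below parameterize a general 2x2 complex matrix
    paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
              np.array([[0, -1j], [1j, 0]], dtype=complex),
              np.array([[1, 0], [0, -1]], dtype=complex)]

    rng = np.random.default_rng(3)
    M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

    z0 = np.trace(M) / 2
    z = [np.trace(M @ p) / 2 for p in paulis]      # uses tr(sigma_i sigma_j) = 2 delta_ij

    S = abs(z0)**2 + sum(abs(zi)**2 for zi in z)
    det_term = abs(z0**2 - z[0]**2 - z[1]**2 - z[2]**2)**2
    sigma_plus = np.sqrt(S + np.sqrt(S**2 - det_term))
    sigma_minus = np.sqrt(S - np.sqrt(S**2 - det_term))

    print(sigma_plus, sigma_minus)
    print(np.linalg.svd(M, compute_uv=False))      # should agree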

Reduced SVDs

Visualization of Reduced SVD variants. From top to bottom: 1: Full SVD, 2: Thin SVD (remove columns ofU not corresponding to rows ofV*), 3: Compact SVD (remove vanishing singular values and corresponding columns/rows inU andV*), 4: Truncated SVD (keep only largest t singular values and corresponding columns/rows inU andV*)

In applications it is quite unusual for the full SVD, including a full unitary decomposition of the null-space of the matrix, to be required. Instead, it is often sufficient (as well as faster, and more economical for storage) to compute a reduced version of the SVD. The following can be distinguished for anm×n{\displaystyle m\times n} matrixM{\displaystyle \mathbf {M} } of rankr{\displaystyle r}:

Thin SVD


The thin, or economy-sized, SVD of a matrixM{\displaystyle \mathbf {M} } is given by[33]

M=UkΣkVk,{\displaystyle \mathbf {M} =\mathbf {U} _{k}\mathbf {\Sigma } _{k}\mathbf {V} _{k}^{*},}

wherek=min(m,n),{\displaystyle k=\min(m,n),} the matricesUk{\displaystyle \mathbf {U} _{k}} andVk{\displaystyle \mathbf {V} _{k}} contain only the firstk{\displaystyle k} columns ofU{\displaystyle \mathbf {U} } andV,{\displaystyle \mathbf {V} ,} andΣk{\displaystyle \mathbf {\Sigma } _{k}} contains only the firstk{\displaystyle k} singular values fromΣ.{\displaystyle \mathbf {\Sigma } .} The matrixUk{\displaystyle \mathbf {U} _{k}} is thusm×k,{\displaystyle m\times k,}Σk{\displaystyle \mathbf {\Sigma } _{k}} isk×k{\displaystyle k\times k} diagonal, andVk{\displaystyle \mathbf {V} _{k}^{*}} isk×n.{\displaystyle k\times n.}

The thin SVD uses significantly less space and computation time ifkmax(m,n).{\displaystyle k\ll \max(m,n).} The first stage in its calculation will usually be aQR decomposition ofM,{\displaystyle \mathbf {M} ,} which can make for a significantly quicker calculation in this case.
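In NumPy, for example, the thin (economy) SVD corresponds to the full_matrices=False option (a small illustrative example; the sizes are arbitrary):

    import numpy as np

    M = np.random.default_rng(4).standard_normal((1000, 50))

    # Full SVD: U is 1000x1000, mostly describing the null space
    U_full, s, Vh = np.linalg.svd(M, full_matrices=True)
    # Thin (economy) SVD: U_k is 1000x50 and Sigma_k is 50x50
    U_thin, s, Vh = np.linalg.svd(M, full_matrices=False)

    print(U_full.shape, U_thin.shape)        # (1000, 1000) (1000, 50)
    print(np.allclose(M, U_thin @ np.diag(s) @ Vh))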

Compact SVD


The compact SVD of a matrixM{\displaystyle \mathbf {M} } is given by

M=UrΣrVr.{\displaystyle \mathbf {M} =\mathbf {U} _{r}\mathbf {\Sigma } _{r}\mathbf {V} _{r}^{*}.}

Only ther{\displaystyle r} column vectors ofU{\displaystyle \mathbf {U} } andr{\displaystyle r} row vectors ofV{\displaystyle \mathbf {V} ^{*}} corresponding to the non-zero singular valuesΣr{\displaystyle \mathbf {\Sigma } _{r}} are calculated. The remaining vectors ofU{\displaystyle \mathbf {U} } andV{\displaystyle \mathbf {V} ^{*}} are not calculated. This is quicker and more economical than the thin SVD ifrmin(m,n).{\displaystyle r\ll \min(m,n).} The matrixUr{\displaystyle \mathbf {U} _{r}} is thusm×r,{\displaystyle m\times r,}Σr{\displaystyle \mathbf {\Sigma } _{r}} isr×r{\displaystyle r\times r} diagonal, andVr{\displaystyle \mathbf {V} _{r}^{*}} isr×n.{\displaystyle r\times n.}
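In practice a compact SVD can be obtained by discarding the singular values that are zero to within a rank tolerance, as in the following NumPy sketch (the tolerance choice shown is an illustrative convention, not a fixed rule):

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))   # rank at most 3

    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    r = np.sum(s > s[0] * max(M.shape) * np.finfo(float).eps)       # numerical rank
    U_r, S_r, Vh_r = U[:, :r], np.diag(s[:r]), Vh[:r, :]            # compact SVD factors

    print(r, U_r.shape, S_r.shape, Vh_r.shape)                      # 3 (8, 3) (3, 3) (3, 6)
    print(np.allclose(M, U_r @ S_r @ Vh_r))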

Truncated SVD


In many applications the numberr{\displaystyle r} of the non-zero singular values is large, making even the compact SVD impractical to compute. In such cases, the smallest singular values may need to be truncated so that onlytr{\displaystyle t\ll r} non-zero singular values are computed. The truncated SVD is no longer an exact decomposition of the original matrixM,{\displaystyle \mathbf {M} ,} but rather provides the optimallow-rank matrix approximationM~{\displaystyle {\tilde {\mathbf {M} }}} ofM{\displaystyle \mathbf {M} } among all matrices of fixed rankt{\displaystyle t}

M~=UtΣtVt,{\displaystyle {\tilde {\mathbf {M} }}=\mathbf {U} _{t}\mathbf {\Sigma } _{t}\mathbf {V} _{t}^{*},}

where matrixUt{\displaystyle \mathbf {U} _{t}} ism×t,{\displaystyle m\times t,}Σt{\displaystyle \mathbf {\Sigma } _{t}} ist×t{\displaystyle t\times t} diagonal, andVt{\displaystyle \mathbf {V} _{t}^{*}} ist×n.{\displaystyle t\times n.} Only thet{\displaystyle t} column vectors ofU{\displaystyle \mathbf {U} } andt{\displaystyle t} row vectors ofV{\displaystyle \mathbf {V} ^{*}} corresponding to thet{\displaystyle t} largest singular valuesΣt{\displaystyle \mathbf {\Sigma } _{t}} are calculated. This can be much quicker and more economical than the compact SVD iftr,{\displaystyle t\ll r,} but requires a completely different toolset of numerical solvers.
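Iterative sparse solvers are the usual toolset here. For instance, scipy.sparse.linalg.svds computes only the t largest singular triplets with an iterative ARPACK-based method by default; the sketch below (SciPy assumed; sizes are arbitrary) also checks that the rank-t truncation achieves the optimal 2-norm error, namely the (t+1)-th singular value:

    import numpy as np
    from scipy.sparse.linalg import svds

    rng = np.random.default_rng(6)
    M = rng.standard_normal((1000, 300))

    t = 10
    # svds returns only the t largest singular triplets, in ascending order
    U_t, s_t, Vh_t = svds(M, k=t)
    M_approx = U_t @ np.diag(s_t) @ Vh_t      # best rank-t approximation in the 2-norm

    full_s = np.linalg.svd(M, compute_uv=False)
    print(np.allclose(np.sort(s_t), np.sort(full_s)[-t:]))
    print(np.linalg.norm(M - M_approx, 2), full_s[t])   # error equals sigma_{t+1}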

In applications that require an approximation to theMoore–Penrose inverse of the matrixM,{\displaystyle \mathbf {M} ,} the smallest singular values ofM{\displaystyle \mathbf {M} } are of interest, which are more challenging to compute compared to the largest ones.

Truncated SVD is employed inlatent semantic indexing.[34]

Norms


Ky Fan norms


The sum of thek{\displaystyle k} largest singular values ofM{\displaystyle \mathbf {M} } is amatrix norm, theKy Fank{\displaystyle k}-norm ofM.{\displaystyle \mathbf {M} .}[35]

The first of the Ky Fan norms, the Ky Fan 1-norm, is the same as theoperator norm ofM{\displaystyle \mathbf {M} } as a linear operator with respect to the Euclidean norms ofKm{\displaystyle K^{m}} andKn.{\displaystyle K^{n}.} In other words, the Ky Fan 1-norm is the operator norm induced by the standard2{\displaystyle \ell ^{2}} Euclidean inner product. For this reason, it is also called the operator 2-norm. One can easily verify the relationship between the Ky Fan 1-norm and singular values. It is true in general, for a bounded operatorM{\displaystyle \mathbf {M} } on (possibly infinite-dimensional) Hilbert spaces

M=MM12{\displaystyle \|\mathbf {M} \|=\|\mathbf {M} ^{*}\mathbf {M} \|^{\frac {1}{2}}}

But, in the matrix case,(MM)1/2{\displaystyle (\mathbf {M} ^{*}\mathbf {M} )^{1/2}} is anormal matrix, soMM1/2{\displaystyle \|\mathbf {M} ^{*}\mathbf {M} \|^{1/2}} is the largest eigenvalue of(MM)1/2,{\displaystyle (\mathbf {M} ^{*}\mathbf {M} )^{1/2},} i.e. the largest singular value ofM.{\displaystyle \mathbf {M} .}

The last of the Ky Fan norms, the sum of all singular values, is thetrace norm (also known as the 'nuclear norm'), defined byM=Tr(MM)1/2{\displaystyle \|\mathbf {M} \|=\operatorname {Tr} (\mathbf {M} ^{*}\mathbf {M} )^{1/2}} (the eigenvalues ofMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } are the squares of the singular values).
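Both identities are easy to verify numerically; for instance (NumPy assumed), the largest singular value matches the operator 2-norm and the sum of the singular values matches the nuclear (trace) norm:

    import numpy as np

    M = np.random.default_rng(7).standard_normal((5, 3))
    s = np.linalg.svd(M, compute_uv=False)

    print(np.isclose(s[0], np.linalg.norm(M, 2)))          # Ky Fan 1-norm = operator 2-norm
    print(np.isclose(s.sum(), np.linalg.norm(M, "nuc")))   # sum of singular values = trace norm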

Hilbert–Schmidt norm


The singular values are related to another norm on the space of operators. Consider theHilbert–Schmidt inner product on then×n{\displaystyle n\times n} matrices, defined by

M,N=tr(NM).{\displaystyle \langle \mathbf {M} ,\mathbf {N} \rangle =\operatorname {tr} \left(\mathbf {N} ^{*}\mathbf {M} \right).}

So the induced norm is

M=M,M=tr(MM).{\displaystyle \|\mathbf {M} \|={\sqrt {\langle \mathbf {M} ,\mathbf {M} \rangle }}={\sqrt {\operatorname {tr} \left(\mathbf {M} ^{*}\mathbf {M} \right)}}.}

Since the trace is invariant under unitary equivalence, this shows

M=|iσi2{\displaystyle \|\mathbf {M} \|={\sqrt {{\vphantom {\bigg |}}\sum _{i}\sigma _{i}^{2}}}}

whereσi{\displaystyle \sigma _{i}} are the singular values ofM.{\displaystyle \mathbf {M} .} This is called theFrobenius norm,Schatten 2-norm, orHilbert–Schmidt norm ofM.{\displaystyle \mathbf {M} .} Direct calculation shows that the Frobenius norm ofM=(mij){\displaystyle \mathbf {M} =(m_{ij})} coincides with:

|ij|mij|2.{\displaystyle {\sqrt {{\vphantom {\bigg |}}\sum _{ij}|m_{ij}|^{2}}}.}

In addition, the Frobenius norm and the trace norm (the nuclear norm) are special cases of theSchatten norm.
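A quick numerical check of these identities (NumPy assumed): the Frobenius norm equals both the root-sum-of-squares of the singular values and the root-sum-of-squares of the matrix entries:

    import numpy as np

    M = np.random.default_rng(8).standard_normal((4, 6))
    s = np.linalg.svd(M, compute_uv=False)

    fro = np.linalg.norm(M, "fro")
    print(np.isclose(fro, np.sqrt(np.sum(s**2))))          # from the singular values
    print(np.isclose(fro, np.sqrt(np.sum(np.abs(M)**2))))  # from the matrix entries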

Variations and generalizations


Scale-invariant SVD


The singular values of a matrixA{\displaystyle \mathbf {A} } are uniquely defined and are invariant with respect to left and/or right unitary transformations ofA.{\displaystyle \mathbf {A} .} In other words, the singular values ofUAV,{\displaystyle \mathbf {U} \mathbf {A} \mathbf {V} ,} for unitary matricesU{\displaystyle \mathbf {U} } andV,{\displaystyle \mathbf {V} ,} are equal to the singular values ofA.{\displaystyle \mathbf {A} .} This is an important property for applications in which it is necessary to preserve Euclidean distances and invariance with respect to rotations.

The Scale-Invariant SVD, or SI-SVD,[36] is analogous to the conventional SVD except that its uniquely-determined singular values are invariant with respect to diagonal transformations ofA.{\displaystyle \mathbf {A} .} In other words, the singular values ofDAE,{\displaystyle \mathbf {D} \mathbf {A} \mathbf {E} ,} for invertible diagonal matricesD{\displaystyle \mathbf {D} } andE,{\displaystyle \mathbf {E} ,} are equal to the singular values ofA.{\displaystyle \mathbf {A} .} This is an important property for applications for which invariance to the choice of units on variables (e.g., metric versus imperial units) is needed.

Bounded operators on Hilbert spaces


The factorizationM=UΣV{\displaystyle \mathbf {M} =\mathbf {U} \mathbf {\Sigma } \mathbf {V} ^{*}} can be extended to abounded operatorM{\displaystyle \mathbf {M} } on a separable Hilbert spaceH.{\displaystyle H.} Namely, for any bounded operatorM,{\displaystyle \mathbf {M} ,} there exist apartial isometryU,{\displaystyle \mathbf {U} ,} a unitaryV,{\displaystyle \mathbf {V} ,} a measure space(X,μ),{\displaystyle (X,\mu ),} and a non-negative measurablef{\displaystyle f} such that

M=UTfV{\displaystyle \mathbf {M} =\mathbf {U} T_{f}\mathbf {V} ^{*}}

whereTf{\displaystyle T_{f}} is themultiplication byf{\displaystyle f} onL2(X,μ).{\displaystyle L^{2}(X,\mu ).}

This can be shown by mimicking the linear algebraic argument for the matrix case above.VTfV{\displaystyle \mathbf {V} T_{f}\mathbf {V} ^{*}} is the unique positive square root ofMM,{\displaystyle \mathbf {M} ^{*}\mathbf {M} ,} as given by theBorel functional calculus forself-adjoint operators. The reason whyU{\displaystyle \mathbf {U} } need not be unitary is that, unlike the finite-dimensional case, given an isometryU1{\displaystyle U_{1}} with nontrivial kernel, a suitableU2{\displaystyle U_{2}} may not be found such that

[U1U2]{\displaystyle {\begin{bmatrix}U_{1}\\U_{2}\end{bmatrix}}}

is a unitary operator.

As for matrices, the singular value factorization is equivalent to thepolar decomposition for operators: we can simply write

M=UVVTfV{\displaystyle \mathbf {M} =\mathbf {U} \mathbf {V} ^{*}\cdot \mathbf {V} T_{f}\mathbf {V} ^{*}}

and notice thatUV{\displaystyle \mathbf {U} \mathbf {V} ^{*}} is still a partial isometry whileVTfV{\displaystyle \mathbf {V} T_{f}\mathbf {V} ^{*}} is positive.

Singular values and compact operators


The notion of singular values and left/right-singular vectors can be extended tocompact operators on Hilbert space, since these have a discrete spectrum. IfT{\displaystyle T} is compact, every non-zeroλ{\displaystyle \lambda } in its spectrum is an eigenvalue. Furthermore, a compact self-adjoint operator can be diagonalized by its eigenvectors. IfM{\displaystyle \mathbf {M} } is compact, so isMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} }. Applying the diagonalization result, the unitary image of its positive square rootTf{\displaystyle T_{f}} has a set of orthonormal eigenvectors{ei}{\displaystyle \{e_{i}\}} corresponding to strictly positive eigenvalues{σi}{\displaystyle \{\sigma _{i}\}}. For anyψ{\displaystyle \psi } inH,{\displaystyle H,}

Mψ=UTfVψ=iUTfVψ,UeiUei=iσiψ,VeiUei,{\displaystyle \mathbf {M} \psi =\mathbf {U} T_{f}\mathbf {V} ^{*}\psi =\sum _{i}\left\langle \mathbf {U} T_{f}\mathbf {V} ^{*}\psi ,\mathbf {U} e_{i}\right\rangle \mathbf {U} e_{i}=\sum _{i}\sigma _{i}\left\langle \psi ,\mathbf {V} e_{i}\right\rangle \mathbf {U} e_{i},}

where the series converges in the norm topology onH.{\displaystyle H.} Notice how this resembles the expression from the finite-dimensional case.σi{\displaystyle \sigma _{i}} are called the singular values ofM.{\displaystyle \mathbf {M} .}{Uei}{\displaystyle \{\mathbf {U} e_{i}\}} (resp.{Vei}{\displaystyle \{\mathbf {V} e_{i}\}}) can be considered the left-singular (resp. right-singular) vectors ofM.{\displaystyle \mathbf {M} .}

Compact operators on a Hilbert space are the closure offinite-rank operators in the uniform operator topology. The above series expression gives an explicit such representation. An immediate consequence of this is:

Theorem.M{\displaystyle \mathbf {M} } is compact if and only ifMM{\displaystyle \mathbf {M} ^{*}\mathbf {M} } is compact.

History


The singular value decomposition was originally developed bydifferential geometers, who wished to determine whether a realbilinear form could be made equal to another by independent orthogonal transformations of the two spaces it acts on.Eugenio Beltrami andCamille Jordan discovered independently, in 1873 and 1874 respectively, that the singular values of the bilinear forms, represented as a matrix, form acomplete set ofinvariants for bilinear forms under orthogonal substitutions.James Joseph Sylvester also arrived at the singular value decomposition for real square matrices in 1889, apparently independently of both Beltrami and Jordan. Sylvester called the singular values thecanonical multipliers of the matrixA.{\displaystyle \mathbf {A} .} The fourth mathematician to discover the singular value decomposition independently wasAutonne in 1915, who arrived at it via thepolar decomposition. The first proof of the singular value decomposition for rectangular and complex matrices seems to be byCarl Eckart andGale J. Young in 1936;[37] they saw it as a generalization of theprincipal axis transformation forHermitian matrices.

In 1907,Erhard Schmidt defined an analog of singular values forintegral operators (which are compact, under some weak technical assumptions); it seems he was unaware of the parallel work on singular values of finite matrices. This theory was further developed byÉmile Picard in 1910, who was the first to call the numbersσk{\displaystyle \sigma _{k}}singular values (or in French,valeurs singulières).

Practical methods for computing the SVD date back toKogbetliantz in 1954–1955 andHestenes in 1958,[38] resembling closely theJacobi eigenvalue algorithm, which uses plane rotations orGivens rotations. However, these were replaced by the method ofGene Golub andWilliam Kahan published in 1965,[26] which usesHouseholder transformations or reflections. In 1970, Golub andChristian Reinsch published a variant of the Golub/Kahan algorithm[39] that is still the one most-used today.


Notes

  1. ^Although, it was later found to have been known to earlier authors; seeStewart (1993).
  2. ^To see this, we just have to notice thatTr(V2MMV2)=MV22,{\textstyle \operatorname {Tr} (\mathbf {V} _{2}^{*}\mathbf {M} ^{*}\mathbf {M} \mathbf {V} _{2})=\|\mathbf {M} \mathbf {V} _{2}\|^{2},} and remember thatA=0A=0{\displaystyle \|A\|=0\Leftrightarrow A=0}.

Footnotes

  1. ^Holmes, Mark (2023).Introduction to Scientific Computing and Data Analysis, 2nd Ed. Springer.ISBN 978-3-031-22429-4.
  2. ^DeAngelis, G. C.; Ohzawa, I.; Freeman, R. D. (October 1995). "Receptive-field dynamics in the central visual pathways".Trends Neurosci.18 (10):451–8.doi:10.1016/0166-2236(95)94496-R.PMID 8545912.S2CID 12827601.
  3. ^Depireux, D. A.; Simon, J. Z.; Klein, D. J.; Shamma, S. A. (March 2001). "Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex".J. Neurophysiol.85 (3):1220–34.doi:10.1152/jn.2001.85.3.1220.PMID 11247991.
  4. ^The Singular Value Decomposition in Symmetric (Lowdin) Orthogonalization and Data Compression
  5. ^Golub & Van Loan (1996), §12.4.
  6. ^abHastie, Tibshirani & Friedman (2009), pp. 535–536.
  7. ^Sahidullah, Md.; Kinnunen, Tomi (March 2016)."Local spectral variability features for speaker verification".Digital Signal Processing.50:1–11.Bibcode:2016DSP....50....1S.doi:10.1016/j.dsp.2015.10.011.
  8. ^Mademlis, Ioannis; Tefas, Anastasios; Pitas, Ioannis (2018). "Regularized SVD-Based Video Frame Saliency for Unsupervised Activity Video Summarization".2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. pp. 2691–2695.doi:10.1109/ICASSP.2018.8462274.ISBN 978-1-5386-4658-8.S2CID 52286352.
  9. ^Alter, O.; Brown, P. O.; Botstein, D. (September 2000)."Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling".PNAS.97 (18):10101–10106.Bibcode:2000PNAS...9710101A.doi:10.1073/pnas.97.18.10101.PMC 27718.PMID 10963673.
  10. ^Alter, O.; Golub, G. H. (November 2004)."Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation Between DNA Replication and RNA Transcription".PNAS.101 (47):16577–16582.Bibcode:2004PNAS..10116577A.doi:10.1073/pnas.0406767101.PMC 534520.PMID 15545604.
  11. ^Alter, O.; Golub, G. H. (August 2006)."Singular Value Decomposition of Genome-Scale mRNA Lengths Distribution Reveals Asymmetry in RNA Gel Electrophoresis Band Broadening".PNAS.103 (32):11828–11833.Bibcode:2006PNAS..10311828A.doi:10.1073/pnas.0604756103.PMC 1524674.PMID 16877539.
  12. ^Bertagnolli, N. M.; Drake, J. A.; Tennessen, J. M.; Alter, O. (November 2013)."SVD Identifies Transcript Length Distribution Functions from DNA Microarray Data and Reveals Evolutionary Forces Globally Affecting GBM Metabolism".PLOS ONE.8 (11) e78913.Bibcode:2013PLoSO...878913B.doi:10.1371/journal.pone.0078913.PMC 3839928.PMID 24282503.Highlight.
  13. ^Edelman, Alan (1992)."On the distribution of a scaled condition number"(PDF).Math. Comp.58 (197):185–190.Bibcode:1992MaCom..58..185E.doi:10.1090/S0025-5718-1992-1106966-2.
  14. ^Shen, Jianhong (Jackie) (2001)."On the singular values of Gaussian random matrices".Linear Alg. Appl.326 (1–3):1–14.doi:10.1016/S0024-3795(00)00322-0.
  15. ^Walton, S.; Hassan, O.; Morgan, K. (2013)."Reduced order modelling for unsteady fluid flow using proper orthogonal decomposition and radial basis functions".Applied Mathematical Modelling.37 (20–21):8930–8945.doi:10.1016/j.apm.2013.04.025.
  16. ^Setyawati, Y.; Ohme, F.; Khan, S. (2019). "Enhancing gravitational waveform model through dynamic calibration".Physical Review D.99 (2) 024010.arXiv:1810.07060.Bibcode:2019PhRvD..99b4010S.doi:10.1103/PhysRevD.99.024010.S2CID 118935941.
  17. ^Sarwar, Badrul; Karypis, George;Konstan, Joseph A. &Riedl, John T. (2000).Application of Dimensionality Reduction in Recommender System – A Case Study (Technical report 00-043).University of Minnesota.hdl:11299/215429.
  18. ^Bosagh Zadeh, Reza; Carlsson, Gunnar (2013). "Dimension Independent Matrix Square Using MapReduce".arXiv:1304.1467 [cs.DS].
  19. ^Fanaee Tork, Hadi; Gama, João (September 2014). "Eigenspace method for spatiotemporal hotspot detection".Expert Systems.32 (3):454–464.arXiv:1406.3506.Bibcode:2014arXiv1406.3506F.doi:10.1111/exsy.12088.S2CID 15476557.
  20. ^Fanaee Tork, Hadi; Gama, João (May 2015). "EigenEvent: An Algorithm for Event Detection from Complex Data Streams in Syndromic Surveillance".Intelligent Data Analysis.19 (3):597–616.arXiv:1406.3496.doi:10.3233/IDA-150734.S2CID 17966555.
  21. ^Muralidharan, Vivek; Howell, Kathleen (2023). "Stretching directions in cislunar space: Applications for departures and transfer design".Astrodynamics.7 (2):153–178.Bibcode:2023AsDyn...7..153M.doi:10.1007/s42064-022-0147-z.S2CID 252637213.
  22. ^Muralidharan, Vivek; Howell, Kathleen (2022). "Leveraging stretching directions for stationkeeping in Earth-Moon halo orbits".Advances in Space Research.69 (1):620–646.Bibcode:2022AdSpR..69..620M.doi:10.1016/j.asr.2021.10.028.S2CID 239490016.
  23. ^Albers, Jasper; Kurth, Anno; Gutzen, Robin; Morales-Gregorio, Aitor; Denker, Michael; Gruen, Sonja; van Albada, Sacha; Diesmann, Markus (2025). "Assessing the Similarity of Real Matrices with Arbitrary Shape".PRX Life.3 (2) 023005.arXiv:2403.17687.Bibcode:2025PRXL....3b3005A.doi:10.1103/PRXLife.3.023005.
  24. ^Rijk, P.P.M. de (1989). "A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer".SIAM J. Sci. Stat. Comput.10 (2):359–371.doi:10.1137/0910023.
  25. ^abcTrefethen & Bau (1997), Lecture 31.
  26. ^abGolub & Kahan (1965).
  27. ^Anderson et al. (1999),'DBDSQR' source.
  28. ^Demmel & Kahan (1990).
  29. ^Anderson et al. (1999),'DGESVD' source.
  30. ^GSL Team (2007).
  31. ^Golub & Van Loan (1996), §8.6.3.
  32. ^mathworks.co.kr/matlabcentral/fileexchange/12674-simple-svd
  33. ^Demmel, James (2000)."Decompositions".Templates for the Solution of Algebraic Eigenvalue Problems. By Bai, Zhaojun; Demmel, James; Dongarra, Jack J.; Ruhe, Axel; van der Vorst, Henk A. Society for Industrial and Applied Mathematics.doi:10.1137/1.9780898719581.ISBN 978-0-89871-471-5.
  34. ^Chicco, D; Masseroli, M (2015). "Software suite for gene and protein annotation prediction and similarity search".IEEE/ACM Transactions on Computational Biology and Bioinformatics.12 (4):837–843.Bibcode:2015ITCBB..12..837C.doi:10.1109/TCBB.2014.2382127.hdl:11311/959408.PMID 26357324.S2CID 14714823.
  35. ^Fan, Ky. (1951)."Maximum properties and inequalities for the eigenvalues of completely continuous operators".Proceedings of the National Academy of Sciences of the United States of America.37 (11):760–766.Bibcode:1951PNAS...37..760F.doi:10.1073/pnas.37.11.760.PMC 1063464.PMID 16578416.
  36. ^Uhlmann, Jeffrey (2018).A Generalized Matrix Inverse that is Consistent with Respect to Diagonal Transformations(PDF). SIAM Journal on Matrix Analysis. Vol. 239. pp. 781–800. Archived fromthe original(PDF) on 17 June 2019.
  37. ^Eckart, C.; Young, G. (1936). "The approximation of one matrix by another of lower rank".Psychometrika.1 (3):211–8.doi:10.1007/BF02288367.S2CID 10163399.
  38. ^Hestenes, M. R. (1958). "Inversion of Matrices by Biorthogonalization and Related Results".Journal of the Society for Industrial and Applied Mathematics.6 (1):51–90.doi:10.1137/0106005.JSTOR 2098862.MR 0092215.
  39. ^Golub & Reinsch (1970).

