Kaczmarz method

From Wikipedia, the free encyclopedia

The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving systems of linear equations $Ax=b$. It was first discovered by the Polish mathematician Stefan Kaczmarz,[1] and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART).[2] ART includes the positivity constraint, making it nonlinear.[3]

The Kaczmarz method is applicable to any linear system of equations, but its computational advantage relative to other methods depends on the system being sparse. It has been demonstrated to be superior, in some biomedical imaging applications, to other methods such as the filtered backprojection method.[4]

It has many applications, ranging from computed tomography (CT) to signal processing. It can also be obtained by applying the method of successive projections onto convex sets (POCS) to the hyperplanes described by the linear system.[5][6]

Algorithm 1: Kaczmarz algorithm

[Figure: Kaczmarz iteration example.]

The original Kaczmarz algorithm solves a complex-valued system of linear equations $Ax=b$.

Let $a_i$ be the conjugate transpose of the $i$-th row of $A$. Initialize $x_0$ to an arbitrary complex-valued initial approximation (e.g. $x_0=0$). For $k=0,1,\ldots$ compute:

$$x_{k+1}=x_{k}+\frac{b_{i_{k}}-\langle a_{i_{k}},x_{k}\rangle}{\|a_{i_{k}}\|^{2}}\,a_{i_{k}} \qquad (1)$$

where $i_0,i_1,i_2,\dots$ runs through the rows of $A$ in any order, deterministic or random; it is only necessary that each row is visited infinitely often.

In the space of real vectors the Kaczmarz iteration has a clear geometric meaning: it projects $x_k$ orthogonally onto the hyperplane $\{x:\langle a_i,x\rangle=b_i\}$. In this interpretation it is clear that if the Kaczmarz iteration converges, it must converge to one of the solutions of $Ax=b$.

A more general algorithm can be defined using a relaxation parameter $\lambda_k$:

$$x_{k+1}=x_{k}+\lambda_{k}\,\frac{b_{i_{k}}-\langle a_{i_{k}},x_{k}\rangle}{\|a_{i_{k}}\|^{2}}\,a_{i_{k}}$$

If the system has a solution and the iterations start from the zero vector, $x_k$ converges to the minimum-norm solution. If the rows are iterated in order and $\lambda_k=1$, convergence is exponential.
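For illustration, the following is a minimal NumPy sketch of iteration (1) with cyclic sweeps over the rows and an optional relaxation parameter; the test matrix, right-hand side, and sweep count are assumptions made for the example, not part of the method's definition.

import numpy as np

def kaczmarz(A, b, n_sweeps=200, lam=1.0, x0=None):
    """Cyclic Kaczmarz sweeps for a consistent system A x = b, iteration (1)."""
    m, n = A.shape
    x = np.zeros(n, dtype=A.dtype) if x0 is None else np.array(x0, dtype=A.dtype)
    for _ in range(n_sweeps):
        for i in range(m):                        # any order visiting every row works
            ai = np.conj(A[i])                    # a_i: conjugate transpose of row i
            residual = b[i] - A[i] @ x            # b_i - <a_i, x>
            x = x + lam * residual / np.linalg.norm(ai)**2 * ai
    return x

# Illustrative (assumed) data: a small consistent overdetermined system.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
print(np.linalg.norm(kaczmarz(A, A @ x_true) - x_true))   # should be close to 0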

Proof

Let $V$ be the space of solutions to $Ax=b$. Since at each Kaczmarz iteration $x_{k+1}-x_k$ is a vector parallel to some $a_i$, the final iterate (started from $x_0=0$) is a linear combination of $\{a_i\}_i$.

Now, $V$ is parallel to the kernel of $A$, which is perpendicular to every $a_i$; hence the final $x$ is perpendicular to the direction of $V$ (the kernel of $A$), meaning it is the minimum-norm solution.

Let $x_*$ be the minimum-norm solution. If $x_k\neq x_*$, then after one pass through all the rows of $A$ the iterate must have been orthogonally projected at least once, so that $\|x_{k+m}-x_*\|_2\leq \cos\theta\,\|x_k-x_*\|_2$, where $m$ is the number of rows and $\theta$ is the largest acute angle between the hyperplanes $\{x:\langle a_1,x\rangle=b_1\},\{x:\langle a_2,x\rangle=b_2\},\dots$

There are versions of the method that converge to a regularized weighted least-squares solution when applied to a system of inconsistent equations and, at least as far as initial behavior is concerned, at a lower cost than other iterative methods, such as the conjugate gradient method.[7]

Algorithm 2: Randomized Kaczmarz algorithm


In 2009, a randomized version of the Kaczmarz method for overdetermined linear systems was introduced by Thomas Strohmer and Roman Vershynin,[8] in which the $i$-th equation is selected randomly with probability proportional to $\|a_i\|^2$.

This method can be seen as a particular case of stochastic gradient descent.[9]

Under such circumstances $x_k$ converges exponentially fast to the solution of $Ax=b$, and the rate of convergence depends only on the scaled condition number $\kappa(A)$.
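A minimal sketch of this selection rule in NumPy (real-valued case; the data and iteration count are assumptions for the example):

import numpy as np

def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
    """Algorithm 2: select row i with probability ||a_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    probs = np.linalg.norm(A, axis=1)**2 / np.linalg.norm(A, 'fro')**2
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)            # row sampled proportionally to its squared norm
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a  # one Kaczmarz projection step
    return x

# Illustrative (assumed) data: a consistent overdetermined system.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))   # small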

Theorem. Let $x$ be the solution of $Ax=b$. Then Algorithm 2 converges to $x$ in expectation, with the average error

$$\mathbb{E}\,\|x_k-x\|^2\leq \left(1-\kappa(A)^{-2}\right)^{k}\cdot\|x_0-x\|^2.$$

Proof


We have

$$\forall z\in\mathbb{C}^{n}:\quad \sum_{j=1}^{m}|\langle z,a_{j}\rangle|^{2}\ \geq\ \frac{\|z\|^{2}}{\|A^{-1}\|^{2}} \qquad (2)$$

Using

$$\|A\|^{2}=\sum_{j=1}^{m}\|a_{j}\|^{2}$$

we can write (2) as

$$\forall z\in\mathbb{C}^{n}:\quad \sum_{j=1}^{m}\frac{\|a_{j}\|^{2}}{\|A\|^{2}}\left|\left\langle z,\frac{a_{j}}{\|a_{j}\|}\right\rangle\right|^{2}\ \geq\ \kappa(A)^{-2}\,\|z\|^{2} \qquad (3)$$

The main point of the proof is to view the left-hand side of (3) as the expectation of some random variable. Namely, recall that the solution space of the $j$-th equation of $Ax=b$ is the hyperplane

$$\{y:\langle y,a_{j}\rangle=b_{j}\},$$

whose unit normal is $\frac{a_{j}}{\|a_{j}\|}$. Define a random vector $Z$ whose values are the normals to all the equations of $Ax=b$, with probabilities as in our algorithm:

$$Z=\frac{a_{j}}{\|a_{j}\|}\quad\text{with probability}\quad \frac{\|a_{j}\|^{2}}{\|A\|^{2}},\qquad j=1,\ldots,m.$$

Then (3) says that

$$\forall z\in\mathbb{C}^{n}:\quad \mathbb{E}\,|\langle z,Z\rangle|^{2}\ \geq\ \kappa(A)^{-2}\,\|z\|^{2} \qquad (4)$$

The orthogonal projection $P$ onto the solution space of a random equation of $Ax=b$ is given by $Pz=z-\langle z-x,Z\rangle Z$.

Now we are ready to analyze our algorithm. We want to show that the error $\|x_k-x\|^2$ reduces at each step in average (conditioned on the previous steps) by at least the factor $(1-\kappa(A)^{-2})$. The next approximation $x_k$ is computed from $x_{k-1}$ as $x_k=P_k x_{k-1}$, where $P_1,P_2,\ldots$ are independent realizations of the random projection $P$. The vector $x_{k-1}-x_k$ is in the kernel of $P_k$. It is orthogonal to the solution space of the equation onto which $P_k$ projects, which contains the vector $x_k-x$ (recall that $x$ is the solution to all equations). The orthogonality of these two vectors then yields

$$\|x_{k}-x\|^{2}=\|x_{k-1}-x\|^{2}-\|x_{k-1}-x_{k}\|^{2}.$$

To complete the proof, we have to bound $\|x_{k-1}-x_{k}\|^{2}$ from below. By the definition of $x_k$, we have

$$\|x_{k-1}-x_{k}\|=|\langle x_{k-1}-x,Z_{k}\rangle|$$

where $Z_1,Z_2,\ldots$ are independent realizations of the random vector $Z$.

Thus

$$\|x_{k}-x\|^{2}\leq \left(1-\left|\left\langle \frac{x_{k-1}-x}{\|x_{k-1}-x\|},Z_{k}\right\rangle\right|^{2}\right)\|x_{k-1}-x\|^{2}.$$

Now we take the expectation of both sides conditional upon the choice of the random vectors $Z_1,\ldots,Z_{k-1}$ (hence we fix the choice of the random projections $P_1,\ldots,P_{k-1}$, and thus the random vectors $x_1,\ldots,x_{k-1}$, and average over the random vector $Z_k$). Then

$$\mathbb{E}_{Z_{1},\ldots,Z_{k-1}}\|x_{k}-x\|^{2}\leq \left(1-\mathbb{E}_{Z_{1},\ldots,Z_{k-1},Z_{k}}\left|\left\langle \frac{x_{k-1}-x}{\|x_{k-1}-x\|},Z_{k}\right\rangle\right|^{2}\right)\|x_{k-1}-x\|^{2}.$$

By (4) and the independence,

$$\mathbb{E}_{Z_{1},\ldots,Z_{k-1}}\|x_{k}-x\|^{2}\leq \left(1-\kappa(A)^{-2}\right)\|x_{k-1}-x\|^{2}.$$

Taking the full expectation of both sides, we conclude that

$$\mathbb{E}\,\|x_{k}-x\|^{2}\leq \left(1-\kappa(A)^{-2}\right)\mathbb{E}\,\|x_{k-1}-x\|^{2}.\qquad\blacksquare$$
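The bound of the theorem can also be checked empirically. The following sketch (with assumed data) averages the squared error of the randomized iteration over independent trials and compares it with $(1-\kappa(A)^{-2})^{k}\|x_0-x\|^2$, using $\kappa(A)^{-2}=\sigma_{\min}(A)^2/\|A\|_F^2$ as suggested by (2):

import numpy as np

# Empirical check of  E||x_k - x||^2 <= (1 - kappa(A)^{-2})^k ||x_0 - x||^2  (assumed data).
rng = np.random.default_rng(2)
m, n = 50, 10
A = rng.standard_normal((m, n))
x_sol = rng.standard_normal(n)
b = A @ x_sol

rate = 1.0 - np.linalg.svd(A, compute_uv=False)[-1]**2 / np.linalg.norm(A, 'fro')**2
probs = np.linalg.norm(A, axis=1)**2 / np.linalg.norm(A, 'fro')**2

n_iters, n_trials = 150, 300
mean_err = np.zeros(n_iters + 1)
for _ in range(n_trials):
    x = np.zeros(n)
    mean_err[0] += np.linalg.norm(x - x_sol)**2
    for k in range(1, n_iters + 1):
        i = rng.choice(m, p=probs)
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a
        mean_err[k] += np.linalg.norm(x - x_sol)**2
mean_err /= n_trials

bound = np.linalg.norm(x_sol)**2 * rate**np.arange(n_iters + 1)
print(np.mean(mean_err <= bound))   # typically 1.0: the averaged error stays below the bound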

The superiority of this selection was illustrated with the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. However, it has been pointed out[10] that the reported success by Strohmer and Vershynin depends on the specific choices that were made there in translating the underlying problem, whose geometrical nature is to find a common point of a set of hyperplanes, into a system of algebraic equations. There will always be legitimate algebraic representations of the underlying problem for which the selection method in [8] will perform in an inferior manner.[8][10][11]

The Kaczmarz iteration (1) has a purely geometric interpretation: the algorithm successively projects the current iterate onto the hyperplane defined by the next equation. Hence, any scaling of the equations is irrelevant; it can also be seen from (1) that any (nonzero) scaling of the equations cancels out. Thus, in the randomized Kaczmarz (RK) method, one can use $\|a_i\|$ or any other weights that may be relevant. Specifically, in the above-mentioned reconstruction example, the equations were chosen with probability proportional to the average distance of each sample point from its two nearest neighbors, a concept introduced by Feichtinger and Gröchenig. For additional progress on this topic, see[12][13] and the references therein.

Algorithm 3: Gower-Richtarik algorithm


In 2015, Robert M. Gower and Peter Richtarik[14] developed a versatile randomized iterative method for solving a consistent system of linear equations $Ax=b$ which includes the randomized Kaczmarz algorithm as a special case. Other special cases include randomized coordinate descent, randomized Gaussian descent, and the randomized Newton method. Block versions and versions with importance sampling of all these methods also arise as special cases. The method is shown to enjoy an exponential rate of decay of the error (in expectation), also known as linear convergence, under very mild conditions on the way randomness enters the algorithm. The Gower-Richtarik method was the first to uncover a "sibling" relationship between these methods, some of which had been proposed independently before, while many of which were new.

Insights about Randomized Kaczmarz


Interesting new insights about the randomized Kaczmarz method that can be gained from the analysis of the general method include:

  • The general rate of the Gower-Richtarik algorithm precisely recovers the rate of the randomized Kaczmarz method in the special case when it reduces to it.
  • The choice of probabilities for which the randomized Kaczmarz algorithm was originally formulated and analyzed (probabilities proportional to the squares of the row norms) is not optimal. Optimal probabilities are the solution of a certain semidefinite program. The theoretical complexity of randomized Kaczmarz with the optimal probabilities can be arbitrarily better than the complexity for the standard probabilities. However, the amount by which it is better depends on the matrix $A$; there are problems for which the standard probabilities are optimal.
  • When applied to a system with a positive definite matrix $A$, the randomized Kaczmarz method is equivalent to the stochastic gradient descent (SGD) method (with a very special stepsize) for minimizing the strongly convex quadratic function $f(x)=\tfrac{1}{2}x^{T}Ax-b^{T}x$. Note that since $f$ is convex, the minimizers of $f$ must satisfy $\nabla f(x)=0$, which is equivalent to $Ax=b$. The "special stepsize" is the stepsize which leads to the point on the one-dimensional line spanned by the stochastic gradient that minimizes the Euclidean distance from the unknown(!) minimizer of $f$, namely from $x^{*}=A^{-1}b$. This insight is gained from a dual view of the iterative process (described below as "Optimization Viewpoint: Constrain and Approximate").

Six Equivalent Formulations


The Gower-Richtarik method enjoys six seemingly different but equivalent formulations, shedding additional light on how to interpret it (and, as a consequence, how to interpret its many variants, including randomized Kaczmarz):

  • 1. Sketching viewpoint: Sketch & Project
  • 2. Optimization viewpoint: Constrain and Approximate
  • 3. Geometric viewpoint: Random Intersect
  • 4. Algebraic viewpoint 1: Random Linear Solve
  • 5. Algebraic viewpoint 2: Random Update
  • 6. Analytic viewpoint: Random Fixed Point

We now describe some of these viewpoints. The method depends on two parameters: a positive definite matrix $B$, which defines the weighted Euclidean inner product $\langle x,y\rangle_B$ and the induced norm

$$\|x\|_{B}=\left(\langle x,x\rangle_{B}\right)^{\frac{1}{2}},$$

and a random matrix $S$ with as many rows as $A$, drawn in each iteration in an i.i.d. fashion from some fixed distribution.

1. Sketch and Project


Given the previous iterate $x^{k}$, the new point $x^{k+1}$ is computed by drawing a random matrix $S$ (in an i.i.d. fashion from some fixed distribution) and setting

$$x^{k+1}=\underset{x}{\operatorname{arg\,min}}\ \|x-x^{k}\|_{B}\quad\text{subject to}\quad S^{T}Ax=S^{T}b.$$

That is, $x^{k+1}$ is obtained as the projection of $x^{k}$ onto the randomly sketched system $S^{T}Ax=S^{T}b$. The idea behind this method is to pick $S$ in such a way that a projection onto the sketched system is substantially simpler than the solution of the original system $Ax=b$. The randomized Kaczmarz method is obtained by picking $B$ to be the identity matrix and $S$ to be the $i$-th unit coordinate vector with probability $p_i=\|a_i\|_2^2/\|A\|_F^2$. Different choices of $B$ and $S$ lead to different variants of the method.
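A minimal sketch of one sketch-and-project step with $B=I$ (the data and the helper name sketch_and_project_step are assumptions for the example); with $S$ a unit coordinate vector it reproduces a single Kaczmarz step:

import numpy as np

def sketch_and_project_step(A, b, x, S):
    """One sketch-and-project step with B = I: project x onto {z : S^T A z = S^T b}."""
    M = S.T @ A                                        # sketched matrix
    return x + np.linalg.pinv(M) @ (S.T @ b - M @ x)   # minimum-norm correction

# With S the i-th unit coordinate vector this is exactly one Kaczmarz step (assumed data).
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
b = A @ rng.standard_normal(4)
x = rng.standard_normal(4)
i = 3
S = np.zeros((6, 1)); S[i, 0] = 1.0
x_sp = sketch_and_project_step(A, b, x, S)
x_kz = x + (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]
print(np.allclose(x_sp, x_kz))   # True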

2. Constrain and Approximate


A seemingly different but entirely equivalent formulation of the method (obtained via Lagrangian duality) is

$$x^{k+1}=\underset{x}{\operatorname{arg\,min}}\ \|x-x^{*}\|_{B}\quad\text{subject to}\quad x=x^{k}+B^{-1}A^{T}Sy,$$

where $y$ is also allowed to vary, and where $x^{*}$ is any solution of the system $Ax=b$. Hence, $x^{k+1}$ is obtained by first constraining the update to the linear subspace spanned by the columns of the random matrix $B^{-1}A^{T}S$, i.e., to

$$\left\{h:h=B^{-1}A^{T}Sy,\ y\text{ can vary}\right\},$$

and then choosing the point $x$ from this subspace which best approximates $x^{*}$. This formulation may look surprising, as it seems impossible to perform the approximation step because $x^{*}$ is not known (after all, this is what we are trying to compute!). However, it is still possible to do this, simply because $x^{k+1}$ computed this way is the same as $x^{k+1}$ computed via the sketch-and-project formulation, in which $x^{*}$ does not appear.
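The claimed equivalence can be verified numerically. In the sketch below (assumed data, $B=I$), the constrain-and-approximate iterate computed from a known $x^{*}$ coincides with the sketch-and-project iterate, which never uses $x^{*}$:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
x_star = rng.standard_normal(5)
b = A @ x_star
x_k = rng.standard_normal(5)
S = rng.standard_normal((8, 2))                    # an arbitrary (assumed) sketching matrix

# Constrain and approximate: best approximation of x* from the affine set x_k + range(A^T S).
W = A.T @ S
y, *_ = np.linalg.lstsq(W, x_star - x_k, rcond=None)
x_ca = x_k + W @ y

# Sketch and project: projection of x_k onto {x : S^T A x = S^T b}.
M = S.T @ A
x_sp = x_k + np.linalg.pinv(M) @ (S.T @ b - M @ x_k)

print(np.allclose(x_ca, x_sp))                     # True: the two formulations coincide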

5. Random Update


The update can also be written explicitly as

$$x^{k+1}=x^{k}-B^{-1}A^{T}S\left(S^{T}AB^{-1}A^{T}S\right)^{\dagger}S^{T}\left(Ax^{k}-b\right),$$

where $M^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of a matrix $M$. Hence, the method can be written in the form $x^{k+1}=x^{k}+h^{k}$, where $h^{k}$ is a random update vector.

Letting $M=S^{T}AB^{-1}A^{T}S$, it can be shown that the system $My=S^{T}(Ax^{k}-b)$ always has a solution $y^{k}$, and that for all such solutions the vector $x^{k}-B^{-1}A^{T}Sy^{k}$ is the same. Hence, it does not matter which of these solutions is chosen, and the method can also be written as $x^{k+1}=x^{k}-B^{-1}A^{T}Sy^{k}$. The pseudo-inverse simply selects one particular solution. The role of the pseudo-inverse is twofold:

  • It allows the method to be written in the explicit "random update" form as above,
  • It makes the analysis simple through the final, sixth, formulation.
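A minimal sketch of the explicit random-update formula for a general positive definite $B$ (the data, the weight matrix, and the helper name random_update_step are assumptions for the example); after one step the sketched equations hold exactly:

import numpy as np

def random_update_step(A, b, x, S, B):
    """Explicit 'random update' form of one Gower-Richtarik step for a general
    positive definite weight matrix B (a sketch, not a tuned implementation)."""
    W = np.linalg.solve(B, A.T @ S)                  # B^{-1} A^T S
    M = S.T @ A @ W                                  # S^T A B^{-1} A^T S
    return x - W @ np.linalg.pinv(M) @ (S.T @ (A @ x - b))

# After one step the sketched equations S^T A x = S^T b are satisfied exactly (assumed data).
rng = np.random.default_rng(5)
A = rng.standard_normal((7, 4))
b = A @ rng.standard_normal(4)                       # consistent right-hand side
B = np.diag(rng.uniform(1.0, 2.0, size=4))           # assumed positive definite weight matrix
S = rng.standard_normal((7, 2))
x_new = random_update_step(A, b, rng.standard_normal(4), S, B)
print(np.allclose(S.T @ A @ x_new, S.T @ b))         # True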

6. Random Fixed Point


If we subtract $x^{*}$ from both sides of the random update formula, denote

$$Z:=A^{T}S\left(S^{T}AB^{-1}A^{T}S\right)^{\dagger}S^{T}A,$$

and use the fact that $Ax^{*}=b$, we arrive at the last formulation:

$$x^{k+1}-x^{*}=\left(I-B^{-1}Z\right)\left(x^{k}-x^{*}\right),$$

where $I$ is the identity matrix. The iteration matrix, $I-B^{-1}Z$, is random, whence the name of this formulation.

Convergence


By taking conditional expectations in the sixth formulation (conditional on $x^{k}$), we obtain

$$\mathbb{E}\left[\left.x^{k+1}-x^{*}\,\right|\,x^{k}\right]=\left(I-B^{-1}\mathbb{E}[Z]\right)\left(x^{k}-x^{*}\right).$$

By taking expectation again, and using the tower property of expectations, we obtain

$$\mathbb{E}\left[x^{k+1}-x^{*}\right]=\left(I-B^{-1}\mathbb{E}[Z]\right)\mathbb{E}\left[x^{k}-x^{*}\right].$$

Gower and Richtarik[14] show that

$$\rho:=\left\|I-B^{-\frac{1}{2}}\mathbb{E}[Z]B^{-\frac{1}{2}}\right\|_{B}=\lambda_{\max}\left(I-B^{-1}\mathbb{E}[Z]\right),$$

where the matrix norm is defined by

$$\|M\|_{B}:=\max_{x\neq 0}\frac{\|Mx\|_{B}}{\|x\|_{B}}.$$

Moreover, without any assumptions on $S$ one has $0\leq\rho\leq 1$. By taking norms and unrolling the recurrence, we obtain the following.

Theorem [Gower & Richtarik 2015]

$$\left\|\mathbb{E}\left[x^{k}-x^{*}\right]\right\|_{B}\leq \rho^{k}\,\|x^{0}-x^{*}\|_{B}.$$

Remark. A sufficient condition for the expected residuals to converge to 0 is $\rho<1$. This can be achieved if $A$ has full column rank and under very mild conditions on $S$. Convergence of the method can also be established without the full column rank assumption, in a different way.[15]

It is also possible to show a stronger result:

Theorem [Gower & Richtarik 2015]


The expected squared norms (rather than the norms of expectations) converge at the same rate:

$$\mathbb{E}\left\|x^{k}-x^{*}\right\|_{B}^{2}\leq \rho^{k}\left\|x^{0}-x^{*}\right\|_{B}^{2}.$$

Remark. This second type of convergence is stronger due to the following identity,[14] which holds for any random vector $x$ and any fixed vector $x^{*}$:

$$\left\|\mathbb{E}\left[x-x^{*}\right]\right\|^{2}=\mathbb{E}\left[\left\|x-x^{*}\right\|^{2}\right]-\mathbb{E}\left[\left\|x-\mathbb{E}[x]\right\|^{2}\right].$$
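This identity is easy to check numerically; the following sketch (with an assumed random vector) verifies it for empirical means:

import numpy as np

# Check  ||E[x - x*]||^2 = E||x - x*||^2 - E||x - E[x]||^2  for an assumed random vector x;
# with empirical means over a fixed sample the identity holds exactly.
rng = np.random.default_rng(6)
samples = rng.standard_normal((10000, 3)) * np.array([1.0, 2.0, 0.5]) + np.array([0.3, -1.0, 2.0])
x_star = np.array([1.0, 0.0, -2.0])

mean = samples.mean(axis=0)
lhs = np.linalg.norm(mean - x_star)**2
rhs = np.mean(np.sum((samples - x_star)**2, axis=1)) - np.mean(np.sum((samples - mean)**2, axis=1))
print(np.isclose(lhs, rhs))   # True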

Convergence of Randomized Kaczmarz


We have seen that the randomized Kaczmarz method appears as a special case of the Gower-Richtarik method for $B=I$ and $S$ being the $i$-th unit coordinate vector with probability $p_i=\|a_i\|_2^2/\|A\|_F^2$, where $a_i$ is the $i$-th row of $A$. It can be checked by direct calculation that

$$\rho=\|I-B^{-1}\mathbb{E}[Z]\|_{B}=1-\frac{\lambda_{\min}(A^{T}A)}{\|A\|_{F}^{2}}.$$
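This direct calculation can be mirrored numerically: in the randomized Kaczmarz setting one has $\mathbb{E}[Z]=A^{T}A/\|A\|_F^2$, and the sketch below (assumed data) checks the resulting value of $\rho$:

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((12, 5))
fro2 = np.linalg.norm(A, 'fro')**2

# E[Z] = sum_i p_i * a_i a_i^T / ||a_i||^2 = A^T A / ||A||_F^2
EZ = sum(np.outer(A[i], A[i]) / fro2 for i in range(A.shape[0]))
rho = np.max(np.linalg.eigvalsh(np.eye(5) - EZ))
print(np.isclose(rho, 1.0 - np.min(np.linalg.eigvalsh(A.T @ A)) / fro2))   # True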

Further Special Cases


Algorithm 4: PLSS-Kaczmarz


Since the (randomized) Kaczmarz method converges only at a certain rate, it may make slow progress on some practical problems.[10] To ensure finite termination of the method, Johannes Brust and Michael Saunders[16] have developed a process that generalizes the (randomized) Kaczmarz iteration and terminates in at most $m$ iterations at a solution of the consistent system $Ax=b$. The process is based on dimensionality reduction, or projections onto lower-dimensional spaces, which is how it derives its name, PLSS (Projected Linear Systems Solver). An iteration of PLSS-Kaczmarz can be regarded as the generalization

$$x^{k+1}=x^{k}+A_{1:k,:}^{T}\left(A_{1:k,:}A_{1:k,:}^{T}\right)^{\dagger}\left(b_{1:k}-A_{1:k,:}x^{k}\right)$$

where $A_{1:k,:}$ denotes rows 1 to $k$ and all columns of $A$. A randomized version of the method uses $k$ non-repeated row indices at each iteration: $\{i_1,\ldots,i_{k-1},i_k\}$, where each $i_j$ is in $1,2,\ldots,m$. The iteration converges to a solution when $k=m$. In particular, since $A_{1:m,:}=A$ it holds that

$$Ax^{m+1}=Ax^{m}+AA^{T}(AA^{T})^{\dagger}(b-Ax^{m})=b$$

and therefore $x^{m+1}$ is a solution to the linear system. The computation of iterates in PLSS-Kaczmarz can be simplified and organized effectively. The resulting algorithm only requires matrix-vector products and has a direct form:

algorithm PLSS-Kaczmarz is
    input: matrix A, right-hand side b
    output: solution x such that Ax = b

    x := 0, P := [0]
    for k in 1, 2, ..., m do
        a := A(ik, :)'                    // Select an index ik in 1,...,m without resampling
        d := P' * a
        c1 := norm(a)
        c2 := norm(d)
        c3 := (b_ik - x'*a) / ((c1 - c2)*(c1 + c2))
        p := c3 * (a - P*(P'*a))
        P := [ P, p/norm(p) ]             // Append a normalized update
        x := x + p
    return x
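The finite-termination property can be illustrated directly from the unsimplified iteration above; in the following sketch (assumed data), the iterate solves $Ax=b$ after $k=m$ steps:

import numpy as np

# Finite termination of the PLSS-Kaczmarz generalization (direct, unsimplified form):
# after processing all m rows the iterate solves A x = b.  Data are assumptions.
rng = np.random.default_rng(8)
m, n = 6, 10                                     # a consistent underdetermined system
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)

x = np.zeros(n)
for k in range(1, m + 1):
    Ak = A[:k, :]                                # rows 1..k, all columns of A
    x = x + Ak.T @ np.linalg.pinv(Ak @ Ak.T) @ (b[:k] - Ak @ x)
print(np.allclose(A @ x, b))                     # True after k = m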

Notes

  1. Kaczmarz (1937)
  2. Gordon, Bender & Herman (1970)
  3. Gordon (2011)
  4. Herman (2009)
  5. Censor & Zenios (1997)
  6. Aster, Borchers & Thurber (2004)
  7. See Herman (2009) and references therein.
  8. Strohmer & Vershynin (2009)
  9. Needell, Srebro & Ward (2015)
  10. Censor, Herman & Jiang (2009)
  11. Strohmer & Vershynin (2009b)
  12. Bass & Gröchenig (2013)
  13. Gordon (2017)
  14. Gower & Richtarik (2015a)
  15. Gower & Richtarik (2015b)
  16. Brust & Saunders (2023)

External links

  • A randomized Kaczmarz algorithm with exponential convergence
  • Comments on the randomized Kaczmarz method
  • Kaczmarz algorithm in training Kolmogorov-Arnold network