@@ -1420,8 +1420,8 @@ def matrix_rank(M, tol=None):
1420
1420
"""
1421
1421
Return matrix rank of array using SVD method
1422
1422
1423
- Rank of the array is the number of SVD singular values of the
1424
- array that are greater than `tol`.
1423
+ Rank of the array is the number of SVD singular values of the array that are
1424
+ greater than `tol`.
1425
1425
1426
1426
Parameters
1427
1427
----------
@@ -1431,27 +1431,53 @@ def matrix_rank(M, tol=None):
1431
1431
threshold below which SVD values are considered zero. If `tol` is
1432
1432
None, and ``S`` is an array with singular values for `M`, and
1433
1433
``eps`` is the epsilon value for datatype of ``S``, then `tol` is
1434
- set to ``S.max() * eps``.
1434
+ set to ``S.max() *max(M.shape) * eps``.
1435
1435
1436
1436
Notes
1437
1437
-----
1438
- Golub and van Loan [1]_ define "numerical rank deficiency" as using
1439
- tol=eps*S[0] (where S[0] is the maximum singular value and thus the
1440
- 2-norm of the matrix). This is one definition of rank deficiency,
1441
- and the one we use here. When floating point roundoff is the main
1442
- concern, then "numerical rank deficiency" is a reasonable choice. In
1443
- some cases you may prefer other definitions. The most useful measure
1444
- of the tolerance depends on the operations you intend to use on your
1445
- matrix. For example, if your data come from uncertain measurements
1446
- with uncertainties greater than floating point epsilon, choosing a
1447
- tolerance near that uncertainty may be preferable. The tolerance
1448
- may be absolute if the uncertainties are absolute rather than
1449
- relative.
1438
+ The default threshold to detect rank deficiency is a test on the magnitude
1439
+ of the singular values of `M`. By default, we identify singular values less
1440
+ than ``S.max() * max(M.shape) * eps`` as indicating rank deficiency (with
1441
+ the symbols defined above). This is the algorithm MATLAB uses [1]. It also
1442
+ appears in *Numerical recipes* in the discussion of SVD solutions for linear
1443
+ least squares [2].
1444
+
1445
+ This default threshold is designed to detect rank deficiency accounting for
1446
+ the numerical errors of the SVD computation. Imagine that there is a column
1447
+ in `M` that is an exact (in floating point) linear combination of other
1448
+ columns in `M`. Computing the SVD on `M` will not produce a singular value
1449
+ exactly equal to 0 in general: any difference of the smallest SVD value from
1450
+ 0 will be caused by numerical imprecision in the calculation of the SVD.
1451
+ Our threshold for small SVD values takes this numerical imprecision into
1452
+ account, and the default threshold will detect such numerical rank
1453
+ deficiency. The threshold may declare a matrix `M` rank deficient even if
1454
+ the linear combination of some columns of `M` is not exactly equal to
1455
+ another column of `M` but only numerically very close to another column of
1456
+ `M`.
1457
+
1458
+ We chose our default threshold because it is in wide use. Other thresholds
1459
+ are possible. For example, elsewhere in the 2007 edition of *Numerical
1460
+ recipes* there is an alternative threshold of ``S.max() *
1461
+ np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.)``. The authors describe
1462
+ this threshold as being based on "expected roundoff error" (p 71).
1463
+
1464
+ The thresholds above deal with floating point roundoff error in the
1465
+ calculation of the SVD. However, you may have more information about the
1466
+ sources of error in `M` that would make you consider other tolerance values
1467
+ to detect *effective* rank deficiency. The most useful measure of the
1468
+ tolerance depends on the operations you intend to use on your matrix. For
1469
+ example, if your data come from uncertain measurements with uncertainties
1470
+ greater than floating point epsilon, choosing a tolerance near that
1471
+ uncertainty may be preferable. The tolerance may be absolute if the
1472
+ uncertainties are absolute rather than relative.
1450
1473
1451
1474
References
1452
1475
----------
1453
- .. [1] G. H. Golub and C. F. Van Loan, *Matrix Computations*.
1454
- Baltimore: Johns Hopkins University Press, 1996.
1476
+ .. [1] MATLAB reference documention, "Rank"
1477
+ http://www.mathworks.com/help/techdoc/ref/rank.html
1478
+ .. [2] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery,
1479
+ "Numerical Recipes (3rd edition)", Cambridge University Press, 2007,
1480
+ page 795.
1455
1481
1456
1482
Examples
1457
1483
--------
@@ -1465,7 +1491,6 @@ def matrix_rank(M, tol=None):
1465
1491
1
1466
1492
>>> matrix_rank(np.zeros((4,)))
1467
1493
0
1468
-
1469
1494
"""
1470
1495
M = asarray (M )
1471
1496
if M .ndim > 2 :
@@ -1474,7 +1499,7 @@ def matrix_rank(M, tol=None):
1474
1499
return int (not all (M == 0 ))
1475
1500
S = svd (M ,compute_uv = False )
1476
1501
if tol is None :
1477
- tol = S .max ()* finfo (S .dtype ).eps
1502
+ tol = S .max ()* max ( M . shape ) * finfo (S .dtype ).eps
1478
1503
return sum (S > tol )
1479
1504
1480
1505