Pearson’s \(r\) is undoubtedly the gold standard for measuring linear dependence. With a suitable adjustment, it might also become the gold standard for nonlinear monotone dependence.
recor is an R package that implements the Rearrangement Correlation Coefficient (\(r^\#\)), an adjusted version of Pearson’s correlation coefficient designed to accurately measure arbitrary monotone dependence relationships (both linear and nonlinear). Based on cutting-edge statistical research, this package addresses the underestimation problem of traditional correlation coefficients in nonlinear monotone scenarios. The rearrangement correlation is derived from a tighter inequality than the classical Cauchy-Schwarz inequality, providing sharper bounds and an expanded capture range.
stats::cor().

    library(recor)

    # Linear relationship
    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 6, 8, 10)
    recor(x, y)
    #> [1] 1

    # Nonlinear monotone relationship
    x <- c(1, 2, 3, 4, 5)
    y <- c(1, 8, 27, 65, 125) # roughly y = x^3, strictly increasing
    recor(x, y) # Higher value than Pearson's r
    #> [1] 1
    cor(x, y)
    #> [1] 0.944458

    # Matrix example
    set.seed(123)
    mat <- matrix(rnorm(100), ncol = 5)
    colnames(mat) <- LETTERS[1:5]
    recor(mat) # 5x5 correlation matrix
    #>             A           B          C          D          E
    #> A  1.00000000 -0.09511994 -0.1283021  0.1243721 -0.2328551
    #> B -0.09511994  1.00000000  0.1022576  0.2381745  0.3780232
    #> C -0.12830211  0.10225762  1.0000000 -0.1523651 -0.3603780
    #> D  0.12437205  0.23817455 -0.1523651  1.0000000 -0.1289523
    #> E -0.23285513  0.37802315 -0.3603780 -0.1289523  1.0000000

    # Two matrices
    mat1 <- matrix(rnorm(50), ncol = 5)
    mat2 <- matrix(rnorm(50), ncol = 5)
    recor(mat1, mat2) # 5x5 cross-correlation matrix
    #>               [,1]         [,2]        [,3]        [,4]        [,5]
    #> [1,]  0.0001379295  0.019273397 -0.14776094 -0.01203410  0.14712263
    #> [2,] -0.0850363746  0.135125063 -0.10799623  0.35026884  0.20233183
    #> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
    #> [4,]  0.4067584970 -0.008022853  0.08223935  0.02728547  0.37567963
    #> [5,]  0.5566966868 -0.059564374  0.03296252  0.22249817 -0.03009148

    # data.frame
    recor(iris[, 1:4])
    #>              Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> Sepal.Length    1.0000000  -0.1210250    0.9156110   0.8445397
    #> Sepal.Width    -0.1210250   1.0000000   -0.4628225  -0.3909946
    #> Petal.Length    0.9156110  -0.4628225    1.0000000   0.9694665
    #> Petal.Width     0.8445397  -0.3909946    0.9694665   1.0000000

The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples \(x\) and \(y\), it is defined as:
\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow, y^\updownarrow} \right|}\)
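Where:

- \(s_{x,y}\) is the sample covariance between \(x\) and \(y\);
- \(x^\uparrow\) is \(x\) rearranged in increasing order;
- \(y^\updownarrow\) is \(y\) rearranged in increasing order (\(y^\uparrow\)) when \(s_{x,y} \ge 0\), and in decreasing order (\(y^\downarrow\)) when \(s_{x,y} < 0\).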
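The claim that the rearrangement bound is tighter than the Cauchy-Schwarz bound can be sketched in a few lines. This is the standard textbook argument rather than a quotation from the paper; it assumes \(s_x\) and \(s_y\) denote the sample standard deviations, \(r\) denotes Pearson's coefficient, and all denominators are non-zero. The `align*` environment requires `amsmath`.

```latex
\begin{align*}
\sum_{i} x^\uparrow_i \, y^\downarrow_i
  \;\le\; \sum_{i} x_i \, y_i
  \;\le\; \sum_{i} x^\uparrow_i \, y^\uparrow_i
  && \text{(rearrangement inequality)} \\
s_{x^\uparrow, y^\downarrow} \;\le\; s_{x,y} \;\le\; s_{x^\uparrow, y^\uparrow}
  && \text{(sorting leaves the sample means unchanged)} \\
\left| s_{x,y} \right|
  \;\le\; \left| s_{x^\uparrow, y^\updownarrow} \right|
  \;\le\; s_x \, s_y
  && \text{(Cauchy-Schwarz applied to the sorted samples)} \\
|r| \;\le\; \left| r^\# \right| \;\le\; 1
  && \text{(divide } |s_{x,y}| \text{ by each bound)}
\end{align*}
```

The middle bound is attained whenever sorting reproduces the original pairing of \(x\) and \(y\), which is the case for any monotone relationship, so \(|r^\#| = 1\) there even when \(|r| < 1\).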
\(r^\#\) can be computed in R as follows:
    recor <- function(x, y = NULL) {
      # r# for a single pair of numeric vectors
      recor_vector <- function(x, y) {
        numerator <- cov(x, y)
        if (numerator >= 0) {
          # Non-negative covariance: pair increasing x with increasing y
          denominator <- abs(cov(
            sort(x, decreasing = FALSE),
            sort(y, decreasing = FALSE)
          ))
        } else {
          # Negative covariance: pair increasing x with decreasing y
          denominator <- abs(cov(
            sort(x, decreasing = FALSE),
            sort(y, decreasing = TRUE)
          ))
        }
        numerator / denominator
      }

      if (is.matrix(x) || is.data.frame(x)) {
        x <- as.matrix(x)
        if (is.null(y)) {
          # Pairwise r# between the columns of x
          p <- ncol(x)
          result <- matrix(1, nrow = p, ncol = p)
          rownames(result) <- colnames(result) <- colnames(x)
          for (i in 1:p) {
            for (j in 1:p) {
              if (i != j) {
                result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
              }
            }
          }
          return(result)
        } else if (is.matrix(y) || is.data.frame(y)) {
          # Cross r# between the columns of x and the columns of y
          y <- as.matrix(y)
          if (nrow(x) != nrow(y)) {
            stop("The number of rows of x and y must be the same")
          }
          p <- ncol(x)
          q <- ncol(y)
          result <- matrix(0, nrow = p, ncol = q)
          rownames(result) <- colnames(x)
          colnames(result) <- colnames(y)
          for (i in 1:p) {
            for (j in 1:q) {
              result[i, j] <- recor_vector(x[, i], y[, j])
            }
          }
          return(result)
        }
      }

      if (is.null(y)) {
        stop("y is needed when x is a vector")
      }
      if (length(x) != length(y)) {
        stop("x and y must have the same length")
      }
      if (length(x) < 2) {
        stop("x and y must have at least two elements")
      }
      recor_vector(x, y)
    }

Note that the R implementation above is for illustrative purposes only. The actual recor package employs a highly optimized C++ backend to ensure efficient computation.
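As a quick sanity check of the definition, the sketch below compares \(r^\#\) with Pearson's \(r\) on three small vectors. It uses base R only and a compact helper, `rearr_cor()`, which is a hypothetical name introduced here for illustration and is not part of the recor package. The test vectors are made up for this example.

```r
# Hypothetical helper (not the package API): r# for two numeric vectors.
rearr_cor <- function(x, y) {
  s_xy <- cov(x, y)
  # Sort y increasingly when the covariance is non-negative, decreasingly otherwise
  y_sorted <- sort(y, decreasing = s_xy < 0)
  s_xy / abs(cov(sort(x), y_sorted))
}

x <- c(1, 2, 3, 4, 5)
ys <- list(
  inc = exp(x),             # nonlinear, strictly increasing in x
  dec = -x^3,               # nonlinear, strictly decreasing in x
  mix = c(2, 1, 9, 4, 30)   # not monotone in x
)

r_sharp   <- sapply(ys, function(y) rearr_cor(x, y))
r_pearson <- sapply(ys, function(y) cor(x, y))

# One row per coefficient, one column per test vector
print(rbind(r_sharp, r_pearson))
```

Under these assumptions the `inc` and `dec` columns give \(r^\#\) values of 1 and -1, while Pearson's \(r\) stays strictly inside that range, and for every column the absolute value of \(r\) does not exceed that of \(r^\#\).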
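Do we need a new monotone measure, given that rank-based measures such as Spearman’s \(\rho\) can already measure monotone dependence? The answer is yes, in the sense that \(r^\#\) has a higher resolution and is more accurate. To take a simple example, let \(x = (4, 3, 2, 1)\) and \(y_1 = (5, 4, 3, 2.00)\), \(y_2 = (5, 4, 3, 3.25)\), \(y_3 = (5, 4, 3, 3.50)\), \(y_4 = (5, 4, 3, 3.75)\), \(y_5 = (5, 4, 3, 4.50)\).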
Clearly, \(y_1\) and \(x\) behave in exactly the same way, with their values decreasing step by step. The behavior of \(y_2, y_3, y_4\) and \(y_5\) becomes more and more different from that of \(x\). However, the \(\rho\) values are identical for \(y_2, y_3\) and \(y_4\). In contrast, the \(r^\#\) values reveal all of these differences.
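The reason Spearman's \(\rho\) cannot separate \(y_2, y_3\) and \(y_4\) is that it depends on the data only through ranks, and these three vectors share the same ranks. The short base R snippet below is an illustration added here, not part of the package, and makes this visible.

```r
# The three vectors differ only in their last value
y_list <- list(
  y2 = c(5, 4, 3, 3.25),
  y3 = c(5, 4, 3, 3.50),
  y4 = c(5, 4, 3, 3.75)
)

# rank() replaces each value by its position in the sorted sample
ranks <- sapply(y_list, rank)
print(ranks)  # every column holds the ranks 4, 3, 1, 2

# Identical ranks imply identical Spearman coefficients against any x
x <- c(4, 3, 2, 1)
rho <- sapply(y_list, function(y) cor(x, y, method = "spearman"))
print(rho)
```

Because \(r^\#\) works with the values themselves rather than their ranks, it still responds to how far the last element drifts from 2 towards 5.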
    x <- c(4, 3, 2, 1)
    y_list <- list(
      y1 = c(5, 4, 3, 2.00),
      y2 = c(5, 4, 3, 3.25),
      y3 = c(5, 4, 3, 3.50),
      y4 = c(5, 4, 3, 3.75),
      y5 = c(5, 4, 3, 4.50)
    )

    # recor
    lapply(y_list, recor, x)
    #> $y1
    #> [1] 1
    #>
    #> $y2
    #> [1] 0.9259259
    #>
    #> $y3
    #> [1] 0.8461538
    #>
    #> $y4
    #> [1] 0.76
    #>
    #> $y5
    #> [1] 0.3846154

    # cor
    lapply(y_list, cor, x, method = "spearman")
    #> $y1
    #> [1] 1
    #>
    #> $y2
    #> [1] 0.8
    #>
    #> $y3
    #> [1] 0.8
    #>
    #> $y4
    #> [1] 0.8
    #>
    #> $y5
    #> [1] 0.4

Ai, X. (2024). Adjust Pearson’s r to Measure Arbitrary Monotone Dependence. In Advances in Neural Information Processing Systems (Vol. 37, pp. 37385-37407).
This project is licensed under GPL-3.
If you use this package in your research, please cite our work as:
    @inproceedings{NEURIPS2024_41c38a83,
      author = {Ai, Xinbo},
      booktitle = {Advances in Neural Information Processing Systems},
      editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
      pages = {37385--37407},
      publisher = {Curran Associates, Inc.},
      title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
      url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
      volume = {37},
      year = {2024}
    }

recor: Making Correlation Measurement More Accurate