
Pearson’s \(r\) is undoubtedly the gold standard for measuring linear dependence. Once adjusted, it may also become the gold standard for nonlinear monotone dependence.

Quick Start

Basic Usage

library(recor)

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
recor(x, y)
#> [1] 1

# Nonlinear monotone relationship
x <- c(1, 2, 3, 4, 5)
y <- c(1, 8, 27, 65, 125) # approximately y = x^3 (the fourth value is 65, not 64)
recor(x, y) # Higher value than Pearson's r
#> [1] 1
cor(x, y)
#> [1] 0.944458

# Matrix example
set.seed(123)
mat <- matrix(rnorm(100), ncol = 5)
colnames(mat) <- LETTERS[1:5]
recor(mat) # 5x5 correlation matrix
#>             A           B          C          D          E
#> A  1.00000000 -0.09511994 -0.1283021  0.1243721 -0.2328551
#> B -0.09511994  1.00000000  0.1022576  0.2381745  0.3780232
#> C -0.12830211  0.10225762  1.0000000 -0.1523651 -0.3603780
#> D  0.12437205  0.23817455 -0.1523651  1.0000000 -0.1289523
#> E -0.23285513  0.37802315 -0.3603780 -0.1289523  1.0000000

# Two matrices
mat1 <- matrix(rnorm(50), ncol = 5)
mat2 <- matrix(rnorm(50), ncol = 5)
recor(mat1, mat2) # 5x5 cross-correlation matrix
#>               [,1]         [,2]        [,3]        [,4]        [,5]
#> [1,]  0.0001379295  0.019273397 -0.14776094 -0.01203410  0.14712263
#> [2,] -0.0850363746  0.135125063 -0.10799623  0.35026884  0.20233183
#> [3,] -0.2825948208 -0.020383616 -0.31990514 -0.33267352 -0.48254414
#> [4,]  0.4067584970 -0.008022853  0.08223935  0.02728547  0.37567963
#> [5,]  0.5566966868 -0.059564374  0.03296252  0.22249817 -0.03009148

# data.frame
recor(iris[, 1:4])
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Sepal.Length    1.0000000  -0.1210250    0.9156110   0.8445397
#> Sepal.Width    -0.1210250   1.0000000   -0.4628225  -0.3909946
#> Petal.Length    0.9156110  -0.4628225    1.0000000   0.9694665
#> Petal.Width     0.8445397  -0.3909946    0.9694665   1.0000000

Theoretical Foundation

Mathematical Definition

The rearrangement correlation coefficient is based on rearrangement inequality theorems that provide tighter bounds than the Cauchy-Schwarz inequality. Mathematically, for samples \(x\) and \(y\), it is defined as:

\(r^\#(x, y) = \frac{s_{x,y}}{\left| s_{x^\uparrow, y^\updownarrow} \right|}\)

Where:

\(s_{x,y}\) is the sample covariance between \(x\) and \(y\)
\(x^\uparrow\) denotes the increasing rearrangement of \(x\)
\(y^\updownarrow\) denotes either \(y^\uparrow\) (the increasing rearrangement of \(y\)) if \(s_{x,y} \ge 0\), or \(y^\downarrow\) (the decreasing rearrangement of \(y\)) if \(s_{x,y} < 0\)
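The claim about tighter bounds can be checked numerically. The following is a minimal sketch in base R, not part of the package: the simulated data, the seed, and all variable names are assumptions made for illustration. It compares the sample covariance with its two rearrangement bounds and with the Cauchy-Schwarz bound \(s_x s_y\).

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- rnorm(20)
y <- x^3 + rnorm(20, sd = 0.5)     # hypothetical noisy monotone relationship

s_xy   <- cov(x, y)
s_up   <- cov(sort(x), sort(y))                     # x and y both increasing
s_down <- cov(sort(x), sort(y, decreasing = TRUE))  # x increasing, y decreasing
cs     <- sd(x) * sd(y)                             # Cauchy-Schwarz bound

# Rearrangement inequality: s_down <= s_xy <= s_up
# Cauchy-Schwarz:           |s_up| <= cs and |s_down| <= cs
print(c(s_down = s_down, s_xy = s_xy, s_up = s_up, cauchy_schwarz = cs))

# Dividing by the tighter bound gives r#, dividing by cs gives Pearson's r
r       <- s_xy / cs
r_sharp <- s_xy / abs(if (s_xy >= 0) s_up else s_down)
print(c(pearson = r, rearrangement = r_sharp))
```

Because sorting does not change the standard deviations, the rearranged covariance can never exceed \(s_x s_y\) in absolute value, so \(|r| \le |r^\#| \le 1\) for any sample with non-constant \(x\) and \(y\).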

R Implementation

\(r^\#\) can be computed in R as follows:

recor <- function(x, y = NULL) {
    # r# for a single pair of numeric vectors
    recor_vector <- function(x, y) {
        numerator <- cov(x, y)
        if (numerator >= 0) {
            # Non-negative covariance: compare against x and y both increasing
            denominator <- abs(cov(
                sort(x, decreasing = FALSE),
                sort(y, decreasing = FALSE)
            ))
        } else {
            # Negative covariance: compare against x increasing, y decreasing
            denominator <- abs(cov(
                sort(x, decreasing = FALSE),
                sort(y, decreasing = TRUE)
            ))
        }
        numerator / denominator
    }

    if (is.matrix(x) || is.data.frame(x)) {
        x <- as.matrix(x)
        if (is.null(y)) {
            # Pairwise r# between the columns of x
            p <- ncol(x)
            result <- matrix(1, nrow = p, ncol = p)
            rownames(result) <- colnames(result) <- colnames(x)
            for (i in 1:p) {
                for (j in 1:p) {
                    if (i != j) {
                        result[i, j] <- result[j, i] <- recor_vector(x[, i], x[, j])
                    }
                }
            }
            return(result)
        } else if (is.matrix(y) || is.data.frame(y)) {
            # r# between each column of x and each column of y
            y <- as.matrix(y)
            if (nrow(x) != nrow(y)) {
                stop("The number of rows of x and y must be the same")
            }
            p <- ncol(x)
            q <- ncol(y)
            result <- matrix(0, nrow = p, ncol = q)
            rownames(result) <- colnames(x)
            colnames(result) <- colnames(y)
            for (i in 1:p) {
                for (j in 1:q) {
                    result[i, j] <- recor_vector(x[, i], y[, j])
                }
            }
            return(result)
        }
    }

    # Vector input: validate, then compute a single coefficient
    if (is.null(y)) {
        stop("y is needed when x is a vector")
    }
    if (length(x) != length(y)) {
        stop("x and y must have the same length")
    }
    if (length(x) < 2) {
        stop("x and y must have at least two elements")
    }
    recor_vector(x, y)
}

Note that the R implementation above is for illustrative purposes only. The actual recor package employs a highly optimized C++ backend to ensure efficient computation.

Intuitive Example

Do we need a new monotone measure, given that rank-based measures such as Spearman’s \(\rho\) can already measure monotone dependence? The answer is yes, in the sense that \(r^\#\) has a higher resolution and is more accurate. To take a simple example, let \(x = (4, 3, 2, 1)\) and

\(y_1 = (5, 4, 3, 2)\)
\(y_2 = (5, 4, 3, 3.25)\)
\(y_3 = (5, 4, 3, 3.50)\)
\(y_4 = (5, 4, 3, 3.75)\)
\(y_5 = (5, 4, 3, 4.50)\)

Clearly, \(y_1\) and \(x\) behave in exactly the same way, with their values decreasing step by step. The behavior of \(y_2, y_3, y_4\) and \(y_5\) differs more and more from that of \(x\). However, the \(\rho\) values are identical for \(y_2, y_3\) and \(y_4\). In contrast, the \(r^\#\) values reveal each of these differences.

x <- c(4, 3, 2, 1)
y_list <- list(
    y1 = c(5, 4, 3, 2.00),
    y2 = c(5, 4, 3, 3.25),
    y3 = c(5, 4, 3, 3.50),
    y4 = c(5, 4, 3, 3.75),
    y5 = c(5, 4, 3, 4.50)
)

# recor
lapply(y_list, recor, x)
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.9259259
#>
#> $y3
#> [1] 0.8461538
#>
#> $y4
#> [1] 0.76
#>
#> $y5
#> [1] 0.3846154

# cor
lapply(y_list, cor, x, method = "spearman")
#> $y1
#> [1] 1
#>
#> $y2
#> [1] 0.8
#>
#> $y3
#> [1] 0.8
#>
#> $y4
#> [1] 0.8
#>
#> $y5
#> [1] 0.4
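As a hand check of the \(y_2\) result, the value can be worked out directly from the definition. This is an illustrative derivation rather than package output. The factor \(1/(n-1)\) is common to both covariances and cancels, so only the centered cross-product sums are needed:

```latex
\begin{aligned}
\bar{x} &= 2.5, \qquad \bar{y}_2 = 3.8125, \qquad n\,\bar{x}\,\bar{y}_2 = 38.125 \\
\sum_i x_i\, y_{2,i} &= 4(5) + 3(4) + 2(3) + 1(3.25) = 41.25 \\
\sum_i x^{\uparrow}_i\, y^{\uparrow}_{2,i} &= 1(3) + 2(3.25) + 3(4) + 4(5) = 41.5 \\
r^{\#}(x, y_2) &= \frac{41.25 - 38.125}{41.5 - 38.125}
                = \frac{3.125}{3.375} = \frac{25}{27} \approx 0.9259
\end{aligned}
```

Spearman’s \(\rho\), by contrast, depends only on the ranks of the values. The ranks of \(y_2\), \(y_3\) and \(y_4\) are all \((4, 3, 1, 2)\), which is why all three give \(\rho = 0.8\), while \(r^\#\) changes with the size of the last value.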

Citation

If you use this package in your research, please cite our work as:

@inproceedings{NEURIPS2024_41c38a83,
    author = {Ai, Xinbo},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
    pages = {37385--37407},
    publisher = {Curran Associates, Inc.},
    title = {Adjust Pearson\textquotesingle s r to Measure Arbitrary Monotone Dependence},
    url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/41c38a83bd97ba28505b4def82676ba5-Paper-Conference.pdf},
    volume = {37},
    year = {2024}
}

recor: Making Correlation Measurement More Accurate


recor: Rearrangement Correlation Coefficient

Xinbo Ai

2025-12-09

Overview

Features