Movatterモバイル変換


[0]ホーム

URL:


What's new

Updates on my research and expository papers, discussion of open problems, and other maths-related topics. By Terence Tao

When is correlation transitive?

5 June, 2014 inexpository,math.CA,math.ST | Tags:,, | by

Given two unit vectors{v,w} in a real inner product space, one can define thecorrelation between these vectors to be their inner product{\langle v, w \rangle}, or in more geometric terms, the cosine of the angle{\angle(v,w)} subtended by{v} and{w}. By the Cauchy-Schwarz inequality, this is a quantity between{-1} and{+1}, with the extreme positive correlation{+1} occurring when{v,w} are identical, the extreme negative correlation{-1} occurring when{v,w} are diametrically opposite, and the zero correlation{0} occurring when{v,w} are orthogonal. This notion is closely related to the notion ofcorrelation between two non-constant square-integrable real-valued random variables{X,Y}, which is the same as the correlation between two unit vectors{v,w} lying in the Hilbert space{L^2(\Omega)} of square-integrable random variables, with{v} being the normalisation of{X} defined by subtracting off the mean{\mathbf{E} X} and then dividing by the standard deviation of{X}, and similarly for{w} and{Y}.

One can also define correlation for complex (Hermitian) inner product spaces by taking the real part{\hbox{Re} \langle , \rangle} of the complex inner product to recover a real inner product.

While reading the (highly recommended) recent popular maths book “How not to be wrong“, by my friend and co-author Jordan Ellenberg, I came across the (important) point that correlation is not necessarily transitive: if{X} correlates with{Y}, and{Y} correlates with{Z}, then this does not imply that{X} correlates with{Z}. A simple geometric example is provided by the three unit vectors

\displaystyle  u := (1,0); v := (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}); w := (0,1)

in the Euclidean plane{{\bf R}^2}:{u} and{v} have a positive correlation of{\frac{1}{\sqrt{2}}}, as does{v} and{w}, but{u} and{w} are not correlated with each other. Or: for a typical undergraduate course, it is generally true that good exam scores are correlated with a deep understanding of the course material, and memorising from flash cards are correlated with good exam scores, but this does not imply that memorising flash cards is correlated with deep understanding of the course material.

However, there are at least two situations in which some partial version of transitivity of correlation can be recovered. The first is in the “99%” regime in which the correlations arevery close to{1}: if{u,v,w} are unit vectors such that{u} isvery highly correlated with{v}, and{v} isvery highly correlated with{w}, then thisdoes imply that{u} is very highly correlated with{w}. Indeed, from the identity

\displaystyle  \| u-v \| = 2^{1/2} (1 - \langle u,v\rangle)^{1/2}

(and similarly for{v-w} and{u-w}) and the triangle inequality

\displaystyle  \|u-w\| \leq \|u-v\| + \|v-w\|,

we see that

\displaystyle  (1 - \langle u,w \rangle)^{1/2} \leq (1 - \langle u,v\rangle)^{1/2} + (1 - \langle v,w\rangle)^{1/2}. \ \ \ \ \ (1)

Thus, for instance, if{\langle u, v \rangle \geq 1-\varepsilon} and{\langle v,w \rangle \geq 1-\varepsilon}, then{\langle u,w \rangle \geq 1-4\varepsilon}. This is of course closely related to (though slightly weaker than) the triangle inequality for angles:

\displaystyle  \angle(u,w) \leq \angle(u,v) + \angle(v,w).

Remark 1 (Thanks to Andrew Granville for conversations leading to this observation.) The inequality(1) also holds for sub-unit vectors, i.e. vectors{u,v,w} with{\|u\|, \|v\|, \|w\| \leq 1}. This comes by extending{u,v,w} in directions orthogonal to all three original vectors and to each other in order to make them unit vectors, enlarging the ambient Hilbert space{H} if necessary. More concretely, one can apply(1) to the unit vectors

\displaystyle  (u, \sqrt{1-\|u\|^2}, 0, 0), (v, 0, \sqrt{1-\|v\|^2}, 0), (w, 0, 0, \sqrt{1-\|w\|^2})

in{H \times {\bf R}^3}.

But even in the “{1\%}” regime in which correlations are very weak, there is still a version of transitivity of correlation, known as thevan der Corput lemma, which basically asserts that if a unit vector{v} is correlated withmany unit vectors{u_1,\dots,u_n}, then many of the pairs{u_i,u_j} will then be correlated with each other. Indeed, from the Cauchy-Schwarz inequality

\displaystyle  |\langle v, \sum_{i=1}^n u_i \rangle|^2 \leq \|v\|^2 \| \sum_{i=1}^n u_i \|^2

we see that

\displaystyle  (\sum_{i=1}^n \langle v, u_i \rangle)^2 \leq \sum_{1 \leq i,j \leq n} \langle u_i, u_j \rangle. \ \ \ \ \ (2)

Thus, for instance, if{\langle v, u_i \rangle \geq \varepsilon} for at least{\varepsilon n} values of{i=1,\dots,n}, then (after removing those indices{i} for which{\langle v, u_i \rangle < \varepsilon}){\sum_{i,j} \langle u_i, u_j \rangle} must be at least{\varepsilon^4 n^2}, which implies that{\langle u_i, u_j \rangle \geq \varepsilon^4/2} for at least{\varepsilon^4 n^2/2} pairs{(i,j)}. Or as another example: if a random variable{X} exhibits at least{1\%} positive correlation with{n} other random variables{Y_1,\dots,Y_n}, then if{n > 10,000}, at least two distinct{Y_i,Y_j} must have positive correlation with each other (although this argument does not tell youwhich pair{Y_i,Y_j} are so correlated). Thus one can view this inequality as a sort of `pigeonhole principle” for correlation.

A similar argument (multiplying each{u_i} by an appropriate sign{\pm 1}) shows the related van der Corput inequality

\displaystyle  (\sum_{i=1}^n |\langle v, u_i \rangle|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle|, \ \ \ \ \ (3)

and this inequality is also true for complex inner product spaces. (Also, the{u_i} do not need to be unit vectors for this inequality to hold.)

Geometrically, the picture is this: if{v} positively correlates with all of the{u_1,\dots,u_n}, then the{u_1,\dots,u_n} are all squashed into a somewhat narrow cone centred at{v}. The cone is still wide enough to allow a few pairs{u_i, u_j} to be orthogonal (or even negatively correlated) with each other, but (when{n} is large enough) it is not wide enough to allowall of the{u_i,u_j} to be so widely separated. Remarkably, the bound here does not depend on the dimension of the ambient inner product space; while increasing the number of dimensions should in principle add more “room” to the cone, this effect is counteracted by the fact that in high dimensions, almost all pairs of vectors are close to orthogonal, and the exceptional pairs that are even weakly correlated to each other become exponentially rare. (Seethis previous blog post for some related discussion; in particular, Lemma 2 from that post is closely related to the van der Corput inequality presented here.)

A particularly common special case of the van der Corput inequality arises when{v} is a unit vector fixed by some unitary operator{T}, and the{u_i} are shifts{u_i = T^i u} of a single unit vector{u}. In this case, the inner products{\langle v, u_i \rangle} are all equal, and we arrive at the useful van der Corput inequality

\displaystyle  |\langle v, u \rangle|^2 \leq \frac{1}{n^2} \sum_{1 \leq i,j \leq n} |\langle T^i u, T^j u \rangle|. \ \ \ \ \ (4)

(In fact, one can even remove the absolute values from the right-hand side, by using(2) instead of(4).) Thus, to show that{v} has negligible correlation with{u}, it suffices to show that the shifts of{u} have negligible correlation with each other.

Here is a basic application of the van der Corput inequality:

Proposition 2 (Weyl equidistribution estimate) Let{P: {\bf Z} \rightarrow {\bf R}/{\bf Z}} be a polynomial with at least one non-constant coefficient irrational. Then one has

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N e( P(n) ) = 0,

where{e(x) := e^{2\pi i x}}.

Note that this assertion implies the more general assertion

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N e( kP(n) ) = 0

for any non-zero integer{k} (simply by replacing{P} by{kP}), which by the Weyl equidistribution criterion is equivalent to the sequence{P(1), P(2),\dots} being asymptotically equidistributed in{{\bf R}/{\bf Z}}.

Proof: We induct on the degree{d} of the polynomial{P}, which must be at least one. If{d} is equal to one, the claim is easily established from the geometric series formula, so suppose that{d>1} and that the claim has already been proven for{d-1}. If the top coefficient{a_d} of{P(n) = a_d n^d + \dots + a_0} is rational, say{a_d = \frac{p}{q}}, then by partitioning the natural numbers into residue classes modulo{q}, we see that the claim follows from the induction hypothesis; so we may assume that the top coefficient{a_d} is irrational.

In order to use the van der Corput inequality as stated above (i.e. in the formalism of inner product spaces) we will need anon-principal ultrafilter{p} (see e.gthis previous blog post for basic theory of ultrafilters); we leave it as an exercise to the reader to figure out how to present the argument below without the use of ultrafilters (or similar devices, such asBanach limits). The ultrafilter{p} defines an inner product{\langle, \rangle_p} on bounded complex sequences{z = (z_1,z_2,z_3,\dots)} by setting

\displaystyle  \langle z, w \rangle_p := \hbox{st} \lim_{N \rightarrow p} \frac{1}{N} \sum_{n=1}^N z_n \overline{w_n}.

Strictly speaking, this inner product is only positive semi-definite rather than positive definite, but one can quotient out by the null vectors to obtain a positive-definite inner product. To establish the claim, it will suffice to show that

\displaystyle  \langle 1, e(P) \rangle_p = 0

for every non-principal ultrafilter{p}.

Note that the space of bounded sequences (modulo null vectors) admits a shift{T}, defined by

\displaystyle  T (z_1,z_2,\dots) := (z_2,z_3,\dots).

This shift becomes unitary once we quotient out by null vectors, and the constant sequence{1} is clearly a unit vector that is invariant with respect to the shift. So by the van der Corput inequality, we have

\displaystyle  |\langle 1, e(P) \rangle_p| \leq \frac{1}{n^2} \sum_{1 \leq i,j \leq n} |\langle T^i e(P), T^j e(P) \rangle_p|

for any{n \geq 1}. But we may rewrite{\langle T^i e(P), T^j e(P) \rangle = \langle 1, e(T^j P - T^i P) \rangle_p}. Then observe that if{i \neq j},{T^j P - T^i P} is a polynomial of degree{d-1} whose{d-1} coefficient is irrational, so by induction hypothesis we have{\langle T^i e(P), T^j e(P) \rangle_p = 0} for{i \neq j}. For{i=j} we of course have{\langle T^i e(P), T^j e(P) \rangle_p = 1}, and so

\displaystyle  |\langle 1, e(P) \rangle_p| \leq \frac{1}{n^2} \times n

for any{n}. Letting{n \rightarrow \infty}, we obtain the claim.\Box

Recent Comments

Sam's avatarSam on245B, notes 0: A quick review…
Terence Tao's avatarTerence Tao onCall for industry sponsors for…
Terence Tao's avatarTerence Tao onSum-difference exponents for b…
Terence Tao's avatarTerence Tao onCall for industry sponsors for…
Unknown's avatarAnonymous onCall for industry sponsors for…
Mark Lewko's avatarMark Lewko onSum-difference exponents for b…
Unknown's avatarAnonymous onCall for industry sponsors for…
Terence Tao's avatarTerence Tao onSum-difference exponents for b…
Terence Tao's avatarTerence Tao onSum-difference exponents for b…
Terence Tao's avatarTerence Tao on245A: Problem solving str…
Terence Tao's avatarTerence Tao onCall for industry sponsors for…
Unknown's avatarAnonymous on245A: Problem solving str…
There is's avatarAnastasiia Kovtun onThe Poisson-Dirichlet process,…
Unknown's avatarAnonymous onClimbing the cosmic distance l…
Saksham's avatarSaksham onCall for industry sponsors for…

Top Posts

Archives

Categories

additive combinatoricsapproximate groupsarithmetic progressionsArtificial IntelligenceBen GreenCauchy-SchwarzCayley graphscentral limit theoremChowla conjecturecompressed sensingcorrespondence principlecosmic distance ladderdistributionsdivisor functioneigenvaluesElias SteinEmmanuel Breuillardentropyequidistributionergodic theoryEuler equationsexponential sumsfinite fieldsFourier transformFreiman's theoremGowers uniformity normGowers uniformity normsgraph theoryGromov's theoremGUEHilbert's fifth problemICMincompressible Euler equationsinverse conjectureJoni TeravainenKaisa MatomakiKakeya conjectureLie algebrasLie groupsLiouville functionLittlewood-Offord problemMaksym RadziwillMobius functionNavier-Stokes equationsnilpotent groupsnilsequencesnonstandard analysisPaul Erdospoliticspolymath1polymath8Polymath15polynomial methodpolynomialsprime gapsprime numbersprime number theoremrandom matricesrandomnessRatner's theoremregularity lemmaRicci flowRiemann zeta functionSchrodinger equationShannon entropysieve theorystructureSzemeredi's theoremTamar ZieglerUCLAultrafiltersuniversalityVan Vuwave mapsYitang Zhang

RSSThe Polymath Blog

29 comments

Comments feed for this article

5 June, 2014 at 12:14 pm

Fred Lunnon

Fred Lunnon's avatar

Single bars in 3 out of 5 displayed formulae preceding (1), perhaps?

Reply

5 June, 2014 at 9:42 pm

Emmanuel Kowalski

Emmanuel Kowalski's avatar

The non-transitivity of correlation also comes up in representation theory, with characters (or approximate versions such as trace functions) as unit vectors; I’ve taken to interpret “X is correlated with Y” as meaning “X has something in common with Y”, which to me carries the right intuition.

It can be interpreted algebraically rigorously with the “common parts” being common irreducible subrepresentations.

Reply

6 June, 2014 at 2:24 am

Basil K

Basil K's avatar

Recently I’ve come across the following property for an element a in some “tolerance space” (A,T) (a set A with a reflexive and symmetric relation T):

for all b, b’ in A, if (b T a) and (a T b’) then (b T b’) .

I was wondering what would make a reasonably “natural” example for such special elements a, which facilitate transitions in an otherwise not necessarily transitive setting, and I’ve been asking everyone I know about it –and even people I don’t know.

I understand that the spirit of this post is analytic rather than algebraic (as my Math Stackexchange question is), but surely the cone intuition seems to be common to both, so I thought I’d share.

Reply

7 June, 2014 at 2:16 am

George Shakan

George Shakan's avatar

That is quite an elegant proof of Proposition 1!

I think I found a couple typos. Of a single unit vectorT should be of a single unit vectoru. Also I in Proposition 1, probably thee(P(n)) should bee(kP(n)) for some positive integer parameterk.

[Corrected, thanks – T.]

Reply

7 June, 2014 at 7:59 am

aquazorcarson

aquazorcarson's avatar

Thanks for making Van der Corput so well-motivated. Very enlightening read.

Reply

7 June, 2014 at 9:16 am

dzako

dzako's avatar

I dont get the definition of the ultrafilter product $_p$. This is defined for complex sequences and $e(P)$ is clearly not a complex seq.

Reply

17 June, 2014 at 10:51 pm

Anonymous

Unknown's avatar

The paragraph after equation 1 seems to have several inaccuracies. First it was not assumed that all the pairs have nonnegative correlations. Then i wasnt able to follow the pigeonhole estimates for both examples. Maybe I am missing something.

Reply
    Terence Tao's avatar

    Both arguments are valid even in the presence of some negative correlations. For instance, once one has\sum_{1 \leq i,j \leq n} \langle u_i,u_j\rangle \geq \varepsilon^4 n^2, one must have\langle u_i,u_j \rangle \geq \varepsilon^4/2 for at least\varepsilon^4 n^2/2 pairs(i,j), for if this were not the case, one would have\langle u_i, u_j \rangle < \varepsilon^4/2 for all but fewer than\varepsilon^4 n^2/2 pairs(i,j), and if one uses the trivial upper bound\langle u_i,u_j\rangle \leq 1 for these exceptional pairs, one obtains a contradiction. Note that at no point in this argument is any non-negativity hypothesis on the\langle u_i,u_j\rangle required. Similarly for the second argument.

    At a more intuitive level, any negative correlation between one pair ofu_i,u_j will only cause the other inner products on the RHS of (1) to become evenmore positive, if (1) is to hold (but the correlation of each pair maxes out at+1, so there has to be a lot of pairs that share in this positive correlation): making a pair of vectors at an obtuse angle will squeeze a lot of other angles to become more acute. So negative correlation is not actually an enemy here, and is in fact helpful in some ways. (For similar reasons, the pigeonhole principle is still mathematically valid – and even “stronger”, in some sense – if some pigeonholes are allowed to have a negative number of pigeons, despite the breakdown of the physical pigeon metaphor in this setting.)

      Kodlu's avatar

      I believe the $\varepsilon^3$s should be $\varepsilon^4$s in the above comment, as in the main text below equation (2).

      [Corrected, thanks – T.]

18 June, 2014 at 8:35 pm

Anonymous

Unknown's avatar

Thanks for your enlightening answer. That part is completely clear now. Now I am confused by the paragraph after equation (2). First if the inner product is standard Euclidean, then the set of vectors positively correlated with v is simply a half space, which is not so “narrow”. Also I don’t understand the explanation for why the equation is independent of dimension. Why does it help to know the weakly correlated pairs are exponentially rare as dimension goes up?

Reply
    Terence Tao's avatar

    Here, one should think of “positively correlated” as meaning “correlation at least epsilon for some fixed epsilon independent of dimension”. Then all the vectorsu_i will be squashed into a cone of aperture about\pi - 2 \varepsilon, and the angular measure of this cone decays exponentially with the dimension. This reduces the number of opportunities for pairs of vectors in this cone to be orthogonal or to make obtuse angles with each other.

22 June, 2014 at 8:52 pm

Anonymous

Anonymous's avatar

The triangle inequality seems to give slightly weaker bounds than a determinant based approach (for eg.
http://math.stackexchange.com/questions/147374/correlations-between-3-random-variables) some people have used. For eg. if and are both 0.87, the triangle inequality gives > 0.48, but the determinant based approach gives > 0.5138. Is this alternate approach correct?

Reply

22 June, 2014 at 9:01 pm

Anonymous

Anonymous's avatar

the second to last line wasnt posted fully by the comment engine. restating that line, given 3 variables 1,2,3 and pairwise correlations x, y, z, triangle inequality gives the minimum bound as 0.48, while determinant > 0 gives a bound around 0.51.

Reply

2 July, 2014 at 1:58 am

taking exponentials of stuff | marekgluza

[…] was reding this:https://terrytao.wordpress.com/2014/06/05/when-is-correlation-transitive/ and I got this going for me which is […]

Reply

27 October, 2014 at 9:22 pm

The Elliott-Halberstam conjecture implies the Vinogradov least quadratic nonresidue conjecture | What's new

Unknown's avatar

[…] of , this shows that correlates with for various small . By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that correlates with for some distinct . But this can be ruled out by using Type […]

Reply

15 December, 2014 at 1:30 am

hammyhamster

hammyhamster's avatar

Presumably there are implications for investment porfolios here. It’s well-known that adding a high-volatility (aka variance of return) stock to a portfolio can decrease the portfolio’s overall variance if the high-vol stock is negatively correlated to the others. But lack of transitivity presumably increases the challenge of doing this for large portfolios ?

Reply

11 June, 2015 at 4:52 pm

Anonymous

Unknown's avatar

Sorry if I’m being silly, but I don’t understand the explanation below equation (2). For example. letn\epsilon be a positive integer and\sum_{i=1}^n\langle v,u_i\rangle=n\epsilon^2, that is, all the remainingn-n\epsilon terms be such that there is not correlation between them andv. In this case, what is stated in the paragraph below isn’t true. I think I’m missing something here, please help me out!

[Oops, there was a typo – the exponent 3 should have been a 4. Corrected now – T.]

Reply

13 July, 2015 at 9:00 am

willis77 comments on “What does it mean for an algorithm to be fair?” | Offer Your

[…] vs causation. In real data, correlation is mostly transitive (there are toy exceptions –https://terrytao.wordpress.com/2014/06/05/when-is-correlatio&#8230;, but they are just that). This means that if you want to predict something, and that something is […]

Reply

13 July, 2015 at 9:18 am

willis77 comments on “What does it mean for an algorithm to be fair?” | Exploding Ads

[…] vs causation. In real data, correlation is mostly transitive (there are toy exceptions –https://terrytao.wordpress.com/2014/06/05/when-is-correlatio&#8230;, but they are just that). This means that if you want to predict something, and that something is […]

Reply

1 October, 2015 at 3:37 am

Foucart

Foucart's avatar

See Foucart (1991) Transitivité du produit scalaire. Rev. Stat. Appliquée, XXXIX (3), 57-68 and Foucart (1997) numerical analysis of a correlation matrix, Statistics, 29/4, pp. 347-361.

Reply

7 October, 2015 at 6:00 am

War and Picks - datdota.com

[…] note for the mathematically inclined – correlation is not a proper metric and it is not transitive except in the regime of very high correlation. In practice, this means that patch 6.82 can be […]

Reply

7 March, 2016 at 2:30 am

Correlation between Probability theory and Vector Algebra – Arindam Ghosh

[…] product/scalar product/projection product” of Vector Algebra while going through this article on “transitiveness of correlation” by Dr. Terrence […]

Reply

27 February, 2021 at 12:22 pm

Boosting the van der Corput inequality using the tensor power trick | What's new

Unknown's avatar

[…] this previous blog post I noted the following easy application of […]

Reply

1 July, 2021 at 9:46 pm

lotharson

lotharson's avatar

Hi Terence, I wish I could quote your blog post in a paper I am writing! ;-)

[This is fine; see the last paragraph ofhttps://terrytao.wordpress.com/about/ , as well as general guides on how to cite blog posts, e.g., athttps://blog.apastyle.org/apastyle/2016/04/how-to-cite-a-blog-post-in-apa-style.html -T.]

Reply

7 December, 2021 at 2:32 am

matott

matott's avatar

can someone help me with a real world application (not being a mathematician):
I have two logistic regressions (a and b), both being derived of and validated in a database S. Regression a is also tested and validated in a second database T.
Can I now say that my second regression b is also valid in the second database T, since a and b are both valid in S.
Comparing S and T directly, they have the same demographics (plus o moins)

Sorry for intruding with a real life problem. I would really appreciate a helpin hand (brain ;-)

Reply

Leave a commentCancel reply

For commenters

To enter in LaTeX in comments, use $latex<Your LaTeX code>$ (without the < and > signs, of course; in fact, these signs should be avoided as they can cause formatting errors). Also, backslashes \ need to be doubled as \\.  See theabout page for details and for other commenting policy.

Blog at WordPress.com.Ben Eastaugh and Chris Sternal-Johnson.

Subscribe to feed.


[8]ページ先頭

©2009-2025 Movatter.jp