Movatterモバイル変換


[0]ホーム

URL:


independence-test

Introduction

We propose a simple dependence measure \(\nu(Y, \mathbf{X})\) (A New Measure Of Dependence: Integrated R2.) to assess how much a random variable \(X\) explains a univariate response \(Y\).

Then thesimple irdc dependence measure is defined as:

$$\nu_{n}^{\text{1-dim}}(Y, X) := 1 - \frac{1}{2}\sum_{j \atop r_j \neq 1, n}\sum_{i\neq j, j - 1, n} \frac{\mathbb{I}[r_j\in\mathcal{K}_i]}{(r_j - 1)(n - r_j)}.$$where \(\mathbb{I}[r_j\in\mathcal{K}_i]\) is a 0-1 indicator function and \(\mathcal{K}_i := [\min\{r_i, r_{i + 1}\}, \max\{r_{i}, r_{i + 1}\}]\) when we ordered data with respect to \(X\) and rank with respect to \(Y\). InA New Measure Of Dependence: Integrated R2, we conjecture that under the same assumptions $$\sqrt{n}\left(\nu_{n}^{\text{1-dim}}(Y, X)-\frac{2}{n}\right)$$converges in distribution to \(N(0, \pi^2/3 - 3)\) as \(n\rightarrow\infty\).

We compare this metric with the (A new coefficient of correlation, Chatterjee 2021.The \(\xi\) measure is defined as following:$$\xi_n(X, Y) := 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}.$$where we ordered data with respect to \(X\) and rank with respect to \(Y\).

(Chatterjee 2021showed that given \(X\) and \(Y\) are independent and \(Y\) is continuous. Then$$\sqrt{n} , \xi_n(X, Y) \xrightarrow{d} \mathcal{N}(0, 2/5) \quad \text{in distribution as } n \to \infty.$$

Pattern detection for Yeast genes

In this study, we use the revised and curated dataset,Spellman inR packageminerva, with 4381 genes to study the power of \(\nu_{n}^{\text{1-dim}}(Y, X)\) in discovering genes with oscillating transcript levels, and compare its performance with the competingtests by \(\xi_n\). We also explore the possible patterns in this dataset.

Load and Prepare Data

# Load yeast gene expression datayeast_genes_data <- as.data.frame(Spellman)gene_names <- colnames(yeast_genes_data)[-1]time_points <- yeast_genes_data$timen <- length(time_points)

Initialize Results Storage

xi_vals <- numeric(ncol(yeast_genes_data) - 1)xi_pvals <- numeric(ncol(yeast_genes_data) - 1)ird_vals <- numeric(ncol(yeast_genes_data) - 1)ird_pvals <- numeric(ncol(yeast_genes_data) - 1)

Run Dependence Measures for Each Gene

for (i in 1:(ncol(yeast_genes_data) - 1)) {  y <- as.numeric(yeast_genes_data[, i + 1])  # XICOR  xi_pvals[i] <- xicor(x = time_points , y = y, pvalue = T)$pval    # IRDC  ird <- irdc_simple(Y = y, X = time_points)  ird_vals[i] <- ird  ird_pvals[i] <-  1 - pnorm(ird, mean = 2/n , sd = sqrt((pi^2 / 3 - 3)/n))}

Adjust p-values and Identify Significant Genes

xi_fdr <- p.adjust(xi_pvals, method = "BH")ird_fdr <- p.adjust(ird_pvals, method = "BH")sig_xi <- gene_names[xi_fdr < 0.05]sig_ird <- gene_names[ird_fdr < 0.05]common_genes <- intersect(sig_xi, sig_ird)cat("All genes:", length(gene_names) , "\n")#> All genes: 4381cat("XICOR significant genes:", length(sig_xi), "\n")#> XICOR significant genes: 586cat("Simple IRDC significant genes:", length(sig_ird), "\n")#> Simple IRDC significant genes: 677cat("Overlap:", length(common_genes), "\n")#> Overlap: 547cat("ONLY XICOR significant genes:", length(setdiff(sig_xi, sig_ird)), "\n")#> ONLY XICOR significant genes: 39cat("ONLY Simple IRDC significant genes:", length(setdiff(sig_ird, sig_xi)), "\n")#> ONLY Simple IRDC significant genes: 130

Genes Only Detected by IRDC

irdc_detected_only <- setdiff(sig_ird, sig_xi)irdc_only_fdr <- ird_fdr[match(irdc_detected_only, gene_names)]top6_idx <- order(irdc_only_fdr)[1:6]smallest_p_irdc_do <- irdc_detected_only[top6_idx]irdc_do_genes <- yeast_genes_data[, which(gene_names %in% smallest_p_irdc_do) + 1]irdc_do_genes <- cbind(time_points, irdc_do_genes)

Plot Top Genes Only Detected by IRDC With Smallest Adjusted P_value IRDC

for (i in 1:6) {  gene_to_plot <- colnames(irdc_do_genes)[i + 1]  idx <- match(gene_to_plot, gene_names)  p <- ggplot(irdc_do_genes, aes(x = time_points, y = .data[[gene_to_plot]])) +    geom_point(size = 3) +    geom_smooth(method = "loess",se = FALSE, linewidth = 1, color = "blue")+     theme_bw() +    labs(      title = paste0("Only Detected by nu: xi adj.p-val = ", round(xi_fdr[idx], 4),                     ", nu adj.p-val = ", round(ird_fdr[idx], 4)),      x = "Time Points",      y = gene_to_plot    )  print(p)}

plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6plot of chunk unnamed-chunk-6

Genes with Largest Adjusted P_value of XICOR

xi_irdc_only_fdr <- xi_fdr[match(irdc_detected_only, gene_names)]top6_diff <- order(-(xi_irdc_only_fdr))[1:6]largest_p_dif_irdc_do <- irdc_detected_only[top6_diff]irdc_do_large_diff_genes <- yeast_genes_data[, which(gene_names %in% largest_p_dif_irdc_do) + 1]irdc_do_large_diff_genes <- cbind(time_points, irdc_do_large_diff_genes)

Plot Top Genes with With Largest Adjusted P_value XICOR

for (i in 1:6) {  gene_to_plot <- colnames(irdc_do_large_diff_genes)[i + 1]  idx <- match(gene_to_plot, gene_names)  p <- ggplot(irdc_do_large_diff_genes, aes(x = time_points, y = .data[[gene_to_plot]])) +    geom_point(size = 3) +    geom_smooth(method = "loess", se = FALSE, linewidth = 1, color = "blue")+     theme_bw() +    labs(      title = paste0("Only Detected by nu: xi adj.p-val = ", round(xi_fdr[idx], 4),                     ", nu adj.p-val = ", round(ird_fdr[idx], 4)),      x = "Time Points",      y = gene_to_plot    )  print(p)}

plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8plot of chunk unnamed-chunk-8


[8]ページ先頭

©2009-2025 Movatter.jp