Estimate a suite of normalizing transformations, including a new adaptation of a technique based on ranks which can guarantee normally distributed transformed data if there are no ties: ordered quantile normalization (ORQ).


petersonR/bestNormalize



The bestNormalize R package was designed to help find a normalizing transformation for a vector. Many techniques have been developed to this end, but each has its own strengths and weaknesses, and it is unclear which one will work best until the data are observed. This package evaluates a range of possible transformations and returns the best one, i.e., the one that makes the data look the most normal.

Note that some authors use the term “normalize” differently than this package does. We define “normalize” as transforming a vector of data in such a way that the transformed values follow a Gaussian distribution (equivalently, a bell curve). This is in contrast to other techniques designed to transform values to the 0-1 range or to the -1 to 1 range.
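To make the distinction concrete, here is a small base-R sketch. The `skewness()` helper is our own (a plain moment estimator), not part of bestNormalize. Rescaling to 0-1, to -1 to 1, or centering and scaling are all affine maps, so they leave the shape of skewed gamma draws, and in particular their skewness, unchanged; a normalizing transformation in this package's sense is one that changes that shape.

```r
# Sample skewness (moment estimator); hypothetical helper used only to compare shapes
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

set.seed(100)
x <- rgamma(1000, 1, 1)

x01   <- (x - min(x)) / (max(x) - min(x))  # rescale to the 0-1 range
x11   <- 2 * x01 - 1                       # rescale to the -1 to 1 range
z_std <- (x - mean(x)) / sd(x)             # center + scale

# All three are positive affine maps of x, so the skewness does not change
c(raw = skewness(x), zero_one = skewness(x01),
  minus1_1 = skewness(x11), center_scale = skewness(z_std))
```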

This package also introduces a new adaptation of a normalization technique, which we call Ordered Quantile normalization (orderNorm(), or ORQ). ORQ transforms the data based on a rank mapping to the normal distribution. This allows us to guarantee normally distributed transformed data (if ties are not present). To transform newly observed data outside of the original domain, the adaptation uses a shifted logit approximation on the ranks transformation; for new data within the original domain, it uses linear interpolation of the fitted transformation.
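The three pieces of ORQ described above (rank-to-normal mapping, interpolation inside the domain, logit-based extrapolation outside it) can be sketched in base R. This is a simplified illustration, not the package's `orderNorm()` code: the names `orq_fit()`, `orq_predict()` and `logit_score()` are hypothetical, the logit fit uses `quasibinomial()` for convenience, and the boundary shift (joining the logit-based scores to the fitted mapping at the nearest end of the original range) is our own assumption about what "shifted" means.

```r
# Simplified ORQ sketch (NOT the package's orderNorm() implementation)
orq_fit <- function(x) {
  n <- length(x)
  q <- (rank(x) - 0.5) / n   # rank-based quantiles, strictly inside (0, 1)
  z <- qnorm(q)              # rank-to-normal mapping: exact normal scores if no ties
  # Logit regression of the quantiles on x, used only for extrapolation
  # (quasibinomial avoids the non-integer-response warning of binomial)
  logit_fit <- glm(q ~ x, family = quasibinomial())
  list(x = x, z = z, logit_fit = logit_fit)
}

# Normal score implied by the logit fit at values v
logit_score <- function(fit, v) {
  p <- predict(fit$logit_fit, newdata = data.frame(x = v), type = "response")
  qnorm(unname(p))
}

orq_predict <- function(fit, newdata) {
  lo <- min(fit$x)
  hi <- max(fit$x)
  out <- numeric(length(newdata))
  inside <- newdata >= lo & newdata <= hi
  above <- newdata > hi
  below <- newdata < lo
  # Within the original domain: linear interpolation of the fitted mapping
  if (any(inside)) {
    out[inside] <- approx(fit$x, fit$z, xout = newdata[inside], ties = mean)$y
  }
  # Outside the domain: logit-based scores, shifted so they meet the
  # fitted mapping at the nearest boundary (our own choice of shift)
  if (any(above)) {
    out[above] <- logit_score(fit, newdata[above]) - logit_score(fit, hi) + max(fit$z)
  }
  if (any(below)) {
    out[below] <- logit_score(fit, newdata[below]) - logit_score(fit, lo) + min(fit$z)
  }
  out
}
```

Because the shift makes the extrapolated scores continuous at the boundary and the logit fit is monotone in `x`, a new value above the training range maps above every fitted score (and likewise below).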

To evaluate the efficacy of each normalization technique, the bestNormalize() function implements repeated cross-validation to estimate the Pearson’s P statistic divided by its degrees of freedom. This is called the “Normality statistic”; if it is close to 1 (or less), the transformation can be thought of as working well. The function is designed to select the transformation that produces the lowest P / df value when estimated on out-of-sample data. (Estimating this on in-sample data will always choose the orderNorm technique, and is generally not the main goal of these procedures.)
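As a rough illustration of the Normality statistic, the base-R sketch below computes a Pearson chi-square statistic against N(0, 1) using equiprobable bins and divides it by its degrees of freedom. The helper name `pearson_p_over_df()` is hypothetical; the default bin count follows the default of `nortest::pearson.test`, but the package's exact binning, degrees-of-freedom adjustment and cross-validation aggregation may differ from this sketch.

```r
# Simplified Pearson P / df (hypothetical helper, not the package's own function).
# Assumes z is already transformed/standardized and compares it with N(0, 1).
pearson_p_over_df <- function(z, n_classes = ceiling(2 * length(z)^(2 / 5))) {
  breaks <- qnorm(seq(0, 1, length.out = n_classes + 1))  # -Inf ... Inf, equiprobable
  observed <- tabulate(cut(z, breaks = breaks, labels = FALSE), nbins = n_classes)
  expected <- length(z) / n_classes                        # same for every bin
  P <- sum((observed - expected)^2 / expected)
  P / (n_classes - 1)  # df = classes - 1, since no parameters are estimated here
}

set.seed(100)
pearson_p_over_df(rnorm(1000))                            # typically close to 1
pearson_p_over_df(as.numeric(scale(rgamma(1000, 1, 1))))  # much larger: skewed data
```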

Installation

You can install the most recent (devel) version of bestNormalize from GitHub with:

# install.packages("devtools")
devtools::install_github("petersonR/bestNormalize")

Or, you can download it from CRAN with:

install.packages("bestNormalize")

Example

In this example, we generate 1000 draws from a gamma distribution and normalize them:

library(bestNormalize)
set.seed(100)
x <- rgamma(1000, 1, 1)

# Estimate best transformation with repeated cross-validation
BN_obj <- bestNormalize(x, allow_lambert_s = TRUE)
#> Warning: package 'lamW' was built under R version 4.0.5
BN_obj
#> Best Normalizing transformation with 1000 Observations
#>  Estimated Normality Statistics (Pearson P / df, lower => more normal):
#>  - arcsinh(x): 3.6204
#>  - Box-Cox: 0.96
#>  - Center+scale: 6.7851
#>  - Exp(x): 50.8513
#>  - Lambert's W (type s): 1.0572
#>  - Log_b(x+a): 1.908
#>  - orderNorm (ORQ): 1.0516
#>  - sqrt(x + a): 1.4556
#>  - Yeo-Johnson: 1.7385
#> Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
#>
#> Based off these, bestNormalize chose:
#> Standardized Box Cox Transformation with 1000 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.2739638
#>  - mean (before standardization) = -0.3870903
#>  - sd (before standardization) = 1.045498

# Perform transformation
gx <- predict(BN_obj)

# Perform reverse transformation
x2 <- predict(BN_obj, newdata = gx, inverse = TRUE)

# Prove the transformation is 1:1
all.equal(x2, x)
#> [1] TRUE
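The chosen “Standardized Box Cox Transformation” reports three estimates: `lambda`, and the mean and sd before standardization. The base-R sketch below shows what those quantities mean: a Box-Cox transform with `lambda` chosen by maximizing the profile log-likelihood, followed by centering and scaling, plus the exact inverse. All function names here (`bc_forward()`, `boxcox_lambda()`, `fit_std_boxcox()`, `predict_std_boxcox()`) are hypothetical, and the optimizer and search interval are our own choices, so the estimated `lambda` may differ slightly from the package's.

```r
# Box-Cox transform and its inverse (lambda near 0 is treated as the log limit)
bc_forward <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}
bc_inverse <- function(y, lambda) {
  if (abs(lambda) < 1e-8) exp(y) else (lambda * y + 1)^(1 / lambda)
}

# Choose lambda by maximizing the Box-Cox profile log-likelihood (requires x > 0)
boxcox_lambda <- function(x, interval = c(-1, 2)) {
  stopifnot(all(x > 0))
  n <- length(x)
  sum_log_x <- sum(log(x))
  loglik <- function(lambda) {
    y <- bc_forward(x, lambda)
    -n / 2 * log(mean((y - mean(y))^2)) + (lambda - 1) * sum_log_x
  }
  optimize(loglik, interval, maximum = TRUE)$maximum
}

# Standardized Box-Cox: store lambda plus the mean and sd of the transformed data
fit_std_boxcox <- function(x) {
  lambda <- boxcox_lambda(x)
  y <- bc_forward(x, lambda)
  list(lambda = lambda, mean = mean(y), sd = sd(y))
}

predict_std_boxcox <- function(fit, newdata, inverse = FALSE) {
  if (inverse) {
    bc_inverse(newdata * fit$sd + fit$mean, fit$lambda)  # undo scaling, then Box-Cox
  } else {
    (bc_forward(newdata, fit$lambda) - fit$mean) / fit$sd
  }
}
```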

As of version 1.3, the package supports leave-one-out cross-validation as well. ORQ normalization works very well when the test set is small relative to the training set, so it will often be selected via leave-one-out cross-validation (which is why we set allow_orderNorm = FALSE here).
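The two estimation schemes can be contrasted with a base-R sketch. Everything here is hypothetical and simplified: `fit_std_log()`/`pred_std_log()` stand in for one candidate transformation (a standardized log, without the shift `a` of Log_b(x+a)), and we assume that repeated k-fold CV averages a per-fold statistic while leave-one-out pools all held-out values into a single statistic, since one held-out point cannot be binned on its own. The package's internals may differ. The statistic helper is repeated so the block runs alone.

```r
# Simplified Pearson P / df against N(0, 1) with equiprobable bins
pearson_p_over_df <- function(z, n_classes = ceiling(2 * length(z)^(2 / 5))) {
  breaks <- qnorm(seq(0, 1, length.out = n_classes + 1))
  observed <- tabulate(cut(z, breaks = breaks, labels = FALSE), nbins = n_classes)
  expected <- length(z) / n_classes
  sum((observed - expected)^2 / expected) / (n_classes - 1)
}

# Hypothetical candidate transformation: standardized log (requires x > 0)
fit_std_log <- function(x) list(mu = mean(log(x)), s = sd(log(x)))
pred_std_log <- function(fit, newdata) (log(newdata) - fit$mu) / fit$s

# Repeated k-fold CV: statistic on each test fold, averaged over folds and repeats
cv_stat <- function(x, fit_fn, pred_fn, k = 10, r = 5) {
  per_repeat <- replicate(r, {
    folds <- sample(rep_len(seq_len(k), length(x)))
    mean(vapply(seq_len(k), function(f) {
      test <- folds == f
      pearson_p_over_df(pred_fn(fit_fn(x[!test]), x[test]))
    }, numeric(1)))
  })
  mean(per_repeat)
}

# Leave-one-out: each point is transformed by a fit to the other n - 1 points,
# then the n held-out values are pooled into ONE statistic
loo_stat <- function(x, fit_fn, pred_fn) {
  z <- vapply(seq_along(x), function(i) pred_fn(fit_fn(x[-i]), x[i]), numeric(1))
  pearson_p_over_df(z)
}
```

Under leave-one-out each fit sees n - 1 of the n points, so every held-out value lies close to data the transformation was fit on, which is the small-test-set situation in which ORQ does well.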

(BN_loo <- bestNormalize(x, allow_orderNorm = FALSE, allow_lambert_s = TRUE, loo = TRUE))
#> Best Normalizing transformation with 1000 Observations
#>  Estimated Normality Statistics (Pearson P / df, lower => more normal):
#>  - arcsinh(x): 14.0712
#>  - Box-Cox: 0.8077
#>  - Center+scale: 26.5181
#>  - Exp(x): 451.435
#>  - Lambert's W (type s): 1.269
#>  - Log_b(x+a): 4.5374
#>  - sqrt(x + a): 3.3655
#>  - Yeo-Johnson: 5.7997
#> Estimation method: Out-of-sample via leave-one-out CV
#>
#> Based off these, bestNormalize chose:
#> Standardized Box Cox Transformation with 1000 nonmissing obs.:
#>  Estimated statistics:
#>  - lambda = 0.2739638
#>  - mean (before standardization) = -0.3870903
#>  - sd (before standardization) = 1.045498

It is also possible to visualize these transformations:

plot(BN_obj,leg_loc="bottomright")

For a more in-depth tutorial, please consult the package vignette or the package website.
