ztils

License: MIT

Various utilities meant to aid in speeding up common statistical operations, such as:

- removing outliers and extremes
- generating probability density and cumulative distribution graphs with ggplot2
- running one-sample Kolmogorov-Smirnov tests against multiple distributions at once
- generating prediction plots with ggplot2
- scaling data and performing principal component analysis (PCA)
- plotting PCA with ggplot2

Installation

To install from CRAN:

install.packages("ztils")

To install the development version:

remotes::install_github("zachpeagler/ztils")


no_outliers()

Description

This function keeps only the rows of the dataframe whose values for the chosen variable lie between the first quartile minus 1.5 times the interquartile range (IQR) and the third quartile plus 1.5 times the IQR.

Usage

This function has no defaults, as it is entirely dependent on the user input.

no_outliers(data, var)

Arguments

Returns

Returns the specified dataframe data minus the rows containing outliers in the var variable.

Examples:

no_outliers(iris, Sepal.Length)

This isn’t a great example because Sepal.Length in the iris dataset does not contain any statistical outliers under this rule.
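The quartile-based filter described above can be sketched in base R. This is a minimal illustration, not the ztils source: iqr_keep and its k argument are hypothetical names, and the quartiles come from stats::quantile() with its default settings, which may differ from what no_outliers() uses internally.

```r
# Hypothetical helper (not part of ztils): TRUE for values inside the IQR fence.
iqr_keep <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x >= q[1] - k * iqr & x <= q[2] + k * iqr
}

# Sepal.Width has a few values outside the fence, so some rows are dropped.
trimmed <- iris[iqr_keep(iris$Sepal.Width), ]
nrow(iris) - nrow(trimmed)  # number of rows removed
```

Setting k = 3 in this sketch gives the wider fence used for extremes.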

no_extremes()

Description

This function keeps only the rows of the dataframe whose values for the chosen variable lie between the first quartile minus 3.0 times the interquartile range (IQR) and the third quartile plus 3.0 times the IQR.

Usage

This function has no defaults, as it is entirely dependent on the user input.

no_extremes(data, var)

Arguments

Returns

Returns the specified dataframe data minus the rows containing extremes in the var variable.

Examples:

no_extremes(iris, Sepal.Length)

This isn’t a great example because Sepal.Length in the iris dataset does not contain any statistical extremes under this rule.


multipdf_cont()

Description

This function gets the probability density function (PDF) for selected distributions against continuous variables. Possible distributions include any combination of “normal”, “lognormal”, “gamma”, “exponential”, and “all” (which just uses all of the prior distributions).

Note that only non-negative numbers are supported by the lognormal and gamma distributions. Feeding this function a negative number with those distributions selected will result in an error.

Usage:

multipdf_cont(var,
              seq_length = 50,
              distributions = "all")

Returns

This function returns a dataframe with row number equal to seq_length containing the real density and the probability density function of var for selected distributions.

Arguments

Examples

multipdf_cont(iris$Petal.Length)
multipdf_cont(iris$Sepal.Length, 100, c("normal", "lognormal"))
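To make the shape of that output concrete, here is a rough base R sketch of a density table over a grid of seq_length points. It is an assumption-laden illustration, not the package's implementation: the column names are hypothetical, the "real" density is taken to be a kernel density estimate, and the distribution parameters are simple sample estimates (mean, sd, log-scale mean and sd, 1/mean). The gamma distribution is left out because fitting it needs a numerical optimizer.

```r
x <- iris$Petal.Length
seq_length <- 50

# Evaluation grid spanning the observed range.
grid <- seq(min(x), max(x), length.out = seq_length)

# Kernel density estimate on the same grid stands in for the "real" density.
kde <- stats::density(x, n = seq_length, from = min(x), to = max(x))

pdfs <- data.frame(
  x = grid,
  real_density = kde$y,
  pdf_normal = dnorm(grid, mean = mean(x), sd = sd(x)),
  pdf_lognormal = dlnorm(grid, meanlog = mean(log(x)), sdlog = sd(log(x))),
  pdf_exponential = dexp(grid, rate = 1 / mean(x))
)
head(pdfs)
```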

multipdf_plot()

Description

This function extends multipdf_cont() and gets the probability density functions (PDFs) for selected distributions against continuous, non-negative numbers. Possible distributions include any combination of “normal”, “lognormal”, “gamma”, “exponential”, and “all” (which just uses all of the prior distributions). It then plots this using ggplot2 and a scico palette, using var_name for the plot labeling, if specified. If not specified, it will use var instead.

Usage

multipdf_plot(var,
              seq_length = 50,
              distributions = "all",
              palette = "oslo",
              var_name = NULL)

Returns

A plot showing the PDF of the selected variable against the selected distributions over the selected sequence length.

Arguments

Examples

multipdf_plot(iris$Sepal.Length)
multipdf_plot(iris$Sepal.Length,
              seq_length = 100,
              distributions = c("normal", "lognormal", "gamma"),
              palette = "bilbao",
              var_name = "Sepal Length (cm)")


multicdf_cont()

Description

This function gets the cumulative distribution function (CDF) for selected distributions against continuous variables. Possible distributions include any combination of “normal”, “lognormal”, “gamma”, “exponential”, and “all” (which just uses all of the prior distributions).

Note that only non-negative numbers are supported by the lognormal and gamma distributions. Feeding this function a negative number with those distributions selected will result in an error.

Usage:

multicdf_cont(var,
              seq_length = 50,
              distributions = "all")

Returns

This function returns a dataframe with row number equal to seq_length containing the real cumulative density and the cumulative distribution function of var for selected distributions.

Arguments

Examples

multicdf_cont(iris$Petal.Length)
multicdf_cont(iris$Sepal.Length,
              100,
              c("normal", "lognormal"))
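For comparison, a cumulative version of such a table can be approximated in base R with stats::ecdf() and the p* distribution functions. As before, this is only a sketch under stated assumptions: the column names are hypothetical, and the parameters are plain sample estimates that may not match how multicdf_cont() fits its distributions.

```r
x <- iris$Sepal.Length
grid <- seq(min(x), max(x), length.out = 100)

cdfs <- data.frame(
  x = grid,
  # Empirical CDF of the data, evaluated on the grid.
  empirical_cdf = stats::ecdf(x)(grid),
  # Theoretical CDFs with parameters estimated from the sample.
  cdf_normal = pnorm(grid, mean = mean(x), sd = sd(x)),
  cdf_lognormal = plnorm(grid, meanlog = mean(log(x)), sdlog = sd(log(x)))
)
tail(cdfs)
```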

multicdf_plot()

Description

This function extends multicdf_cont() and gets the cumulative distribution functions (CDFs) for selected distributions against continuous, non-negative numbers. Possible distributions include any combination of “normal”, “lognormal”, “gamma”, “exponential”, and “all” (which just uses all of the prior distributions). It then plots this using ggplot2 and a scico palette, using var_name for the plot labeling, if specified. If not specified, it will use var instead.

Usage

multicdf_plot(var,
              seq_length = 50,
              distributions = "all",
              palette = "oslo",
              var_name = NULL)

Returns

A plot showing the CDF of the selected variable against the selected distributions over the selected sequence length.

Arguments

Examples

multicdf_plot(iris$Sepal.Length)
multicdf_plot(iris$Sepal.Length,
              seq_length = 100,
              distributions = c("normal", "lognormal", "gamma"),
              palette = "bilbao",
              var_name = "Sepal Length (cm)")


multiks_cont()

Description

This function gets the distance and p-value from a one-sample Kolmogorov-Smirnov (KS) test for selected distributions against a continuous input variable. Possible distributions include “normal”, “lognormal”, “gamma”, “exponential”, and “all”.

Usage

multiks_cont(var,
             distributions = "all")

Note: If using “lognormal” or “gamma” distributions, the target variable must be non-negative.

Arguments

Returns

Returns a dataframe with the distance and p-value for each performed KS test. The distance is a relative metric of similarity: smaller values mean the sample is closer to the specified distribution. A p-value > 0.05 indicates that the target variable’s distribution is not significantly different from the specified distribution.

Examples

multiks_cont(iris$Sepal.Length)
multiks_cont(iris$Sepal.Length, c("normal", "lognormal"))
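The same kind of comparison can be put together by hand with stats::ks.test(), one call per distribution. The sketch below is not the ztils code: the data frame layout and column names are hypothetical, and the reference distributions use simple sample estimates for their parameters, which is an assumption about how the fit is done.

```r
x <- iris$Sepal.Length

# Ties in x trigger a warning from ks.test(); it is suppressed here for brevity.
ks_normal <- suppressWarnings(
  stats::ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
)
ks_lognormal <- suppressWarnings(
  stats::ks.test(x, "plnorm", meanlog = mean(log(x)), sdlog = sd(log(x)))
)

# One row per test: KS distance (D statistic) and p-value.
ks_results <- data.frame(
  distribution = c("normal", "lognormal"),
  distance = unname(c(ks_normal$statistic, ks_lognormal$statistic)),
  p_value = c(ks_normal$p.value, ks_lognormal$p.value)
)
ks_results
```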


glm_pseudor2()

Description

This function calculates the pseudo R^2 (proportion of variance explained by the model) for a generalized linear model (GLM). GLMs don’t have a true R^2 due to the intrinsic difference between a linear model and a generalized linear model, but we can still calculate an approximation of the R^2 as (1 - (deviance/null deviance)).

Usage

glm_pseudor2(mod)

Arguments

Returns

Returns the pseudo R^2 value of the model.

Examples

gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)
glm_pseudor2(gmod)
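The formula from the description, 1 - (deviance/null deviance), can be checked by hand using the deviance and null.deviance components that base R stores on a fitted glm object. This is a sketch of the calculation, not necessarily how glm_pseudor2() is written; pseudo_r2 is just an illustrative variable name.

```r
gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)

# 1 - (residual deviance / null deviance), read from the fitted glm object.
pseudo_r2 <- 1 - gmod$deviance / gmod$null.deviance
pseudo_r2
```

For the default gaussian family with an identity link, as in this example, the value coincides with the ordinary R^2 of the equivalent lm() fit.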


pca_plot()

Description

This function performs a principal component analysis (PCA) for the selected pcavars with the option to automatically scale the variables. It then graphs PC1 on the x-axis and PC2 on the y-axis using ggplot2, coloring the graph with a scico palette over the specified groups. This is similar to the biplot command from the stats package, but performs all the steps required in graphing a PCA for you.

Usage

pca_plot(group,
         pcavars,
         scaled = FALSE,
         palette = "oslo")

Arguments

Returns

A ggplot object showing PC1 on the x-axis and PC2 on the y-axis, colored by group with vectors and labels showing the individual PCA variables.

Examples

pca_plot(iris$Species, iris[,c(1:4)])
pca_plot(iris$Species, iris[,c(1:4)], FALSE, "bilbao")


pca_data()

Description

This function performs a principal component analysis (PCA) on the specified variables, pcavars, and attaches the resulting principal components to the specified dataframe, data, with optional variable scaling.

Usage

pca_data(data,
         pcavars,
         scaled = FALSE)

Arguments

Returns

Returns a dataframe with principal components as additional columns.

Examples

pca_data(iris, iris[,c(1:4)], FALSE)
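The steps that pca_data() wraps (run a PCA, then attach the component scores to the original data) can be sketched with stats::prcomp() from base R. This is an illustration under assumptions, not the package source: whether ztils uses prcomp() internally is not stated above, and iris_pca is a hypothetical name.

```r
pcavars <- iris[, 1:4]

# scale. = TRUE standardizes each variable to unit variance before the PCA.
pca <- stats::prcomp(pcavars, center = TRUE, scale. = TRUE)

# Attach the component scores (PC1, PC2, ...) as extra columns.
iris_pca <- cbind(iris, pca$x)
head(iris_pca[, c("Species", "PC1", "PC2")])
```

The PC1 and PC2 columns are the values a PCA plot would put on its x and y axes.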


predict_plot()

Description

This function performs a prediction based on the supplied model, then graphs it using ggplot2. Options are available for predicting based on the confidence or prediction interval, as well as for applying corrections, such as exponential and logistic.

I would like to alter this function to reduce the number of required inputs, as all the information should be available from the model call, but that’s a work in progress.

Usage

predict_plot(mod,
             data,
             rvar,
             pvar,
             group = NULL,
             length = 50,
             interval = "confidence",
             correction = "normal",
             palette = "oslo")

Arguments

Returns

Returns a plot with the observed (real) data plotted as points and the prediction plotted as lines, with a 95% confidence or prediction interval.

This function has a known issue with the colors on ungrouped predictions: it uses the predictor variable (x-axis) for the color, which works for the actual data (points), but doesn’t translate well to the predicted lines and ribbon.

Examples

mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
predict_plot(mod1, iris, Sepal.Length, Petal.Length, Species)
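The prediction data behind a plot like this can be built in base R with predict() and its interval argument. The sketch below assumes a grid of 50 predictor values per group across the full observed range, which may not be how predict_plot() constructs its grid; newdata and pred are illustrative names.

```r
mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# Grid of predictor values for every group (50 points per species).
newdata <- expand.grid(
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length),
                     length.out = 50),
  Species = levels(iris$Species)
)

# fit, lwr and upr columns: the prediction and its 95% confidence bounds.
pred <- cbind(
  newdata,
  predict(mod1, newdata = newdata, interval = "confidence", level = 0.95)
)
head(pred)
```

Passing interval = "prediction" instead returns the wider prediction interval.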


Bug reporting

If you find any bugs, please report them at https://github.com/zachpeagler/ztils/issues.

