
Type: Package
Title: Predictive Power Score
Version: 0.0.5
Description: The Predictive Power Score (PPS) is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). PPS can be useful for data exploration purposes, in the same way correlation analysis is. For more information on PPS, see https://github.com/paulvanderlaken/ppsr.
License: GPL (≥ 3)
Encoding: UTF-8
Suggests: testthat (≥ 2.0.0)
Config/testthat/edition:
Config/testthat/parallel: true
RoxygenNote: 7.2.3
Imports: ggplot2 (≥ 3.3.3), parsnip (≥ 0.1.5), rpart (≥ 4.1.15), withr (≥ 2.4.1), gridExtra (≥ 2.3)
NeedsCompilation:
Packaged: 2024-02-18 11:57:33 UTC; pvdl
Author: Paul van der Laken [aut, cre, cph]
Maintainer: Paul van der Laken <paulvanderlaken@gmail.com>
Repository: CRAN
Date/Publication: 2024-02-18 12:30:02 UTC

ppsr: An R implementation of the Predictive Power Score (PPS)

Description

The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two columns. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). It can be used as an alternative to the correlation (matrix).

Lists all algorithms currently supported

Description

Lists all algorithms currently supported

Usage

available_algorithms()

Value

a list of all available parsnip engines

Examples

available_algorithms()

Lists all evaluation metrics currently supported

Description

Lists all evaluation metrics currently supported

Usage

available_evaluation_metrics()

Value

a list of all available evaluation metrics and their implementation in functional form

Examples

available_evaluation_metrics()

Normalizes the original score compared to a naive baseline score

Description

Normalizes the original score compared to a naive baseline score. The calculation that is performed depends on the type of model.

Usage

normalize_score(baseline_score, model_score, type)

Arguments

baseline_score

float, the evaluation metric score for a naive baseline (model)

model_score

float, the evaluation metric score for a statistical model

type

character, type of model

Value

numeric vector of length one, normalized score
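The page does not spell out the calculation itself. As a rough sketch only, assuming the normalization usually described for the PPS (this is an assumption, not something stated here): for an error metric such as MAE, where lower is better, the model score is compared to the baseline as a ratio; for a metric such as weighted F1, where higher is better and the maximum is 1, the improvement over the baseline is rescaled to the remaining headroom. In both cases a model that does not beat the baseline is given 0. The function name normalize_sketch is hypothetical.

```r
# Hypothetical stand-in for normalize_score(); the formulas are an assumption
# based on the usual PPS definition, not copied from the package source.
normalize_sketch <- function(baseline_score, model_score, type) {
  if (type == "regression") {
    # Error metric (e.g. MAE): lower is better, so compare as a ratio.
    raw <- 1 - model_score / baseline_score
  } else if (type == "classification") {
    # Score metric (e.g. weighted F1): higher is better, with a maximum of 1.
    raw <- (model_score - baseline_score) / (1 - baseline_score)
  } else {
    stop("type must be 'regression' or 'classification'")
  }
  # A model that does no better than the baseline gets a score of 0.
  max(0, raw)
}

normalize_sketch(baseline_score = 2.0, model_score = 0.5, type = "regression")
normalize_sketch(baseline_score = 0.4, model_score = 0.7, type = "classification")
```

Under these assumptions the two calls give 0.75 and 0.5 (up to floating-point rounding). The sketch does not guard against a baseline of 0 (regression) or 1 (classification), where the ratio is undefined.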

Calculate predictive power score for x on y

Description

Calculate predictive power score for x on y

Usage

score(
  df,
  x,
  y,
  algorithm = "tree",
  metrics = list(regression = "MAE", classification = "F1_weighted"),
  cv_folds = 5,
  seed = 1,
  verbose = TRUE
)

Arguments

df

data.frame containing columns for x and y

x

string, column name of predictor variable

y

string, column name of target variable

algorithm

string, see available_algorithms()

metrics

named list of eval_* functions used for regression and classification problems, see available_evaluation_metrics()

cv_folds

float, number of cross-validation folds

seed

float, seed to ensure reproducibility/stability

verbose

boolean, whether to print notifications

Value

a named list, potentially containing

x: the name of the predictor variable
y: the name of the target variable
result_type: text showing how to interpret the resulting score
pps: the predictive power score
metric: the evaluation metric used to compute the PPS
baseline_score: the score of a naive model on the evaluation metric
model_score: the score of the predictive model on the evaluation metric
cv_folds: how many cross-validation folds were used
seed: the seed that was set
algorithm: text showing what algorithm was used
model_type: text showing whether classification or regression was used

Examples

score(iris, x = 'Petal.Length', y = 'Species')
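Because the score is asymmetric, it can help to compute both directions for a pair of columns. The following is a small usage sketch that relies only on the arguments and list elements documented above; the variable names forward and backward are illustrative.

```r
library(ppsr)

# Species predicted from Petal.Length (a classification problem).
forward <- score(iris, x = "Petal.Length", y = "Species", verbose = FALSE)

# Petal.Length predicted from Species (a regression problem).
backward <- score(iris, x = "Species", y = "Petal.Length", verbose = FALSE)

# The two directions are scored separately and need not agree.
c(forward = forward$pps, backward = backward$pps)

# Other documented elements explain how each score was obtained.
forward[c("model_type", "metric", "baseline_score", "model_score")]
```

Since the return value is described as "potentially containing" these elements, code that consumes the list should probably check that an element is present before using it.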

Calculate correlation coefficients for whole dataframe

Description

Calculate correlation coefficients for whole dataframe

Usage

score_correlations(df, ...)

Arguments

df

data.frame containing columns for x and y

...

arguments to pass to stats::cor()

Value

a data.frame with x-y correlation coefficients

Examples

score_correlations(iris)

Calculate predictive power scores for whole dataframe

Description

Calculate predictive power scores for whole dataframe. Iterates through the columns of the dataframe, calculating the predictive power score for every possible combination of x and y.

Usage

score_df(df, ..., do_parallel = FALSE, n_cores = -1)

Arguments

df

data.frame containing columns for x and y

...

any arguments passed to score

do_parallel

bool, whether to perform score calls in parallel

n_cores

numeric, number of cores to use, defaults to maximum minus 1

Value

a data.frame containing

x: the name of the predictor variable
y: the name of the target variable
result_type: text showing how to interpret the resulting score
pps: the predictive power score
metric: the evaluation metric used to compute the PPS
baseline_score: the score of a naive model on the evaluation metric
model_score: the score of the predictive model on the evaluation metric
cv_folds: how many cross-validation folds were used
seed: the seed that was set
algorithm: text showing what algorithm was used
model_type: text showing whether classification or regression was used

Examples

score_df(iris)
score_df(mtcars, do_parallel = TRUE, n_cores = 2)

Calculate predictive power score matrix

Description

Iterates through the columns of the dataset, calculating the predictive power score for every possible combination of x and y. Note that the targets are on the rows, and the features on the columns.

Usage

score_matrix(df, ...)

Arguments

df

data.frame containing columns for x and y

...

any arguments passed to score_df, some of which will be passed on to score

Value

a matrix of numeric values, representing predictive power scores

Examples

score_matrix(iris)
score_matrix(mtcars, do_parallel = TRUE, n_cores = 2)
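The row and column orientation matters when reading single cells. The sketch below assumes that the returned matrix carries the column names of df as its row and column names (an assumption, not stated in the Value section above); if it does not, positional indexing in column order would be needed instead.

```r
library(ppsr)

pps <- score_matrix(iris, verbose = FALSE)

# Rows are targets and columns are features, so this is the score of
# Petal.Length (feature) as a predictor of Species (target).
pps["Species", "Petal.Length"]

# In general the matrix is not symmetric: compare with the reverse direction.
pps["Petal.Length", "Species"]
```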

Calculates out-of-sample model performance of a statistical model

Description

Calculates out-of-sample model performance of a statistical model

Usage

score_model(train, test, model, x, y, metric)

Arguments

train

df, training data, containing variable y

test

df, test data, containing variable y

model

parsnip model object, with mode preset

x

character, column name of predictor variable

y

character, column name of target variable

metric

character, name of evaluation metric being used, seeavailable_evaluation_metrics()

Value

numeric vector of length one, evaluation score for predictions using the statistical model

Calculate out-of-sample model performance of naive baseline model

Description

Calculate out-of-sample model performance of naive baseline model. The calculation that is performed depends on the type of model. For regression models, the mean is used as the prediction. For classification, a model predicting random values and a model predicting modal values are both used, and the better of the two is taken as the baseline score.

Usage

score_naive(train, test, x, y, type, metric)

Arguments

train

df, training data, containing variable y

test

df, test data, containing variable y

x

character, column name of predictor variable

y

character, column name of target variable

type

character, type of model

metric

character, evaluation metric being used

Value

numeric vector of length one, evaluation score for predictions using naive model
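The baselines described for score_naive can be illustrated in base R. This is a minimal sketch and not the package's implementation: all helper names are hypothetical, the mean and modal class are assumed to come from the training data, and plain accuracy stands in for the classification metric to keep the example short.

```r
# Hypothetical helpers illustrating the naive baselines described above.
mae <- function(truth, estimate) mean(abs(truth - estimate))
accuracy <- function(truth, estimate) {
  mean(as.character(truth) == as.character(estimate))
}

naive_regression <- function(train, test, y) {
  # Predict the training mean of y for every test row.
  prediction <- rep(mean(train[[y]]), nrow(test))
  mae(test[[y]], prediction)
}

naive_classification <- function(train, test, y, seed = 1) {
  # Baseline 1: always predict the most frequent (modal) training class.
  modal_class <- names(which.max(table(train[[y]])))
  score_modal <- accuracy(test[[y]], rep(modal_class, nrow(test)))
  # Baseline 2: predict values drawn at random from the training classes.
  set.seed(seed)
  random_pred <- sample(as.character(train[[y]]), nrow(test), replace = TRUE)
  score_random <- accuracy(test[[y]], random_pred)
  # Keep whichever naive model performs best.
  max(score_modal, score_random)
}

# Simple hold-out splits of two built-in datasets.
train_rows <- seq(1, nrow(mtcars), by = 2)
naive_regression(mtcars[train_rows, ], mtcars[-train_rows, ], y = "mpg")

iris_rows <- seq(1, nrow(iris), by = 2)
naive_classification(iris[iris_rows, ], iris[-iris_rows, ], y = "Species")
```

Taking the better of the two classification baselines makes the reference point harder to beat, so a predictor has to add real information before it receives a non-zero score.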

Calculate predictive power scores for y

Description

Calculate predictive power scores for y. Calculates the predictive power scores for the specified y variable using every column in the dataset as x, including itself.

Usage

score_predictors(df, y, ..., do_parallel = FALSE, n_cores = -1)

Arguments

df

data.frame containing columns for x and y

y

string, column name of target variable

...

any arguments passed to score

do_parallel

bool, whether to perform score calls in parallel

n_cores

numeric, number of cores to use, defaults to maximum minus 1

Value

a data.frame containing

x: the name of the predictor variable
y: the name of the target variable
result_type: text showing how to interpret the resulting score
pps: the predictive power score
metric: the evaluation metric used to compute the PPS
baseline_score: the score of a naive model on the evaluation metric
model_score: the score of the predictive model on the evaluation metric
cv_folds: how many cross-validation folds were used
seed: the seed that was set
algorithm: text showing what algorithm was used
model_type: text showing whether classification or regression was used

Examples

score_predictors(df = iris, y = 'Species')
score_predictors(df = mtcars, y = 'mpg', do_parallel = TRUE, n_cores = 2)
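A common follow-up is to rank the predictors by their score. The sketch below uses only the documented x and pps columns of the returned data.frame; the variable names predictors and ranked are illustrative.

```r
library(ppsr)

# Score every column of iris as a predictor of Species.
predictors <- score_predictors(df = iris, y = "Species", verbose = FALSE)

# Order rows from the strongest to the weakest predictor.
ranked <- predictors[order(predictors$pps, decreasing = TRUE), c("x", "pps")]
ranked

# The target is scored against itself as well; drop that row if unwanted.
ranked[ranked$x != "Species", ]
```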

Visualize the PPS & correlation matrices

Description

Visualize the PPS & correlation matrices

Usage

visualize_both(
  df,
  color_value_positive = "#08306B",
  color_value_negative = "#8b0000",
  color_text = "#FFFFFF",
  include_missings = TRUE,
  nrow = 1,
  ...
)

Arguments

df

data.frame containing columns for x and y

color_value_positive

color used for upper limit of gradient (high positive correlation)

color_value_negative

color used for lower limit of gradient (high negative correlation)

color_text

string, hex value or color name used for text, best to pick high contrast with color_value_high

include_missings

bool, whether to include the variables without correlation values in the plot

nrow

numeric, number of rows, either 1 or 2

...

any arguments passed to score

Value

a grob object, a grid with two ggplot2 heatmap visualizations

Examples

visualize_both(iris)
visualize_both(mtcars, do_parallel = TRUE, n_cores = 2)

Visualize the correlation matrix

Description

Visualize the correlation matrix

Usage

visualize_correlations(
  df,
  color_value_positive = "#08306B",
  color_value_negative = "#8b0000",
  color_text = "#FFFFFF",
  include_missings = FALSE,
  ...
)

Arguments

df

data.frame containing columns for x and y

color_value_positive

color used for upper limit of gradient (high positive correlation)

color_value_negative

color used for lower limit of gradient (high negative correlation)

color_text

color used for text, best to pick high contrast with color_value_high

include_missings

bool, whether to include the variables without correlation values in the plot

...

arguments to pass to stats::cor()

Value

a ggplot object, a heatmap visualization

Examples

visualize_correlations(iris)

Visualize the Predictive Power scores of the entire dataframe, or given a target

Description

If y is specified, visualize_pps returns a barplot of the PPS of every predictor on the specified target variable. If y is not specified, visualize_pps returns a heatmap visualization of the PPS for all X-Y combinations in a dataframe.

Usage

visualize_pps(
  df,
  y = NULL,
  color_value_high = "#08306B",
  color_value_low = "#FFFFFF",
  color_text = "#FFFFFF",
  include_target = TRUE,
  ...
)

Arguments

df

data.frame containing columns for x and y

y

string, column name of target variable, can be left NULL to visualize all X-Y PPS

color_value_high

string, hex value or color name used for upper limit of PPS gradient (high PPS)

color_value_low

string, hex value or color name used for lower limit of PPS gradient (low PPS)

color_text

string, hex value or color name used for text, best to pick high contrast with color_value_high

include_target

boolean, whether to include the target variable in the barplot

...

any arguments passed to score

Value

a ggplot object, a vertical barplot or heatmap visualization

Examples

visualize_pps(iris, y = 'Species')
visualize_pps(iris)
visualize_pps(mtcars, do_parallel = TRUE, n_cores = 2)
