| Type: | Package |
| Title: | Predictive Power Score |
| Version: | 0.0.5 |
| Description: | The Predictive Power Score (PPS) is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). PPS can be useful for data exploration purposes, in the same way correlation analysis is. For more information on PPS, see https://github.com/paulvanderlaken/ppsr. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| Suggests: | testthat (≥ 2.0.0) |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| RoxygenNote: | 7.2.3 |
| Imports: | ggplot2 (≥ 3.3.3), parsnip (≥ 0.1.5), rpart (≥ 4.1.15), withr (≥ 2.4.1), gridExtra (≥ 2.3) |
| NeedsCompilation: | no |
| Packaged: | 2024-02-18 11:57:33 UTC; pvdl |
| Author: | Paul van der Laken [aut, cre, cph] |
| Maintainer: | Paul van der Laken <paulvanderlaken@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-02-18 12:30:02 UTC |
ppsr: An R implementation of the Predictive Power Score (PPS)
Description
The PPS is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two columns. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). It can be used as an alternative to the correlation (matrix).
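Because the score is asymmetric, swapping predictor and target generally gives a different value. The following is a minimal sketch, assuming the ppsr package is installed (the block does nothing otherwise). It uses only the score() function and the pps element of its result, both documented in this manual.

```r
# Sketch: compare the PPS in both directions for one pair of iris columns.
# Skipped when ppsr is not installed.
if (requireNamespace("ppsr", quietly = TRUE)) {
  forward <- ppsr::score(iris, x = "Petal.Length", y = "Species", verbose = FALSE)
  backward <- ppsr::score(iris, x = "Species", y = "Petal.Length", verbose = FALSE)

  # Each result is a named list; `pps` holds the score itself.
  cat("Petal.Length -> Species:", forward$pps, "\n")
  cat("Species -> Petal.Length:", backward$pps, "\n")
}
```

The two printed values need not match, which is the main practical difference from a symmetric correlation coefficient.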
Lists all algorithms currently supported
Description
Lists all algorithms currently supported
Usage
available_algorithms()
Value
a list of all available parsnip engines
Examples
available_algorithms()
Lists all evaluation metrics currently supported
Description
Lists all evaluation metrics currently supported
Usage
available_evaluation_metrics()
Value
a list of all available evaluation metrics and their implementation in functional form
Examples
available_evaluation_metrics()
Normalizes the original score compared to a naive baseline score. The calculation that's being performed depends on the type of model.
Description
Normalizes the original score compared to a naive baseline score. The calculation that's being performed depends on the type of model.
Usage
normalize_score(baseline_score, model_score, type)
Arguments
baseline_score | float, the evaluation metric score for a naive baseline (model) |
model_score | float, the evaluation metric score for a statistical model |
type | character, type of model |
Value
numeric vector of length one, normalized score
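The entry above does not spell out the formula. One common way to normalize against a baseline is shown here purely as an illustrative assumption, not as the package's implementation. It treats regression metrics as errors (lower is better) and classification metrics as scores bounded by 1 (higher is better). The helper name normalize_sketch is hypothetical.

```r
# Hypothetical helper, not part of ppsr: normalizes a model score against a
# naive baseline so that the result falls between 0 and 1.
normalize_sketch <- function(baseline_score, model_score, type) {
  if (type == "regression") {
    # Error metric such as MAE: lower is better, so measure the share of
    # the baseline error that the model removes.
    if (baseline_score <= 0 || model_score >= baseline_score) return(0)
    return(1 - model_score / baseline_score)
  }
  if (type == "classification") {
    # Score metric such as F1: higher is better, with 1 as the maximum.
    if (baseline_score >= 1 || model_score <= baseline_score) return(0)
    return((model_score - baseline_score) / (1 - baseline_score))
  }
  stop("type must be 'regression' or 'classification'")
}

normalize_sketch(baseline_score = 2.0, model_score = 0.5, type = "regression")
normalize_sketch(baseline_score = 0.4, model_score = 0.7, type = "classification")
```

In this sketch a model that is no better than the baseline is clipped to 0, which matches the documented lower bound of the score.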
Calculate predictive power score for x on y
Description
Calculate predictive power score for x on y
Usage
score(
  df,
  x,
  y,
  algorithm = "tree",
  metrics = list(regression = "MAE", classification = "F1_weighted"),
  cv_folds = 5,
  seed = 1,
  verbose = TRUE
)
Arguments
df | data.frame containing columns for x and y |
x | string, column name of predictor variable |
y | string, column name of target variable |
algorithm | string, see available_algorithms() |
metrics | named list of evaluation metrics, see available_evaluation_metrics() |
cv_folds | float, number of cross-validation folds |
seed | float, seed to ensure reproducibility/stability |
verbose | boolean, whether to print notifications |
Value
a named list, potentially containing
- x
the name of the predictor variable
- y
the name of the target variable
- result_type
text showing how to interpret the resulting score
- pps
the predictive power score
- metric
the evaluation metric used to compute the PPS
- baseline_score
the score of a naive model on the evaluation metric
- model_score
the score of the predictive model on the evaluation metric
- cv_folds
how many cross-validation folds were used
- seed
the seed that was set
- algorithm
text showing what algorithm was used
- model_type
text showing whether classification or regression was used
Examples
score(iris, x = 'Petal.Length', y = 'Species')
Calculate correlation coefficients for whole dataframe
Description
Calculate correlation coefficients for whole dataframe
Usage
score_correlations(df, ...)
Arguments
df | data.frame containing columns for x and y |
... | arguments to pass to |
Value
a data.frame with x-y correlation coefficients
Examples
score_correlations(iris)
Calculate predictive power scores for whole dataframe. Iterates through the columns of the dataframe, calculating the predictive power score for every possible combination of x and y.
Description
Calculate predictive power scores for whole dataframe. Iterates through the columns of the dataframe, calculating the predictive power score for every possible combination of x and y.
Usage
score_df(df, ..., do_parallel = FALSE, n_cores = -1)
Arguments
df | data.frame containing columns for x and y |
... | any arguments passed to score() |
do_parallel | bool, whether to perform the score calls in parallel |
n_cores | numeric, number of cores to use, defaults to maximum minus 1 |
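The default n_cores = -1 is described as "maximum minus 1". The following sketch shows how such a sentinel value could be resolved, assuming the base parallel package is used to detect cores. The helper resolve_cores is hypothetical and not part of ppsr.

```r
# Hypothetical helper: turn n_cores = -1 into "all detected cores minus one",
# never returning fewer than one core.
resolve_cores <- function(n_cores = -1) {
  if (n_cores == -1) {
    detected <- parallel::detectCores()
    # detectCores() can return NA on some platforms; fall back to one core.
    if (is.na(detected)) return(1L)
    return(max(1L, as.integer(detected) - 1L))
  }
  as.integer(n_cores)
}

resolve_cores()    # machine dependent
resolve_cores(2)   # an explicit request is kept as-is
```

Leaving one core free keeps the machine responsive while the pairwise scores are computed.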
Value
a data.frame containing
- x
the name of the predictor variable
- y
the name of the target variable
- result_type
text showing how to interpret the resulting score
- pps
the predictive power score
- metric
the evaluation metric used to compute the PPS
- baseline_score
the score of a naive model on the evaluation metric
- model_score
the score of the predictive model on the evaluation metric
- cv_folds
how many cross-validation folds were used
- seed
the seed that was set
- algorithm
text showing what algorithm was used
- model_type
text showing whether classification or regression was used
Examples
score_df(iris)
score_df(mtcars, do_parallel = TRUE, n_cores = 2)
Calculate predictive power score matrix. Iterates through the columns of the dataset, calculating the predictive power score for every possible combination of x and y.
Description
Note that the targets are on the rows, and the features on the columns.
Usage
score_matrix(df, ...)
Arguments
df | data.frame containing columns for x and y |
... | any arguments passed to score_df() |
Value
a matrix of numeric values, representing predictive power scores
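Since targets are on the rows and features on the columns, a single cell answers "how well does the column variable predict the row variable". The following is a sketch, assuming ppsr is installed and that the returned matrix carries the column names of df as row and column names. That naming is an assumption and is not stated in this entry.

```r
# Sketch: read one cell of the PPS matrix. Skipped when ppsr is not installed.
if (requireNamespace("ppsr", quietly = TRUE)) {
  m <- ppsr::score_matrix(iris)

  # Row = target (y), column = feature (x), per the description above.
  # The dimnames check guards the assumption that names are present.
  if (!is.null(dimnames(m))) {
    print(m["Species", "Petal.Length"])  # Petal.Length predicting Species
  }
  print(dim(m))
}
```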
Examples
score_matrix(iris)
score_matrix(mtcars, do_parallel = TRUE, n_cores = 2)
Calculates out-of-sample model performance of a statistical model
Description
Calculates out-of-sample model performance of a statistical model
Usage
score_model(train, test, model, x, y, metric)
Arguments
train | df, training data, containing variable y |
test | df, test data, containing variable y |
model | parsnip model object, with mode preset |
x | character, column name of predictor variable |
y | character, column name of target variable |
metric | character, name of evaluation metric being used, see available_evaluation_metrics() |
Value
numeric vector of length one, evaluation score for predictions using the statistical model
Calculate out-of-sample model performance of naive baseline model. The calculation that's being performed depends on the type of model. For regression models, the mean is used as prediction. For classification, a model predicting random values and a model predicting modal values are used, and the best model is taken as baseline score.
Description
Calculate out-of-sample model performance of naive baseline model. The calculation that's being performed depends on the type of model. For regression models, the mean is used as prediction. For classification, a model predicting random values and a model predicting modal values are used, and the best model is taken as baseline score.
Usage
score_naive(train, test, x, y, type, metric)
Arguments
train | df, training data, containing variable y |
test | df, test data, containing variable y |
x | character, column name of predictor variable |
y | character, column name of target variable |
type | character, type of model |
metric | character, evaluation metric being used |
Value
numeric vector of length one, evaluation score for predictions using naive model
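The baselines described in this entry can be illustrated in base R. This is a sketch under assumptions, not the package's code. It uses MAE for regression and plain accuracy for classification, whereas the documented default classification metric is F1_weighted. All function names are hypothetical.

```r
# Hypothetical helpers illustrating naive baselines; not part of ppsr.

# Regression: predict the training mean for every test row, score with MAE.
naive_regression_mae <- function(train_y, test_y) {
  mean(abs(test_y - mean(train_y)))
}

# Classification: score a modal-class predictor and a random predictor,
# then keep the better of the two (accuracy is used here for simplicity).
naive_classification_accuracy <- function(train_y, test_y, seed = 1) {
  train_y <- as.character(train_y)
  test_y <- as.character(test_y)

  # Most frequent class in the training data.
  modal_class <- names(which.max(table(train_y)))
  acc_modal <- mean(test_y == modal_class)

  # Random predictions drawn from the observed training labels.
  set.seed(seed)
  random_pred <- sample(train_y, size = length(test_y), replace = TRUE)
  acc_random <- mean(test_y == random_pred)

  max(acc_modal, acc_random)
}

naive_regression_mae(train_y = c(1, 2, 3), test_y = c(2, 4))
naive_classification_accuracy(
  train_y = c("a", "a", "b"),
  test_y = c("a", "b", "a", "a")
)
```

Taking the better of the two classification baselines makes the baseline harder to beat, so a positive normalized score reflects a real gain over guessing.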
Calculate predictive power scores for y. Calculates the predictive power scores for the specified y variable using every column in the dataset as x, including itself.
Description
Calculate predictive power scores for y. Calculates the predictive power scores for the specified y variable using every column in the dataset as x, including itself.
Usage
score_predictors(df, y, ..., do_parallel = FALSE, n_cores = -1)
Arguments
df | data.frame containing columns for x and y |
y | string, column name of target variable |
... | any arguments passed to score() |
do_parallel | bool, whether to perform the score calls in parallel |
n_cores | numeric, number of cores to use, defaults to maximum minus 1 |
Value
a data.frame containing
- x
the name of the predictor variable
- y
the name of the target variable
- result_type
text showing how to interpret the resulting score
- pps
the predictive power score
- metric
the evaluation metric used to compute the PPS
- baseline_score
the score of a naive model on the evaluation metric
- model_score
the score of the predictive model on the evaluation metric
- cv_folds
how many cross-validation folds were used
- seed
the seed that was set
- algorithm
text showing what algorithm was used
- model_type
text showing whether classification or regression was used
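A typical follow-up is to rank the predictors by their score. The following is a sketch, assuming ppsr is installed. It relies only on the x and pps columns listed above.

```r
# Sketch: rank all iris columns by how well they predict Species.
# Skipped when ppsr is not installed.
if (requireNamespace("ppsr", quietly = TRUE)) {
  scores <- ppsr::score_predictors(df = iris, y = "Species")

  # Order predictors from most to least predictive of the target.
  ranked <- scores[order(scores$pps, decreasing = TRUE), c("x", "pps")]
  print(ranked)
}
```

Because the target itself is included as a predictor, it will normally appear in the ranking as well and can be filtered out if it is not of interest.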
Examples
score_predictors(df = iris, y = 'Species')
score_predictors(df = mtcars, y = 'mpg', do_parallel = TRUE, n_cores = 2)
Visualize the PPS & correlation matrices
Description
Visualize the PPS & correlation matrices
Usage
visualize_both(
  df,
  color_value_positive = "#08306B",
  color_value_negative = "#8b0000",
  color_text = "#FFFFFF",
  include_missings = TRUE,
  nrow = 1,
  ...
)
Arguments
df | data.frame containing columns for x and y |
color_value_positive | color used for upper limit of gradient (high positive correlation) |
color_value_negative | color used for lower limit of gradient (high negative correlation) |
color_text | string, hex value or color name used for text, best to pick high contrast with |
include_missings | bool, whether to include the variables without correlation values in the plot |
nrow | numeric, number of rows, either 1 or 2 |
... | any arguments passed to |
Value
a grob object, a grid with two ggplot2 heatmap visualizations
Examples
visualize_both(iris)
visualize_both(mtcars, do_parallel = TRUE, n_cores = 2)
Visualize the correlation matrix
Description
Visualize the correlation matrix
Usage
visualize_correlations(
  df,
  color_value_positive = "#08306B",
  color_value_negative = "#8b0000",
  color_text = "#FFFFFF",
  include_missings = FALSE,
  ...
)
Arguments
df | data.frame containing columns for x and y |
color_value_positive | color used for upper limit of gradient (high positive correlation) |
color_value_negative | color used for lower limit of gradient (high negative correlation) |
color_text | color used for text, best to pick high contrast with |
include_missings | bool, whether to include the variables without correlation values in the plot |
... | arguments to pass to |
Value
a ggplot object, a heatmap visualization
Examples
visualize_correlations(iris)
Visualize the Predictive Power scores of the entire dataframe, or given a target
Description
If y is specified, visualize_pps returns a barplot of the PPS of every predictor on the specified target variable. If y is not specified, visualize_pps returns a heatmap visualization of the PPS for all X-Y combinations in a dataframe.
Usage
visualize_pps(
  df,
  y = NULL,
  color_value_high = "#08306B",
  color_value_low = "#FFFFFF",
  color_text = "#FFFFFF",
  include_target = TRUE,
  ...
)
Arguments
df | data.frame containing columns for x and y |
y | string, column name of target variable, can be left NULL |
color_value_high | string, hex value or color name used for upper limit of PPS gradient (high PPS) |
color_value_low | string, hex value or color name used for lower limit of PPS gradient (low PPS) |
color_text | string, hex value or color name used for text, best to pick high contrast with color_value_high |
include_target | boolean, whether to include the target variable in the barplot |
... | any arguments passed to |
Value
a ggplot object, a vertical barplot or heatmap visualization
Examples
visualize_pps(iris, y = 'Species')
visualize_pps(iris)
visualize_pps(mtcars, do_parallel = TRUE, n_cores = 2)