Type:

Package

Title:

Ordered Correlation Forest

Version:

1.0.3

Description:

Machine learning estimator specifically optimized for predictive modeling of ordered non-numeric outcomes. 'ocf' provides forest-based estimation of the conditional choice probabilities and the covariates’ marginal effects. Under an "honesty" condition, the estimates are consistent and asymptotically normal, and standard errors can be obtained by leveraging the weight-based representation of the random forest predictions. Please reference the use as Di Francesco (2025) <doi:10.1080/07474938.2024.2429596>.

License:

GPL-3

Encoding:

UTF-8

Depends:

R (≥ 3.4.0)

Imports:

Rcpp, Matrix, stats, utils, stringr, orf, glmnet, ranger, dplyr, tidyr, ggplot2, magrittr

LinkingTo:

Rcpp, RcppEigen

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

VignetteBuilder:

knitr

URL:

https://riccardo-df.github.io/ocf/, https://github.com/riccardo-df/ocf

BugReports:

https://github.com/riccardo-df/ocf/issues

Biarch:

TRUE

NeedsCompilation:

yes

Packaged:

2025-02-03 07:41:16 UTC; riccardo-df

Author:

Riccardo Di Francesco [aut, cre, cph]

Maintainer:

Riccardo Di Francesco <difrancesco.riccardo96@gmail.com>

Repository:

CRAN

Date/Publication:

2025-02-03 08:00:06 UTC

Check Argument alpha

Description

Check Argument alpha

Usage

check_alpha(alpha)

Arguments

alpha

Fraction of observations that must lie on each side of each split.

Check Arguments honesty, honesty.fraction and inference

Description

Check Arguments honesty, honesty.fraction and inference

Usage

check_honesty_inference(honesty, honesty.fraction, inference)

Arguments

honesty

Whether to grow honest forests.

honesty.fraction

Fraction of honest sample.

inference

Whether to conduct weight-based inference.

Check Argument max.depth

Description

Check Argument max.depth

Usage

check_maxdepth(max.depth)

Arguments

max.depth

Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree).

Check Argument min.node.size

Description

Check Argument min.node.size

Usage

check_minnodesize(min.node.size)

Arguments

min.node.size

Minimal node size.

Check Argument mtry

Description

Check Argument mtry

Usage

check_mtry(mtry, nv)

Arguments

mtry

Number of covariates to possibly split at in each node. Default is the square root of the number of covariates, rounded up (ceiling(sqrt(ncol(X))), as in the ocf usage). Alternatively, one can pass a single-argument function returning an integer, where the argument is the number of covariates.

nv

Number of covariates.

Value

Appropriate value of mtry.
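The following sketch shows how an mtry argument that may be either a number or a single-argument function could be resolved. resolve_mtry is a hypothetical helper, not the package's check_mtry. The default rule ceiling(sqrt(nv)) is taken from the ocf usage. The range check is an assumption.

```r
# Hypothetical helper: a number is used as-is, a function is called with the
# number of covariates, and NULL falls back to ceiling(sqrt(nv)).
resolve_mtry <- function(mtry = NULL, nv) {
  if (is.null(mtry)) {
    value <- ceiling(sqrt(nv))
  } else if (is.function(mtry)) {
    value <- mtry(nv)
  } else {
    value <- mtry
  }
  # Range check (assumption): mtry must lie between 1 and the number of covariates.
  if (!is.numeric(value) || length(value) != 1 || value < 1 || value > nv) {
    stop("mtry must resolve to a single integer between 1 and the number of covariates.")
  }
  as.integer(value)
}

resolve_mtry(nv = 6)                            # ceiling(sqrt(6)) = 3
resolve_mtry(function(p) floor(p / 3), nv = 6)  # 2
```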

Check Argument n.trees

Description

Check Argument n.trees

Usage

check_ntrees(n.trees)

Arguments

n.trees

Number of trees.

Check Argument sample.fraction

Description

Check Argument sample.fraction

Usage

check_samplefraction(sample.fraction)

Arguments

sample.fraction

Fraction of observations to sample.

Check Arguments x and y

Description

Check Arguments x and y

Usage

check_x_y(x, y)

Arguments

x

Covariate matrix (no intercept).

y

Outcome vector.

Honest Sample Split

Description

Randomly splits the sample into a training sample and an honest sample.

Usage

class_honest_split(data, honesty.fraction = 0.5)

Arguments

data

data.frame or matrix to be split. The outcome must be located in the first column.

honesty.fraction

Fraction of honest sample.

Details

class_honest_split looks for balanced splits, i.e., splits such that all of the outcome's classes are represented in both the training and the honest sample. If no balanced split is found after 100 trials, the function throws an error.
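The retry logic described above can be sketched as follows. honest_split_sketch is a hypothetical stand-in for class_honest_split, not its actual code. Rounding the size of the honest sample down with floor() is an assumption.

```r
# Hypothetical sketch of a balanced honest split. `data` is a data.frame whose
# first column holds the outcome.
honest_split_sketch <- function(data, honesty.fraction = 0.5, max.trials = 100) {
  n <- nrow(data)
  classes <- unique(data[[1]])
  n_honest <- floor(n * honesty.fraction)  # rounding down is an assumption
  for (trial in seq_len(max.trials)) {
    honest_idx <- sample(seq_len(n), n_honest)
    train_sample <- data[-honest_idx, , drop = FALSE]
    honest_sample <- data[honest_idx, , drop = FALSE]
    # Accept the split only if every class appears in both samples.
    balanced <- all(classes %in% train_sample[[1]]) &&
      all(classes %in% honest_sample[[1]])
    if (balanced) {
      return(list(train_sample = train_sample, honest_sample = honest_sample))
    }
  }
  stop("No balanced split found after ", max.trials, " trials.")
}

set.seed(1986)
toy <- data.frame(Y = rep(1:3, each = 10), x1 = rnorm(30))
parts <- honest_split_sketch(toy, honesty.fraction = 0.5)
sapply(parts, nrow)  # train_sample 15, honest_sample 15
```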

Value

List with elements:

train_sample

Training sample.

honest_sample

Honest sample.

Forest In-Sample Honest Weights

Description

Computes forest in-sample honest weights for an ocf.forest object.

Usage

forest_weights_fitted(forest, honest_sample, train_sample)

Arguments

forest

An ocf.forest object.

honest_sample

Honest sample.

train_sample

Training sample.

Details

forest must have been grown using only the training sample.

Value

Matrix of in-sample honest weights.

Forest In-Sample Honest Weights

Description

Computes forest in-sample honest weights for an ocf.forest object relative to the m-th class.

Usage

forest_weights_fitted_cpp(
  leaf_IDs_train_list,
  leaf_IDs_honest_list,
  leaf_size_honest_list
)

Arguments

leaf_IDs_train_list

List of size n.trees, storing the leaf of each tree into which the training units fall.

leaf_IDs_honest_list

List of size n.trees, storing the leaf of each tree into which the honest units fall.

leaf_size_honest_list

List of size n.trees, storing the size of the leaves of each tree computed with honest units.

Forest Out-of-Sample Honest Weights

Description

Computes forest out-of-sample honest weights for an ocf.forest object relative to the m-th class.

Usage

forest_weights_predicted_cpp(
  leaf_IDs_test_list,
  leaf_IDs_honest_list,
  leaf_size_honest_list,
  w
)

Arguments

leaf_IDs_test_list

List of size n.trees, storing the leaf of each tree into which the test units fall.

leaf_IDs_honest_list

List of size n.trees, storing the leaf of each tree into which the honest units fall.

leaf_size_honest_list

List of size n.trees, storing the size of the leaves of each tree computed with honest units.

w

1 if marginal effects are being computed, 0 for standard prediction.
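A small base-R sketch can illustrate the weight-based representation. For a test unit i and an honest unit j, each tree contributes 1/|leaf| if both units fall in the same leaf, with the leaf size counted over honest units, and zero otherwise. The forest weight is the average of these contributions over trees.

forest_weights_sketch is hypothetical. It works on leaf-ID matrices rather than the lists used by the C++ routines, and it ignores the w argument. Giving zero weight to a test unit whose leaf contains no honest unit is an assumption.

```r
# Hypothetical sketch of forest weights from leaf IDs.
# test_leaves:   n_test   x n_trees matrix of leaf IDs for the test units.
# honest_leaves: n_honest x n_trees matrix of leaf IDs for the honest units.
forest_weights_sketch <- function(test_leaves, honest_leaves) {
  n_trees <- ncol(test_leaves)
  weights <- matrix(0, nrow(test_leaves), nrow(honest_leaves))
  for (b in seq_len(n_trees)) {
    # same_leaf[i, j] is TRUE if test unit i and honest unit j share a leaf in tree b.
    same_leaf <- outer(test_leaves[, b], honest_leaves[, b], "==")
    # Number of honest units in each test unit's leaf (0 if the leaf is empty).
    leaf_size <- rowSums(same_leaf)
    # Row i is divided by its leaf size; empty leaves keep zero weight.
    weights <- weights + same_leaf / pmax(leaf_size, 1)
  }
  weights / n_trees
}

test_leaves <- matrix(c(1, 2,
                        2, 2), nrow = 2, byrow = TRUE)
honest_leaves <- matrix(c(1, 2,
                          1, 3,
                          2, 2), nrow = 3, byrow = TRUE)
w <- forest_weights_sketch(test_leaves, honest_leaves)
rowSums(w)  # 1 1
```

Each row of w sums to one here. Multiplying w by the honest indicator 1(Y_j = m) gives a prediction that is linear in the honest outcomes, which is the kind of representation the package description refers to for standard errors.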

Generate Ordered Data

Description

Generate a synthetic data set with an ordered non-numeric outcome, together with conditional probabilities and covariates' marginal effects.

Usage

generate_ordered_data(n)

Arguments

n

Sample size.

Details

First, a latent outcome is generated as follows:

Y_i^* = g ( X_i ) + \epsilon_i

with:

g ( X_i ) = X_i^T \beta

X_i := (X_{i, 1}, X_{i, 2}, X_{i, 3}, X_{i, 4}, X_{i, 5}, X_{i, 6})

X_{i, 1}, X_{i, 3}, X_{i, 5} \sim \mathcal{N} \left( 0, 1 \right)

X_{i, 2}, X_{i, 4}, X_{i, 6} \sim \textit{Bernoulli} \left( 0, 1 \right)

\beta = \left( 1, 1, 1/2, 1/2, 0, 0 \right)

\epsilon_i \sim \textit{logistic} \left( 0, 1 \right)

Second, the observed outcomes are obtained by discretizing the latent outcome into three classes using uniformly spaced threshold parameters.

Third, the conditional probabilities and the covariates' marginal effects at the mean are generated using standard textbook formulas. Marginal effects are approximated using a sample of 1,000,000 observations.
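The first two steps and the probability formula can be sketched in base R as follows. This is a minimal illustration, not the package's generate_ordered_data, and generate_ordered_sketch is hypothetical. Two choices are assumptions:

- The binary covariates are drawn as Bernoulli(0.5).
- The two thresholds split the observed range of the latent outcome into three equal-width intervals.

Class probabilities use the ordered logit formula P(Y = m | X) = F(zeta_m - g(X)) - F(zeta_{m-1} - g(X)), with F the logistic CDF. Marginal effects are omitted.

```r
# Hypothetical sketch of the data-generating process described above.
# Assumptions: Bernoulli(0.5) binary covariates and equal-width thresholds
# over the observed range of the latent outcome.
generate_ordered_sketch <- function(n) {
  X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5),
             x3 = rnorm(n), x4 = rbinom(n, 1, 0.5),
             x5 = rnorm(n), x6 = rbinom(n, 1, 0.5))
  beta <- c(1, 1, 1/2, 1/2, 0, 0)
  g <- drop(X %*% beta)
  # Latent outcome with standard logistic errors.
  y_star <- g + rlogis(n, location = 0, scale = 1)
  # Two uniformly spaced thresholds over the range of the latent outcome.
  zeta <- seq(min(y_star), max(y_star), length.out = 4)[2:3]
  # Observed class: 1 below zeta[1], 2 between the thresholds, 3 from zeta[2] on.
  Y <- findInterval(y_star, zeta) + 1
  # Ordered logit probabilities: F(zeta_m - g) - F(zeta_{m-1} - g).
  cum <- cbind(plogis(zeta[1] - g), plogis(zeta[2] - g), 1)
  true_probs <- cbind(cum[, 1], cum[, 2] - cum[, 1], cum[, 3] - cum[, 2])
  colnames(true_probs) <- paste0("P(Y=", 1:3, ")")
  list(sample = data.frame(Y = Y, X), true_probs = true_probs)
}

set.seed(1986)
sim <- generate_ordered_sketch(1000)
head(sim$sample)
head(sim$true_probs)
```

Because the thresholds are computed from the simulated latent outcomes, the "true" probabilities here are conditional on those realized thresholds.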

Value

A list storing a data frame with the observed data, a matrix of true conditional probabilities, and a matrix of true marginal effects at the mean of the covariates.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(1000)
head(data$true_probs)
data$me_at_mean
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)

Honest In-Sample Predictions

Description

Computes honest in-sample predictions for an ocf.forest object.

Usage

honest_fitted(forest, train_sample, honest_sample, y_m_honest, y_m_1_honest)

Arguments

forest

An ocf.forest object.

train_sample

Training sample.

honest_sample

Honest sample.

y_m_honest

Indicator variable, whether the outcome is smaller than or equal to the m-th class.

y_m_1_honest

Indicator variable, whether the outcome is smaller than or equal to the (m-1)-th class.

Details

forest must have been grown using only the training sample. honest_fitted replaces the leaf estimates using the outcome from the honest sample (using the prediction method of ocf).

Value

In-sample honest predictions.

Honest In-Sample Predictions

Description

Computes honest in-sample predictions for an ocf.forest object relative to the desired class.

Usage

honest_fitted_cpp(
  unique_leaves_honest,
  y_m,
  y_m_1,
  honest_leaves,
  train_leaves
)

Arguments

unique_leaves_honest

List of size n.trees, storing the unique leaf IDs of each tree relative to the honest sample.

y_m

Indicator variable, equal to 1 if y is lower than or equal to the m-th class and zero otherwise.

y_m_1

Indicator variable, equal to 1 if y is lower than or equal to the (m-1)-th class and zero otherwise.

honest_leaves

Matrix of size (n.samples x n.trees). The i-th row stores the ID of the leaf where the i-th honest observation falls in each tree.

train_leaves

Matrix of size (n.samples x n.trees). The i-th row stores the ID of the leaf where the i-th training observation falls in each tree.

Honest Out-of-Sample Predictions

Description

Computes honest out-of-sample predictions for an ocf.forest object.

Usage

honest_predictions(
  forest,
  honest_sample,
  test_sample,
  y_m_honest,
  y_m_1_honest
)

Arguments

forest

An ocf.forest object.

honest_sample

Honest sample.

test_sample

Test sample.

y_m_honest

Indicator variable, whether the outcome is smaller than or equal to the m-th class.

y_m_1_honest

Indicator variable, whether the outcome is smaller than or equal to the (m-1)-th class.

Details

honest_predictions replaces the leaf estimates of forest using the outcome from the associated honest sample (using the prediction method of ocf). The honest sample must not have been used to build the trees.
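The leaf-replacement idea can be sketched as follows. Since 1(Y ≤ m) - 1(Y ≤ m - 1) = 1(Y = m), each leaf's estimate can be recomputed as the mean of y_m - y_m_1 over the honest units in that leaf. A test unit's prediction then averages those leaf means over trees.

honest_predictions_sketch is hypothetical and takes leaf-ID matrices shaped like honest_leaves and test_leaves. Skipping trees whose leaf contains no honest unit is an assumption, not documented package behavior.

```r
# Hypothetical sketch of honest leaf-mean replacement for class m.
honest_predictions_sketch <- function(y_m, y_m_1, honest_leaves, test_leaves) {
  # 1(Y <= m) - 1(Y <= m - 1) = 1(Y = m) for each honest unit.
  y_class <- y_m - y_m_1
  n_trees <- ncol(test_leaves)
  preds <- matrix(NA_real_, nrow = nrow(test_leaves), ncol = n_trees)
  for (b in seq_len(n_trees)) {
    leaf_ids <- unique(honest_leaves[, b])
    # Honest mean of 1(Y = m) in each leaf of tree b.
    leaf_means <- vapply(leaf_ids, function(l) mean(y_class[honest_leaves[, b] == l]),
                         numeric(1))
    # Test units in leaves without honest units get NA for this tree.
    preds[, b] <- leaf_means[match(test_leaves[, b], leaf_ids)]
  }
  # Average over trees, skipping NA entries (assumption).
  rowMeans(preds, na.rm = TRUE)
}

Y_honest <- c(1, 2, 1)
m <- 1
y_m <- as.numeric(Y_honest <= m)
y_m_1 <- as.numeric(Y_honest <= m - 1)
honest_leaves <- matrix(c(1, 2, 1, 3, 2, 2), nrow = 3, byrow = TRUE)
test_leaves <- matrix(c(1, 2, 2, 2), nrow = 2, byrow = TRUE)
p <- honest_predictions_sketch(y_m, y_m_1, honest_leaves, test_leaves)
p  # 0.75 1.00
```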

Value

Out-of-sample honest predictions.

Honest Out-of-Sample Predictions

Description

Computes honest out-of-sample predictions for an ocf.forest object relative to the desired class.

Usage

honest_predictions_cpp(
  unique_leaves_honest,
  y_m,
  y_m_1,
  honest_leaves,
  test_leaves
)

Arguments

unique_leaves_honest

List of size n.trees, storing the unique leaf IDs of each tree relative to the honest sample.

y_m

Indicator variable, equal to 1 if y is lower than or equal to the m-th class and zero otherwise.

y_m_1

Indicator variable, equal to 1 if y is lower than or equal to the (m-1)-th class and zero otherwise.

honest_leaves

Matrix of size (n.samples x n.trees). The i-th row stores the ID of the leaf where the i-th honest observation falls in each tree.

test_leaves

Matrix of size (n.samples x n.trees). The i-th row stores the ID of the leaf where the i-th test observation falls in each tree.

Marginal Effects for Ordered Correlation Forest

Description

Nonparametric estimation of marginal effects using an ocf object.

Usage

marginal_effects(
  object,
  data = NULL,
  these_covariates = NULL,
  eval = "atmean",
  bandwitdh = 0.1,
  inference = FALSE
)

Arguments

object

An ocf object.

data

Data set of class data.frame to estimate marginal effects. It must contain at least the same covariates used to train the forests. If NULL, marginal effects are estimated on object$full_data.

these_covariates

Named list with covariates' names as keys and strings denoting covariates' types as entries. Strings must be either "continuous" or "discrete". The names of the list indicate the covariates for which marginal effect estimation is desired. If NULL (the default), marginal effects are estimated for all covariates and covariates' types are inferred by the routine.

eval

Evaluation point for marginal effects. Either "mean", "atmean" or "atmedian".

bandwitdh

Number of standard deviations by which x_up and x_down differ from x.

inference

Whether to extract weights and compute standard errors. The weights extraction considerably slows down the program.

Details

marginal_effects can estimate mean marginal effects, marginal effects at the mean, or marginal effects at the median (eval = "mean", "atmean", and "atmedian", respectively).

If these_covariates is NULL (the default), the routine treats covariates with at most ten unique values as discrete and the remaining covariates as continuous.
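A common way to approximate a marginal effect nonparametrically is a centered finite difference of predicted probabilities. The sketch below makes two assumptions about the routine, neither of which is its documented implementation:

- For a continuous covariate, x_up and x_down are obtained by moving the covariate up and down by bandwitdh standard deviations around the evaluation point.
- The effect is the change in predicted probabilities divided by x_up - x_down.

me_at_mean_sketch and ologit_probs are hypothetical. The toy ordered logit model is used only because its derivatives are known in closed form. Discrete covariates are not covered.

```r
# Hypothetical sketch: finite-difference marginal effect of covariate j at the
# mean, for a function `prob_fun` mapping a covariate matrix to class probabilities.
me_at_mean_sketch <- function(prob_fun, X, j, bandwitdh = 0.1) {
  x_bar <- colMeans(X)
  step <- bandwitdh * sd(X[, j])
  x_up <- x_down <- x_bar
  x_up[j] <- x_bar[j] + step
  x_down[j] <- x_bar[j] - step
  p_up <- prob_fun(matrix(x_up, nrow = 1))
  p_down <- prob_fun(matrix(x_down, nrow = 1))
  drop(p_up - p_down) / (x_up[j] - x_down[j])
}

# Toy ordered logit probabilities with known derivatives (illustration only).
beta <- c(1, 0.5)
zeta <- c(-1, 1)
ologit_probs <- function(X) {
  g <- drop(X %*% beta)
  cum <- cbind(plogis(zeta[1] - g), plogis(zeta[2] - g), 1)
  cbind(cum[, 1], cum[, 2] - cum[, 1], cum[, 3] - cum[, 2])
}

set.seed(1986)
X <- cbind(rnorm(500), rnorm(500))
me_at_mean_sketch(ologit_probs, X, j = 1, bandwitdh = 0.1)
```

Because class probabilities sum to one, the estimated marginal effects across classes sum to zero.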

Value

Object of class ocf.marginal.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
plot(me)
## Compute standard errors. This requires honest forests.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
plot(honest_me)
## Subset covariates and select covariates' types.
my_covariates <- list("x1" = "continuous", "x2" = "discrete", "x4" = "discrete")
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE,
                              these_covariates = my_covariates)
print(honest_me)
plot(honest_me)

Accuracy Measures for Ordered Probability Predictions

Description

Accuracy measures for evaluating ordered probability predictions.

Usage

mean_squared_error(y, predictions, use.true = FALSE)
mean_absolute_error(y, predictions, use.true = FALSE)
mean_ranked_score(y, predictions, use.true = FALSE)
classification_error(y, predictions)

Arguments

y

Either the observed outcome vector or a matrix of true probabilities.

predictions

Predictions.

use.true

If TRUE, then the program treats y as a matrix of true probabilities.

Details

MSE, MAE, and RPS

When calling one of mean_squared_error, mean_absolute_error, or mean_ranked_score, predictions must be a matrix of predicted class probabilities, with as many rows as observations in y and as many columns as classes of y.

If use.true == FALSE, the mean squared error (MSE), the mean absolute error (MAE), and the mean ranked probability score (RPS) are computed as follows:

MSE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M (1 (Y_i = m) - \hat{p}_m (X_i))^2

MAE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M |1 (Y_i = m) - \hat{p}_m (X_i)|

RPS = \frac{1}{n} \sum_{i = 1}^n \frac{1}{M - 1} \sum_{m = 1}^M (1 (Y_i \leq m) - \hat{p}_m^* (X_i))^2

If use.true == TRUE, the MSE, the MAE, and the RPS are computed as follows (useful for simulation studies):

MSE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M (p_m (X_i) - \hat{p}_m (X_i))^2

MAE = \frac{1}{n} \sum_{i = 1}^n \sum_{m = 1}^M |p_m (X_i) - \hat{p}_m (X_i)|

RPS = \frac{1}{n} \sum_{i = 1}^n \frac{1}{M - 1} \sum_{m = 1}^M (p_m^* (X_i) - \hat{p}_m^* (X_i))^2

where:

p_m (x) = P(Y_i = m | X_i = x)

p_m^* (x) = P(Y_i \leq m | X_i = x)

Classification error

When calling classification_error, predictions must be a vector of predicted class labels.

Classification error (CE) is computed as follows:

CE = \frac{1}{n} \sum_{i = 1}^n 1 (Y_i \neq \hat{Y}_i)

where Y_i are the observed class labels and \hat{Y}_i are the predicted class labels.
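The formulas above (the use.true == FALSE case) and the classification error translate directly into base R. The helpers below are hypothetical sketches, not the package's functions. They assume y holds integer class labels 1, ..., M that match the columns of predictions.

```r
# Hypothetical sketches of the accuracy measures defined above.
# y: integer labels in 1..M; predictions: n x M matrix of class probabilities.
one_hot <- function(y, M) outer(y, seq_len(M), "==") * 1

mse_sketch <- function(y, predictions) {
  mean(rowSums((one_hot(y, ncol(predictions)) - predictions)^2))
}

mae_sketch <- function(y, predictions) {
  mean(rowSums(abs(one_hot(y, ncol(predictions)) - predictions)))
}

rps_sketch <- function(y, predictions) {
  M <- ncol(predictions)
  # Cumulative indicators 1(Y_i <= m) and cumulative predicted probabilities.
  cum_obs <- t(apply(one_hot(y, M), 1, cumsum))
  cum_pred <- t(apply(predictions, 1, cumsum))
  mean(rowSums((cum_obs - cum_pred)^2) / (M - 1))
}

ce_sketch <- function(y, predicted_labels) mean(y != predicted_labels)

y <- c(1, 3)
predictions <- rbind(c(0.7, 0.2, 0.1),
                     c(0.2, 0.3, 0.5))
mse_sketch(y, predictions)  # 0.26
mae_sketch(y, predictions)  # 0.8
rps_sketch(y, predictions)  # 0.0975
ce_sketch(y, c(1, 2))       # 0.5
```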

Value

The MSE, the MAE, the RPS, or the CE of the method.

Author(s)

Riccardo Di Francesco

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)
## Accuracy measures on test sample.
predictions <- predict(forests, X_test)
mean_squared_error(Y_test, predictions$probabilities)
mean_ranked_score(Y_test, predictions$probabilities)
classification_error(Y_test, predictions$classification)

Multinomial Machine Learning

Description

Estimation strategy for the conditional choice probabilities of ordered non-numeric outcomes.

Usage

multinomial_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)

Arguments

Y

Outcome vector.

X

Covariate matrix (no intercept).

learner

String, either "forest" or "l1". Selects the base learner to estimate each expectation.

scale

Logical, whether to scale the covariates. Ignored if learner is not "l1".

Details

Multinomial machine learning expresses conditional choice probabilities as expectations of binary variables:

p_m \left( X_i \right) = \mathbb{E} \left[ 1 \left( Y_i = m \right) | X_i \right]

This allows us to estimate each expectation separately using any regression algorithm to get an estimate of conditional probabilities.

multinomial_ml combines this strategy with either regression forests or penalized logistic regressions with an L1 penalty, according to the user-specified parameter learner.

If learner == "l1", the penalty parameters are chosen via 10-fold cross-validation and model.matrix is used to handle non-numeric covariates. Additionally, if scale == TRUE, the covariates are scaled to have zero mean and unit variance.
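The strategy can be sketched in base R by fitting one binary regression per class. To keep the sketch dependency-free, it swaps in an unpenalized logistic regression (glm with a binomial family) for the forest and L1 learners that multinomial_ml actually offers. Normalizing each row of estimated probabilities to sum to one is an assumption. multinomial_sketch is hypothetical.

```r
# Hypothetical sketch of multinomial machine learning with a swapped-in learner:
# one logistic regression per class for E[1(Y = m) | X] (not forest or L1).
multinomial_sketch <- function(Y, X, newX = X) {
  classes <- sort(unique(Y))
  train <- data.frame(X)
  test <- data.frame(newX)
  probs <- vapply(classes, function(m) {
    train$y_bin <- as.numeric(Y == m)  # binary target 1(Y_i = m)
    fit <- glm(y_bin ~ ., data = train, family = binomial())
    unname(predict(fit, newdata = test, type = "response"))
  }, numeric(nrow(test)))
  probs <- matrix(probs, nrow = nrow(test))
  # Normalize rows to sum to one (assumption).
  probs <- probs / rowSums(probs)
  colnames(probs) <- paste0("P(Y=", classes, ")")
  probs
}

set.seed(1986)
n <- 300
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- findInterval(X[, "x1"] + rlogis(n), c(-1, 1)) + 1
probs <- multinomial_sketch(Y, X)
head(probs)
```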

Value

Object of class mml.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit multinomial machine learning on training sample using two different learners.
multinomial_forest <- multinomial_ml(Y_tr, X_tr, learner = "forest")
multinomial_l1 <- multinomial_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(multinomial_forest, X_test)
predictions_l1 <- predict(multinomial_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))

Ordered Correlation Forest

Description

Nonparametric estimator for ordered non-numeric outcomes. The estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class.

Usage

ocf(
  Y = NULL,
  X = NULL,
  honesty = FALSE,
  honesty.fraction = 0.5,
  inference = FALSE,
  alpha = 0.2,
  n.trees = 2000,
  mtry = ceiling(sqrt(ncol(X))),
  min.node.size = 5,
  max.depth = 0,
  replace = FALSE,
  sample.fraction = ifelse(replace, 1, 0.5),
  n.threads = 1
)

Arguments

Y

Outcome vector.

X

Covariate matrix (no intercept).

honesty

Whether to grow honest forests.

honesty.fraction

Fraction of honest sample. Ignored if honesty = FALSE.

inference

Whether to extract weights and compute standard errors. The weights extraction considerably slows down the routine. honesty = TRUE is required for valid inference.

alpha

Controls the balance of each split. Each split leaves at least a fraction alpha of observations in the parent node on each side of the split.

n.trees

Number of trees.

mtry

Number of covariates to possibly split at in each node. Default is the square root of the number of covariates, rounded up.

min.node.size

Minimal node size.

max.depth

Maximal tree depth. A value of 0 corresponds to unlimited depth, 1 to "stumps" (one split per tree).

replace

If TRUE, grow trees on bootstrap subsamples. Otherwise, trees are grown on random subsamples drawn without replacement.

sample.fraction

Fraction of observations to sample.

n.threads

Number of threads. Zero corresponds to the number of CPUs available.
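To make the alpha constraint concrete, the hypothetical helper below lists the candidate thresholds for one covariate that keep at least a fraction alpha of the parent node's observations on each side. It follows the tree_info convention that values smaller than or equal to the threshold go left. It illustrates the constraint only and is not the package's split search.

```r
# Hypothetical sketch: thresholds for one covariate that satisfy the alpha
# balance constraint in a parent node (values <= threshold go left).
admissible_splits <- function(x, alpha = 0.2) {
  x <- sort(x)
  n <- length(x)
  # Drop the largest value: splitting there would leave the right side empty.
  candidates <- unique(x[-n])
  n_left <- vapply(candidates, function(s) sum(x <= s), integer(1))
  keep <- n_left >= alpha * n & (n - n_left) >= alpha * n
  candidates[keep]
}

admissible_splits(1:10, alpha = 0.2)  # thresholds 2 through 8
admissible_splits(1:10, alpha = 0.5)  # only the median split, 5
```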

Value

Object of class ocf.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)
## We have compatibility with generic S3-methods.
print(forests)
summary(forests)
predictions <- predict(forests, X_test)
head(predictions$probabilities)
table(Y_test, predictions$classification)
## Compute standard errors. This requires honest forests.
honest_forests <- ocf(Y_tr, X_tr, honesty = TRUE, inference = TRUE)
head(honest_forests$predictions$standard.errors)
## Marginal effects.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
plot(me)
## Compute standard errors. This requires honest forests.
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)
plot(honest_me)

Ordered Machine Learning

Description

Estimation strategy for the conditional choice probabilities of ordered non-numeric outcomes.

Usage

ordered_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)

Arguments

Y

Outcome vector.

X

Covariate matrix (no intercept).

learner

String, either "forest" or "l1". Selects the base learner to estimate each expectation.

scale

Logical, whether to scale the covariates. Ignored if learner is not "l1".

Details

Ordered machine learning expresses conditional choice probabilities as the difference between the cumulative probabilities of two adjacent classes, which in turn can be expressed as conditional expectations of binary variables:

p_m \left( X_i \right) = \mathbb{E} \left[ 1 \left( Y_i \leq m \right) | X_i \right] - \mathbb{E} \left[ 1 \left( Y_i \leq m - 1 \right) | X_i \right]

Then we can separately estimate each expectation using any regression algorithm and take the difference between the m-th and the (m-1)-th estimated surfaces to estimate conditional probabilities.

ordered_ml combines this strategy with either regression forests or penalized logistic regressions with an L1 penalty, according to the user-specified parameter learner.

If learner == "forest", the orf function from the orf package is called, as this estimator has already been proposed by Lechner and Okasa (2019).
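The ordered strategy can be sketched the same way with a swapped-in learner: unpenalized logistic regressions (glm, binomial family) estimate the cumulative surfaces P(Y ≤ m | X) for m = 1, ..., M - 1, and class probabilities are differences of adjacent surfaces. The cumulative surfaces are estimated separately, so their differences can be negative. The sketch truncates them at zero and renormalizes, which is an assumption rather than documented behavior of ordered_ml. ordered_sketch is hypothetical.

```r
# Hypothetical sketch of ordered machine learning with a swapped-in learner
# (logistic regression) for each cumulative surface P(Y <= m | X).
ordered_sketch <- function(Y, X, newX = X) {
  classes <- sort(unique(Y))
  M <- length(classes)
  train <- data.frame(X)
  test <- data.frame(newX)
  cum <- vapply(classes[-M], function(m) {
    train$y_bin <- as.numeric(Y <= m)  # binary target 1(Y_i <= m)
    fit <- glm(y_bin ~ ., data = train, family = binomial())
    unname(predict(fit, newdata = test, type = "response"))
  }, numeric(nrow(test)))
  # Pad with P(Y <= 0) = 0 and P(Y <= M) = 1.
  cum <- cbind(0, matrix(cum, nrow = nrow(test)), 1)
  # p_m = P(Y <= m | X) - P(Y <= m - 1 | X).
  probs <- cum[, -1, drop = FALSE] - cum[, -(M + 1), drop = FALSE]
  # Truncate crossing surfaces at zero and renormalize (assumption).
  probs <- pmax(probs, 0)
  probs <- probs / rowSums(probs)
  colnames(probs) <- paste0("P(Y=", classes, ")")
  probs
}

set.seed(1986)
n <- 300
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
Y <- findInterval(X[, "x1"] + rlogis(n), c(-1, 1)) + 1
probs <- ordered_sketch(Y, X)
head(probs)
```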

Value

Object of class oml.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ordered machine learning on training sample using two different learners.
ordered_forest <- ordered_ml(Y_tr, X_tr, learner = "forest")
ordered_l1 <- ordered_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(ordered_forest, X_test)
predictions_l1 <- predict(ordered_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))

Plot Method for ocf.marginal Objects

Description

Plots an ocf.marginal object.

Usage

## S3 method for class 'ocf.marginal'
plot(x, ...)

Arguments

x

An ocf.marginal object.

...

Further arguments passed to or from other methods.

Details

If standard errors have been estimated, 95% confidence intervals are shown.

Value

Plots an ocf.marginal object.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
plot(me)
## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
plot(honest_me)

Prediction Method for mml Objects

Description

Prediction method for class mml.

Usage

## S3 method for class 'mml'
predict(object, data = NULL, ...)

Arguments

object

An mml object.

data

Data set of class data.frame. It must contain the same covariates used to train the base learners. If data is NULL, then object$X is used.

...

Further arguments passed to or from other methods.

Details

If object$learner == "l1", then model.matrix is used to handle non-numeric covariates. If we also have object$scaling == TRUE, then data is scaled to have zero mean and unit variance.

Value

Matrix of predictions.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit multinomial machine learning on training sample using two different learners.
multinomial_forest <- multinomial_ml(Y_tr, X_tr, learner = "forest")
multinomial_l1 <- multinomial_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(multinomial_forest, X_test)
predictions_l1 <- predict(multinomial_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))

Prediction Method for ocf Objects

Description

Prediction method for class ocf.

Usage

## S3 method for class 'ocf'
predict(object, data = NULL, type = "response", ...)

Arguments

object

An ocf object.

data

Data set of class data.frame. It must contain at least the same covariates used to train the forests. If data is NULL, then object$full_data is used.

type

Type of prediction. Either "response" or "terminalNodes".

...

Further arguments passed to or from other methods.

Details

If type == "response", the routine returns the predicted conditional class probabilities and the predicted class labels. If forests are honest, the predicted probabilities are honest.

If type == "terminalNodes", the IDs of the terminal node in each tree for each observation in data are returned.

Value

Desired predictions.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ocf on training sample.
forests <- ocf(Y_tr, X_tr)
## Predict on test sample.
predictions <- predict(forests, X_test)
head(predictions$probabilities)
predictions$classification
## Get terminal nodes.
predictions <- predict(forests, X_test, type = "terminalNodes")
predictions$forest.1[1:10, 1:20] # Rows are observations, columns are trees.

Prediction Method for ocf.forest Objects

Description

Prediction method for class ocf.forest.

Usage

## S3 method for class 'ocf.forest'
predict(object, data, type = "response", ...)

Arguments

object

An ocf.forest object.

data

Data set of class data.frame. It must contain at least the same covariates used to train the forests.

type

Type of prediction. Either "response" or "terminalNodes".

...

Further arguments passed to or from other methods.

Details

If type == "response" (the default), the predicted conditional class probabilities are returned. If forests are honest, these predictions are honest.

If type == "terminalNodes", the IDs of the terminal node in each tree for each observation in data are returned.

Value

Prediction results.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Prediction Method for oml Objects

Description

Prediction method for class oml.

Usage

## S3 method for class 'oml'
predict(object, data = NULL, ...)

Arguments

object

An oml object.

data

Data set of class data.frame. It must contain the same covariates used to train the base learners. If data is NULL, then object$X is used.

...

Further arguments passed to or from other methods.

Details

If object$learner == "l1", then model.matrix is used to handle non-numeric covariates. If we also have object$scaling == TRUE, then data is scaled to have zero mean and unit variance.

Value

Matrix of predictions.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit ordered machine learning on training sample using two different learners.
ordered_forest <- ordered_ml(Y_tr, X_tr, learner = "forest")
ordered_l1 <- ordered_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(ordered_forest, X_test)
predictions_l1 <- predict(ordered_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))

Forest Out-of-Sample Weights

Description

Computes forest out-of-sample honest weights for an ocf.forest object.

Usage

predict_forest_weights(forest, honest_sample, test_sample)

Arguments

forest

An ocf.forest object.

honest_sample

Honest sample.

test_sample

Test sample.

Details

forest must have been grown using only the training sample.

Value

Matrix of out-of-sample honest weights.

Print Method for ocf Objects

Description

Prints an ocf object.

Usage

## S3 method for class 'ocf'
print(x, ...)

Arguments

x

An ocf object.

...

Further arguments passed to or from other methods.

Value

Prints an ocf object.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Print.
print(forests)

Print Method for ocf.marginal Objects

Description

Prints an ocf.marginal object.

Usage

## S3 method for class 'ocf.marginal'
print(x, latex = FALSE, ...)

Arguments

x

An ocf.marginal object.

latex

If TRUE, prints LaTeX code.

...

Further arguments passed to or from other methods.

Details

Compilation of the LaTeX code requires the following packages: booktabs, float, adjustbox. If standard errors have been estimated, they are printed in parentheses below each point estimate.

Value

Prints an ocf.marginal object.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
print(me)
print(me, latex = TRUE)
## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
print(honest_me, latex = TRUE)

Renaming Variables for LaTeX Usage

Description

Renames variables where the character "_" is used, which causes clashes in LaTeX. Useful for the phased print method.

Usage

rename_latex(names)

Arguments

names

String vector.

Value

The renamed string vector. Strings where "_" is not found are not modified by rename_latex.
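Assuming the renaming consists of escaping each underscore as \_, which is how LaTeX typesets a literal underscore, a one-line base-R sketch follows. rename_latex_sketch is hypothetical, and the package may rename differently.

```r
# Hypothetical sketch: escape "_" as "\_" so names compile in LaTeX.
# fixed = TRUE treats both the pattern and the replacement literally.
rename_latex_sketch <- function(names) {
  gsub("_", "\\_", names, fixed = TRUE)
}

cat(rename_latex_sketch(c("x_1", "age")), "\n")  # x\_1 age
```

Strings without an underscore pass through unchanged, matching the documented behavior of rename_latex.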

Summary Method for ocf Objects

Description

Summarizes an ocf object.

Usage

## S3 method for class 'ocf'
summary(object, ...)

Arguments

object

An ocf object.

...

Further arguments passed to or from other methods.

Value

Summarizes an ocf object.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Summary.
summary(forests)

Summary Method for ocf.marginal Objects

Description

Summarizes an ocf.marginal object.

Usage

## S3 method for class 'ocf.marginal'
summary(object, latex = FALSE, ...)

Arguments

object

An ocf.marginal object.

latex

If TRUE, prints LaTeX code.

...

Further arguments passed to or from other methods.

Details

Compilation of the LaTeX code requires the following packages: booktabs, float, adjustbox. If standard errors have been estimated, they are printed in parentheses below each point estimate.

Value

Summarizes an ocf.marginal object.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Marginal effects at the mean.
me <- marginal_effects(forests, eval = "atmean")
summary(me)
summary(me, latex = TRUE)
## Add standard errors.
honest_forests <- ocf(Y, X, honesty = TRUE)
honest_me <- marginal_effects(honest_forests, eval = "atmean", inference = TRUE)
summary(honest_me, latex = TRUE)

Tree Information in Readable Format

Description

Extracts tree information from an ocf.forest object.

Usage

tree_info(object, tree = 1)

Arguments

object

An ocf.forest object.

tree

Number of the tree of interest.

Details

Node and variable IDs are 0-indexed, i.e., node 0 is the root node.

All values smaller than or equal to splitval go to the left and all values larger go to the right.
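The hypothetical helper below routes one observation from the root to its terminal node in a table laid out like the tree_info() output. It follows the conventions documented above: node and variable IDs are 0-indexed, and values smaller than or equal to splitval go left. The toy table is hand-built for illustration. Using NA for the children and split fields of terminal nodes is an assumption.

```r
# Hypothetical sketch: route one observation through a tree stored in the
# tree_info() layout. Returns the ID of the terminal node reached.
route_observation <- function(info, x) {
  node <- 0  # root node
  repeat {
    row <- info[info$nodeID == node, ]
    if (row$terminal) return(node)
    value <- x[row$splitvarID + 1]  # splitvarID is 0-indexed
    node <- if (value <= row$splitval) row$leftChild else row$rightChild
  }
}

# Toy tree: the root splits variable 0 at 0.5; node 1 splits variable 1 at 2.
info <- data.frame(
  nodeID     = 0:4,
  leftChild  = c(1, 3, NA, NA, NA),
  rightChild = c(2, 4, NA, NA, NA),
  splitvarID = c(0, 1, NA, NA, NA),
  splitval   = c(0.5, 2, NA, NA, NA),
  terminal   = c(FALSE, FALSE, TRUE, TRUE, TRUE)
)

route_observation(info, c(0.2, 3))  # left at the root, then right: node 4
route_observation(info, c(0.9, 0))  # right at the root: node 2
```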

Value

A data.frame with the following columns:

nodeID

Node IDs.

leftChild

IDs of the left child node.

rightChild

IDs of the right child node.

splitvarID

IDs of the splitting variable.

splitvarName

Name of the splitting variable.

splitval

Splitting value.

terminal

Logical, TRUE for terminal nodes.

prediction

One column with the predicted conditional class probabilities.

Author(s)

Riccardo Di Francesco

References

Di Francesco, R. (2025). Ordered Correlation Forest. Econometric Reviews, 1–17. doi:10.1080/07474938.2024.2429596.

Examples

## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(1000)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Fit ocf.
forests <- ocf(Y, X)
## Extract information from tenth tree of first forest.
info <- tree_info(forests$forests.info$forest.1, tree = 10)
head(info)
