Type: Package
Title: Regression Toward the Mean
Version: 1.2
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Depends: R (>= 3.4.0)
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2022-10-26 13:52:37 UTC
In repeated measures studies with extremely large or small values, it is common for the subjects’ measurements on average to be closer to the mean of the basic population. Interpreting possible changes in the mean in such situations can lead to biased results, since the values were not randomly selected but come from truncated sampling. This method allows estimating the range of means where treatment effects are likely to occur when regression toward the mean is present.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
Daniela Recchia daniela.rodriguesrecchia@uni-wh.de
We would like to acknowledge Lena Roth and Nico Steckhan for the package’s initial updates (Q3 2024) and continued supervision and guidance. Both have contributed to discussing and integrating these methods into the package, ensuring they are up-to-date and contextually relevant.
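Regression toward the mean under truncated sampling can be reproduced with a small simulation. The sketch below uses base R only and is not part of the package; the population mean, standard deviation, correlation, and selection cutoff are hypothetical values chosen for illustration, and no treatment effect is simulated.

```r
# Illustrative simulation; all numbers are hypothetical.
set.seed(1)
n_pop <- 10000  # size of the simulated population
mu    <- 50     # true population mean
sigma <- 10     # standard deviation of each measurement
rho   <- 0.6    # assumed correlation between the two measurements

# Two correlated measurements per subject, with no treatment effect at all.
before <- rnorm(n_pop, mean = mu, sd = sigma)
after  <- mu + rho * (before - mu) + rnorm(n_pop, sd = sigma * sqrt(1 - rho^2))

# Truncated sampling: keep only subjects with very low first scores.
selected <- before < 40

mean_before <- mean(before[selected])
mean_after  <- mean(after[selected])

# The selected group moves toward mu although nothing was done to it.
c(before = mean_before, after = mean_after, population = mu)
```

In this setup the mean of the selected group rises on the second measurement purely because of the selection, which is the kind of change that should not be read as a treatment effect.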
To install this package, use:
install.packages("regtomean")
A dataset with scores from 8 students who failed a high school test and could not get their diploma. They repeated the exam and got new scores.
data("language_test")
A data frame with 8 observations on the following 9 variables:
Student: a numeric vector
Before: a numeric vector
After: a numeric vector
Total N: a numeric vector
Cross: a numeric vector
Pre-treatment Mean: a numeric vector
Pre-treatment Std: a numeric vector
Post-treatment Mean: a numeric vector
Post-treatment Std: a numeric vector
McClave, J.T.; Dietrich, F.H.: “Statistics”; New York, Dellen Publishing; 1988.
This function calculates the correlation for the data and Cohen’s d effect sizes, based on both the pooled and the treatment standard deviations.
cordata(Before, After, data)
Before: a numeric vector giving the data values for the first (before) measure.
After: a numeric vector giving the data values for the second (after) measure.
data: an optional data frame containing the variables in the formula. By default, the variables are taken from the environment (formula).
This function computes the correlation between both measures as well as both effect sizes based on Cohen’s d statistic.
The inputs must be numeric.
Returns a table containing the correlation, effect size pooled, and effect size based on treatment.
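The three quantities can be illustrated in base R. This is a minimal sketch, not the package's implementation: the scores are hypothetical, and taking the "treatment" standard deviation to be the standard deviation of the After measure is an assumption made here.

```r
# Hypothetical before/after scores, for illustration only.
scores <- data.frame(
  Before = c(52, 48, 61, 55, 59, 63, 50, 66),
  After  = c(58, 50, 60, 62, 64, 61, 57, 70)
)

r         <- cor(scores$Before, scores$After)            # Pearson correlation
mean_diff <- mean(scores$After) - mean(scores$Before)    # raw mean change

# Pooled SD: square root of the average of the two sample variances.
sd_pooled <- sqrt((var(scores$Before) + var(scores$After)) / 2)

# Assumption: the "treatment" SD is the SD of the After measure.
sd_treatment <- sd(scores$After)

result <- data.frame(
  correlation = r,
  d_pooled    = mean_diff / sd_pooled,
  d_treatment = mean_diff / sd_treatment
)
result
```

Both effect sizes divide the same mean difference by a different standard deviation, so they agree in sign and differ only in scale.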
Daniela Recchia, Thomas Ostermann.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
cohen.d, cor
cordata("Before", "After", data = language_test)
This function replicates the before and after values 100 times, given a start and end reference.
replicate_data(start, end, Before, After, data)
start: a start value for µ.
end: an end value for µ.
Before: a numeric vector giving the data values for the first (before) measure.
After: a numeric vector giving the data values for the second (after) measure.
data: an optional data frame containing the before and after variables in the formula. By default, the variables are taken from the environment (formula).
Mee and Chua’s test requires the population mean µ to be known. To overcome this limitation, the data are replicated.
After replicating the data, the unknown population mean µ is systematically estimated over a range of values. Further estimations will be based on this new dataset.
Returns a data frame we could call mee_chua containing the values for µ, before, and after.
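The idea of replicating the data over a range of candidate means can be sketched in base R as follows. The scores are hypothetical, and the grid of 100 equally spaced values between start and end, as well as the column names, are assumptions made for this illustration rather than the package's exact behaviour.

```r
# Hypothetical before/after scores, for illustration only.
scores <- data.frame(
  Before = c(52, 48, 61, 55),
  After  = c(58, 50, 60, 62)
)

# Candidate values for the unknown population mean (assumed grid).
start   <- 0
end     <- 100
mu_grid <- seq(start, end, length.out = 100)

# One full copy of the data for every candidate mu, sorted by mu.
replicated <- data.frame(
  mu     = rep(mu_grid, each = nrow(scores)),
  before = rep(scores$Before, times = length(mu_grid)),
  after  = rep(scores$After,  times = length(mu_grid))
)

head(replicated)
```

Each block of rows sharing one value of mu is a complete copy of the original data, so a separate model can later be fitted per candidate mean.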
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute (15: 246-263).
rep
replicate_data(0, 100, "Before", "After", data = language_test)
This function fits linear models for a subset of data frames.
meechua_reg(x)
x: Data to be used in the regression.
The data used for the regression must be sorted by mu.
A set of linear models is estimated, and the model coefficients are stored in mod_coef.
The estimated standard error for the after measure is also stored in se_after for use in other functions.
A table containing the estimates for each mu. Global variables models, mod_coef, se_after are stored for further analysis. The models are saved in an object called mee_chua, which is not automatically printed but is saved in the environment.
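Fitting one linear model per candidate mean can be sketched in base R with split() and lapply(). This is an illustration under assumptions, not the package's code: the scores are hypothetical, and the model form (the after measure regressed on the before measure centred at mu) is assumed here so that the intercept can be compared with mu.

```r
# Hypothetical before/after scores, for illustration only.
scores <- data.frame(
  Before = c(52, 48, 61, 55, 59, 63, 50, 66),
  After  = c(58, 50, 60, 62, 64, 61, 57, 70)
)

# Replicate the data for a few candidate means, sorted by mu.
mu_grid <- c(40, 50, 60)
replicated <- data.frame(
  mu     = rep(mu_grid, each = nrow(scores)),
  before = rep(scores$Before, times = length(mu_grid)),
  after  = rep(scores$After,  times = length(mu_grid))
)

# One model per candidate mu (assumed model form).
by_mu         <- split(replicated, replicated$mu)
models_sketch <- lapply(by_mu, function(d) lm(after ~ I(before - mu), data = d))

# Collect intercept and slope for each mu into one table.
coef_table <- t(sapply(models_sketch, coef))
coef_table
```

Centring the predictor at mu changes only the intercept; the slope is the same for every candidate mean.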
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
lm, dlply
## get the values ##
mee_chua <- replicate_data(0, 100, "Before", "After", data = language_test)
meechua_reg(mee_chua)
This function calculates and plots treatment and regression effects of both before and after measures, as well as their p-values.
meechua_eff.CI(x, n, se_after)
x: a data frame containing the results from meechua_reg. It is stored as mod_coef.
n: the original sample size (number of observations) from data.
se_after: the estimated standard error from meechua_reg. It is stored as se_after.
After performing meechua_reg, the model coefficients mod_coef and the global variable se_after are used as input in this function to estimate treatment and regression effects.
Two plots are produced: the first, “Treatment Effect and p-value”, and the second, “Confidence Intervals” for µ.
Daniela Recchia, Thomas Ostermann
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
# First perform replicate_data and meechua_reg
mee_chua <- replicate_data(0, 100, "Before", "After", data = language_test)
meechua_reg(mee_chua)
# Model coefficients (mod_coef) and se_after are stored in the environment
# as a result from the function meechua_reg
meechua_eff.CI(mod_coef, 8, se_after)
Based on the data before and after the intervention and the regression models of the function meechua_reg, this function plots, for a given range of µ, the t-statistics and p-values of one-sided tests of whether the intervention has a significant impact on the measurements, accounting for regression to the mean.
For each µ, the t-statistic and p-value correspond to the one-sided test of whether the intercept of the regression model from meechua_reg is significantly different from µ in the specified direction. Under the assumptions of the method, this is equivalent to the intervention having a significant impact, accounting for regression to the mean. If, for a given µ, the p-value is below the specified threshold (shown as a blue dashed line), the impact of the intervention is significant under the assumption that µ is the true population mean.
plot_mu(x, n, se_after, lower = F, alpha = 0.05)

| Argument | Description |
|---|---|
| x | A data frame containing the results from meechua_reg. It is stored as mod_coef. |
| n | The original sample size (number of observations) of the data. |
| se_after | The estimated standard error from meechua_reg. It is stored as se_after. |
| lower | Boolean value specifying the direction of the one-sided tests. For lower = F (the default) it tests whether the intervention is increasing the measurements; for lower = T, whether the second measurements are lower than expected. |
| alpha | Specifies the significance threshold for the p-values of the corresponding one-sided tests. The default is alpha = 0.05. |

Plots, for a range of µ, the p-values and t-values of the corresponding tests against µ, and prints some relevant values:
The value of µ for which the treatment effect is most statistically significant, and the corresponding t-statistic and p-value. The highest and lowest µ for which the treatment impact is significant.
These values are also returned as a list.
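The one-sided test of the intercept against µ can be sketched in base R. This is an illustration under assumptions, not the package's code: the scores and the range of µ are hypothetical, the model form (after regressed on before centred at µ) and the n - 2 degrees of freedom are assumed, and only the upper-tailed direction (corresponding to lower = F) is shown.

```r
# Hypothetical before/after scores, for illustration only.
scores <- data.frame(
  Before = c(52, 48, 61, 55, 59, 63, 50, 66),
  After  = c(58, 50, 60, 62, 64, 61, 57, 70)
)
n       <- nrow(scores)
mu_grid <- seq(40, 70, by = 1)  # hypothetical range of candidate means
alpha   <- 0.05

test_one_mu <- function(m) {
  # Assumed model: after regressed on the before measure centred at m.
  fit <- lm(After ~ I(Before - m), data = scores)
  est <- coef(summary(fit))["(Intercept)", c("Estimate", "Std. Error")]
  t_value <- (est[["Estimate"]] - m) / est[["Std. Error"]]
  # Upper-tailed test: is the intercept larger than m?
  p_value <- pt(t_value, df = n - 2, lower.tail = FALSE)
  c(mu = m, t = t_value, p = p_value)
}

results <- as.data.frame(t(sapply(mu_grid, test_one_mu)))

# Candidate means for which the one-sided test is significant at alpha.
significant_mu <- results$mu[results$p < alpha]
head(results)
```

Setting lower.tail = TRUE in pt() would give the opposite direction, testing whether the second measurements are lower than expected.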
Julian Stein
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
# First perform replicate_data and meechua_reg
mee_chua <- replicate_data(0, 100, "Before", "After", data = language_test)
meechua_reg(mee_chua)
# mod_coef and se_after are stored in the environment. The parameters lower = F and alpha = 0.05 can be omitted
plot_mu(mod_coef, 8, se_after)
# Alternative usage: Testing for decreased values due to the intervention with significance threshold alpha = 0.1
plot_mu(mod_coef, 8, se_after, lower = T, alpha = 0.1)
Similar to plot_mu, this function plots, for a given range of µ, the t-statistics and p-values of one-sided tests of whether the intervention has a significant impact on the measurements, accounting for regression to the mean. The difference is that this function is based only on summary statistics of the samples before and after the treatment, such as the mean, standard deviation and covariance/correlation.
For each µ, the t-statistic and p-value correspond to the one-sided test of whether the intervention has a significant impact on the second measurements, accounting for regression to the mean. If, for a given µ, the p-value is below the specified threshold (shown as a blue dashed line), the impact of the intervention is significant under the assumption that µ is the true population mean.
plot_t(mu_start, mu_end, n, y1_mean, y2_mean, y1_std, y2_std, cov, lower = F, alpha = 0.05, r_insteadof_cov = F)

| Argument | Description |
|---|---|
| mu_start | Lower end for the range of µ to be considered. |
| mu_end | Upper end for the range of µ to be considered. |
| n | The number of observations. |
| y1_mean | Mean of the first measurement. |
| y2_mean | Mean of the second measurement. |
| y1_std | Standard deviation of the first measurement. |
| y2_std | Standard deviation of the second measurement. |
| cov | Covariance between the first and second measurements. If r_insteadof_cov = T this argument represents the correlation instead. |
| lower | Boolean value specifying the direction of the one-sided tests. For lower = F (the default) it tests whether the intervention is increasing the measurements; for lower = T, whether the second measurements are lower than expected. |
| alpha | Specifies the significance threshold for the p-values of the corresponding one-sided tests. The default is alpha = 0.05. |
| r_insteadof_cov | Boolean value for the alternative usage of correlation instead of covariance. If r_insteadof_cov = T, the input cov is interpreted as the correlation. |

Plots, for a range of µ, the p-values and t-values of the corresponding tests against µ, and prints some relevant values:
The value of µ for which the treatment effect is most statistically significant, and the corresponding t-statistic and p-value. The highest and lowest µ for which the treatment impact is significant.
These values are also returned as a list.
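A t-statistic of this kind can be derived from summary statistics alone using standard least-squares algebra. The sketch below is an illustration, not the package's implementation: the helper name t_from_summary is hypothetical, and the model form (second measurement regressed on the first measurement centred at µ, intercept compared with µ, n - 2 degrees of freedom, upper-tailed test) is an assumption. The input values are those of the plot_t example in this document.

```r
# Hypothetical helper: t-statistic and one-sided p-value from summary statistics.
t_from_summary <- function(mu, n, y1_mean, y2_mean, y1_std, y2_std, cov) {
  slope     <- cov / y1_std^2                      # least-squares slope
  intercept <- y2_mean - slope * (y1_mean - mu)    # fitted value at y1 = mu
  # Residual variance of the regression, from the sample moments.
  resid_var <- (n - 1) * (y2_std^2 - cov^2 / y1_std^2) / (n - 2)
  # Standard error of the intercept when the predictor is centred at mu.
  se <- sqrt(resid_var * (1 / n + (y1_mean - mu)^2 / ((n - 1) * y1_std^2)))
  t_value <- (intercept - mu) / se
  p_value <- pt(t_value, df = n - 2, lower.tail = FALSE)  # upper-tailed test
  data.frame(mu = mu, t = t_value, p = p_value)
}

# Summary statistics from the plot_t example, on a coarse grid of mu.
summary_results <- t_from_summary(
  mu = seq(0, 100, by = 10), n = 8,
  y1_mean = 57.375, y2_mean = 60.375,
  y1_std = 7.0, y2_std = 8.8, cov = 54.268
)
summary_results
```

Because every operation is vectorised, the helper evaluates a whole range of µ in one call; a correlation r could be converted to a covariance beforehand as r * y1_std * y2_std.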
Julian Stein
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
# Using the parameters corresponding to the example of the function plot_mu
plot_t(mu_start = 0, mu_end = 100, n = 8, y1_mean = 57.375, y2_mean = 60.375, y1_std = 7.0, y2_std = 8.8, cov = 54.268)
This function plots all four diagnostic plots for each linear regression model: “Residuals vs Fitted”, “Normal Q-Q”, “Scale-Location” and “Residuals vs Leverage”.
meechua_plot(x)
x: List containing the estimated linear models from meechua_reg. It is stored as models.
For each model from models, 4 diagnostic plots are produced. For the first model, the numbers 1 to 4 should be given, for the second model numbers from 5 to 8, and so on.
Diagnostic plots for the set of models from meechua_reg.
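The same four diagnostic plots are available for any single lm object through base R's plot method. The sketch below uses hypothetical scores and a plain regression of After on Before; it only illustrates what the four panels are, not how meechua_plot arranges them.

```r
# Hypothetical before/after scores, for illustration only.
scores <- data.frame(
  Before = c(52, 48, 61, 55, 59, 63, 50, 66),
  After  = c(58, 50, 60, 62, 64, 61, 57, 70)
)

fit <- lm(After ~ Before, data = scores)

# Arrange the four default diagnostic plots in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit)  # default panels: which = c(1, 2, 3, 5)
par(old_par)
```

The default selection corresponds to “Residuals vs Fitted”, “Normal Q-Q”, “Scale-Location” and “Residuals vs Leverage”.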
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua’s algorithm. BMC Medical Research Methodology.
plot.lm, meechua_reg
# models are an output from meechua_reg
mee_chua <- replicate_data(0, 100, "Before", "After", data = language_test)
meechua_reg(mee_chua)
# models are the output from meechua_reg saved in the environment after running the function
meechua_plot(models)