| Title: | Create Datasets with Identical Summary Statistics |
| Version: | 1.1.0 |
| Date: | 2022-10-03 |
| Description: | Anscombe's quartet is a set of four two-variable datasets that have several common summary statistics but which have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset. The 'datasauRus' package <https://cran.r-project.org/package=datasauRus> provides further examples of datasets that have markedly different scatter plots but share many sample summary statistics. |
| Imports: | graphics, stats |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyData: | TRUE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.3.0) |
| RoxygenNote: | 7.2.1 |
| Suggests: | datasauRus, datasets, gganimate, ggplot2, maps, testthat, knitr, rmarkdown |
| VignetteBuilder: | knitr |
| URL: | https://paulnorthrop.github.io/anscombiser/, https://github.com/paulnorthrop/anscombiser |
| BugReports: | https://github.com/paulnorthrop/anscombiser/issues |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2022-10-02 23:14:56 UTC; paul |
| Author: | Paul J. Northrop [aut, cre, cph] |
| Maintainer: | Paul J. Northrop <p.northrop@ucl.ac.uk> |
| Repository: | CRAN |
| Date/Publication: | 2022-10-02 23:30:02 UTC |
anscombiser: Create Datasets with Identical Summary Statistics
Description
Anscombe's quartet (Anscombe, 1973) is a set of four two-variable datasets that have several common summary statistics but which have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset.
Details
The main functions in anscombiser are
anscombise, which modifies a user-supplied dataset so that it shares sample summary statistics with Anscombe's quartet.
mimic, which modifies a user-supplied dataset so that it shares sample summary statistics with another user-supplied dataset.
See vignette("intro-to-anscombiser", package = "anscombiser") for an overview of the package.
Author(s)
Maintainer: Paul J. Northrop <p.northrop@ucl.ac.uk> [copyright holder]
References
Anscombe, F. J. (1973). Graphs in Statistical Analysis. The American Statistician 27 (1): 17–21. doi:10.1080/00031305.1973.10478966
See Also
anscombise and mimic
Anscombe's Quartet Separated
Description
Provides Anscombe's Quartet as separate data frames.
Usage
anscombe1
anscombe2
anscombe3
anscombe4
Format
All datasets are objects of class data.frame with 11 rows and 2 columns.
Source
Anscombe's Quartet of 'Identical' Simple Linear Regressions: datasets::anscombe in the datasets package. The ith dataset is datasets::anscombe[, c(i, i + 4)].
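As a minimal sketch of the indexing rule above, the following base R code rebuilds the four pairs directly from datasets::anscombe. The list name quartet is an assumption made for illustration only; it is not an object supplied by the package.

```r
# Rebuild the four Anscombe pairs from datasets::anscombe.
# `quartet` is a hypothetical name used only in this sketch.
quartet <- lapply(1:4, function(i) datasets::anscombe[, c(i, i + 4)])
names(quartet) <- paste0("anscombe", 1:4)

# Each element is a data frame holding one (x_i, y_i) pair
sapply(quartet, dim)
colnames(quartet[[3]])
```

The column positions work because datasets::anscombe stores x1 to x4 in columns 1 to 4 and y1 to y4 in columns 5 to 8.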
References
Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21. doi:10.2307/2682899
Create new versions of Anscombe's quartet
Description
Modifies a dataset x so that it shares sample summary statistics with Anscombe's quartet.
Usage
anscombise(x, which = 1, idempotent = TRUE)
Arguments
x | A numeric matrix or data frame. Each column contains observations on a different variable. Missing observations are not allowed. |
which | An integer in {1, 2, 3, 4}. Which of Anscombe's datasets to use as the target dataset. Obviously, this makes very little difference. |
idempotent | A logical scalar. If |
Details
The input dataset x is modified by shifting, scaling and rotating it so that its sample mean and covariance matrix match those of the Anscombe quartet.
The rotation is based on the square root of the sample correlation matrix. If idempotent = FALSE then this square root is based on the Cholesky decomposition of this matrix, using chol. If idempotent = TRUE then the square root is based on the spectral decomposition of this matrix, using the output from eigen. This is a minimal rotation square root, which means that if the input data x already have exactly/approximately the required summary statistics then the returned dataset is exactly/approximately the same as the target dataset.
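The two kinds of matrix square root mentioned above can be compared in base R. This is only a sketch on a made-up 2 by 2 correlation matrix C; it is not taken from the package source, and the names C, R and S are assumptions for illustration.

```r
# A 2 x 2 correlation matrix used only for illustration
C <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

# Cholesky square root: upper triangular R with t(R) %*% R equal to C
R <- chol(C)

# Spectral square root: symmetric S with S %*% S equal to C
e <- eigen(C, symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# Both factors reproduce C, but only S is symmetric
all.equal(t(R) %*% R, C)
all.equal(S %*% S, C)
isSymmetric(S)
```

Both factors are valid square roots of C. The Cholesky factor is triangular, whereas the spectral factor is symmetric, which is the form the text above describes as a minimal rotation square root.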
Value
An object of class c("anscombe", "matrix", "array") with plot and print methods. This returned dataset has the following summary statistics in common with Anscombe's quartet.
The sample means of each variable.
The sample variances of each variable.
The sample correlation matrix.
The estimated regression coefficients from least squares linear regressions of each variable on each other variable.
The target and new summary statistics are returned as attributes old_stats and new_stats and the chosen Anscombe's quartet dataset as an attribute old_data.
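The statistics listed above can be checked against the original quartet using base R alone. The sketch below does not call anscombise; the object name stats is an assumption chosen for illustration.

```r
# Compute the shared summary statistics for each of Anscombe's four datasets
a <- datasets::anscombe
stats <- sapply(1:4, function(i) {
  x <- a[, i]
  y <- a[, i + 4]
  fit <- coef(lm(y ~ x))  # least squares regression of y on x
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor = cor(x, y),
    intercept = unname(fit[1]), slope = unname(fit[2]))
})
colnames(stats) <- paste0("anscombe", 1:4)

# One column per dataset; the rows agree to about two decimal places
round(stats, 2)
```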
See Also
mimic to modify a dataset to share sample summary statistics with another dataset.
datasets::anscombe for Anscombe's Quartet and anscombe for Anscombe's Quartet as 4 separate datasets.
input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.
Examples
# Produce Anscombe-like datasets using input1 to input8
a1 <- anscombise(input1, idempotent = FALSE)
plot(a1)
a2 <- anscombise(input2)
plot(a2)
a3 <- anscombise(input3, idempotent = FALSE)
plot(a3)
a4 <- anscombise(input4, idempotent = FALSE)
plot(a4)
a5 <- anscombise(input5, idempotent = FALSE)
plot(a5)
a6 <- anscombise(input6)
plot(a6)
a7 <- anscombise(input7, idempotent = FALSE)
plot(a7)
a8 <- anscombise(input8, idempotent = FALSE)
plot(a8)
# Old faithful to new faithful
new_faithful <- anscombise(datasets::faithful, which = 4)
plot(new_faithful)
# Then check that the sample summary statistics are the same
plot(new_faithful, input = TRUE)
# Map of Italy
got_maps <- requireNamespace("maps", quietly = TRUE)
if (got_maps) {
  italy <- mapdata("Italy")
  new_italy <- anscombise(italy, which = 4)
  plot(new_italy)
}
Animation of several Anscombised datasets
Description
Create an animation to show datasets that share sample summary statistics with Anscombe's quartet.
Usage
anscombise_gif(
  x,
  which = 1,
  idempotent = TRUE,
  theme_name = "classic",
  ease = "cubic-in-out",
  transition_length = 3,
  state_length = 1,
  wrap = TRUE
)
Arguments
x | A list of input datasets. Each one must be a suitable argument |
which, idempotent | Vectors that provide the arguments of the same names to |
theme_name | A character scalar used to set the |
ease | A character scalar passed to |
transition_length,state_length,wrap | Arguments passed to |
Details
For this function to work the packages ggplot2 and gganimate must be installed.
Value
An object of class c("gganim", "gg", "ggplot") with an additional attribute new_data that is a data frame with 3 variables, x, y and dataset, containing the datasets output from anscombise.
The returned object may be displayed by typing its name, e.g., anim, or saved as a GIF file using anim_save, e.g., gganimate::anim_save("anscombe.gif", anim).
See Also
anscombise modifies a dataset so that it shares sample summary statistics with Anscombe's quartet.
input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.
Examples
# Animate some Anscombe-like datasets produced using input1 to input8
x <- list(input1, input2, input3, input4, input5, input6, input7, input8)
idem <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
anim <- anscombise_gif(x, idempotent = idem)
Internal anscombiser functions
Description
Internal anscombiser functions
Usage
is_wholenumber(x, tol = .Machine$double.eps^0.5)
is_pos_def(x, tol = 1e-06)
minimal_rotation(x)
make_stats(x, stats, idempotent = FALSE)
Details
These functions are not intended to be called by the user.
Calculate Anscombe's summary statistics
Description
Calculates a particular set of summary statistics for a dataset.
Usage
get_stats(x)
Arguments
x | a numeric matrix or data frame with at least 2 columns/variables. Each column contains observations on a different variable. Missing observations are not allowed. |
Value
A named list of summary statistics containing
n | The sample size. |
means | The sample means of each variable. |
variances | The sample variances of each variable. |
correlation | The sample correlation matrix. |
intercepts, slopes, rsquared | Matrices whose (i,j)th entries are the estimated regression coefficients in a regression of x[, i] on x[, j] and the resulting coefficient of determination R^2. |
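To make the (i,j) convention for the regression matrices concrete, here is a rough base R sketch that computes the same kinds of quantities from the sample moments. The function sketch_stats is hypothetical and written for illustration; it is not the package's get_stats and may differ from it in detail, for example in how diagonal entries are handled.

```r
# Hypothetical helper for illustration; not the package's get_stats()
sketch_stats <- function(x) {
  x <- as.matrix(x)
  m <- colMeans(x)
  v <- apply(x, 2, var)
  r <- cor(x)
  # slopes[i, j]: slope in a regression of x[, i] on x[, j],
  # that is, cov(x_i, x_j) / var(x_j)
  slopes <- cov(x) / matrix(v, nrow(r), ncol(r), byrow = TRUE)
  # intercepts[i, j]: mean of x[, i] minus slope times mean of x[, j]
  intercepts <- m - sweep(slopes, 2, m, "*")
  list(n = nrow(x), means = m, variances = v, correlation = r,
       intercepts = intercepts, slopes = slopes, rsquared = r^2)
}

s <- sketch_stats(datasets::anscombe[, c(1, 5)])
# Compare with coef(lm(y1 ~ x1, data = datasets::anscombe))
s$intercepts["y1", "x1"]
s$slopes["y1", "x1"]
```

In simple linear regression the coefficient of determination equals the squared sample correlation, which is why rsquared is computed as r^2 in this sketch.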
Examples
get_stats(anscombe[, c(1, 5)])
Input datasets for use by anscombise()
Description
Provides input datasets from which anscombise will produce transformed datasets that behave like Anscombe's quartet of datasets, that is, with the same traditional statistical properties but different general behaviours. Use plot(input1), for example, to see the behaviours of the datasets.
Usage
input1
input2
input3
input4
input5
input6
input7
input8
Format
All datasets are objects of class matrix (inherits from array) with 11 rows and 2 columns.
Source
None. Created for use in 'anscombiser'.
References
Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21. doi:10.2307/2682899
Extract longitude and latitude values
Description
Extracts longitude and latitude values for a particular region from the world map supplied by the maps package.
Usage
mapdata(region = ".", map = "world", exact = FALSE, ...)
Arguments
region | Passed to |
map | Passed to |
exact | The argument |
... | Additional arguments to be passed to |
Value
A data frame with two columns: long and lat for longitude and latitude.
Examples
See the examples in mimic.
Modify a dataset to mimic another dataset
Description
Modifies a dataset x so that it shares sample summary statistics with a target dataset x2.
Usage
mimic(x, x2, idempotent = TRUE, ...)
Arguments
x, x2 | Numeric matrices or data frames. Each column contains observations on a different variable. Missing observations are not allowed. |
idempotent | A logical scalar. If |
... | Additional arguments to be passed to |
Details
The input dataset x is modified by shifting, scaling and rotating it so that its sample mean and covariance matrix match those of the target dataset x2.
The rotation is based on the square root of the sample correlation matrix. If idempotent = FALSE then this square root is based on the Cholesky decomposition of this matrix, using chol. If idempotent = TRUE then the square root is based on the spectral decomposition of this matrix, using the output from eigen. This is a minimal rotation square root, which means that if the input data x already have exactly/approximately the required summary statistics then the returned dataset is exactly/approximately the same as the target dataset x2.
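The shift, scale and rotate idea can be sketched in base R as follows. This is one plausible way to match a sample mean and covariance matrix, using a symmetric (spectral) square root throughout; the function sketch_mimic and its helper sym_sqrt are hypothetical names, and the sketch should not be read as the package's actual implementation of mimic.

```r
# Hypothetical helper for illustration; not the package's implementation.
# Assumes x and x2 are numeric matrices with the same number (>= 2) of columns.
sketch_mimic <- function(x, x2) {
  # Symmetric square root of a correlation matrix via its spectral decomposition
  sym_sqrt <- function(C) {
    e <- eigen(C, symmetric = TRUE)
    e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
  }
  z <- scale(x)                            # shift and scale: zero means, unit variances
  z <- z %*% solve(sym_sqrt(cor(x)))       # rotate: remove the sample correlation of x
  z <- z %*% sym_sqrt(cor(x2))             # rotate: impose the sample correlation of x2
  z <- sweep(z, 2, apply(x2, 2, sd), "*")  # scale to the standard deviations of x2
  sweep(z, 2, colMeans(x2), "+")           # shift to the means of x2
}

set.seed(1)
x <- cbind(runif(50), rexp(50))
x2 <- as.matrix(datasets::faithful)
y <- sketch_mimic(x, x2)

# The sample means and covariance matrix of y match those of x2
all.equal(colMeans(y), unname(colMeans(x2)))
all.equal(cov(y), unname(cov(x2)))
```

Because every step is a linear transformation of the columns of x, the general shape of the input point cloud is retained up to shifting, scaling and rotation.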
Value
An object of class c("anscombe", "matrix", "array") with plot and print methods. This returned dataset has the following summary statistics in common with x2.
The sample means of each variable.
The sample variances of each variable.
The sample correlation matrix.
The estimated regression coefficients from least squares linear regressions of each variable on each other variable.
The target and new summary statistics are returned as attributes old_stats and new_stats. If x2 is supplied then it is returned as an attribute old_data.
See Also
anscombise modifies a dataset so that it shares sample summary statistics with Anscombe's quartet.
Examples
### 2D examples
# The UK and a dinosaur
got_maps <- requireNamespace("maps", quietly = TRUE)
got_datasauRus <- requireNamespace("datasauRus", quietly = TRUE)
if (got_maps && got_datasauRus) {
  library(maps)
  library(datasauRus)
  dino <- datasaurus_dozen_wide[, c("dino_x", "dino_y")]
  UK <- mapdata("UK")
  new_UK <- mimic(UK, dino)
  plot(new_UK)
}
# Trump and a dinosaur
if (got_datasauRus) {
  library(datasauRus)
  dino <- datasaurus_dozen_wide[, c("dino_x", "dino_y")]
  new_dino <- mimic(dino, trump)
  plot(new_dino)
}
## Examples of passing summary statistics
# The default is zero mean, unit variance and no correlation
new_faithful <- mimic(faithful)
plot(new_faithful)
# Change the correlation
mat <- matrix(c(1, -0.9, -0.9, 1), 2, 2)
new_faithful <- mimic(faithful, correlation = mat)
plot(new_faithful)
### A 3D example
new_randu <- mimic(datasets::randu, datasets::trees)
# The sample summary statistics are equal
get_stats(new_randu)
get_stats(datasets::trees)
Animation of several mimicking datasets
Description
Create an animation to show datasets that mimic a target dataset x2.
Usage
mimic_gif(
  x,
  x2,
  idempotent = TRUE,
  theme_name = "classic",
  ease = "cubic-in-out",
  transition_length = 3,
  state_length = 1,
  wrap = TRUE
)
Arguments
x | A list of input datasets. Each one must be a suitable argument |
x2 | A suitable argument |
idempotent | A logical vector that provides the argument of the same name to |
theme_name | A character scalar used to set the |
ease | A character scalar passed to |
transition_length,state_length,wrap | Arguments passed to |
Details
For this function to work the packages ggplot2 and gganimate must be installed.
Value
An object of class c("gganim", "gg", "ggplot") with an additional attribute new_data that is a data frame with 3 variables, x, y and dataset, containing the datasets output from mimic.
The returned object may be displayed by typing its name, e.g., anim, or saved as a GIF file using anim_save, e.g., gganimate::anim_save("anscombe.gif", anim).
See Also
mimic to modify a dataset to share sample summary statistics with another dataset.
input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.
Examples
# Create 8 datasets that mimic Anscombe's first dataset
x <- list(input1, input2, input3, input4, input5, input6, input7, input8)
anim <- mimic_gif(x, anscombe1)
Plot method for objects of class "anscombe"
Description
plot method for objects inheriting from class "anscombe".
Usage
## S3 method for class 'anscombe'
plot(x, input = FALSE, stats = TRUE, digits = 3, legend_args = list(), ...)
Arguments
x | an object of class |
input | A logical scalar. Should the old, input data, that is, the Anscombe's dataset chosen for |
stats | A logical scalar. Should the sample summary statistics |
digits | An integer. The argument |
legend_args | A list of arguments to be passed to |
... | Further arguments to be passed to |
Details
This function is only applicable in 2 dimensions, that is, when length(attr(x, "new_stats")$means) == 2.
Value
Nothing is returned.
Examples
See the examples in anscombise and mimic.
See Also
anscombise and mimic.
Print method for objects of class "anscombe"
Description
print method for class "anscombe".
Usage
## S3 method for class 'anscombe'
print(x, ...)
Arguments
x | an object of class "anscombe", a result of a call to |
... | Additional optional arguments to be passed to |
Details
Just extracts the new dataset from x and prints it using print.
Value
The argument x, invisibly.
See Also
anscombise and mimic
Create a list of summary statistics
Description
Creates a list of summary statistics to pass to mimic.
Usage
set_stats(d = 2, means = 0, variances = 1, correlation = diag(2))
Arguments
d | An integer that is no smaller than 2. |
means | A numeric vector of sample means. |
variances | A numeric vector of positive sample variances. |
correlation | A numeric correlation matrix. None of the off-diagonal entries in |
Details
The vectors means and variances are recycled using rep_len to have length d.
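The recycling rule can be seen directly with base R's rep_len. The values below are made up for illustration, and the object names m and v are assumptions rather than anything returned by set_stats.

```r
d <- 3

# A single value is repeated d times
m <- rep_len(0, d)

# A vector shorter than d is cycled until it has length d
v <- rep_len(c(1, 4), d)

m
v
```

Here m is three zeros and v is 1, 4, 1, so a scalar mean or variance applies to every variable, while a shorter vector wraps around.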
Value
A list containing the following components.
means | a d-vector of sample means. |
variances | a d-vector of sample variances. |
correlation | a d by d correlation matrix. |
Examples
# Uncorrelated with zero means and unit variances
set_stats()
# Sample correlation = 0.9
set_stats(correlation = matrix(c(1, 0.9, 0.9, 1), 2, 2))
Donald Trump
Description
A dataset that provides an image of Donald Trump's face.
Usage
trump
Format
A matrix with 4885 rows and 2 columns: x and y.
Source
This image was created by Accentaur from the Noun Project. https://thenounproject.com/term/donald-trump/727774/
Examples
plot(trump)