




Title: Create Datasets with Identical Summary Statistics
Version: 1.1.0
Date: 2022-10-03
Description: Anscombe's quartet are a set of four two-variable datasets that have several common summary statistics but which have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset. The 'datasauRus' package https://cran.r-project.org/package=datasauRus provides further examples of datasets that have markedly different scatter plots but share many sample summary statistics.
Imports: graphics, stats
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
LazyData: TRUE
Encoding: UTF-8
Depends: R (≥ 3.3.0)
RoxygenNote: 7.2.1
Suggests: datasauRus, datasets, gganimate, ggplot2, maps, testthat, knitr, rmarkdown
VignetteBuilder: knitr
URL: https://paulnorthrop.github.io/anscombiser/, https://github.com/paulnorthrop/anscombiser
BugReports: https://github.com/paulnorthrop/anscombiser/issues
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2022-10-02 23:14:56 UTC; paul
Author: Paul J. Northrop [aut, cre, cph]
Maintainer: Paul J. Northrop <p.northrop@ucl.ac.uk>
Repository: CRAN
Date/Publication: 2022-10-02 23:30:02 UTC

anscombiser: Create Datasets with Identical Summary Statistics

Description

Anscombe's quartet (Anscombe, 1973) are a set of four two-variable datasets that have several common summary statistics but which have very different joint distributions. This becomes apparent when the data are plotted, which illustrates the importance of using graphical displays in Statistics. This package enables the creation of datasets that have identical marginal sample means and sample variances, sample correlation, least squares regression coefficients and coefficient of determination. The user supplies an initial dataset, which is shifted, scaled and rotated in order to achieve target summary statistics. The general shape of the initial dataset is retained. The target statistics can be supplied directly or calculated based on a user-supplied dataset.

Details

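The main functions in anscombiser are anscombise, which modifies a dataset so that it shares sample summary statistics with Anscombe's quartet, and mimic, which modifies a dataset so that it shares sample summary statistics with another dataset.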

See vignette("intro-to-anscombiser", package = "anscombiser") for an overview of the package.

Author(s)

Maintainer: Paul J. Northrop <p.northrop@ucl.ac.uk> [copyright holder]

References

Anscombe, F. J. (1973). Graphs in Statistical Analysis. The American Statistician 27 (1): 17–21. doi:10.1080/00031305.1973.10478966

See Also

anscombise and mimic


Anscombe's Quartet Separated

Description

Provides Anscombe's Quartet as separate data frames.

Usage

anscombe1
anscombe2
anscombe3
anscombe4

Format

All datasets are objects of class data.frame with 11 rows and 2 columns.

Source

Anscombe's Quartet of 'Identical' Simple Linear Regressions: datasets::anscombe in the datasets package. The ith dataset is datasets::anscombe[, c(i, i + 4)].
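As a rough illustration of the indexing rule above, the following base R sketch builds the four datasets directly from datasets::anscombe and compares a few of their summary statistics. The list name quartet is hypothetical, and the column names (x1 and y1, and so on) are those of datasets::anscombe, which may differ from those of anscombe1 to anscombe4.

```r
# Dataset i is datasets::anscombe[, c(i, i + 4)], i.e. columns xi and yi
quartet <- lapply(1:4, function(i) datasets::anscombe[, c(i, i + 4)])
names(quartet) <- paste0("anscombe", 1:4)

# Each element has 11 rows and 2 columns
sapply(quartet, dim)

# The shared summary statistics agree closely across the four datasets
round(sapply(quartet, function(d) {
  c(mean_x = mean(d[[1]]), mean_y = mean(d[[2]]),
    var_x = var(d[[1]]), var_y = var(d[[2]]),
    cor = cor(d[[1]], d[[2]]))
}), 2)
```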

References

Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21. doi:10.2307/2682899


Create new versions of Anscombe's quartet

Description

Modifies a dataset x so that it shares sample summary statistics with Anscombe's quartet.

Usage

anscombise(x, which = 1, idempotent = TRUE)

Arguments

x

A numeric matrix or data frame. Each column contains observations on a different variable. Missing observations are not allowed.

which

An integer in {1, 2, 3, 4}. Which of Anscombe's datasets to use as the target dataset. Because the four datasets share approximately the same summary statistics, this choice makes very little difference.

idempotent

A logical scalar. If idempotent = TRUE then applying anscombise to one of the datasets in Anscombe's Quartet will return the dataset unchanged, apart from a change of class. If idempotent = FALSE then the returned dataset will be a rotated version of the original dataset, with the same summary statistics. See Details.

Details

The input dataset x is modified by shifting, scaling and rotating it so that its sample mean and covariance matrix match those of the chosen dataset in Anscombe's quartet.

The rotation is based on the square root of the sample correlation matrix. If idempotent = FALSE then this square root is based on the Cholesky decomposition of this matrix, using chol. If idempotent = TRUE the square root is based on the spectral decomposition of this matrix, using the output from eigen. This is a minimal rotation square root, which means that if the input data x already have exactly (or approximately) the required summary statistics then the returned dataset is exactly (or approximately) the same as the input dataset x.
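The two kinds of square root mentioned above can be compared directly in base R. This sketch uses an arbitrary 2 x 2 correlation matrix R chosen for illustration and only the functions chol and eigen; it is not the package's internal code. Any factor U with t(U) %*% U equal to R is an orthogonal matrix times the symmetric (spectral) root S, so S is the square root that adds no extra rotation.

```r
# An arbitrary 2 x 2 correlation matrix, chosen only for illustration
R <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

# Cholesky square root: chol() returns upper-triangular U with t(U) %*% U equal to R
U <- chol(R)

# Spectral square root: with R = V diag(lambda) t(V) from eigen(),
# S = V diag(sqrt(lambda)) t(V) is symmetric and S %*% S equals R
e <- eigen(R, symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

all.equal(t(U) %*% U, R)   # TRUE
all.equal(S %*% S, R)      # TRUE

# U differs from S by an orthogonal matrix Q = U S^{-1}; both have positive
# determinant, so Q is a rotation
Q <- U %*% solve(S)
all.equal(t(Q) %*% Q, diag(2))   # TRUE
```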

Value

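An object of class c("anscombe", "matrix", "array") with plot and print methods. This returned dataset has the following summary statistics in common with Anscombe's quartet: the sample means and sample variances of both variables, the sample correlation, the least squares regression coefficients and the coefficient of determination.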

The target and new summary statistics are returned as attributes old_stats and new_stats, and the chosen Anscombe's quartet dataset as an attribute old_data.

See Also

mimic to modify a dataset to share sample summary statistics with another dataset.

datasets::anscombe for Anscombe's Quartet and anscombe1, ..., anscombe4 for Anscombe's Quartet as 4 separate datasets.

input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.

Examples

# Produce Anscombe-like datasets using input1 to input8
a1 <- anscombise(input1, idempotent = FALSE)
plot(a1)
a2 <- anscombise(input2)
plot(a2)
a3 <- anscombise(input3, idempotent = FALSE)
plot(a3)
a4 <- anscombise(input4, idempotent = FALSE)
plot(a4)
a5 <- anscombise(input5, idempotent = FALSE)
plot(a5)
a6 <- anscombise(input6)
plot(a6)
a7 <- anscombise(input7, idempotent = FALSE)
plot(a7)
a8 <- anscombise(input8, idempotent = FALSE)
plot(a8)
# Old faithful to new faithful
new_faithful <- anscombise(datasets::faithful, which = 4)
plot(new_faithful)
# Then check that the sample summary statistics are the same
plot(new_faithful, input = TRUE)
# Map of Italy
got_maps <- requireNamespace("maps", quietly = TRUE)
if (got_maps) {
  italy <- mapdata("Italy")
  new_italy <- anscombise(italy, which = 4)
  plot(new_italy)
}

Animation of several Anscombised datasets

Description

Create an animation to show datasets that share sample summary statistics with Anscombe's quartet.

Usage

anscombise_gif(
  x,
  which = 1,
  idempotent = TRUE,
  theme_name = "classic",
  ease = "cubic-in-out",
  transition_length = 3,
  state_length = 1,
  wrap = TRUE
)

Arguments

x

A list of input datasets. Each one must be a suitable argument x for anscombise.

which,idempotent

Vectors that provide the arguments of the same names to anscombise for each dataset. If necessary, rep_len is used to replicate these arguments so that they each have length length(x).

theme_name

A character scalar used to set the ggtheme. One of "grey", "gray", "bw", "linedraw", "light", "dark", "minimal", "classic", "void" or "test".

ease

A character scalar passed to ease_aes to control how the points move in transitioning from one dataset to the next.

transition_length,state_length,wrap

Arguments passed to transition_states.

Details

For this function to work the packages ggplot2 and gganimate must be installed.

Value

An object of class c("gganim", "gg", "ggplot") with an additional attribute new_data that is a data frame with 3 variables, x, y and dataset, containing the datasets output from anscombise.

The returned object may be displayed by typing its name, e.g., anim, or saved as a GIF file using anim_save, e.g., gganimate::anim_save("anscombe.gif", anim).
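The new_data attribute stacks the output datasets in long format. As a sketch of that layout only (not the package's code), the following stacks Anscombe's quartet itself, taken from datasets::anscombe, into a data frame with columns x, y and dataset. In gganimate, a column like dataset is the kind of variable passed to transition_states() as its states argument.

```r
# Anscombe's quartet as a list of four 11 x 2 data frames
quartet <- lapply(1:4, function(i) datasets::anscombe[, c(i, i + 4)])

# Stack them: one row per point, with columns x, y and dataset
new_data <- do.call(rbind, lapply(seq_along(quartet), function(i) {
  data.frame(x = quartet[[i]][[1]], y = quartet[[i]][[2]], dataset = i)
}))

dim(new_data)   # 44 rows and 3 columns
```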

See Also

anscombise modifies a dataset so that it shares sample summary statistics with Anscombe's quartet.

input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.

Examples

# Animate some Anscombe-like datasets produced using input1 to input8
x <- list(input1, input2, input3, input4, input5, input6, input7, input8)
idem <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
anim <- anscombise_gif(x, idempotent = idem)

Internal anscombiser functions

Description

Internal anscombiser functions

Usage

is_wholenumber(x, tol = .Machine$double.eps^0.5)
is_pos_def(x, tol = 1e-06)
minimal_rotation(x)
make_stats(x, stats, idempotent = FALSE)

Details

These functions are not intended to be called by the user.
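For readers curious about what the first two internal helpers check, here is a hypothetical re-implementation based only on their documented signatures; the package's own versions may differ. The tolerance test in is_wholenumber_sketch is the idiom shown in the examples of ?is.integer, and is_pos_def_sketch treats a symmetric matrix as positive definite when all of its eigenvalues exceed tol.

```r
# Hypothetical: TRUE where x is within tol of the nearest integer
is_wholenumber_sketch <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

# Hypothetical: a symmetric matrix is positive definite if all its
# eigenvalues are positive; tol guards against values that are zero
# up to rounding error
is_pos_def_sketch <- function(x, tol = 1e-06) {
  ev <- eigen(x, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

is_wholenumber_sketch(c(2, 2.5, 3 + 1e-12))      # TRUE FALSE TRUE
is_pos_def_sketch(matrix(c(1, 0.5, 0.5, 1), 2))  # TRUE
is_pos_def_sketch(matrix(c(1, 1, 1, 1), 2))      # FALSE (singular)
```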


Calculate Anscombe's summary statistics

Description

Calculates a particular set of summary statistics for a dataset.

Usage

get_stats(x)

Arguments

x

A numeric matrix or data frame with at least 2 columns/variables. Each column contains observations on a different variable. Missing observations are not allowed.

Value

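A named list of summary statistics containing, among others, n (the sample size), means (the sample means), variances (the sample variances) and correlation (the sample correlation).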
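The statistics listed in the package description (sample means, variances and correlation, least squares regression coefficients and coefficient of determination) can be computed with base R. In this sketch anscombe_stats_sketch is a hypothetical helper, and the names of the regression components (coefficients and r_squared) are assumptions rather than the names used by get_stats. For two variables the coefficient of determination equals the squared sample correlation.

```r
# Hypothetical helper: Anscombe-style summary statistics in base R
anscombe_stats_sketch <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), ncol(x) >= 2, !anyNA(x))
  out <- list(
    n = nrow(x),
    means = colMeans(x),
    variances = apply(x, 2, var),
    correlation = cor(x)
  )
  if (ncol(x) == 2) {
    # Least squares regression of the second column on the first
    fit <- lm(x[, 2] ~ x[, 1])
    out$coefficients <- unname(coef(fit))
    out$r_squared <- summary(fit)$r.squared
  }
  out
}

s1 <- anscombe_stats_sketch(datasets::anscombe[, c(1, 5)])
s1$coefficients   # intercept and slope, close to 3 and 0.5
```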

Examples

get_stats(anscombe[, c(1, 5)])

Input datasets for use by anscombise()

Description

Provides input datasets from which anscombise will produce transformed datasets that behave like Anscombe's quartet of datasets, that is, with the same traditional statistical properties but different general behaviours. Use plot(input1), for example, to see the behaviours of the datasets.

Usage

input1
input2
input3
input4
input5
input6
input7
input8

Format

All datasets are objects of class matrix (inherits from array) with 11 rows and 2 columns.

Source

None. Created for use in 'anscombiser'.

References

Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17-21. doi:10.2307/2682899


Extract longitude and latitude values

Description

Extracts longitude and latitude values for a particular region from the world map supplied by the maps package.

Usage

mapdata(region = ".", map = "world", exact = FALSE, ...)

Arguments

region

Passed to map as the argument regions.

map

Passed to map as the argument database.

exact

Passed to map as the argument exact.

...

Additional arguments to be passed to map.

Value

A data frame with two columns: long and lat for longitude and latitude.

Examples

See the examples in anscombise and mimic.


Modify a dataset to mimic another dataset

Description

Modifies a dataset x so that it shares sample summary statistics with a target dataset x2.

Usage

mimic(x, x2, idempotent = TRUE, ...)

Arguments

x,x2

Numeric matrices or data frames. Each column contains observations on a different variable. Missing observations are not allowed. get_stats(x2) sets the target summary statistics. If x2 is missing then set_stats is called with d = ncol(x) and any additional arguments supplied via the ... argument. This can be used to set target summary statistics (means, variances and/or correlations).

idempotent

A logical scalar. If idempotent = TRUE then mimic(x, x) returns x, apart from a change of class. If idempotent = FALSE then the returned dataset may be a rotated version of the original dataset, with the same summary statistics. See Details.

...

Additional arguments to be passed to set_stats.

Details

The input dataset x is modified by shifting, scaling and rotating it so that its sample mean and covariance matrix match those of the target dataset x2.

The rotation is based on the square root of the sample correlation matrix. If idempotent = FALSE then this square root is based on the Cholesky decomposition of this matrix, using chol. If idempotent = TRUE the square root is based on the spectral decomposition of this matrix, using the output from eigen. This is a minimal rotation square root, which means that if the input data x already have exactly (or approximately) the required summary statistics of x2 then the returned dataset is exactly (or approximately) the same as the input dataset x.
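The shift, scale and rotate step can be sketched in base R. In this sketch, sym_sqrt and mimic_sketch are hypothetical names, the eigen-based (symmetric) square root is used throughout, and the covariance matrix is handled directly rather than separately through variances and the correlation matrix. It illustrates the idea and is not mimic's actual code.

```r
# Hypothetical helper: symmetric square root of a covariance matrix via eigen()
sym_sqrt <- function(m) {
  e <- eigen(m, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical sketch: give x the sample mean vector and sample covariance
# matrix of x2 by shifting, scaling and rotating it
mimic_sketch <- function(x, x2) {
  x <- as.matrix(x)
  x2 <- as.matrix(x2)
  # Centre x and whiten it, so that its sample covariance is the identity
  z <- scale(x, center = TRUE, scale = FALSE) %*% solve(sym_sqrt(cov(x)))
  # Impose the target covariance, then shift to the target means
  out <- z %*% sym_sqrt(cov(x2))
  sweep(out, 2, colMeans(x2), "+")
}

target <- datasets::anscombe[, c(1, 5)]
new_faithful <- mimic_sketch(datasets::faithful, target)

# Both comparisons should be TRUE (up to rounding error)
all.equal(colMeans(new_faithful), colMeans(target), check.attributes = FALSE)
all.equal(cov(new_faithful), cov(target), check.attributes = FALSE)

# Using the symmetric root on both sides makes the map idempotent
all.equal(mimic_sketch(target, target), as.matrix(target), check.attributes = FALSE)
```

Because the same symmetric root is used on both sides, mimic_sketch(x, x) returns x up to rounding error, mirroring the idempotent = TRUE behaviour described for mimic.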

Value

An object of class c("anscombe", "matrix", "array") with plot and print methods. This returned dataset has the same sample mean vector and sample covariance matrix as x2, and hence the same sample means, sample variances and sample correlations.

The target and new summary statistics are returned as attributes old_stats and new_stats. If x2 is supplied then it is returned as an attribute old_data.

See Also

anscombise modifies a dataset so that it shares sample summary statistics with Anscombe's quartet.

Examples

### 2D examples
# The UK and a dinosaur
got_maps <- requireNamespace("maps", quietly = TRUE)
got_datasauRus <- requireNamespace("datasauRus", quietly = TRUE)
if (got_maps && got_datasauRus) {
  library(maps)
  library(datasauRus)
  dino <- datasaurus_dozen_wide[, c("dino_x", "dino_y")]
  UK <- mapdata("UK")
  new_UK <- mimic(UK, dino)
  plot(new_UK)
}
# Trump and a dinosaur
if (got_datasauRus) {
  library(datasauRus)
  dino <- datasaurus_dozen_wide[, c("dino_x", "dino_y")]
  new_dino <- mimic(dino, trump)
  plot(new_dino)
}
## Examples of passing summary statistics
# The default is zero mean, unit variance and no correlation
new_faithful <- mimic(faithful)
plot(new_faithful)
# Change the correlation
mat <- matrix(c(1, -0.9, -0.9, 1), 2, 2)
new_faithful <- mimic(faithful, correlation = mat)
plot(new_faithful)
### A 3D example
new_randu <- mimic(datasets::randu, datasets::trees)
# The sample summary statistics are equal
get_stats(new_randu)
get_stats(datasets::trees)

Animation of several mimicking datasets

Description

Create an animation to show datasets that mimic a target dataset x2.

Usage

mimic_gif(
  x,
  x2,
  idempotent = TRUE,
  theme_name = "classic",
  ease = "cubic-in-out",
  transition_length = 3,
  state_length = 1,
  wrap = TRUE
)

Arguments

x

A list of input datasets. Each one must be a suitable argument x for mimic.

x2

A suitable argument x2 for mimic.

idempotent

A logical vector that provides the argument of the same name to mimic for each dataset. If necessary, rep_len is used to replicate this argument so that it has length length(x).

theme_name

A character scalar used to set the ggtheme. One of "grey", "gray", "bw", "linedraw", "light", "dark", "minimal", "classic", "void" or "test".

ease

A character scalar passed to ease_aes to control how the points move in transitioning from one dataset to the next.

transition_length,state_length,wrap

Arguments passed to transition_states.

Details

For this function to work the packages ggplot2 and gganimate must be installed.

Value

An object of class c("gganim", "gg", "ggplot") with an additional attribute new_data that is a data frame with 3 variables, x, y and dataset, containing the datasets output from mimic.

The returned object may be displayed by typing its name, e.g., anim, or saved as a GIF file using anim_save, e.g., gganimate::anim_save("anscombe.gif", anim).

See Also

mimic to modify a dataset to share sample summary statistics with another dataset.

input_datasets: input1 to input8 for some input datasets of the same size as those in Anscombe's quartet.

Examples

# Create 8 datasets that mimic Anscombe's first dataset
x <- list(input1, input2, input3, input4, input5, input6, input7, input8)
anim <- mimic_gif(x, anscombe1)

Plot method for objects of class "anscombe"

Description

plot method for objects inheriting from class "anscombe".

Usage

## S3 method for class 'anscombe'
plot(x, input = FALSE, stats = TRUE, digits = 3, legend_args = list(), ...)

Arguments

x

An object of class "anscombe", a result of a call to anscombise or mimic.

input

A logical scalar. Should the old, input data, that is, the Anscombe dataset chosen for anscombise or the argument x2 to mimic, be plotted? If input = FALSE then the new, output data are plotted. If input = TRUE then the old data are plotted.

stats

A logical scalar. Should the sample summary statistics n, means, variances and correlation be added to the plot?

digits

An integer. The argument digits passed to signif to round the values of the statistics before adding them to the plot.

legend_args

A list of arguments to be passed to legend when stats = TRUE, especially legend_args$x to control the position of the legend.

...

Further arguments to be passed to plot.

Details

This function is only applicable in 2 dimensions, that is, when length(attr(x, "new_stats")$means) is 2.

Value

Nothing is returned.

Examples

See the examples in anscombise and mimic.

See Also

anscombise and mimic.


Print method for objects of class "anscombe"

Description

print method for class "anscombe".

Usage

## S3 method for class 'anscombe'
print(x, ...)

Arguments

x

An object of class "anscombe", a result of a call to anscombise or mimic.

...

Additional optional arguments to be passed to print.

Details

Just extracts the new dataset fromx and prints it usingprint.
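A print method of this kind can be sketched as follows. The class name anscombe_sketch is hypothetical, chosen to avoid clashing with the package's own method, and the attributes removed are those documented under Value for anscombise and mimic; the package's method may do this differently.

```r
# Hypothetical print method: strip the stored attributes and class,
# print the bare data matrix, and return x invisibly
print.anscombe_sketch <- function(x, ...) {
  y <- x
  attr(y, "old_stats") <- NULL
  attr(y, "new_stats") <- NULL
  attr(y, "old_data") <- NULL
  class(y) <- NULL
  print(y, ...)
  invisible(x)
}

# A toy object with the same class structure as anscombise() output
obj <- structure(matrix(c(1, 2, 3, 4), 2, 2),
                 old_stats = list(), new_stats = list(),
                 class = c("anscombe_sketch", "matrix", "array"))
print(obj)   # prints only the 2 x 2 matrix
```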

Value

The argumentx, invisibly.

See Also

anscombise and mimic


Create a list of summary statistics

Description

Creates a list of summary statistics to pass to mimic.

Usage

set_stats(d = 2, means = 0, variances = 1, correlation = diag(2))

Arguments

d

An integer that is no smaller than 2: the number of variables.

means

A numeric vector of sample means.

variances

A numeric vector of positive sample variances.

correlation

A numeric correlation matrix. None of the off-diagonal entries in correlation are allowed to be equal to 1 in absolute value.

Details

The vectors means and variances are recycled using rep_len to have length d.
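The recycling rule and the restriction on correlation can be sketched as below. set_stats_sketch and the names of its returned components are hypothetical; the checks mirror the argument descriptions above rather than the package's source.

```r
# Hypothetical sketch of set_stats(): recycle means and variances to length d
# and check the correlation matrix
set_stats_sketch <- function(d = 2, means = 0, variances = 1,
                             correlation = diag(2)) {
  stopifnot(d >= 2, all(variances > 0))
  means <- rep_len(means, d)
  variances <- rep_len(variances, d)
  correlation <- as.matrix(correlation)
  stopifnot(
    nrow(correlation) == d, ncol(correlation) == d,
    isSymmetric(correlation), all(diag(correlation) == 1),
    # No off-diagonal entry may equal 1 in absolute value
    all(abs(correlation[upper.tri(correlation)]) < 1)
  )
  list(means = means, variances = variances, correlation = correlation)
}

s <- set_stats_sketch(means = c(1, 2),
                      correlation = matrix(c(1, 0.9, 0.9, 1), 2, 2))
s$variances   # recycled to c(1, 1)
```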

Value

A list containing the following components.

Examples

# Uncorrelated with zero means and unit variances
set_stats()
# Sample correlation = 0.9
set_stats(correlation = matrix(c(1, 0.9, 0.9, 1), 2, 2))

Donald Trump

Description

A dataset that provides an image of Donald Trump's face.

Usage

trump

Format

A matrix with 4885 rows and 2 columns: x and y.

Source

This image was created by Accentaur from the Noun Project. https://thenounproject.com/term/donald-trump/727774/

Examples

plot(trump)

