

qwraps2: Data Sets

Peter E. DeWitt

library(qwraps2)
packageVersion("qwraps2")
## [1] '0.6.1'

1 mtcars2

The base R package datasets provides the mtcars data set. The information in mtcars comprises the fuel consumption and automobile characteristics of 32 automobiles as reported in the March, April, June and July 1974 issues of Motor Trend magazine (Hocking 1976).

That data set is modified and extended here to support examples within the qwraps2 package documentation. This vignette documents the construction of mtcars2.

2 Construction of mtcars2

Starting with the original mtcars:

mtcars2 <- mtcars
str(mtcars2)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

The cyl column provides the number of cylinders in the engine of each automobile. We will use two additional versions of this information, one as a character column and one as a factor. Note that the order of the factor levels is intentionally non-sequential (6, 4, 8). This helps illustrate the ordering of results when a factor or a character vector is used as a grouping variable.

mtcars2$cyl_character <- paste(mtcars2$cyl, "cylinders")
mtcars2$cyl_factor <- factor(mtcars2$cyl,
                             levels = c(6, 4, 8),
                             labels = paste(c(6, 4, 8), "cylinders"))
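To see why the non-sequential level order matters, the following sketch groups mpg by each version of the cylinder information. It rebuilds only the two cylinder columns from base R's mtcars, so it runs without qwraps2. When a character vector is used for grouping it is converted to a factor with sorted levels, so the 4-cylinder group comes first; the factor keeps the 6, 4, 8 order given in `levels`.

```r
# Rebuild the two cylinder columns from base R's mtcars
cyl_character <- paste(mtcars$cyl, "cylinders")
cyl_factor <- factor(mtcars$cyl,
                     levels = c(6, 4, 8),
                     labels = paste(c(6, 4, 8), "cylinders"))

# Character grouping: groups appear in sorted order
print(table(cyl_character))

# Factor grouping: groups appear in the order of the factor levels
print(table(cyl_factor))

# The same ordering carries over to tapply(), split(), aggregate(), ...
mpg_by_chr <- tapply(mtcars$mpg, cyl_character, mean)
mpg_by_fct <- tapply(mtcars$mpg, cyl_factor, mean)
print(names(mpg_by_chr))
print(names(mpg_by_fct))
```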

Create other factor variables.

mtcars2$gear_factor <- factor(mtcars2$gear,
                              levels = c(3, 4, 5),
                              labels = paste(c(3, 4, 5), "forward gears"))

Engine configuration: the vs column is a numeric 0/1 indicator of a V-shaped (0) or straight (1) engine. The constructed engine column provides the same information as a labeled factor.

mtcars2$engine <- factor(mtcars2$vs,
                         levels = c(0, 1),
                         labels = c("V-shaped", "straight"))

Transmission: the am column is a numeric 0/1 indicator of an automatic (0) or manual (1) transmission. We construct a transmission column to provide the same information as a factor.

mtcars2$transmission <- factor(mtcars2$am,
                               levels = c(0, 1),
                               labels = c("Automatic", "Manual"))
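A cross-tabulation is one way to confirm the two 0/1 recodings above. This sketch recreates engine and transmission from base R's mtcars. If the labels are attached as intended, every count falls on the diagonal, and factor() leaves no NA (it returns NA for any value not listed in `levels`).

```r
# Recreate the two labeled factors from base R's mtcars
engine <- factor(mtcars$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))
transmission <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Each 0/1 value should map to exactly one label
print(table(vs = mtcars$vs, engine = engine))
print(table(am = mtcars$am, transmission = transmission))

# No value was dropped to NA by the recoding
stopifnot(!anyNA(engine), !anyNA(transmission))
```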

The rownames of mtcars2 provide the make and model of each automobile. Here we create columns for make and model and then drop the rownames.

mtcars2$make  <- sub("^(\\w+)\\s(.+)", "\\1", rownames(mtcars2))
mtcars2$model <- sub("^(\\w+)\\s(.+)", "\\2", rownames(mtcars2))
rownames(mtcars2) <- NULL
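The regular expression splits each rowname at its first whitespace: `\\1` captures the leading word (make) and `\\2` everything after it (model). The sketch below applies the same calls to base R's mtcars. One consequence worth knowing: a rowname with no space, such as Valiant, does not match the pattern, so sub() returns it unchanged as both make and model.

```r
# Split the rownames of base R's mtcars with the same pattern
nm <- rownames(mtcars)
make  <- sub("^(\\w+)\\s(.+)", "\\1", nm)
model <- sub("^(\\w+)\\s(.+)", "\\2", nm)

# Show a few results side by side
print(head(data.frame(rowname = nm, make = make, model = model)))

# Rownames that do not match are returned unchanged by sub(),
# so they end up as both make and model
print(nm[make == model])
```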

To have some dates to use in examples, we add a mostly arbitrary date column to mtcars2. Given that the data came from the March through July 1974 issues of Motor Trend, we create a test_date column starting in January 1974, with a gap of 2, 3, 4, or 7 days between consecutive tests (roughly one to three tests per week), running through about May 1974. This assumes the rows are in chronological order.

set.seed(42)
mtcars2$test_date <-
  as.POSIXct("1974-01-03", tz = "GMT") +
  cumsum(sample(c(2, 3, 4, 7) * 3600 * 24, size = nrow(mtcars2), replace = TRUE))
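The dates are the cumulative sum of random gaps of 2, 3, 4, or 7 days (expressed in seconds) added to 1974-01-03. The sketch below repeats the construction for 32 rows and tabulates the gaps, the date range, and the number of tests per calendar week. Assuming the same R version and default random number generator settings, it should draw the same gaps as the vignette; the printed values are not reproduced here.

```r
# Reproduce the test_date construction for 32 rows
set.seed(42)
gap_seconds <- sample(c(2, 3, 4, 7) * 3600 * 24, size = 32, replace = TRUE)
test_date <- as.POSIXct("1974-01-03", tz = "GMT") + cumsum(gap_seconds)

# Days between consecutive tests (the first gap is counted from 1974-01-03)
print(table(gap_seconds / 86400))

# First and last test dates
print(range(test_date))

# Number of tests in each calendar week (%W: week of the year, Monday first)
print(table(format(test_date, "%Y-%W")))
```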

Lastly, we reorder the columns of mtcars2 so that related columns are next to each other.

mtcars2 <-
  mtcars2[, c("make", "model",
              "mpg", "disp", "hp", "drat", "wt", "qsec",
              "cyl", "cyl_character", "cyl_factor",
              "vs", "engine",
              "am", "transmission",
              "gear", "gear_factor",
              "carb",
              "test_date")]

3 Summary of mtcars2

mtcars2 is a data frame with 32 observations of 19 variables. Some of the variables convey the same information in different formats.

str(mtcars2)
## 'data.frame':    32 obs. of  19 variables:
##  $ make         : chr  "Mazda" "Mazda" "Datsun" "Hornet" ...
##  $ model        : chr  "RX4" "RX4 Wag" "710" "4 Drive" ...
##  $ mpg          : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ disp         : num  160 160 108 258 360 ...
##  $ hp           : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat         : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt           : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec         : num  16.5 17 18.6 19.4 17 ...
##  $ cyl          : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ cyl_character: chr  "6 cylinders" "6 cylinders" "4 cylinders" "6 cylinders" ...
##  $ cyl_factor   : Factor w/ 3 levels "6 cylinders",..: 1 1 2 1 3 1 3 2 2 1 ...
##  $ vs           : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ engine       : Factor w/ 2 levels "V-shaped","straight": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am           : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ transmission : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear         : num  4 4 4 3 3 3 3 4 4 4 ...
##  $ gear_factor  : Factor w/ 3 levels "3 forward gears",..: 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb         : num  4 4 1 1 2 1 4 2 2 4 ...
##  $ test_date    : POSIXct, format: "1974-01-05" "1974-01-07" ...
Element  Name           Description
[, 1]    make           Vehicle manufacturer
[, 2]    model          Vehicle model
[, 3]    mpg            Miles/(US) gallon
[, 4]    disp           Displacement (cu.in.)
[, 5]    hp             Gross horsepower
[, 6]    drat           Rear axle ratio
[, 7]    wt             Weight (1000 lbs)
[, 8]    qsec           1/4 mile time
[, 9]    cyl            Number of cylinders
[, 10]   cyl_character  Number of cylinders as a character string
[, 11]   cyl_factor     Number of cylinders as a factor
[, 12]   vs             Engine (0 = V-shaped, 1 = straight)
[, 13]   engine         Same information as vs, as a factor
[, 14]   am             Transmission (0 = automatic, 1 = manual)
[, 15]   transmission   Same information as am, as a factor
[, 16]   gear           Number of forward gears
[, 17]   gear_factor    Number of forward gears as a factor
[, 18]   carb           Number of carburetors
[, 19]   test_date      Arbitrary date, created to approximate when the vehicle would have been assessed

4 pefr

Peak expiratory flow rate (pefr) data are used for examples within the qwraps2 package. The data have been transcribed from Bland and Altman (1986).

The sample comprised colleagues and family of J.M.B. chosen to give a wide range of PEFR but in no way representative of any defined population. Two measurements were made with a Wright peak flow meter and two with a mini Wright meter, in random order. All measurements were taken by J.M.B., using the same two instruments. (These data were collected to demonstrate the statistical method and provide no evidence on the comparability of these two instruments.) We did not repeat suspect readings and took a single reading as our measurement of PEFR. Only the first measurement by each method is used to illustrate the comparison of methods, the second measurements being used in the study of repeatability.

The units of measure for the pefr are liters per minute (L/min).

# copied text from the manuscript; read.table() splits on any whitespace
# (spaces or tabs), whereas read.delim() expects tab-separated values
pefr_table <- read.table(header = FALSE, text = "
1   494 490 512 525
2   395 397 430 415
3   516 512 520 508
4   434 401 428 444
5   476 470 500 500
6   557 611 600 625
7   413 415 364 460
8   442 431 380 390
9   650 638 658 642
10  433 429 445 432
11  417 420 432 420
12  656 633 626 605
13  267 275 260 227
14  478 492 477 467
15  178 165 259 268
16  423 372 350 370
17  427 421 451 443")

Build the data set

pefr <- expand.grid(subject = 1:17,
                    measurement = 1:2,
                    meter = c("Wright peak flow meter", "Mini Wright peak flow meter"),
                    KEEP.OUT.ATTRS = FALSE,
                    stringsAsFactors = FALSE)
pefr$pefr <- do.call(c, pefr_table[, 2:5])
head(pefr)
##   subject measurement                  meter pefr
## 1       1           1 Wright peak flow meter  494
## 2       2           1 Wright peak flow meter  395
## 3       3           1 Wright peak flow meter  516
## 4       4           1 Wright peak flow meter  434
## 5       5           1 Wright peak flow meter  476
## 6       6           1 Wright peak flow meter  557
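Why does `do.call(c, pefr_table[, 2:5])` line up with the rows of expand.grid()? expand.grid() varies its first argument fastest, so the rows run over subjects 1 to 17, then measurement, then meter. That is exactly the order obtained by stacking the four reading columns end to end. The self-contained sketch below rebuilds both objects (using read.table() so that either spaces or tabs may separate the values) and checks the alignment by refolding the long column into a 17 x 4 matrix.

```r
# Transcribed PEFR readings (L/min), one row per subject:
# V1 = subject, V2/V3 = Wright 1st/2nd, V4/V5 = mini Wright 1st/2nd
pefr_table <- read.table(header = FALSE, text = "
1  494 490 512 525
2  395 397 430 415
3  516 512 520 508
4  434 401 428 444
5  476 470 500 500
6  557 611 600 625
7  413 415 364 460
8  442 431 380 390
9  650 638 658 642
10 433 429 445 432
11 417 420 432 420
12 656 633 626 605
13 267 275 260 227
14 478 492 477 467
15 178 165 259 268
16 423 372 350 370
17 427 421 451 443")

# Long format: subject varies fastest, then measurement, then meter
pefr <- expand.grid(subject = 1:17,
                    measurement = 1:2,
                    meter = c("Wright peak flow meter", "Mini Wright peak flow meter"),
                    KEEP.OUT.ATTRS = FALSE,
                    stringsAsFactors = FALSE)

# Stacking V2, V3, V4, V5 end to end matches that row order
pefr$pefr <- do.call(c, pefr_table[, 2:5])

# All four readings for subject 1 (494, 490, 512, 525 in the table)
print(pefr[pefr$subject == 1, ])

# Round trip: refold the long column column-wise and compare with the table
wide <- matrix(pefr$pefr, nrow = 17)
stopifnot(all(wide == as.matrix(pefr_table[, 2:5])))
```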

See vignette("qwraps2-graphics", package = "qwraps2") for examples using this data set, specifically the construction and use of Bland-Altman plots via qblandaltman.
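The statistics behind a Bland-Altman plot are simple to compute directly. The sketch below uses the first reading on each meter, as in the passage quoted above, with values copied from the transcribed table. It takes differences as Wright minus mini Wright (the direction is a choice) and uses the mean difference plus or minus 1.96 SD as approximate 95% limits of agreement. It is an illustration in base R, not a description of how qblandaltman computes its output.

```r
# First reading on each meter for subjects 1..17 (L/min), copied from
# the second and fourth columns of the transcribed table
wright <- c(494, 395, 516, 434, 476, 557, 413, 442, 650,
            433, 417, 656, 267, 478, 178, 423, 427)
mini   <- c(512, 430, 520, 428, 500, 600, 364, 380, 658,
            445, 432, 626, 260, 477, 259, 350, 451)

avg     <- (wright + mini) / 2  # x-axis of a Bland-Altman plot
diff_wm <- wright - mini        # y-axis of a Bland-Altman plot

bias <- mean(diff_wm)                 # mean difference between methods
sd_d <- sd(diff_wm)                   # SD of the differences
loa  <- bias + c(-1.96, 1.96) * sd_d  # approximate 95% limits of agreement

print(round(c(bias = bias, sd = sd_d, lower = loa[1], upper = loa[2]), 1))
```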

5 Spambase

Spambase (Hopkins and Suermondt 1999) is a useful data set for examples needing a binary outcome and several possible predictors. The data set and its documentation are installed with this package; the directory on your machine is given by:

system.file("spambase",package ="qwraps2")
## [1] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/RtmpusG7hx/Rinstd17dcc8f7b1/qwraps2/spambase"

The spambase data set was generated as follows:

nms <- scan(system.file("spambase", "spambase.names", package = "qwraps2")
            , what = character()
            , skip = 33
            , sep = "\n"
            , quiet = TRUE
  )
nms <- sapply(strsplit(nms, split = ":"), getElement, 1)
nms <- c(nms, "spam")

# clean up char_freq names
nms <-
  nms |>
  sub(";", "semicolon", x = _, fixed = TRUE) |>
  sub("(", "parenthesis", x = _, fixed = TRUE) |>
  sub("[", "square_bracket", x = _, fixed = TRUE) |>
  sub("!", "exclamation_point", x = _, fixed = TRUE) |>
  sub("$", "dollar_sign", x = _, fixed = TRUE) |>
  sub("#", "pound", x = _, fixed = TRUE)

spambase <- read.csv(file = system.file("spambase", "spambase.data", package = "qwraps2")
                     , header = FALSE
                     , col.names = nms)
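The sketch below runs the name extraction and cleanup on a few made-up lines that mimic the `name: type` layout of spambase.names (they are illustrative, not copied from the file). It replaces the chain of sub() calls with an equivalent table-driven loop, so it also runs on R versions without the native pipe placeholder `_` (R < 4.2). It also shows one likely reason for the cleanup: read.csv() checks column names with make.names() by default (`check.names = TRUE`), which would turn each special character into a dot.

```r
# Hypothetical lines in the "name: type" layout (illustrative only)
name_lines <- c("word_freq_make:         continuous.",
                "char_freq_;:            continuous.",
                "char_freq_(:            continuous.",
                "char_freq_[:            continuous.",
                "char_freq_!:            continuous.",
                "char_freq_$:            continuous.",
                "char_freq_#:            continuous.")

# Keep the text before the first colon
raw_nms <- sapply(strsplit(name_lines, split = ":"), getElement, 1)

# Table-driven equivalent of the chain of sub(..., fixed = TRUE) calls
replacements <- c(";" = "semicolon", "(" = "parenthesis", "[" = "square_bracket",
                  "!" = "exclamation_point", "$" = "dollar_sign", "#" = "pound")
nms <- raw_nms
for (sym in names(replacements)) {
  nms <- sub(sym, replacements[[sym]], nms, fixed = TRUE)
}
print(nms)

# Without the cleanup, make.names() maps the special characters to "."
print(make.names(raw_nms, unique = TRUE))
```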

There are 4,601 rows of data with 57 predictors for the binary outcome spam.

n_perc(spambase$spam) # count and percent of spam messages
## [1] "1,813 (39.40\\%)"
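As an arithmetic check on the reported figure, 1,813 of 4,601 messages is 39.40%. The hypothetical helper count_percent() below is not qwraps2's n_perc(); it is a base R sketch that formats a 0/1 vector in the same style, without the backslash, which in the output above appears to be a LaTeX escape for the percent sign. The input vector is constructed to have the counts reported above rather than read from the data.

```r
# Hypothetical helper (not part of qwraps2): "count (percent%)" for a 0/1 vector
count_percent <- function(x, digits = 2) {
  n   <- sum(x == 1)
  pct <- formatC(100 * mean(x == 1), format = "f", digits = digits)
  sprintf("%s (%s%%)", format(n, big.mark = ","), pct)
}

# A vector with the same counts as the spam column: 1,813 ones out of 4,601
x <- c(rep(1, 1813), rep(0, 4601 - 1813))
print(count_percent(x))
```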

6 References

Bland, J Martin, and Douglas G Altman. 1986. “Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement.” The Lancet 327 (8476): 307–10.
Hocking, Ronald R. 1976. “A Biometrics Invited Paper. The Analysis and Selection of Variables in Linear Regression.” Biometrics 32 (1): 1–49.
Hopkins, Reeber, Mark, and Jaap Suermondt. 1999. “Spambase.” UCI Machine Learning Repository.

7 Session Info

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.6
##
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] qwraps2_0.6.1
##
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.48
##  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4
## [13] compiler_4.4.1    tools_4.4.1       evaluate_1.0.1    bslib_0.8.0
## [17] Rcpp_1.0.13       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9
