    ## [1] '0.6.1'

The base R package datasets provides the mtcars data set. The information in mtcars is the fuel consumption and automobile characteristics of 32 automobiles as reported in the March, April, June and July 1974 issues of Motor Trend magazine (Hocking 1976).
That dataset is modified and extended to provide support for examples within the qwraps2 package documentation. This vignette documents the construction of mtcars2.
Starting with the original mtcars:
    ## 'data.frame': 32 obs. of 11 variables:
    ## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    ## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
    ## $ disp: num 160 160 108 258 360 ...
    ## $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
    ## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
    ## $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
    ## $ qsec: num 16.5 17 18.6 19.4 17 ...
    ## $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
    ## $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
    ## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
    ## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...

The cyl column provides the number of cylinders in the engine of each automobile. We will use two additional versions of this information, one as a character column and one as a factor. Please note that the order of the factor levels is intentionally set to be non-sequential. This will help to illustrate the ordering of results when using a factor or a character vector as a grouping variable.
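The effect of the level order can be seen in a small sketch on the original mtcars. The two stand-alone vectors below are illustrative only and are not part of mtcars2: a character vector groups in sorted order, while a factor groups in the order of its levels.

```r
# Illustrative only: stand-alone versions of the two grouping variables.
cyl_character <- paste(mtcars$cyl, "cylinders")
cyl_factor <- factor(mtcars$cyl,
                     levels = c(6, 4, 8),
                     labels = paste(c(6, 4, 8), "cylinders"))

# Mean mpg by group. tapply() turns a character vector into a factor
# with sorted levels, so the groups come back in sorted order.
by_character <- tapply(mtcars$mpg, cyl_character, mean)
by_factor <- tapply(mtcars$mpg, cyl_factor, mean)

names(by_character)  # "4 cylinders" "6 cylinders" "8 cylinders"
names(by_factor)     # "6 cylinders" "4 cylinders" "8 cylinders"
```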
    mtcars2 <- mtcars  # mtcars2 starts as a copy of the original mtcars
    mtcars2$cyl_character <- paste(mtcars2$cyl, "cylinders")
    mtcars2$cyl_factor <- factor(mtcars2$cyl,
                                 levels = c(6, 4, 8),
                                 labels = paste(c(6, 4, 8), "cylinders"))

Create other factor variables.
Engine configuration: the vs column is an integer-valued vector indicating whether the engine is V-shaped (0) or straight (1). The constructed column engine provides the same information as a labeled factor.
Transmission: the am column is an integer-valued vector indicating whether the transmission is automatic (0) or manual (1). We construct a transmission column to provide the same information as a factor.
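The code for these factors is not shown above. A minimal sketch that agrees with the levels reported in the str() output further down might look like the following. The labels for four and five forward gears in gear_factor are an assumption patterned on the first level shown in that output, and mtcars2 is rebuilt from mtcars here only to keep the example self-contained.

```r
# Stand-alone copy for this sketch; skip this line if mtcars2 already exists.
mtcars2 <- mtcars

# vs: 0 = V-shaped, 1 = straight
mtcars2$engine <- factor(mtcars2$vs,
                         levels = c(0, 1),
                         labels = c("V-shaped", "straight"))

# am: 0 = automatic, 1 = manual
mtcars2$transmission <- factor(mtcars2$am,
                               levels = c(0, 1),
                               labels = c("Automatic", "Manual"))

# gear: 3, 4 or 5 forward gears (labels for 4 and 5 are assumed)
mtcars2$gear_factor <- factor(mtcars2$gear,
                              levels = 3:5,
                              labels = paste(3:5, "forward gears"))

# Cross-check that each numeric code maps to exactly one label.
table(mtcars2$vs, mtcars2$engine)
table(mtcars2$am, mtcars2$transmission)
```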
The rownames of the mtcars2 data set provide the make and model of the automobiles. Here we will create columns for make and model and then omit the rownames.
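A small illustration of how such a split can work, using a regular expression with two capture groups on a few row names from mtcars. The example_names vector is introduced only for this sketch. Names without a space, such as Valiant, do not match the pattern, so sub() returns them unchanged for both make and model.

```r
# Hypothetical helper vector: three row names taken from mtcars.
example_names <- c("Mazda RX4 Wag", "Hornet 4 Drive", "Valiant")

# First group: leading word (the make). Second group: everything after the
# first whitespace character (the model).
pattern <- "^(\\w+)\\s(.+)"

make <- sub(pattern, "\\1", example_names)
model <- sub(pattern, "\\2", example_names)

data.frame(name = example_names, make = make, model = model)
```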
    mtcars2$make <- sub("^(\\w+)\\s(.+)", "\\1", rownames(mtcars2))
    mtcars2$model <- sub("^(\\w+)\\s(.+)", "\\2", rownames(mtcars2))
    rownames(mtcars2) <- NULL

To have some dates to use in examples we are going to add a mostly arbitrary date column to mtcars2. Given that the data came from the March through July issues of Motor Trend in 1974, we will create a test_date column starting in January 1974 with one to three tests per week through May 1974. This assumes the rows of the data are in chronological order.
    set.seed(42)
    mtcars2$test_date <- as.POSIXct("1974-01-03", tz = "GMT") +
      cumsum(sample(c(2, 3, 4, 7) * 3600 * 24, size = nrow(mtcars2), replace = TRUE))

Lastly we will order the columns of mtcars2 so similar columns are next to each other.
mtcars2 is a data frame with 32 observations of 19 variables. Some of the variables tell us the same information, but in different formats.
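The reordering code itself is not shown here. One common way to do it, sketched on a small stand-in data frame, is to index the columns by name in the desired order. The data frame d and its columns are assumptions for illustration only.

```r
# Small stand-in data frame (hypothetical; not the full mtcars2).
d <- mtcars[, c("mpg", "cyl", "am")]

# Derived columns are appended at the end when they are created.
d$cyl_factor <- factor(d$cyl)
d$transmission <- factor(d$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
names(d)  # "mpg" "cyl" "am" "cyl_factor" "transmission"

# Index by name to place each derived column beside its source column.
d <- d[, c("mpg", "cyl", "cyl_factor", "am", "transmission")]
names(d)  # "mpg" "cyl" "cyl_factor" "am" "transmission"
```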
    ## 'data.frame': 32 obs. of 19 variables:
    ## $ make         : chr "Mazda" "Mazda" "Datsun" "Hornet" ...
    ## $ model        : chr "RX4" "RX4 Wag" "710" "4 Drive" ...
    ## $ mpg          : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    ## $ disp         : num 160 160 108 258 360 ...
    ## $ hp           : num 110 110 93 110 175 105 245 62 95 123 ...
    ## $ drat         : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
    ## $ wt           : num 2.62 2.88 2.32 3.21 3.44 ...
    ## $ qsec         : num 16.5 17 18.6 19.4 17 ...
    ## $ cyl          : num 6 6 4 6 8 6 8 4 4 6 ...
    ## $ cyl_character: chr "6 cylinders" "6 cylinders" "4 cylinders" "6 cylinders" ...
    ## $ cyl_factor   : Factor w/ 3 levels "6 cylinders",..: 1 1 2 1 3 1 3 2 2 1 ...
    ## $ vs           : num 0 0 1 1 0 1 0 1 1 1 ...
    ## $ engine       : Factor w/ 2 levels "V-shaped","straight": 1 1 2 2 1 2 1 2 2 2 ...
    ## $ am           : num 1 1 1 0 0 0 0 0 0 0 ...
    ## $ transmission : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
    ## $ gear         : num 4 4 4 3 3 3 3 4 4 4 ...
    ## $ gear_factor  : Factor w/ 3 levels "3 forward gears",..: 2 2 2 1 1 1 1 2 2 2 ...
    ## $ carb         : num 4 4 1 1 2 1 4 2 2 4 ...
    ## $ test_date    : POSIXct, format: "1974-01-05" "1974-01-07" ...

| Element | Name | Description |
|---|---|---|
| [, 1] | make | Vehicle Manufacturer |
| [, 2] | model | Vehicle model |
| [, 3] | mpg | Miles/(US) gallon |
| [, 4] | disp | Displacement (cu.in.) |
| [, 5] | hp | Gross horsepower |
| [, 6] | drat | Rear axle ratio |
| [, 7] | wt | Weight (1000 lbs) |
| [, 8] | qsec | 1/4 mile time |
| [, 9] | cyl | Number of cylinders |
| [, 10] | cyl_character | Number of cylinders as a character string |
| [, 11] | cyl_factor | Number of cylinders as a factor |
| [, 12] | vs | Engine (0 = V-shaped, 1 = straight) |
| [, 13] | engine | Same information as vs, but as a factor |
| [, 14] | am | Transmission (0 = automatic, 1 = manual) |
| [, 15] | transmission | Same information as am, but as a factor |
| [, 16] | gear | Number of forward gears |
| [, 17] | gear_factor | Number of forward gears as a factor |
| [, 18] | carb | Number of carburetors |
| [, 19] | test_date | Arbitrary date, created to approximate when the vehicle would have been assessed |

Peak expiratory flow rate (pefr) data is used for examples within the qwraps2 package. The data has been transcribed from Bland and Altman (1986).
The sample comprised colleagues and family of J.M.B. chosen to give a wide range of PEFR but in no way representative of any defined population. Two measurements were made with a Wright peak flow meter and two with a mini Wright meter, in random order. All measurements were taken by J.M.B., using the same two instruments. (These data were collected to demonstrate the statistical method and provide no evidence on the comparability of these two instruments.) We did not repeat suspect readings and took a single reading as our measurement of PEFR. Only the first measurement by each method is used to illustrate the comparison of methods, the second measurements being used in the study of repeatability.
The units of measure for the pefr are liters per minute (L/min).
    # copied text from the manuscript
    # sep = "" splits on any whitespace, so the rows below read correctly
    # whether the values are separated by spaces or by tabs
    pefr_table <- read.delim(
      header = FALSE,
      sep = "",
      text = "1 494 490 512 525
    2 395 397 430 415
    3 516 512 520 508
    4 434 401 428 444
    5 476 470 500 500
    6 557 611 600 625
    7 413 415 364 460
    8 442 431 380 390
    9 650 638 658 642
    10 433 429 445 432
    11 417 420 432 420
    12 656 633 626 605
    13 267 275 260 227
    14 478 492 477 467
    15 178 165 259 268
    16 423 372 350 370
    17 427 421 451 443")

Build the data set:
    pefr <- expand.grid(subject = 1:17,
                        measurement = 1:2,
                        meter = c("Wright peak flow meter", "Mini Wright peak flow meter"),
                        KEEP.OUT.ATTRS = FALSE,
                        stringsAsFactors = FALSE)
    pefr$pefr <- do.call(c, pefr_table[, 2:5])
    head(pefr)
    ##   subject measurement                  meter pefr
    ## 1       1           1 Wright peak flow meter  494
    ## 2       2           1 Wright peak flow meter  395
    ## 3       3           1 Wright peak flow meter  516
    ## 4       4           1 Wright peak flow meter  434
    ## 5       5           1 Wright peak flow meter  476
    ## 6       6           1 Wright peak flow meter  557

See vignette("qwraps2-graphics", package = "qwraps2") for examples using this data set, specifically in the construction and use of Bland-Altman plots via qblandaltman.
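As a rough sketch of the quantities behind a Bland-Altman comparison, the first measurement from each meter can be compared directly in base R. The two vectors below repeat the first-measurement columns of the table above. The variable names and the use of 1.96 standard deviations for the limits of agreement are assumptions for illustration and are not taken from qblandaltman.

```r
# First measurement per subject, copied from the table above.
wright <- c(494, 395, 516, 434, 476, 557, 413, 442, 650,
            433, 417, 656, 267, 478, 178, 423, 427)
mini <- c(512, 430, 520, 428, 500, 600, 364, 380, 658,
          445, 432, 626, 260, 477, 259, 350, 451)

# A Bland-Altman plot shows the per-subject difference against the average.
difference <- wright - mini
average <- (wright + mini) / 2

mean_diff <- mean(difference)  # about -2.1 L/min
sd_diff <- sd(difference)

# Assumed convention: limits of agreement at mean +/- 1.96 standard deviations.
limits <- mean_diff + c(-1.96, 1.96) * sd_diff

round(c(mean_difference = mean_diff, lower = limits[1], upper = limits[2]), 1)
```

Plotting average on the horizontal axis against difference on the vertical axis, with horizontal lines at mean_diff and the two limits, gives the basic form of the plot.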
Spambase (Hopkins and Suermondt 1999) is a useful data set for examples needing a binary outcome and several possible predictors. The data set and documentation can be found in this package in the directory on your machine at:
    ## [1] "/private/var/folders/fc/3hxyq4z94jx_7jr506b8ttlm0000gq/T/RtmpusG7hx/Rinstd17dcc8f7b1/qwraps2/spambase"

The data set spambase was generated thusly:
    nms <- scan(system.file("spambase", "spambase.names", package = "qwraps2")
                , what = character()
                , skip = 33
                , sep = "\n"
                , quiet = TRUE
                )
    nms <- sapply(strsplit(nms, split = ":"), getElement, 1)
    nms <- c(nms, "spam")

    # clean up char_freq names
    nms <- nms |>
      sub(";", "semicolon", x = _, fixed = TRUE) |>
      sub("(", "parenthesis", x = _, fixed = TRUE) |>
      sub("[", "square_bracket", x = _, fixed = TRUE) |>
      sub("!", "exclamation_point", x = _, fixed = TRUE) |>
      sub("$", "dollar_sign", x = _, fixed = TRUE) |>
      sub("#", "pound", x = _, fixed = TRUE)

    spambase <- read.csv(file = system.file("spambase", "spambase.data", package = "qwraps2")
                         , header = FALSE
                         , col.names = nms)

There are 4,601 rows of data with 57 predictors for the binary outcome spam.
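The name clean-up relies on fixed = TRUE so that characters such as "(", "[" and "$" are treated literally rather than as regular expression syntax. A stand-alone sketch of that step follows. The example_names vector is an assumption meant to resemble the char_freq entries in spambase.names, and the loop is simply an alternative to the pipe chain that also runs on versions of R without the native pipe placeholder.

```r
# Hypothetical names resembling the char_freq entries being cleaned.
example_names <- c("char_freq_;", "char_freq_(", "char_freq_[",
                   "char_freq_!", "char_freq_$", "char_freq_#")

# Literal character -> replacement text, mirroring the substitutions above.
replacements <- c(";" = "semicolon",
                  "(" = "parenthesis",
                  "[" = "square_bracket",
                  "!" = "exclamation_point",
                  "$" = "dollar_sign",
                  "#" = "pound")

cleaned <- example_names
for (ch in names(replacements)) {
  # fixed = TRUE matches the character literally
  cleaned <- sub(ch, replacements[[ch]], cleaned, fixed = TRUE)
}
cleaned

# Without fixed = TRUE, "$" is the end-of-string anchor, so the replacement
# is appended and the literal "$" is left in place.
sub("$", "dollar_sign", "char_freq_$")
```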
The count and percentage of rows flagged as spam:

    ## [1] "1,813 (39.40\\%)"

Session information:

    ## R version 4.4.1 (2024-06-14)
    ## Platform: x86_64-apple-darwin20
    ## Running under: macOS Sonoma 14.6
    ##
    ## Matrix products: default
    ## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
    ## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
    ##
    ## locale:
    ## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    ##
    ## time zone: America/Denver
    ## tzcode source: internal
    ##
    ## attached base packages:
    ## [1] stats graphics grDevices utils datasets methods base
    ##
    ## other attached packages:
    ## [1] qwraps2_0.6.1
    ##
    ## loaded via a namespace (and not attached):
    ## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.48
    ## [5] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28
    ## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
    ## [13] compiler_4.4.1 tools_4.4.1 evaluate_1.0.1 bslib_0.8.0
    ## [17] Rcpp_1.0.13 yaml_2.3.10 rlang_1.1.4 jsonlite_1.8.9