| Priority: | recommended |
| Version: | 7.3-65 |
| Date: | 2025-02-19 |
| Revision: | $Rev: 3681 $ |
| Depends: | R (≥ 4.4.0), grDevices, graphics, stats, utils |
| Imports: | methods |
| Suggests: | lattice, nlme, nnet, survival |
| Description: | Functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002). |
| Title: | Support Functions and Datasets for Venables and Ripley's MASS |
| LazyData: | yes |
| ByteCompile: | yes |
| License: | GPL-2 |GPL-3 |
| URL: | http://www.stats.ox.ac.uk/pub/MASS4/ |
| Contact: | <MASS@stats.ox.ac.uk> |
| NeedsCompilation: | yes |
| Packaged: | 2025-02-19 08:49:43 UTC; ripley |
| Author: | Brian Ripley [aut, cre, cph], Bill Venables [aut, cph], Douglas M. Bates [ctb], Kurt Hornik [trl] (partial port ca 1998), Albrecht Gebhardt [trl] (partial port ca 1998), David Firth [ctb] (support functions for polr) |
| Maintainer: | Brian Ripley <Brian.Ripley@R-project.org> |
| Repository: | CRAN |
| Date/Publication: | 2025-02-28 17:44:52 UTC |
Australian AIDS Survival Data
Description
Data on patients diagnosed with AIDS in Australia before 1 July 1991.
Usage
Aids2Format
This data frame contains 2843 rows and the following columns:
stateGrouped state of origin:
"NSW "includes ACT and"other"is WA, SA, NT and TAS.sexSex of patient.
diag(Julian) date of diagnosis.
death(Julian) date of death or end of observation.
status"A"(alive) or"D"(dead) at end of observation.T.categReported transmission category.
ageAge (years) at diagnosis.
Note
This data set has been slightly jittered as acondition of its release, to ensure patient confidentiality.
Source
Dr P. J. Solomon and the Australian National Centre in HIV Epidemiologyand Clinical Research.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Brain and Body Weights for 28 Species
Description
Average brain and body weights for 28 species of land animals.
Usage
AnimalsFormat
bodybody weight in kg.
brainbrain weight in g.
Note
The nameAnimals avoided conflicts with a system datasetanimals in S-PLUS 4.5 and later.
Source
P. J. Rousseeuw and A. M. Leroy (1987)Robust Regression and Outlier Detection.Wiley, p. 57.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Belgium Phone Calls 1950-1973
Description
A list object with the annual numbers of telephone calls, inBelgium. The components are:
yearlast two digits of the year.
callsnumber of telephone calls made (in millions of calls).
Usage
phonesSource
P. J. Rousseeuw and A. M. Leroy (1987)Robust Regression & Outlier Detection. Wiley.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Housing Values in Suburbs of Boston
Description
TheBoston data frame has 506 rows and 14 columns.
Usage
BostonFormat
This data frame contains the following columns:
crimper capita crime rate by town.
znproportion of residential land zoned for lots over 25,000 sq.ft.
indusproportion of non-retail business acres per town.
chasCharles River dummy variable (= 1 if tract bounds river; 0 otherwise).
noxnitrogen oxides concentration (parts per 10 million).
rmaverage number of rooms per dwelling.
ageproportion of owner-occupied units built prior to 1940.
disweighted mean of distances to five Boston employment centres.
radindex of accessibility to radial highways.
taxfull-value property-tax rate per $10,000.
ptratiopupil-teacher ratio by town.
black1000(Bk - 0.63)^2whereBkis the proportion of blacksby town.lstatlower status of the population (percent).
medvmedian value of owner-occupied homes in $1000s.
Source
Harrison, D. and Rubinfeld, D.L. (1978)Hedonic prices and the demand for clean air.J. Environ. Economics and Management5, 81–102.
Belsley D.A., Kuh, E. and Welsch, R.E. (1980)Regression Diagnostics. Identifying Influential Data and Sourcesof Collinearity.New York: Wiley.
Data from 93 Cars on Sale in the USA in 1993
Description
TheCars93 data frame has 93 rows and 27 columns.
Usage
Cars93Format
This data frame contains the following columns:
ManufacturerManufacturer.
ModelModel.
TypeType: a factor with levels
"Small","Sporty","Compact","Midsize","Large"and"Van".Min.PriceMinimum Price (in $1,000): price for a basic version.
PriceMidrange Price (in $1,000): average of
Min.PriceandMax.Price.Max.PriceMaximum Price (in $1,000): price for “a premium version”.
MPG.cityCity MPG (miles per US gallon by EPA rating).
MPG.highwayHighway MPG.
AirBagsAir Bags standard. Factor: none, driver only, or driver & passenger.
DriveTrainDrive train type: rear wheel, front wheel or 4WD; (factor).
CylindersNumber of cylinders (missing for Mazda RX-7, which has a rotary engine).
EngineSizeEngine size (litres).
HorsepowerHorsepower (maximum).
RPMRPM (revs per minute at maximum horsepower).
Rev.per.mileEngine revolutions per mile (in highest gear).
Man.trans.availIs a manual transmission version available? (yes or no, Factor).
Fuel.tank.capacityFuel tank capacity (US gallons).
PassengersPassenger capacity (persons)
LengthLength (inches).
WheelbaseWheelbase (inches).
WidthWidth (inches).
Turn.circleU-turn space (feet).
Rear.seat.roomRear seat room (inches) (missing for 2-seater vehicles).
Luggage.roomLuggage capacity (cubic feet) (missing for vans).
WeightWeight (pounds).
OriginOf non-USA or USA company origins? (factor).
MakeCombination of Manufacturer and Model (character).
Details
Cars were selected at random from among 1993 passenger car models thatwere listed in both theConsumer Reports issue and thePACE Buying Guide. Pickup trucks and Sport/Utility vehicles wereeliminated due to incomplete information in theConsumer Reportssource. Duplicate models (e.g., Dodge Shadow and Plymouth Sundance)were listed at most once.
Further description can be found in Lock (1993).
Source
Lock, R. H. (1993)1993 New Car Data.Journal of Statistics Education1(1).doi:10.1080/10691898.1993.11910459
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Diagnostic Tests on Patients with Cushing's Syndrome
Description
Cushing's syndrome is a hypertensive disorder associated withover-secretion of cortisol by the adrenal gland. The observationsare urinary excretion rates of two steroid metabolites.
Usage
CushingsFormat
TheCushings data frame has 27 rows and 3 columns:
Tetrahydrocortisoneurinary excretion rate (mg/24hr) of Tetrahydrocortisone.
Pregnanetriolurinary excretion rate (mg/24hr) of Pregnanetriol.
Typeunderlying type of syndrome, coded
a(adenoma) ,b(bilateral hyperplasia),c(carcinoma) orufor unknown.
Source
J. Aitchison and I. R. Dunsmore (1975)Statistical Prediction Analysis.Cambridge University Press, Tables 11.1–3.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
DDT in Kale
Description
A numeric vector of 15 measurements by different laboratories of thepesticide DDT in kale, in ppm (parts per million) using the multiplepesticide residue measurement.
Usage
DDTSource
C. E. Finsterwalder (1976)Collaborative study of an extension of the Millset almethod for the determination of pesticide residues in food.J. Off. Anal. Chem.59, 169–171
R. G. Staudte and S. J. Sheather (1990)Robust Estimation and Testing.Wiley
Level of GAG in Urine of Children
Description
Data were collected on the concentration of a chemical GAG in theurine of 314 children aged from zero to seventeen years. The aim ofthe study was to produce a chart to help a paediatrican to assess if achild's GAG concentration is ‘normal’.
Usage
GAGurineFormat
This data frame contains the following columns:
Ageage of child in years.
GAGconcentration of GAG (the units have been lost).
Source
Mrs Susan Prosser, Paediatrics Department, University of Oxford,via Department of Statistics Consulting Service.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Numbers of Car Insurance claims
Description
The data given in data frameInsurance consist of thenumbers of policyholders of an insurance company who wereexposed to risk, and the numbers of car insurance claims made bythose policyholders in the third quarter of 1973.
Usage
InsuranceFormat
This data frame contains the following columns:
Districtfactor: district of residence of policyholder (1 to 4): 4 is major cities.
Groupan ordered factor: group of car with levels <1 litre, 1–1.5 litre,1.5–2 litre, >2 litre.
Agean ordered factor: the age of the insured in 4 groups labelled<25, 25–29, 30–35, >35.
Holdersnumbers of policyholders.
Claimsnumbers of claims
Source
L. A. Baxter, S. M. Coutts and G. A. F. Ross (1980) Applications oflinear models in motor insurance.Proceedings of the 21st International Congress of Actuaries, Zurichpp. 11–29.
M. Aitkin, D. Anderson, B. Francis and J. Hinde (1989)Statistical Modelling in GLIM.Oxford University Press.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Examples
## main-effects fit as Poisson GLM with offsetglm(Claims ~ District + Group + Age + offset(log(Holders)), data = Insurance, family = poisson)# same via loglmloglm(Claims ~ District + Group + Age + offset(log(Holders)), data = Insurance)Internal MASS functions
Description
Internal MASS functions.
Usage
enlist(vec)fbeta(x, alpha, beta)frequency.polygon(x, nclass = nclass.freq(x), xlab="", ylab="", ...)nclass.freq(x)neg.bin(theta = stop("'theta' must be given"))negexp.SSival(mCall, data, LHS)Details
These are not intended to be called by the user.Some are for compatibilitywith earlier versions of MASS (the book).
Survival from Malignant Melanoma
Description
TheMelanoma data frame has data on 205 patients in Denmarkwith malignant melanoma.
Usage
MelanomaFormat
This data frame contains the following columns:
timesurvival time in days, possibly censored.
status1died from melanoma,2alive,3dead fromother causes.sex1= male,0= female.ageage in years.
yearof operation.
thicknesstumour thickness in mm.
ulcer1= presence,0= absence.
Source
P. K. Andersen, O. Borgan, R. D. Gill and N. Keiding (1993)Statistical Models based on Counting Processes.Springer.
Null Spaces of Matrices
Description
Given a matrix,M, find a matrixN giving a basis for the(left) null space. That iscrossprod(N, M) = t(N) %*% Mis an all-zero matrix andN has the maximum number of linearlyindependent columns.
Usage
Null(M)Arguments
M | Input matrix. A vector is coerced to a 1-column matrix. |
Details
For a basis for the (right) null space\{x : Mx = 0\},useNull(t(M)).
Value
The matrixN with the basis for the (left) null space, or amatrix with zero columns if the matrixM is square and ofmaximal rank.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
# The function is currently defined asfunction(M){ tmp <- qr(M) set <- if(tmp$rank == 0L) seq_len(ncol(M)) else -seq_len(tmp$rank) qr.Q(tmp, complete = TRUE)[, set, drop = FALSE]}Tests of Auditory Perception in Children with OME
Description
Experiments were performed on children on their ability todifferentiate a signal in broad-band noise. The noise was played froma pair of speakers and a signal was added to just one channel; thesubject had to turn his/her head to the channel with the added signal.The signal was either coherent (the amplitude of the noise wasincreased for a period) or incoherent (independent noise was added forthe same period to form the same increase in power).
The threshold used in the original analysis was the stimulus loudnessneeds to get 75% correct responses. Some of the children hadsuffered from otitis media with effusion (OME).
Usage
OMEFormat
TheOME data frame has 1129 rows and 7 columns:
IDSubject ID (1 to 99, with some IDs missing). A few subjects weremeasured at different ages.
OME"low"or"high"or"N/A"(at ages other than30 and 60 months).AgeAge of the subject (months).
LoudLoudness of stimulus, in decibels.
NoiseWhether the signal in the stimulus was
"coherent"or"incoherent".CorrectNumber of correct responses from
Trialstrials.TrialsNumber of trials performed.
Background
The experiment was to study otitis media with effusion (OME), a verycommon childhood condition where the middle ear space, which isnormally air-filled, becomes congested by a fluid. There is aconcomitant fluctuating, conductive hearing loss which can result invarious language, cognitive and social deficits. The term ‘binauralhearing’ is used to describe the listening conditions in which thebrain is processing information from both ears at the same time. Thebrain computes differences in the intensity and/or timing of signalsarriving at each ear which contributes to sound localisation and alsoto our ability to hear in background noise.
Some years ago, it was found that children of 7–8 years with a historyof significant OME had significantly worse binaural hearing thanchildren without such a history, despite having equivalentsensitivity. The question remained as to whether it was the timing,the duration, or the degree of severity of the otitis media episodesduring critical periods, which affected later binaural hearing. In anattempt to begin to answer this question, 95 children were monitored forthe presence of effusion every month since birth. On the basis of OMEexperience in their first two years, the test population was splitinto one group of high OME prevalence and one of low prevalence.
Source
Sarah Hogan, Dept of Physiology, University of Oxford, viaDept of Statistics Consulting Service
Examples
# Fit logistic curve from p = 0.5 to p = 1.0fp1 <- deriv(~ 0.5 + 0.5/(1 + exp(-(x-L75)/scal)), c("L75", "scal"), function(x,L75,scal)NULL)nls(Correct/Trials ~ fp1(Loud, L75, scal), data = OME, start = c(L75=45, scal=3))nls(Correct/Trials ~ fp1(Loud, L75, scal), data = OME[OME$Noise == "coherent",], start=c(L75=45, scal=3))nls(Correct/Trials ~ fp1(Loud, L75, scal), data = OME[OME$Noise == "incoherent",], start = c(L75=45, scal=3))# individual fits for each experimentaa <- factor(OME$Age)ab <- 10*OME$ID + unclass(aa)ac <- unclass(factor(ab))OME$UID <- as.vector(ac)OME$UIDn <- OME$UID + 0.1*(OME$Noise == "incoherent")rm(aa, ab, ac)OMEi <- OMElibrary(nlme)fp2 <- deriv(~ 0.5 + 0.5/(1 + exp(-(x-L75)/2)), "L75", function(x,L75) NULL)dec <- getOption("OutDec")options(show.error.messages = FALSE, OutDec=".")OMEi.nls <- nlsList(Correct/Trials ~ fp2(Loud, L75) | UIDn, data = OMEi, start = list(L75=45), control = list(maxiter=100))options(show.error.messages = TRUE, OutDec=dec)tmp <- sapply(OMEi.nls, function(X) {if(is.null(X)) NA else as.vector(coef(X))})OMEif <- data.frame(UID = round(as.numeric((names(tmp)))), Noise = rep(c("coherent", "incoherent"), 110), L75 = as.vector(tmp), stringsAsFactors = TRUE)OMEif$Age <- OME$Age[match(OMEif$UID, OME$UID)]OMEif$OME <- OME$OME[match(OMEif$UID, OME$UID)]OMEif <- OMEif[OMEif$L75 > 30,]summary(lm(L75 ~ Noise/Age, data = OMEif, na.action = na.omit))summary(lm(L75 ~ Noise/(Age + OME), data = OMEif, subset = (Age >= 30 & Age <= 60), na.action = na.omit), correlation = FALSE)# Or fit by weighted least squaresfpl75 <- deriv(~ sqrt(n)*(r/n - 0.5 - 0.5/(1 + exp(-(x-L75)/scal))), c("L75", "scal"), function(r,n,x,L75,scal) NULL)nls(0 ~ fpl75(Correct, Trials, Loud, L75, scal), data = OME[OME$Noise == "coherent",], start = c(L75=45, scal=3))nls(0 ~ fpl75(Correct, Trials, Loud, L75, scal), data = OME[OME$Noise == "incoherent",], start = c(L75=45, scal=3))# Test to see if the curves shift with agefpl75age <- deriv(~sqrt(n)*(r/n - 0.5 - 0.5/(1 + exp(-(x-L75-slope*age)/scal))), c("L75", "slope", "scal"), function(r,n,x,age,L75,slope,scal) NULL)OME.nls1 <-nls(0 ~ fpl75age(Correct, Trials, Loud, Age, L75, slope, scal), data = OME[OME$Noise == "coherent",], start = c(L75=45, slope=0, scal=2))sqrt(diag(vcov(OME.nls1)))OME.nls2 <-nls(0 ~ fpl75age(Correct, Trials, Loud, Age, L75, slope, scal), data = OME[OME$Noise == "incoherent",], start = c(L75=45, slope=0, scal=2))sqrt(diag(vcov(OME.nls2)))# Now allow random effects by using NLMEOMEf <- OME[rep(1:nrow(OME), OME$Trials),]OMEf$Resp <- with(OME, rep(rep(c(1,0), length(Trials)), t(cbind(Correct, Trials-Correct))))OMEf <- OMEf[, -match(c("Correct", "Trials"), names(OMEf))]## Not run: ## these fail in R on most platformsfp2 <- deriv(~ 0.5 + 0.5/(1 + exp(-(x-L75)/exp(lsc))), c("L75", "lsc"), function(x, L75, lsc) NULL)try(summary(nlme(Resp ~ fp2(Loud, L75, lsc), fixed = list(L75 ~ Age, lsc ~ 1), random = L75 + lsc ~ 1 | UID, data = OMEf[OMEf$Noise == "coherent",], method = "ML", start = list(fixed=c(L75=c(48.7, -0.03), lsc=0.24)), verbose = TRUE)))try(summary(nlme(Resp ~ fp2(Loud, L75, lsc), fixed = list(L75 ~ Age, lsc ~ 1), random = L75 + lsc ~ 1 | UID, data = OMEf[OMEf$Noise == "incoherent",], method = "ML", start = list(fixed=c(L75=c(41.5, -0.1), lsc=0)), verbose = TRUE)))## End(Not run)Diabetes in Pima Indian Women
Description
A population of women who were at least 21 years old, of Pima Indian heritageand living near Phoenix, Arizona, was tested for diabetesaccording to World Health Organization criteria. The datawere collected by the US National Institute of Diabetes and Digestive andKidney Diseases. We used the 532 complete records after dropping the(mainly missing) data on serum insulin.
Usage
Pima.trPima.tr2Pima.teFormat
These data frames contains the following columns:
npregnumber of pregnancies.
gluplasma glucose concentration in an oral glucose tolerance test.
bpdiastolic blood pressure (mm Hg).
skintriceps skin fold thickness (mm).
bmibody mass index (weight in kg/(height in m)
^2).peddiabetes pedigree function.
ageage in years.
typeYesorNo, for diabetic according to WHO criteria.
Details
The training setPima.tr contains a randomly selected set of 200subjects, andPima.te contains the remaining 332 subjects.Pima.tr2 containsPima.tr plus 100 subjects withmissing values in the explanatory variables.
Source
Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C.and Johannes, R. S. (1988)Using the ADAP learning algorithm to forecast the onset ofdiabetes mellitus.InProceedings of the Symposium on Computer Applications inMedical Care (Washington, 1988), ed. R. A. Greenes,pp. 261–265. Los Alamitos, CA: IEEE Computer Society Press.
Ripley, B.D. (1996)Pattern Recognition and Neural Networks.Cambridge: Cambridge University Press.
Blood Pressure in Rabbits
Description
Five rabbits were studied on two occasions, after treatment withsaline (control) and after treatment with the5-HT_3 antagonist MDL72222. After each treatment ascending doses of phenylbiguanide wereinjected intravenously at 10 minute intervals and the responses ofmean blood pressure measured. The goal was to test whether thecardiogenic chemoreflex elicited by phenylbiguanide depends on theactivation of5-HT_3 receptors.
Usage
RabbitFormat
This data frame contains 60 rows and the following variables:
BPchangechange in blood pressure relative to the start of the experiment.
Dosedose of Phenylbiguanide in micrograms.
Runlabel of run (
"C1"to"C5", then"M1"to"M5").Treatmentplacebo or the
5-HT_3antagonist MDL 72222.Animallabel of animal used (
"R1"to"R5").
Source
J. Ludbrook (1994)Repeated measurements and multiple comparisons in cardiovascular research.Cardiovascular Research28, 303–311.
[The numerical data are not in the paper but were supplied byProfessor Ludbrook]
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Accelerated Testing of Tyre Rubber
Description
Data frame from accelerated testing of tyre rubber.
Usage
RubberFormat
lossthe abrasion loss in gm/hr.
hardthe hardness in Shore units.
tenstensile strength in kg/sq m.
Source
O.L. Davies (1947)Statistical Methods in Research and Production.Oliver and Boyd, Table 6.1 p. 119.
O.L. Davies and P.L. Goldsmith (1972)Statistical Methods in Research and Production.4th edition, Longmans, Table 8.1 p. 239.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Returns of the Standard and Poors 500
Description
Returns of the Standard and Poors 500 Index in the 1990's
Usage
SP500Format
A vector of returns of the Standard and Poors 500 index for allthe trading days in 1990, 1991, ..., 1999.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Growth Curves for Sitka Spruce Trees in 1988
Description
TheSitka data frame has 395 rows and 4 columns. It gives repeatedmeasurements on the log-size of 79 Sitka spruce trees, 54 of whichwere grown in ozone-enriched chambers and 25 were controls. The sizewas measured five times in 1988, at roughly monthly intervals.
Usage
SitkaFormat
This data frame contains the following columns:
sizemeasured size (height times diameter squared) oftree, on log scale.
Timetime of measurement in days since 1 January 1988.
treenumber of tree.
treateither
"ozone"for an ozone-enrichedchamber or"control".
Source
P. J. Diggle, K.-Y. Liang and S. L. Zeger (1994)Analysis of Longitudinal Data.Clarendon Press, Oxford
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Growth Curves for Sitka Spruce Trees in 1989
Description
TheSitka89 data frame has 632 rows and 4 columns. It gives repeatedmeasurements on the log-size of 79 Sitka spruce trees, 54 of whichwere grown in ozone-enriched chambers and 25 were controls. The sizewas measured eight times in 1989, at roughly monthly intervals.
Usage
Sitka89Format
This data frame contains the following columns:
sizemeasured size (height times diameter squared) oftree, on log scale.
Timetime of measurement in days since 1 January 1988.
treenumber of tree.
treateither
"ozone"for an ozone-enrichedchamber or"control".
Source
P. J. Diggle, K.-Y. Liang and S. L. Zeger (1994)Analysis of Longitudinal Data.Clarendon Press, Oxford
See Also
AFM Compositions of Aphyric Skye Lavas
Description
TheSkye data frame has 23 rows and 3 columns.
Usage
SkyeFormat
This data frame contains the following columns:
APercentage of sodium and potassium oxides.
FPercentage of iron oxide.
MPercentage of magnesium oxide.
Source
R. N. Thompson, J. Esson and A. C. Duncan (1972)Major element chemical variation in the Eocene lavas of the Isle ofSkye.J. Petrology,13, 219–253.
References
J. Aitchison (1986)The Statistical Analysis of Compositional Data.Chapman and Hall, p.360.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
# ternary() is from the on-line answers.ternary <- function(X, pch = par("pch"), lcex = 1, add = FALSE, ord = 1:3, ...){ X <- as.matrix(X) if(any(X < 0)) stop("X must be non-negative") s <- drop(X %*% rep(1, ncol(X))) if(any(s<=0)) stop("each row of X must have a positive sum") if(max(abs(s-1)) > 1e-6) { warning("row(s) of X will be rescaled") X <- X / s } X <- X[, ord] s3 <- sqrt(1/3) if(!add) { oldpty <- par("pty") on.exit(par(pty=oldpty)) par(pty="s") plot(c(-s3, s3), c(0.5-s3, 0.5+s3), type="n", axes=FALSE, xlab="", ylab="") polygon(c(0, -s3, s3), c(1, 0, 0), density=0) lab <- NULL if(!is.null(dn <- dimnames(X))) lab <- dn[[2]] if(length(lab) < 3) lab <- as.character(1:3) eps <- 0.05 * lcex text(c(0, s3+eps*0.7, -s3-eps*0.7), c(1+eps, -0.1*eps, -0.1*eps), lab, cex=lcex) } points((X[,2] - X[,3])*s3, X[,1], ...)}ternary(Skye/100, ord=c(1,3,2))Effect of Swedish Speed Limits on Accidents
Description
An experiment was performed in Sweden in 1961–2 to assess theeffect of a speed limit on the motorway accident rate. Theexperiment was conducted on 92 days in each year, matched so thatdayj in 1962 was comparable to dayj in 1961. On some daysthe speed limit was in effect and enforced, while on other daysthere was no speed limit and cars tended to be driven faster.The speed limit days tended to be in contiguous blocks.
Usage
TrafficFormat
This data frame contains the following columns:
year1961 or 1962.
dayof year.
limitwas there a speed limit?
ytraffic accident count for that day.
Source
Svensson, A. (1981)On the goodness-of-fit test for the multiplicative Poisson model.Annals of Statistics,9, 697–704.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Nutritional and Marketing Information on US Cereals
Description
TheUScereal data frame has 65 rows and 11 columns.The data come from the 1993 ASA Statistical Graphics Exposition,and are taken from the mandatory F&DA food label. The data have beennormalized here to a portion of one American cup.
Usage
UScerealFormat
This data frame contains the following columns:
mfrManufacturer, represented by its first initial: G=General Mills,K=Kelloggs, N=Nabisco, P=Post, Q=Quaker Oats, R=Ralston Purina.
caloriesnumber of calories in one portion.
proteingrams of protein in one portion.
fatgrams of fat in one portion.
sodiummilligrams of sodium in one portion.
fibregrams of dietary fibre in one portion.
carbograms of complex carbohydrates in one portion.
sugarsgrams of sugars in one portion.
shelfdisplay shelf (1, 2, or 3, counting from the floor).
potassiumgrams of potassium.
vitaminsvitamins and minerals (none, enriched, or 100%).
Source
The original data are available athttps://lib.stat.cmu.edu/datasets/1993.expo/.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
The Effect of Punishment Regimes on Crime Rates
Description
Criminologists are interested in the effect of punishment regimes oncrime rates. This has been studied using aggregate data on 47 statesof the USA for 1960 given in this data frame. The variables seem tohave been re-scaled to convenient numbers.
Usage
UScrimeFormat
This data frame contains the following columns:
Mpercentage of males aged 14–24.
Soindicator variable for a Southern state.
Edmean years of schooling.
Po1police expenditure in 1960.
Po2police expenditure in 1959.
LFlabour force participation rate.
M.Fnumber of males per 1000 females.
Popstate population.
NWnumber of non-whites per 1000 people.
U1unemployment rate of urban males 14–24.
U2unemployment rate of urban males 35–39.
GDPgross domestic product per head.
Ineqincome inequality.
Probprobability of imprisonment.
Timeaverage time served in state prisons.
yrate of crimes in a particular category per head of population.
Source
Ehrlich, I. (1973) Participation in illegitimate activities: atheoretical and empirical investigation.Journal of Political Economy,81, 521–565.
Vandaele, W. (1978) Participation in illegitimate activities: Ehrlichrevisited. InDeterrence and Incapacitation,eds A. Blumstein, J. Cohen and D. Nagin, pp. 270–335.US National Academy of Sciences.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Veteran's Administration Lung Cancer Trial
Description
Veteran's Administration lung cancer trial from Kalbfleisch & Prentice.
Usage
VAFormat
A data frame with columns:
stimesurvival or follow-up time in days.
statusdead or censored.
treattreatment: standard or test.
agepatient's age in years.
KarnKarnofsky score of patient's performance on a scale of 0 to 100.
diag.timetimes since diagnosis in months at entry to trial.
cellone of four cell types.
priorprior therapy?
Source
Kalbfleisch, J.D. and Prentice R.L. (1980)The Statistical Analysis of Failure Time Data.Wiley.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Determinations of Nickel Content
Description
A numeric vector of 31 determinations of nickel content (ppm) ina Canadian syenite rock.
Usage
abbeySource
S. Abbey (1988)Geostandards Newsletter12, 241.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Accidental Deaths in the US 1973-1978
Description
A regular time series giving the monthly totals of accidentaldeaths in the USA.
Usage
accdeathsDetails
The values for first six months of 1979 (p. 326) were7798 7406 8363 8460 9217 9316.
Source
P. J. Brockwell and R. A. Davis (1991)Time Series: Theory and Methods.Springer, New York.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Try All One-Term Additions to a Model
Description
Try fitting all models that differ from the current model by adding asingle term from those supplied, maintaining marginality.
This function is generic; there exist methods for classeslm andglm and the default method will work for many other classes.
Usage
addterm(object, ...)## Default S3 method:addterm(object, scope, scale = 0, test = c("none", "Chisq"), k = 2, sorted = FALSE, trace = FALSE, ...)## S3 method for class 'lm'addterm(object, scope, scale = 0, test = c("none", "Chisq", "F"), k = 2, sorted = FALSE, ...)## S3 method for class 'glm'addterm(object, scope, scale = 0, test = c("none", "Chisq", "F"), k = 2, sorted = FALSE, trace = FALSE, ...)Arguments
object | An object fitted by some model-fitting function. |
scope | a formula specifying a maximal model which should include the currentone. All additional terms in the maximal model with all marginal termsin the original model are tried. |
scale | used in the definition of the AIC statistic for selecting the models,currently only for |
test | should the results include a test statistic relative to the originalmodel? The F test is only appropriate for |
k | the multiple of the number of degrees of freedom used for the penalty.Only |
sorted | should the results be sorted on the value of AIC? |
trace | if |
... | arguments passed to or from other methods. |
Details
The definition of AIC is only up to an additive constant: whenappropriate (lm models with specified scale) the constant is takento be that used in Mallows' Cp statistic and the results are labelledaccordingly.
Value
A table of class"anova" containing at least columns for the changein degrees of freedom and AIC (or Cp) for the models. Some methodswill give further information, for example sums of squares, deviances,log-likelihoods and test statistics.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)quine.lo <- aov(log(Days+2.5) ~ 1, quine)addterm(quine.lo, quine.hi, test="F")house.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family=poisson, data=housing)addterm(house.glm0, ~. + Sat:(Infl+Type+Cont), test="Chisq")house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))addterm(house.glm1, ~. + Sat:(Infl+Type+Cont)^2, test = "Chisq")Anorexia Data on Weight Change
Description
Theanorexia data frame has 72 rows and 3 columns.Weight change data for young female anorexia patients.
Usage
anorexiaFormat
This data frame contains the following columns:
TreatFactor of three levels:
"Cont"(control),"CBT"(Cognitive Behavioural treatment) and"FT"(familytreatment).PrewtWeight of patient before study period, in lbs.
PostwtWeight of patient after study period, in lbs.
Source
Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993)A Handbook of Small Data Sets.Chapman & Hall, Data set 285 (p. 229)
(Note that the original source mistakenly says that weights are in kg.)
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Likelihood Ratio Tests for Negative Binomial GLMs
Description
Method function to perform sequential likelihood ratio tests for NegativeBinomial generalized linear models.
Usage
## S3 method for class 'negbin'anova(object, ..., test = "Chisq")Arguments
object | Fitted model object of class |
... | Zero or more additional fitted model objects of class |
test | Argument to match the |
Details
This function is a method for the generic functionanova() for class"negbin".It can be invoked by callinganova(x) for anobjectx of the appropriate class, or directly bycallinganova.negbin(x) regardless of theclass of the object.
Note
If only one fitted model object is specified, a sequential analysis ofdeviance table is given for the fitted model. Thetheta parameter is keptfixed. If more than one fitted model object is specified they must all beof class"negbin" and likelihood ratio tests are done of each model withinthe next. In this casetheta is assumed to have been re-estimated for eachmodel.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
glm.nb,negative.binomial,summary.negbin
Examples
m1 <- glm.nb(Days ~ Eth*Age*Lrn*Sex, quine, link = log)m2 <- update(m1, . ~ . - Eth:Age:Lrn:Sex)anova(m2, m1)anova(m2)Adaptive Numerical Integration
Description
Integrate a function of one variable over a finite range using arecursive adaptive method. This function is mainly fordemonstration purposes.
Usage
area(f, a, b, ..., fa = f(a, ...), fb = f(b, ...), limit = 10, eps = 1e-05)Arguments
f | The integrand as an |
a | Lower limit of integration. |
b | Upper limit of integration. |
... | Additional arguments needed by the integrand. |
fa | Function value at the lower limit. |
fb | Function value at the upper limit. |
limit | Limit on the depth to which recursion is allowed to go. |
eps | Error tolerance to control the process. |
Details
The method divides the interval in two and compares the values given bySimpson's rule and the trapezium rule. If these are within eps of eachother the Simpson's rule result is given, otherwise the process is appliedseparately to each half of the interval and the results added together.
Value
The integral froma tob off(x).
References
Venables, W. N. and Ripley, B. D. (1994)Modern Applied Statistics with S-Plus. Springer.pp. 105–110.
Examples
area(sin, 0, pi) # integrate the sin function from 0 to pi.Presence of Bacteria after Drug Treatments
Description
Tests of the presence of the bacteriaH. influenzaein children with otitis media in the Northern Territory of Australia.
Usage
bacteriaFormat
This data frame has 220 rows and the following columns:
- y
presence or absence: a factor with levels
nandy.- ap
active/placebo: a factor with levels
aandp.- hilo
hi/low compliance: a factor with levels
hiamdlo.- week
numeric: week of test.
- ID
subject ID: a factor.
- trt
a factor with levels
placebo,druganddrug+, a re-coding ofapandhilo.
Details
Dr A. Leach tested the effects of a drug on 50 children with a history ofotitis media in the Northern Territory of Australia. The childrenwere randomized to the drug or the a placebo, and also to receiveactive encouragement to comply with taking the drug.
The presence ofH. influenzae was checked at weeks 0, 2, 4, 6and 11: 30 of the checks were missing and are not included in thisdata frame.
Source
Dr Amanda Leachvia Mr James McBroom.
References
Menzies School of Health Research 1999–2000 Annual Report. p.20.https://www.menzies.edu.au/icms_docs/172302_2000_Annual_report.pdf.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
contrasts(bacteria$trt) <- structure(contr.sdif(3), dimnames = list(NULL, c("drug", "encourage")))## fixed effects analysessummary(glm(y ~ trt * week, binomial, data = bacteria))summary(glm(y ~ trt + week, binomial, data = bacteria))summary(glm(y ~ trt + I(week > 2), binomial, data = bacteria))# conditional random-effects analysislibrary(survival)bacteria$Time <- rep(1, nrow(bacteria))coxph(Surv(Time, unclass(y)) ~ week + strata(ID), data = bacteria, method = "exact")coxph(Surv(Time, unclass(y)) ~ factor(week) + strata(ID), data = bacteria, method = "exact")coxph(Surv(Time, unclass(y)) ~ I(week > 2) + strata(ID), data = bacteria, method = "exact")# PQL glmm analysislibrary(nlme)## IGNORE_RDIFF_BEGINsummary(glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID, family = binomial, data = bacteria))## IGNORE_RDIFF_ENDBandwidth for density() via Normal Reference Distribution
Description
A well-supported rule-of-thumb for choosing the bandwidth of a Gaussiankernel density estimator.
Usage
bandwidth.nrd(x)Arguments
x | A data vector. |
Value
A bandwidth on a scale suitable for thewidth argument ofdensity.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S.Springer, equation (5.5) on page 130.
Examples
# The function is currently defined asfunction(x){ r <- quantile(x, c(0.25, 0.75)) h <- (r[2] - r[1])/1.34 4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)}Biased Cross-Validation for Bandwidth Selection
Description
Uses biased cross-validation to select the bandwidth of a Gaussiankernel density estimator.
Usage
bcv(x, nb = 1000, lower, upper)Arguments
x | a numeric vector |
nb | number of bins to use. |
lower,upper | Range over which to minimize. The default is almost always satisfactory. |
Value
a bandwidth
References
Scott, D. W. (1992)Multivariate Density Estimation: Theory, Practice, and Visualization.Wiley.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
bcv(geyser$duration)Body Temperature Series of Beaver 1
Description
Reynolds (1994) describes a small part of a study of the long-termtemperature dynamics of beaverCastor canadensis innorth-central Wisconsin. Body temperature was measured by telemetryevery 10 minutes for four females, but data from a one period of lessthan a day for each of two animals is used there.
Usage
beav1Format
Thebeav1 data frame has 114 rows and 4 columns.This data frame contains the following columns:
dayDay of observation (in days since the beginning of 1990),December 12–13.
timeTime of observation, in the form
0330for 3.30am.tempMeasured body temperature in degrees Celsius.
activIndicator of activity outside the retreat.
Note
The observation at 22:20 is missing.
Source
P. S. Reynolds (1994) Time-series analyses of beaver body temperatures.Chapter 11 ofLange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L.and Greenhouse, J. eds (1994)Case Studies in Biometry. NewYork: John Wiley and Sons.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
beav1 <- within(beav1, hours <- 24*(day-346) + trunc(time/100) + (time%%100)/60)plot(beav1$hours, beav1$temp, type="l", xlab="time", ylab="temperature", main="Beaver 1")usr <- par("usr"); usr[3:4] <- c(-0.2, 8); par(usr=usr)lines(beav1$hours, beav1$activ, type="s", lty=2)temp <- ts(c(beav1$temp[1:82], NA, beav1$temp[83:114]), start = 9.5, frequency = 6)activ <- ts(c(beav1$activ[1:82], NA, beav1$activ[83:114]), start = 9.5, frequency = 6)acf(temp[1:53])acf(temp[1:53], type = "partial")ar(temp[1:53])act <- c(rep(0, 10), activ)X <- cbind(1, act = act[11:125], act1 = act[10:124], act2 = act[9:123], act3 = act[8:122])alpha <- 0.80stemp <- as.vector(temp - alpha*lag(temp, -1))sX <- X[-1, ] - alpha * X[-115,]beav1.ls <- lm(stemp ~ -1 + sX, na.action = na.omit)summary(beav1.ls, correlation = FALSE)rm(temp, activ)Body Temperature Series of Beaver 2
Description
Reynolds (1994) describes a small part of a study of the long-termtemperature dynamics of beaverCastor canadensis innorth-central Wisconsin. Body temperature was measured by telemetryevery 10 minutes for four females, but data from a one period of lessthan a day for each of two animals is used there.
Usage
beav2Format
Thebeav2 data frame has 100 rows and 4 columns.This data frame contains the following columns:
dayDay of observation (in days since the beginning of 1990),November 3–4.
timeTime of observation, in the form
0330for 3.30am.tempMeasured body temperature in degrees Celsius.
activIndicator of activity outside the retreat.
Source
P. S. Reynolds (1994) Time-series analyses of beaver body temperatures.Chapter 11 ofLange, N., Ryan, L., Billard, L., Brillinger, D., Conquest, L.and Greenhouse, J. eds (1994)Case Studies in Biometry. New York: John Wiley and Sons.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
attach(beav2)beav2$hours <- 24*(day-307) + trunc(time/100) + (time%%100)/60plot(beav2$hours, beav2$temp, type = "l", xlab = "time", ylab = "temperature", main = "Beaver 2")usr <- par("usr"); usr[3:4] <- c(-0.2, 8); par(usr = usr)lines(beav2$hours, beav2$activ, type = "s", lty = 2)temp <- ts(temp, start = 8+2/3, frequency = 6)activ <- ts(activ, start = 8+2/3, frequency = 6)acf(temp[activ == 0]); acf(temp[activ == 1]) # also look at PACFsar(temp[activ == 0]); ar(temp[activ == 1])arima(temp, order = c(1,0,0), xreg = activ)dreg <- cbind(sin = sin(2*pi*beav2$hours/24), cos = cos(2*pi*beav2$hours/24))arima(temp, order = c(1,0,0), xreg = cbind(active=activ, dreg))## IGNORE_RDIFF_BEGINlibrary(nlme) # for gls and corAR1beav2.gls <- gls(temp ~ activ, data = beav2, correlation = corAR1(0.8), method = "ML")summary(beav2.gls)summary(update(beav2.gls, subset = 6:100))detach("beav2"); rm(temp, activ)## IGNORE_RDIFF_ENDBiopsy Data on Breast Cancer Patients
Description
This breast cancer database was obtained from the University of WisconsinHospitals, Madison from Dr. William H. Wolberg. He assessed biopsiesof breast tumours for 699 patients up to 15 July 1992; each of nineattributes has been scored on a scale of 1 to 10, and the outcome isalso known. There are 699 rows and 11 columns.
Usage
biopsyFormat
This data frame contains the following columns:
IDsample code number (not unique).
V1clump thickness.
V2uniformity of cell size.
V3uniformity of cell shape.
V4marginal adhesion.
V5single epithelial cell size.
V6bare nuclei (16 values are missing).
V7bland chromatin.
V8normal nucleoli.
V9mitoses.
class"benign"or"malignant".
Source
P. M. Murphy and D. W. Aha (1992). UCI Repository of machinelearning databases. [Machine-readable data repository]. Irvine, CA:University of California, Department of Information and Computer Science.
O. L. Mangasarian and W. H. Wolberg (1990)Cancer diagnosis via linear programming.SIAM News23, pp 1 & 18.
William H. Wolberg and O.L. Mangasarian (1990)Multisurface method of pattern separation for medical diagnosisapplied to breast cytology.Proceedings of the National Academy of Sciences, U.S.A.87, pp. 9193–9196.
O. L. Mangasarian, R. Setiono and W.H. Wolberg (1990)Pattern recognition via linear programming: Theory and applicationto medical diagnosis. InLarge-scale Numerical Optimizationeds Thomas F. Coleman and Yuying Li, SIAM Publications, Philadelphia,pp 22–30.
K. P. Bennett and O. L. Mangasarian (1992)Robust linear programming discrimination of two linearly inseparable sets.Optimization Methods and Software1, pp. 23–34 (Gordon & Breach Science Publishers).
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Risk Factors Associated with Low Infant Birth Weight
Description
Thebirthwt data frame has 189 rows and 10 columns.The data were collected at Baystate Medical Center, Springfield, Massduring 1986.
Usage
birthwtFormat
This data frame contains the following columns:
lowindicator of birth weight less than 2.5 kg.
agemother's age in years.
lwtmother's weight in pounds at last menstrual period.
racemother's race (
1= white,2= black,3= other).smokesmoking status during pregnancy.
ptlnumber of previous premature labours.
hthistory of hypertension.
uipresence of uterine irritability.
ftvnumber of physician visits during the first trimester.
bwtbirth weight in grams.
Source
Hosmer, D.W. and Lemeshow, S. (1989)Applied Logistic Regression. New York: Wiley
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
bwt <- with(birthwt, {race <- factor(race, labels = c("white", "black", "other"))ptd <- factor(ptl > 0)ftv <- factor(ftv)levels(ftv)[-(1:2)] <- "2+"data.frame(low = factor(low), age, lwt, race, smoke = (smoke > 0), ptd, ht = (ht > 0), ui = (ui > 0), ftv)})options(contrasts = c("contr.treatment", "contr.poly"))glm(low ~ ., binomial, bwt)Box-Cox Transformations for Linear Models
Description
Computes and optionally plots profile log-likelihoods for theparameter of the Box-Cox power transformation.
Usage
boxcox(object, ...)## Default S3 method:boxcox(object, lambda = seq(-2, 2, 1/10), plotit = TRUE, interp, eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood", ...)## S3 method for class 'formula'boxcox(object, lambda = seq(-2, 2, 1/10), plotit = TRUE, interp, eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood", ...)## S3 method for class 'lm'boxcox(object, lambda = seq(-2, 2, 1/10), plotit = TRUE, interp, eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood", ...)Arguments
object | a formula or fitted model object. Currently only |
lambda | vector of values of |
plotit | logical which controls whether the result should be plotted. |
interp | logical which controls whether spline interpolation isused. Default to |
eps | Tolerance for |
xlab | defaults to |
ylab | defaults to |
... | additional parameters to be used in the model fitting. |
Value
A list of thelambda vector and the computed profilelog-likelihood vector, invisibly if the result is plotted.
Side Effects
Ifplotit = TRUE plots log-likelihoodvslambda andindicates a 95% confidence interval about the maximum observed valueoflambda. Ifinterp = TRUE, spline interpolation isused to give a smoother plot.
References
Box, G. E. P. and Cox, D. R. (1964)An analysis of transformations (with discussion).Journal of the Royal Statistical Society B,26, 211–252.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
boxcox(Volume ~ log(Height) + log(Girth), data = trees, lambda = seq(-0.25, 0.25, length.out = 10))boxcox(Days+1 ~ Eth*Sex*Age*Lrn, data = quine, lambda = seq(-0.05, 0.45, length.out = 20))Data from a cabbage field trial
Description
Thecabbages data set has 60 observations and 4 variables
Usage
cabbagesFormat
This data frame contains the following columns:
CultFactor giving the cultivar of the cabbage, two levels:
c39andc52.DateFactor specifying one of three planting dates:
d16,d20ord21.HeadWtWeight of the cabbage head, presumably in kg.
VitCAscorbic acid content, in undefined units.
Source
Rawlings, J. O. (1988)Applied Regression Analysis: A Research Tool.Wadsworth and Brooks/Cole. Example 8.4, page 219.(Rawlings cites the original source as the files of the lateDr Gertrude M Cox.)
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Colours of Eyes and Hair of People in Caithness
Description
Data on the cross-classification of people in Caithness, Scotland, byeye and hair colour. The region of the UK is particularly interestingas there is a mixture of people of Nordic, Celtic and Anglo-Saxon origin.
Usage
caithFormat
A 4 by 5 table with rows the eye colours (blue, light, medium, dark) andcolumns the hair colours (fair, red, medium, dark, black).
Source
Fisher, R.A. (1940) The precision of discriminant functions.Annals of Eugenics (London)10, 422–429.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
## IGNORE_RDIFF_BEGIN## The signs can vary by platformcorresp(caith)## IGNORE_RDIFF_ENDdimnames(caith)[[2]] <- c("F", "R", "M", "D", "B")par(mfcol=c(1,3))plot(corresp(caith, nf=2)); title("symmetric")plot(corresp(caith, nf=2), type="rows"); title("rows")plot(corresp(caith, nf=2), type="col"); title("columns")par(mfrow=c(1,1))Anatomical Data from Domestic Cats
Description
The heart and body weights of samples of male and female cats used fordigitalis experiments. The cats were all adult, over 2 kg body weight.
Usage
catsFormat
This data frame contains the following columns:
Sexsex: Factor with levels
"F"and"M".Bwtbody weight in kg.
Hwtheart weight in g.
Source
R. A. Fisher (1947) The analysis of covariance method for the relationbetween a part and the whole,Biometrics3, 65–68.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Heat Evolved by Setting Cements
Description
Experiment on the heat evolved in the setting of each of 13 cements.
Usage
cementFormat
x1, x2, x3, x4Proportions (%) of active ingredients.
yheat evolved in cals/gm.
Details
Thirteen samples of Portland cement were set. For each sample, thepercentages of the four main chemical ingredients was accuratelymeasured. While the cement was setting the amount of heat evolved wasalso measured.
Source
Woods, H., Steinour, H.H. and Starke, H.R. (1932) Effect of composition of Portland cement on heat evolved during hardening.Industrial Engineering and Chemistry,24, 1207–1214.
References
Hald, A. (1957)Statistical Theory with EngineeringApplications. Wiley, New York.
Examples
lm(y ~ x1 + x2 + x3 + x4, cement)Copper in Wholemeal Flour
Description
A numeric vector of 24 determinations of copper in wholemealflour, in parts per million.
Usage
chemSource
Analytical Methods Committee (1989) Robust statistics – how not toreject outliers.The Analyst114, 1693–1702.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Convert Lists to Data Frames for use by lattice
Description
Convert lists to data frames for use by lattice.
Usage
con2tr(obj)Arguments
obj | A list of components |
Details
con2tr repeats thex andy components suitably tomatch the vectorz.
Value
A data frame suitable for passing to lattice (formerly trellis) functions.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Confidence Intervals for Model Parameters
Description
Computes confidence intervals for one or more parameters in a fittedmodel. PackageMASS added methods forglm andnlsfits. As fronR 4.4.0 these have been migrated to packagestats.
It also adds a method forpolr fits.
Successive Differences Contrast Coding
Description
A coding for factors based on successive differences.
Usage
contr.sdif(n, contrasts = TRUE, sparse = FALSE)Arguments
n | The number of levels required. |
contrasts | logical: Should there be |
sparse | logical. If true and the result would be sparse (onlytrue for |
Details
The contrast coefficients are chosen so that the coded coefficientsin a one-way layout are the differences between the means of thesecond and first levels, the third and second levels, and so on. Thismakes most sense for ordered factors, but does not assume that thelevels are equally spaced.
Value
Ifcontrasts isTRUE, a matrix withn rows andn - 1 columns, and then byn identity matrix ifcontrasts isFALSE.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S.Fourth Edition, Springer.
See Also
contr.treatment,contr.sum,contr.helmert.
Examples
(A <- contr.sdif(6))zapsmall(ginv(A))Co-operative Trial in Analytical Chemistry
Description
Seven specimens were sent to 6 laboratories in 3 separate batches andeach analysed for Analyte. Each analysis was duplicated.
Usage
coopFormat
This data frame contains the following columns:
LabLaboratory,
L1,L2, ...,L6.SpcSpecimen,
S1,S2, ...,S7.BatBatch,
B1,B2,B3(nested withinSpc/Lab),ConcConcentration of Analyte in
g/kg.
Source
Analytical Methods Committee (1987)Recommendations for the conduct andinterpretation of co-operative trials,The Analyst112, 679–686.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Simple Correspondence Analysis
Description
Find the principal canonical correlation and corresponding row- andcolumn-scores from a correspondence analysis of a two-way contingencytable.
Usage
corresp(x, ...)## S3 method for class 'matrix'corresp(x, nf = 1, ...)## S3 method for class 'factor'corresp(x, y, ...)## S3 method for class 'data.frame'corresp(x, ...)## S3 method for class 'xtabs'corresp(x, ...)## S3 method for class 'formula'corresp(formula, data, ...)Arguments
x,formula | The function is generic, accepting various forms of the principalargument for specifying a two-way frequency table. Currently acceptedforms are matrices, data frames (coerced to frequency tables), objectsof class |
nf | The number of factors to be computed. Note that although 1 is the mostusual, one school of thought takes the first two singular vectors fora sort of biplot. |
y | a second factor for a cross-classification. |
data | an optional data frame, list or environment against whichto preferentially resolve variables in the formula. |
... | If the principal argument is a formula, a data frame may be specifiedas well from which variables in the formula are preferentiallysatisfied. |
Details
See Venables & Ripley (2002). Theplot method produces a graphicalrepresentation of the table ifnf=1, with theareas of circlesrepresenting the numbers of points. Ifnf is two or more thebiplot method is called, which plots the second and third columns ofthe matricesA = Dr^(-1/2) U L andB = Dc^(-1/2) V L where thesingular value decomposition isU L V. Thus the x-axis is thecanonical correlation times the row and column scores. Although thisis called a biplot, it doesnot have any useful inner productrelationship between the row and column scores. Think of this as anequally-scaled plot with two unrelated sets of labels. The origin ismarked on the plot with a cross. (For other versions of this plot seethe book.)
Value
An list object of class"correspondence" for whichprint,plot andbiplot methods are supplied.The main components are the canonical correlation(s) and the rowand column scores.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Gower, J. C. and Hand, D. J. (1996)Biplots. Chapman & Hall.
See Also
Examples
## IGNORE_RDIFF_BEGIN## The signs can vary by platform(ct <- corresp(~ Age + Eth, data = quine))plot(ct)corresp(caith)biplot(corresp(caith, nf = 2))## IGNORE_RDIFF_ENDResistant Estimation of Multivariate Location and Scatter
Description
Compute a multivariate location and scale estimate with a highbreakdown point – this can be thought of as estimating the mean andcovariance of thegood part of the data.cov.mve andcov.mcd are compatibility wrappers.
Usage
cov.rob(x, cor = FALSE, quantile.used = floor((n + p + 1)/2), method = c("mve", "mcd", "classical"), nsamp = "best", seed)cov.mve(...)cov.mcd(...)Arguments
x | a matrix or data frame. |
cor | should the returned result include a correlation matrix? |
quantile.used | the minimum number of the data points regarded as |
method | the method to be used – minimum volume ellipsoid, minimumcovariance determinant or classical product-moment. Using |
nsamp | the number of samples or |
seed | the seed to be used for random sampling: see |
... | arguments to |
Details
For method"mve", an approximate search is made of a subset ofsizequantile.used with an enclosing ellipsoid of smallest volume; inmethod"mcd" it is the volume of the Gaussian confidenceellipsoid, equivalently the determinant of the classical covariancematrix, that is minimized. The mean of the subset provides a firstestimate of the location, and the rescaled covariance matrix a firstestimate of scatter. The Mahalanobis distances of all the points fromthe location estimate for this covariance matrix are calculated, andthose points within the 97.5% point under Gaussian assumptions aredeclared to begood. The final estimates are the mean and rescaledcovariance of thegood points.
The rescaling is by the appropriate percentile under Gaussian data; inaddition the first covariance matrix has anad hoc finite-samplecorrection given by Marazzi.
For method"mve" the search is made over ellipsoids determinedby the covariance matrix ofp of the data points. For method"mcd" an additional improvement step suggested by Rousseeuw andvan Driessen (1999) is used, in which once a subset of sizequantile.used is selected, an ellipsoid based on its covarianceis tested (as this will have no larger a determinant, and may be smaller).
There is a hard limit on the allowed number of samples,2^{31} - 1. However, practical limits are likely to be much lowerand one might check the number of samples used for exhaustiveenumeration,combn(NROW(x), NCOL(x) + 1), before attempting it.
Value
A list with components
center | the final estimate of location. |
cov | the final estimate of scatter. |
cor | (only is |
sing | message giving number of singular samples out of total |
crit | the value of the criterion on log scale. For MCD this isthe determinant, and for MVE it is proportional to the volume. |
best | the subset used. For MVE the best sample, for MCD the bestset of size |
n.obs | total number of observations. |
References
P. J. Rousseeuw and A. M. Leroy (1987)Robust Regression and Outlier Detection.Wiley.
A. Marazzi (1993)Algorithms, Routines and S Functions for Robust Statistics.Wadsworth and Brooks/Cole.
P. J. Rousseeuw and B. C. van Zomeren (1990) Unmaskingmultivariate outliers and leverage points,Journal of the American Statistical Association,85, 633–639.
P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for theminimum covariance determinant estimator.Technometrics41, 212–223.
P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. InL1-Statistical Procedures and Related Topicsed Y. Dodge, IMS Lecture Notes volume31, pp. 201–214.
See Also
Examples
set.seed(123)cov.rob(stackloss)cov.rob(stack.x, method = "mcd", nsamp = "exact")Covariance Estimation for Multivariate t Distribution
Description
Estimates a covariance or correlation matrix assuming the data camefrom a multivariate t distribution: this provides some degree ofrobustness to outlier without giving a high breakdown point.
Usage
cov.trob(x, wt = rep(1, n), cor = FALSE, center = TRUE, nu = 5, maxit = 25, tol = 0.01)Arguments
x | data matrix. Missing values (NAs) are not allowed. |
wt | A vector of weights for each case: these are treated as if the case |
cor | Flag to choose between returning the correlation ( |
center | a logical value or a numeric vector providing the location about whichthe covariance is to be taken. If |
nu | ‘degrees of freedom’ for the multivariate t distribution. Must exceed2 (so that the covariance matrix is finite). |
maxit | Maximum number of iterations in fitting. |
tol | Convergence tolerance for fitting. |
Value
A list with the following components
cov | the fitted covariance matrix. |
center | the estimated or specified location vector. |
wt | the specified weights: only returned if the |
n.obs | the number of cases used in the fitting. |
cor | the fitted correlation matrix: only returned if |
call | The matched call. |
iter | The number of iterations used. |
References
J. T. Kent, D. E. Tyler and Y. Vardi (1994)A curious likelihood identity for the multivariate t-distribution.Communications in Statistics—Simulation and Computation23, 441–453.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
See Also
Examples
cov.trob(stackloss)Performance of Computer CPUs
Description
A relative performance measure and characteristics of 209 CPUs.
Usage
cpusFormat
The components are:
namemanufacturer and model.
syctcycle time in nanoseconds.
mminminimum main memory in kilobytes.
mmaxmaximum main memory in kilobytes.
cachcache size in kilobytes.
chminminimum number of channels.
chmaxmaximum number of channels.
perfpublished performance on a benchmark mix relative to an IBM 370/158-3.
estperfestimated performance (by Ein-Dor & Feldmesser).
Source
P. Ein-Dor and J. Feldmesser (1987)Attributes of the performance of central processing units: a relativeperformance prediction model.Comm. ACM.30, 308–317.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Morphological Measurements on Leptograpsus Crabs
Description
Thecrabs data frame has 200 rows and 8 columns, describing 5morphological measurements on 50 crabs each of two colour forms andboth sexes, of the speciesLeptograpsus variegatus collected atFremantle, W. Australia.
Usage
crabsFormat
This data frame contains the following columns:
spspecies-"B"or"O"for blue or orange.sexas it says.
indexindex
1:50within each of the four groups.FLfrontal lobe size (mm).
RWrear width (mm).
CLcarapace length (mm).
CWcarapace width (mm).
BDbody depth (mm).
Source
Campbell, N.A. and Mahon, R.J. (1974) A multivariatestudy of variation in two species of rock crab of genusLeptograpsus.Australian Journal of Zoology22, 417–425.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Monthly Deaths from Lung Diseases in the UK
Description
A time series giving the monthly deaths from bronchitis,emphysema and asthma in the UK, 1974-1979, both sexes (deaths),
Usage
deathsSource
P. J. Diggle (1990)Time Series: A Biostatistical Introduction.Oxford, table A.3
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
This the same as datasetldeaths inR'sdatasets package.
Transform an Allowable Formula for 'loglm' into one for 'terms'
Description
loglm allows dimension numbers to be used in place of names inthe formula.denumerate modifies such a formula into one thatterms can process.
Usage
denumerate(x)Arguments
x | A formula conforming to the conventions of |
Details
The model fitting functionloglm fits log-linear models tofrequency data using iterative proportional scaling. To specifythe model the user must nominate the margins in the data thatremain fixed under the log-linear model. It is convenient toallow the user to use dimension numbers, 1, 2, 3, ... for thefirst, second, third, ..., margins in a similar way to variablenames. As the model formula has to be parsed byterms, whichtreats1 in a special way and requires parseable variable names,these formulae have to be modified by giving genuine names forthese margin, or dimension numbers.denumerate replaces thesenumbers with names of a special form, namelyn is replaced by.vn. This allowsterms to parse the formula in the usual way.
Value
A linear model formula like that presented, except that wheredimension numbers, sayn, have been used to specify fixedmargins these are replaced by names of the form.vn which maybe processed byterms.
See Also
Examples
denumerate(~(1+2+3)^3 + a/b)## which gives ~ (.v1 + .v2 + .v3)^3 + a/bPredict Doses for Binomial Assay model
Description
Calibrate binomial assays, generalizing the calculation of LD50.
Usage
dose.p(obj, cf = 1:2, p = 0.5)Arguments
obj | A fitted model object of class inheriting from |
cf | The terms in the coefficient vector giving the intercept andcoefficient of (log-)dose |
p | Probabilities at which to predict the dose needed. |
Value
An object of class"glm.dose" giving the prediction (attribute"p" and standard error (attribute"SE") at each responseprobability.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S.Springer.
Examples
ldose <- rep(0:5, 2)numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)sex <- factor(rep(c("M", "F"), c(6, 6)))SF <- cbind(numdead, numalive = 20 - numdead)budworm.lg0 <- glm(SF ~ sex + ldose - 1, family = binomial)dose.p(budworm.lg0, cf = c(1,3), p = 1:3/4)dose.p(update(budworm.lg0, family = binomial(link=probit)), cf = c(1,3), p = 1:3/4)Deaths of Car Drivers in Great Britain 1969-84
Description
A regular time series giving the monthly totals of car drivers inGreat Britain killed or seriously injured Jan 1969 to Dec1984. Compulsory wearing of seat belts was introduced on 31 Jan 1983
Usage
driversSource
Harvey, A.C. (1989)Forecasting, Structural Time Series Models and the Kalman Filter.Cambridge University Press, pp. 519–523.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Try All One-Term Deletions from a Model
Description
Try fitting all models that differ from the current model by dropping asingle term, maintaining marginality.
This function is generic; there exist methods for classeslm andglm and the default method will work for many other classes.
Usage
dropterm (object, ...)## Default S3 method:dropterm(object, scope, scale = 0, test = c("none", "Chisq"), k = 2, sorted = FALSE, trace = FALSE, ...)## S3 method for class 'lm'dropterm(object, scope, scale = 0, test = c("none", "Chisq", "F"), k = 2, sorted = FALSE, ...)## S3 method for class 'glm'dropterm(object, scope, scale = 0, test = c("none", "Chisq", "F"), k = 2, sorted = FALSE, trace = FALSE, ...)Arguments
object | A object fitted by some model-fitting function. |
scope | a formula giving terms which might be dropped. By default, themodel formula. Only terms that can be dropped and maintain marginalityare actually tried. |
scale | used in the definition of the AIC statistic for selecting the models,currently only for |
test | should the results include a test statistic relative to the originalmodel? The F test is only appropriate for |
k | the multiple of the number of degrees of freedom used for the penalty.Only |
sorted | should the results be sorted on the value of AIC? |
trace | if |
... | arguments passed to or from other methods. |
Details
The definition of AIC is only up to an additive constant: whenappropriate (lm models with specified scale) the constant is takento be that used in Mallows' Cp statistic and the results are labelledaccordingly.
Value
A table of class"anova" containing at least columns for the changein degrees of freedom and AIC (or Cp) for the models. Some methodswill give further information, for example sums of squares, deviances,log-likelihoods and test statistics.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)dropterm(quine.nxt, test= "F")quine.stp <- stepAIC(quine.nxt, scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1), trace = FALSE)dropterm(quine.stp, test = "F")quine.3 <- update(quine.stp, . ~ . - Eth:Age:Lrn)dropterm(quine.3, test = "F")quine.4 <- update(quine.3, . ~ . - Eth:Age)dropterm(quine.4, test = "F")quine.5 <- update(quine.4, . ~ . - Age:Lrn)dropterm(quine.5, test = "F")house.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family=poisson, data = housing)house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))dropterm(house.glm1, test = "Chisq")Foraging Ecology of Bald Eagles
Description
Knight and Skagen collected during a field study on the foragingbehaviour of wintering Bald Eagles in Washington State, USA dataconcerning 160 attempts by one (pirating) Bald Eagle to steal a chumsalmon from another (feeding) Bald Eagle.
Usage
eaglesFormat
Theeagles data frame has 8 rows and 5 columns.
yNumber of successful attempts.
nTotal number of attempts.
PSize of pirating eagle (
L= large,S= small).AAge of pirating eagle (
I= immature,A= adult).VSize of victim eagle (
L= large,S= small).
Source
Knight, R. L. and Skagen, S. K. (1988)Agonistic asymmetries and the foraging ecology of Bald Eagles.Ecology69, 1188–1194.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Examples
eagles.glm <- glm(cbind(y, n - y) ~ P*A + V, data = eagles, family = binomial)dropterm(eagles.glm)prof <- profile(eagles.glm)plot(prof)pairs(prof)Seizure Counts for Epileptics
Description
Thall and Vail (1990) give a data set on two-week seizure counts for59 epileptics. The number of seizures was recorded for a baselineperiod of 8 weeks, and then patients were randomly assigned to atreatment group or a control group. Counts were then recorded forfour successive two-week periods. The subject's age is the onlycovariate.
Usage
epilFormat
This data frame has 236 rows and the following 9 columns:
ythe count for the 2-week period.
trttreatment,
"placebo"or"progabide".basethe counts in the baseline 8-week period.
agesubject's age, in years.
V40/1indicator variable of period 4.subjectsubject number, 1 to 59.
periodperiod, 1 to 4.
lbaselog-counts for the baseline period, centred to have zero mean.
lagelog-ages, centred to have zero mean.
Note
The value ofy in row 31 was corrected from21 to23 in version 7.3-65.
Source
Thall, P. F. and Vail, S. C. (1990)Some covariance models for longitudinal count data with over-dispersion.Biometrics46, 657–671.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth Edition. Springer.
Examples
summary(glm(y ~ lbase*trt + lage + V4, family = poisson, data = epil), correlation = FALSE)epil2 <- epil[epil$period == 1, ]epil2["period"] <- rep(0, 59); epil2["y"] <- epil2["base"]epil["time"] <- 1; epil2["time"] <- 4epil2 <- rbind(epil, epil2)epil2$pred <- unclass(epil2$trt) * (epil2$period > 0)epil2$subject <- factor(epil2$subject)epil3 <- aggregate(epil2, list(epil2$subject, epil2$period > 0), function(x) if(is.numeric(x)) sum(x) else x[1])epil3$pred <- factor(epil3$pred, labels = c("base", "placebo", "drug"))contrasts(epil3$pred) <- structure(contr.sdif(3), dimnames = list(NULL, c("placebo-base", "drug-placebo")))## IGNORE_RDIFF_BEGINsummary(glm(y ~ pred + factor(subject) + offset(log(time)), family = poisson, data = epil3), correlation = FALSE)## IGNORE_RDIFF_ENDsummary(glmmPQL(y ~ lbase*trt + lage + V4, random = ~ 1 | subject, family = poisson, data = epil))summary(glmmPQL(y ~ pred, random = ~1 | subject, family = poisson, data = epil3))Plots with Geometrically Equal Scales
Description
Version of a scatterplot with scales chosen to be equal on both axes, thatis 1cm represents the same units on each
Usage
eqscplot(x, y, ratio = 1, tol = 0.04, uin, ...)Arguments
x | vector of x values, or a 2-column matrix, or a list with components |
y | vector of y values |
ratio | desired ratio of units on the axes. Units on the y axis are drawn at |
tol | proportion of white space at the margins of plot. |
uin | desired values for the units-per-inch parameter. If of length 1, thedesired units per inch on the x axis. |
... | further arguments for |
Details
Limits for the x and y axes are chosen so that they include thedata. One of the sets of limits is then stretched from the midpoint tomake the units in the ratio given byratio. Finally both arestretched by1 + tol to move points away from the axes, and thepoints plotted.
Value
invisibly, the values ofuin used for the plot.
Side Effects
performs the plot.
Note
This was originally written for S:R'splot.window has anargumentasp with a similar effect (including to this function'sratio) and can be passed from the defaultplot function.
Argumentsratio anduin were suggested by Bill Dunlap.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Ecological Factors in Farm Management
Description
Thefarms data frame has 20 rows and 4 columns. The rows are farmson the Dutch island of Terschelling and the columns are factorsdescribing the management of grassland.
Usage
farmsFormat
This data frame contains the following columns:
MoisFive levels of soil moisture – level 3 does not occur at these 20 farms.
ManagGrassland management type (
SF= standard,BF= biological,HF= hobby farming,NM= nature conservation).UseGrassland use (
U1= hay production,U2=intermediate,U3= grazing).ManureManure usage – classes
C0toC4.
Source
J.C. Gower and D.J. Hand (1996)Biplots. Chapman & Hall, Table 4.6.
Quoted as from:
R.H.G. Jongman, C.J.F. ter Braak and O.F.R. van Tongeren (1987)Data Analysis in Community and Landscape Ecology.PUDOC, Wageningen.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
farms.mca <- mca(farms, abbrev = TRUE) # Use levels as nameseqscplot(farms.mca$cs, type = "n")text(farms.mca$rs, cex = 0.7)text(farms.mca$cs, labels = dimnames(farms.mca$cs)[[1]], cex = 0.7)Measurements of Forensic Glass Fragments
Description
Thefgl data frame has 214 rows and 10 columns.It was collected by B. German on fragments of glasscollected in forensic work.
Usage
fglFormat
This data frame contains the following columns:
RIrefractive index; more precisely the refractive index is 1.518xxxx.
The next 8 measurements are percentages by weight of oxides.
Nasodium.
Mgmanganese.
Alaluminium.
Sisilicon.
Kpotassium.
Cacalcium.
Babarium.
Feiron.
typeThe fragments were originally classed into seven types, one of whichwas absent in this dataset. The categories which occur arewindow float glass (
WinF: 70),window non-float glass (WinNF: 76),vehicle window glass (Veh: 17),containers (Con: 13),tableware (Tabl: 9) andvehicle headlamps (Head: 29).
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Maximum-likelihood Fitting of Univariate Distributions
Description
Maximum-likelihood fitting of univariate distributions, allowingparameters to be held fixed if desired.
Usage
fitdistr(x, densfun, start, ...)Arguments
x | A numeric vector of length at least one containing onlyfinite values. |
densfun | Either a character string or a function returning a density evaluatedat its first argument. Distributions |
start | A named list giving the parameters to be optimized with initialvalues. This can be omitted for some of the named distributions andmust be for others (see Details). |
... | Additional parameters, either for |
Details
For the Normal, log-Normal, geometric, exponential and Poissondistributions the closed-form MLEs (and exact standard errors) areused, andstart should not be supplied.
For all other distributions, direct optimization of the log-likelihoodis performed usingoptim. The estimated standarderrors are taken from the observed information matrix, calculated by anumerical approximation. For one-dimensional problems the Nelder-Meadmethod is used and for multi-dimensional problems the BFGS method,unless arguments namedlower orupper are supplied (whenL-BFGS-B is used) ormethod is supplied explicitly.
For the"t" named distribution the density is taken to be thelocation-scale family with locationm and scales.
For the following named distributions, reasonable starting values willbe computed ifstart is omitted or only partially specified:"cauchy","gamma","logistic","negative binomial" (parametrized bymu andsize),"t" and"weibull". Note that thesestarting values may not be good enough if the fit is poor: inparticular they are not resistant to outliers unless the fitteddistribution is long-tailed.
There areprint,coef,vcovandlogLik methods for class"fitdistr".
Value
An object of class"fitdistr", a list with four components,
estimate | the parameter estimates, |
sd | the estimated standard errors, |
vcov | the estimated variance-covariance matrix, and |
loglik | the log-likelihood. |
Note
Numerical optimization cannot work miracles: please note the commentsinoptim on scaling data. If the fitted parameters arefar away from one, consider re-fitting specifying the controlparameterparscale.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
## avoid spurious accuracyop <- options(digits = 3)set.seed(123)x <- rgamma(100, shape = 5, rate = 0.1)fitdistr(x, "gamma")## now do this directly with more control.fitdistr(x, dgamma, list(shape = 1, rate = 0.1), lower = 0.001)set.seed(123)x2 <- rt(250, df = 9)fitdistr(x2, "t", df = 9)## allow df to vary: not a very good idea!fitdistr(x2, "t")## now do fixed-df fit directly with more control.mydt <- function(x, m, s, df) dt((x-m)/s, df)/sfitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))set.seed(123)x3 <- rweibull(100, shape = 4, scale = 100)fitdistr(x3, "weibull")set.seed(123)x4 <- rnegbin(500, mu = 5, theta = 4)fitdistr(x4, "Negative Binomial")options(op)Forbes' Data on Boiling Points in the Alps
Description
A data frame with 17 observations on boiling pointof water and barometric pressure in inches of mercury.
Usage
forbesFormat
bpboiling point (degrees Farenheit).
presbarometric pressure in inches of mercury.
Source
A. C. Atkinson (1985)Plots, Transformations and Regression. Oxford.
S. Weisberg (1980)Applied Linear Regression. Wiley.
Rational Approximation
Description
Find rational approximations to the components of a real numericobject using a standard continued fraction method.
Usage
fractions(x, cycles = 10, max.denominator = 2000, ...)as.fractions(x)is.fractions(f)Arguments
x | Any object of mode numeric. Missing values are now allowed. |
cycles | The maximum number of steps to be used in the continued fractionapproximation process. |
max.denominator | An early termination criterion. If any partial denominatorexceeds |
... | arguments passed to or from other methods. |
f | anR object. |
Details
Each component is first expanded in a continued fraction of theform
x = floor(x) + 1/(p1 + 1/(p2 + ...)))
wherep1,p2, ... are positive integers, terminating eitheratcycles terms or when apj > max.denominator. Thecontinued fraction is then re-arranged to retrieve the numeratorand denominator as integers.
The numerators and denominators are then combined into acharacter vector that becomes the"fracs" attribute and used inprinted representations.
Arithmetic operations on"fractions" objects have full floatingpoint accuracy, but the character representation printed out maynot.
Value
An object of class"fractions". A structure with.Data componentthe same as the input numericx, but with the rationalapproximations held as a character vector attribute,"fracs".Arithmetic operations on"fractions" objects are possible.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth Edition. Springer.
See Also
Examples
X <- matrix(runif(25), 5, 5)zapsmall(solve(X, X/5)) # print near-zeroes as zerofractions(solve(X, X/5))fractions(solve(X, X/5)) + 1Velocities for 82 Galaxies
Description
A numeric vector of velocities in km/sec of 82 galaxies from 6well-separated conic sections of anunfilled survey of the CoronaBorealis region. Multimodality in such surveys is evidence for voidsand superclusters in the far universe.
Usage
galaxiesNote
There is an 83rd measurement of 5607 km/sec in the Postmanet al. paper which is omitted in Roeder (1990) and from thedataset here.
There is also a typo: this dataset has 78th observation 26690 whichshould be 26960.
Source
Roeder, K. (1990) Density estimation with confidence sets exemplifiedby superclusters and voids in galaxies.Journal of the American Statistical Association85, 617–624.
Postman, M., Huchra, J. P. and Geller, M. J. (1986)Probes of large-scale structures in the Corona Borealis region.Astronomical Journal92, 1238–1247.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
gal <- galaxies/1000c(width.SJ(gal, method = "dpi"), width.SJ(gal))plot(x = c(0, 40), y = c(0, 0.3), type = "n", bty = "l", xlab = "velocity of galaxy (1000km/s)", ylab = "density")rug(gal)lines(density(gal, width = 3.25, n = 200), lty = 1)lines(density(gal, width = 2.56, n = 200), lty = 3)Calculate the MLE of the Gamma Dispersion Parameter in a GLM Fit
Description
A front end togamma.shape for convenience. Finds thereciprocal of the estimate of the shape parameter only.
Usage
gamma.dispersion(object, ...)Arguments
object | Fitted model object giving the gamma fit. |
... | Additional arguments passed on to |
Value
The MLE of the dispersion parameter of the gamma distribution.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
gamma.shape.glm, including the example on its help page.
Estimate the Shape Parameter of the Gamma Distribution in a GLM Fit
Description
Find the maximum likelihood estimate of the shape parameter ofthe gamma distribution after fitting aGamma generalizedlinear model.
Usage
gamma.shape(object, ...)## S3 method for class 'glm'gamma.shape(object, it.lim = 10, eps.max = .Machine$double.eps^0.25, verbose = FALSE, ...)Arguments
object | Fitted model object from a |
it.lim | Upper limit on the number of iterations. |
eps.max | Maximum discrepancy between approximations for the iterationprocess to continue. |
verbose | If |
... | further arguments passed to or from other methods. |
Details
A glm fit for a Gamma family correctly calculates the maximumlikelihood estimate of the mean parameters but provides only acrude estimate of the dispersion parameter. This function takesthe results of the glm fit and solves the maximum likelihoodequation for the reciprocal of the dispersion parameter, which isusually called the shape (or exponent) parameter.
Value
List of two components
alpha | the maximum likelihood estimate |
SE | the approximate standard error, the square-root of the reciprocal ofthe observed information. |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
clotting <- data.frame( u = c(5,10,15,20,30,40,60,80,100), lot1 = c(118,58,42,35,27,25,21,19,18), lot2 = c(69,35,26,21,18,16,13,12,12))clot1 <- glm(lot1 ~ log(u), data = clotting, family = Gamma)gamma.shape(clot1)gm <- glm(Days + 0.1 ~ Age*Eth*Sex*Lrn, quasi(link=log, variance="mu^2"), quine, start = c(3, rep(0,31)))gamma.shape(gm, verbose = TRUE)summary(gm, dispersion = gamma.dispersion(gm)) # better summaryRemission Times of Leukaemia Patients
Description
A data frame from a trial of 42 leukaemia patients. Some weretreated with the drug6-mercaptopurineand the rest are controls. The trial was designed as matched pairs,both withdrawn from the trial when either came out of remission.
Usage
gehanFormat
This data frame contains the following columns:
pairlabel for pair.
timeremission time in weeks.
censcensoring, 0/1.
treattreatment, control or 6-MP.
Source
Cox, D. R. and Oakes, D. (1984)Analysis of Survival Data.Chapman & Hall, p. 7. Taken from
Gehan, E.A. (1965) A generalized Wilcoxon test for comparingarbitrarily single-censored samples.Biometrika52, 203–233.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
library(survival)gehan.surv <- survfit(Surv(time, cens) ~ treat, data = gehan, conf.type = "log-log")summary(gehan.surv)survreg(Surv(time, cens) ~ factor(pair) + treat, gehan, dist = "exponential")summary(survreg(Surv(time, cens) ~ treat, gehan, dist = "exponential"))summary(survreg(Surv(time, cens) ~ treat, gehan))gehan.cox <- coxph(Surv(time, cens) ~ treat, gehan)summary(gehan.cox)Rat Genotype Data
Description
Data from a foster feeding experiment with rat mothers and litters offour different genotypes:A,B,I andJ.Rat litters were separated from their natural mothers at birth andgiven to foster mothers to rear.
Usage
genotypeFormat
The data frame has the following components:
Littergenotype of the litter.
Mothergenotype of the foster mother.
WtLitter average weight gain of the litter, in grams at age 28 days.(The source states that the within-litter variability is negligible.)
Source
Scheffe, H. (1959)The Analysis of Variance Wiley p. 140.
Bailey, D. W. (1953)The Inheritance of Maternal Influences on the Growth of the Rat.Unpublished Ph.D. thesis, University of California. Table B of the Appendix.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Old Faithful Geyser Data
Description
A version of the eruptions data from the ‘Old Faithful’ geyserin Yellowstone National Park, Wyoming. This version comes fromAzzalini and Bowman (1990) and is of continuous measurement from August1 to August 15, 1985.
Some nocturnal duration measurements were coded as 2, 3 or 4 minutes,having originally been described as ‘short’, ‘medium’or ‘long’.
Usage
geyserFormat
A data frame with 299 observations on 2 variables.
duration | numeric | Eruption time in mins |
waiting | numeric | Waiting time for this eruption |
Note
Thewaiting time was incorrectly described as the time to thenext eruption in the original files, and corrected forMASSversion 7.3-30.
References
Azzalini, A. and Bowman, A. W. (1990) A look at somedata on the Old Faithful geyser.Applied Statistics39, 357–365.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
CRAN packagesm.
Line Transect of Soil in Gilgai Territory
Description
This dataset was collected on a line transect survey in gilgaiterritory in New South Wales, Australia. Gilgais are natural gentledepressions in otherwise flat land, and sometimes seem to be regularlydistributed. The data collection was stimulated by the question: arethese patterns reflected in soil properties? At each of 365 samplinglocations on a linear grid of 4 meters spacing, samples were taken atdepths 0-10 cm, 30-40 cm and 80-90 cm below the surface. pH, electricalconductivity and chloride content were measured on a 1:5 soil:waterextract from each sample.
Usage
gilgaisFormat
This data frame contains the following columns:
pH00pH at depth 0–10 cm.
pH30pH at depth 30–40 cm.
pH80pH at depth 80–90 cm.
e00electrical conductivity in mS/cm (0–10 cm).
e30electrical conductivity in mS/cm (30–40 cm).
e80electrical conductivity in mS/cm (80–90 cm).
c00chloride content in ppm (0–10 cm).
c30chloride content in ppm (30–40 cm).
c80chloride content in ppm (80–90 cm).
Source
Webster, R. (1977) Spectral analysis of gilgai soil.Australian Journal of Soil Research15, 191–204.
Laslett, G. M. (1989)Kriging and splines: An empirical comparison of theirpredictive performance in some applications (with discussion).Journal of the American Statistical Association89, 319–409
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Generalized Inverse of a Matrix
Description
Calculates the Moore-Penrose generalized inverse of a matrixX.
Usage
ginv(X, tol = sqrt(.Machine$double.eps))Arguments
X | Matrix for which the Moore-Penrose inverse is required. |
tol | A relative tolerance to detect zero singular values. |
Value
A MP generalized inverse matrix forX.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
See Also
Change a Negative Binomial fit to a GLM fit
Description
This function modifies an output object fromglm.nb() to onethat looks like the output fromglm() with a negative binomialfamily. This allows it to be updated keeping the theta parameterfixed.
Usage
glm.convert(object)Arguments
object | An object of class |
Details
Convenience function needed to effect some low level changes to thestructure of the fitted model object.
Value
An object of class"glm" with negative binomial family. The thetaparameter is then fixed at its present estimate.
See Also
Examples
quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)quine.nbA <- glm.convert(quine.nb1)quine.nbB <- update(quine.nb1, . ~ . + Sex:Age:Lrn)anova(quine.nbA, quine.nbB)Fit a Negative Binomial Generalized Linear Model
Description
A modification of the system functionglm() to includeestimation of the additional parameter,theta, for aNegative Binomial generalized linear model.
Usage
glm.nb(formula, data, weights, subset, na.action, start = NULL, etastart, mustart, control = glm.control(...), method = "glm.fit", model = TRUE, x = FALSE, y = TRUE, contrasts = NULL, ..., init.theta, link = log)Arguments
formula,data,weights,subset,na.action,start,etastart,mustart,control,method,model,x,y,contrasts,... | arguments for the |
init.theta | Optional initial value for the theta parameter. If omitted a momentestimator after an initial fit using a Poisson GLM is used. |
link | The link function. Currently must be one of |
Details
An alternating iteration process is used. For giventheta the GLMis fitted using the same process as used byglm(). For fixed meansthetheta parameter is estimated using score and informationiterations. The two are alternated until convergence of both. (Thenumber of alternations and the number of iterations when estimatingtheta are controlled by themaxit parameter ofglm.control.)
Settingtrace > 0 traces the alternating iterationprocess. Settingtrace > 1 traces theglm fit, andsettingtrace > 2 traces the estimation oftheta.
Value
A fitted model object of classnegbin inheriting fromglmandlm. The object is like the output ofglm but containsthree additional components, namelytheta for the ML estimate oftheta,SE.theta for its approximate standard error (usingobserved rather than expected information), andtwologlik fortwice the log-likelihood function.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
glm,negative.binomial,anova.negbin,summary.negbin,theta.md
There is asimulate method.
Examples
quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)quine.nb2 <- update(quine.nb1, . ~ . + Sex:Age:Lrn)quine.nb3 <- update(quine.nb2, Days ~ .^4)anova(quine.nb1, quine.nb2, quine.nb3)Fit Generalized Linear Mixed Models via PQL
Description
Fit a GLMM model with multivariate normal random effects, usingPenalized Quasi-Likelihood.
Usage
glmmPQL(fixed, random, family, data, correlation, weights, control, niter = 10, verbose = TRUE, ...)Arguments
fixed | a two-sided linear formula giving fixed-effects part of the model. |
random | a formula or list of formulae describing the random effects. |
family | a GLM family. |
data | an optional data frame, list or environment used as the first place to findvariables in the formulae, |
correlation | an optional correlation structure. |
weights | optional case weights as in |
control | an optional argument to be passed to |
niter | maximum number of iterations. |
verbose | logical: print out record of iterations? |
... | Further arguments for |
Details
glmmPQL works by repeated calls tolme, sonamespacenlme will be loaded at first use. (Before 2015 itused to attachnlme but nowadays only loads the namespace.)
Unlikelme,offset terms are allowed infixed – this is done by pre- and post-processing the calls tolme.
Note that the returned object inherits from class"lme" andthat most generics will use the method for that class. As fromversion 3.1-158, the fitted values have any offset included, as dothe results of callingpredict.
Value
A object of classc("glmmPQL", "lme"): seelmeObject.
References
Schall, R. (1991) Estimation in generalized linear models withrandom effects.Biometrika78, 719–727.
Breslow, N. E. and Clayton, D. G. (1993) Approximate inference ingeneralized linear mixed models.Journal of the American Statistical Association88, 9–25.
Wolfinger, R. and O'Connell, M. (1993) Generalized linear mixed models: apseudo-likelihood approach.Journal of Statistical Computation and Simulation48, 233–243.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
summary(glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID, family = binomial, data = bacteria))## an example of an offset: the coefficient of 'week' changes by one.summary(glmmPQL(y ~ trt + week, random = ~ 1 | ID, family = binomial, data = bacteria))summary(glmmPQL(y ~ trt + week + offset(week), random = ~ 1 | ID, family = binomial, data = bacteria))Record Times in Scottish Hill Races
Description
The record times in 1984 for 35 Scottish hill races.
Usage
hillsFormat
The components are:
distdistance in miles (on the map).
climbtotal height gained during the route, in feet.
timerecord time in minutes.
Source
A.C. Atkinson (1986) Comment: Aspects of diagnostic regression analysis.Statistical Science1, 397–402.
[A.C. Atkinson (1988) Transformations unmasked.Technometrics30, 311–318 “corrects” the time for Knock Hill from 78.65to 18.65. It is unclear if this based on the original records.]
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Plot a Histogram with Automatic Bin Width Selection
Description
Plot a histogram with automatic bin width selection, using the Scottor Freedman–Diaconis formulae.
Usage
hist.scott(x, prob = TRUE, xlab = deparse(substitute(x)), ...)hist.FD(x, prob = TRUE, xlab = deparse(substitute(x)), ...)Arguments
x | A data vector |
prob | Should the plot have unit area, so be a density estimate? |
xlab,... | Further arguments to |
Value
For thenclass.* functions, the suggested number of classes.
Side Effects
Plot a histogram.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S.Springer.
See Also
Frequency Table from a Copenhagen Housing Conditions Survey
Description
Thehousing data frame has 72 rows and 5 variables.
Usage
housingFormat
SatSatisfaction of householders with their present housingcircumstances, (High, Medium or Low, ordered factor).
InflPerceived degree of influence householders have on themanagement of the property (High, Medium, Low).
TypeType of rental accommodation, (Tower, Atrium, Apartment, Terrace).
ContContact residents are afforded with other residents, (Low, High).
FreqFrequencies: the numbers of residents in each class.
Source
Madsen, M. (1976)Statistical analysis of multiple contingency tables. Two examples.Scand. J. Statist.3, 97–106.
Cox, D. R. and Snell, E. J. (1984)Applied Statistics, Principles and Examples.Chapman & Hall.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
options(contrasts = c("contr.treatment", "contr.poly"))# Surrogate Poisson modelshouse.glm0 <- glm(Freq ~ Infl*Type*Cont + Sat, family = poisson, data = housing)## IGNORE_RDIFF_BEGINsummary(house.glm0, correlation = FALSE)## IGNORE_RDIFF_ENDaddterm(house.glm0, ~. + Sat:(Infl+Type+Cont), test = "Chisq")house.glm1 <- update(house.glm0, . ~ . + Sat*(Infl+Type+Cont))summary(house.glm1, correlation = FALSE)1 - pchisq(deviance(house.glm1), house.glm1$df.residual)dropterm(house.glm1, test = "Chisq")addterm(house.glm1, ~. + Sat:(Infl+Type+Cont)^2, test = "Chisq")hnames <- lapply(housing[, -5], levels) # omit FreqnewData <- expand.grid(hnames)newData$Sat <- ordered(newData$Sat)house.pm <- predict(house.glm1, newData, type = "response") # poisson meanshouse.pm <- matrix(house.pm, ncol = 3, byrow = TRUE, dimnames = list(NULL, hnames[[1]]))house.pr <- house.pm/drop(house.pm %*% rep(1, 3))cbind(expand.grid(hnames[-1]), round(house.pr, 2))# Iterative proportional scalingloglm(Freq ~ Infl*Type*Cont + Sat*(Infl+Type+Cont), data = housing)# multinomial modellibrary(nnet)(house.mult<- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing))house.mult2 <- multinom(Sat ~ Infl*Type*Cont, weights = Freq, data = housing)anova(house.mult, house.mult2)house.pm <- predict(house.mult, expand.grid(hnames[-1]), type = "probs")cbind(expand.grid(hnames[-1]), round(house.pm, 2))# proportional odds modelhouse.cpr <- apply(house.pr, 1, cumsum)logit <- function(x) log(x/(1-x))house.ld <- logit(house.cpr[2, ]) - logit(house.cpr[1, ])(ratio <- sort(drop(house.ld)))mean(ratio)(house.plr <- polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq))house.pr1 <- predict(house.plr, expand.grid(hnames[-1]), type = "probs")cbind(expand.grid(hnames[-1]), round(house.pr1, 2))Fr <- matrix(housing$Freq, ncol = 3, byrow = TRUE)2*sum(Fr*log(house.pr/house.pr1))house.plr2 <- stepAIC(house.plr, ~.^2)house.plr2$anovaHuber M-estimator of Location with MAD Scale
Description
Finds the Huber M-estimator of location with MAD scale.
Usage
huber(y, k = 1.5, tol = 1e-06)Arguments
y | vector of data values |
k | Winsorizes at |
tol | convergence tolerance |
Value
list of location and scale parameters
mu | location estimate |
s | MAD scale estimate |
References
Huber, P. J. (1981)Robust Statistics.Wiley.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
huber(chem)Huber Proposal 2 Robust Estimator of Location and/or Scale
Description
Finds the Huber M-estimator for location with scale specified, scalewith location specified, or both if neither is specified.
Usage
hubers(y, k = 1.5, mu, s, initmu = median(y), tol = 1e-06)Arguments
y | vector y of data values |
k | Winsorizes at |
mu | specified location |
s | specified scale |
initmu | initial value of |
tol | convergence tolerance |
Value
list of location and scale estimates
mu | location estimate |
s | scale estimate |
References
Huber, P. J. (1981)Robust Statistics.Wiley.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
hubers(chem)hubers(chem, mu=3.68)Yields from a Barley Field Trial
Description
Theimmer data frame has 30 rows and 4 columns. Five varieties ofbarley were grown in six locations in each of 1931 and 1932.
Usage
immerFormat
This data frame contains the following columns:
LocThe location.
VarThe variety of barley (
"manchuria","svansota","velvet","trebi"and"peatland").Y1Yield in 1931.
Y2Yield in 1932.
Source
Immer, F.R., Hayes, H.D. and LeRoy Powers (1934)Statistical determination of barley varietal adaptation.Journal of the American Society for Agronomy26, 403–419.
Fisher, R.A. (1947)The Design of Experiments. 4th edition. Edinburgh: Oliver and Boyd.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Examples
immer.aov <- aov(cbind(Y1,Y2) ~ Loc + Var, data = immer)summary(immer.aov)immer.aov <- aov((Y1+Y2)/2 ~ Var + Loc, data = immer)summary(immer.aov)model.tables(immer.aov, type = "means", se = TRUE, cterms = "Var")Kruskal's Non-metric Multidimensional Scaling
Description
One form of non-metric multidimensional scaling
Usage
isoMDS(d, y = cmdscale(d, k), k = 2, maxit = 50, trace = TRUE, tol = 1e-3, p = 2)Shepard(d, x, p = 2)Arguments
d | distance structure of the form returned by |
y | An initial configuration. If none is supplied, |
k | The desired dimension for the solution, passed to |
maxit | The maximum number of iterations. |
trace | Logical for tracing optimization. Default |
tol | convergence tolerance. |
p | Power for Minkowski distance in the configuration space. |
x | A final configuration. |
Details
This chooses a k-dimensional (default k = 2) configuration to minimizethe stress, the square root of the ratio of the sum of squareddifferences between the input distances and those of the configurationto the sum of configuration distances squared. However, the inputdistances are allowed a monotonic transformation.
An iterative algorithm is used, which will usually converge in around10 iterations. As this is necessarily anO(n^2) calculation,it is slow for large datasets. Further, since for the defaultp = 2the configuration is only determined up to rotations and reflections(by convention the centroid is at the origin), the result can varyconsiderably from machine to machine.
Value
Two components:
points | A k-column vector of the fitted configuration. |
stress | The final stress achieved (in percent). |
Side Effects
Iftrace is true, the initial stress and the current stressare printed out every 5 iterations.
References
T. F. Cox and M. A. A. Cox (1994, 2001)Multidimensional Scaling. Chapman & Hall.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
swiss.x <- as.matrix(swiss[, -1])swiss.dist <- dist(swiss.x)swiss.mds <- isoMDS(swiss.dist)plot(swiss.mds$points, type = "n")text(swiss.mds$points, labels = as.character(1:nrow(swiss.x)))swiss.sh <- Shepard(swiss.dist, swiss.mds$points)plot(swiss.sh, pch = ".")lines(swiss.sh$x, swiss.sh$yf, type = "S")Two-Dimensional Kernel Density Estimation
Description
Two-dimensional kernel density estimation with an axis-alignedbivariate normal kernel, evaluated on a square grid.
Usage
kde2d(x, y, h, n = 25, lims = c(range(x), range(y)))Arguments
x | x coordinate of data |
y | y coordinate of data |
h | vector of bandwidths for x and y directions. Defaults tonormal reference bandwidth (see |
n | Number of grid points in each direction. Can be scalar or a length-2integer vector. |
lims | The limits of the rectangle covered by the grid as |
Value
A list of three components.
x,y | The x and y coordinates of the grid points, vectors of length |
z | An |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
attach(geyser)plot(duration, waiting, xlim = c(0.5,6), ylim = c(40,100))f1 <- kde2d(duration, waiting, n = 50, lims = c(0.5, 6, 40, 100))image(f1, zlim = c(0, 0.05))f2 <- kde2d(duration, waiting, n = 50, lims = c(0.5, 6, 40, 100), h = c(width.SJ(duration), width.SJ(waiting)) )image(f2, zlim = c(0, 0.05))persp(f2, phi = 30, theta = 20, d = 5)plot(duration[-272], duration[-1], xlim = c(0.5, 6), ylim = c(1, 6),xlab = "previous duration", ylab = "duration")f1 <- kde2d(duration[-272], duration[-1], h = rep(1.5, 2), n = 50, lims = c(0.5, 6, 0.5, 6))contour(f1, xlab = "previous duration", ylab = "duration", levels = c(0.05, 0.1, 0.2, 0.4) )f1 <- kde2d(duration[-272], duration[-1], h = rep(0.6, 2), n = 50, lims = c(0.5, 6, 0.5, 6))contour(f1, xlab = "previous duration", ylab = "duration", levels = c(0.05, 0.1, 0.2, 0.4) )f1 <- kde2d(duration[-272], duration[-1], h = rep(0.4, 2), n = 50, lims = c(0.5, 6, 0.5, 6))contour(f1, xlab = "previous duration", ylab = "duration", levels = c(0.05, 0.1, 0.2, 0.4) )detach("geyser")Linear Discriminant Analysis
Description
Linear discriminant analysis.
Usage
lda(x, ...)## S3 method for class 'formula'lda(formula, data, ..., subset, na.action)## Default S3 method:lda(x, grouping, prior = proportions, tol = 1.0e-4, method, CV = FALSE, nu, ...)## S3 method for class 'data.frame'lda(x, ...)## S3 method for class 'matrix'lda(x, grouping, ..., subset, na.action)Arguments
formula | A formula of the form |
data | An optional data frame, list or environment from which variablesspecified in |
x | (required if no formula is given as the principal argument.)a matrix or data frame or Matrix containing the explanatory variables. |
grouping | (required if no formula principal argument is given.)a factor specifying the class for each observation. |
prior | the prior probabilities of class membership. If unspecified, theclass proportions for the training set are used. If present, theprobabilities should be specified in the order of the factorlevels. |
tol | A tolerance to decide if a matrix is singular; it will reject variablesand linear combinations of unit-variance variables whose variance isless than |
subset | An index vector specifying the cases to be used in the trainingsample. (NOTE: If given, this argument must be named.) |
na.action | A function to specify the action to be taken if |
method |
|
CV | If true, returns results (classes and posterior probabilities) forleave-one-out cross-validation. Note that if the prior is estimated,the proportions in the whole dataset are used. |
nu | degrees of freedom for |
... | arguments passed to or from other methods. |
Details
The functiontries hard to detect if the within-class covariance matrix issingular. If any variable has within-group variance less thantol^2 it will stop and report the variable as constant. Thiscould result from poor scaling of the problem, but is morelikely to result from constant variables.
Specifying theprior will affect the classification unlessover-ridden inpredict.lda. Unlike in most statistical packages, itwill also affect the rotation of the linear discriminants within theirspace, as a weighted between-groups covariance matrix is used. Thusthe first few linear discriminants emphasize the differences betweengroups with the weights given by the prior, which may differ fromtheir prevalence in the dataset.
If one or more groups is missing in the supplied data, they are droppedwith a warning, but the classifications produced are with respect to theoriginal set of levels.
Value
IfCV = TRUE the return value is a list with componentsclass, the MAP classification (a factor), andposterior,posterior probabilities for the classes.
Otherwise it is an object of class"lda" containing thefollowing components:
prior | the prior probabilities used. |
means | the group means. |
scaling | a matrix which transforms observations to discriminant functions,normalized so that within groups covariance matrix is spherical. |
svd | the singular values, which give the ratio of the between- andwithin-group standard deviations on the linear discriminantvariables. Their squares are the canonical F-statistics. |
N | The number of observations used. |
call | The (matched) function call. |
Note
This function may be called giving either a formula andoptional data frame, or a matrix and grouping factor as the firsttwo arguments. All other arguments are optional, butsubset= andna.action=, if required, must be fully named.
If a formula is given as the principal argument the object may bemodified usingupdate() in the usual way.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
See Also
Examples
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]), Sp = rep(c("s","c","v"), rep(50,3)))train <- sample(1:150, 75)table(Iris$Sp[train])## your answer may differ## c s v## 22 23 30z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)predict(z, Iris[-train, ])$class## [1] s s s s s s s s s s s s s s s s s s s s s s s s s s s c c c## [31] c c c c c c c v c c c c v c c c c c c c c c c c c v v v v v## [61] v v v v v v v v v v v v v v v(z1 <- update(z, . ~ . - Petal.W.))Histograms or Density Plots of Multiple Groups
Description
Plot histograms or density plots of data on a single Fisher lineardiscriminant.
Usage
ldahist(data, g, nbins = 25, h, x0 = - h/1000, breaks, xlim = range(breaks), ymax = 0, width, type = c("histogram", "density", "both"), sep = (type != "density"), col = 5, xlab = deparse(substitute(data)), bty = "n", ...)Arguments
data | vector of data. Missing values ( |
g | factor or vector giving groups, of the same length as |
nbins | Suggested number of bins to cover the whole range of the data. |
h | The bin width (takes precedence over |
x0 | Shift for the bins - the breaks are at |
breaks | The set of breakpoints to be used. (Usually omitted, takes precedenceover |
xlim | The limits for the x-axis. |
ymax | The upper limit for the y-axis. |
width | Bandwidth for density estimates. If missing, the Sheather-Jonesselector is used for each group separately. |
type | Type of plot. |
sep | Whether there is a separate plot for each group, or one combined plot. |
col | The colour number for the bar fill. |
xlab | label for the plot x-axis. By default, this will be the name of |
bty | The box type for the plot - defaults to none. |
... | additional arguments to |
Side Effects
Histogram and/or density plots are plotted on the current device.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Survival Times and White Blood Counts for Leukaemia Patients
Description
A data frame of data from 33 leukaemia patients.
Usage
leukFormat
A data frame with columns:
wbcwhite blood count.
aga test result,
"present"or"absent".timesurvival time in weeks.
Details
Survival times are given for 33 patients who died from acutemyelogenous leukaemia. Also measured was the patient's white blood cellcount at the time of diagnosis. The patients were also factored into 2groups according to the presence or absence of a morphologiccharacteristic of white blood cells. Patients termed AG positive wereidentified by the presence of Auer rods and/or significant granulationof the leukaemic cells in the bone marrow at the time of diagnosis.
Source
Cox, D. R. and Oakes, D. (1984)Analysis of Survival Data.Chapman & Hall, p. 9.
Taken from
Feigl, P. & Zelen, M. (1965) Estimation of exponential survivalprobabilities with concomitant information.Biometrics21, 826–838.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
library(survival)plot(survfit(Surv(time) ~ ag, data = leuk), lty = 2:3, col = 2:3)# now Cox modelsleuk.cox <- coxph(Surv(time) ~ ag + log(wbc), leuk)summary(leuk.cox)Fit Linear Models by Generalized Least Squares
Description
Fit linear models by Generalized Least Squares
Usage
lm.gls(formula, data, W, subset, na.action, inverse = FALSE, method = "qr", model = FALSE, x = FALSE, y = FALSE, contrasts = NULL, ...)Arguments
formula | a formula expression as for regression models, of the form |
data | an optional data frame, list or environment in which to interpret thevariables occurring in |
W | a weight matrix. |
subset | expression saying which subset of the rows of the data should be usedin the fit. All observations are included by default. |
na.action | a function to filter missing data. |
inverse | logical: if true |
method | method to be used by |
model | should the model frame be returned? |
x | should the design matrix be returned? |
y | should the response be returned? |
contrasts | a list of contrasts to be used for some or all of |
... | additional arguments to |
Details
The problem is transformed to uncorrelated form and passed tolm.fit.
Value
An object of class"lm.gls", which is similar to an"lm"object. There is no"weights" component, and only a few"lm"methods will work correctly. As from version 7.1-22 the residuals andfitted values refer to the untransformed problem.
See Also
Ridge Regression
Description
Fit a linear model by ridge regression.
Usage
lm.ridge(formula, data, subset, na.action, lambda = 0, model = FALSE, x = FALSE, y = FALSE, contrasts = NULL, ...)select(obj)Arguments
formula | a formula expression as for regression models, of the form |
data | an optional data frame, list or environment in which to interpret thevariables occurring in |
subset | expression saying which subset of the rows of the data should be usedin the fit. All observations are included by default. |
na.action | a function to filter missing data. |
lambda | A scalar or vector of ridge constants. |
model | should the model frame be returned? Not implemented. |
x | should the design matrix be returned? Not implemented. |
y | should the response be returned? Not implemented. |
contrasts | a list of contrasts to be used for some or all of factor terms in theformula. See the |
... | additional arguments to |
obj | anR object, such as an |
Details
If an intercept is present in the model, its coefficient is notpenalized. (If you want to penalize an intercept, put in your ownconstant term and remove the intercept.)
Value
A list with components
coef | matrix of coefficients, one row for each value of |
scales | scalings used on the X matrix. |
Inter | was intercept included? |
lambda | vector of lambda values |
ym | mean of |
xm | column means of |
GCV | vector of GCV values |
kHKB | HKB estimate of the ridge constant. |
kLW | L-W estimate of the ridge constant. |
References
Brown, P. J. (1994)Measurement, Regression and CalibrationOxford.
See Also
Examples
longley # not the same as the S-PLUS datasetnames(longley)[1] <- "y"lm.ridge(y ~ ., longley)plot(lm.ridge(y ~ ., longley, lambda = seq(0,0.1,0.001)))select(lm.ridge(y ~ ., longley, lambda = seq(0,0.1,0.0001)))Fit Log-Linear Models by Iterative Proportional Scaling
Description
This function provides a front-end to the standard function,loglin, to allow log-linear models to be specified and fittedin a manner similar to that of other fitting functions, such asglm.
Usage
loglm(formula, data, subset, na.action, ...)Arguments
formula | A linear model formula specifying the log-linear model. If the left-hand side is empty, the |
data | Numeric array or data frame (or list or environment).In the first case it specifies thearray of frequencies; in the second it provides the data framefrom which the variables occurring in the formula arepreferentially obtained in the usual way. This argument may be the result of a call to |
subset | Specifies a subset of the rows in the data frame to be used. Thedefault is to take all rows. |
na.action | Specifies a method for handling missing observations. Thedefault is to fail if missing values are present. |
... | May supply other arguments to the function |
Details
If the left-hand side of the formula is empty thedata argumentsupplies the frequency array and the right-hand side of theformula is used to construct the list of fixed faces as requiredbyloglin. Structural zeros may be specified by giving astart argument with those entries set to zero, as described inthe help information forloglin.
If the left-hand side is not empty, all variables on theright-hand side are regarded as classifying factors and an arrayof frequencies is constructed. If some cells in the completearray are not specified they are treated as structural zeros.The right-hand side of the formula is again used to construct thelist of faces on which the observed and fitted totals must agree,as required byloglin. Hence terms such asa:b,a*b anda/b are all equivalent.
Value
An object of class"loglm" conveying the results of the fittedlog-linear model. Methods exist for the generic functionsprint,summary,deviance,fitted,coef,resid,anova andupdate, which perform the expectedtasks. Only log-likelihood ratio tests are allowed usinganova.
The deviance is simply an alternative name for the log-likelihoodratio statistic for testing the current model within a saturatedmodel, in accordance with standard usage in generalized linearmodels.
Warning
If structural zeros are present, the calculation of degrees offreedom may not be correct.loglin itself takes no action toallow for structural zeros.loglm deducts one degree offreedom for each structural zero, but cannot make allowance forgains in error degrees of freedom due to loss of dimension in themodel space. (This would require checking the rank of themodel matrix, but since iterative proportional scaling methodsare developed largely to avoid constructing the model matrixexplicitly, the computation is at least difficult.)
When structural zeros (or zero fitted values) are present theestimated coefficients will not be available due to infiniteestimates. The deviances will normally continue to be correct, though.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
# The data frames Cars93, minn38 and quine are available# in the MASS package.# Case 1: frequencies specified as an array.sapply(minn38, function(x) length(levels(x)))## hs phs fol sex f## 3 4 7 2 0##minn38a <- array(0, c(3,4,7,2), lapply(minn38[, -5], levels))##minn38a[data.matrix(minn38[,-5])] <- minn38$f## or more simplyminn38a <- xtabs(f ~ ., minn38)fm <- loglm(~ 1 + 2 + 3 + 4, minn38a) # numerals as names.deviance(fm)## [1] 3711.9fm1 <- update(fm, .~.^2)fm2 <- update(fm, .~.^3, print = TRUE)## 5 iterations: deviation 0.075anova(fm, fm1, fm2)# Case 1. An array generated with xtabs.loglm(~ Type + Origin, xtabs(~ Type + Origin, Cars93))# Case 2. Frequencies given as a vector in a data framenames(quine)## [1] "Eth" "Sex" "Age" "Lrn" "Days"fm <- loglm(Days ~ .^2, quine)gm <- glm(Days ~ .^2, poisson, quine) # check glm.c(deviance(fm), deviance(gm)) # deviances agree## [1] 1368.7 1368.7c(fm$df, gm$df) # resid df do not!c(fm$df, gm$df.residual) # resid df do not!## [1] 127 128# The loglm residual degrees of freedom is wrong because of# a non-detectable redundancy in the model matrix.Fit Log-Linear Models by Iterative Proportional Scaling – Internal function
Description
loglm1 is an internal function used byloglm.It is a generic function dispatching on thedata argument.
Usage
loglm1(formula, data, ...)## S3 method for class 'xtabs'loglm1(formula, data, ...)## S3 method for class 'data.frame'loglm1(formula, data, ...)## Default S3 method:loglm1(formula, data, start = rep(1, length(data)), fitted = FALSE, keep.frequencies = fitted, param = TRUE, eps = 1/10, iter = 40, print = FALSE, ...)Arguments
formula | A linear model formula specifying the log-linear model.See |
data | Numeric array or data frame (or list or environment).In the first case it specifies thearray of frequencies; in then second it provides the data framefrom which the variables occurring in the formula arepreferentially obtained in the usual way. This argument may also be the result of a call to |
start,param,eps,iter,print | Arguments passed to |
fitted | logical: should the fitted values be returned? |
keep.frequencies | If |
... | arguments passed to the default method. |
Value
An object of class"loglm".
See Also
Estimate log Transformation Parameter
Description
Find and optionally plot the marginal (profile) likelihood for alphafor a transformation model of the formlog(y + alpha) ~ x1 + x2 + ....
Usage
logtrans(object, ...)## Default S3 method:logtrans(object, ..., alpha = seq(0.5, 6, by = 0.25) - min(y), plotit = TRUE, interp =, xlab = "alpha", ylab = "log Likelihood")## S3 method for class 'formula'logtrans(object, data, ...)## S3 method for class 'lm'logtrans(object, ...)Arguments
object | Fitted linear model object, or formula defining the untransformedmodel that is |
... | If |
alpha | Set of values for the transformation parameter, alpha. |
plotit | Should plotting be done? |
interp | Should the marginal log-likelihood be interpolated with a splineapproximation? (Default is |
xlab | as for |
ylab | as for |
data | optional |
Value
List with componentsx (for alpha) andy (for the marginallog-likelihood values).
Side Effects
A plot of the marginal log-likelihood is produced, if requested,together with an approximate mle and 95% confidence interval.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
logtrans(Days ~ Age*Sex*Eth*Lrn, data = quine, alpha = seq(0.75, 6.5, length.out = 20))Resistant Regression
Description
Fit a regression to thegood points in the dataset, therebyachieving a regression estimator with a high breakdown point.lmsreg andltsreg are compatibility wrappers.
Usage
lqs(x, ...)## S3 method for class 'formula'lqs(formula, data, ..., method = c("lts", "lqs", "lms", "S", "model.frame"), subset, na.action, model = TRUE, x.ret = FALSE, y.ret = FALSE, contrasts = NULL)## Default S3 method:lqs(x, y, intercept = TRUE, method = c("lts", "lqs", "lms", "S"), quantile, control = lqs.control(...), k0 = 1.548, seed, ...)lmsreg(...)ltsreg(...)Arguments
formula | a formula of the form |
data | an optional data frame, list or environemnt from whichvariables specified in |
subset | an index vector specifying the cases to be used infitting. (NOTE: If given, this argument must be named exactly.) |
na.action | function to specify the action to be taken if |
model,x.ret,y.ret | logical. If |
contrasts | an optional list. See the |
x | a matrix or data frame containing the explanatory variables. |
y | the response: a vector of length the number of rows of |
intercept | should the model include an intercept? |
method | the method to be used. |
quantile | the quantile to be used: see |
control | additional control items: see |
k0 | the cutoff / tuning constant used for |
seed | the seed to be used for random sampling: see |
... | arguments to be passed to |
Details
Suppose there aren data points andp regressors,including any intercept.
The first three methods minimize some function of the sorted squaredresiduals. For methods"lqs" and"lms" is thequantile squared residual, and for"lts" it is the sumof thequantile smallest squared residuals."lqs" and"lms" differ in the defaults forquantile, which arefloor((n+p+1)/2) andfloor((n+1)/2) respectively.For"lts" the default isfloor(n/2) + floor((p+1)/2).
The"S" estimation method solves for the scalessuch that the average of a function chi of the residuals dividedbys is equal to a given constant.
Thecontrol argument is a list with components
psamp:the size of each sample. Defaults to
p.nsamp:the number of samples or
"best"(thedefault) or"exact"or"sample".If"sample"the number chosen ismin(5*p, 3000),taken from Rousseeuw and Hubert (1997).If"best"exhaustive enumeration is done up to 5000 samples;if"exact"exhaustive enumeration will be attempted howevermany samples are needed.adjust:should the intercept be optimized for eachsample? Defaults to
TRUE.
Value
An object of class"lqs". This is a list with components
crit | the value of the criterion for the best solution found, inthe case of |
sing | character. A message about the number of samples whichresulted in singular fits. |
coefficients | of the fitted linear model |
bestone | the indices of those points fitted by the best samplefound (prior to adjustment of the intercept, if requested). |
fitted.values | the fitted values. |
residuals | the residuals. |
scale | estimate(s) of the scale of the error. The first is basedon the fit criterion. The second (not present for |
Note
There seems no reason other than historical to use thelms andlqs options. LMS estimation is of low efficiency (convergingat raten^{-1/3}) whereas LTS has the same asymptotic efficiencyas an M estimator with trimming at the quartiles (Marazzi, 1993, p.201).LQS and LTS have the same maximal breakdown value of(floor((n-p)/2) + 1)/n attained iffloor((n+p)/2) <= quantile <= floor((n+p+1)/2).The only drawback mentioned of LTS is greater computation, as a sortwas thought to be required (Marazzi, 1993, p.201) but this is nottrue as a partial sort can be used (and is used in this implementation).
Adjusting the intercept for each trial fit does need the residuals tobe sorted, and may be significant extra computation ifn is largeandp small.
Opinions differ over the choice ofpsamp. Rousseeuw and Hubert(1997) only consider p; Marazzi (1993) recommends p+1 and suggeststhat more samples are better than adjustment for a given computationallimit.
The computations are exact for a model with just an intercept andadjustment, and for LQS for a model with an intercept plus oneregressor and exhaustive search with adjustment. For all other casesthe minimization is only known to be approximate.
References
P. J. Rousseeuw and A. M. Leroy (1987)Robust Regression and Outlier Detection. Wiley.
A. Marazzi (1993)Algorithms, Routines and S Functions for Robust Statistics.Wadsworth and Brooks/Cole.
P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. InL1-Statistical Procedures and Related Topics,ed Y. Dodge, IMS Lecture Notes volume31, pp. 201–214.
See Also
Examples
## IGNORE_RDIFF_BEGINset.seed(123) # make reproduciblelqs(stack.loss ~ ., data = stackloss)lqs(stack.loss ~ ., data = stackloss, method = "S", nsamp = "exact")## IGNORE_RDIFF_ENDBrain and Body Weights for 62 Species of Land Mammals
Description
A data frame with average brain and body weights for 62 speciesof land mammals.
Usage
mammalsFormat
bodybody weight in kg.
brainbrain weight in g.
nameCommon name of species.(Rock hyrax-a =Heterohyrax brucci,Rock hyrax-b =Procavia habessinic..)
Source
Weisberg, S. (1985)Applied Linear Regression. 2nd edition. Wiley, pp. 144–5.
Selected from:Allison, T. and Cicchetti, D. V. (1976)Sleep in mammals: ecological and constitutional correlates.Science194, 732–734.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Multiple Correspondence Analysis
Description
Computes a multiple correspondence analysis of a set of factors.
Usage
mca(df, nf = 2, abbrev = FALSE)Arguments
df | A data frame containing only factors |
nf | The number of dimensions for the MCA. Rarely 3 might be useful. |
abbrev | Should the vertex names be abbreviated? By default these are of theform ‘factor.level’ but if |
Value
An object of class"mca", with components
rs | The coordinates of the rows, in |
cs | The coordinates of the column vertices, one for each level of each factor. |
fs | Weights for each row, used to interpolate additional factors in |
p | The number of factors |
d | The singular values for the |
call | The matched call. |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
farms.mca <- mca(farms, abbrev=TRUE)farms.mcaplot(farms.mca)Data from a Simulated Motorcycle Accident
Description
A data frame giving a series of measurements of head accelerationin a simulated motorcycle accident, used to test crash helmets.
Usage
mcycleFormat
timesin milliseconds after impact.
accelin g.
Source
Silverman, B. W. (1985) Some aspects of the spline smoothing approach tonon-parametric curve fitting.Journal of the Royal Statistical Society series B47, 1–52.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Age of Menarche in Warsaw
Description
Proportions of female children at various ages during adolescencewho have reached menarche.
Usage
menarcheFormat
This data frame contains the following columns:
AgeAverage age of the group. (The groups are reasonably age homogeneous.)
TotalTotal number of children in the group.
MenarcheNumber who have reached menarche.
Source
Milicer, H. and Szczotka, F. (1966) Age at Menarche in Warsaw girls in1965.Human Biology38, 199–203.
The data are also given in
Aranda-Ordaz, F.J. (1981)On two families of transformations to additivity for binary response data.Biometrika68, 357–363.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
mprob <- glm(cbind(Menarche, Total - Menarche) ~ Age, binomial(link = probit), data = menarche)Michelson's Speed of Light Data
Description
Measurements of the speed of light in air, made between 5th Juneand 2nd July, 1879. The data consists of five experiments, eachconsisting of 20 consecutive runs. The response is the speedof light in km/s, less 299000. The currently accepted value, onthis scale of measurement, is 734.5.
Usage
michelsonFormat
The data frame contains the following components:
ExptThe experiment number, from 1 to 5.
RunThe run number within each experiment.
SpeedSpeed-of-light measurement.
Source
A.J. Weekes (1986)A Genstat Primer. Edward Arnold.
S. M. Stigler (1977) Do robust estimators work with real data?Annals of Statistics5, 1055–1098.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Minnesota High School Graduates of 1938
Description
The Minnesota high school graduates of 1938 were classified according tofour factors, described below. Theminn38 data frame has 168rows and 5 columns.
Usage
minn38Format
This data frame contains the following columns:
hshigh school rank:
"L","M"and"U"for lower,middle and upper third.phspost high school status: Enrolled in college, (
"C"), enrolled innon-collegiate school, ("N"), employed full-time, ("E")and other, ("O").folfather's occupational level, (seven levels,
"F1","F2",...,"F7").sexsex: factor with levels
"F"or"M".ffrequency.
Source
From R. L. Plackett, (1974)The Analysis of CategoricalData. London: Griffin
who quotes the data from
Hoyt, C. J., Krishnaiah, P. R. and Torrance, E. P. (1959) Analysis ofcomplex contingency tables,J. Exp. Ed.27, 187–194.
Accelerated Life Testing of Motorettes
Description
Themotors data frame has 40 rows and 3 columns. It describes anaccelerated life test at each of four temperatures of 10 motorettes,and has rather discrete times.
Usage
motorsFormat
This data frame contains the following columns:
tempthe temperature (degrees C) of the test.
timethe time in hours to failure or censoring at 8064 hours (= 336 days).
censan indicator variable for death.
Source
Kalbfleisch, J. D. and Prentice, R. L. (1980)The Statistical Analysis of Failure Time Data.New York: Wiley.
taken from
Nelson, W. D. and Hahn, G. J. (1972)Linear regression of a regression relationship from censored data.Part 1 – simple methods and their application.Technometrics,14, 247–276.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
library(survival)plot(survfit(Surv(time, cens) ~ factor(temp), motors), conf.int = FALSE)# fit Weibull modelmotor.wei <- survreg(Surv(time, cens) ~ temp, motors)## IGNORE_RDIFF_BEGINsummary(motor.wei)## IGNORE_RDIFF_END# and predict at 130Cunlist(predict(motor.wei, data.frame(temp=130), se.fit = TRUE))motor.cox <- coxph(Surv(time, cens) ~ temp, motors)summary(motor.cox)# predict at temperature 200plot(survfit(motor.cox, newdata = data.frame(temp=200), conf.type = "log-log"))summary( survfit(motor.cox, newdata = data.frame(temp=130)) )Effect of Calcium Chloride on Muscle Contraction in Rat Hearts
Description
The purpose of this experiment was to assess the influence ofcalcium in solution on the contraction of heart muscle in rats.The left auricle of 21 rat hearts was isolated and on severaloccasions a constant-length strip of tissue was electricallystimulated and dipped into various concentrations of calciumchloride solution, after which the shortening of the strip wasaccurately measured as the response.
Usage
muscleFormat
This data frame contains the following columns:
Stripwhich heart muscle strip was used?
Concconcentration of calcium chloride solution, in multiples of 2.2 mM.
Lengththe change in length (shortening) of the strip, (allegedly) in mm.
Source
Linder, A., Chakravarti, I. M. and Vuagnat, P. (1964)Fitting asymptotic regression curves with different asymptotes. InContributions to Statistics. Presented to Professor P. C. Mahalanobison the occasion of his 70th birthday,ed. C. R. Rao, pp. 221–228. Oxford: Pergamon Press.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth Edition. Springer.
Examples
## IGNORE_RDIFF_BEGINA <- model.matrix(~ Strip - 1, data=muscle)rats.nls1 <- nls(log(Length) ~ cbind(A, rho^Conc), data = muscle, start = c(rho=0.1), algorithm="plinear")(B <- coef(rats.nls1))st <- list(alpha = B[2:22], beta = B[23], rho = B[1])(rats.nls2 <- nls(log(Length) ~ alpha[Strip] + beta*rho^Conc, data = muscle, start = st))## IGNORE_RDIFF_ENDMuscle <- with(muscle, {Muscle <- expand.grid(Conc = sort(unique(Conc)), Strip = levels(Strip))Muscle$Yhat <- predict(rats.nls2, Muscle)Muscle <- cbind(Muscle, logLength = rep(as.numeric(NA), 126))ind <- match(paste(Strip, Conc), paste(Muscle$Strip, Muscle$Conc))Muscle$logLength[ind] <- log(Length)Muscle})lattice::xyplot(Yhat ~ Conc | Strip, Muscle, as.table = TRUE, ylim = range(c(Muscle$Yhat, Muscle$logLength), na.rm = TRUE), subscripts = TRUE, xlab = "Calcium Chloride concentration (mM)", ylab = "log(Length in mm)", panel = function(x, y, subscripts, ...) { panel.xyplot(x, Muscle$logLength[subscripts], ...) llines(spline(x, y)) })Simulate from a Multivariate Normal Distribution
Description
Produces one or more samples from the specifiedmultivariate normal distribution.
Usage
mvrnorm(n = 1, mu, Sigma, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)Arguments
n | the number of samples required. |
mu | a vector giving the means of the variables. |
Sigma | a positive-definite symmetric matrix specifying thecovariance matrix of the variables. |
tol | tolerance (relative to largest variance) for numerical lackof positive-definiteness in |
empirical | logical. If true, mu and Sigma specify the empiricalnot population mean and covariance matrix. |
EISPACK | logical: values other than |
Details
The matrix decomposition is done viaeigen; although a Choleskidecomposition might be faster, the eigendecomposition isstabler.
Value
Ifn = 1 a vector of the same length asmu, otherwise ann bylength(mu) matrix with one sample in each row.
Side Effects
Causes creation of the dataset.Random.seed if it doesnot already exist, otherwise its value is updated.
References
B. D. Ripley (1987)Stochastic Simulation. Wiley. Page 98.
See Also
Examples
Sigma <- matrix(c(10,3,3,2),2,2)Sigmavar(mvrnorm(n = 1000, rep(0, 2), Sigma))var(mvrnorm(n = 1000, rep(0, 2), Sigma, empirical = TRUE))Family function for Negative Binomial GLMs
Description
Specifies the information required to fit a Negative Binomial generalizedlinear model, with knowntheta parameter, usingglm().
Usage
negative.binomial(theta = stop("'theta' must be specified"), link = "log")Arguments
theta | The known value of the additional parameter, |
link | The link function, as a character string, name or one-elementcharacter vector specifying one of |
Value
An object of class"family", a list of functions andexpressions needed byglm() to fit a Negative Binomialgeneralized linear model.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
See Also
glm.nb,anova.negbin,summary.negbin
Examples
# Fitting a Negative Binomial model to the quine data# with theta = 2 assumed known.#glm(Days ~ .^4, family = negative.binomial(2), data = quine)Newcomb's Measurements of the Passage Time of Light
Description
A numeric vector giving the ‘Third Series’ of measurements of thepassage time of light recorded by Newcomb in 1882. The given valuesdivided by 1000 plus 24.8 give the time in millionths of a second forlight to traverse a known distance. The ‘true’ value is nowconsidered to be 33.02.
The dataset is given in the order in Staudte and Sheather. Stigler(1977, Table 5) gives the dataset as
28 26 33 24 34 -44 27 16 40 -2 29 22 24 21 25 30 23 29 31 19 24 20 36 32 36 28 25 21 28 29 37 25 28 26 30 32 36 26 30 22 36 23 27 27 28 27 31 27 26 33 26 32 32 24 39 28 24 25 32 25 29 27 28 29 16 23
However, order is not relevant to its use as an example of robustestimation. (Thanks to Anthony Unwin for bringing this difference to ourattention.)
Usage
newcombSource
S. M. Stigler (1973)Simon Newcomb, Percy Daniell, and the history of robust estimation1885–1920.Journal of the American Statistical Association68, 872–879.
S. M. Stigler (1977)Do robust estimators work withreal data?Annals of Statistics,5, 1055–1098.
R. G. Staudte and S. J. Sheather (1990)Robust Estimation and Testing. Wiley.
Eighth-Grade Pupils in the Netherlands
Description
Snijders and Bosker (1999) use as a running example a study of 2287eighth-grade pupils (aged about 11) in 132 classes in 131 schools inthe Netherlands. Only the variables used in our examples are supplied.
Usage
nlschoolsFormat
This data frame contains 2287 rows and the following columns:
langlanguage test score.
IQverbal IQ.
classclass ID.
GSclass size: number of eighth-grade pupils recorded in the class (theremay be others: see
COMB, and some may have been omittedwith missing values).SESsocial-economic status of pupil's family.
COMBwere the pupils taught in a multi-grade class (
0/1)? Classes whichcontained pupils from grades 7 and 8 are coded1, but onlyeighth-graders were tested.
Source
Snijders, T. A. B. and Bosker, R. J. (1999)Multilevel Analysis. An Introduction to Basic and AdvancedMultilevel Modelling. London: Sage.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
nl1 <- within(nlschools, {IQave <- tapply(IQ, class, mean)[as.character(class)]IQ <- IQ - IQave})cen <- c("IQ", "IQave", "SES")nl1[cen] <- scale(nl1[cen], center = TRUE, scale = FALSE)nl.lme <- nlme::lme(lang ~ IQ*COMB + IQave + SES, random = ~ IQ | class, data = nl1)## IGNORE_RDIFF_BEGINsummary(nl.lme)## IGNORE_RDIFF_ENDClassical N, P, K Factorial Experiment
Description
A classical N, P, K (nitrogen, phosphate, potassium) factorialexperiment on the growth of peas conducted on 6 blocks. Each half of afractional factorial design confounding the NPK interaction was usedon 3 of the plots.
Usage
npkFormat
Thenpk data frame has 24 rows and 5 columns:
blockwhich block (label 1 to 6).
Nindicator (0/1) for the application of nitrogen.
Pindicator (0/1) for the application of phosphate.
Kindicator (0/1) for the application of potassium.
yieldYield of peas, in pounds/plot (the plots were (1/70) acre).
Note
This dataset is also contained inR 3.0.2 and later.
Source
Imperial College, London, M.Sc. exercise sheet.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
options(contrasts = c("contr.sum", "contr.poly"))npk.aov <- aov(yield ~ block + N*P*K, npk)## IGNORE_RDIFF_BEGINnpk.aovsummary(npk.aov)alias(npk.aov)coef(npk.aov)options(contrasts = c("contr.treatment", "contr.poly"))npk.aov1 <- aov(yield ~ block + N + K, data = npk)summary.lm(npk.aov1)se.contrast(npk.aov1, list(N=="0", N=="1"), data = npk)model.tables(npk.aov1, type = "means", se = TRUE)## IGNORE_RDIFF_ENDUS Naval Petroleum Reserve No. 1 data
Description
Data on the locations, porosity and permeability (a measure of oil flow)on 104 oil wells in the US Naval Petroleum Reserve No. 1 in California.
Usage
npr1Format
This data frame contains the following columns:
xx coordinates, in miles (origin unspecified)..
yy coordinates, in miles.
permpermeability in milli-Darcies.
porporosity (%).
Source
Maher, J.C., Carter, R.D. and Lantz, R.J. (1975)Petroleum geology of Naval Petroleum Reserve No. 1, Elk Hills,Kern County, California.USGS Professional Paper912.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Data from an Oats Field Trial
Description
The yield of oats from a split-plot field trial using three varieties andfour levels of manurial treatment. The experiment was laid out in 6 blocksof 3 main plots, each split into 4 sub-plots. The varieties were appliedto the main plots and the manurial treatments to the sub-plots.
Usage
oatsFormat
This data frame contains the following columns:
BBlocks, levels I, II, III, IV, V and VI.
VVarieties, 3 levels.
NNitrogen (manurial) treatment, levels 0.0cwt, 0.2cwt, 0.4cwt and 0.6cwt,showing the application in cwt/acre.
YYields in 1/4lbs per sub-plot, each of area 1/80 acre.
Source
Yates, F. (1935) Complex experiments,Journal of the Royal Statistical Society Suppl.2, 181–247.
Also given inYates, F. (1970)Experimental design: Selected papers of Frank Yates, C.B.E, F.R.S.London: Griffin.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
oats$Nf <- ordered(oats$N, levels = sort(levels(oats$N)))oats.aov <- aov(Y ~ Nf*V + Error(B/V), data = oats, qr = TRUE)## IGNORE_RDIFF_BEGINsummary(oats.aov)summary(oats.aov, split = list(Nf=list(L=1, Dev=2:3)))## IGNORE_RDIFF_ENDpar(mfrow = c(1,2), pty = "s")plot(fitted(oats.aov[[4]]), studres(oats.aov[[4]]))abline(h = 0, lty = 2)oats.pr <- proj(oats.aov)qqnorm(oats.pr[[4]][,"Residuals"], ylab = "Stratum 4 residuals")qqline(oats.pr[[4]][,"Residuals"])par(mfrow = c(1,1), pty = "m")oats.aov2 <- aov(Y ~ N + V + Error(B/V), data = oats, qr = TRUE)model.tables(oats.aov2, type = "means", se = TRUE)The Painter's Data of de Piles
Description
The subjective assessment, on a 0 to 20 integer scale, of 54classical painters. The painters were assessed on four characteristics:composition, drawing, colour and expression. The data is due to theEighteenth century art critic, de Piles.
Usage
paintersFormat
The row names of the data frame are the painters. The components are:
CompositionComposition score.
DrawingDrawing score.
ColourColour score.
ExpressionExpression score.
SchoolThe school to which a painter belongs, as indicated by a factor levelcode as follows:
"A": Renaissance;"B": Mannerist;"C": Seicento;"D": Venetian;"E": Lombard;"F": Sixteenth Century;"G": Seventeenth Century;"H": French.
Source
A. J. Weekes (1986)A Genstat Primer. Edward Arnold.
M. Davenport and G. Studdert-Kennedy (1972) The statisticalanalysis of aesthetic judgement: an exploration.Applied Statistics21, 324–333.
I. T. Jolliffe (1986)Principal Component Analysis. Springer.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Produce Pairwise Scatterplots from an 'lda' Fit
Description
Pairwise scatterplot of the data on the linear discriminants.
Usage
## S3 method for class 'lda'pairs(x, labels = colnames(x), panel = panel.lda, dimen, abbrev = FALSE, ..., cex=0.7, type = c("std", "trellis"))Arguments
x | Object of class |
labels | vector of character strings for labelling the variables. |
panel | panel function to plot the data in each panel. |
dimen | The number of linear discriminants to be used for the plot; if thisexceeds the number determined by |
abbrev | whether the group labels are abbreviated on the plots. If |
... | additional arguments for |
cex | graphics parameter |
type | type of plot. The default is in the style of |
Details
This function is a method for the generic functionpairs() for class"lda".It can be invoked by callingpairs(x) for anobjectx of the appropriate class, or directly bycallingpairs.lda(x) regardless of theclass of the object.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Parallel Coordinates Plot
Description
Parallel coordinates plot
Usage
parcoord(x, col = 1, lty = 1, var.label = FALSE, ...)Arguments
x | a matrix or data frame who columns represent variables. Missing valuesare allowed. |
col | A vector of colours, recycled as necessary for each observation. |
lty | A vector of line types, recycled as necessary for each observation. |
var.label | If |
... | Further graphics parameters which are passed to |
Side Effects
a parallel coordinates plots is drawn.
Author(s)
B. D. Ripley. Enhancements based on ideas and code by Fabian Scheipl.
References
Wegman, E. J. (1990) Hyperdimensional data analysis using parallelcoordinates.Journal of the American Statistical Association85, 664–675.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
parcoord(state.x77[, c(7, 4, 6, 2, 5, 3)])ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])parcoord(log(ir)[, c(3, 4, 2, 1)], col = 1 + (0:149)%/%50)N. L. Prater's Petrol Refinery Data
Description
The yield of a petroleum refining process with four covariates.The crude oil appears to come from only 10 distinct samples.
These data were originally used by Prater (1956) tobuild an estimation equation for the yield of the refiningprocess of crude oil to gasoline.
Usage
petrolFormat
The variables are as follows
Nocrude oil sample identification label. (Factor.)
SGspecific gravity, degrees API. (Constant within sample.)
VPvapour pressure in pounds per square inch. (Constant within sample.)
V10volatility of crude; ASTM 10% point. (Constant within sample.)
EPdesired volatility of gasoline. (The end point. Varies within sample.)
Yyield as a percentage of crude.
Source
N. H. Prater (1956) Estimate gasoline yields fromcrudes.Petroleum Refiner35, 236–238.
This dataset is also given inD. J. Hand, F. Daly, K. McConway, D. Lunn and E. Ostrowski (eds) (1994)A Handbook of Small Data Sets. Chapman & Hall.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
library(nlme)Petrol <- petrolPetrol[, 2:5] <- scale(as.matrix(Petrol[, 2:5]), scale = FALSE)pet3.lme <- lme(Y ~ SG + VP + V10 + EP, random = ~ 1 | No, data = Petrol)pet3.lme <- update(pet3.lme, method = "ML")pet4.lme <- update(pet3.lme, fixed. = Y ~ V10 + EP)anova(pet4.lme, pet3.lme)Plot Method for Class 'lda'
Description
Plots a set of data on one, two or more linear discriminants.
Usage
## S3 method for class 'lda'plot(x, panel = panel.lda, ..., cex = 0.7, dimen, abbrev = FALSE, xlab = "LD1", ylab = "LD2")Arguments
x | An object of class |
panel | the panel function used to plot the data. |
... | additional arguments to |
cex | graphics parameter |
dimen | The number of linear discriminants to be used for the plot; if thisexceeds the number determined by |
abbrev | whether the group labels are abbreviated on the plots. If |
xlab | label for the x axis |
ylab | label for the y axis |
Details
This function is a method for the generic functionplot() for class"lda".It can be invoked by callingplot(x) for anobjectx of the appropriate class, or directly bycallingplot.lda(x) regardless of theclass of the object.
The behaviour is determined by the value ofdimen. Fordimen > 2, apairs plot is used. Fordimen = 2, anequiscaled scatter plot is drawn. Fordimen = 1, a set ofhistograms or density plots are drawn. Use argumenttype tomatch"histogram" or"density" or"both".
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
pairs.lda,ldahist,lda,predict.lda
Plot Method for Objects of Class 'mca'
Description
Plot a multiple correspondence analysis.
Usage
## S3 method for class 'mca'plot(x, rows = TRUE, col, cex = par("cex"), ...)Arguments
x | An object of class |
rows | Should the coordinates for the rows be plotted, or just the verticesfor the levels? |
col,cex | The colours and |
... | Additional parameters to |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
plot(mca(farms, abbrev = TRUE))Ordered Logistic or Probit Regression
Description
Fits a logistic or probit regression model to an ordered factorresponse. The default logistic case isproportional oddslogistic regression, after which the function is named.
Usage
polr(formula, data, weights, start, ..., subset, na.action, contrasts = NULL, Hess = FALSE, model = TRUE, method = c("logistic", "probit", "loglog", "cloglog", "cauchit"))Arguments
formula | a formula expression as for regression models, of the form |
data | an optional data frame, list or environment in which to interpretthe variables occurring in |
weights | optional case weights in fitting. Default to 1. |
start | initial values for the parameters. This is in the format |
... | additional arguments to be passed to |
subset | expression saying which subset of the rows of the data should be usedin the fit. All observations are included by default. |
na.action | a function to filter missing data. |
contrasts | a list of contrasts to be used for some or all ofthe factors appearing as variables in the model formula. |
Hess | logical for whether the Hessian (the observed information matrix)should be returned. Use this if you intend to call |
model | logical for whether the model matrix should be returned. |
method | logistic or probit or (complementary) log-log or cauchit(corresponding to a Cauchy latent variable). |
Details
This model is what Agresti (2002) calls acumulative linkmodel. The basic interpretation is as acoarsened version of alatent variableY_i which has a logistic or normal orextreme-value or Cauchy distribution with scale parameter one and alinear model for the mean. The ordered factor which is observed iswhich binY_i falls into with breakpoints
\zeta_0 = -\infty < \zeta_1 < \cdots < \zeta_K = \infty
This leads to the model
\mbox{logit} P(Y \le k | x) = \zeta_k - \eta
withlogit replaced byprobit for a normal latentvariable, and\eta being the linear predictor, a linearfunction of the explanatory variables (with no intercept). Notethat it is quite common for other software to use the opposite signfor\eta (and hence the coefficientsbeta).
In the logistic case, the left-hand side of the last display is thelog odds of categoryk or less, and since these are log oddswhich differ only by a constant for differentk, the odds areproportional. Hence the termproportional odds logisticregression.
The log-log and complementary log-log links are the increasing functionsF^{-1}(p) = -log(-log(p)) andF^{-1}(p) = log(-log(1-p));some call the first the ‘negative log-log’ link. Thesecorrespond to a latent variable with the extreme-value distribution forthe maximum and minimum respectively.
Aproportional hazards model for grouped survival times can beobtained by using the complementary log-log link with grouping orderedby increasing times.
predict,summary,vcov,anova,model.frame and anextractAIC method for use withstepAIC (andstep). There are alsoprofile andconfint methods.
Value
A object of class"polr". This has components
coefficients | the coefficients of the linear predictor, which has nointercept. |
zeta | the intercepts for the class boundaries. |
deviance | the residual deviance. |
fitted.values | a matrix, with a column for each level of the response. |
lev | the names of the response levels. |
terms | the |
df.residual | the number of residual degrees of freedoms,calculated using the weights. |
edf | the (effective) number of degrees of freedom used by the model |
n,nobs | the (effective) number of observations, calculated using theweights. ( |
call | the matched call. |
method | the matched method used. |
convergence | the convergence code returned by |
niter | the number of function and gradient evaluations used by |
lp | the linear predictor (including any offset). |
Hessian | (if |
model | (if |
Note
Thevcov method uses the approximate Hessian: forreliable results the model matrix should be sensibly scaled with allcolumns having range the order of one.
Prior to version 7.3-32,method = "cloglog" confusingly gavethe log-log link, implicitly assuming the first response level was the‘best’.
References
Agresti, A. (2002)Categorical Data. Second edition. Wiley.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
options(contrasts = c("contr.treatment", "contr.poly"))house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)house.plrsummary(house.plr, digits = 3)## slightly worse fit fromsummary(update(house.plr, method = "probit", Hess = TRUE), digits = 3)## although it is not really appropriate, can fitsummary(update(house.plr, method = "loglog", Hess = TRUE), digits = 3)summary(update(house.plr, method = "cloglog", Hess = TRUE), digits = 3)predict(house.plr, housing, type = "p")addterm(house.plr, ~.^2, test = "Chisq")house.plr2 <- stepAIC(house.plr, ~.^2)house.plr2$anovaanova(house.plr, house.plr2)house.plr <- update(house.plr, Hess=TRUE)pr <- profile(house.plr)confint(pr)plot(pr)pairs(pr)Predict Method for glmmPQL Fits
Description
Obtains predictions from a fitted generalized linear modelwith random effects.
Usage
## S3 method for class 'glmmPQL'predict(object, newdata = NULL, type = c("link", "response"), level, na.action = na.pass, ...)Arguments
object | a fitted object of class inheriting from |
newdata | optionally, a data frame in which to look for variables withwhich to predict. |
type | the type of prediction required. The default is on thescale of the linear predictors; the alternative |
level | an optional integer vector giving the level(s) of groupingto be used in obtaining the predictions. Level values increase fromoutermost to innermost grouping, with level zero corresponding to thepopulation predictions. Defaults to the highest or innermost level ofgrouping. |
na.action | function determining what should be done with missingvalues in |
... | further arguments passed to or from other methods. |
Value
Iflevel is a single integer, a vector otherwise a data frame.
See Also
Examples
fit <- glmmPQL(y ~ trt + I(week > 2), random = ~1 | ID, family = binomial, data = bacteria)predict(fit, bacteria, level = 0, type="response")predict(fit, bacteria, level = 1, type="response")Classify Multivariate Observations by Linear Discrimination
Description
Classify multivariate observations in conjunction withlda, and alsoproject data onto the linear discriminants.
Usage
## S3 method for class 'lda'predict(object, newdata, prior = object$prior, dimen, method = c("plug-in", "predictive", "debiased"), ...)Arguments
object | object of class |
newdata | data frame of cases to be classified or, if |
prior | The prior probabilities of the classes, by default the proportions in thetraining set or what was set in the call to |
dimen | the dimension of the space to be used. If this is less than |
method | This determines how the parameter estimation is handled. With |
... | arguments based from or to other methods |
Details
This function is a method for the generic functionpredict() forclass"lda". It can be invoked by callingpredict(x) foran objectx of the appropriate class, or directly by callingpredict.lda(x) regardless of the class of the object.
Missing values innewdata are handled by returningNA if thelinear discriminants cannot be evaluated. Ifnewdata is omitted andthena.action of the fit omitted cases, these will be omitted on theprediction.
This version centres the linear discriminants so that theweighted mean (weighted byprior) of the group centroids is atthe origin.
Value
a list with components
class | The MAP classification (a factor) |
posterior | posterior probabilities for the classes |
x | the scores of test cases on up to |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
See Also
Examples
tr <- sample(1:50, 25)train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))z <- lda(train, cl)predict(z, test)$classPredict from an lqs Fit
Description
Predict from an resistant regression fitted bylqs.
Usage
## S3 method for class 'lqs'predict(object, newdata, na.action = na.pass, ...)Arguments
object | object inheriting from class |
newdata | matrix or data frame of cases to be predicted or, if |
na.action | function determining what should be done with missingvalues in |
... | arguments to be passed from or to other methods. |
Details
This function is a method for the generic functionpredict() for classlqs.It can be invoked by callingpredict(x) for anobjectx of the appropriate class, or directly bycallingpredict.lqs(x) regardless of theclass of the object.
Missing values innewdata are handled by returningNA if thelinear fit cannot be evaluated. Ifnewdata is omitted andthena.action of the fit omitted cases, these will be omitted on theprediction.
Value
A vector of predictions.
Author(s)
B.D. Ripley
See Also
Examples
set.seed(123)fm <- lqs(stack.loss ~ ., data = stackloss, method = "S", nsamp = "exact")predict(fm, stackloss)Predict Method for Class 'mca'
Description
Used to compute coordinates for additional rows or additional factorsin a multiple correspondence analysis.
Usage
## S3 method for class 'mca'predict(object, newdata, type = c("row", "factor"), ...)Arguments
object | An object of class |
newdata | A data frame containingeither additional rows of the factors used tofit |
type | Are predictions required for further rows or for new factors? |
... | Additional arguments from |
Value
Iftype = "row", the coordinates for the additional rows.
Iftype = "factor", the coordinates of the column vertices for thelevels of the new factors.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Classify from Quadratic Discriminant Analysis
Description
Classify multivariate observations in conjunction withqda
Usage
## S3 method for class 'qda'predict(object, newdata, prior = object$prior, method = c("plug-in", "predictive", "debiased", "looCV"), ...)Arguments
object | object of class |
newdata | data frame of cases to be classified or, if |
prior | The prior probabilities of the classes, by default the proportions in thetraining set or what was set in the call to |
method | This determines how the parameter estimation is handled. With |
... | arguments based from or to other methods |
Details
This function is a method for the generic functionpredict() for class"qda".It can be invoked by callingpredict(x) for anobjectx of the appropriate class, or directly bycallingpredict.qda(x) regardless of theclass of the object.
Missing values innewdata are handled by returningNA if thequadratic discriminants cannot be evaluated. Ifnewdata is omitted andthena.action of the fit omitted cases, these will be omitted on theprediction.
Value
a list with components
class | The MAP classification (a factor) |
posterior | posterior probabilities for the classes |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
See Also
Examples
tr <- sample(1:50, 25)train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))zq <- qda(train, cl)predict(zq, test)$classMethod for Profiling glm Objects
Description
Investigates the profile log-likelihood function for a fitted model ofclass"glm".
As fromR 4.4.0 was migrated to packagestats with additionalfunctionality.
Quadratic Discriminant Analysis
Description
Quadratic discriminant analysis.
Usage
qda(x, ...)## S3 method for class 'formula'qda(formula, data, ..., subset, na.action)## Default S3 method:qda(x, grouping, prior = proportions, method, CV = FALSE, nu, ...)## S3 method for class 'data.frame'qda(x, ...)## S3 method for class 'matrix'qda(x, grouping, ..., subset, na.action)Arguments
formula | A formula of the form |
data | An optional data frame, list or environment from which variablesspecified in |
x | (required if no formula is given as the principal argument.)a matrix or data frame or Matrix containing the explanatory variables. |
grouping | (required if no formula principal argument is given.)a factor specifying the class for each observation. |
prior | the prior probabilities of class membership. If unspecified, the classproportions for the training set are used. If specified, theprobabilities should be specified in the order of the factor levels. |
subset | An index vector specifying the cases to be used in the trainingsample. (NOTE: If given, this argument must be named.) |
na.action | A function to specify the action to be taken if |
method |
|
CV | If true, returns results (classes and posterior probabilities) forleave-out-out cross-validation. Note that if the prior is estimated,the proportions in the whole dataset are used. |
nu | degrees of freedom for |
... | arguments passed to or from other methods. |
Details
Uses a QR decomposition which will give an error message if thewithin-group variance is singular for any group.
Value
an object of class"qda" containing the following components:
prior | the prior probabilities used. |
means | the group means. |
scaling | for each group |
ldet | a vector of half log determinants of the dispersion matrix. |
lev | the levels of the grouping factor. |
terms | (if formula is a formula)an object of mode expression and class term summarizingthe formula. |
call | the (matched) function call. |
unlessCV=TRUE, when the return value is a list with components:
class | The MAP classification (a factor) |
posterior | posterior probabilities for the classes |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
See Also
Examples
tr <- sample(1:50, 25)train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))z <- qda(train, cl)predict(z,test)$classAbsenteeism from School in Rural New South Wales
Description
Thequine data frame has 146 rows and 5 columns.Children from Walgett, New South Wales, Australia, were classified byCulture, Age, Sex and Learner status and the number of days absent fromschool in a particular school year was recorded.
Usage
quineFormat
This data frame contains the following columns:
Ethethnic background: Aboriginal or Not, (
"A"or"N").Sexsex: factor with levels (
"F"or"M").Ageage group: Primary (
"F0"), or forms"F1,""F2"or"F3".Lrnlearner status: factor with levels Average or Slow learner, (
"AL"or"SL").Daysdays absent from school in the year.
Source
S. Quine, quoted in Aitkin, M. (1978) The analysis of unbalanced crossclassifications (with discussion).Journal of the Royal Statistical Society series A141, 195–223.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Rational Approximation
Description
Find rational approximations to the components of a real numericobject using a standard continued fraction method.
Usage
rational(x, cycles = 10, max.denominator = 2000, ...)Arguments
x | Any object of mode numeric. Missing values are now allowed. |
cycles | The maximum number of steps to be used in the continued fractionapproximation process. |
max.denominator | An early termination criterion. If any partial denominatorexceeds |
... | arguments passed to or from other methods. |
Details
Each component is first expanded in a continued fraction of theform
x = floor(x) + 1/(p1 + 1/(p2 + ...)))
wherep1,p2, ... are positive integers, terminating eitheratcycles terms or when apj > max.denominator. Thecontinued fraction is then re-arranged to retrieve the numeratorand denominator as integers and the ratio returned as the value.
Value
A numeric object with the same attributes asx but with entriesrational approximations to the values. This effectively roundsrelative to the size of the object and replaces very smallentries by zero.
See Also
Examples
X <- matrix(runif(25), 5, 5)zapsmall(solve(X, X/5)) # print near-zeroes as zerorational(solve(X, X/5))Convert a Formula Transformed by 'denumerate'
Description
denumerate converts a formula written using the conventions ofloglm into one thatterms is able to process.renumerateconverts it back again to a form like the original.
Usage
renumerate(x)Arguments
x | A formula, normally as modified by |
Details
This is an inverse function todenumerate. It is only neededsinceterms returns an expanded form of the original formulawhere the non-marginal terms are exposed. This expanded form ismapped back into a form corresponding to the one that the useroriginally supplied.
Value
A formula where all variables with names of the form.vn, wheren is an integer, converted to numbers,n, as allowed by theformula conventions ofloglm.
See Also
Examples
denumerate(~(1+2+3)^3 + a/b)## ~ (.v1 + .v2 + .v3)^3 + a/brenumerate(.Last.value)## ~ (1 + 2 + 3)^3 + a/bRobust Fitting of Linear Models
Description
Fit a linear model by robust regression using an M estimator.
Usage
rlm(x, ...)## S3 method for class 'formula'rlm(formula, data, weights, ..., subset, na.action, method = c("M", "MM", "model.frame"), wt.method = c("inv.var", "case"), model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)## Default S3 method:rlm(x, y, weights, ..., w = rep(1, nrow(x)), init = "ls", psi = psi.huber, scale.est = c("MAD", "Huber", "proposal 2"), k2 = 1.345, method = c("M", "MM"), wt.method = c("inv.var", "case"), maxit = 20, acc = 1e-4, test.vec = "resid", lqs.control = NULL)psi.huber(u, k = 1.345, deriv = 0)psi.hampel(u, a = 2, b = 4, c = 8, deriv = 0)psi.bisquare(u, c = 4.685, deriv = 0)Arguments
formula | a formula of the form |
data | an optional data frame, list or environment from which variablesspecified in |
weights | a vector of prior weights for each case. |
subset | An index vector specifying the cases to be used in fitting. |
na.action | A function to specify the action to be taken if |
x | a matrix or data frame containing the explanatory variables. |
y | the response: a vector of length the number of rows of |
method | currently either M-estimation or MM-estimation or (for the |
wt.method | are the weights case weights (giving the relative importance of case,so a weight of 2 means there are two of these) or the inverse of thevariances, so a weight of two means this error is half as variable? |
model | should the model frame be returned in the object? |
x.ret | should the model matrix be returned in the object? |
y.ret | should the response be returned in the object? |
contrasts | optional contrast specifications: see |
w | (optional) initial down-weighting for each case. |
init | (optional) initial values for the coefficients OR a method to findinitial values OR the result of a fit with a |
psi | the psi function is specified by this argument. It must give(possibly by name) a function |
scale.est | method of scale estimation: re-scaled MAD of the residuals (default)or Huber's proposal 2 (which can be selected by either |
k2 | tuning constant used for Huber proposal 2 scale estimation. |
maxit | the limit on the number of IWLS iterations. |
acc | the accuracy for the stopping criterion. |
test.vec | the stopping criterion is based on changes in this vector. |
... | additional arguments to be passed to |
lqs.control | An optional list of control values for |
u | numeric vector of evaluation points. |
k,a,b,c | tuning constants. |
deriv |
|
Details
Fitting is done by iterated re-weighted least squares (IWLS).
Psi functions are supplied for the Huber, Hampel and Tukey bisquareproposals aspsi.huber,psi.hampel andpsi.bisquare. Huber's corresponds to a convex optimizationproblem and gives a unique solution (up to collinearity). The othertwo will have multiple local minima, and a good starting point isdesirable.
Selectingmethod = "MM" selects a specific set of options whichensures that the estimator has a high breakdown point. The initial setof coefficients and the final scale are selected by an S-estimatorwithk0 = 1.548; this gives (forn \gg p)breakdown point 0.5.The final estimator is an M-estimator with Tukey's biweight and fixedscale that will inherit this breakdown point providedc > k0;this is true for the default value ofc that corresponds to95% relative efficiency at the normal. Case weights are notsupported formethod = "MM".
Value
An object of class"rlm" inheriting from"lm".Note that thedf.residual component is deliberately set toNA to avoid inappropriate estimation of the residual scale fromthe residual mean square by"lm" methods.
The additional components not in anlm object are
s | the robust scale estimate used |
w | the weights used in the IWLS process |
psi | the psi function with parameters substituted |
conv | the convergence criteria at each iteration |
converged | did the IWLS converge? |
wresid | a working residual, weighted for |
Note
Prior to version7.3-52, offset terms informulawere omitted from fitted and predicted values.
References
P. J. Huber (1981)Robust Statistics.Wiley.
F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986)Robust Statistics: The Approach based on Influence Functions.Wiley.
A. Marazzi (1993)Algorithms, Routines and S Functions for Robust Statistics.Wadsworth & Brooks/Cole.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
summary(rlm(stack.loss ~ ., stackloss))rlm(stack.loss ~ ., stackloss, psi = psi.hampel, init = "lts")rlm(stack.loss ~ ., stackloss, psi = psi.bisquare)Relative Curvature Measures for Non-Linear Regression
Description
Calculates the root mean square parameter effects and intrinsic relativecurvatures,c^\theta andc^\iota, for afitted nonlinear regression, as defined in Bates & Watts, section 7.3,p. 253ff
Usage
rms.curv(obj)Arguments
obj | Fitted model object of class |
Details
The method of section 7.3.1 of Bates & Watts is implemented. Thefunctionderiv3 should be used generate a model function with firstderivative (gradient) matrix and second derivative (Hessian) arrayattributes. This function should then be used to fit the nonlinearregression model.
A print method,print.rms.curv, prints thepc andic components only, suitably annotated.
If eitherpc oric exceeds some threshold (0.3 has beensuggested) the curvature is unacceptably high for the planar assumption.
Value
A list of classrms.curv with componentspc andicfor parameter effects and intrinsic relative curvatures multiplied bysqrt(F),ct andci forc^\theta andc^\iota (unmultiplied),andC the C-array as used in section 7.3.1 of Bates & Watts.
References
Bates, D. M, and Watts, D. G. (1988)Nonlinear Regression Analysis and its Applications.Wiley, New York.
See Also
Examples
# The treated sample from the Puromycin datammcurve <- deriv3(~ Vm * conc/(K + conc), c("Vm", "K"), function(Vm, K, conc) NULL)Treated <- Puromycin[Puromycin$state == "treated", ](Purfit1 <- nls(rate ~ mmcurve(Vm, K, conc), data = Treated, start = list(Vm=200, K=0.1)))rms.curv(Purfit1)##Parameter effects: c^theta x sqrt(F) = 0.2121## Intrinsic: c^iota x sqrt(F) = 0.092Simulate Negative Binomial Variates
Description
Function to generate random outcomes from a Negative Binomial distribution,with meanmu and variancemu + mu^2/theta.
Usage
rnegbin(n, mu = n, theta = stop("'theta' must be specified"))Arguments
n | If a scalar, the number of sample values required. If a vector, |
mu | The vector of means. Short vectors are recycled. |
theta | Vector of values of the |
Details
The function uses the representation of the Negative Binomial distributionas a continuous mixture of Poisson distributions with Gamma distributed means.Unlikernbinom the index can be arbitrary.
Value
Vector of random Negative Binomial variate values.
Side Effects
Changes.Random.seed in the usual way.
Examples
# Negative Binomials with means fitted(fm) and theta = 4.5fm <- glm.nb(Days ~ ., data = quine)dummy <- rnegbin(fitted(fm), theta = 4.5)Road Accident Deaths in US States
Description
A data frame with the annual deaths in road accidents for halfthe US states.
Usage
roadFormat
Columns are:
statename.
deathsnumber of deaths.
driversnumber of drivers (in 10,000s).
popdenpopulation density in people per square mile.
rurallength of rural roads, in 1000s of miles.
tempaverage daily maximum temperature in January.
fuelfuel consumption in 10,000,000 US gallons per year.
Source
Imperial College, London M.Sc. exercise
Numbers of Rotifers by Fluid Density
Description
The data give the numbers of rotifers falling out of suspension fordifferent fluid densities. There are two species,pmPolyartha major andkc,Keratella cochlearis andfor each species the number falling out and the total number aregiven.
Usage
rotiferFormat
densityspecific density of fluid.
pm.ynumber falling out forP. major.
pm.totaltotal number ofP. major.
kc.ynumber falling out forK. cochlearis.
kc.tottotal number ofK. cochlearis.
Source
D. Collett (1991)Modelling Binary Data. Chapman & Hall. p. 217
Sammon's Non-Linear Mapping
Description
One form of non-metric multidimensional scaling.
Usage
sammon(d, y = cmdscale(d, k), k = 2, niter = 100, trace = TRUE, magic = 0.2, tol = 1e-4)Arguments
d | distance structure of the form returned by |
y | An initial configuration. If none is supplied, |
k | The dimension of the configuration. |
niter | The maximum number of iterations. |
trace | Logical for tracing optimization. Default |
magic | initial value of the step size constant in diagonal Newton method. |
tol | Tolerance for stopping, in units of stress. |
Details
This chooses a two-dimensional configuration to minimize the stress,the sum of squared differences between the input distances and thoseof the configuration, weighted by the distances, the whole sum beingdivided by the sum of input distances to make the stress scale-free.
An iterative algorithm is used, which will usually converge in around50 iterations. As this is necessarily anO(n^2) calculation, it is slowfor large datasets. Further, since the configuration is only determinedup to rotations and reflections (by convention the centroid is at theorigin), the result can vary considerably from machine to machine.In this release the algorithm has been modified by adding a step-lengthsearch (magic) to ensure that it always goes downhill.
Value
Two components:
points | A two-column vector of the fitted configuration. |
stress | The final stress achieved. |
Side Effects
If trace is true, the initial stress and the current stress are printedout every 10 iterations.
References
Sammon, J. W. (1969)A non-linear mapping for data structure analysis.IEEE Trans. Comput.,C-18 401–409.
Ripley, B. D. (1996)Pattern Recognition and Neural Networks. Cambridge University Press.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
swiss.x <- as.matrix(swiss[, -1])swiss.sam <- sammon(dist(swiss.x))plot(swiss.sam$points, type = "n")text(swiss.sam$points, labels = as.character(1:nrow(swiss.x)))Ships Damage Data
Description
Data frame giving the number of damage incidents and aggregatemonths of service by ship type, year of construction, and period of operation.
Usage
shipsFormat
typetype:
"A"to"E".yearyear of construction: 1960–64, 65–69, 70–74, 75–79(coded as
"60","65","70","75").periodperiod of operation : 1960–74, 75–79.
serviceaggregate months of service.
incidentsnumber of damage incidents.
Source
P. McCullagh and J. A. Nelder, (1983),Generalized Linear Models. Chapman & Hall, section 6.3.2, page 137
Shoe wear data of Box, Hunter and Hunter
Description
A list of two vectors, giving the wear of shoes of materials A and Bfor one foot each of ten boys.
Usage
shoesSource
G. E. P. Box, W. G. Hunter and J. S. Hunter (1978)Statistics for Experimenters. Wiley, p. 100
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Percentage of Shrimp in Shrimp Cocktail
Description
A numeric vector with 18 determinations by different laboratoriesof the amount (percentage of the declared total weight) of shrimpin shrimp cocktail.
Usage
shrimpSource
F. J. King and J. J. Ryan (1976)Collaborative study of the determination of the amount of shrimp inshrimp cocktail.J. Off. Anal. Chem.59, 644–649.
R. G. Staudte and S. J. Sheather (1990)Robust Estimation and Testing. Wiley.
Space Shuttle Autolander Problem
Description
Theshuttle data frame has 256 rows and 7 columns.The first six columns are categorical variables giving exampleconditions; the seventh is the decision. The first 253 rows are thetraining set, the last 3 the test conditions.
Usage
shuttleFormat
This data frame contains the following factor columns:
stabilitystable positioning or not (
stab/xstab).errorsize of error (
MM/SS/LX/XL).signsign of error, positive or negative (
pp/nn).windwind sign (
head/tail).magnwind strength (
Light/Medium/Strong/Out of Range).visvisibility (
yes/no).useuse the autolander or not. (
auto/noauto.)
Source
D. Michie (1989)Problems of computer-aided concept formation. InApplications of Expert Systems 2,ed. J. R. Quinlan, Turing Institute Press / Addison-Wesley, pp. 310–333.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Snail Mortality Data
Description
Groups of 20 snails were held for periods of 1, 2, 3 or 4 weeksin carefully controlled conditions of temperature and relativehumidity. There were two species of snail, A and B, and theexperiment was designed as a 4 by 3 by 4 by 2 completely randomizeddesign. At the end of the exposure time the snails were tested to see ifthey had survived; the process itself is fatal for the animals. Theobject of the exercise was to model the probability of survival in terms ofthe stimulus variables, and in particular to test for differences betweenspecies.
The data are unusual in that in most cases fatalities during the experimentwere fairly small.
Usage
snailsFormat
The data frame contains the following components:
Speciessnail species A (
1) or B (2).Exposureexposure in weeks.
Rel.Humrelative humidity (4 levels).
Temptemperature, in degrees Celsius (3 levels).
Deathsnumber of deaths.
Nnumber of snails exposed.
Source
Zoology Department, The University of Adelaide.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Extract Standardized Residuals from a Linear Model
Description
The standardized residuals. These are normalized to unitvariance, fitted including the current data point.
Usage
stdres(object)Arguments
object | any object representing a linear model. |
Value
The vector of appropriately transformed residuals.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
The Saturated Steam Pressure Data
Description
Temperature and pressure in a saturated steam driven experimental device.
Usage
steamFormat
The data frame contains the following components:
Temptemperature, in degrees Celsius.
Presspressure, in Pascals.
Source
N.R. Draper and H. Smith (1981)Applied Regression Analysis. Wiley, pp. 518–9.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Choose a model by AIC in a Stepwise Algorithm
Description
Performs stepwise model selection by AIC.
Usage
stepAIC(object, scope, scale = 0, direction = c("both", "backward", "forward"), trace = 1, keep = NULL, steps = 1000, use.start = FALSE, k = 2, ...)Arguments
object | an object representing a model of an appropriate class.This is used as the initial model in the stepwise search. |
scope | defines the range of models examined in the stepwise search.This should be either a single formula, or a list containingcomponents |
scale | used in the definition of the AIC statistic for selecting the models,currently only for |
direction | the mode of stepwise search, can be one of |
trace | if positive, information is printed during the running of |
keep | a filter function whose input is a fitted model object and theassociated |
steps | the maximum number of steps to be considered. The default is 1000(essentially as many as required). It is typically used to stop theprocess early. |
use.start | if true the updated fits are done starting at the linear predictor forthe currently selected model. This may speed up the iterativecalculations for |
k | the multiple of the number of degrees of freedom used for the penalty.Only |
... | any additional arguments to |
Details
The set of models searched is determined by thescope argument.The right-hand-side of itslower component is always includedin the model, and right-hand-side of the model is included in theupper component. Ifscope is a single formula, itspecifies theupper component, and thelower model isempty. Ifscope is missing, the initial model is used as theupper model.
Models specified byscope can be templates to updateobject as used byupdate.formula.
There is a potential problem in usingglm fits with avariablescale, as in that case the deviance is not simplyrelated to the maximized log-likelihood. Theglm method forextractAIC makes theappropriate adjustment for agaussian family, but may need to beamended for other cases. (Thebinomial andpoissonfamilies have fixedscale by default and do not correspondto a particular maximum-likelihood problem for variablescale.)
Where a conventional deviance exists (e.g. forlm,aovandglm fits) this is quoted in the analysis of variance table:it is theunscaled deviance.
Value
the stepwise-selected model is returned, with up to two additionalcomponents. There is an"anova" component corresponding to thesteps taken in the search, as well as a"keep" component if thekeep= argument was supplied in the call. The"Resid. Dev" column of the analysis of deviance table refersto a constant minus twice the maximized log likelihood: it will be adeviance only in cases where a saturated model is well-defined(thus excludinglm,aov andsurvreg fits,for example).
Note
The model fitting must apply the models to the same dataset. This maybe a problem if there are missing values and anna.action other thanna.fail is used (as is the default inR).We suggest you remove the missing values first.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)quine.stp <- stepAIC(quine.nxt, scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1), trace = FALSE)quine.stp$anovacpus1 <- cpusfor(v in names(cpus)[2:7]) cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])), include.lowest = TRUE)cpus0 <- cpus1[, 2:8] # excludes names, authors' predictionscpus.samp <- sample(1:209, 100)cpus.lm <- lm(log10(perf) ~ ., data = cpus1[cpus.samp,2:8])cpus.lm2 <- stepAIC(cpus.lm, trace = FALSE)cpus.lm2$anovaexample(birthwt)birthwt.glm <- glm(low ~ ., family = binomial, data = bwt)birthwt.step <- stepAIC(birthwt.glm, trace = FALSE)birthwt.step$anovabirthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2) + I(scale(lwt)^2), trace = FALSE)birthwt.step2$anovaquine.nb <- glm.nb(Days ~ .^4, data = quine)quine.nb2 <- stepAIC(quine.nb)quine.nb2$anovaThe Stormer Viscometer Data
Description
The stormer viscometer measures the viscosity of a fluid by measuring thetime taken for an inner cylinder in the mechanism to perform a fixed numberof revolutions in response to an actuating weight. The viscometer iscalibrated by measuring the time taken with varying weights while themechanism is suspended in fluids of accurately known viscosity. The datacomes from such a calibration, and theoretical considerations suggest anonlinear relationship between time, weight and viscosity, of the formTime = (B1*Viscosity)/(Weight - B2) + EwhereB1 andB2are unknown parameters to be estimated, andE is error.
Usage
stormerFormat
The data frame contains the following components:
Viscosityviscosity of fluid.
Wtactuating weight.
Timetime taken.
Source
E. J. Williams (1959)Regression Analysis. Wiley.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Extract Studentized Residuals from a Linear Model
Description
The Studentized residuals. Like standardized residuals, these arenormalized to unit variance, but the Studentized version is fittedignoring the current data point. (They are sometimes called jackknifedresiduals).
Usage
studres(object)Arguments
object | any object representing a linear model. |
Value
The vector of appropriately transformed residuals.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Summary Method Function for Objects of Class 'loglm'
Description
Returns a summary list for log-linear models fitted byiterative proportional scaling usingloglm.
Usage
## S3 method for class 'loglm'summary(object, fitted = FALSE, ...)Arguments
object | a fitted loglm model object. |
fitted | if |
... | arguments to be passed to or from other methods. |
Details
This function is a method for the generic functionsummary() for class"loglm".It can be invoked by callingsummary(x) for anobjectx of the appropriate class, or directly bycallingsummary.loglm(x) regardless of theclass of the object.
Value
a list is returned for use byprint.summary.loglm.This has components
formula | the formula used to produce |
tests | the table of test statistics (likelihood ratio, Pearson) for the fit. |
oe | if |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Summary Method Function for Objects of Class 'negbin'
Description
Identical tosummary.glm, but with three lines of additional output: theML estimate of theta, its standard error, and twice the log-likelihoodfunction.
Usage
## S3 method for class 'negbin'summary(object, dispersion = 1, correlation = FALSE, ...)Arguments
object | fitted model object of class |
dispersion | as for |
correlation | as for |
... | arguments passed to or from other methods. |
Details
summary.glm is used to produce the majority of the output and supply theresult.This function is a method for the generic functionsummary() for class"negbin".It can be invoked by callingsummary(x) for anobjectx of the appropriate class, or directly bycallingsummary.negbin(x) regardless of theclass of the object.
Value
As forsummary.glm; the additional lines of output are not included inthe resultant object.
Side Effects
A summary table is produced as forsummary.glm, with the additionalinformation described above.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
summary,glm.nb,negative.binomial,anova.negbin
Examples
summary(glm.nb(Days ~ Eth*Age*Lrn*Sex, quine, link = log))Summary Method for Robust Linear Models
Description
summary method for objects of class"rlm"
Usage
## S3 method for class 'rlm'summary(object, method = c("XtX", "XtWX"), correlation = FALSE, ...)Arguments
object | the fitted model.This is assumed to be the result of some fit that producesan object inheriting from the class |
method | Should the weighted (by the IWLS weights) or unweighted cross-productsmatrix be used? |
correlation | logical. Should correlations be computed (and printed)? |
... | arguments passed to or from other methods. |
Details
This function is a method for the generic functionsummary() for class"rlm".It can be invoked by callingsummary(x) for anobjectx of the appropriate class, or directly bycallingsummary.rlm(x) regardless of theclass of the object.
Value
If printing takes place, only a null value is returned.Otherwise, a list is returned with the following components.Printing always takes place if this function is invoked automaticallyas a method for thesummary function.
correlation | The computed correlation coefficient matrix for the coefficients in the model. |
cov.unscaled | The unscaled covariance matrix; i.e, a matrix such that multiplying it byan estimate of the error variance produces an estimated covariance matrixfor the coefficients. |
sigma | The scale estimate. |
stddev | A scale estimate used for the standard errors. |
df | The number of degrees of freedom for the model and for residuals. |
coefficients | A matrix with three columns, containing the coefficients, their standard errorsand the corresponding t statistic. |
terms | The terms object used in fitting this model. |
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
summary(rlm(calls ~ year, data = phones, maxit = 50))Student Survey Data
Description
This data frame contains the responses of 237 Statistics I students atthe University of Adelaide to a number of questions.
Usage
surveyFormat
The components of the data frame are:
SexThe sex of the student. (Factor with levels
"Male"and"Female".)Wr.Hndspan (distance from tip of thumb to tip of little finger of spreadhand) of writing hand, in centimetres.
NW.Hndspan of non-writing hand.
W.Hndwriting hand of student. (Factor, with levels
"Left"and"Right".)Fold“Fold your arms! Which is on top” (Factor, with levels
"R on L","L on R","Neither".)Pulsepulse rate of student (beats per minute).
Clap‘Clap your hands! Which hand is on top?’ (Factor, with levels
"Right","Left","Neither".)Exerhow often the student exercises. (Factor, with levels
"Freq"(frequently),"Some","None".)Smokehow much the student smokes. (Factor, levels
"Heavy","Regul"(regularly),"Occas"(occasionally),"Never".)Heightheight of the student in centimetres.
M.Iwhether the student expressed height in imperial(feet/inches) or metric (centimetres/metres) units. (Factor, levels
"Metric","Imperial".)Ageage of the student in years.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S-PLUS. Fourth Edition. Springer.
Synthetic Classification Problem
Description
Thesynth.tr data frame has 250 rows and 3 columns.Thesynth.te data frame has 100 rows and 3 columns.It is intended thatsynth.tr be used from training andsynth.te for testing.
Usage
synth.trsynth.teFormat
These data frames contains the following columns:
xsx-coordinate
ysy-coordinate
ycclass, coded as 0 or 1.
Source
Ripley, B.D. (1994)Neural networks and related methods forclassification (with discussion).Journal of the Royal Statistical Society series B56, 409–456.
Ripley, B.D. (1996)Pattern Recognition and Neural Networks.Cambridge: Cambridge University Press.
Estimate theta of the Negative Binomial
Description
Given the estimated mean vector, estimatetheta of theNegative Binomial Distribution.
Usage
theta.md(y, mu, dfr, weights, limit = 20, eps = .Machine$double.eps^0.25)theta.ml(y, mu, n, weights, limit = 10, eps = .Machine$double.eps^0.25, trace = FALSE)theta.mm(y, mu, dfr, weights, limit = 10, eps = .Machine$double.eps^0.25)Arguments
y | Vector of observed values from the Negative Binomial. |
mu | Estimated mean vector. |
n | Number of data points (defaults to the sum of |
dfr | Residual degrees of freedom (assuming |
weights | Case weights. If missing, taken as 1. |
limit | Limit on the number of iterations. |
eps | Tolerance to determine convergence. |
trace | logical: should iteration progress be printed? |
Details
theta.md estimates by equating the deviance to the residualdegrees of freedom, an analogue of a moment estimator.
theta.ml uses maximum likelihood.
theta.mm calculates the moment estimator oftheta byequating the Pearson chi-square\sum (y-\mu)^2/(\mu+\mu^2/\theta)to the residual degrees of freedom.
Value
The required estimate oftheta, as a scalar.Fortheta.ml, the standard error is given as attribute"SE".
See Also
Examples
quine.nb <- glm.nb(Days ~ .^2, data = quine)theta.md(quine$Days, fitted(quine.nb), dfr = df.residual(quine.nb))theta.ml(quine$Days, fitted(quine.nb))theta.mm(quine$Days, fitted(quine.nb), dfr = df.residual(quine.nb))## weighted exampleyeast <- data.frame(cbind(numbers = 0:5, fr = c(213, 128, 37, 18, 3, 1)))fit <- glm.nb(numbers ~ 1, weights = fr, data = yeast)summary(fit)mu <- fitted(fit)theta.md(yeast$numbers, mu, dfr = 399, weights = yeast$fr)theta.ml(yeast$numbers, mu, limit = 15, weights = yeast$fr)theta.mm(yeast$numbers, mu, dfr = 399, weights = yeast$fr)Spatial Topographic Data
Description
Thetopo data frame has 52 rows and 3 columns, oftopographic heights within a 310 feet square.
Usage
topoFormat
This data frame contains the following columns:
xx coordinates (units of 50 feet)
yy coordinates (units of 50 feet)
zheights (feet)
Source
Davis, J.C. (1973)Statistics and Data Analysis in Geology.Wiley.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Plot a Histogram
Description
Creates a histogram on the current graphics device.
Usage
truehist(data, nbins = "Scott", h, x0 = -h/1000, breaks, prob = TRUE, xlim = range(breaks), ymax = max(est), col = "cyan", xlab = deparse(substitute(data)), bty = "n", ...)Arguments
data | numeric vector of data for histogram. Missing values ( |
nbins | The suggested number of bins. Either a positive integer, or a character stringnaming a rule: |
h | The bin width, a strictly positive number (takes precedence over |
x0 | Shift for the bins - the breaks are at |
breaks | The set of breakpoints to be used. (Usually omitted, takes precedenceover |
prob | If true (the default) plot a true histogram.The vertical axis has arelative frequency densityscale, so the product of the dimensions of any panel gives therelative frequency. Hence the total area under the histogramis 1 and it is directly comparable with most other estimatesof the probability density function.If false plot the counts in the bins. |
xlim | The limits for the x-axis. |
ymax | The upper limit for the y-axis. |
col | The colour for the bar fill: the default is colour 5 in the defaultR palette. |
xlab | label for the plot x-axis. By default, this will be the name of |
bty | The box type for the plot - defaults to none. |
... |
Details
This plots a true histogram, a density estimate of total area 1. Ifbreaks is specified, those breakpoints are used. Otherwise ifh is specified, a regular grid of bins is used with widthh. If neitherbreaks norh is specified,nbins is used to select a suitableh.
Side Effects
A histogram is plotted on the current device.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Unbiased Cross-Validation for Bandwidth Selection
Description
Uses unbiased cross-validation to select the bandwidth of a Gaussiankernel density estimator.
Usage
ucv(x, nb = 1000, lower, upper)Arguments
x | a numeric vector |
nb | number of bins to use. |
lower,upper | Range over which to minimize. The default is almost always satisfactory. |
Value
a bandwidth.
References
Scott, D. W. (1992)Multivariate Density Estimation: Theory, Practice, and Visualization.Wiley.
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
ucv(geyser$duration)Counts of Waders at 15 Sites in South Africa
Description
Thewaders data frame has 15 rows and 19 columns.The entries are counts of waders in summer.
Usage
wadersFormat
This data frame contains the following columns (species)
S1Oystercatcher
S2White-fronted Plover
S3Kitt Lutz's Plover
S4Three-banded Plover
S5Grey Plover
S6Ringed Plover
S7Bar-tailed Godwit
S8Whimbrel
S9Marsh Sandpiper
S10Greenshank
S11Common Sandpiper
S12Turnstone
S13Knot
S14Sanderling
S15Little Stint
S16Curlew Sandpiper
S17Ruff
S18Avocet
S19Black-winged Stilt
The rows are the sites:
A = Namibia North coast
B = Namibia North wetland
C = Namibia South coast
D = Namibia South wetland
E = Cape North coast
F = Cape North wetland
G = Cape West coast
H = Cape West wetland
I = Cape South coast
J= Cape South wetland
K = Cape East coast
L = Cape East wetland
M = Transkei coast
N = Natal coast
O = Natal wetland
Source
J.C. Gower and D.J. Hand (1996)BiplotsChapman & Hall Table 9.1. Quoted as from:
R.W. Summers, L.G. Underhill, D.J. Pearson and D.A. Scott (1987)Wader migration systems in south and eastern Africa and western Asia.Wader Study Group Bulletin49 Supplement, 15–34.
Examples
plot(corresp(waders, nf=2))House Insulation: Whiteside's Data
Description
Mr Derek Whiteside of the UK Building Research Station recorded theweekly gas consumption and average external temperature at his ownhouse in south-east England for two heating seasons, one of 26 weeksbefore, and one of 30 weeks after cavity-wall insulation wasinstalled. The object of the exercise was to assess the effect of theinsulation on gas consumption.
Usage
whitesideFormat
Thewhiteside data frame has 56 rows and 3 columns.:
InsulA factor, before or after insulation.
TempPurportedly the average outside temperature in degrees Celsius. (Thesevalues is far too low for any 56-week period in the 1960s inSouth-East England. It might be the weekly average of daily minima.)
GasThe weekly gas consumption in 1000s of cubic feet.
Source
A data set collected in the 1960s by Mr Derek Whiteside of theUK Building Research Station. Reported by
Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993)A Handbook of Small Data Sets.Chapman & Hall, p. 69.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
require(lattice)xyplot(Gas ~ Temp | Insul, whiteside, panel = function(x, y, ...) { panel.xyplot(x, y, ...) panel.lmline(x, y, ...) }, xlab = "Average external temperature (deg. C)", ylab = "Gas consumption (1000 cubic feet)", aspect = "xy", strip = function(...) strip.default(..., style = 1))gasB <- lm(Gas ~ Temp, whiteside, subset = Insul=="Before")gasA <- update(gasB, subset = Insul=="After")summary(gasB)summary(gasA)gasBA <- lm(Gas ~ Insul/Temp - 1, whiteside)summary(gasBA)gasQ <- lm(Gas ~ Insul/(Temp + I(Temp^2)) - 1, whiteside)coef(summary(gasQ))gasPR <- lm(Gas ~ Insul + Temp, whiteside)anova(gasPR, gasBA)options(contrasts = c("contr.treatment", "contr.poly"))gasBA1 <- lm(Gas ~ Insul*Temp, whiteside)coef(summary(gasBA1))Bandwidth Selection by Pilot Estimation of Derivatives
Description
Uses the method of Sheather & Jones (1991) to select the bandwidth ofa Gaussian kernel density estimator.
Usage
width.SJ(x, nb = 1000, lower, upper, method = c("ste", "dpi"))Arguments
x | a numeric vector |
nb | number of bins to use. |
upper,lower | range over which to search for solution if |
method | Either |
Value
a bandwidth.
Note
A faster version for largen (thousands) is available inR\ge 3.4.0 as part ofbw.SJ: quadruple itsvalue for comparability with this version.
References
Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidthselection method for kernel density estimation.Journal of the Royal Statistical Society series B53, 683–690.
Scott, D. W. (1992)Multivariate Density Estimation: Theory, Practice, and Visualization.Wiley.
Wand, M. P. and Jones, M. C. (1995)Kernel Smoothing.Chapman & Hall.
See Also
Examples
width.SJ(geyser$duration, method = "dpi")width.SJ(geyser$duration)width.SJ(galaxies, method = "dpi")width.SJ(galaxies)Write a Matrix or Data Frame
Description
Writes a matrix or data frame to a file or the console, using columnlabels and a layout respecting columns.
Usage
write.matrix(x, file = "", sep = " ", blocksize)Arguments
x | matrix or data frame. |
file | name of output file. The default ( |
sep | The separator between columns. |
blocksize | If supplied and positive, the output is written in blocks of |
Details
Ifx is a matrix, supplyingblocksize is morememory-efficient and enables larger matrices to be written, but eachblock of rows might be formatted slightly differently.
Ifx is a data frame, the conversion to a matrix may negate thememory saving.
Side Effects
A formatted file is produced, with column headings (ifx has them)and columns of data.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Weight Loss Data from an Obese Patient
Description
The data frame gives the weight, in kilograms, of an obese patient at 52time points over an 8 month period of a weight rehabilitation programme.
Usage
wtlossFormat
This data frame contains the following columns:
Daystime in days since the start of the programme.
Weightweight in kilograms of the patient.
Source
Dr T. Davies, Adelaide.
References
Venables, W. N. and Ripley, B. D. (2002)Modern Applied Statistics with S. Fourth edition. Springer.
Examples
## IGNORE_RDIFF_BEGINwtloss.fm <- nls(Weight ~ b0 + b1*2^(-Days/th), data = wtloss, start = list(b0=90, b1=95, th=120))wtloss.fm## IGNORE_RDIFF_ENDplot(wtloss)with(wtloss, lines(Days, fitted(wtloss.fm)))