R is a command-driven statistical package. At first sight, this can make it rather daunting to use. However, there are a number of reasons to learn statistics using this computer program. The two most important are:
An additional bonus is that R has excellent graphics and programming capabilities, so can be used as an aid to teaching and learning. For example, all the illustrations in this book have been produced using R; by clicking on any illustration, you can obtain the R commands used to produce it.
A final benefit, which is of more use once you have some basic knowledge of either statistics or R, is that there are many online resources to help users of R. A list is available in the appendix to this book.
The main text in this book describes the why and how of statistics, which is relevant whatever statistical package you use. However, alongside the main text, there are a large number of "R topics": exercises and examples that use R to illustrate particular points. You may find that it takes some time to get used to R, especially if you are unfamiliar with the idea of computer languages.
Don't worry! The topics in this chapter and in Chapter 2 should get you going, to the point where you can understand and use R's basic functionality. This chapter is intended to get you started: once you have installed R, there are topics on how to carry out simple calculations and use functions, how to store results, how to get help, and how to quit. The few exercises in Chapter 1 mainly show the possibilities open to you when using R, then Chapter 2 introduces the nuts and bolts of R usage: in particular vectors and factors, reading data into data frames, and plotting of various sorts. From then on, the exercises become more statistical in nature.
If you wish to work straight through these initial exercises before statistical discussion, they are collected here. Note that when working through R topics online, you may find it more visually appealing if you set up wikibooks to display R commands nicely. If the R topics get in the way of reading the main text, they can be hidden by clicking on the arrow at the top right of each box.
If you don't already have R installed on your computer, download the latest version for free from http://www.r-project.org, and install the base system. You don't need to install any extra packages yet. Once you have installed it, start it up, and you should be presented with something like this:
R version 2.11.1 (2010-05-31)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
You are now in an R session. R is a command-driven program, and the ominous-looking ">" character means that R is now waiting for you to type something. Don't be daunted. You will soon get the hang of the simplest commands, and that is all you should need for the moment. And you will eventually find that the command-line driven interface gives you a degree of freedom and power[1] that is impossible to achieve using more "user-friendly" packages.
100+2/3
[1] 100.6667
#this is a comment: R will ignore it
(100+2)/3 #You can use round brackets to group operations so that they are carried out first
5*10^2 #The symbol * means multiply, and ^ means "to the power", so this gives 5 times (10 squared), i.e. 500
1/0 #R knows about infinity (and minus infinity)
0/0 #undefined results take the value NaN ("not a number")
(0i-9)^(1/2) #for the mathematically inclined, you can force R to use complex numbers

> (100+2)/3 #You can use round brackets to group operations so that they are carried out first
[1] 34
> 5*10^2 #The symbol * means multiply, and ^ means "to the power", so this is 5 times (10 squared)
[1] 500
> 1/0 #R knows about infinity (and minus infinity)
[1] Inf
> 0/0 #undefined results take the value NaN ("not a number")
[1] NaN
> (0i-9)^(1/2) #for the mathematically inclined, you can force R to use complex numbers
[1] 0+3i
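The order in which R carries out operations can be checked directly at the prompt. The lines below are a small sketch that is not part of the original exercises: they reuse only the operators shown above, plus the sqrt() function, which is introduced later in this chapter.

```r
# Division is carried out before addition, so these two lines agree
100 + 2/3
100 + (2/3)

# "To the power" is carried out before a leading minus sign
-3^2      # minus (3 squared)
(-3)^2    # (minus 3) squared

# Infinity can be negative too
-1/0

# The square root of a negative real number is NaN (R also gives a warning) ...
sqrt(-9)
# ... unless the number is complex, as in the (0i-9)^(1/2) example
sqrt(-9 + 0i)
```

If you are ever unsure of the order, adding round brackets is harmless and makes your intention clear.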
To store something under a name, use one of the assignment arrows <- and ->, as demonstrated in the exercise below. Which sign you use depends on whether you prefer putting the name first or last (it may be helpful to think of -> as "put into" and <- as "set to").
Unlike many statistical packages, R does not usually display the results of analyses you perform. Instead, analyses usually end up by producing an object which can be stored. Results can then be obtained from the object at leisure. For this reason, when doing statistics in R, you will often find yourself naming and storing objects. The name you choose should consist of letters, numbers, and the "." character[3], and should not start with a number.
0.001 -> small.num #Store the number 0.001 under the name "small.num" (i.e. put 0.001 into small.num)
big.num <- 10 * 100 #You can put the name first if you reverse the arrow (set big.num to 1000).
big.num+small.num+1 #Now you can treat big.num and small.num as numbers, and use them in calculations
my.result <- big.num+small.num+2 #And you can store the result of any calculation
my.result #To look at the stored object, just type its name
pi #There are some named objects that R provides for you
> 0.001 -> small.num #Store the number 0.001 under the name "small.num" (i.e. put 0.001 into small.num)
> big.num <- 10 * 100 #You can put the name first if you reverse the arrow (set big.num to 1000).
> big.num+small.num+1 #Now you can treat big.num and small.num as numbers, and use them in calculations
[1] 1001.001
> my.result <- big.num+small.num+2 #And you can store the result of any calculation
> my.result #To look at the stored object, just type its name
[1] 1002.001
> pi #There are some named objects that R provides for you
[1] 3.141593
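A few further points about storing objects may be worth trying at the prompt. The sketch below is an addition to the original exercise; the names height.cm, height.copy and height.m are hypothetical and chosen only for illustration, while ls() and rm() are standard R functions for listing and removing stored objects.

```r
# Hypothetical names, chosen only for this illustration
height.cm <- 172          # "set height.cm to 172"
172 -> height.copy        # "put 172 into height.copy"
height.cm == height.copy  # the two objects hold the same value

# Storing a result does not display it; wrapping the line in round brackets does both
(height.m <- height.cm / 100)

# Storing under an existing name silently replaces the old value
height.cm <- 180
height.cm

ls()             # list the names of the objects you have stored
rm(height.copy)  # remove an object you no longer need
```

Because an existing name is overwritten without warning, it is worth choosing names that you are unlikely to reuse by accident.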
citation()
The command above calls the citation() function. It can take an optional argument giving the name of an R add-on package. If you do not provide an optional argument, there is usually an assumed default value (in the case of citation(), this default value is "base", i.e. provide the citation reference for the base package: the package which provides most of the foundations of the R language).
Most arguments to a function are named. For example, the first argument of the citation function is named package. To provide extra clarity, when using a function you can provide arguments in the longer form name=value. Thus
citation("base")
does the same as
citation(package="base")
If a function can take more than one argument, using the long form also allows you to change the order of arguments, as shown in the example code below.
citation("base") #Does the same as citation(), because the default for the first argument is "base"
                 #Note: quotation marks are needed in this particular case (see discussion below)
citation("datasets") #Find the citation for another package (in this case, the result is very similar)
sqrt(25) #A different function: "sqrt" takes a single argument, returning its square root.
sqrt(25-9) #An argument can contain arithmetic and so forth
sqrt(25-9)+100 #The result of a function can be used as part of a further analysis
max(-10, 0.2, 4.5) #This function returns the maximum value of all its arguments
sqrt(2 * max(-10, 0.2, 4.5)) #You can use results of functions as arguments to other functions
x <- sqrt(2 * max(-10, 0.2, 4.5)) + 100 #... and you can store the results of any of these calculations
x
log(100) #This function returns the logarithm of its first argument
log(2.718282) #By default this is the natural logarithm (base "e")
log(100, base=10) #But you can change the base of the logarithm using the "base" argument
log(100, 10) #This does the same, because "base" is the second argument of the log function
log(base=10, 100) #To have the base as the first argument, you have to use the form name=value

Without quotation marks, citation refers to a function, whereas "citation" is a "string" of text. Strings are useful, for example, when providing titles for plots.
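The same rules about argument names and quotation marks apply to other functions. The sketch below is an addition to the original exercise; it uses the standard functions seq(), args(), nchar(), is.function() and is.character(), none of which have been introduced in the text so far.

```r
seq(1, 10)                  # whole numbers from 1 to 10
seq(1, 10, by=3)            # the first two arguments by position, the third by name
seq(from=1, to=10, by=3)    # the same, with every argument in the longer form
seq(by=3, to=10, from=1)    # named arguments can be given in any order

args(log)                   # show the argument names of a function, with any defaults

# A string is text, not a function or a stored object
nchar("citation")           # count the characters in the string
is.function(citation)       # without quotation marks, citation is a function
is.character("citation")    # with quotation marks, it is a string
```

Naming arguments takes slightly more typing, but it makes a command easier to read back later, especially for functions that take many arguments.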
You will probably find that one of the trickiest aspects of getting to know R is knowing which function to use in a particular situation. Fortunately, R not only provides documentation for all its functions, but also ways of searching through the documentation, as well as other ways of getting help.
help.start() #A web-based set of help pages (try the link to "An Introduction to R")
help(sqrt) #Show details of the "sqrt" and similar functions
?sqrt #A shortcut to do the same thing
example(sqrt) #run the examples on the bottom of the help page for "sqrt"
help.search("maximum") #gives a list of functions involving the word "maximum", but oddly, "max" is not in there!
### The next line is commented out to reduce internet load. To try it, remove the first # sign.
#RSiteSearch("maximum") #search the R web site for anything to do with "maximum". Probably overkill here!
In this case, you could find the max() function by looking at the "See also" section of the help file for which.max(). Not ideal!
To leave R, use quit() or its identical shortcut, q(), which do not require any arguments. Alternatively, if your version of R has a menu bar, you can select "quit" or "exit" with the mouse.
q()
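Returning to the problem of finding max(): when you can guess part of a function's name, searching the names themselves is one alternative to searching the help pages. The sketch below is an addition to the original exercise; it uses the standard functions apropos(), exists() and args(), which are not otherwise discussed in this chapter.

```r
apropos("max")      # names of available objects that contain "max"
apropos("^max$")    # a regular expression that matches the name "max" exactly
exists("max")       # is there an object called "max"?
args(log)           # a quick reminder of a function's arguments, without opening the help page
```

The list returned by apropos("max") includes both max and which.max, so the related functions can be spotted together and then looked up with help().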
Before you start on the main text, we recommend that you add a few specific wikibooks preferences. The first three lines will display the examples of R commands in a nicer format. The last line gives a nicer format to figures consisting of multiple plots (known as subfigures).
You can do this by creating a user CSS file, as follows.
pre {padding:0; border: none; margin:0; line-height: 1.5em; }
.code .input ol {list-style: none; font-size: 1.2em; margin-left: 0;}
.code .input ol li div:before {content: "\003E \0020";}
table.subfigures div.thumbinner, table.subfigures tr td, table.subfigures {border: 0;}

Enough! Let's move on to the main text.
.pre {padding:0; border: none; margin:0; line-height: 1.5em; }
.source-R ol {list-style: none; font-size: 1.2em; margin-left: 0;}
.source_R ol li div:before {content: "\003E \0020";}