factor: Factors

factor    R Documentation

Factors

Description

The function factor is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.

is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.

Usage

factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

ordered(x, ...)

is.factor(x)
is.ordered(x)

as.factor(x)
as.ordered(x)

addNA(x, ifany = FALSE)

Arguments

x

a vector of data, usually taking a small number of distinct values.

levels

an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).

labels

either an optional character vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.

exclude

a vector of values to be excluded when forming the set of levels. This may be a factor with the same level set as x, or should be a character vector.

ordered

logical flag to determine if the levels should be regarded as ordered (in the order given).

nmax

an upper bound on the number of levels; see ‘Details’.

...

(in ordered(.)): any of the above, apart from ordered itself.

ifany

only add an NA level if it is used, i.e. if any(is.na(x)).

Details

The type of the vector x is not restricted; it only must have an as.character method and be sortable (by order).

Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels (which will happen for excluded values) then the i-th element of the result is set to NA.

Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.
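The encoding steps described above can be sketched with match(). This is only an illustration, not factor()'s actual source; the names all_levels, lev and codes are made up for the example.

```r
# Illustrative sketch of the encoding step (hypothetical variable names).
x          <- c("b", "a", "c", "b", NA)
all_levels <- c("a", "b", "c")
exclude    <- "c"

# Step 1: remove the excluded values from the candidate levels.
lev <- all_levels[is.na(match(all_levels, exclude))]

# Step 2: x[i] equal to lev[j] is encoded as j; no match gives NA.
codes <- match(x, lev)
codes
# [1]  2  1 NA  2 NA

# The same codes and levels come out of factor() itself.
f <- factor(x, levels = all_levels, exclude = exclude)
stopifnot(identical(codes, as.integer(f)),
          identical(lev, levels(f)))
```

Here the excluded value "c" and the missing value both end up with code NA, since neither matches a remaining level.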

factor(x, exclude = NULL) applied to a factor without NAs is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used, since R version 3.4.0, excluding non-existing character levels is equivalent to excluding nothing, and when exclude is a character vector, that is applied to the levels of x. Alternatively, exclude can be a factor with the same level set as x and will exclude the levels present in exclude.

The codes of a factor may contain NA. For a numeric x, set exclude = NULL to make NA an extra level (prints as <NA>); by default, this is the last level.

If NA is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use is.na on the left-hand-side of an assignment (as in is.na(f)[i] <- TRUE; indexing inside is.na does not work). Under those circumstances missing values are currently printed as <NA>, i.e., identical to entries of level NA.

is.factor is generic: you can write methods to handle specific classes of objects, see InternalMethods.

Where levels is not supplied, unique is called. Since factors typically have quite a small number of levels, for large vectors x it is helpful to supply nmax as an upper bound on the number of unique values.

Since R 4.1.0, when using c to combine a (possibly ordered) factor with other objects, if all objects are (possibly ordered) factors, the result will be a factor with levels the union of the level sets of the elements, in the order the levels occur in the level sets of the elements (which means that if all the elements have the same level set, that is the level set of the result), equivalent to how unlist operates on a list of factor objects.
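As a small illustration of the level union described above, the following sketch combines two factors with different level sets. The factors f1 and f2 are invented for the example, and the c() call is guarded because its factor behaviour assumes R 4.1.0 or later.

```r
# Two example factors with different level sets (made up for illustration).
f1 <- factor(c("a", "b"))                        # levels: a, b
f2 <- factor(c("c", "a"), levels = c("c", "a"))  # levels: c, a

# unlist() on a list of factors takes the union of the level sets,
# in the order the levels occur across the elements.
u <- unlist(list(f1, f2))
levels(u)
# [1] "a" "b" "c"

# On R >= 4.1.0, c() on factors should give the same result.
if (getRversion() >= "4.1.0") {
  stopifnot(identical(c(f1, f2), u))
}
```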

Value

factor returns an object of class "factor" which has a set of integer codes the length of x with a "levels" attribute of mode character and unique (!anyDuplicated(.)) entries. If argument ordered is true (or ordered() is used) the result has class c("ordered", "factor"). Undocumentedly for a long time, factor(x) loses all attributes(x) but "names", and resets "levels" and "class".

Applying factor to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also [.factor for a more transparent way to achieve this.

is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is an ordered factor and FALSE otherwise.

as.factor coerces its argument to a factor. It is an abbreviated (sometimes faster) form of factor.

as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.

addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).

.valid.factor(object) checks the validity of a factor, currently only levels(object), and returns TRUE if it is valid, otherwise a string describing the validity problem. This function is used for validObject(<factor>).

Warning

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.

There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.
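A short sketch of the pitfall and the recommended conversion follows; the vector of numbers stored as strings is invented for the example.

```r
# Numbers that arrived as strings and were turned into a factor.
f <- factor(c("10", "2.5", "10", "7"))

as.numeric(f)                 # the internal integer codes, not 10, 2.5, 10, 7

# Recommended: convert the levels once, then index by the codes.
recovered <- as.numeric(levels(f))[f]
recovered
# [1] 10.0  2.5 10.0  7.0

# Gives the same values, but converts every element rather than every level.
stopifnot(identical(recovered, as.numeric(as.character(f))))
```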

Comparison operators and group generic methods

There are "factor" and "ordered" methods for the group generic Ops which provide methods for the Comparison operators, and for the min, max, and range generics in Summary of "ordered". (The rest of the groups and the Math group generate an error as they are not meaningful for factors.)

Only == and != can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors.

All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.
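The difference between the two classes can be seen in a brief sketch; the factors f and sizes are hypothetical examples.

```r
# Unordered factor: only == and != are meaningful.
f <- factor(c("a", "b", "a"))
eq <- f == "a"
eq
# [1]  TRUE FALSE  TRUE

# Ordered factor: comparisons follow the order of the levels,
# not the alphabetical order of the labels.
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
below_large <- sizes < "large"
below_large
# [1]  TRUE FALSE  TRUE

# min(), max() and range() are available for ordered factors.
as.character(max(sizes))
# [1] "large"
```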

Note

In earlier versions of R, storing character data as a factor was more space efficient if there was even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)

References

Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

See Also

[.factor for subsetting of factors.

gl for construction of balanced factors and C for factors with specified contrasts. levels and nlevels for accessing the levels, and unclass to get integer codes.

Examples

(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff)      # the internal codes
(f. <- factor(ff))  # drops the levels that do not occur
ff[, drop = TRUE]   # the same, more transparently

factor(letters[1:20], labels = "letter")

class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))

## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x  # [1] 1    <NA> <NA>
is.na(x)
# [1] FALSE  TRUE FALSE

## More rational, since R 3.4.0 :
factor(c(1:2, NA), exclude =  "" ) # keeps <NA> , as
factor(c(1:2, NA), exclude = NULL) # always did
## exclude = <character>
z # ordered levels 'A < B < C'
factor(z, exclude = "C") # does exclude
factor(z, exclude = "B") # ditto

## Now, labels maybe duplicated:
## factor() with duplicated labels allowing to "merge levels"
x <- c("Man", "Male", "Man", "Lady", "Female")
## Map from 4 different values to only two levels:
(xf <- factor(x, levels = c("Male", "Man" , "Lady",   "Female"),
                 labels = c("Male", "Male", "Female", "Female")))
#> [1] Male   Male   Male   Female Female
#> Levels: Male Female

## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))
