strsplit {base}

R Documentation

Split the Elements of a Character Vector

Description

Split the elements of a character vector x into substrings according to the matches to substring split within them.

Usage

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

Arguments

x

character vector, each element of which is to be split. Other inputs, including a factor, will give an error.

split

character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
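As a minimal sketch of the recycling rule, the following uses a two-element split against a three-element x, so the first separator is reused for the third string. The variable names are illustrative only.

```r
# Three inputs, two separators: 'split' is recycled along 'x',
# so x[1] and x[3] use "-" and x[2] uses "_".
x <- c("a-b-c", "d_e_f", "g-h")
seps <- c("-", "_")
res <- strsplit(x, split = seps, fixed = TRUE)
print(res)
```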

fixed

logical. If TRUE match split exactly, otherwise use regular expressions. Has priority over perl.

perl

logical. Should Perl-compatible regexps be used?

useBytes

logical. If TRUE the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes" (see Encoding).

Details

Argument split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.

Note that splitting into single characters can be done via split = character(0) or split = ""; the two are equivalent. The definition of ‘character’ here depends on the locale: in a single-byte locale it is a byte, and in a multi-byte locale it is the unit represented by a ‘wide character’ (almost always a Unicode code point).
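A short illustration of the equivalence, using an ASCII string so that the result does not depend on the locale. The variable names are illustrative only.

```r
# NULL is coerced to character(0), so all three calls split into single characters.
s <- "abc"
by_null  <- strsplit(s, split = NULL)[[1]]
by_chr0  <- strsplit(s, split = character(0))[[1]]
by_empty <- strsplit(s, split = "")[[1]]
print(by_empty)
```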

A missing value of split does not split the corresponding element(s) of x at all.
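For instance, in the sketch below the second element of split is NA, so the second element of x should come back unsplit. The input strings are made up for illustration.

```r
# c(" ", NA) is coerced to a character vector whose second element is NA_character_.
x <- c("a b", "c d")
res <- strsplit(x, split = c(" ", NA))
print(res)
```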

The algorithm applied to each input string is

    repeat {
        if the string is empty
            break.
        if there is a match
            add the string to the left of the match to the output.
            remove the match and all to the left of it.
        else
            add the string to the output.
            break.
    }
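The loop above can be sketched in R for the simplest case, a single non-empty literal separator. The helper split_one is hypothetical and is not how strsplit is implemented internally; it only mirrors the documented steps and does not handle empty matches or regular expressions.

```r
# Hypothetical helper: mirrors the documented loop for one string and one
# non-empty literal separator (the equivalent of fixed = TRUE).
split_one <- function(s, split) {
  out <- character(0)
  repeat {
    if (nchar(s) == 0L) break                 # string is empty: stop
    m <- regexpr(split, s, fixed = TRUE)
    start <- as.integer(m)                    # -1 when there is no match
    if (start > 0L) {
      len <- attr(m, "match.length")
      out <- c(out, substr(s, 1L, start - 1L))   # text to the left of the match
      s <- substr(s, start + len, nchar(s))      # drop the match and all to its left
    } else {
      out <- c(out, s)                        # no match: keep the remainder
      break
    }
  }
  out
}

print(split_one("#a#", "#"))
print(split_one("a.b.c", "."))
```

Under these assumptions the helper gives a leading "" for a match at the start of the string and drops a match at the end, as the built-in function does.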

Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as with the match removed.

Note also that if there is an empty match at the beginning of a non-empty string, the first character is returned and the algorithm continues with the rest of the string. This needs to be kept in mind when designing the regular expressions. For example, when looking for a word boundary followed by a letter ("[[:<:]]" with perl = TRUE), one can disallow a match at the beginning of a string (via "(?!^)[[:<:]]").
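The sketch below contrasts the two patterns on a made-up input string. It assumes the PCRE library used by the R build accepts "[[:<:]]". Because both patterns produce only empty matches, nothing is removed and pasting the pieces back together should reproduce the input.

```r
s <- "hello world"

# Unguarded word-start pattern: an empty match at the start of each
# remaining string causes single characters to be peeled off.
naive <- strsplit(s, split = "[[:<:]]", perl = TRUE)[[1]]

# Guarded pattern: (?!^) disallows the empty match at the start.
guarded <- strsplit(s, split = "(?!^)[[:<:]]", perl = TRUE)[[1]]

print(naive)
print(guarded)
```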

Invalid inputs in the current locale are warned about up to 5 times.

Value

A list of the same length as x, the i-th element of which contains the vector of splits of x[i].
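A small example of the shape of the result, with illustrative inputs: one list element per input string, each holding however many pieces that string produced (possibly none).

```r
x <- c("a,b,c", "d", "")
parts <- strsplit(x, split = ",", fixed = TRUE)

length(parts) == length(x)   # one list element per element of x
lengths(parts)               # number of pieces per input string
parts[[1]]                   # the splits of x[1]
```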

If any element of x or split is declared to be in UTF-8 (see Encoding), all non-ASCII character strings in the result will be in UTF-8 and have their encoding declared as UTF-8. (This also holds if any element is declared to be Latin-1 except in a Latin-1 locale.) For perl = TRUE, useBytes = FALSE all non-ASCII strings in a multibyte locale are translated to UTF-8.
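One way to observe this is to inspect the result with Encoding(). The sketch assumes that a string literal containing a \u escape is marked as UTF-8 by the parser. ASCII-only pieces carry no declared encoding and report "unknown".

```r
x <- "caf\u00e9 au lait"     # assumed to be marked as UTF-8 because of the \u escape
Encoding(x)

parts <- strsplit(x, split = " ", fixed = TRUE)[[1]]
Encoding(parts)              # non-ASCII piece declared UTF-8, ASCII pieces "unknown"
```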

If any element of x or split is marked as "bytes" (see Encoding), all non-ASCII character strings created by the splitting in the result will be marked as "bytes", but encoding of the resulting character strings not split is unspecified (may be "bytes" or the original). If no element of x or split is marked as "bytes", but useBytes = TRUE, even the encoding of the resulting character strings created by splitting is unspecified (may be "bytes" or "unknown", possibly invalid in the current encoding). Mixed use of "bytes" and other marked encodings is discouraged, but if still desired one may use iconv to re-encode the result e.g. to UTF-8 with suitably substituted invalid bytes.

Warning

An all too common mis-usage is to pass unnamed arguments which are then matched to one or more of fixed, perl and useBytes. So it is good practice to name all the arguments.
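A sketch of how this goes wrong, with a made-up input: a third unnamed argument binds to fixed, so a call intended to enable Perl-style matching instead searches for the literal two-character text \d.

```r
# Unnamed TRUE is matched positionally to 'fixed', not 'perl':
# the pattern is treated as literal text, which does not occur in "a1b".
unnamed <- strsplit("a1b", "\\d", TRUE)[[1]]

# Naming the argument gives the intended regular-expression split.
named <- strsplit("a1b", split = "\\d", perl = TRUE)[[1]]

print(unnamed)
print(named)
```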

Examples

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "[.]"))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
        sapply(lapply(strsplit(x, NULL), rev), paste, collapse = "")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home("doc"),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

## Note that final empty strings are not produced:
strsplit(paste(c("", "a", ""), collapse="#"), split="#")[[1]]
# [1] ""  "a"
## and also an empty string is only produced before a definite match:
strsplit("", " ")[[1]]    # character(0)
strsplit(" ", " ")[[1]]   # [1] ""

[Package base version 4.6.0 Index]