A fresh approach to string manipulation in R
License
Unknown, MIT licenses found
tidyverse/stringr
Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the chapter on strings in R for Data Science.
stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use.
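As a rough illustration of the shared conventions, the sketch below calls a few stringr functions next to stringi functions that do comparable work. The sample vector is made up, and the pairing is chosen for illustration only; it is not a statement about how stringr is implemented internally.

```r
library(stringr)
library(stringi)

# Hypothetical sample data, used only for this illustration
fruit <- c("apple", "kiwi", "banana")

# Both packages take the character vector as the first argument
str_length(fruit)
stri_length(fruit)

# Pattern functions have the same shape: strings first, then the pattern
str_detect(fruit, "an")
stri_detect_regex(fruit, "an")

# A stringi function that goes beyond the stringr examples in this README
stri_reverse(fruit)
```

The stri_ prefix and the vector-first argument order mirror the str_ conventions, which is why moving between the two packages tends to be straightforward.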
    # The easiest way to get stringr is to install the whole tidyverse:
    install.packages("tidyverse")

    # Alternatively, install just stringr:
    install.packages("stringr")
All functions in stringr start with str_ and take a vector of strings as the first argument:
    library(stringr)

    x <- c("why", "video", "cross", "extra", "deal", "authority")
    str_length(x)
    #> [1] 3 5 5 5 4 9
    str_c(x, collapse = ", ")
    #> [1] "why, video, cross, extra, deal, authority"
    str_sub(x, 1, 2)
    #> [1] "wh" "vi" "cr" "ex" "de" "au"
Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:
    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
There are eight main verbs that work with patterns:
- str_detect(x, pattern) tells you if there’s any match to the pattern:

      str_detect(x, "[aeiou]")
      #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

- str_count(x, pattern) counts the number of matches to the pattern:

      str_count(x, "[aeiou]")
      #> [1] 0 3 1 2 2 4

- str_subset(x, pattern) extracts the matching components:

      str_subset(x, "[aeiou]")
      #> [1] "video"     "cross"     "extra"     "deal"      "authority"

- str_locate(x, pattern) gives the position of the first match:

      str_locate(x, "[aeiou]")
      #>      start end
      #> [1,]    NA  NA
      #> [2,]     2   2
      #> [3,]     3   3
      #> [4,]     1   1
      #> [5,]     2   2
      #> [6,]     1   1

- str_extract(x, pattern) extracts the text of the first match:

      str_extract(x, "[aeiou]")
      #> [1] NA  "i" "o" "e" "e" "a"

- str_match(x, pattern) extracts parts of the match defined by parentheses:

      # extract the characters on either side of the vowel
      str_match(x, "(.)[aeiou](.)")
      #>      [,1]  [,2] [,3]
      #> [1,] NA    NA   NA
      #> [2,] "vid" "v"  "d"
      #> [3,] "ros" "r"  "s"
      #> [4,] NA    NA   NA
      #> [5,] "dea" "d"  "a"
      #> [6,] "aut" "a"  "t"

- str_replace(x, pattern, replacement) replaces the first match with new text:

      str_replace(x, "[aeiou]", "?")
      #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"

- str_split(x, pattern) splits up a string into multiple pieces:

      str_split(c("a,b", "c,d,e"), ",")
      #> [[1]]
      #> [1] "a" "b"
      #>
      #> [[2]]
      #> [1] "c" "d" "e"
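In the outputs above, str_extract() and str_replace() act on only the first match in each string. stringr also provides _all variants for several verbs that act on every match. The sketch below reuses the same x vector to show the difference; it is a small illustration rather than a full tour of the variants.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# str_replace() changes only the first vowel in each string
str_replace(x, "[aeiou]", "?")
#> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"

# str_replace_all() changes every vowel
str_replace_all(x, "[aeiou]", "?")
#> [1] "why"       "v?d??"     "cr?ss"     "?xtr?"     "d??l"      "??th?r?ty"

# str_extract_all() returns a list with every match for each input string
str_extract_all(x[2], "[aeiou]")
#> [[1]]
#> [1] "i" "e" "o"
```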
As well as regular expressions (the default), there are three other pattern matching engines:

- fixed(): match exact bytes
- coll(): match human letters
- boundary(): match boundaries
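A minimal sketch of how the three engines differ from the default regular expression matching. The input strings are made up for illustration, and only one option of each engine is shown.

```r
library(stringr)

# Hypothetical inputs, used only for this illustration
files <- c("a.b", "acb")

# As a regular expression, "." matches any character
str_detect(files, "a.b")
#> [1] TRUE TRUE

# fixed() compares the pattern literally, so "." only matches a dot
str_detect(files, fixed("a.b"))
#> [1]  TRUE FALSE

# coll() compares using collation rules; here it is asked to ignore case
str_detect(c("I", "i"), coll("i", ignore_case = TRUE))
#> [1] TRUE TRUE

# boundary() matches boundaries; here, it counts the words in a sentence
str_count("The quick brown fox", boundary("word"))
#> [1] 4
```

Wrapping the pattern argument in one of these functions is all that changes; the verb and the rest of the call stay the same.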
The RegExplain RStudio addin provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.
This addin can easily be installed with devtools:
    # install.packages("devtools")
    devtools::install_github("gadenbuie/regexplain")
R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R.
- stringr uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

      letters %>%
        .[1:10] %>%
        str_pad(3, "right") %>%
        str_c(letters[2:11])
      #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"

- It simplifies string operations by eliminating options that you don’t need 95% of the time.

- It produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.
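The handling of missing and zero length inputs can be seen by comparing a few stringr calls with base R functions that do similar work. This is a small sketch; the base R functions shown are chosen for illustration.

```r
library(stringr)

# A missing input gives a missing output
str_length(NA)
#> [1] NA
nchar(NA)          # base R counts the two characters of "NA"
#> [1] 2

str_c("a", NA)
#> [1] NA
paste0("a", NA)    # base R pastes the text "NA"
#> [1] "aNA"

# A zero length input gives a zero length output
str_c("a", character())
#> character(0)
paste0("a", character())
#> [1] "a"
```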
Learn more in vignette("from-base")
