Tidy output from regular expression matches


r-lib/rematch2


Match Regular Expressions with a Nicer ‘API’


A small wrapper on the regular expression matching functions regexpr and gregexpr that returns the results in tidy data frames.
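For context, here is a minimal base R sketch of what regexpr returns on its own, before any tidying. The input vector and pattern are made up for illustration; no rematch2 functions are used.

```r
# Base R only: the raw result of regexpr() that a wrapper has to unpack.
x <- c("2016-04-20", "not a date")
m <- regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", x)

as.integer(m)            # start of the first match per element, -1 if none
attr(m, "match.length")  # length of each match, -1 if none
regmatches(x, m)         # matched substrings; non-matching elements are dropped
```

Note that regmatches silently drops the non-matching element, so the result no longer lines up with the input. Keeping one row per input string, as a data frame does, avoids that.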


Installation

Stable version:

install.packages("rematch2")

Development version:

pak::pak("r-lib/rematch2")

Rematch vs rematch2

Note that rematch2 is not compatible with the original rematch package. There are at least three major changes:

  • The order of the arguments for the functions is different. In rematch2 the text vector is first, and pattern is second.
  • In the result, .match is the last column instead of the first.
  • rematch2 returns tibble data frames. See https://github.com/tidyverse/tibble.
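The three points above can be checked with a small sketch. The input vector `codes` and the group names `letter` and `digit` are hypothetical, chosen only for illustration.

```r
library(rematch2)

# Hypothetical inputs, chosen only for illustration.
codes <- c("a1", "b2", "??")
res <- re_match(text = codes, pattern = "(?<letter>[a-z])(?<digit>[0-9])")

names(res)               # capture groups first, then .text, then .match
inherits(res, "tbl_df")  # TRUE if the result is a tibble
res$.match               # NA for the element that did not match
```

Passing text and pattern by name, as here, sidesteps the argument order difference when porting code from rematch.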

Usage

First match

library(rematch2)

With capture groups:

dates <- c(
  "2016-04-20", "1977-08-08", "not a date", "2016", "76-03-02",
  "2012-06-30", "2015-01-21 19:58"
)
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)
#> # A tibble: 7 × 5
#>   ``    ``    ``    .text            .match
#>   <chr> <chr> <chr> <chr>            <chr>
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 <NA>  <NA>  <NA>  not a date       <NA>
#> 4 <NA>  <NA>  <NA>  2016             <NA>
#> 5 <NA>  <NA>  <NA>  76-03-02         <NA>
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21

Named capture groups:

isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)
#> # A tibble: 7 × 5
#>   year  month day   .text            .match
#>   <chr> <chr> <chr> <chr>            <chr>
#> 1 2016  04    20    2016-04-20       2016-04-20
#> 2 1977  08    08    1977-08-08       1977-08-08
#> 3 <NA>  <NA>  <NA>  not a date       <NA>
#> 4 <NA>  <NA>  <NA>  2016             <NA>
#> 5 <NA>  <NA>  <NA>  76-03-02         <NA>
#> 6 2012  06    30    2012-06-30       2012-06-30
#> 7 2015  01    21    2015-01-21 19:58 2015-01-21
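Because every capture group comes back as a character column, a typical next step is to drop the rows that did not match and convert the types. The following is a sketch of one way to do that with base R subsetting, using a shortened copy of the dates vector. The variable names `m` and `matched` are illustrative.

```r
library(rematch2)

dates <- c("2016-04-20", "not a date", "2015-01-21 19:58")
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"

m <- re_match(text = dates, pattern = isodaten)

# Rows without a match have NA in .match, so they are easy to filter out.
matched <- m[!is.na(m$.match), ]

# Capture groups are character columns; convert them as needed.
matched$year <- as.integer(matched$year)

# .match holds only the matched part, so trailing text such as a time is gone.
as.Date(matched$.match)
```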

A slightly more complex example:

github_repos <- c(
  "metacran/crandb",
  "jeroenooms/curl@v0.9.3",
  "jimhester/covr#47",
  "hadley/dplyr@*release",
  "r-lib/remotes@550a3c7d3f9e1493a2ba",
  "/$&@R64&3"
)
owner_rx <- "(?:(?<owner>[^/]+)/)?"
repo_rx <- "(?<repo>[^/@#]+)"
subdir_rx <- "(?:/(?<subdir>[^@#]*[^@#/]))?"
ref_rx <- "(?:@(?<ref>[^*].*))"
pull_rx <- "(?:#(?<pull>[0-9]+))"
release_rx <- "(?:@(?<release>[*]release))"
subtype_rx <- sprintf("(?:%s|%s|%s)?", ref_rx, pull_rx, release_rx)
github_rx <- sprintf(
  "^(?:%s%s%s%s|(?<catchall>.*))$",
  owner_rx, repo_rx, subdir_rx, subtype_rx
)
re_match(text = github_repos, pattern = github_rx)
#> # A tibble: 6 × 9
#>   owner        repo      subdir ref          pull  release catchall .text .match
#>   <chr>        <chr>     <chr>  <chr>        <chr> <chr>   <chr>    <chr> <chr>
#> 1 "metacran"   "crandb"  ""     ""           ""    ""      ""       meta… metac…
#> 2 "jeroenooms" "curl"    ""     "v0.9.3"     ""    ""      ""       jero… jeroe…
#> 3 "jimhester"  "covr"    ""     ""           "47"  ""      ""       jimh… jimhe…
#> 4 "hadley"     "dplyr"   ""     ""           ""    "*rele… ""       hadl… hadle…
#> 5 "r-lib"      "remotes" ""     "550a3c7d3f… ""    ""      ""       r-li… r-lib…
#> 6 ""           ""        ""     ""           ""    ""      "/$&@R6… /$&@… /$&@R…

All matches

Extract all names, and also first names and last names:

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
not <- re_match_all(notables, name_rex)
not
#> # A tibble: 2 × 4
#>   first     last      .text                                .match
#>   <list>    <list>    <chr>                                <list>
#> 1 <chr [2]> <chr [2]> "  Ben Franklin and Jefferson Davis" <chr [2]>
#> 2 <chr [1]> <chr [1]> "\tMillard Fillmore"                 <chr [1]>

not$first
#> [[1]]
#> [1] "Ben"       "Jefferson"
#>
#> [[2]]
#> [1] "Millard"
not$last
#> [[1]]
#> [1] "Franklin" "Davis"
#>
#> [[2]]
#> [1] "Fillmore"
not$.match
#> [[1]]
#> [1] "Ben Franklin"    "Jefferson Davis"
#>
#> [[2]]
#> [1] "Millard Fillmore"
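The list columns can be flattened into one row per match when a long format is more convenient. This is a sketch using base R only for the reshaping; the names `n` and `long` are illustrative, and the pattern assumes a single space between first and last name.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore")
not <- re_match_all(notables, name_rex)

# Number of matches found in each input string.
n <- lengths(not$.match)

# Repeat each input once per match and unlist the capture groups alongside it.
long <- data.frame(
  text  = rep(not$.text, n),
  first = unlist(not$first),
  last  = unlist(not$last),
  stringsAsFactors = FALSE
)
long
```

Inputs with no matches contribute zero rows to `long`, so this reshaping is only appropriate when dropping them is acceptable.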

Match positions

re_exec and re_exec_all are similar to re_match and re_match_all, but they also return match positions. These functions return match records. A match record has three components: match, start, and end, and each component can be a vector. It is similar to a data frame in this respect.

pos <- re_exec(notables, name_rex)
pos
#> # A tibble: 2 × 4
#>   first            last             .text                           .match
#>   <rmtch_rc>       <rmtch_rc>       <chr>                           <rmtch_rc>
#> 1 <named list [3]> <named list [3]> "  Ben Franklin and Jefferson … <named list>
#> 2 <named list [3]> <named list [3]> "\tMillard Fillmore"            <named list>

Unfortunately R does not allow hierarchical data frames (i.e. a column of a data frame cannot be another data frame), but rematch2 defines some special classes and a $ operator to make it easier to extract parts of re_exec and re_exec_all matches. You simply query the match, start or end part of a column:

pos$first$match
#> [1] "Ben"     "Millard"
pos$first$start
#> [1] 3 2
pos$first$end
#> [1] 5 8
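One way to see how the three components relate is to cut the input strings at the reported positions and compare the result with the match component. This sketch assumes the positions are 1-based and inclusive, as the output above suggests, and uses base R substring; the name `cut_out` is illustrative.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c("  Ben Franklin and Jefferson Davis", "\tMillard Fillmore")
pos <- re_exec(notables, name_rex)

# Extract the first-name group by position instead of reading $match.
cut_out <- substring(notables, pos$first$start, pos$first$end)
cut_out

# The substrings cut by position should agree with the recorded matches.
all(cut_out == pos$first$match)
```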

re_exec_all is very similar, but these queries return lists, with an arbitrary number of matches per input string:

allpos <- re_exec_all(notables, name_rex)
allpos
#> # A tibble: 2 × 4
#>   first            last             .text                           .match
#>   <rmtch_ll>       <rmtch_ll>       <chr>                           <rmtch_ll>
#> 1 <named list [3]> <named list [3]> "  Ben Franklin and Jefferson … <named list>
#> 2 <named list [3]> <named list [3]> "\tMillard Fillmore"            <named list>

allpos$first$match
#> [[1]]
#> [1] "Ben"       "Jefferson"
#>
#> [[2]]
#> [1] "Millard"
allpos$first$start
#> [[1]]
#> [1]  3 20
#>
#> [[2]]
#> [1] 2
allpos$first$end
#> [[1]]
#> [1]  5 28
#>
#> [[2]]
#> [1] 8

License

MIT © Mango Solutions, Gábor Csárdi
