
santoku is a versatile cutting tool for R. It provides `chop()`, a replacement for `base::cut()`.
Install from r-universe:

```r
install.packages("santoku", repos = c("https://hughjonesd.r-universe.dev", "https://cloud.r-project.org"))
```

Or from CRAN:

```r
install.packages("santoku")
```

Or get the development version from GitHub:

```r
# install.packages("remotes")
remotes::install_github("hughjonesd/santoku")
```

Here are some advantages of santoku:

- By default, `chop()` always covers the whole range of the data, so you won’t get unexpected `NA` values.
- `chop()` can handle single values as well as intervals. For example, `chop(x, breaks = c(1, 2, 2, 3))` will create a separate factor level for values exactly equal to 2.
- `chop()` can handle many kinds of data, including numbers, dates and times, and units.
- `chop_*` functions create intervals in many ways, using quantiles of the data, standard deviations, fixed-width intervals, equal-sized groups, or pretty intervals for use in graphs.
- It’s easy to label intervals: use names for your breaks vector, or use a `lbl_*` function to create interval notation like `[1, 2)`, dash notation like `1-2`, or arbitrary styles using `glue::glue()`.
- `tab_*` functions quickly chop data, then tabulate it.

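
Several of the claims in this list can be tried in a few lines. The sketch below assumes santoku is installed (plus the glue package, which `lbl_glue()` relies on). It calls `chop_quantiles()`, `chop_equally()`, `chop_mean_sd()`, `chop_pretty()`, `lbl_glue()` and `tab()` mostly with default arguments; these names come from the santoku reference rather than from this README, so check `?chop_quantiles` and friends for the arguments and label formats in your installed version. The data is simulated and arbitrary.

```r
library(santoku)

set.seed(1)
x <- rnorm(100)

# Intervals bounded by the 25th, 50th and 75th percentiles of x
q <- chop_quantiles(x, c(0.25, 0.5, 0.75))
table(q)

# Four groups containing (roughly) equal numbers of observations
eq <- chop_equally(x, groups = 4)
table(eq)

# Intervals measured in standard deviations from the mean of x
msd <- chop_mean_sd(x)
levels(msd)

# "Pretty" round-number breaks, like those used for axis ticks
levels(chop_pretty(x))

# Custom labels from a glue template; {l} and {r} are the interval endpoints
chop(1:10, c(3, 7), labels = lbl_glue("{l} to {r}"))

# tab() chops and tabulates in one step, much like table(chop(...))
tab(x, c(-1, 0, 1))
```

Exact labels and counts depend on the simulated data; the point is that each call covers every value of `x` without any manual break bookkeeping.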
These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
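
A minimal sketch of the difference, assuming santoku is installed: when the breaks do not span the data, `base::cut()` returns `NA` for values outside them, while `chop()` extends the breaks to cover every value. The data and breaks are arbitrary choices for illustration.

```r
library(santoku)

x <- c(1, 3, 5, 7, 9)
breaks <- c(2, 6)

# base::cut() only covers the interval (2, 6]; 1, 7 and 9 become NA
cut(x, breaks)

# chop() extends the breaks to the range of x, so no value is NA
chop(x, breaks)
```

If you do want `cut()`-style behaviour, `chop()` has an `extend` argument that controls whether the breaks are extended; see `?chop` for the details in your version.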
```r
library(santoku)
```

`chop()` returns a factor:
```r
chop(1:5, c(2, 4))
#> [1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) [2, 4) [4, 5]
```

Include a number twice to match it exactly:
```r
chop(1:5, c(2, 2, 4))
#> [1] [1, 2) {2} (2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) {2} (2, 4) [4, 5]
```

Use names in breaks for labels:
```r
chop(1:5, c(Low = 1, Mid = 2, High = 4))
#> [1] Low Mid Mid High High
#> Levels: Low Mid High
```

Or use `lbl_*` functions:
```r
chop(1:5, c(2, 4), labels = lbl_dash())
#> [1] 1—2 2—4 2—4 4—5 4—5
#> Levels: 1—2 2—4 4—5
```

Chop into fixed-width intervals:
```r
chop_width(runif(10), 0.1)
#> [1] [0.368, 0.468) [0.268, 0.368) [0.768, 0.868] [0.568, 0.668)
#> [5] [0.668, 0.768) [0.768, 0.868] [0.06801, 0.168) [0.668, 0.768)
#> [9] [0.06801, 0.168) [0.468, 0.568)
#> 7 Levels: [0.06801, 0.168) [0.268, 0.368) [0.368, 0.468) ... [0.768, 0.868]
```

Or into fixed-size groups:
```r
chop_n(1:10, 5)
#> [1] [1, 6) [1, 6) [1, 6) [1, 6) [1, 6) [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]
```

Chop dates by calendar month, then tabulate:
```r
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#>     date, intersect, setdiff, union

dates <- as.Date("2021-12-31") + 1:90
tab_width(dates, months(1), labels = lbl_discrete(fmt = "%d %b"))
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar
#>            31            28            31
```

For more information, see the vignette.