
Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating.

Creating

tibble() is a nice way to create data frames. It encapsulates best practices for data frames:

It never changes an input’s type (i.e., no more stringsAsFactors = FALSE!).

tibble(x = letters)
#> # A tibble: 26 × 1
#>    x
#>    <chr>
#>  1 a
#>  2 b
#>  3 c
#>  4 d
#>  5 e
#>  6 f
#>  7 g
#>  8 h
#>  9 i
#> 10 j
#> # ℹ 16 more rows

This makes it easier to use with list-columns:

tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
#> # A tibble: 3 × 2
#>       x y
#>   <int> <list>
#> 1     1 <int [5]>
#> 2     2 <int [10]>
#> 3     3 <int [20]>

List-columns are often created by tidyr::nest(), but they can be useful to create by hand.
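As a minimal sketch of the tidyr::nest() route, the example below collapses a small table into one row per group. The table `measurements` and its column names are made up for illustration, and the tidyr package is assumed to be installed.

```r
library(tibble)
library(tidyr)

# Hypothetical example data: three groups with a varying number of rows
measurements <- tibble(
  group = c("a", "a", "b", "b", "b", "c"),
  value = c(1.5, 2.5, 3, 4, 5, 6)
)

# Collapse every column except `group` into a list-column named `data`
nested <- nest(measurements, data = -group)
nested

# Each element of the list-column is itself a tibble
nested$data[[2]]
```

Each row of `nested` now holds one group, and the rows that belonged to that group are stored as a tibble inside the `data` column.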

It never adjusts the names of variables:

names(data.frame(`crazy name` = 1))
#> [1] "crazy.name"
names(tibble(`crazy name` = 1))
#> [1] "crazy name"

It evaluates its arguments lazily and sequentially:

tibble(x = 1:5, y = x^2)
#> # A tibble: 5 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1     1
#> 2     2     4
#> 3     3     9
#> 4     4    16
#> 5     5    25

It never uses row.names(). The whole point of tidy data is to store variables in a consistent way, so it never stores a variable as a special attribute.

It only recycles vectors of length 1. This is because recycling vectors of greater lengths is a frequent source of bugs.

Coercion

To complement tibble(), tibble provides as_tibble() to coerce objects into tibbles. Generally, as_tibble() methods are much simpler than as.data.frame() methods. The method for lists has been written with an eye for performance:

l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

timing <- bench::mark(
  as_tibble(l),
  as.data.frame(l),
  check = FALSE
)

timing
#> # A tibble: 2 × 14
#>   expression       min         mean         median      max         `itr/sec`
#>   <chr>            <bench_tm>  <bench_tm>   <bench_tm>  <bench_tm>      <dbl>
#> 1 as_tibble(l)     0.000287696 0.0006251376 0.000327178 0.004508219 1599.648
#> 2 as.data.frame(l) 0.000791522 0.0016640039 0.001098172 0.007652914  600.9601
#> # ℹ 8 more variables: mem_alloc <bnch_byt>, n_gc <dbl>, n_itr <int>,
#> #   total_time <bench_tm>, result <list>, memory <list>, time <list>, gc <list>

The speed of as.data.frame() is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.

Tibbles vs data frames

There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.

Printing

When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type, and uses font styles and color for highlighting:

tibble(x = -5:100, y = 123.456 * (3^x))
#> # A tibble: 106 × 2
#>        x            y
#>    <int>        <dbl>
#>  1    -5    0.5080494
#>  2    -4    1.524148
#>  3    -3    4.572444
#>  4    -2   13.71733
#>  5    -1   41.152
#>  6     0  123.456
#>  7     1  370.368
#>  8     2 1111.104
#>  9     3 3333.312
#> 10     4 9999.936
#> # ℹ 96 more rows

Numbers are displayed with three significant figures by default, and a trailing dot that indicates the existence of a fractional component.

You can control the default appearance with options:

options(pillar.print_max = n, pillar.print_min = m): if there are more than n rows, print only the first m rows. Use options(pillar.print_max = Inf) to always show all rows.
options(pillar.width = n): use n character slots horizontally to show the data. If n > getOption("width"), this will result in multiple tiers. Use options(pillar.width = Inf) to always print all columns, regardless of the width of the screen.

See ?pillar::pillar_options and ?tibble_options for the available options, vignette("types") for an overview of the type abbreviations, vignette("numbers") for details on the formatting of numbers, and vignette("digits") for a comparison with data frame printing.
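The short sketch below shows one way to combine these settings in practice: a per-call override through the `n` argument of print(), and a session-wide change that is restored afterwards. The table `tbl` and the specific thresholds are arbitrary choices for illustration.

```r
library(tibble)

# Hypothetical example data with more rows than the default print limit
tbl <- tibble(x = 1:50, y = x / 7)

# One-off override: show 15 rows for this call only
print(tbl, n = 15)

# Session-wide override: tibbles with more than 30 rows print only the first 5.
# options() returns the previous values, which makes them easy to restore.
old <- options(pillar.print_max = 30, pillar.print_min = 5)
print(tbl)

# Restore the previous settings
options(old)
```

Saving the return value of options() and passing it back avoids leaking the changed settings into the rest of the session.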

Subsetting

Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a vector:

df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
#> [1] "data.frame"
class(df1[, 1])
#> [1] "integer"

df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df2[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

To extract a single column use [[ or $:

class(df2[[1]])
#> [1] "integer"
class(df2$x)
#> [1] "integer"

Tibbles are also stricter with $. Tibbles never do partial matching, and will throw a warning and return NULL if the column does not exist:

df <- data.frame(abc = 1)
df$a
#> [1] 1

df2 <- tibble(abc = 1)
df2$a
#> Warning: Unknown or uninitialised column: `a`.
#> NULL

However, tibbles respect the drop argument if it is provided:

data.frame(a = 1:3)[, "a", drop = TRUE]
#> [1] 1 2 3
tibble(a = 1:3)[, "a", drop = TRUE]
#> [1] 1 2 3

Tibbles do not support row names. They are removed when converting to a tibble or when subsetting:

df <- data.frame(a = 1:3, row.names = letters[1:3])
rownames(df)
#> [1] "a" "b" "c"
rownames(as_tibble(df))
#> [1] "1" "2" "3"

tbl <- tibble(a = 1:3)
rownames(tbl) <- letters[1:3]
#> Warning: Setting row names on a tibble is deprecated.
rownames(tbl)
#> [1] "a" "b" "c"
rownames(tbl[1, ])
#> [1] "1"
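If the row names carry information worth keeping, one option is to move them into an ordinary column during conversion. The sketch below uses the `rownames` argument of as_tibble() and the rownames_to_column() helper from tibble; the column name "id" is an arbitrary choice.

```r
library(tibble)

df <- data.frame(a = 1:3, row.names = letters[1:3])

# Move the row names into a regular column while converting to a tibble
tbl <- as_tibble(df, rownames = "id")  # "id" is an arbitrary column name
tbl$id

# Same idea, keeping the class of the input (here a plain data frame)
with_id <- rownames_to_column(df, var = "id")
names(with_id)
```

Either way the former row names become a regular variable that survives subsetting like any other column.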

See vignette("invariants") for a detailed comparison between tibbles and data frames.

Recycling

When constructing a tibble, only values of length 1 are recycled. The first column with a length other than one determines the number of rows in the tibble; conflicts lead to an error:

tibble(a = 1, b = 1:3)
#> # A tibble: 3 × 2
#>       a     b
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
tibble(a = 1:3, b = 1)
#> # A tibble: 3 × 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2     1
#> 3     3     1
tibble(a = 1:3, c = 1:2)
#> Error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> • Size 3: Existing data.
#> • Size 2: Column `c`.
#> ℹ Only values of size one are recycled.

This also extends to tibbles with zero rows, which is sometimes important for programming:

tibble(a = 1, b = integer())
#> # A tibble: 0 × 2
#> # ℹ 2 variables: a <dbl>, b <int>
tibble(a = integer(), b = 1)
#> # A tibble: 0 × 2
#> # ℹ 2 variables: a <int>, b <dbl>

Arithmetic operations

Unlike data frames, tibbles don’t support arithmetic operations on all columns. The result is silently coerced to a data frame. Do not rely on this behavior; it may become an error in a forthcoming version.

tbl <- tibble(a = 1:3, b = 4:6)
tbl * 2
#>   a  b
#> 1 2  8
#> 2 4 10
#> 3 6 12
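One way to avoid depending on this coercion is to apply the operation column by column and rebuild the tibble explicitly. This is only a sketch of one possible workaround, not a recommendation from the tibble authors; the name `doubled` is illustrative.

```r
library(tibble)

tbl <- tibble(a = 1:3, b = 4:6)

# Apply the operation to each column, then turn the named list back into a tibble
doubled <- as_tibble(lapply(tbl, function(col) col * 2))

class(doubled)
```

Because the result is built with as_tibble(), it keeps the tibble class instead of falling back to a plain data frame.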