Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating.
tibble() is a nice way to create data frames. It encapsulates best practices for data frames:
It never changes an input’s type (i.e., no more stringsAsFactors = FALSE!).
tibble(x = letters)
#> # A tibble: 26 × 1
#>    x
#>    <chr>
#>  1 a
#>  2 b
#>  3 c
#>  4 d
#>  5 e
#>  6 f
#>  7 g
#>  8 h
#>  9 i
#> 10 j
#> # ℹ 16 more rows

This makes it easier to use with list-columns:

tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
#> # A tibble: 3 × 2
#>       x y
#>   <int> <list>
#> 1     1 <int [5]>
#> 2     2 <int [10]>
#> 3     3 <int [20]>

List-columns are often created by tidyr::nest(), but they can be useful to create by hand.
It never adjusts the names of variables:
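A minimal sketch of the difference, assuming the tibble package is installed. The column name and the variables df_names and tbl_names are illustrative only. With its default check.names = TRUE, data.frame() rewrites names that are not syntactically valid, while tibble() keeps them as given.

```r
library(tibble)

# data.frame() passes names through make.names() by default
# (check.names = TRUE), so the space becomes a dot.
df_names <- names(data.frame(`crazy name` = 1))

# tibble() leaves the name exactly as it was supplied.
tbl_names <- names(tibble(`crazy name` = 1))

df_names
#> [1] "crazy.name"
tbl_names
#> [1] "crazy name"
```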
It evaluates its arguments lazily and sequentially:
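As a small illustration (the name squares is hypothetical), a later argument can refer to a column defined earlier in the same call, because the arguments are evaluated one after another:

```r
library(tibble)

# y is computed from x, which was created earlier in the same call.
squares <- tibble(x = 1:5, y = x^2)

squares$y
#> [1]  1  4  9 16 25
```

The same call with data.frame() would look for x in the calling environment instead, and would fail if no such object exists there.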
It never uses row.names(). The whole point of tidy data is to store variables in a consistent way. So it never stores a variable as a special attribute.
It only recycles vectors of length 1. This is because recycling vectors of greater lengths is a frequent source of bugs.
To complement tibble(), tibble provides as_tibble() to coerce objects into tibbles. Generally, as_tibble() methods are much simpler than as.data.frame() methods. The method for lists has been written with an eye for performance:
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
timing <- bench::mark(
  as_tibble(l),
  as.data.frame(l),
  check = FALSE
)
timing
#> # A tibble: 2 × 14
#>   expression       min         mean         median      max         `itr/sec`
#>   <chr>            <bench_tm>  <bench_tm>   <bench_tm>  <bench_tm>      <dbl>
#> 1 as_tibble(l)     0.000287696 0.0006251376 0.000327178 0.004508219 1599.648
#> 2 as.data.frame(l) 0.000791522 0.0016640039 0.001098172 0.007652914  600.9601
#> # ℹ 8 more variables: mem_alloc <bnch_byt>, n_gc <dbl>, n_itr <int>,
#> #   total_time <bench_tm>, result <list>, memory <list>, time <list>, gc <list>

The speed of as.data.frame() is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.
When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type, and uses font styles and color for highlighting:
tibble(x = -5:100, y = 123.456 * (3^x))
#> # A tibble: 106 × 2
#>        x           y
#>    <int>       <dbl>
#>  1    -5   0.5080494
#>  2    -4   1.524148
#>  3    -3   4.572444
#>  4    -2  13.71733
#>  5    -1  41.152
#>  6     0 123.456
#>  7     1 370.368
#>  8     2 1111.104
#>  9     3 3333.312
#> 10     4 9999.936
#> # ℹ 96 more rows

Numbers are displayed with three significant figures by default, and a trailing dot that indicates the existence of a fractional component.
You can control the default appearance with options:
options(pillar.print_max = n, pillar.print_min = m): if there are more than n rows, print only the first m rows. Use options(pillar.print_max = Inf) to always show all rows.
options(pillar.width = n): use n character slots horizontally to show the data. If n > getOption("width"), this will result in multiple tiers. Use options(pillar.width = Inf) to always print all columns, regardless of the width of the screen.
See ?pillar::pillar_options and ?tibble_options for the available options, vignette("types") for an overview of the type abbreviations, vignette("numbers") for details on the formatting of numbers, and vignette("digits") for a comparison with data frame printing.
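The options above could be used as in the following sketch. The tibble long and the chosen thresholds are illustrative, not recommended values. Because options() returns the previous settings, they can be restored once the temporary change is no longer needed.

```r
library(tibble)

long <- tibble(x = 1:30)

# Temporarily lower the thresholds; options() returns the old values.
old <- options(pillar.print_max = 5, pillar.print_min = 3)
print(long)   # more than 5 rows, so only the first 3 are printed
options(old)  # restore the previous settings

# For a single call, print() on a tibble also accepts n and width.
print(long, n = 12, width = Inf)
```

Setting the options changes the printing for the rest of the session, whereas the n and width arguments affect only one call.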
Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a vector:
df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
#> [1] "data.frame"
class(df1[, 1])
#> [1] "integer"

df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df2[, 1])
#> [1] "tbl_df"     "tbl"        "data.frame"

To extract a single column use [[ or $:
Tibbles are also stricter with $. Tibbles never do partial matching, and will throw a warning and return NULL if the column does not exist:
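A short sketch of single-column extraction, redefining the same example tibble so that the snippet runs on its own (by_position and by_name are illustrative names):

```r
library(tibble)

df2 <- tibble(x = 1:3, y = 3:1)

# Both forms return the underlying vector rather than a one-column tibble.
by_position <- df2[[1]]
by_name <- df2$x

class(by_position)
#> [1] "integer"
class(by_name)
#> [1] "integer"
```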
df <- data.frame(abc = 1)
df$a
#> [1] 1

df2 <- tibble(abc = 1)
df2$a
#> Warning: Unknown or uninitialised column: `a`.
#> NULL

However, tibbles respect the drop argument if it is provided:
Tibbles do not support row names. They are removed when converting to a tibble or when subsetting:
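For example, in the sketch below (from_df and from_tbl are illustrative names), passing drop = TRUE explicitly makes a tibble return a bare vector for a single-column selection, just as a data frame does:

```r
library(tibble)

# With drop = TRUE, both return the column as a plain integer vector.
from_df <- data.frame(a = 1:3)[, "a", drop = TRUE]
from_tbl <- tibble(a = 1:3)[, "a", drop = TRUE]

from_df
#> [1] 1 2 3
from_tbl
#> [1] 1 2 3
```

The difference lies only in the default: without the argument, the tibble call returns a one-column tibble.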
df <- data.frame(a = 1:3, row.names = letters[1:3])
rownames(df)
#> [1] "a" "b" "c"
rownames(as_tibble(df))
#> [1] "1" "2" "3"

tbl <- tibble(a = 1:3)
rownames(tbl) <- letters[1:3]
#> Warning: Setting row names on a tibble is deprecated.
rownames(tbl)
#> [1] "a" "b" "c"
rownames(tbl[1, ])
#> [1] "1"

See vignette("invariants") for a detailed comparison between tibbles and data frames.
When constructing a tibble, only values of length 1 are recycled. The first column with length different to one determines the number of rows in the tibble; conflicts lead to an error:
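If the row names carry information, they do not have to be lost in the conversion. A minimal sketch, using the rownames argument of as_tibble() to move them into an ordinary column; the column name "id" and the variable kept are arbitrary choices for this example:

```r
library(tibble)

df <- data.frame(a = 1:3, row.names = letters[1:3])

# rownames = "id" stores the former row names in a regular column "id".
kept <- as_tibble(df, rownames = "id")

names(kept)
#> [1] "id" "a"
kept$id
#> [1] "a" "b" "c"
```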
tibble(a = 1, b = 1:3)
#> # A tibble: 3 × 2
#>       a     b
#>   <dbl> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
tibble(a = 1:3, b = 1)
#> # A tibble: 3 × 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2     1
#> 3     3     1
tibble(a = 1:3, c = 1:2)
#> Error in `tibble()`:
#> ! Tibble columns must have compatible sizes.
#> • Size 3: Existing data.
#> • Size 2: Column `c`.
#> ℹ Only values of size one are recycled.

This also extends to tibbles with zero rows, which is sometimes important for programming:
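A sketch of the zero-row case, with illustrative names empty1 and empty2. A length-1 value is recycled to match a zero-length column, so the result has no rows regardless of argument order:

```r
library(tibble)

# The length-1 column is recycled to length 0 in both cases.
empty1 <- tibble(a = 1, b = integer())
empty2 <- tibble(a = integer(), b = 1)

dim(empty1)
#> [1] 0 2
dim(empty2)
#> [1] 0 2
```

This means code that builds a tibble from a possibly empty vector plus a constant does not need a special case for empty input.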