Create Arrow data types

Source: R/type.R

data-type.Rd

These functions create type objects corresponding to Arrow types. Use them when defining a schema() or as inputs to other types, like struct(). Most of these functions don't take arguments, but a few do.
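As a sketch of the schema() use case, the block below defines a small schema that mixes simple types with list_of() and struct(). The column names are hypothetical, and the code assumes the arrow package is installed; it is skipped otherwise, using the same guard style the examples on this page use for dplyr.

```r
# Sketch: skipped when the arrow package is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  library(arrow, warn.conflicts = FALSE)
  # Hypothetical columns: each named argument is a field name and its type
  sch <- schema(
    id    = int64(),
    name  = utf8(),
    tags  = list_of(utf8()),                       # list of strings
    point = struct(x = float64(), y = float64())   # nested struct field
  )
  print(sch)
}
```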

Usage

```r
int8()
int16()
int32()
int64()
uint8()
uint16()
uint32()
uint64()
float16()
halffloat()
float32()
float()
float64()
boolean()
bool()
utf8()
large_utf8()
binary()
large_binary()
fixed_size_binary(byte_width)
string()
date32()
date64()
time32(unit = c("ms", "s"))
time64(unit = c("ns", "us"))
duration(unit = c("s", "ms", "us", "ns"))
null()
timestamp(unit = c("s", "ms", "us", "ns"), timezone = "")
decimal(precision, scale)
decimal32(precision, scale)
decimal64(precision, scale)
decimal128(precision, scale)
decimal256(precision, scale)
struct(...)
list_of(type)
large_list_of(type)
fixed_size_list_of(type, list_size)
map_of(key_type, item_type, .keys_sorted = FALSE)
```

Arguments

byte_width: For fixed_size_binary(), the byte width of the FixedSizeBinary type.
unit: For time, duration, and timestamp types, the time unit. time32() can take either "s" or "ms", while time64() can be "us" or "ns". timestamp() and duration() can take any of those four values.
timezone: For timestamp(), an optional time zone string.
precision: For decimal(), decimal32(), decimal64(), decimal128(), and decimal256(), the number of significant digits the Arrow decimal type can represent. The maximum precision for decimal128() is 38 significant digits, while for decimal256() it is 76 digits. decimal() uses it to choose which type of decimal to return.
scale: For the same decimal functions, the number of digits after the decimal point. It can be negative.
...: For struct(), named types that define the struct's fields (for example, struct(a = int32(), b = utf8())).
type: For list_of(), large_list_of(), and fixed_size_list_of(), the data type of the list elements.
list_size: For fixed_size_list_of(), the number of elements in each list (FixedSizeList type).
key_type, item_type: For map_of() (MapType), the key and item types.
.keys_sorted: Use TRUE to assert that the keys of a MapType are sorted.
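The arguments above can be exercised together in a short sketch. It assumes arrow is installed; the specific byte width, unit, precision, list size, and key/item types are arbitrary choices for illustration.

```r
# Sketch: skipped when the arrow package is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  library(arrow, warn.conflicts = FALSE)
  typed <- list(
    fixed_size_binary(16L),              # byte_width
    timestamp("us", timezone = "UTC"),   # unit and timezone
    decimal128(10, 2),                   # precision and scale
    fixed_size_list_of(float32(), 3L),   # type and list_size
    map_of(utf8(), int32())              # key_type and item_type
  )
  for (t in typed) print(t)
}
```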

Value

An Arrow type object inheriting from DataType.

Details

A few functions have aliases:

utf8() and string()
float16() and halffloat()
float32() and float()
bool() and boolean()

When called inside an arrow function, such as schema() or cast(), double() is also supported as a way of creating a float64().
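One way to confirm that each alias pair constructs the same type is DataType$Equals(). This sketch assumes arrow is installed and also checks the double() special case inside schema() by comparing two schemas with Schema$Equals().

```r
# Sketch: skipped when the arrow package is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  library(arrow, warn.conflicts = FALSE)
  same <- c(
    utf8_string   = utf8()$Equals(string()),
    float16_half  = float16()$Equals(halffloat()),
    float32_float = float32()$Equals(float()),
    bool_boolean  = bool()$Equals(boolean()),
    # inside schema(), base R's double() stands in for float64()
    double_schema = schema(x = double())$Equals(schema(x = float64()))
  )
  print(same)
}
```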

date32() creates a date type with a "day" unit (days since the UNIX epoch), like the R Date class. date64() has a "ms" unit (milliseconds since the epoch).
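In base R, without arrow, the Date class stores days since 1970-01-01, the same unit date32() uses, while date64() counts milliseconds since that epoch. The sketch shows both counts for 1970-01-02.

```r
# Base R only: Date is stored as days since 1970-01-01 (the date32 unit)
days <- as.numeric(as.Date("1970-01-02"))
# date64 counts milliseconds since the same epoch
ms <- as.numeric(as.POSIXct("1970-01-02", tz = "UTC")) * 1000
days  # 1
ms    # 86400000 ms, i.e. one day
```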

uint32 (32-bit unsigned integer), uint64 (64-bit unsigned integer), and int64 (64-bit signed integer) types may contain values that exceed the range of R's integer type (32-bit signed integer). When these Arrow objects are translated to R objects, uint32 and uint64 are converted to double ("numeric") and int64 is converted to bit64::integer64. By default, however, int64 data whose values all fit in R's integer range is downcast to integer. For int64 types, this downcast can be disabled (so that int64 always yields a bit64::integer64 object) by setting options(arrow.int64_downcast = FALSE).
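The ranges involved can be checked in base R alone. This sketch compares R's integer limit with the uint32 maximum, and shows that a double cannot tell some integers above 2^53 apart, which is why very large uint64 values can lose precision when they arrive in R as double.

```r
# Base R only: R's integer type is a 32-bit signed integer
int_max <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
uint32_max <- 2^32 - 1            # 4294967295
uint32_max > int_max              # TRUE: uint32 values can exceed R's integer range
# A double represents every integer exactly only up to 2^53, so two distinct
# 64-bit integers above that limit can map to the same double
(2^53 + 1) == 2^53                # TRUE
```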

decimal128() creates a Decimal128Type. Arrow decimals are fixed-point decimal numbers encoded as a scalar integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point. For example, the number 1234.567 has a precision of 7 and a scale of 3. Note that scale can be negative.
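To make precision and scale concrete, the helper below derives both from a decimal literal written as a string, along with the unscaled integer a fixed-point decimal would store. decimal_parts() is a hypothetical base R function, not part of arrow; it assumes a plain literal such as "1234.567" and drops the sign.

```r
# Hypothetical base R helper, not part of arrow: split a decimal literal into
# precision, scale, and the unscaled integer (sign dropped)
decimal_parts <- function(x) {
  pieces <- strsplit(sub("^-", "", x), ".", fixed = TRUE)[[1]]
  frac <- if (length(pieces) > 1) pieces[2] else ""
  # strip leading zeros but keep at least one digit
  unscaled <- sub("^0+(?=.)", "", paste0(pieces[1], frac), perl = TRUE)
  list(precision = nchar(unscaled), scale = nchar(frac), unscaled = unscaled)
}

p <- decimal_parts("1234.567")
p$precision   # 7
p$scale       # 3
p$unscaled    # "1234567"
# the stored integer times 10^-scale recovers the value
as.numeric(p$unscaled) / 10^p$scale   # 1234.567
```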

As an example, decimal128(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 (which at scale 3 needs 8 digits, 12345670) nor 123.4567 (which has 4 digits after the decimal point).
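The representability rule in this example can be sketched as a check: the value times 10^scale must be a whole number with at most precision digits. fits_decimal() is a hypothetical base R helper; it works on doubles with a small tolerance, so it is only meant for modest magnitudes like the ones in the text.

```r
# Hypothetical base R helper, not part of arrow: can a decimal with this
# precision and scale hold x exactly?
fits_decimal <- function(x, precision, scale) {
  unscaled <- x * 10^scale
  # tolerance absorbs binary floating-point error in x itself
  is_whole <- abs(unscaled - round(unscaled)) < 1e-6
  is_whole && abs(round(unscaled)) < 10^precision
}

fits_decimal(1234.567, 7, 3)    # TRUE
fits_decimal(-1234.567, 7, 3)   # TRUE
fits_decimal(12345.67, 7, 3)    # FALSE: unscaled value 12345670 has 8 digits
fits_decimal(123.4567, 7, 3)    # FALSE: 123456.7 is not a whole number
```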

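decimal128(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 (whose encoding, 123450, needs 6 digits) nor 1234500 (which is not a whole multiple of 1000). The scale can be thought of as an argument that controls rounding. When negative, scale causes the number to be expressed as the stored integer multiplied by a power of 10, as in scientific notation: 12345000 is stored as 12345 × 10^3.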
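For a negative scale, the stored integer is the value divided by 10^-scale. This base R sketch (no arrow involved) repeats the three numbers from the text and uses round() with negative digits as an analogy for the rounding that a scale of -3 implies; it is not a description of how arrow itself rounds.

```r
# Base R analogy only, not arrow's implementation: with scale = -3 the stored
# integer counts thousands, so it is the value divided by 10^3
stored <- function(x) x / 10^3

stored(12345000)    # 12345: 5 digits, fits precision 5
stored(123450000)   # 123450: 6 digits, too many for precision 5
stored(1234500)     # 1234.5: not a whole number of thousands

# round() with negative digits mirrors the coarsening that scale = -3 implies
round(12345678, digits = -3)   # 12346000
```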

decimal256() creates a Decimal256Type, which allows for a higher maximum precision. For most use cases, however, the maximum precision offered by Decimal128Type is sufficient, and Decimal128Type results in a more compact and more efficient encoding.
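The 38- and 76-digit limits follow from the integer widths: a signed n-bit integer holds magnitudes up to 2^(n-1) - 1, so every d-digit number fits only when 10^d ≤ 2^(n-1). The base R sketch computes that bound; the same formula gives 9 and 18 for the 32- and 64-bit widths behind decimal32() and decimal64().

```r
# Base R only: largest d such that every d-digit integer fits in a signed
# n-bit integer, i.e. 10^d <= 2^(n - 1)
max_precision <- function(bits) floor((bits - 1) * log10(2))

max_precision(128)          # 38 (Decimal128Type)
max_precision(256)          # 76 (Decimal256Type)
max_precision(c(32, 64))    # 9 18
# bytes per value: the compactness advantage of Decimal128Type
c(decimal128 = 128 / 8, decimal256 = 256 / 8)   # 16 32
```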

decimal() creates either a Decimal128Type or a Decimal256Type depending on the value of precision: if precision is greater than 38, a Decimal256Type is returned; otherwise, a Decimal128Type.
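The threshold can be observed by asking decimal() for 38 and 39 digits. This sketch assumes arrow is installed and that the returned objects carry the R6 class names Decimal128Type and Decimal256Type, as the text above suggests.

```r
# Sketch: skipped when the arrow package is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  library(arrow, warn.conflicts = FALSE)
  d38 <- decimal(38, 2)   # precision at the decimal128 limit
  d39 <- decimal(39, 2)   # one digit more than decimal128 allows
  print(class(d38)[1])
  print(class(d39)[1])
}
```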

Prefer decimal128() or decimal256() over decimal(), as their names are more informative.

Examples

```r
bool()
#> Boolean
#> bool
struct(a = int32(), b = double())
#> StructType
#> struct<a: int32, b: double>
timestamp("ms", timezone = "CEST")
#> Timestamp
#> timestamp[ms, tz=CEST]
time64("ns")
#> Time64
#> time64[ns]

# Use the cast method to change the type of data contained in Arrow objects.
# Please check the documentation of each data object class for details.
my_scalar <- Scalar$create(0L, type = int64()) # int64
my_scalar$cast(timestamp("ns")) # timestamp[ns]
#> Scalar
#> 1970-01-01 00:00:00.000000000

my_array <- Array$create(0L, type = int64()) # int64
my_array$cast(timestamp("s", timezone = "UTC")) # timestamp[s, tz=UTC]
#> Array
#> <timestamp[s, tz=UTC]>
#> [
#>   1970-01-01 00:00:00Z
#> ]

my_chunked_array <- chunked_array(0L, 1L) # int32
my_chunked_array$cast(date32()) # date32[day]
#> ChunkedArray
#> <date32[day]>
#> [
#>   [
#>     1970-01-01
#>   ],
#>   [
#>     1970-01-02
#>   ]
#> ]

# You can also use `cast()` in an Arrow dplyr query.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr, warn.conflicts = FALSE)
  arrow_table(mtcars) %>%
    transmute(
      col1 = cast(cyl, string()),
      col2 = cast(cyl, int8())
    ) %>%
    compute()
}
#> Table
#> 32 rows x 2 columns
#> $col1 <string>
#> $col2 <int8>
#>
#> See $metadata for additional Schema metadata
```
