These functions create type objects corresponding to Arrow types. Use them when defining a schema() or as inputs to other types, like struct. Most of these functions don't take arguments, but a few do.
Usage
int8()
int16()
int32()
int64()
uint8()
uint16()
uint32()
uint64()
float16()
halffloat()
float32()
float()
float64()
boolean()
bool()
utf8()
large_utf8()
binary()
large_binary()
fixed_size_binary(byte_width)
string()
date32()
date64()
time32(unit = c("ms", "s"))
time64(unit = c("ns", "us"))
duration(unit = c("s", "ms", "us", "ns"))
null()
timestamp(unit = c("s", "ms", "us", "ns"), timezone = "")
decimal(precision, scale)
decimal32(precision, scale)
decimal64(precision, scale)
decimal128(precision, scale)
decimal256(precision, scale)
struct(...)
list_of(type)
large_list_of(type)
fixed_size_list_of(type, list_size)
map_of(key_type, item_type, .keys_sorted = FALSE)
Arguments
- byte_width
Byte width for FixedSizeBinary type.
- unit
For time/timestamp types, the time unit. time32() can take either "s" or "ms", while time64() can be "us" or "ns". timestamp() can take any of those four values.
- timezone
For timestamp(), an optional time zone string.
- precision
For decimal(), decimal128(), and decimal256(), the number of significant digits the arrow decimal type can represent. The maximum precision for decimal128() is 38 significant digits, while for decimal256() it is 76 digits. decimal() will use it to choose which type of decimal to return.
- scale
For decimal(), decimal128(), and decimal256(), the number of digits after the decimal point. It can be negative.
- ...
For struct(), a named list of types to define the struct columns.
- type
For list_of(), a data type to make a list-of-type.
- list_size
List size for FixedSizeList type.
- key_type, item_type
For MapType, the key and item types.
- .keys_sorted
Use TRUE to assert that keys of a MapType are sorted.
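As a minimal sketch of how the arguments above are passed, the following builds a few parameterized types. The variable names are illustrative only, and the example assumes the arrow package is installed; `ToString()` is used to obtain the printed form of a type.

```r
library(arrow)

# Time-based types take a unit; timestamp() also accepts a time zone.
t32_type <- time32("s")
ts_type  <- timestamp(unit = "ms", timezone = "UTC")

# Fixed-width and nested types take a size and/or an inner type.
fsb_type <- fixed_size_binary(byte_width = 16L)
lst_type <- list_of(int32())
fsl_type <- fixed_size_list_of(float64(), list_size = 3L)
map_type <- map_of(key_type = utf8(), item_type = int64())

# Inspect the printed form of two of the types.
t32_type$ToString()
ts_type$ToString()
```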
Value
An Arrow type object inheriting from DataType.
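The sketch below illustrates the two uses mentioned at the top of this page: passing type objects to schema() and nesting them inside struct(). The field names are made up for illustration, and the example assumes the arrow package is installed.

```r
library(arrow)

# A schema whose fields use both simple and nested types.
sch <- schema(
  id    = int64(),
  name  = utf8(),
  tags  = list_of(utf8()),
  point = struct(x = float64(), y = float64())
)

# Every type object, simple or nested, inherits from DataType.
point_type <- sch$GetFieldByName("point")$type
is_simple_datatype <- inherits(int64(), "DataType")
is_nested_datatype <- inherits(point_type, "DataType")

sch$names
```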
Details
A few functions have aliases:
- utf8() and string()
- float16() and halffloat()
- float32() and float()
- bool() and boolean()
When called inside an arrow function, such as schema() or cast(), double() is also supported as a way of creating a float64().
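One way to confirm that the aliases listed above produce the same type is to compare them with the `Equals()` method of a type object. This is a small sketch that assumes the arrow package is installed; the variable names are illustrative.

```r
library(arrow)

# Each pair of aliases should yield an equal DataType.
same_string <- utf8()$Equals(string())
same_half   <- float16()$Equals(halffloat())
same_float  <- float32()$Equals(float())
same_bool   <- bool()$Equals(boolean())

# Inside schema(), base R's double() is accepted in place of float64().
sch <- schema(x = double())
double_is_float64 <- sch$GetFieldByName("x")$type$Equals(float64())
```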
date32() creates a datetime type with a "day" unit, like the R Date class. date64() has a "ms" unit.
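The difference in units shows up in the printed form of the two date types. The following sketch, which assumes the arrow package is installed, also converts an R Date to an Arrow array and casts it from one date type to the other.

```r
library(arrow)

# The unit is part of each type's printed form.
date32()$ToString()
date64()$ToString()

# An R Date vector becomes a date32 array by default.
d <- Array$create(as.Date("1970-01-02"))
is_date32 <- d$type$Equals(date32())

# Casting changes the storage unit from days to milliseconds.
d64 <- d$cast(date64())
is_date64 <- d64$type$Equals(date64())
```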
uint32 (32-bit unsigned integer), uint64 (64-bit unsigned integer), and int64 (64-bit signed integer) types may contain values that exceed the range of R's integer type (32-bit signed integer). When these arrow objects are translated to R objects, uint32 and uint64 are converted to double ("numeric") and int64 is converted to bit64::integer64. By default, int64 values that all fit in R's integer range are instead returned as integer; this downcast can be disabled (so that int64 always yields a bit64::integer64 object) by setting options(arrow.int64_downcast = FALSE).
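The effect of the arrow.int64_downcast option can be sketched as follows. The example assumes the arrow and bit64 packages are installed and that the option has not already been changed in the session; the variable names are illustrative.

```r
library(arrow)

# A small int64 array whose values all fit in a 32-bit integer.
small <- Array$create(1:3, type = int64())

# With default options, the values come back as an R integer vector.
default_class <- class(as.vector(small))

# Disable the downcast, convert again, then restore the previous setting.
old <- options(arrow.int64_downcast = FALSE)
strict_class <- class(as.vector(small))
options(old)

default_class
strict_class
```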
decimal128() creates a Decimal128Type. Arrow decimals are fixed-point decimal numbers encoded as a scalar integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point. For example, the number 1234.567 has a precision of 7 and a scale of 3. Note that scale can be negative.
As an example, decimal128(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.
decimal128(5, -3) can exactly represent the number 12345000 (encoded internally as the 128-bit integer 12345), but neither 123450000 nor 1234500. The scale can be thought of as an argument that controls rounding. When negative, scale causes the number to be expressed using scientific notation and a power of 10.
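The encoding described above can be worked through with plain R arithmetic. The two helpers below, `decode_decimal()` and `fits_precision()`, are hypothetical and are not part of arrow; they only mirror the rule that the represented value is the stored integer divided by 10 raised to the scale, and that the stored integer may have at most `precision` digits.

```r
# Hypothetical helper: recover the value from the stored integer and the scale.
decode_decimal <- function(unscaled, scale) {
  if (scale >= 0) unscaled / 10^scale else unscaled * 10^(-scale)
}

# Hypothetical helper: does the stored integer fit within `precision` digits?
fits_precision <- function(unscaled, precision) {
  nchar(format(abs(unscaled), scientific = FALSE)) <= precision
}

# decimal128(7, 3): 1234.567 is stored as the integer 1234567 (7 digits).
decode_decimal(1234567, scale = 3)
fits_precision(1234567, precision = 7)

# 12345.67 at scale 3 would need the integer 12345670 (8 digits): too wide.
fits_precision(12345670, precision = 7)

# decimal128(5, -3): 12345000 is stored as the integer 12345 (5 digits).
decode_decimal(12345, scale = -3)

# 123450000 at scale -3 would need the integer 123450 (6 digits): too wide.
fits_precision(123450, precision = 5)
```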
decimal256() creates a Decimal256Type, which allows for higher maximum precision. For most use cases, the maximum precision offered by Decimal128Type is sufficient, and it will result in a more compact and more efficient encoding.
decimal() creates either a Decimal128Type or a Decimal256Type depending on the value for precision. If precision is greater than 38 a Decimal256Type is returned, otherwise a Decimal128Type.
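A short sketch of the selection rule described above, assuming the arrow package is installed and that decimal() behaves as documented on this page (the class names are the ones given in the text):

```r
library(arrow)

# Precision at the 38-digit limit: expected to give a Decimal128Type.
small <- decimal(precision = 38, scale = 2)

# Precision above 38: expected to give a Decimal256Type.
large <- decimal(precision = 39, scale = 2)

class(small)[1]
class(large)[1]
```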
Use decimal128() or decimal256() as the names are more informative than decimal().
See also
dictionary() for creating a dictionary (factor-like) type.
Examples
bool()
#> Boolean
#> bool
struct(a = int32(), b = double())
#> StructType
#> struct<a: int32, b: double>
timestamp("ms", timezone = "CEST")
#> Timestamp
#> timestamp[ms, tz=CEST]
time64("ns")
#> Time64
#> time64[ns]

# Use the cast method to change the type of data contained in Arrow objects.
# Please check the documentation of each data object class for details.
my_scalar <- Scalar$create(0L, type = int64()) # int64
my_scalar$cast(timestamp("ns")) # timestamp[ns]
#> Scalar
#> 1970-01-01 00:00:00.000000000

my_array <- Array$create(0L, type = int64()) # int64
my_array$cast(timestamp("s", timezone = "UTC")) # timestamp[s, tz=UTC]
#> Array
#> <timestamp[s, tz=UTC]>
#> [
#>   1970-01-01 00:00:00Z
#> ]

my_chunked_array <- chunked_array(0L, 1L) # int32
my_chunked_array$cast(date32()) # date32[day]
#> ChunkedArray
#> <date32[day]>
#> [
#>   [
#>     1970-01-01
#>   ],
#>   [
#>     1970-01-02
#>   ]
#> ]

# You can also use `cast()` in an Arrow dplyr query.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr, warn.conflicts = FALSE)
  arrow_table(mtcars) %>%
    transmute(
      col1 = cast(cyl, string()),
      col2 = cast(cyl, int8())
    ) %>%
    compute()
}
#> Table
#> 32 rows x 2 columns
#> $col1 <string>
#> $col2 <int8>
#>
#> See $metadata for additional Schema metadata