A Schema is an Arrow object containing Fields, which map names to Arrow data types. Create a Schema when you want to convert an R data.frame to Arrow but don't want to rely on the default mapping of R types to Arrow types, such as when you want to choose a specific numeric precision, or when creating a Dataset and you want to ensure a specific schema rather than inferring it from the various files.
Many Arrow objects, including Table and Dataset, have a $schema method (active binding) that lets you access their schema.
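As a minimal sketch of the two paragraphs above, the following builds a Table from a data.frame with an explicit Schema and then reads the schema back through $schema. The data frame and its column names (id, score) are made up for illustration, and the sketch assumes the values convert cleanly to the requested types.

```r
library(arrow)

# Hypothetical example data: an integer column and a double column
df <- data.frame(id = 1:3, score = c(1.5, 2.5, 3.5))

# Choose the Arrow types explicitly instead of relying on the default mapping
sch <- schema(id = int64(), score = float32())
tbl <- arrow_table(df, schema = sch)

# Access the schema of the resulting Table through its active binding
tbl$schema
```

Printing tbl$schema should list the two fields with the types requested in sch rather than the types the default mapping would have chosen.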
Methods
- $ToString(): convert to a string
- $field(i): returns the field at index i (0-based)
- $GetFieldByName(x): returns the field with name x
- $WithMetadata(metadata): returns a new Schema with the key-value metadata set. Note that all list elements in metadata will be coerced to character.
- $code(namespace): returns the R code needed to generate this schema. Use namespace=TRUE to call with arrow::.
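The methods listed above can be exercised roughly as follows. This is an illustrative sketch only: the schema, its field names, and the metadata keys are invented for the example.

```r
library(arrow)

# Hypothetical two-field schema used only for illustration
sch <- schema(id = int32(), name = utf8())

# String representation of the schema
sch$ToString()

# Field lookup by 0-based index and by name
first_field <- sch$field(0)
name_field <- sch$GetFieldByName("name")

# $WithMetadata() returns a new Schema; sch itself is left unchanged.
# The numeric value is coerced to character.
sch_meta <- sch$WithMetadata(list(source = "example", version = 2))
sch_meta$metadata$version

# R code that would recreate the schema
sch$code()
```

Because $field() is 0-based, sch$field(0) refers to the first field, unlike ordinary R indexing.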
Active bindings
- $names: returns the field names (called in names(Schema))
- $num_fields: returns the number of fields (called in length(Schema))
- $fields: returns the list of Fields in the Schema, suitable for iterating over
- $HasMetadata: logical: does this Schema have extra metadata?
- $metadata: returns the key-value metadata as a named list. Modify or replace by assigning in (sch$metadata <- new_metadata). All list elements are coerced to string.
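A short sketch of the active bindings above, again using an invented schema and made-up metadata keys (owner, rows):

```r
library(arrow)

# Hypothetical schema used only for illustration
sch <- schema(id = int32(), name = utf8())

# Field names and count, directly and through the base R generics
sch$names
names(sch)
sch$num_fields
length(sch)

# Iterate over the Fields and collect their names
field_names <- vapply(sch$fields, function(f) f$name, character(1))

# No metadata has been set yet
had_metadata <- sch$HasMetadata

# Assigning into $metadata modifies the schema in place;
# the numeric value is coerced to a string
sch$metadata <- list(owner = "analytics", rows = 100)
sch$metadata$rows
```

Unlike $WithMetadata(), which returns a new Schema, assigning to $metadata changes the existing object.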
R Metadata
When converting a data.frame to an Arrow Table or RecordBatch, attributes from the data.frame are saved alongside tables so that the object can be reconstructed faithfully in R (e.g. with as.data.frame()). This metadata can be at the top level of the data.frame (e.g. attributes(df)), at the column level (e.g. attributes(df$col_a)), or, for list columns only, at the element level (e.g. attributes(df[1, "col_a"])). For example, this allows for storing haven columns in a table and being able to faithfully re-create them when pulled back into R. This metadata is separate from the schema (column names and types), which is compatible with other Arrow clients. The R metadata is only read by R and is ignored by other clients (e.g. Pandas has its own custom metadata). This metadata is stored in $metadata$r.
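To illustrate the round trip described above, here is a minimal sketch. The attribute names (note, units) and the data are hypothetical, and the exact form in which the R metadata is stored may differ between arrow versions, so the sketch only checks that an r entry exists and that the top-level attribute survives.

```r
library(arrow)

# Hypothetical data.frame with a top-level and a column-level attribute
df <- data.frame(col_a = 1:3)
attr(df, "note") <- "custom top-level attribute"
attr(df$col_a, "units") <- "cm"

# Converting to a Table stores the R attributes in the schema metadata
tbl <- arrow_table(df)
has_r_metadata <- "r" %in% names(tbl$schema$metadata)

# Converting back to a data.frame reapplies the stored attributes
df_back <- as.data.frame(tbl)
attr(df_back, "note")
```

Other Arrow clients reading the same table would see the column names and types but would ignore the r entry.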
Since Schema metadata keys and values must be strings, this metadata is saved by serializing R's attribute list structure to a string. Starting in version 3.0.0, serialized metadata that exceeds 100Kb in size is compressed by default. To disable this compression (e.g. for tables that must remain compatible with Arrow versions before 3.0.0 and include large amounts of metadata), set the option arrow.compress_metadata to FALSE. Files with compressed metadata are readable by older versions of arrow, but the metadata is dropped.
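One way to disable compression for a single conversion is to set the option temporarily and restore it afterwards. This is a sketch with made-up data; the metadata here is far below the 100Kb threshold, so it only demonstrates where the option is set.

```r
library(arrow)

# Hypothetical data.frame carrying a custom attribute
df <- data.frame(col_a = 1:3)
attr(df, "note") <- "kept uncompressed"

# Turn metadata compression off, remembering the previous setting
old_opts <- options(arrow.compress_metadata = FALSE)

# Metadata serialized during this conversion is not compressed
tbl <- arrow_table(df)

# Restore the previous setting
options(old_opts)
```

Setting the option once at the top of a script, without restoring it, would apply the same behavior to every later conversion in the session.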